GPU Acceleration Research for ZK Circuits
Current GPU Hardware
- GPU: NVIDIA GeForce RTX 4060 Ti
- Memory: 16GB GDDR6
- CUDA Compute Capability: 8.9 (Ada Lovelace architecture)
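The hardware listed above can be confirmed from a script before any benchmarking starts. The following is a minimal sketch, assuming `nvidia-smi` is on the PATH and the installed driver supports the `compute_cap` query field (older drivers may not). `GpuInfo`, `parse_gpu_line`, `query_gpus`, and `meets_minimum` are illustrative names, not part of any existing code.

```python
import subprocess
from dataclasses import dataclass


@dataclass(frozen=True)
class GpuInfo:
    name: str
    memory_mib: int
    compute_capability: tuple[int, int]


def parse_gpu_line(line: str) -> GpuInfo:
    """Parse one CSV row of the form 'name, memory_mib, major.minor'."""
    # rsplit keeps any comma inside the device name intact.
    name, memory, capability = (field.strip() for field in line.rsplit(",", 2))
    major, minor = capability.split(".")
    return GpuInfo(name, int(memory), (int(major), int(minor)))


def query_gpus() -> list[GpuInfo]:
    """Ask nvidia-smi for every visible GPU (assumes `compute_cap` is supported)."""
    output = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=name,memory.total,compute_cap",
            "--format=csv,noheader,nounits",
        ],
        check=True,
        capture_output=True,
        text=True,
    ).stdout
    return [parse_gpu_line(row) for row in output.splitlines() if row.strip()]


def meets_minimum(gpu: GpuInfo, minimum: tuple[int, int]) -> bool:
    """Tuple comparison checks the major version first, then the minor."""
    return gpu.compute_capability >= minimum
```

Calling `query_gpus()` on the target machine and checking each result with `meets_minimum(gpu, (8, 0))` gives an early failure if a library requires a newer architecture than the card provides. The `(8, 0)` threshold is only an example value.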
Potential GPU-Accelerated ZK Libraries
1. Halo2 (Recommended)
- Language: Rust
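- GPU Support: Not built into the upstream crates; CUDA acceleration comes from third-party forks and backends (for example Ingonyama's ICICLE), so the integration path needs to be confirmed during Phase 1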
- Features:
- Lookup tables for efficient constraints
- Recursive proofs
- Multi-party computation support
- Production-ready for complex circuits
2. Arkworks
- Language: Rust
- GPU Support: Limited, but extensible
- Features:
- Modular architecture
- Multiple proof systems (Groth16, PLONK)
- Active ecosystem development
3. PLONK Variants
- Language: Rust/Zig
- GPU Support: Some implementations available
- Features:
- Efficient for large circuits
- Universal trusted setup that is reused across circuits, unlike the per-circuit setup of Groth16 (at the cost of larger proofs)
4. Custom CUDA Implementation
- Approach: Direct CUDA kernels for ZK operations
- Complexity: High development effort
- Benefits: Maximum performance optimization
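Any custom kernel needs a trusted CPU reference to test against. The sketch below assumes the kernel of interest is a number-theoretic transform (NTT), which together with multi-scalar multiplication typically dominates prover time; the notes above do not say which operations would be targeted. The modulus 998244353 with generator 3 is a small stand-in chosen for readability, not the scalar field of any production curve, and all function names are illustrative.

```python
# Stand-in NTT-friendly prime: P - 1 = 2^23 * 119, with primitive root 3.
P = 998244353
GENERATOR = 3


def root_of_unity(n: int) -> int:
    """Return a primitive n-th root of unity mod P (n must be a power of two)."""
    assert n > 0 and n & (n - 1) == 0 and (P - 1) % n == 0
    return pow(GENERATOR, (P - 1) // n, P)


def naive_dft(values: list[int], omega: int) -> list[int]:
    """O(n^2) definition: out[k] = sum_j values[j] * omega^(j*k) mod P."""
    n = len(values)
    return [
        sum(v * pow(omega, j * k, P) for j, v in enumerate(values)) % P
        for k in range(n)
    ]


def ntt(values: list[int], omega: int) -> list[int]:
    """Recursive radix-2 Cooley-Tukey transform, O(n log n)."""
    n = len(values)
    if n == 1:
        return list(values)
    omega_sq = omega * omega % P
    even = ntt(values[0::2], omega_sq)
    odd = ntt(values[1::2], omega_sq)
    out = [0] * n
    twiddle = 1
    for k in range(n // 2):
        t = twiddle * odd[k] % P
        out[k] = (even[k] + t) % P           # butterfly: upper half
        out[k + n // 2] = (even[k] - t) % P  # butterfly: lower half
        twiddle = twiddle * omega % P
    return out


def inverse_ntt(values: list[int], omega: int) -> list[int]:
    """Invert by transforming with omega^-1 and scaling by n^-1."""
    n_inv = pow(len(values), -1, P)
    return [x * n_inv % P for x in ntt(values, pow(omega, -1, P))]
```

A GPU kernel can then be validated by comparing its output against `ntt` on random inputs, with `naive_dft` as a second, slower check on small sizes.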
Implementation Strategy
Phase 1: Research & Prototyping
- Set up Rust development environment
- Install Halo2 and benchmark basic operations
- Compare performance vs current CPU implementation
- Identify integration points with existing Circom circuits
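The CPU-versus-GPU comparison in this phase needs a repeatable timing harness. A minimal sketch follows, assuming the prover can be wrapped in a zero-argument callable; `benchmark` and `placeholder_cpu_prover` are hypothetical names, and the placeholder workload only stands in for the real proof pipeline.

```python
import statistics
import time
from typing import Callable


def benchmark(
    task: Callable[[], object], warmup: int = 1, repeats: int = 5
) -> dict[str, float]:
    """Time `task` with a monotonic clock and summarize the runs in seconds."""
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        task()  # Warm-up runs are discarded so cold caches do not skew results.
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        task()
        samples.append(time.perf_counter() - start)
    return {
        "min": min(samples),
        "median": statistics.median(samples),
        "max": max(samples),
    }


def placeholder_cpu_prover() -> int:
    """Hypothetical stand-in for the current CPU proof pipeline."""
    return sum(i * i for i in range(50_000))
```

Running `benchmark(placeholder_cpu_prover)` returns the summary dictionary. Swapping the placeholder for a function that invokes the real prover, once on CPU and once on GPU, yields directly comparable medians.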
Phase 2: Integration
- Create Rust bindings for the existing Circom circuits
- Implement GPU-accelerated proof generation
- Benchmark compilation speed improvements
- Test with modular ML circuits
Phase 3: Optimization
- Fine-tune CUDA kernels for ZK operations
- Implement batched proof generation
- Add support for recursive proofs
- Establish production deployment pipeline
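Batched proof generation amounts to grouping witnesses so that each GPU call amortizes its setup and transfer cost. The sketch below shows only the grouping logic, under the assumption that a prover exposes some function taking a list of witnesses and returning one proof per witness; `prove_batch` is a hypothetical callback, not an API of Halo2 or any other library.

```python
from typing import Callable, Iterable, Iterator, TypeVar

W = TypeVar("W")   # witness type (hypothetical)
Pf = TypeVar("Pf")  # proof type (hypothetical)


def chunked(items: Iterable[W], batch_size: int) -> Iterator[list[W]]:
    """Yield consecutive batches; the final batch may be smaller."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch: list[W] = []
    for item in items:
        batch.append(item)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch


def prove_in_batches(
    witnesses: Iterable[W],
    prove_batch: Callable[[list[W]], list[Pf]],
    batch_size: int,
) -> list[Pf]:
    """Run `prove_batch` once per batch and return proofs in input order."""
    proofs: list[Pf] = []
    for batch in chunked(witnesses, batch_size):
        results = prove_batch(batch)
        if len(results) != len(batch):
            raise RuntimeError("prover returned a different number of proofs")
        proofs.extend(results)
    return proofs
```

The batch size is a tuning parameter: it would be chosen from measurements so that one batch fits within the 16GB of GPU memory.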
Expected Performance Gains
- Circuit compilation: 5-10x speedup (target, not yet measured)
- Proof generation: 3-5x speedup (target, not yet measured)
- Memory efficiency: Better utilization of GPU resources
- Scalability: Support for larger, more complex circuits
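Per-stage speedups do not translate directly into end-to-end gains, because stages that stay on the CPU still cost the same. The sketch below applies Amdahl's law to the ranges above. The stage timings are placeholders chosen only to make the arithmetic concrete, not measurements; `end_to_end_speedup` and the stage names are illustrative.

```python
def end_to_end_speedup(
    stage_seconds: dict[str, float], stage_speedups: dict[str, float]
) -> float:
    """Overall speedup when only some stages get faster (Amdahl's law).

    Stages missing from `stage_speedups` keep their original duration.
    """
    baseline = sum(stage_seconds.values())
    if baseline <= 0:
        raise ValueError("baseline time must be positive")
    accelerated = sum(
        seconds / stage_speedups.get(stage, 1.0)
        for stage, seconds in stage_seconds.items()
    )
    return baseline / accelerated


# Placeholder timings in seconds; replace with measured baseline values.
example_stages = {"compile": 10.0, "prove": 30.0, "other": 10.0}
low = end_to_end_speedup(example_stages, {"compile": 5.0, "prove": 3.0})
high = end_to_end_speedup(example_stages, {"compile": 10.0, "prove": 5.0})
```

With these placeholder timings the unaccelerated stage caps the overall gain below 3x even at the top of both ranges, which is why the CPU baseline should record time per stage rather than a single total.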
Next Steps
- Install Rust and CUDA toolkit
- Set up Halo2 development environment
- Create performance baseline with current CPU implementation
- Begin prototyping GPU-accelerated proof generation