# Edge GPU Setup Guide

## Overview

This guide covers setting up edge GPU optimization for consumer-grade hardware in the AITBC marketplace.

## Prerequisites

### Hardware Requirements

- NVIDIA GPU with compute capability 7.0+ (Volta, Turing, or newer architectures)
- Minimum 6GB VRAM for edge optimization
- Linux operating system with NVIDIA drivers

### Software Requirements

- NVIDIA CUDA Toolkit 11.0+
- Ollama GPU inference engine
- Python 3.8+ with required packages

## Installation

### 1. Install NVIDIA Drivers

```bash
# Ubuntu/Debian
sudo apt update
sudo apt install nvidia-driver-470

# Verify installation
nvidia-smi
```

### 2. Install CUDA Toolkit

```bash
# Download and install CUDA
# Note: this runfile also bundles driver 520.61.05; deselect the driver
# in the installer if you want to keep the apt-installed one
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

# Add to PATH
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc
```

### 3. Install Ollama

```bash
# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
sudo systemctl start ollama
sudo systemctl enable ollama
```

### 4. Configure GPU Miner

```bash
# Clone and setup AITBC
git clone https://github.com/aitbc/aitbc.git
cd aitbc

# Configure GPU miner
cp scripts/gpu/gpu_miner_host.py.example scripts/gpu/gpu_miner_host.py
# Edit configuration with your miner credentials
```

## Configuration

### Edge GPU Optimization Settings

```python
# In gpu_miner_host.py
EDGE_CONFIG = {
    "enable_edge_optimization": True,
    "geographic_region": "us-west",  # Your region
    "latency_target_ms": 50,
    "power_optimization": True,
    "thermal_management": True
}
```

### Ollama Model Selection

```bash
# Pull edge-optimized models
ollama pull llama2:7b   # ~4GB, good for edge
ollama pull mistral:7b  # ~4GB, efficient

# List available models
ollama list
```

## Testing

### GPU Discovery Test

```bash
# Run GPU discovery
python scripts/gpu/gpu_miner_host.py --test-discovery

# Expected output:
# Discovered GPU: RTX 3060 (Ampere)
# Edge optimized: True
# Memory: 12GB
# Compatible models: llama2:7b, mistral:7b
```

### Latency Test

```bash
# Test geographic latency
python scripts/gpu/gpu_miner_host.py --test-latency us-east

# Expected output:
# Latency to us-east: 45ms
# Edge optimization: Enabled
```

### Inference Test

```bash
# Test ML inference
python scripts/gpu/gpu_miner_host.py --test-inference

# Expected output:
# Model: llama2:7b
# Inference time: 1.2s
# Edge optimized: True
# Privacy preserved: True
```

## Troubleshooting

### Common Issues

#### GPU Not Detected

```bash
# Check NVIDIA drivers
nvidia-smi

# Check CUDA installation
nvcc --version

# Reinstall drivers if needed
# (the pattern is quoted so the shell does not expand it against local files)
sudo apt purge 'nvidia*'
sudo apt autoremove
sudo apt install nvidia-driver-470
```

#### High Latency

- Check network connection
- Verify geographic region setting
- Consider edge data center proximity

#### Memory Issues

- Reduce model size (use 7B instead of 13B)
- Enable memory optimization in Ollama
- Monitor GPU memory usage with `nvidia-smi`

#### Thermal Throttling

- Improve GPU cooling
- Reduce power consumption settings
- Enable thermal management in miner config

## Performance Optimization

### Memory Management

```python
# Optimize memory usage
OLLAMA_CONFIG = {
    "num_ctx": 1024,   # Reduced context for edge
    "num_batch": 256,  # Smaller batches
    "num_gpu": 1,      # Single GPU for edge
    "low_vram": True   # Enable low VRAM mode
}
```

### Network Optimization

```python
# Optimize for edge latency
NETWORK_CONFIG = {
    "use_websockets": True,
    "compression": True,
    "batch_size": 10,  # Smaller batches for lower latency
    "heartbeat_interval": 30
}
```

### Power Management

```python
# Power optimization settings
POWER_CONFIG = {
    "max_power_w": 200,      # Limit power consumption
    "thermal_target_c": 75,  # Target temperature
    "auto_shutdown": True    # Shutdown when idle
}
```

## Monitoring

### Performance Metrics

Monitor key metrics for edge optimization:

- GPU utilization (%)
- Memory usage (GB)
- Power consumption (W)
- Temperature (°C)
- Network latency (ms)
- Inference throughput (tokens/sec)

### Health Checks

```bash
# GPU health check
nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv

# Ollama health check
curl http://localhost:11434/api/tags

# Miner health check
python scripts/gpu/gpu_miner_host.py --health-check
```

## Security Considerations

### GPU Isolation

- Run GPU workloads in sandboxed environments
- Use NVIDIA MPS for multi-process isolation
- Implement resource limits per miner

### Network Security

- Use TLS encryption for all communications
- Implement API rate limiting
- Monitor for unauthorized access attempts

### Privacy Protection

- Ensure ZK proofs protect model inputs
- Use FHE for sensitive data processing
- Implement audit logging for all operations
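
## Example: Scripted GPU Health Check

The `nvidia-smi` query from the Health Checks section can be wrapped in a small script that flags GPUs running above the thermal target or close to exhausting VRAM. The following is a minimal sketch, not part of the AITBC codebase: the function names, the `MAX_MEMORY_FRACTION` threshold, and the warning wording are all assumptions made for illustration, and `THERMAL_TARGET_C` simply mirrors the `thermal_target_c` value from the power management settings. It uses the `csv,noheader,nounits` output format so each line contains bare numbers.

```python
import subprocess
from typing import Dict, List

# Assumed thresholds for illustration only.
THERMAL_TARGET_C = 75       # mirrors "thermal_target_c" in POWER_CONFIG
MAX_MEMORY_FRACTION = 0.9   # hypothetical: warn above 90% VRAM usage

QUERY_FIELDS = "temperature.gpu,utilization.gpu,memory.used,memory.total"


def parse_gpu_csv(csv_text: str) -> List[Dict[str, float]]:
    """Parse `nvidia-smi --format=csv,noheader,nounits` output, one dict per GPU.

    Raises ValueError if a field is not numeric (for example "[N/A]").
    """
    gpus = []
    for line in csv_text.strip().splitlines():
        temp, util, mem_used, mem_total = (float(v) for v in line.split(","))
        gpus.append({
            "temperature_c": temp,
            "utilization_pct": util,
            "memory_used_mib": mem_used,
            "memory_total_mib": mem_total,
        })
    return gpus


def find_warnings(gpu: Dict[str, float]) -> List[str]:
    """Return human-readable warnings for a single GPU reading."""
    warnings = []
    if gpu["temperature_c"] > THERMAL_TARGET_C:
        warnings.append(
            f"temperature {gpu['temperature_c']:.0f}C exceeds target {THERMAL_TARGET_C}C"
        )
    used_fraction = gpu["memory_used_mib"] / gpu["memory_total_mib"]
    if used_fraction > MAX_MEMORY_FRACTION:
        warnings.append(f"memory usage at {used_fraction:.0%}")
    return warnings


def check_gpus() -> List[List[str]]:
    """Query nvidia-smi and return one list of warnings per GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY_FIELDS}", "--format=csv,noheader,nounits"],
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if nvidia-smi fails
    )
    return [find_warnings(gpu) for gpu in parse_gpu_csv(result.stdout)]
```

On a machine with working drivers, calling `check_gpus()` returns an empty list for each healthy GPU. Keeping the parsing separate from the `subprocess` call means the thresholds can be exercised against saved `nvidia-smi` output on a machine without a GPU.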