Edge GPU Setup Guide

Overview

This guide covers setting up edge GPU optimization for consumer-grade hardware in the AITBC marketplace.

Prerequisites

Hardware Requirements

  • NVIDIA GPU with compute capability 7.0 or higher (Volta/Turing architecture or newer)
  • Minimum 6GB VRAM for edge optimization
  • Linux operating system with NVIDIA drivers
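The compute-capability requirement can be checked programmatically. Recent NVIDIA drivers expose it via `nvidia-smi --query-gpu=compute_cap`; the sketch below (an illustration, not part of the miner) parses that output and compares it against the 7.0 minimum.

```python
import subprocess

MIN_COMPUTE_CAP = 7.0  # minimum from the hardware requirements above

def meets_requirement(compute_cap: str, minimum: float = MIN_COMPUTE_CAP) -> bool:
    """Return True if a compute-capability string like '8.6' meets the minimum."""
    try:
        return float(compute_cap.strip()) >= minimum
    except ValueError:
        return False

def gpu_compute_caps() -> list[str]:
    """Query the compute capability of each GPU via nvidia-smi (newer drivers only)."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=compute_cap", "--format=csv,noheader"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.splitlines()

if __name__ == "__main__":
    # Example with a sample value; on a real host, iterate over gpu_compute_caps().
    print(meets_requirement("8.6"))  # RTX 30-series (Ampere)
```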

Software Requirements

  • NVIDIA CUDA Toolkit 11.0+
  • Ollama GPU inference engine
  • Python 3.8+ with required packages

Installation

1. Install NVIDIA Drivers

# Ubuntu/Debian
sudo apt update
sudo apt install nvidia-driver-470

# Verify installation
nvidia-smi

2. Install CUDA Toolkit

# Download and install CUDA
wget https://developer.download.nvidia.com/compute/cuda/11.8.0/local_installers/cuda_11.8.0_520.61.05_linux.run
sudo sh cuda_11.8.0_520.61.05_linux.run

# Add to PATH
echo 'export PATH=/usr/local/cuda/bin${PATH:+:${PATH}}' >> ~/.bashrc
echo 'export LD_LIBRARY_PATH=/usr/local/cuda/lib64${LD_LIBRARY_PATH:+:${LD_LIBRARY_PATH}}' >> ~/.bashrc
source ~/.bashrc

3. Install Ollama

# Install Ollama
curl -fsSL https://ollama.ai/install.sh | sh

# Start Ollama service
sudo systemctl start ollama
sudo systemctl enable ollama

4. Configure GPU Miner

# Clone and setup AITBC
git clone https://github.com/aitbc/aitbc.git
cd aitbc

# Configure GPU miner
cp scripts/gpu/gpu_miner_host.py.example scripts/gpu/gpu_miner_host.py
# Edit configuration with your miner credentials

Configuration

Edge GPU Optimization Settings

# In gpu_miner_host.py
EDGE_CONFIG = {
    "enable_edge_optimization": True,
    "geographic_region": "us-west",  # Your region
    "latency_target_ms": 50,
    "power_optimization": True,
    "thermal_management": True
}
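Misconfigured settings are easier to catch at startup than mid-run. The following is a minimal validation sketch for the EDGE_CONFIG dict above; the VALID_REGIONS set is hypothetical and should match whatever region identifiers your deployment actually uses.

```python
VALID_REGIONS = {"us-west", "us-east", "eu-west", "ap-south"}  # hypothetical set

def validate_edge_config(cfg: dict) -> list[str]:
    """Return a list of human-readable problems; an empty list means the config looks sane."""
    problems = []
    if cfg.get("geographic_region") not in VALID_REGIONS:
        problems.append(f"unknown region: {cfg.get('geographic_region')!r}")
    latency = cfg.get("latency_target_ms")
    if not isinstance(latency, (int, float)) or latency <= 0:
        problems.append("latency_target_ms must be a positive number")
    for key in ("enable_edge_optimization", "power_optimization", "thermal_management"):
        if not isinstance(cfg.get(key), bool):
            problems.append(f"{key} must be a boolean")
    return problems

EDGE_CONFIG = {
    "enable_edge_optimization": True,
    "geographic_region": "us-west",
    "latency_target_ms": 50,
    "power_optimization": True,
    "thermal_management": True,
}
assert validate_edge_config(EDGE_CONFIG) == []
```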

Ollama Model Selection

# Pull edge-optimized models
ollama pull llama2:7b  # ~4GB, good for edge
ollama pull mistral:7b # ~4GB, efficient

# List available models
ollama list
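A miner can also select models automatically from available VRAM. The sketch below uses approximate footprints for the quantized models mentioned above (the sizes and the one-GiB headroom are assumptions, not measured values):

```python
# Rough VRAM needs (GiB) for quantized models; sizes are approximate.
MODEL_VRAM_GIB = {"llama2:7b": 4.0, "mistral:7b": 4.0, "llama2:13b": 8.0}

def models_that_fit(vram_gib: float, headroom_gib: float = 1.0) -> list[str]:
    """Return models whose approximate footprint plus headroom fits in VRAM."""
    return sorted(m for m, need in MODEL_VRAM_GIB.items()
                  if need + headroom_gib <= vram_gib)
```

On a 6GB card this keeps only the 7B models, matching the minimum-VRAM guidance in the prerequisites.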

Testing

GPU Discovery Test

# Run GPU discovery
python scripts/gpu/gpu_miner_host.py --test-discovery

# Expected output:
# Discovered GPU: RTX 3060 (Ampere)
# Edge optimized: True
# Memory: 12GB
# Compatible models: llama2:7b, mistral:7b

Latency Test

# Test geographic latency
python scripts/gpu/gpu_miner_host.py --test-latency us-east

# Expected output:
# Latency to us-east: 45ms
# Edge optimization: Enabled
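One coarse way to approximate what the latency test measures is to time a TCP connect to a regional endpoint and compare it against the latency_target_ms from EDGE_CONFIG. This is an illustrative sketch, not the miner's actual measurement method:

```python
import socket
import time

def tcp_latency_ms(host: str, port: int, timeout: float = 2.0) -> float:
    """Time one TCP connect round trip in milliseconds (a coarse latency proxy)."""
    start = time.perf_counter()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return (time.perf_counter() - start) * 1000.0

def edge_latency_ok(latency_ms: float, target_ms: float = 50.0) -> bool:
    """Compare a measurement against the latency_target_ms from EDGE_CONFIG."""
    return latency_ms <= target_ms
```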

Inference Test

# Test ML inference
python scripts/gpu/gpu_miner_host.py --test-inference

# Expected output:
# Model: llama2:7b
# Inference time: 1.2s
# Edge optimized: True
# Privacy preserved: True

Troubleshooting

Common Issues

GPU Not Detected

# Check NVIDIA drivers
nvidia-smi

# Check CUDA installation
nvcc --version

# Reinstall drivers if needed
sudo apt purge 'nvidia*'  # quote the glob so the shell doesn't expand it locally
sudo apt autoremove
sudo apt install nvidia-driver-470

High Latency

  • Check network connection
  • Verify geographic region setting
  • Consider edge data center proximity

Memory Issues

  • Reduce model size (use 7B instead of 13B)
  • Enable memory optimization in Ollama
  • Monitor GPU memory usage with nvidia-smi

Thermal Throttling

  • Improve GPU cooling
  • Reduce power consumption settings
  • Enable thermal management in miner config

Performance Optimization

Memory Management

# Optimize memory usage
OLLAMA_CONFIG = {
    "num_ctx": 1024,      # Reduced context for edge
    "num_batch": 256,     # Smaller batches
    "num_gpu": 1,         # Single GPU for edge
    "low_vram": True      # Enable low VRAM mode
}

Network Optimization

# Optimize for edge latency
NETWORK_CONFIG = {
    "use_websockets": True,
    "compression": True,
    "batch_size": 10,     # Smaller batches for lower latency
    "heartbeat_interval": 30
}

Power Management

# Power optimization settings
POWER_CONFIG = {
    "max_power_w": 200,   # Limit power consumption
    "thermal_target_c": 75,  # Target temperature
    "auto_shutdown": True    # Shutdown when idle
}
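The thermal and power limits above imply a simple control decision at runtime. One possible sketch, using POWER_CONFIG-style limits (the action names and the 10°C shutdown margin are illustrative assumptions):

```python
def throttle_action(temp_c: float, power_w: float,
                    thermal_target_c: float = 75.0,
                    max_power_w: float = 200.0) -> str:
    """Pick an action from current GPU readings, using POWER_CONFIG-style limits."""
    if temp_c >= thermal_target_c + 10:
        return "shutdown"      # well past target: stop accepting jobs
    if temp_c >= thermal_target_c or power_w >= max_power_w:
        return "reduce_load"   # at a limit: shrink batches / lower clocks
    return "ok"
```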

Monitoring

Performance Metrics

Monitor key metrics for edge optimization:

  • GPU utilization (%)
  • Memory usage (GB)
  • Power consumption (W)
  • Temperature (°C)
  • Network latency (ms)
  • Inference throughput (tokens/sec)
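Those metrics only become actionable with alert bands attached. A minimal sketch, where the threshold values are hypothetical and should be tuned per card:

```python
# Hypothetical alert bands (low, high) for the metrics listed above.
THRESHOLDS = {
    "gpu_util_pct": (5.0, 100.0),   # sustained near-zero utilization suggests a stall
    "memory_used_gib": (0.0, 11.0),
    "power_w": (0.0, 200.0),
    "temp_c": (0.0, 80.0),
    "latency_ms": (0.0, 50.0),
}

def out_of_range(sample: dict) -> list[str]:
    """Return the metric names whose sampled value falls outside its band."""
    return [name for name, (low, high) in THRESHOLDS.items()
            if name in sample and not (low <= sample[name] <= high)]
```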

Health Checks

# GPU health check
nvidia-smi --query-gpu=temperature.gpu,utilization.gpu,memory.used,memory.total --format=csv

# Ollama health check
curl http://localhost:11434/api/tags

# Miner health check
python scripts/gpu/gpu_miner_host.py --health-check
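To feed the nvidia-smi output above into a monitoring script, each CSV data row (skip the header line that `--format=csv` emits) can be parsed into numbers. The field order matches the query shown above; the unit suffixes assume default CSV formatting:

```python
def parse_gpu_health(csv_line: str) -> dict:
    """Parse one nvidia-smi CSV data row into numbers.

    Expected field order, matching the query above:
    temperature.gpu, utilization.gpu, memory.used, memory.total
    """
    temp, util, used, total = [field.strip() for field in csv_line.split(",")]
    return {
        "temp_c": int(temp),
        "util_pct": int(util.rstrip(" %")),
        "mem_used_mib": int(used.rstrip(" MiB")),
        "mem_total_mib": int(total.rstrip(" MiB")),
    }
```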

Security Considerations

GPU Isolation

  • Run GPU workloads in sandboxed environments
  • Use NVIDIA MPS for multi-process isolation
  • Implement resource limits per miner

Network Security

  • Use TLS encryption for all communications
  • Implement API rate limiting
  • Monitor for unauthorized access attempts
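API rate limiting can take many forms; one common approach is a token bucket, sketched below as an illustration (not the marketplace's actual implementation). Each request spends a token; tokens refill at a fixed rate up to a burst cap.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: one possible shape for the API rate limiting above."""

    def __init__(self, rate_per_s: float, burst: int):
        self.rate = rate_per_s          # refill rate, tokens per second
        self.capacity = float(burst)    # maximum burst size
        self.tokens = float(burst)
        self.last = time.monotonic()

    def allow(self) -> bool:
        """Spend one token if available; return whether the request may proceed."""
        now = time.monotonic()
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False
```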

Privacy Protection

  • Ensure ZK proofs protect model inputs
  • Use FHE for sensitive data processing
  • Implement audit logging for all operations