
AITBC GPU Service

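Manages GPU resource operations for AITBC: GPU status reporting, consumer GPU profiles, edge GPU metrics, edge GPU scanning and registration for miners, and ML inference optimization.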

Installation

cd /opt/aitbc
poetry install --with gpu-service

Database Setup

Create a separate database for the GPU service:

sudo -u postgres psql -f apps/gpu-service/scripts/setup-database.sql

Or manually:

CREATE DATABASE aitbc_gpu;
CREATE USER aitbc_gpu WITH PASSWORD 'password';  -- placeholder: replace with a strong password
GRANT ALL PRIVILEGES ON DATABASE aitbc_gpu TO aitbc_gpu;
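
Once the database and role exist, the service needs a connection URL for them. The following is a minimal sketch of how such a URL might be assembled. It assumes an asyncpg-backed SQLAlchemy URL scheme and a local PostgreSQL server on port 5432, neither of which is confirmed by this README, and `build_database_url` is a hypothetical helper rather than part of the service.

```python
from urllib.parse import quote


def build_database_url(
    user: str,
    password: str,
    database: str,
    host: str = "localhost",          # assumption: PostgreSQL runs locally
    port: int = 5432,                 # assumption: default PostgreSQL port
    driver: str = "postgresql+asyncpg",  # assumption: async SQLAlchemy driver
) -> str:
    """Assemble a connection URL, percent-encoding the credentials."""
    # safe="" makes sure characters such as '@', ':' and '/' are escaped,
    # so a password containing them cannot break the URL structure.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return f"{driver}://{user_enc}:{password_enc}@{host}:{port}/{database}"


if __name__ == "__main__":
    print(build_database_url("aitbc_gpu", "p@ss/word", "aitbc_gpu"))
```

Encoding the credentials matters mainly once the placeholder password is replaced with a generated one, which often contains reserved URL characters.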

Running

# Development
python -m gpu_service.main

# Production (systemd)
sudo systemctl start gpu-service
sudo systemctl enable gpu-service
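
If the service fails to start, one thing worth ruling out is another process already listening on its port (8101, per Service Configuration). A small sketch of such a check, using only the standard library; `port_in_use` is a hypothetical helper and only probes the loopback address by default:

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success and an errno value otherwise
        return sock.connect_ex((host, port)) == 0


if __name__ == "__main__":
    state = "in use" if port_in_use(8101) else "free"
    print(f"Port 8101 is {state}")
```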

Endpoints

  • GET /health - Health check
  • GET /gpu/status - Get GPU status
  • GET /v1/marketplace/edge-gpu/profiles - Get consumer GPU profiles
  • GET /v1/marketplace/edge-gpu/metrics/{gpu_id} - Get edge GPU metrics
  • POST /v1/marketplace/edge-gpu/scan/{miner_id} - Scan and register edge GPUs
  • POST /v1/marketplace/edge-gpu/optimize/inference/{gpu_id} - Optimize ML inference
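
The endpoint list above can be captured as a small lookup table for scripts or tests. This is a sketch only: the short names (`health`, `metrics`, and so on) and the `endpoint` helper are illustrative and do not come from the service code, and the base URL assumes the development port 8101.

```python
from urllib.parse import quote

BASE_URL = "http://localhost:8101"  # assumption: local development instance

# Method and path template for each endpoint listed above
ENDPOINTS = {
    "health": ("GET", "/health"),
    "status": ("GET", "/gpu/status"),
    "profiles": ("GET", "/v1/marketplace/edge-gpu/profiles"),
    "metrics": ("GET", "/v1/marketplace/edge-gpu/metrics/{gpu_id}"),
    "scan": ("POST", "/v1/marketplace/edge-gpu/scan/{miner_id}"),
    "optimize": ("POST", "/v1/marketplace/edge-gpu/optimize/inference/{gpu_id}"),
}


def endpoint(name: str, base_url: str = BASE_URL, **params: object) -> tuple:
    """Return (method, url) for a named endpoint, filling in path parameters."""
    method, template = ENDPOINTS[name]
    # Percent-encode each path parameter so IDs with spaces or slashes stay valid
    encoded = {key: quote(str(value), safe="") for key, value in params.items()}
    # A missing path parameter raises KeyError here instead of yielding a bad URL
    return method, base_url.rstrip("/") + template.format(**encoded)


if __name__ == "__main__":
    print(endpoint("health"))
    print(endpoint("metrics", gpu_id="gpu-0"))
```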

Testing

Prerequisites

  • PostgreSQL running and aitbc_gpu database created
  • Poetry dependencies installed

Database Setup

# Run from /opt/aitbc
sudo -u postgres psql -f apps/gpu-service/scripts/setup-database.sql

Start Service (Development)

python -m gpu_service.main

Health Check

curl http://localhost:8101/health

Expected response:

{"status": "healthy", "service": "gpu-service"}
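
For automated checks, the same probe can be scripted. The sketch below validates the response body shown above; `is_healthy` and `check_health` are hypothetical helper names, and the default base URL assumes the development instance on port 8101.

```python
import json
import urllib.request


def is_healthy(body: str) -> bool:
    """Return True if the body matches the expected health payload."""
    try:
        payload = json.loads(body)
    except json.JSONDecodeError:
        return False
    return (
        isinstance(payload, dict)
        and payload.get("status") == "healthy"
        and payload.get("service") == "gpu-service"
    )


def check_health(base_url: str = "http://localhost:8101", timeout: float = 5.0) -> bool:
    """Call GET /health and validate the response; False on any network error."""
    try:
        with urllib.request.urlopen(f"{base_url}/health", timeout=timeout) as resp:
            return resp.status == 200 and is_healthy(resp.read().decode("utf-8"))
    except OSError:  # covers URLError, HTTPError, refused connections, timeouts
        return False


if __name__ == "__main__":
    print("healthy" if check_health() else "unhealthy or unreachable")
```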

GPU Status

curl http://localhost:8101/gpu/status

Expected response:

{
  "status": "operational",
  "service": "gpu-service",
  "message": "GPU service is running"
}

Get Consumer GPU Profiles

curl http://localhost:8101/v1/marketplace/edge-gpu/profiles

Expected response:

[
  {
    "profile_id": "consumer_nvidia_a100",
    "name": "NVIDIA A100",
    "architecture": "NVIDIA",
    "memory_gb": 80,
    "cuda_cores": 6912,
    "tensor_cores": 432,
    "compute_capability": "8.0",
    "typical_use_cases": ["ml_training", "inference", "hpc"]
  }
]
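
A client consuming this response might map each entry onto a typed record. The sketch below mirrors the fields of the sample response only; `GPUProfileView` is a hypothetical client-side type, not the service's ConsumerGPUProfile model, and real responses may carry additional fields, in which case the constructor raises TypeError.

```python
import json
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class GPUProfileView:
    """Client-side view of one profile, with fields taken from the sample response."""
    profile_id: str
    name: str
    architecture: str
    memory_gb: int
    cuda_cores: int
    tensor_cores: int
    compute_capability: str
    typical_use_cases: List[str]


def parse_profiles(body: str) -> List[GPUProfileView]:
    """Parse the JSON array returned by the profiles endpoint."""
    return [GPUProfileView(**item) for item in json.loads(body)]


def profiles_for_use_case(profiles: List[GPUProfileView], use_case: str) -> List[GPUProfileView]:
    """Keep only the profiles that list the given use case."""
    return [p for p in profiles if use_case in p.typical_use_cases]
```

Filtering by `typical_use_cases` is shown because it is the only list-valued field in the sample; any other selection logic would follow the same pattern.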

Test Through Gateway

  1. Start the API gateway:

    python -m api_gateway.main

  2. Test GPU endpoints through the gateway:

    curl http://localhost:8080/gpu/health
    curl http://localhost:8080/gpu/v1/marketplace/edge-gpu/profiles

For comprehensive testing procedures, see MICROSERVICES_TESTING_GUIDE.md.

Service Configuration

  • Port: 8101
  • Database: aitbc_gpu
  • Gateway route: /gpu/*
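
The gateway examples in this README (`/gpu/health` reaching the service's `/health`) suggest that the gateway strips the `/gpu` prefix before forwarding. That behavior is inferred from those examples and not confirmed by the gateway's code; under that assumption the mapping can be sketched as follows, with `to_service_path` as a hypothetical helper.

```python
from typing import Optional

GATEWAY_PREFIX = "/gpu"  # from "Gateway route: /gpu/*"


def to_service_path(gateway_path: str, prefix: str = GATEWAY_PREFIX) -> Optional[str]:
    """Map a gateway path to the GPU service path, or None if it is not routed here.

    Assumes the gateway strips the prefix before forwarding.
    """
    if gateway_path == prefix:
        return "/"
    # Require a '/' after the prefix so that e.g. '/gpus/...' does not match
    if gateway_path.startswith(prefix + "/"):
        return gateway_path[len(prefix):]
    return None


if __name__ == "__main__":
    for path in ("/gpu/health", "/gpu/v1/marketplace/edge-gpu/profiles", "/other"):
        print(path, "->", to_service_path(path))
```

One consequence of prefix stripping, if the assumption holds, is that the service's own `/gpu/status` endpoint would be reached through the gateway as `/gpu/gpu/status`.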

Migration Status

Completed:

  • Extracted GPU domain models (GPUArchitecture, GPURegistry, ConsumerGPUProfile, EdgeGPUMetrics, GPUBooking, GPUReview)
  • Extracted GPU services (EdgeGPUService)
  • Extracted GPU data (consumer_gpu_profiles)
  • Set up database session management
  • Extracted GPU router endpoints
  • Removed edge_gpu router from coordinator-api
  • Created systemd service configuration
  • Created database setup script

Remaining:

  • Extract additional GPU routers (gpu_multimodal_health.py, miner.py) if needed
  • Run database migration script to create aitbc_gpu database
  • Install and enable systemd service
  • End-to-end testing with gateway