aitbc/.windsurf/skills/ollama-gpu-testing-skill.md

---
description: Atomic Ollama GPU inference testing with deterministic performance validation and benchmarking
title: ollama-gpu-testing-skill
version: 1.1
---

# Ollama GPU Testing Skill

## Purpose
Test and validate Ollama GPU inference performance, GPU provider integration, payment processing, and blockchain recording with deterministic benchmarking metrics.

## Activation
Trigger when user requests Ollama GPU testing: inference performance validation, GPU provider testing, payment processing validation, or end-to-end workflow testing.

## Input
```json
{
  "operation": "test-gpu-inference|test-payment-processing|test-blockchain-recording|test-end-to-end|comprehensive",
  "model_name": "string (optional, default: llama2)",
  "test_prompt": "string (optional for inference testing)",
  "test_wallet": "string (optional, default: test-client)",
  "payment_amount": "number (optional, default: 100)",
  "gpu_provider": "string (optional, default: aitbc-host-gpu-miner)",
  "benchmark_duration": "number (optional, default: 30 seconds)",
  "inference_count": "number (optional, default: 5)"
}
```

## Output
```json
{
  "summary": "Ollama GPU testing completed successfully",
  "operation": "test-gpu-inference|test-payment-processing|test-blockchain-recording|test-end-to-end|comprehensive",
  "test_results": {
    "gpu_inference": "boolean",
    "payment_processing": "boolean",
    "blockchain_recording": "boolean",
    "end_to_end_workflow": "boolean"
  },
  "inference_metrics": {
    "model_name": "string",
    "inference_time": "number",
    "tokens_per_second": "number",
    "gpu_utilization": "number",
    "memory_usage": "number",
    "inference_success_rate": "number"
  },
  "payment_details": {
    "wallet_balance_before": "number",
    "payment_amount": "number",
    "payment_status": "success|failed",
    "transaction_id": "string",
    "miner_payout": "number"
  },
  "blockchain_details": {
    "transaction_recorded": "boolean",
    "block_height": "number",
    "confirmations": "number",
    "recording_time": "number"
  },
  "gpu_provider_status": {
    "provider_online": "boolean",
    "gpu_available": "boolean",
    "provider_response_time": "number",
    "service_health": "boolean"
  },
  "issues": [],
  "recommendations": [],
  "confidence": 1.0,
  "execution_time": "number",
  "validation_status": "success|partial|failed"
}
```

## Process

### 1. Analyze
- Validate GPU testing parameters and operation type
- Check Ollama service availability and GPU status
- Verify wallet balance for payment processing
- Assess GPU provider availability and health

### 2. Plan
- Prepare GPU inference testing scenarios
- Define payment processing validation criteria
- Set blockchain recording verification strategy
- Configure end-to-end workflow testing

### 3. Execute
- Test Ollama GPU inference performance and benchmarks
- Validate payment processing and wallet transactions
- Verify blockchain recording and transaction confirmation
- Test complete end-to-end workflow integration

### 4. Validate
- Verify GPU inference performance metrics
- Check payment processing success and miner payouts
- Validate blockchain recording and transaction confirmation
- Confirm end-to-end workflow integration and performance

## Constraints
- **MUST NOT** submit inference jobs without sufficient wallet balance
- **MUST** validate Ollama service availability before testing
- **MUST** monitor GPU utilization during inference testing
- **MUST** handle payment processing failures gracefully
- **MUST** verify blockchain recording completion
- **MUST** provide deterministic performance benchmarks

## Environment Assumptions
- Ollama service running on port 11434
- GPU provider service operational (aitbc-host-gpu-miner)
- AITBC CLI accessible for payment and blockchain operations
- Test wallets configured with sufficient balance
- GPU resources available for inference testing

## Error Handling
- Ollama service unavailable → Return service status and restart recommendations
- GPU provider offline → Return provider status and troubleshooting steps
- Payment processing failures → Return payment diagnostics and wallet status
- Blockchain recording failures → Return blockchain status and verification steps

## Example Usage Prompt

```
Run comprehensive Ollama GPU testing including inference performance, payment processing, blockchain recording, and end-to-end workflow validation
```

## Expected Output Example

```json
{
  "summary": "Comprehensive Ollama GPU testing completed with optimal performance metrics",
  "operation": "comprehensive",
  "test_results": {
    "gpu_inference": true,
    "payment_processing": true,
    "blockchain_recording": true,
    "end_to_end_workflow": true
  },
  "inference_metrics": {
    "model_name": "llama2",
    "inference_time": 2.3,
    "tokens_per_second": 45.2,
    "gpu_utilization": 78.5,
    "memory_usage": 4.2,
    "inference_success_rate": 100.0
  },
  "payment_details": {
    "wallet_balance_before": 1000.0,
    "payment_amount": 100.0,
    "payment_status": "success",
    "transaction_id": "tx_7f8a9b2c3d4e5f6",
    "miner_payout": 95.0
  },
  "blockchain_details": {
    "transaction_recorded": true,
    "block_height": 12345,
    "confirmations": 1,
    "recording_time": 5.2
  },
  "gpu_provider_status": {
    "provider_online": true,
    "gpu_available": true,
    "provider_response_time": 1.2,
    "service_health": true
  },
  "issues": [],
  "recommendations": ["GPU inference optimal", "Payment processing efficient", "Blockchain recording reliable"],
  "confidence": 1.0,
  "execution_time": 67.8,
  "validation_status": "success"
}
```

## Model Routing Suggestion

**Fast Model** (Claude Haiku, GPT-3.5-turbo)
- Basic GPU availability checking
- Simple inference performance testing
- Quick service health validation

**Reasoning Model** (Claude Sonnet, GPT-4)
- Comprehensive GPU benchmarking and performance analysis
- Payment processing validation and troubleshooting
- End-to-end workflow integration testing
- Complex GPU optimization recommendations

**Coding Model** (Claude Sonnet, GPT-4)
- GPU performance optimization algorithms
- Inference parameter tuning
- Benchmark analysis and improvement strategies

## Performance Notes
- **Execution Time**: 10-30 seconds for basic tests, 60-120 seconds for comprehensive testing
- **Memory Usage**: <300MB for GPU testing operations
- **Network Requirements**: Ollama service, GPU provider, blockchain RPC connectivity
- **Concurrency**: Safe for multiple simultaneous GPU tests with different models
- **Benchmarking**: Real-time performance metrics and optimization recommendations