Comprehensive Troubleshooting Guide

This guide provides troubleshooting steps for common issues encountered when deploying and operating the AITBC platform.

General Troubleshooting

Service Won't Start

Symptoms:

  • Service fails to start
  • Systemd service shows "failed" status
  • No logs available

Diagnosis:

# Check service status
sudo systemctl status aitbc-coordinator-api

# Check recent logs
sudo journalctl -u aitbc-coordinator-api -n 50

# Check for errors in recent logs
sudo journalctl -u aitbc-coordinator-api --since "1 hour ago" --no-pager | grep -i error

Solutions:

  1. Check configuration files
# Validate configuration
python -m apps.coordinator_api.main --validate-config
  2. Check port conflicts
# Check if port is in use
sudo ss -tulpn | grep 8011

# Stop the process using the port (try a plain kill first; use -9 only if it does not exit)
sudo kill $(sudo lsof -t -i:8011)
  3. Check permissions
# Check file permissions
ls -la /opt/aitbc

# Fix permissions
sudo chown -R aitbc:aitbc /opt/aitbc
  4. Check dependencies
# Verify Python dependencies
source venv/bin/activate
pip list

# Install missing dependencies
pip install -r requirements.txt
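
The port-conflict check above can also be scripted, for example as a pre-start check. The following is a minimal sketch using only the standard library; the function name `port_in_use` is illustrative and not part of AITBC.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1", timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising an exception
        return sock.connect_ex((host, port)) == 0
```

Calling `port_in_use(8011)` before starting the coordinator API reports whether another process is already listening on the port used in this guide.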

High CPU Usage

Symptoms:

  • Service consuming excessive CPU
  • System sluggish
  • High load averages

Diagnosis:

# Check CPU usage (pgrep -d, joins multiple PIDs with commas, as top -p expects)
top -p "$(pgrep -d, -f coordinator-api)"

# Check process details
ps aux | grep coordinator-api

# Check system load
uptime

Solutions:

  1. Profile the application
# Profile with cProfile
python -m cProfile -o profile.stats apps/coordinator_api/main.py

# Analyze profile
python -m pstats profile.stats
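  2. Check for infinite loops
# Trace system calls of the oldest matching process
sudo strace -p "$(pgrep -o -f coordinator-api)"
  3. Optimize database queries
# Enable query logging (assumes the application maps this variable to SQLAlchemy's echo option)
export SQLALCHEMY_ECHO=true

# Analyze slow queries (requires the pg_stat_statements extension; the column is total_exec_time on PostgreSQL 13+, total_time on older versions)
psql -d aitbc -c "SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"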
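
For the profiling step, the same cProfile data can be collected and summarized from Python. This is a sketch under stated assumptions: `busy_work` is a hypothetical stand-in for an expensive code path, and `profile_top` is an illustrative helper, not an AITBC function.

```python
import cProfile
import io
import pstats


def busy_work(n: int) -> int:
    """Stand-in for an expensive code path (hypothetical example function)."""
    return sum(i * i for i in range(n))


def profile_top(func, *args, limit: int = 5) -> str:
    """Run func under cProfile and return the top entries by cumulative time."""
    profiler = cProfile.Profile()
    profiler.runcall(func, *args)
    out = io.StringIO()
    stats = pstats.Stats(profiler, stream=out)
    stats.sort_stats("cumulative").print_stats(limit)
    return out.getvalue()


print(profile_top(busy_work, 100_000))
```

A saved file such as `profile.stats` can be loaded the same way with `pstats.Stats("profile.stats")` and sorted with `sort_stats("cumulative")`.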

Memory Leaks

Symptoms:

  • Memory usage increases over time
  • Service crashes with OOM killer
  • Swap usage high

Diagnosis:

# Check memory usage
free -h

# Check process memory
ps aux | grep coordinator-api

# Monitor memory over time
watch -n 1 'free -h'

Solutions:

  1. Check for memory leaks
# Use memory profiler
pip install memory-profiler
python -m memory_profiler apps/coordinator_api/main.py
  2. Check connection pooling
# Reduce pool size (DATABASE_URL comes from the application's settings)
from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=5,
    max_overflow=10
)
  3. Restart service periodically
# Add to root's crontab (sudo crontab -e); restarts the service daily at 02:00
0 2 * * * systemctl restart aitbc-coordinator-api
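
Before resorting to periodic restarts, it can help to find where memory grows. The sketch below uses the standard-library `tracemalloc` module to compare two snapshots; the list `leaky` simulates a leak and `top_growth` is an illustrative helper name.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the source lines whose allocated size grew most between snapshots."""
    stats = after.compare_to(before, "lineno")
    return [s for s in stats if s.size_diff > 0][:limit]


tracemalloc.start()
before = tracemalloc.take_snapshot()
leaky = [bytes(1024) for _ in range(10_000)]  # simulated leak: about 10 MB kept alive
after = tracemalloc.take_snapshot()
tracemalloc.stop()

for stat in top_growth(before, after):
    print(stat)
```

In a long-running service, the two snapshots would be taken some minutes apart rather than around a single statement.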

Blockchain Node Issues

Node Won't Sync

Symptoms:

  • Block height not increasing
  • Sync status shows "syncing" indefinitely
  • Peers not connecting

Diagnosis:

# Check sync status
curl http://localhost:8080/v1/network

# Check peer connections
curl http://localhost:8080/v1/network/peers

# Check blockchain logs
sudo journalctl -u aitbc-blockchain -n 50

Solutions:

  1. Add bootstrap peers
# Edit configuration (sudo tee is used because a plain >> redirection would run without root privileges)
echo "BOOTSTRAP_PEERS=peer1.example.com:8080,peer2.example.com:8080" | sudo tee -a /etc/aitbc/blockchain.env

# Restart service
sudo systemctl restart aitbc-blockchain
  2. Check network connectivity
# Test peer connectivity
telnet peer.example.com 8080

# Check firewall
sudo ufw status
  3. Reset blockchain state
# Stop service
sudo systemctl stop aitbc-blockchain

# Back up data by moving it aside (the node then starts from an empty data directory)
sudo mv /var/lib/aitbc/blockchain /var/lib/aitbc/blockchain.backup

# Start service
sudo systemctl start aitbc-blockchain
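
To decide whether "block height not increasing" really applies, a few height readings taken some seconds apart can be compared. The sketch below is deliberately independent of the API response format, which this guide does not document: it takes plain integers that you have extracted from the head-block endpoint yourself. The name `is_stalled` is illustrative.

```python
from typing import Sequence


def is_stalled(heights: Sequence[int], min_samples: int = 3) -> bool:
    """Return True when the last `min_samples` height readings did not increase."""
    if len(heights) < min_samples:
        return False  # not enough data to decide
    recent = heights[-min_samples:]
    # Stalled if no later reading exceeds the first reading in the window
    return max(recent) <= recent[0]
```

For example, readings of 120, 120, 120 count as stalled, while 120, 121, 123 do not.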

Fork Detected

Symptoms:

  • Multiple blockchain branches
  • Consensus failures
  • Invalid blocks

Diagnosis:

# Check blockchain height
curl http://localhost:8080/v1/blocks/head

# Check for forks
curl http://localhost:8080/v1/blocks/forks

Solutions:

  1. Choose correct fork
# Revert to correct height
curl -X POST http://localhost:8080/v1/admin/revert \
  -H "Content-Type: application/json" \
  -d '{"height": 12345}'
  2. Restart with clean state
# Stop service
sudo systemctl stop aitbc-blockchain

# Clear blockchain data (irreversible; move the directory aside instead if you may need it later)
sudo rm -rf /var/lib/aitbc/blockchain

# Start service
sudo systemctl start aitbc-blockchain

Coordinator API Issues

500 Internal Server Error

Symptoms:

  • API returns 500 errors
  • Jobs fail to submit
  • Status checks fail

Diagnosis:

# Check API logs
sudo journalctl -u aitbc-coordinator-api -n 100 | grep -i error

# Check database connection
psql -d aitbc -c "SELECT 1;"

# Check health endpoint
curl http://localhost:8011/health

Solutions:

  1. Check database connectivity
# Test database connection
psql -h localhost -U aitbc -d aitbc

# Restart PostgreSQL
sudo systemctl restart postgresql
  2. Check Redis connection
# Test Redis
redis-cli ping

# Restart Redis
sudo systemctl restart redis
  3. Check datetime handling
# Look for "can't compare offset-naive and offset-aware datetimes" errors in the logs
# Use either timezone-aware or naive datetimes everywhere, never a mix of both
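
The datetime problem can be reproduced and fixed in a few lines. This is a minimal sketch: `ensure_utc` is an illustrative helper, and treating naive values as UTC is an assumption that must match how the application actually stores timestamps.

```python
from datetime import datetime, timezone


def ensure_utc(value: datetime) -> datetime:
    """Return a timezone-aware UTC datetime; naive inputs are assumed to be UTC."""
    if value.tzinfo is None:
        return value.replace(tzinfo=timezone.utc)
    return value.astimezone(timezone.utc)


naive = datetime(2026, 5, 11, 12, 0, 0)  # e.g. read from a column without time zone
aware = datetime(2026, 5, 11, 12, 30, 0, tzinfo=timezone.utc)

try:
    naive < aware
except TypeError as exc:
    print("mixed comparison fails:", exc)

# After normalization the comparison is well defined
print(ensure_utc(naive) < aware)
```

Normalizing at the boundary (when values are read from the database or parsed from requests) keeps the rest of the code free of mixed comparisons.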

Job Stuck in Queued State

Symptoms:

  • Jobs remain in QUEUED state
  • No miners assigned
  • Job expiration

Diagnosis:

# Check job status (set JOB_ID first; a literal {job_id} would be treated as a curl URL glob)
curl -H "X-Api-Key: $API_KEY" \
  "http://localhost:8011/v1/jobs/$JOB_ID"

# Check miner availability
curl http://localhost:8011/v1/miners

# Check logs
sudo journalctl -u aitbc-coordinator-api -n 50

Solutions:

  1. Check miner registration
# Verify miners are registered
curl http://localhost:8011/v1/miners

# Register miner if needed
curl -X POST http://localhost:8011/v1/miners/register \
  -H "Content-Type: application/json" \
  -d '{"miner_id": "miner-123", "gpu_type": "nvidia-rtx-3090"}'
  2. Check job constraints
# Verify job constraints can be satisfied (set JOB_ID first)
curl -H "X-Api-Key: $API_KEY" \
  "http://localhost:8011/v1/jobs/$JOB_ID" | jq '.constraints'
  3. Increase job TTL
# Resubmit with longer TTL
curl -X POST http://localhost:8011/v1/jobs \
  -H "Content-Type: application/json" \
  -H "X-Api-Key: $API_KEY" \
  -d '{"payload": {...}, "ttl_seconds": 3600}'

Wallet Daemon Issues

Wallet Not Responding

Symptoms:

  • Wallet daemon unresponsive
  • Transactions not signing
  • Balance not updating

Diagnosis:

# Check wallet daemon status
sudo systemctl status aitbc-wallet

# Check wallet logs
sudo journalctl -u aitbc-wallet -n 50

# Test wallet endpoint
curl http://localhost:8071/health

Solutions:

  1. Check wallet file integrity
# Verify wallet file exists
ls -la /var/lib/aitbc/wallet/

# Restrict wallet file permissions to the owner
chmod 600 /var/lib/aitbc/wallet/wallet.dat
  2. Restart wallet daemon
sudo systemctl restart aitbc-wallet
  3. Check key derivation
# Verify key derivation path
python -c "from aitbc_crypto import Wallet; w = Wallet(); print(w.address)"

Transaction Signing Failed

Symptoms:

  • Transactions fail to sign
  • Invalid signature errors
  • Key not found errors

Diagnosis:

# Check wallet keys
curl http://localhost:8071/v1/keys

# Check transaction logs
sudo journalctl -u aitbc-wallet -n 50 | grep -i transaction

Solutions:

  1. Verify private key
# Check private key exists
ls -la /var/lib/aitbc/wallet/private_key

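# Regenerate keys only as a last resort (back up the existing key first, since a regenerated key may not control the old address)
curl -X POST http://localhost:8071/v1/keys/regenerate
  2. Check key permissions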
# Secure private key
chmod 600 /var/lib/aitbc/wallet/private_key
chown aitbc:aitbc /var/lib/aitbc/wallet/private_key

Marketplace Service Issues

Offers Not Matching

Symptoms:

  • GPU offers not matched with jobs
  • Jobs remain unassigned
  • Marketplace not updating

Diagnosis:

# Check marketplace status
curl http://localhost:8102/health

# Check offers
curl http://localhost:8102/v1/offers

# Check matching logs
sudo journalctl -u aitbc-marketplace -n 50

Solutions:

  1. Check offer constraints
# Verify offer constraints
curl http://localhost:8102/v1/offers | jq '.[].constraints'
  2. Restart matching engine
sudo systemctl restart aitbc-marketplace
  3. Clear offer cache
# Clear Redis cache (FLUSHALL deletes every key in every Redis database, not only cached offers)
redis-cli FLUSHALL

# Restart service
sudo systemctl restart aitbc-marketplace

Database Issues

Connection Refused

Symptoms:

  • Database connection errors
  • Service unable to connect to PostgreSQL
  • "Connection refused" messages

Diagnosis:

# Check PostgreSQL status
sudo systemctl status postgresql

# Test connection
psql -h localhost -U aitbc -d aitbc

# Check PostgreSQL logs
sudo tail -f /var/log/postgresql/postgresql-*.log

Solutions:

  1. Restart PostgreSQL
sudo systemctl restart postgresql
  2. Check connection limits
# Check max connections
psql -d aitbc -c "SHOW max_connections;"

# Check active connections
psql -d aitbc -c "SELECT count(*) FROM pg_stat_activity;"
  3. Check firewall
# Check if port 5432 is open
sudo ufw status | grep 5432

# Allow PostgreSQL (only needed for remote clients; restrict the source address where possible)
sudo ufw allow 5432/tcp

Slow Queries

Symptoms:

  • API responses slow
  • Database CPU high
  • Query timeouts

Diagnosis:

# Enable query logging
psql -d aitbc -c "ALTER SYSTEM SET log_min_duration_statement = 1000;"
sudo systemctl reload postgresql

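# Check slow queries (requires the pg_stat_statements extension; the column is total_exec_time on PostgreSQL 13+, total_time on older versions)
psql -d aitbc -c "SELECT * FROM pg_stat_statements ORDER BY total_exec_time DESC LIMIT 10;"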

Solutions:

  1. Add indexes
-- Add index on frequently queried columns
CREATE INDEX idx_job_state ON job(state);
CREATE INDEX idx_job_created_at ON job(created_at);
  2. Optimize queries
-- Use EXPLAIN ANALYZE
EXPLAIN ANALYZE SELECT * FROM job WHERE state = 'QUEUED';
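  3. Increase work_mem
-- Increase work_mem for complex queries (the limit applies per sort or hash operation, so size it against available RAM)
ALTER SYSTEM SET work_mem = '256MB';
-- Then reload the configuration from the shell:
sudo systemctl reload postgresql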

Database Corruption

Symptoms:

  • Data inconsistencies
  • Queries return wrong results
  • Database won't start

Diagnosis:

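# Rewrite and re-analyze all tables (takes exclusive locks; read errors during the run can point to damaged pages)
psql -d aitbc -c "VACUUM FULL ANALYZE;"

# Check per-database statistics (checksum_failures is populated on PostgreSQL 12+ when data checksums are enabled)
psql -d aitbc -c "SELECT * FROM pg_stat_database;"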

Solutions:

  1. Restore from backup
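# Stop the services that use the database (PostgreSQL itself must stay up for psql to connect)
sudo systemctl stop aitbc-coordinator-api

# Restore from backup
psql -d aitbc < backup-20260511.sql

# Start the services again
sudo systemctl start aitbc-coordinator-api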
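  2. Use WAL recovery
# Configure recovery (PostgreSQL 12+ reads restore_command from postgresql.conf; recovery.conf is no longer supported)
echo "restore_command = 'cp /var/lib/postgresql/wal/%f %p'" | sudo tee -a /etc/postgresql/*/main/postgresql.conf

# Request recovery on the next start by creating recovery.signal in the data directory
sudo -u postgres touch "$(sudo -u postgres psql -Atc 'SHOW data_directory;')/recovery.signal"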

# Restart PostgreSQL
sudo systemctl restart postgresql

Network Issues

Connection Timeouts

Symptoms:

  • Services unable to connect to each other
  • Intermittent connection failures
  • High latency

Diagnosis:

# Test connectivity
ping -c 10 localhost

# Check DNS
nslookup localhost

# Check ports
telnet localhost 8011

Solutions:

  1. Check network configuration
# Check IP configuration
ip addr show

# Check routing
ip route show

# Check DNS
cat /etc/resolv.conf
  2. Check firewall rules
# Check UFW status
sudo ufw status

# Check iptables
sudo iptables -L -n
  3. Check MTU
# Check MTU
ip link show

# Adjust MTU if needed
sudo ip link set eth0 mtu 1500

DNS Issues

Symptoms:

  • Domain names not resolving
  • Services unable to connect by hostname
  • Slow DNS resolution

Diagnosis:

# Test DNS resolution
nslookup google.com

# Check DNS servers
cat /etc/resolv.conf

# Test local DNS
dig localhost

Solutions:

  1. Change DNS servers
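# Use Google DNS (systemd-resolved may overwrite this file if it manages /etc/resolv.conf)
echo "nameserver 8.8.8.8" | sudo tee /etc/resolv.conf
echo "nameserver 8.8.4.4" | sudo tee -a /etc/resolv.conf
  2. Clear DNS cache
# Clear the systemd-resolved cache (resolvectl replaces the older systemd-resolve command)
sudo resolvectl flush-caches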

# Restart DNS service
sudo systemctl restart systemd-resolved

GPU Issues

GPU Not Detected

Symptoms:

  • GPU not recognized
  • CUDA errors
  • Mining fails

Diagnosis:

# Check GPU
nvidia-smi

# Check CUDA
nvcc --version

# Check driver
dmesg | grep -i nvidia

Solutions:

  1. Reinstall NVIDIA driver
# Remove old driver (quote the pattern so the shell does not expand it)
sudo apt remove --purge 'nvidia-*'

# Install new driver
sudo apt install nvidia-driver-535

# Reboot
sudo reboot
  2. Check CUDA installation
# Verify CUDA installation
nvcc --version

# Reinstall CUDA if needed
sudo apt install nvidia-cuda-toolkit
  3. Check GPU permissions
# Add user to video group
sudo usermod -aG video $USER

# Reboot
sudo reboot

GPU Memory Errors

Symptoms:

  • Out of memory errors
  • CUDA out of memory
  • Jobs failing

Diagnosis:

# Check GPU memory
nvidia-smi

# Monitor memory usage
watch -n 1 nvidia-smi

Solutions:

  1. Reduce batch size
# Reduce batch size in job configuration
batch_size = 8  # Reduce from 16
  2. Clear GPU cache
import torch
torch.cuda.empty_cache()
  3. Restart mining service
sudo systemctl restart aitbc-miner
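
Reducing the batch size can also be automated. The sketch below halves the batch after each out-of-memory error until a run succeeds. It is not AITBC code: `run_with_backoff` and the `step` callback are hypothetical, and it relies on PyTorch reporting CUDA out-of-memory conditions as a `RuntimeError` whose message contains "out of memory".

```python
from typing import Callable


def run_with_backoff(step: Callable[[int], None], batch_size: int, min_batch: int = 1) -> int:
    """Call step(batch_size), halving the batch after each out-of-memory error.

    Returns the batch size that succeeded; re-raises if min_batch also fails
    or if the error is not an out-of-memory error.
    """
    while True:
        try:
            step(batch_size)
            return batch_size
        except RuntimeError as exc:
            if "out of memory" not in str(exc).lower() or batch_size <= min_batch:
                raise
            batch_size = max(min_batch, batch_size // 2)
```

In a real job, calling `torch.cuda.empty_cache()` before each retry would release cached blocks left over from the failed attempt.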

Performance Issues

Slow API Response Times

Symptoms:

  • API requests take long to complete
  • Timeouts
  • Poor user experience

Diagnosis:

# Measure response time
time curl http://localhost:8011/v1/jobs

# Check database query times
psql -d aitbc -c "SELECT * FROM pg_stat_statements ORDER BY mean_exec_time DESC LIMIT 10;"

Solutions:

  1. Enable caching
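# Add in-process caching (lru_cache is per process and never expires entries, so cached jobs can go stale; use Redis for a shared cache with expiry)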
from functools import lru_cache

@lru_cache(maxsize=1000)
def get_job(job_id: str):
    return job_service.get_job(job_id)
  2. Optimize database queries
-- Add indexes
CREATE INDEX CONCURRENTLY idx_job_state ON job(state);
  3. Use connection pooling
# Increase pool size (DATABASE_URL comes from the application's settings)
from sqlalchemy import create_engine

engine = create_engine(
    DATABASE_URL,
    pool_size=20,
    max_overflow=40
)

High Latency

Symptoms:

  • Network latency high
  • Slow data transfer
  • Poor performance

Diagnosis:

# Measure latency
ping -c 10 localhost

# Check network throughput (run the server in one terminal and the client in another)
iperf3 -s
iperf3 -c localhost

Solutions:

  1. Optimize network
# Check network configuration
ethtool eth0

# Adjust ring buffer sizes (check the supported maximums first with ethtool -g eth0)
sudo ethtool -G eth0 rx 4096 tx 4096
  2. Use local caching
# Cache frequently accessed data
from cachetools import TTLCache

cache = TTLCache(maxsize=1000, ttl=300)
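
To illustrate what a time-to-live cache does, here is a standard-library sketch of the idea. It is not the `cachetools` implementation: `SimpleTTLCache` and `get_or_load` are illustrative names, and unlike `TTLCache` this version has no size limit.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class SimpleTTLCache:
    """Tiny time-based cache; entries expire `ttl` seconds after being stored."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl
        self._clock = clock
        self._data: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_load(self, key: Hashable, loader: Callable[[], Any]) -> Any:
        now = self._clock()
        hit = self._data.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]  # fresh entry: skip the loader
        value = loader()  # missing or expired: reload and store
        self._data[key] = (now, value)
        return value
```

With `ttl=300`, a value is served from memory for five minutes and then reloaded on the next request, which bounds how stale a cached record can become.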

Security Issues

Unauthorized Access

Symptoms:

  • Unauthorized API calls
  • Failed authentication attempts
  • Suspicious activity

Diagnosis:

# Check authentication logs
sudo journalctl -u aitbc-coordinator-api | grep -i authentication

# Check access logs
sudo tail -f /var/log/nginx/access.log

Solutions:

  1. Review API keys
# List all API keys
curl -H "X-Admin-Key: $ADMIN_KEY" \
  http://localhost:8011/v1/admin/api-keys

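# Revoke suspicious keys (set KEY_ID first; a literal {key_id} would be treated as a curl URL glob)
curl -X DELETE -H "X-Admin-Key: $ADMIN_KEY" \
  "http://localhost:8011/v1/admin/api-keys/$KEY_ID"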
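  2. Enable rate limiting
# Add rate limiting with slowapi (app is the existing FastAPI application)
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/v1/jobs")
@limiter.limit("100/minute")
async def submit_job(request: Request):  # slowapi requires the request argument
    pass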
  3. Enable IP whitelisting
# Configure nginx (inside a server or location block)
allow 192.168.1.0/24;
deny all;
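
As a rough illustration of what a limit such as "100/minute" means, the sketch below implements a fixed-window counter. It is one common interpretation and not necessarily how slowapi enforces limits; `FixedWindowLimiter` is an illustrative name, and old windows are never pruned here, so it is not suitable for production as written.

```python
import time
from collections import defaultdict
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `limit` calls per key within each `window`-second window."""

    def __init__(self, limit: int, window: float, clock: Callable[[], float] = time.monotonic):
        self._limit = limit
        self._window = window
        self._clock = clock
        self._counts: Dict[Tuple[str, int], int] = defaultdict(int)

    def allow(self, key: str) -> bool:
        # All requests in the same window share one counter per key
        window_index = int(self._clock() // self._window)
        bucket = (key, window_index)
        if self._counts[bucket] >= self._limit:
            return False
        self._counts[bucket] += 1
        return True
```

Here `key` would typically be the client address, so `FixedWindowLimiter(limit=100, window=60)` corresponds to 100 requests per minute per client.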

Data Breach

Symptoms:

  • Data accessed without authorization
  • Logs show suspicious activity
  • Credentials compromised

Diagnosis:

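# Check for suspicious activity (quote the pattern so the shell does not expand it)
sudo journalctl -u 'aitbc-*' | grep -i error

# Check access logs for 401/403 responses (field 9 is the status code in nginx's default combined format)
sudo awk '$9 == 401 || $9 == 403' /var/log/nginx/access.log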
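
To see which clients generate the failed requests, the access log can be summarized per IP address. This sketch assumes nginx's default "combined" log format; `count_auth_failures` is an illustrative name, and the pattern needs adjusting if a custom `log_format` is configured.

```python
import re
from collections import Counter
from typing import Iterable

# Start of the "combined" format: client IP, identity, user, [time], "request", status
LINE_RE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]+\] "[^"]*" (\d{3}) ')


def count_auth_failures(lines: Iterable[str]) -> Counter:
    """Count 401/403 responses per client IP."""
    failures: Counter = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match and match.group(2) in {"401", "403"}:
            failures[match.group(1)] += 1
    return failures
```

Passing an open file handle for `/var/log/nginx/access.log` and calling `.most_common(10)` on the result lists the ten addresses with the most rejected requests.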

Solutions:

  1. Immediate containment
# Stop all services (quote the pattern so the shell does not expand it)
sudo systemctl stop 'aitbc-*'

# Change all credentials
# Rotate API keys
# Change database passwords
  2. Investigate breach
# Preserve evidence
sudo journalctl -u 'aitbc-*' > incident-logs.txt

# Analyze logs
grep -i "suspicious\|unauthorized" incident-logs.txt
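  3. Recovery
# Restore from backup
psql -d aitbc < backup.sql

# Restart services (name them explicitly: systemctl patterns only match units that are already loaded)
sudo systemctl start aitbc-coordinator-api aitbc-blockchain aitbc-wallet aitbc-marketplace aitbc-miner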

Getting Help

Log Collection

When reporting issues, collect the following information:

# Service logs
sudo journalctl -u aitbc-coordinator-api -n 500 > coordinator.log
sudo journalctl -u aitbc-blockchain -n 500 > blockchain.log
sudo journalctl -u aitbc-marketplace -n 500 > marketplace.log

# System information
uname -a > system-info.txt
free -h >> system-info.txt
df -h >> system-info.txt

# Network information
ip addr show > network-info.txt
sudo ss -tulpn >> network-info.txt

# Database information
psql -d aitbc -c "\l" > database-info.txt
psql -d aitbc -c "SELECT version();" >> database-info.txt

Support Channels

Debug Mode

Enable debug mode for detailed logging:

# Edit environment (sudo tee is used because a plain >> redirection would run without root privileges)
echo "DEBUG=true" | sudo tee -a /etc/aitbc/coordinator.env

# Restart service
sudo systemctl restart aitbc-coordinator-api

# View debug logs
sudo journalctl -u aitbc-coordinator-api -f