feat: add marketplace metrics, privacy features, and service registry endpoints

- Add Prometheus metrics for marketplace API throughput and error rates with new dashboard panels
- Implement confidential transaction models with encryption support and access control
- Add key management system with registration, rotation, and audit logging
- Create services and registry routers for service discovery and management
- Integrate ZK proof generation for privacy-preserving receipts
- Add metrics instru…

docs/operator/backup_restore.md (new file)
@@ -0,0 +1,316 @@

# AITBC Backup and Restore Procedures

This document outlines the backup and restore procedures for all AITBC system components including PostgreSQL, Redis, and blockchain ledger storage.

## Overview

The AITBC platform implements a comprehensive backup strategy with:

- **Automated daily backups** via Kubernetes CronJobs
- **Manual backup capabilities** for on-demand operations
- **Incremental and full backup options** for ledger data
- **Cloud storage integration** for off-site backups
- **Retention policies** to manage storage efficiently

## Components

### 1. PostgreSQL Database

- **Location**: Coordinator API persistent storage
- **Data**: Jobs, marketplace offers/bids, user sessions, configuration
- **Backup Format**: Custom PostgreSQL dump with compression (see the sketch below)
- **Retention**: 30 days (configurable)
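
The exact dump command lives in `infra/scripts/backup_postgresql.sh` and is not part of this document; a plausible sketch that matches the `.sql.gz` artifacts referenced below (host, user, and output path are assumptions):

```bash
# Sketch only — see infra/scripts/backup_postgresql.sh for the real implementation.
pg_dump --host=postgresql --username=aitbc aitbc \
  | gzip -6 > /tmp/postgresql-backups/postgresql-backup-$(date +%Y%m%d_%H%M%S).sql.gz
```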

### 2. Redis Cache

- **Location**: In-memory cache with persistence
- **Data**: Session cache, temporary data, rate limiting
- **Backup Format**: RDB snapshot + AOF (if enabled); see the check below
- **Retention**: 30 days (configurable)
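
The RDB/AOF files only exist if persistence is enabled on the Redis deployment; a quick way to confirm the running configuration (illustrative, using the same `kubectl exec` pattern as the restore steps below):

```bash
# Confirm persistence settings on the running instance.
kubectl exec -n default deployment/redis -- redis-cli CONFIG GET save
kubectl exec -n default deployment/redis -- redis-cli CONFIG GET appendonly
```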

### 3. Ledger Storage

- **Location**: Blockchain node persistent storage
- **Data**: Blocks, transactions, receipts, wallet states
- **Backup Format**: Compressed tar archives
- **Retention**: 30 days (configurable)

## Automated Backups

### Kubernetes CronJob

The automated backup system runs daily at 2:00 AM UTC:

```bash
# Deploy the backup CronJob
kubectl apply -f infra/k8s/backup-cronjob.yaml

# Check CronJob status
kubectl get cronjob aitbc-backup

# View backup jobs
kubectl get jobs -l app=aitbc-backup

# View backup logs
kubectl logs job/aitbc-backup-<timestamp>
```

### Backup Schedule

| Time (UTC) | Component  | Type | Retention |
|------------|------------|------|-----------|
| 02:00      | PostgreSQL | Full | 30 days   |
| 02:01      | Redis      | Full | 30 days   |
| 02:02      | Ledger     | Full | 30 days   |

## Manual Backups

### PostgreSQL

```bash
# Create a manual backup
./infra/scripts/backup_postgresql.sh default my-backup-$(date +%Y%m%d)

# View available backups
ls -la /tmp/postgresql-backups/

# Upload to S3 manually
aws s3 cp /tmp/postgresql-backups/my-backup.sql.gz s3://aitbc-backups-default/postgresql/
```

### Redis

```bash
# Create a manual backup
./infra/scripts/backup_redis.sh default my-redis-backup-$(date +%Y%m%d)

# Force background save before backup
kubectl exec -n default deployment/redis -- redis-cli BGSAVE
```

### Ledger Storage

```bash
# Create a full backup
./infra/scripts/backup_ledger.sh default my-ledger-backup-$(date +%Y%m%d)

# Create incremental backup
./infra/scripts/backup_ledger.sh default incremental-backup-$(date +%Y%m%d) true
```

## Restore Procedures

### PostgreSQL Restore

```bash
# List available backups
aws s3 ls s3://aitbc-backups-default/postgresql/

# Download backup from S3
aws s3 cp s3://aitbc-backups-default/postgresql/postgresql-backup-20231222_020000.sql.gz /tmp/

# Restore database
./infra/scripts/restore_postgresql.sh default /tmp/postgresql-backup-20231222_020000.sql.gz

# Verify restore
kubectl exec -n default deployment/coordinator-api -- curl -s http://localhost:8011/v1/health
```

### Redis Restore

```bash
# Stop Redis service
kubectl scale deployment redis --replicas=0 -n default

# Clear existing data
kubectl exec -n default deployment/redis -- rm -f /data/dump.rdb /data/appendonly.aof

# Copy backup file
kubectl cp /tmp/redis-backup.rdb default/redis-0:/data/dump.rdb

# Start Redis service
kubectl scale deployment redis --replicas=1 -n default

# Verify restore
kubectl exec -n default deployment/redis -- redis-cli DBSIZE
```

### Ledger Restore

```bash
# Stop blockchain nodes
kubectl scale deployment blockchain-node --replicas=0 -n default

# Extract backup
tar -xzf /tmp/ledger-backup-20231222_020000.tar.gz -C /tmp/

# Copy ledger data
kubectl cp /tmp/chain/ default/blockchain-node-0:/app/data/chain/
kubectl cp /tmp/wallets/ default/blockchain-node-0:/app/data/wallets/
kubectl cp /tmp/receipts/ default/blockchain-node-0:/app/data/receipts/

# Start blockchain nodes
kubectl scale deployment blockchain-node --replicas=3 -n default

# Verify restore
kubectl exec -n default deployment/blockchain-node -- curl -s http://localhost:8080/v1/blocks/head
```

## Disaster Recovery

### Recovery Time Objective (RTO)

| Component  | RTO Target | Notes                        |
|------------|------------|------------------------------|
| PostgreSQL | 1 hour     | Database restore from backup |
| Redis      | 15 minutes | Cache rebuild from backup    |
| Ledger     | 2 hours    | Full chain synchronization   |

### Recovery Point Objective (RPO)

| Component  | RPO Target | Notes                             |
|------------|------------|-----------------------------------|
| PostgreSQL | 24 hours   | Daily backups                     |
| Redis      | 24 hours   | Daily backups                     |
| Ledger     | 24 hours   | Daily full + incremental backups  |

### Disaster Recovery Steps

1. **Assess Impact**

   ```bash
   # Check component status
   kubectl get pods -n default
   kubectl get events --sort-by=.metadata.creationTimestamp
   ```

2. **Restore Critical Services**

   ```bash
   # Restore PostgreSQL first (critical for operations)
   ./infra/scripts/restore_postgresql.sh default [latest-backup]

   # Restore Redis cache
   ./infra/scripts/restore_redis.sh default [latest-backup]

   # Restore ledger data
   ./infra/scripts/restore_ledger.sh default [latest-backup]
   ```

3. **Verify System Health**

   ```bash
   # Check all services
   kubectl get pods -n default

   # Verify API endpoints
   curl -s http://coordinator-api:8011/v1/health
   curl -s http://blockchain-node:8080/v1/health
   ```

## Monitoring and Alerting

### Backup Monitoring

Prometheus metrics track backup success/failure:

```yaml
# AlertManager rules for backups
- alert: BackupFailed
  expr: backup_success == 0
  for: 5m
  labels:
    severity: critical
  annotations:
    summary: "Backup failed for {{ $labels.component }}"
    description: "Backup for {{ $labels.component }} has failed for 5 minutes"
```
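
How `backup_success` reaches Prometheus is not specified here; one common pattern is for the backup job to push it through a Pushgateway, sketched below (the Pushgateway address and label names are assumptions):

```bash
# Hypothetical: publish a success/failure gauge at the end of a backup run.
COMPONENT=postgresql
STATUS=1   # 1 = success, 0 = failure
cat <<EOF | curl --data-binary @- "http://pushgateway:9091/metrics/job/aitbc-backup/component/${COMPONENT}"
backup_success ${STATUS}
EOF
```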

### Log Monitoring

```bash
# View backup logs
kubectl logs -l app=aitbc-backup -n default --tail=100

# Monitor backup CronJob
kubectl get cronjob aitbc-backup -w
```

## Best Practices

### Backup Security

1. **Encryption**: Backups uploaded to S3 use server-side encryption
2. **Access Control**: IAM policies restrict backup access
3. **Retention**: Automatic cleanup of old backups
4. **Validation**: Regular restore testing (see the quick checks below)
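
A lightweight complement to the monthly restore tests described further below, assuming the backup artifacts shown earlier (file names are illustrative):

```bash
# Quick integrity checks that can run on every backup, before a full restore test.
gunzip -t /tmp/postgresql-backups/my-backup.sql.gz               # gzip archive is intact
zcat /tmp/postgresql-backups/my-backup.sql.gz | head -n 5        # dump header looks like SQL
tar -tzf /tmp/ledger-backup-20231222_020000.tar.gz > /dev/null   # ledger archive lists cleanly
```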

### Performance Considerations

1. **Off-Peak Backups**: Scheduled during low traffic (2 AM UTC)
2. **Sequential Processing**: Components are backed up one after another to avoid contention
3. **Compression**: All backups compressed to save storage
4. **Incremental Backups**: Ledger supports incremental backups to reduce size

### Testing

1. **Monthly Restore Tests**: Validate backup integrity
2. **Disaster Recovery Drills**: Quarterly full scenario testing
3. **Documentation Updates**: Keep procedures current

## Troubleshooting

### Common Issues

#### Backup Fails with "Permission Denied"

```bash
# Check service account permissions
kubectl describe serviceaccount backup-service-account
kubectl describe role backup-role
```

#### Restore Fails with "Database in Use"

```bash
# Scale down application before restore
kubectl scale deployment coordinator-api --replicas=0
# Perform restore
# Scale up after restore
kubectl scale deployment coordinator-api --replicas=3
```

#### Ledger Restore Incomplete

```bash
# Verify backup integrity
tar -tzf ledger-backup.tar.gz
# Check metadata.json for block height
cat metadata.json | jq '.latest_block_height'
```

### Getting Help

1. Check logs: `kubectl logs -l app=aitbc-backup`
2. Verify storage: `df -h` on backup nodes
3. Check network: Test S3 connectivity
4. Review events: `kubectl get events --sort-by=.metadata.creationTimestamp`

## Configuration

### Environment Variables

| Variable               | Default       | Description                |
|------------------------|---------------|----------------------------|
| BACKUP_RETENTION_DAYS  | 30            | Days to keep backups       |
| BACKUP_SCHEDULE        | 0 2 * * *     | Cron schedule for backups  |
| S3_BUCKET_PREFIX       | aitbc-backups | S3 bucket name prefix      |
| COMPRESSION_LEVEL      | 6             | gzip compression level     |

### Customizing Backup Schedule

Edit the CronJob schedule in `infra/k8s/backup-cronjob.yaml`:

```yaml
spec:
  schedule: "0 3 * * *"  # Change to 3 AM UTC
```

### Adjusting Retention

Modify retention in each backup script:

```bash
# In backup_*.sh scripts
RETENTION_DAYS=60  # Keep for 60 days instead of 30
```

docs/operator/beta-release-plan.md (new file)
@@ -0,0 +1,273 @@

# AITBC Beta Release Plan

## Executive Summary

This document outlines the beta release plan for AITBC (AI Trusted Blockchain Computing), a blockchain platform designed for AI workloads. The release follows a phased approach: Alpha → Beta → Release Candidate (RC) → General Availability (GA).

## Release Phases

### Phase 1: Alpha Release (Completed)

- **Duration**: 2 weeks
- **Participants**: Internal team (10 members)
- **Focus**: Core functionality validation
- **Status**: ✅ Completed

### Phase 2: Beta Release (Current)

- **Duration**: 6 weeks
- **Participants**: 50-100 external testers
- **Focus**: User acceptance testing, performance validation, security assessment
- **Start Date**: 2025-01-15
- **End Date**: 2025-02-26

### Phase 3: Release Candidate

- **Duration**: 2 weeks
- **Participants**: 20 selected beta testers
- **Focus**: Final bug fixes, performance optimization
- **Start Date**: 2025-03-04
- **End Date**: 2025-03-18

### Phase 4: General Availability

- **Date**: 2025-03-25
- **Target**: Public launch

## Beta Release Timeline

### Week 1-2: Onboarding & Basic Flows

- **Jan 15-19**: Tester onboarding and environment setup
- **Jan 22-26**: Basic job submission and completion flows
- **Milestone**: 80% of testers successfully submit and complete jobs

### Week 3-4: Marketplace & Explorer Testing

- **Jan 29 - Feb 2**: Marketplace functionality testing
- **Feb 5-9**: Explorer UI validation and transaction tracking
- **Milestone**: 100 marketplace transactions completed

### Week 5-6: Stress Testing & Feedback

- **Feb 12-16**: Performance stress testing (1000+ concurrent jobs)
- **Feb 19-23**: Security testing and final feedback collection
- **Milestone**: All critical bugs resolved

## User Acceptance Testing (UAT) Scenarios

### 1. Core Job Lifecycle

- **Scenario**: Submit AI inference job → Miner picks up → Execution → Results delivery → Payment
- **Test Cases**:
  - Job submission with various model types
  - Job monitoring and status tracking
  - Result retrieval and verification
  - Payment processing and wallet updates
- **Success Criteria**: 95% success rate across 1000 test jobs

### 2. Marketplace Operations

- **Scenario**: Create offer → Accept offer → Execute job → Complete transaction
- **Test Cases**:
  - Offer creation and management
  - Bid acceptance and matching
  - Price discovery mechanisms
  - Dispute resolution
- **Success Criteria**: 50 successful marketplace transactions

### 3. Explorer Functionality

- **Scenario**: Transaction lookup → Job tracking → Address analysis
- **Test Cases**:
  - Real-time transaction monitoring
  - Job history and status visualization
  - Wallet balance tracking
  - Block explorer features
- **Success Criteria**: All transactions visible within 5 seconds

### 4. Wallet Management

- **Scenario**: Wallet creation → Funding → Transactions → Backup/Restore
- **Test Cases**:
  - Multi-signature wallet creation
  - Cross-chain transfers
  - Backup and recovery procedures
  - Staking and unstaking operations
- **Success Criteria**: 100% wallet recovery success rate

### 5. Mining Operations

- **Scenario**: Miner setup → Job acceptance → Mining rewards → Pool participation
- **Test Cases**:
  - Miner registration and setup
  - Job bidding and execution
  - Reward distribution
  - Pool mining operations
- **Success Criteria**: 90% of submitted jobs accepted by miners

### 6. Community Management

#### Discord Community Structure

- **#announcements**: Official updates and milestones
- **#beta-testers**: Private channel for testers only
- **#bug-reports**: Structured bug reporting format
- **#feature-feedback**: Feature requests and discussions
- **#technical-support**: 24/7 support from the team

#### Regulatory Considerations

- **KYC/AML**: Basic identity verification for testers
- **Securities Law**: Beta tokens have no monetary value
- **Tax Reporting**: Testnet transactions not taxable
- **Export Controls**: Compliance with technology export laws

#### Geographic Restrictions

Beta testing is not available in:

- North Korea, Iran, Cuba, Syria, Crimea
- Countries under US sanctions
- Jurisdictions with unclear crypto regulations

### 7. Token Economics Validation

- **Scenario**: Token issuance → Reward distribution → Staking yields → Fee mechanisms
- **Test Cases**:
  - Mining reward calculations match whitepaper specs
  - Staking yields and unstaking penalties
  - Transaction fee burning and distribution
  - Marketplace fee structures
  - Token inflation/deflation mechanics
- **Success Criteria**: All token operations within 1% of theoretical values

## Performance Benchmarks (Go/No-Go Criteria)

### Must-Have Metrics

- **Transaction Throughput**: ≥ 100 TPS (Transactions Per Second)
- **Job Completion Time**: ≤ 5 minutes for standard inference jobs
- **API Response Time**: ≤ 200ms (95th percentile; see the spot-check sketch below)
- **System Uptime**: ≥ 99.9% during beta period
- **MTTR (Mean Time To Recovery)**: ≤ 2 minutes (from chaos tests)
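
For an ad-hoc spot check of the response-time target against a devnet coordinator (endpoint and sample size are assumptions; formal results should come from the stress-testing weeks above):

```bash
# Rough p95 latency estimate from 100 sequential health checks; not a load test.
for i in $(seq 1 100); do
  curl -s -o /dev/null -w "%{time_total}\n" http://127.0.0.2:8011/v1/health
done | sort -n | awk 'NR==95 {printf "p95 ~= %.0f ms\n", $1*1000}'
```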

### Nice-to-Have Metrics

- **Transaction Throughput**: ≥ 500 TPS
- **Job Completion Time**: ≤ 2 minutes
- **API Response Time**: ≤ 100ms (95th percentile)
- **Concurrent Users**: ≥ 1000 simultaneous users

## Security Testing

### Automated Security Scans

- **Smart Contract Audits**: Completed by [Security Firm]
- **Penetration Testing**: OWASP Top 10 validation
- **Dependency Scanning**: CVE scan of all dependencies
- **Chaos Testing**: Network partition and coordinator outage scenarios

### Manual Security Reviews

- **Authorization Testing**: API key validation and permissions
- **Data Privacy**: GDPR compliance validation
- **Cryptography**: Proof verification and signature validation
- **Infrastructure Security**: Kubernetes and cloud security review

## Test Environment Setup

### Beta Environment

- **Network**: Separate testnet with faucet for test tokens
- **Infrastructure**: Production-like setup with monitoring
- **Data**: Reset weekly to ensure clean testing
- **Support**: 24/7 Discord support channel

### Access Credentials

- **Testnet Faucet**: 1000 AITBC tokens per tester
- **API Keys**: Unique keys per tester with rate limits
- **Wallet Seeds**: Generated per tester with backup instructions
- **Mining Accounts**: Pre-configured mining pools for testing

## Feedback Collection Mechanisms

### Automated Collection

- **Error Reporting**: Automatic crash reports and error logs
- **Performance Metrics**: Client-side performance data
- **Usage Analytics**: Feature usage tracking (anonymized)
- **Survey System**: In-app feedback prompts

### Manual Collection

- **Weekly Surveys**: Structured feedback on specific features
- **Discord Channels**: Real-time feedback and discussions
- **Office Hours**: Weekly Q&A sessions with the team
- **Bug Bounty**: Program for critical issue discovery

## Success Criteria

### Go/No-Go Decision Points

#### Week 2 Checkpoint (Jan 26)

- **Go Criteria**: 80% of testers onboarded, basic flows working
- **Blockers**: Critical bugs in job submission/completion

#### Week 4 Checkpoint (Feb 9)

- **Go Criteria**: 50 marketplace transactions, explorer functional
- **Blockers**: Security vulnerabilities, performance < 50 TPS

#### Week 6 Final Decision (Feb 23)

- **Go Criteria**: All UAT scenarios passed, benchmarks met
- **Blockers**: Any critical security issue, MTTR > 5 minutes

### Overall Success Metrics

- **User Satisfaction**: ≥ 4.0/5.0 average rating
- **Bug Resolution**: 90% of reported bugs fixed
- **Performance**: All benchmarks met
- **Security**: No critical vulnerabilities

## Risk Management

### Technical Risks

- **Consensus Issues**: Rollback to previous version
- **Performance Degradation**: Auto-scaling and optimization
- **Security Breaches**: Immediate patch and notification

### Operational Risks

- **Test Environment Downtime**: Backup environment ready
- **Low Tester Participation**: Incentive program adjustments
- **Feature Scope Creep**: Strict feature freeze after Week 4

### Mitigation Strategies

- **Daily Health Checks**: Automated monitoring and alerts
- **Rollback Plan**: Documented procedures for quick rollback
- **Communication Plan**: Regular updates to all stakeholders

## Communication Plan

### Internal Updates

- **Daily Standups**: Development team sync
- **Weekly Reports**: Progress to leadership
- **Bi-weekly Demos**: Feature demonstrations

### External Updates

- **Beta Newsletter**: Weekly updates to testers
- **Blog Posts**: Public progress updates
- **Social Media**: Regular platform updates

## Post-Beta Activities

### RC Phase Preparation

- **Bug Triage**: Prioritize and assign all reported issues
- **Performance Tuning**: Optimize based on beta metrics
- **Documentation Updates**: Incorporate beta feedback

### GA Preparation

- **Final Security Review**: Complete audit and penetration test
- **Infrastructure Scaling**: Prepare for production load
- **Support Team Training**: Enable customer support team

## Appendix

### A. Test Case Matrix

[Detailed test case spreadsheet link]

### B. Performance Benchmark Results

[Benchmark data and graphs]

### C. Security Audit Reports

[Audit firm reports and findings]

### D. Feedback Analysis

[Summary of all user feedback and actions taken]

## Contact Information

- **Beta Program Manager**: beta@aitbc.io
- **Technical Support**: support@aitbc.io
- **Security Issues**: security@aitbc.io
- **Discord Community**: https://discord.gg/aitbc

---

*Last Updated: 2025-01-10*
*Version: 1.0*
*Next Review: 2025-01-17*

docs/operator/deployment/ports.md (new file)
@@ -0,0 +1,30 @@

# Port Allocation Plan

This document tracks current and planned TCP port assignments across the AITBC devnet stack. Update it whenever new services are introduced or defaults change.

## Current Usage

| Port | Service | Location | Notes |
| --- | --- | --- | --- |
| 8011 | Coordinator API (dev) | `apps/coordinator-api/` | Development coordinator API with job and marketplace endpoints. |
| 8071 | Wallet Daemon API | `apps/wallet-daemon/` | REST and JSON-RPC wallet service with receipt verification. |
| 8080 | Blockchain RPC API (FastAPI) | `apps/blockchain-node/scripts/devnet_up.sh` → `python -m uvicorn aitbc_chain.app:app` | Exposes REST/WebSocket RPC endpoints for blocks, transactions, receipts. |
| 8090 | Mock Coordinator API | `apps/blockchain-node/scripts/devnet_up.sh` → `uvicorn mock_coordinator:app` | Generates synthetic coordinator/miner telemetry consumed by Grafana dashboards. |
| 8100 | Pool Hub API (planned) | `apps/pool-hub/` | FastAPI service for miner registry and matching. |
| 8900 | Coordinator API (production) | `apps/coordinator-api/` | Production-style deployment port. |
| 9090 | Prometheus | `apps/blockchain-node/observability/` | Scrapes blockchain node + mock coordinator metrics. |
| 3000 | Grafana | `apps/blockchain-node/observability/` | Visualizes metrics dashboards for blockchain and coordinator. |
| 4173 | Explorer Web (dev) | `apps/explorer-web/` | Vite dev server for blockchain explorer interface. |
| 5173 | Marketplace Web (dev) | `apps/marketplace-web/` | Vite dev server for marketplace interface. |

## Reserved / Planned Ports

- **Miner Node** – No default port (connects to coordinator via HTTP).
- **JavaScript/Python SDKs** – Client libraries, no dedicated ports.

## Guidance

- Avoid reusing the same port across services in devnet scripts to prevent binding conflicts (recent issues occurred when `8080`/`8090` were already in use).
- For production-grade environments, place HTTP services behind a reverse proxy (nginx/Traefik) and update this table with the external vs. internal port mapping (see the example below).
- When adding new dashboards or exporters, note both the scrape port (Prometheus) and any UI port (Grafana/others).
- If a port is deprecated, strike it through in this table and add a note describing the migration path.
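
A minimal reverse-proxy sketch for the production-style coordinator port, assuming nginx with TLS termination (hostname and certificate setup are illustrative, not a committed config):

```nginx
# Illustrative only: expose the coordinator (internal 8900) behind nginx.
server {
    listen 443 ssl;
    server_name coordinator.example.com;

    location / {
        proxy_pass http://127.0.0.2:8900;
        proxy_set_header Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```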

docs/operator/deployment/run.md (new file)
@@ -0,0 +1,281 @@

# Service Run Instructions

These instructions cover the newly scaffolded services. Install dependencies using Poetry (preferred) or `pip` inside a virtual environment.

## Prerequisites

- Python 3.11+
- Poetry 1.7+ (or virtualenv + pip)
- Optional: GPU drivers for miner node workloads

## Coordinator API (`apps/coordinator-api/`)

1. Navigate to the service directory:
   ```bash
   cd apps/coordinator-api
   ```
2. Install dependencies:
   ```bash
   poetry install
   ```
3. Copy environment template and adjust values:
   ```bash
   cp .env.example .env
   ```
   Add coordinator API keys and, if you want signed receipts, set `RECEIPT_SIGNING_KEY_HEX` to a 32-byte Ed25519 private key encoded as 64 hex characters.
4. Configure the database (shared Postgres): ensure `.env` contains `DATABASE_URL=postgresql://aitbc:248218d8b7657aef@localhost:5432/aitbc` or export it in the shell before running commands.
5. Run the API locally (development):
   ```bash
   poetry run uvicorn app.main:app --host 127.0.0.2 --port 8011 --reload
   ```
6. Production-style launch using Gunicorn (ports start at 8900):
   ```bash
   poetry run gunicorn app.main:app -k uvicorn.workers.UvicornWorker -b 127.0.0.2:8900
   ```
7. Generate a signing key (optional):
   ```bash
   python - <<'PY'
   from nacl.signing import SigningKey

   sk = SigningKey.generate()
   print(sk.encode().hex())
   PY
   ```
   Store the printed hex string in `RECEIPT_SIGNING_KEY_HEX` to enable signed receipts in responses.
   To add coordinator attestations, set `RECEIPT_ATTESTATION_KEY_HEX` to a separate Ed25519 private key; responses include an `attestations` array that can be verified with the corresponding public key.
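
   Verifiers need the matching public key rather than the private key; a small sketch for deriving the hex-encoded verify key to distribute (assumes PyNaCl, as in the generation snippet above):
   ```bash
   python - <<'PY'
   from nacl.signing import SigningKey

   # Paste the value of RECEIPT_SIGNING_KEY_HEX (or RECEIPT_ATTESTATION_KEY_HEX) here.
   sk = SigningKey(bytes.fromhex("<RECEIPT_SIGNING_KEY_HEX>"))
   print(sk.verify_key.encode().hex())  # share this public key with verifiers
   PY
   ```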

8. Retrieve receipts:
   - Latest receipt for a job: `GET /v1/jobs/{job_id}/receipt`
   - Entire receipt history: `GET /v1/jobs/{job_id}/receipts`

Ensure the client request includes the appropriate API key; responses embed signed payloads compatible with the `packages/py/aitbc-crypto` verification helpers.

Example verification snippet using the Python helpers:

```bash
export PYTHONPATH=packages/py/aitbc-crypto/src
python - <<'PY'
from aitbc_crypto.signing import ReceiptVerifier
from aitbc_crypto.receipt import canonical_json
import json

receipt = json.load(open("receipt.json", "r"))
verifier = ReceiptVerifier(receipt["signature"]["public_key"])
verifier.verify(receipt)
print("receipt verified", receipt["receipt_id"])
PY
```

Alternatively, install the Python SDK helpers:

```bash
cd packages/py/aitbc-sdk
poetry install
export PYTHONPATH=packages/py/aitbc-sdk/src:packages/py/aitbc-crypto/src
python - <<'PY'
from aitbc_sdk import CoordinatorReceiptClient, verify_receipt

client = CoordinatorReceiptClient("http://localhost:8011", "client_dev_key_1")
receipt = client.fetch_latest("<job_id>")
verification = verify_receipt(receipt)
print("miner signature valid:", verification.miner_signature.valid)
print("coordinator attestations:", [att.valid for att in verification.coordinator_attestations])
PY
```

For receipts containing `attestations`, iterate the list and verify each entry with the corresponding public key.

A JavaScript helper will ship with the Stage 2 SDK under `packages/js/`; until then, receipts can be verified with Node.js by loading the canonical JSON and invoking an Ed25519 verify function from `tweetnacl` (the payload is `canonical_json(receipt)` and the public key is `receipt.signature.public_key`).

Example Node.js snippet:

```bash
node --input-type=module <<'JS'
import fs from "fs";
import nacl from "tweetnacl";
import canonical from "json-canonicalize";

const receipt = JSON.parse(fs.readFileSync("receipt.json", "utf-8"));
const message = canonical(receipt).trim();
const sig = receipt.signature.sig;
const key = receipt.signature.key_id;

const signature = Buffer.from(sig.replace(/-/g, "+").replace(/_/g, "/"), "base64");
const publicKey = Buffer.from(key.replace(/-/g, "+").replace(/_/g, "/"), "base64");

const ok = nacl.sign.detached.verify(Buffer.from(message, "utf-8"), signature, publicKey);
console.log("verified:", ok);
JS
```

## Solidity Token (`packages/solidity/aitbc-token/`)

1. Navigate to the token project:
   ```bash
   cd packages/solidity/aitbc-token
   npm install
   ```
2. Run the contract unit tests:
   ```bash
   npx hardhat test
   ```
3. Deploy `AIToken` to the configured Hardhat network. Provide the coordinator (required) and attestor (optional) role recipients via environment variables:
   ```bash
   COORDINATOR_ADDRESS=0xCoordinator \
   ATTESTOR_ADDRESS=0xAttestor \
   npx hardhat run scripts/deploy.ts --network localhost
   ```
   The script prints the deployed address and automatically grants the coordinator and attestor roles if they are not already assigned. Export the printed address for follow-on steps:
   ```bash
   export AITOKEN_ADDRESS=0xDeployedAddress
   ```
4. Mint tokens against an attested receipt by calling the contract from Hardhat’s console or a script. The helper below loads the deployed contract and invokes `mintWithReceipt` with an attestor signature:
   ```ts
   // scripts/mintWithReceipt.ts
   import { ethers } from "hardhat";
   import { AIToken__factory } from "../typechain-types";

   async function main() {
     const [coordinator] = await ethers.getSigners();
     const token = AIToken__factory.connect(process.env.AITOKEN_ADDRESS!, coordinator);

     const provider = "0xProvider";
     const units = 100n;
     const receiptHash = "0x...";
     const signature = "0xSignedStructHash";

     const tx = await token.mintWithReceipt(provider, units, receiptHash, signature);
     await tx.wait();
     console.log("Mint complete", await token.balanceOf(provider));
   }

   main().catch((err) => {
     console.error(err);
     process.exitCode = 1;
   });
   ```
   Execute the helper with `AITOKEN_ADDRESS` exported and the signature produced by the attestor key used in your tests or integration flow:
   ```bash
   AITOKEN_ADDRESS=$AITOKEN_ADDRESS npx hardhat run scripts/mintWithReceipt.ts --network localhost
   ```
5. To derive the signature payload, reuse the `buildSignature` helper from `test/aitoken.test.ts` or recreate it in a script (a sketch follows below). The struct hash encodes `(chainId, contractAddress, provider, units, receiptHash)` and must be signed by an authorized attestor account.
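
   A hypothetical recreation of that helper with ethers v6 is sketched below, purely to illustrate the field order; the authoritative encoding is whatever `test/aitoken.test.ts` implements:
   ```ts
   // Hypothetical sketch — mirror test/aitoken.test.ts rather than trusting this layout.
   import { ethers } from "hardhat";
   import type { Signer } from "ethers";

   async function buildSignature(
     attestor: Signer,
     token: string,         // deployed AIToken address
     providerAddr: string,  // recipient of the minted units
     units: bigint,
     receiptHash: string,   // 0x-prefixed 32-byte hash
   ): Promise<string> {
     const { chainId } = await ethers.provider.getNetwork();
     // Struct hash over (chainId, contractAddress, provider, units, receiptHash).
     const structHash = ethers.solidityPackedKeccak256(
       ["uint256", "address", "address", "uint256", "bytes32"],
       [chainId, token, providerAddr, units, receiptHash],
     );
     // signMessage applies the EIP-191 prefix; whether AIToken expects the prefixed
     // or raw hash is an assumption to confirm against the contract tests.
     return attestor.signMessage(ethers.getBytes(structHash));
   }
   ```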

## Wallet Daemon (`apps/wallet-daemon/`)

1. Navigate to the service directory:
   ```bash
   cd apps/wallet-daemon
   ```
2. Install dependencies:
   ```bash
   poetry install
   ```
3. Copy or create `.env` with coordinator access:
   ```bash
   cp .env.example .env  # create if missing
   ```
   Populate `COORDINATOR_BASE_URL` and `COORDINATOR_API_KEY` to reuse the coordinator API when verifying receipts.
4. Run the API locally:
   ```bash
   poetry run uvicorn app.main:app --host 127.0.0.2 --port 8071 --reload
   ```
5. REST endpoints:
   - `GET /v1/receipts/{job_id}` – fetch + verify latest coordinator receipt.
   - `GET /v1/receipts/{job_id}/history` – fetch + verify entire receipt history.
6. JSON-RPC endpoint:
   - `POST /rpc` with methods `receipts.verify_latest` and `receipts.verify_history` returning signature validation metadata identical to REST responses.
7. Example REST usage:
   ```bash
   curl -s "http://localhost:8071/v1/receipts/<job_id>" | jq
   ```
8. Example JSON-RPC call:
   ```bash
   curl -s http://localhost:8071/rpc \
     -H 'Content-Type: application/json' \
     -d '{"jsonrpc":"2.0","id":1,"method":"receipts.verify_latest","params":{"job_id":"<job_id>"}}' | jq
   ```
9. Keystore scaffold:
   - `KeystoreService` currently stores wallets in-memory using Argon2id key derivation + XChaCha20-Poly1305 encryption.
   - Subsequent milestones will back this with persistence and CLI/REST routes for wallet creation/import.

## Miner Node (`apps/miner-node/`)

1. Navigate to the directory:
   ```bash
   cd apps/miner-node
   ```
2. Install dependencies:
   ```bash
   poetry install
   ```
3. Configure environment:
   ```bash
   cp .env.example .env
   ```
   Adjust `COORDINATOR_BASE_URL`, `MINER_AUTH_TOKEN`, and workspace paths.
4. Run the miner control loop:
   ```bash
   poetry run python -m aitbc_miner.main
   ```
   The miner now registers and heartbeats against the coordinator, polling for work and executing CLI/Python runners. Ensure the coordinator service is running first.
5. Deploy as a systemd service (optional):
   ```bash
   sudo scripts/ops/install_miner_systemd.sh
   ```
   Add or update `/opt/aitbc/apps/miner-node/.env`, then use `sudo systemctl status aitbc-miner` to monitor the service.

## Blockchain Node (`apps/blockchain-node/`)

1. Navigate to the directory:
   ```bash
   cd apps/blockchain-node
   ```
2. Install dependencies:
   ```bash
   poetry install
   ```
3. Configure environment:
   ```bash
   cp .env.example .env
   ```
   Update database path, proposer key, and bind host/port as needed.
4. Run the node placeholder:
   ```bash
   poetry run python -m aitbc_chain.main
   ```
   (RPC, consensus, and P2P logic still to be implemented.)

### Observability Dashboards & Alerts

1. Generate the starter Grafana dashboards (if not already present):
   ```bash
   cd apps/blockchain-node
   PYTHONPATH=src python - <<'PY'
   from pathlib import Path
   from aitbc_chain.observability.dashboards import generate_default_dashboards

   output_dir = Path("observability/generated_dashboards")
   output_dir.mkdir(parents=True, exist_ok=True)
   generate_default_dashboards(output_dir)
   print("Dashboards written to", output_dir)
   PY
   ```
2. Import each JSON file into Grafana (**Dashboards → Import**):
   - `apps/blockchain-node/observability/generated_dashboards/coordinator-overview.json`
   - `apps/blockchain-node/observability/generated_dashboards/blockchain-node-overview.json`

   Select your Prometheus datasource (pointing at `127.0.0.1:8080` and `127.0.0.1:8090`) during import.
3. Ensure Prometheus scrapes both services. Example snippet from `apps/blockchain-node/observability/prometheus.yml`:
   ```yaml
   scrape_configs:
     - job_name: "blockchain-node"
       static_configs:
         - targets: ["127.0.0.1:8080"]

     - job_name: "mock-coordinator"
       static_configs:
         - targets: ["127.0.0.1:8090"]
   ```
4. Deploy the Alertmanager rules in `apps/blockchain-node/observability/alerts.yml` (proposer stalls, miner errors, receipt drop-offs, RPC error spikes). After modifying rule files, reload Prometheus/Alertmanager:
   ```bash
   systemctl restart prometheus
   systemctl restart alertmanager
   ```
5. Validate by briefly stopping `aitbc-coordinator.service`, confirming Grafana panels pause and the new alerts fire, then restart the service.

## Next Steps

- Flesh out remaining logic per task breakdowns in `docs/*.md` (e.g., capability-aware scheduling, artifact uploads).
- Run the growing test suites regularly:
  - `pytest apps/coordinator-api/tests/test_jobs.py`
  - `pytest apps/coordinator-api/tests/test_miner_service.py`
  - `pytest apps/miner-node/tests/test_runners.py`
- Create systemd and Nginx configs once services are runnable in production mode.

docs/operator/incident-runbooks.md (new file)
@@ -0,0 +1,485 @@

# AITBC Incident Runbooks

This document contains specific runbooks for common incident scenarios, based on our chaos testing validation.

## Runbook: Coordinator API Outage

### Based on Chaos Test: `chaos_test_coordinator.py`

### Symptoms

- 503/504 errors on all endpoints
- Health check failures
- Job submission failures
- Marketplace unresponsive

### MTTR Target: 2 minutes

### Immediate Actions (0-2 minutes)

```bash
# 1. Check pod status
kubectl get pods -n default -l app.kubernetes.io/name=coordinator

# 2. Check recent events
kubectl get events -n default --sort-by=.metadata.creationTimestamp | tail -20

# 3. Check if pods are crashlooping
kubectl describe pod -n default -l app.kubernetes.io/name=coordinator

# 4. Quick restart if needed
kubectl rollout restart deployment/coordinator -n default
```

### Investigation (2-10 minutes)

1. **Review Logs**
   ```bash
   kubectl logs -n default deployment/coordinator --tail=100
   ```

2. **Check Resource Limits**
   ```bash
   kubectl top pods -n default -l app.kubernetes.io/name=coordinator
   ```

3. **Verify Database Connectivity**
   ```bash
   kubectl exec -n default deployment/coordinator -- nc -z postgresql 5432
   ```

4. **Check Redis Connection**
   ```bash
   kubectl exec -n default deployment/coordinator -- redis-cli -h redis ping
   ```

### Recovery Actions

1. **Scale Up if Resource Starved**
   ```bash
   kubectl scale deployment/coordinator --replicas=5 -n default
   ```

2. **Manual Pod Deletion if Stuck**
   ```bash
   kubectl delete pods -n default -l app.kubernetes.io/name=coordinator --force --grace-period=0
   ```

3. **Rollback Deployment**
   ```bash
   kubectl rollout undo deployment/coordinator -n default
   ```

### Verification

```bash
# Test health endpoint
curl -f http://127.0.0.2:8011/v1/health

# Test API with sample request
curl -X GET http://127.0.0.2:8011/v1/jobs -H "X-API-Key: test-key"
```

## Runbook: Network Partition

### Based on Chaos Test: `chaos_test_network.py`

### Symptoms

- Blockchain nodes not communicating
- Consensus stalled
- High finality latency
- Transaction processing delays

### MTTR Target: 5 minutes

### Immediate Actions (0-5 minutes)

```bash
# 1. Check peer connectivity
kubectl exec -n default deployment/blockchain-node -- curl -s http://localhost:8080/v1/peers | jq

# 2. Check consensus status
kubectl exec -n default deployment/blockchain-node -- curl -s http://localhost:8080/v1/consensus | jq

# 3. Check network policies
kubectl get networkpolicies -n default
```

### Investigation (5-15 minutes)

1. **Identify Partitioned Nodes**
   ```bash
   # Check each node's peer count
   for pod in $(kubectl get pods -n default -l app.kubernetes.io/name=blockchain-node -o jsonpath='{.items[*].metadata.name}'); do
     echo "Pod: $pod"
     kubectl exec -n default $pod -- curl -s http://localhost:8080/v1/peers | jq '. | length'
   done
   ```

2. **Check Network Policies**
   ```bash
   kubectl describe networkpolicy default-deny-all-ingress -n default
   kubectl describe networkpolicy blockchain-node-netpol -n default
   ```

3. **Verify DNS Resolution**
   ```bash
   kubectl exec -n default deployment/blockchain-node -- nslookup blockchain-node
   ```

### Recovery Actions

1. **Remove Problematic Network Rules**
   ```bash
   # Flush iptables on affected nodes
   for pod in $(kubectl get pods -n default -l app.kubernetes.io/name=blockchain-node -o jsonpath='{.items[*].metadata.name}'); do
     kubectl exec -n default $pod -- iptables -F
   done
   ```

2. **Restart Network Components**
   ```bash
   kubectl rollout restart deployment/blockchain-node -n default
   ```

3. **Force Re-peering**
   ```bash
   # Delete and recreate pods to force re-peering
   kubectl delete pods -n default -l app.kubernetes.io/name=blockchain-node
   ```

### Verification

```bash
# Wait for consensus to resume
watch -n 5 'kubectl exec -n default deployment/blockchain-node -- curl -s http://localhost:8080/v1/consensus | jq .height'

# Verify peer connectivity
kubectl exec -n default deployment/blockchain-node -- curl -s http://localhost:8080/v1/peers | jq '. | length'
```

## Runbook: Database Failure

### Based on Chaos Test: `chaos_test_database.py`

### Symptoms

- Database connection errors
- Service degradation
- Failed transactions
- High error rates

### MTTR Target: 3 minutes

### Immediate Actions (0-3 minutes)

```bash
# 1. Check PostgreSQL status
kubectl exec -n default deployment/postgresql -- pg_isready

# 2. Check connection count
kubectl exec -n default deployment/postgresql -- psql -U aitbc -c "SELECT count(*) FROM pg_stat_activity;"

# 3. Check replica lag
kubectl exec -n default deployment/postgresql-replica -- psql -U aitbc -c "SELECT pg_last_xact_replay_timestamp();"
```

### Investigation (3-10 minutes)

1. **Review Database Logs**
   ```bash
   kubectl logs -n default deployment/postgresql --tail=100
   ```

2. **Check Resource Usage**
   ```bash
   kubectl top pods -n default -l app.kubernetes.io/name=postgresql
   df -h /var/lib/postgresql/data
   ```

3. **Identify Long-running Queries**
   ```bash
   kubectl exec -n default deployment/postgresql -- psql -U aitbc -c "SELECT pid, now() - pg_stat_activity.query_start AS duration, query FROM pg_stat_activity WHERE state = 'active' AND now() - pg_stat_activity.query_start > interval '5 minutes';"
   ```

### Recovery Actions

1. **Kill Idle Connections**
   ```bash
   kubectl exec -n default deployment/postgresql -- psql -U aitbc -c "SELECT pg_terminate_backend(pid) FROM pg_stat_activity WHERE state = 'idle' AND query_start < now() - interval '1 hour';"
   ```

2. **Restart PostgreSQL**
   ```bash
   kubectl rollout restart deployment/postgresql -n default
   ```

3. **Failover to Replica**
   ```bash
   # Promote replica if primary fails
   kubectl exec -n default deployment/postgresql-replica -- pg_ctl promote -D /var/lib/postgresql/data
   ```

### Verification

```bash
# Test database connectivity
kubectl exec -n default deployment/coordinator -- python -c "import psycopg2; conn = psycopg2.connect('postgresql://aitbc:password@postgresql:5432/aitbc'); print('Connected')"

# Check application health
curl -f http://127.0.0.2:8011/v1/health
```

## Runbook: Redis Failure

### Symptoms

- Caching failures
- Session loss
- Increased database load
- Slow response times

### MTTR Target: 2 minutes

### Immediate Actions (0-2 minutes)

```bash
# 1. Check Redis status
kubectl exec -n default deployment/redis -- redis-cli ping

# 2. Check memory usage
kubectl exec -n default deployment/redis -- redis-cli info memory | grep used_memory_human

# 3. Check connection count
kubectl exec -n default deployment/redis -- redis-cli info clients | grep connected_clients
```

### Investigation (2-5 minutes)

1. **Review Redis Logs**
   ```bash
   kubectl logs -n default deployment/redis --tail=100
   ```

2. **Check for Eviction**
   ```bash
   kubectl exec -n default deployment/redis -- redis-cli info stats | grep evicted_keys
   ```

3. **Identify Large Keys**
   ```bash
   kubectl exec -n default deployment/redis -- redis-cli --bigkeys
   ```

### Recovery Actions

1. **Clear Expired Keys**
   ```bash
   kubectl exec -n default deployment/redis -- sh -c 'redis-cli --scan --pattern "*:*" | xargs -r redis-cli del'
   ```

2. **Restart Redis**
   ```bash
   kubectl rollout restart deployment/redis -n default
   ```

3. **Scale Redis Cluster**
   ```bash
   kubectl scale deployment/redis --replicas=3 -n default
   ```

### Verification

```bash
# Test Redis connectivity
kubectl exec -n default deployment/coordinator -- redis-cli -h redis ping

# Check application performance
curl -w "@curl-format.txt" -o /dev/null -s http://127.0.0.2:8011/v1/health
```
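
`curl-format.txt` is referenced here and in later runbooks but is not included in this commit; a minimal example of such a format file:

```bash
# Create curl-format.txt next to where the runbook commands are executed.
cat > curl-format.txt <<'EOF'
time_namelookup:    %{time_namelookup}s
time_connect:       %{time_connect}s
time_starttransfer: %{time_starttransfer}s
time_total:         %{time_total}s
EOF
```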

## Runbook: High CPU/Memory Usage

### Symptoms

- Slow response times
- Pod evictions
- OOM errors
- System degradation

### MTTR Target: 5 minutes

### Immediate Actions (0-5 minutes)

```bash
# 1. Check resource usage
kubectl top pods -n default
kubectl top nodes

# 2. Identify resource-hungry pods
kubectl exec -n default deployment/coordinator -- top

# 3. Check for OOM kills
dmesg | grep -i "killed process"
```

### Investigation (5-15 minutes)

1. **Analyze Resource Usage**
   ```bash
   # Detailed pod metrics
   kubectl exec -n default deployment/coordinator -- ps aux --sort=-%cpu | head -10
   kubectl exec -n default deployment/coordinator -- ps aux --sort=-%mem | head -10
   ```

2. **Check Resource Limits**
   ```bash
   kubectl describe pod -n default -l app.kubernetes.io/name=coordinator | grep -A 10 Limits
   ```

3. **Review Application Metrics**
   ```bash
   # Check Prometheus metrics
   curl http://127.0.0.2:8011/metrics | grep -E "(cpu|memory)"
   ```

### Recovery Actions

1. **Scale Services**
   ```bash
   kubectl scale deployment/coordinator --replicas=5 -n default
   kubectl scale deployment/blockchain-node --replicas=3 -n default
   ```

2. **Increase Resource Limits**
   ```bash
   kubectl patch deployment coordinator -p '{"spec":{"template":{"spec":{"containers":[{"name":"coordinator","resources":{"limits":{"cpu":"2000m","memory":"4Gi"}}}]}}}}'
   ```

3. **Restart Affected Services**
   ```bash
   kubectl rollout restart deployment/coordinator -n default
   ```

### Verification

```bash
# Monitor resource usage
watch -n 5 'kubectl top pods -n default'

# Test service performance
curl -w "@curl-format.txt" -o /dev/null -s http://127.0.0.2:8011/v1/health
```

## Runbook: Storage Issues

### Symptoms

- Disk space warnings
- Write failures
- Database errors
- Pod crashes

### MTTR Target: 10 minutes

### Immediate Actions (0-10 minutes)

```bash
# 1. Check disk usage
df -h
kubectl exec -n default deployment/postgresql -- df -h

# 2. Identify large files
find /var/log -name "*.log" -size +100M
kubectl exec -n default deployment/postgresql -- find /var/lib/postgresql -type f -size +1G

# 3. Clean up logs
kubectl logs -n default deployment/coordinator --tail=1000 > /tmp/coordinator.log && truncate -s 0 /var/log/containers/coordinator*.log
```

### Investigation (10-20 minutes)

1. **Analyze Storage Usage**
   ```bash
   du -sh /var/log/*
   du -sh /var/lib/docker/*
   ```

2. **Check PVC Usage**
   ```bash
   kubectl get pvc -n default
   kubectl describe pvc postgresql-data -n default
   ```

3. **Review Retention Policies**
   ```bash
   kubectl get cronjobs -n default
   kubectl describe cronjob log-cleanup -n default
   ```

### Recovery Actions

1. **Expand Storage**
   ```bash
   kubectl patch pvc postgresql-data -p '{"spec":{"resources":{"requests":{"storage":"100Gi"}}}}'
   ```

2. **Force Cleanup**
   ```bash
   # Clean old logs
   find /var/log -name "*.log" -mtime +7 -delete

   # Clean Docker images
   docker system prune -a
   ```

3. **Restart Services**
   ```bash
   kubectl rollout restart deployment/postgresql -n default
   ```

### Verification

```bash
# Check disk space
df -h

# Verify database operations
kubectl exec -n default deployment/postgresql -- psql -U aitbc -c "SELECT 1;"
```

## Emergency Contact Procedures

### Escalation Matrix

1. **Level 1**: On-call engineer (5 minutes)
2. **Level 2**: On-call secondary (15 minutes)
3. **Level 3**: Engineering manager (30 minutes)
4. **Level 4**: CTO (1 hour, critical only)

### War Room Activation

```bash
# Create Slack channel
/slack create-channel #incident-$(date +%Y%m%d-%H%M%S)

# Invite stakeholders
/slack invite @sre-team @engineering-manager @cto

# Start Zoom meeting
/zoom start "AITBC Incident War Room"
```

### Customer Communication

1. **Status Page Update** (5 minutes)
2. **Email Notification** (15 minutes)
3. **Twitter Update** (30 minutes, critical only)

## Post-Incident Checklist

### Immediate (0-1 hour)

- [ ] Service fully restored
- [ ] Monitoring normal
- [ ] Status page updated
- [ ] Stakeholders notified

### Short-term (1-24 hours)

- [ ] Incident document created
- [ ] Root cause identified
- [ ] Runbooks updated
- [ ] Post-mortem scheduled

### Long-term (1-7 days)

- [ ] Post-mortem completed
- [ ] Action items assigned
- [ ] Monitoring improved
- [ ] Process updated

## Runbook Maintenance

### Review Schedule

- **Monthly**: Review and update runbooks
- **Quarterly**: Full review and testing
- **Annually**: Major revision

### Update Process

1. Test runbook procedures
2. Document lessons learned
3. Update procedures
4. Train team members
5. Update documentation

---

*Version: 1.0*
*Last Updated: 2024-12-22*
*Owner: SRE Team*

docs/operator/index.md (new file)
@@ -0,0 +1,40 @@

# AITBC Operator Documentation

Welcome to the AITBC operator documentation. This section contains resources for deploying, operating, and maintaining AITBC infrastructure.

## Deployment

- [Deployment Guide](deployment/run.md) - How to deploy AITBC components
- [Installation](deployment/installation.md) - System requirements and installation
- [Configuration](deployment/configuration.md) - Configuration options
- [Ports](deployment/ports.md) - Network ports and requirements

## Operations

- [Backup & Restore](backup_restore.md) - Data backup and recovery procedures
- [Security](security.md) - Security best practices and hardening
- [Monitoring](monitoring/monitoring-playbook.md) - System monitoring and observability
- [Incident Response](incident-runbooks.md) - Incident handling procedures

## Architecture

- [System Architecture](../reference/architecture/) - Understanding AITBC architecture
- [Components](../reference/architecture/) - Component documentation
- [Multi-tenancy](../reference/architecture/) - Multi-tenant infrastructure

## Scaling

- [Scaling Guide](scaling.md) - How to scale AITBC infrastructure
- [Performance Tuning](performance.md) - Performance optimization
- [Capacity Planning](capacity.md) - Resource planning

## Reference

- [Glossary](../reference/glossary.md) - Terms and definitions
- [Troubleshooting](../user-guide/troubleshooting.md) - Common issues and solutions
- [FAQ](../user-guide/faq.md) - Frequently asked questions

## Support

- [Getting Help](../user-guide/support.md) - How to get support
- [Contact](../user-guide/support.md) - Contact information

docs/operator/monitoring/monitoring-playbook.md (new file)
@@ -0,0 +1,449 @@

# AITBC Monitoring Playbook & On-Call Guide

## Overview

This document provides comprehensive monitoring procedures, on-call rotations, and incident response playbooks for the AITBC platform. It ensures reliable operation of all services and quick resolution of issues.

## Service Overview

### Core Services

- **Coordinator API**: Job management and marketplace coordination
- **Blockchain Nodes**: Consensus and transaction processing
- **Explorer UI**: Block explorer and transaction visualization
- **Marketplace UI**: User interface for marketplace operations
- **Wallet Daemon**: Cryptographic key management
- **Infrastructure**: PostgreSQL, Redis, Kubernetes cluster

### Critical Metrics

- **Availability**: 99.9% uptime SLA
- **Performance**: <200ms API response time (95th percentile); see the example alert rules below
- **Throughput**: 100+ TPS sustained
- **MTTR**: <2 minutes for critical incidents
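
The latency and availability targets above can be encoded as Prometheus alert rules, in the same style as the backup alert in `backup_restore.md`; the metric names (`http_request_duration_seconds_bucket`, `up`) and the job label are assumptions about what the services expose:

```yaml
- alert: CoordinatorLatencyHigh
  # p95 request latency over 5 minutes exceeds the 200ms target.
  expr: histogram_quantile(0.95, sum(rate(http_request_duration_seconds_bucket{job="coordinator-api"}[5m])) by (le)) > 0.2
  for: 10m
  labels:
    severity: high
  annotations:
    summary: "Coordinator API p95 latency above 200ms"

- alert: CoordinatorDown
  expr: up{job="coordinator-api"} == 0
  for: 2m
  labels:
    severity: critical
  annotations:
    summary: "Coordinator API target is down"
```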
|
||||
|
||||
## On-Call Rotation
|
||||
|
||||
### Rotation Schedule
|
||||
- **Primary On-Call**: 1 week rotation, Monday 00:00 UTC to Monday 00:00 UTC
|
||||
- **Secondary On-Call**: Shadow primary, handles escalations
|
||||
- **Tertiary**: Backup for both primary and secondary
|
||||
- **Rotation Handoff**: Every Monday at 08:00 UTC
|
||||
|
||||
### Team Structure
|
||||
```
|
||||
Week 1: Alice (Primary), Bob (Secondary), Carol (Tertiary)
|
||||
Week 2: Bob (Primary), Carol (Secondary), Alice (Tertiary)
|
||||
Week 3: Carol (Primary), Alice (Secondary), Bob (Tertiary)
|
||||
```
|
||||
|
||||
### Handoff Procedures
|
||||
1. **Pre-handoff Check** (Sunday 22:00 UTC):
|
||||
- Review active incidents
|
||||
- Check scheduled maintenance
|
||||
- Verify monitoring systems health
|
||||
|
||||
2. **Handoff Meeting** (Monday 08:00 UTC):
|
||||
- 15-minute video call
|
||||
- Discuss current issues
|
||||
- Transfer knowledge
|
||||
- Confirm contact information
|
||||
|
||||
3. **Post-handoff** (Monday 09:00 UTC):
|
||||
- Primary acknowledges receipt
|
||||
- Update on-call calendar
|
||||
- Test alerting systems
|
||||
|
||||
### Contact Information
|
||||
- **Primary**: +1-555-ONCALL-1 (PagerDuty)
|
||||
- **Secondary**: +1-555-ONCALL-2 (PagerDuty)
|
||||
- **Tertiary**: +1-555-ONCALL-3 (PagerDuty)
|
||||
- **Escalation Manager**: +1-555-ESCALATE
|
||||
- **Emergency**: +1-555-EMERGENCY (Critical infrastructure only)
|
||||
|
||||
## Alerting & Escalation
|
||||
|
||||
### Alert Severity Levels
|
||||
|
||||
#### Critical (P0)
|
||||
- Service completely down
|
||||
- Data loss or corruption
|
||||
- Security breach
|
||||
- SLA violation in progress
|
||||
- **Response Time**: 5 minutes
|
||||
- **Escalation**: 15 minutes if no response
|
||||
|
||||
#### High (P1)
|
||||
- Significant degradation
|
||||
- Partial service outage
|
||||
- High error rates (>10%)
|
||||
- **Response Time**: 15 minutes
|
||||
- **Escalation**: 1 hour if no response
|
||||
|
||||
#### Medium (P2)
|
||||
- Minor degradation
|
||||
- Elevated error rates (5-10%)
|
||||
- Performance issues
|
||||
- **Response Time**: 1 hour
|
||||
- **Escalation**: 4 hours if no response
|
||||
|
||||
#### Low (P3)
|
||||
- Informational alerts
|
||||
- Non-critical issues
|
||||
- **Response Time**: 4 hours
|
||||
- **Escalation**: 24 hours if no response
|
||||
|
||||
### Escalation Policy
|
||||
1. **Level 1**: Primary On-Call (5-60 minutes)
|
||||
2. **Level 2**: Secondary On-Call (15 minutes - 4 hours)
|
||||
3. **Level 3**: Tertiary On-Call (1 hour - 24 hours)
|
||||
4. **Level 4**: Engineering Manager (4 hours)
|
||||
5. **Level 5**: CTO (Critical incidents only)
|
||||
|
||||
### Alert Channels
|
||||
- **PagerDuty**: Primary alerting system
|
||||
- **Slack**: #on-call-aitbc channel
|
||||
- **Email**: oncall@aitbc.io
|
||||
- **SMS**: Critical alerts only
|
||||
- **Phone**: Critical incidents only
|
||||
|
||||
## Incident Response
|
||||
|
||||
### Incident Classification
|
||||
|
||||
#### SEV-0 (Critical)
|
||||
- Complete service outage
|
||||
- Data loss or security breach
|
||||
- Financial impact >$10,000/hour
|
||||
- Customer impact >50%
|
||||
|
||||
#### SEV-1 (High)
|
||||
- Significant service degradation
|
||||
- Feature unavailable
|
||||
- Financial impact $1,000-$10,000/hour
|
||||
- Customer impact 10-50%
|
||||
|
||||
#### SEV-2 (Medium)
|
||||
- Minor service degradation
|
||||
- Performance issues
|
||||
- Financial impact <$1,000/hour
|
||||
- Customer impact <10%
|
||||
|
||||
#### SEV-3 (Low)
|
||||
- Informational
|
||||
- No customer impact
|
||||
|
||||
### Incident Response Process
|
||||
|
||||
#### 1. Detection & Triage (0-5 minutes)
|
||||
```bash
|
||||
# Check alert severity
|
||||
# Verify impact
|
||||
# Create incident channel
|
||||
# Notify stakeholders
|
||||
```
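
The triage checklist above can be backed by a few quick commands. A minimal sketch, assuming the public API exposes a `/health` endpoint and the coordinator runs as `deployment/coordinator` (both assumptions):

```bash
# Pods that are not healthy anywhere in the cluster
kubectl get pods -A | grep -Ev 'Running|Completed'

# Quick probe of the public API (the /health path is an assumption)
curl -s -o /dev/null -w '%{http_code}\n' https://api.aitbc.io/health

# Recent coordinator errors to gauge blast radius
kubectl logs deployment/coordinator --since=10m | grep -iE 'error|exception' | tail -n 20
```
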
#### 2. Assessment (5-15 minutes)
- Determine scope
- Identify root cause area
- Estimate resolution time
- Declare severity level

#### 3. Communication (15-30 minutes)
- Update status page
- Notify customers (if needed)
- Internal stakeholder updates
- Set up war room

#### 4. Resolution (Varies)
- Implement fix
- Verify resolution
- Monitor for recurrence
- Document actions

#### 5. Recovery (30-60 minutes)
- Full service restoration
- Performance validation
- Customer communication
- Incident closure

## Service-Specific Runbooks

### Coordinator API

#### High Error Rate
**Symptoms**: 5xx errors >5%, response time >500ms
**Runbook**:
1. Check pod health: `kubectl get pods -l app=coordinator`
2. Review logs: `kubectl logs -f deployment/coordinator`
3. Check database connectivity (see the connectivity sketch below)
4. Verify Redis connection
5. Scale if needed: `kubectl scale deployment coordinator --replicas=5`
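
For steps 3 and 4, a minimal connectivity check; the Redis deployment name matches the backup procedures, while the PostgreSQL service name, user, and database (`postgres`, `aitbc`) are assumptions:

```bash
# PostgreSQL reachability via a one-off client pod
kubectl run pg-check --rm -it --restart=Never --image=postgres:16 -- \
  pg_isready -h postgres.default.svc.cluster.local -p 5432

# Redis reachability and client pressure
kubectl exec -n default deployment/redis -- redis-cli PING
kubectl exec -n default deployment/redis -- redis-cli INFO clients | head -n 5
```
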
#### Service Unavailable
**Symptoms**: 503 errors, health check failures
**Runbook**:
1. Check deployment status
2. Review recent deployments
3. Rollback if necessary (see the rollback sketch below)
4. Check resource limits
5. Verify ingress configuration
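
A rollback sketch for steps 1-4, assuming the API runs as `deployment/coordinator` with an ingress of the same name (both assumptions):

```bash
# Deployment status and rollout history
kubectl rollout status deployment/coordinator
kubectl rollout history deployment/coordinator

# Roll back to the previous revision if the latest deploy caused the outage
kubectl rollout undo deployment/coordinator

# Resource limits and ingress wiring
kubectl describe deployment coordinator | grep -A4 -i limits
kubectl describe ingress coordinator
```
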
### Blockchain Nodes

#### Consensus Stalled
**Symptoms**: No new blocks, high finality latency
**Runbook**:
1. Check node sync status (see the sketch below)
2. Verify network connectivity
3. Review validator set
4. Check governance proposals
5. Restart if needed (with caution)
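
A sketch of step 1, assuming node pods carry the label `app=blockchain-node` and expose a status RPC on port 8080; the label, port, and `/status` path are all assumptions, so adjust them to the actual node deployment:

```bash
# Node pods and recent block-production log lines
kubectl get pods -l app=blockchain-node -o wide
kubectl logs -l app=blockchain-node --tail=200 | grep -i block | tail -n 20

# Sync/height check via the node RPC (endpoint is an assumption)
kubectl port-forward svc/blockchain-node 8080:8080 &
curl -s http://localhost:8080/status | jq '.'
kill %1
```
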
#### High Peer Drop Rate
**Symptoms**: Connected peers <50%, network partition
**Runbook**:
1. Check network policies (see the sketch below)
2. Verify DNS resolution
3. Review firewall rules
4. Check load balancer health
5. Restart networking components
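
A sketch for steps 1-4, assuming the node service is labelled `app=blockchain-node` and lives in the `default` namespace (both assumptions):

```bash
# Network policies that could be dropping peer traffic
kubectl get networkpolicy -A

# DNS resolution from inside the cluster
kubectl run dns-check --rm -it --restart=Never --image=busybox:1.36 -- \
  nslookup blockchain-node.default.svc.cluster.local

# Service and endpoint health behind the load balancer
kubectl get svc,endpoints -l app=blockchain-node
```
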
### Database (PostgreSQL)

#### Connection Exhaustion
**Symptoms**: "Too many connections" errors
**Runbook**:
1. Check active connections (see the queries below)
2. Identify long-running queries
3. Kill idle connections
4. Increase pool size if needed
5. Scale database
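
Example queries for steps 1-3; the deployment, user, and database names (`postgres`, `aitbc`) are assumptions:

```bash
# Active connections grouped by state
kubectl exec -n default deployment/postgres -- psql -U aitbc -d aitbc -c \
  "SELECT state, count(*) FROM pg_stat_activity GROUP BY state ORDER BY count DESC;"

# Queries running longer than 5 minutes
kubectl exec -n default deployment/postgres -- psql -U aitbc -d aitbc -c \
  "SELECT pid, now() - query_start AS runtime, left(query, 80) AS query
     FROM pg_stat_activity
    WHERE state = 'active' AND now() - query_start > interval '5 minutes';"

# Terminate connections idle in transaction for more than 10 minutes
kubectl exec -n default deployment/postgres -- psql -U aitbc -d aitbc -c \
  "SELECT pg_terminate_backend(pid) FROM pg_stat_activity
    WHERE state = 'idle in transaction' AND now() - state_change > interval '10 minutes';"
```
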
#### Replica Lag
**Symptoms**: Read replica lag >10 seconds
**Runbook**:
1. Check replica status (see the queries below)
2. Review network latency
3. Verify disk space
4. Restart replication if needed
5. Failover if necessary
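
Lag checks for steps 1 and 3, with the same naming assumptions as above plus a `postgres-replica` deployment (an assumption):

```bash
# Byte lag per replica, as seen from the primary
kubectl exec -n default deployment/postgres -- psql -U aitbc -d aitbc -c \
  "SELECT client_addr, state,
          pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS replay_lag_bytes
     FROM pg_stat_replication;"

# Time-based lag, as seen from a replica
kubectl exec -n default deployment/postgres-replica -- psql -U aitbc -d aitbc -c \
  "SELECT now() - pg_last_xact_replay_timestamp() AS replication_delay;"

# Disk headroom on the replica data volume
kubectl exec -n default deployment/postgres-replica -- df -h /var/lib/postgresql/data
```
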
### Redis

#### Memory Pressure
**Symptoms**: OOM errors, high eviction rate
**Runbook**:
1. Check memory usage (see the commands below)
2. Review key expiration
3. Clean up unused keys
4. Scale Redis cluster
5. Optimize data structures
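
Commands for steps 1-3; the Redis deployment name matches the backup procedures, and the key name in the last line is a placeholder:

```bash
# Memory usage, eviction policy, and eviction counters
kubectl exec -n default deployment/redis -- redis-cli INFO memory | \
  grep -E 'used_memory_human|maxmemory_human|maxmemory_policy'
kubectl exec -n default deployment/redis -- redis-cli INFO stats | grep evicted_keys

# Sample the keyspace for the largest keys without blocking the server
kubectl exec -n default deployment/redis -- redis-cli --bigkeys

# Spot-check the TTL of a suspect key (placeholder key name)
kubectl exec -n default deployment/redis -- redis-cli TTL session:example
```
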
#### Connection Issues
**Symptoms**: Connection timeouts, errors
**Runbook**:
1. Check max connections
2. Review connection pool
3. Verify network policies
4. Restart Redis if needed
5. Scale horizontally

## Monitoring Dashboards

### Primary Dashboards

#### 1. System Overview
- Service health status
- Error rates (4xx/5xx)
- Response times
- Throughput metrics
- Resource utilization

#### 2. Infrastructure
- Kubernetes cluster health
- Node resource usage
- Pod status and restarts
- Network traffic
- Storage capacity

#### 3. Application Metrics
- Job submission rates
- Transaction processing
- Marketplace activity
- Wallet operations
- Mining statistics

#### 4. Business KPIs
- Active users
- Transaction volume
- Revenue metrics
- Customer satisfaction
- SLA compliance

### Alert Rules

#### Critical Alerts
- Service down >1 minute
- Error rate >10%
- Response time >1 second
- Disk usage >90%
- Memory usage >95%

#### Warning Alerts
- Error rate >5%
- Response time >500ms
- CPU usage >80%
- Queue depth >1000
- Replica lag >5s

## SLOs & SLIs

### Service Level Objectives

| Service | Metric | Target | Measurement |
|---------|--------|--------|-------------|
| Coordinator API | Availability | 99.9% | 30-day rolling |
| Coordinator API | Latency | <200ms | 95th percentile |
| Blockchain | Block Time | <2s | 24-hour average |
| Marketplace | Success Rate | 99.5% | Daily |
| Explorer | Response Time | <500ms | 95th percentile |
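
These SLOs can be tracked directly from Prometheus. A query sketch for the Coordinator API rows, assuming standard `http_requests_total` / `http_request_duration_seconds` metrics with a `job="coordinator-api"` label (metric names, labels, and the in-cluster Prometheus URL are assumptions):

```bash
PROM=http://prometheus.monitoring.svc:9090   # in-cluster Prometheus URL (assumption)

# 30-day rolling availability: 1 minus the 5xx error ratio
curl -sG "$PROM/api/v1/query" --data-urlencode \
  'query=1 - sum(rate(http_requests_total{job="coordinator-api",code=~"5.."}[30d]))
           / sum(rate(http_requests_total{job="coordinator-api"}[30d]))' \
  | jq '.data.result[0].value[1]'

# 95th percentile latency over the last 5 minutes
curl -sG "$PROM/api/v1/query" --data-urlencode \
  'query=histogram_quantile(0.95,
           sum(rate(http_request_duration_seconds_bucket{job="coordinator-api"}[5m])) by (le))' \
  | jq '.data.result[0].value[1]'
```
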
### Service Level Indicators

#### Availability
- HTTP status codes
- Health check responses
- Pod readiness status

#### Latency
- Request duration histogram
- Database query times
- External API calls

#### Throughput
- Requests per second
- Transactions per block
- Jobs completed per hour

#### Quality
- Error rates
- Success rates
- Customer satisfaction

## Post-Incident Process

### Immediate Actions (0-1 hour)
1. Verify full resolution
2. Monitor for recurrence
3. Update status page
4. Notify stakeholders

### Post-Mortem (1-24 hours)
1. Create incident document
2. Gather timeline and logs
3. Identify root cause
4. Document lessons learned

### Follow-up (1-7 days)
1. Schedule post-mortem meeting
2. Assign action items
3. Update runbooks
4. Improve monitoring

### Review (Weekly)
1. Review incident trends
2. Update SLOs if needed
3. Adjust alerting thresholds
4. Improve processes

## Maintenance Windows

### Scheduled Maintenance
- **Frequency**: Weekly maintenance window
- **Time**: Sunday 02:00-04:00 UTC
- **Duration**: Maximum 2 hours
- **Notification**: 72 hours advance

### Emergency Maintenance
- **Approval**: Engineering Manager required
- **Notification**: 4 hours advance (if possible)
- **Duration**: As needed
- **Rollback plan**: Always required

## Tools & Systems

### Monitoring Stack
- **Prometheus**: Metrics collection
- **Grafana**: Visualization and dashboards
- **Alertmanager**: Alert routing and management
- **PagerDuty**: On-call scheduling and escalation

### Observability
- **Jaeger**: Distributed tracing
- **Loki**: Log aggregation
- **Kiali**: Service mesh visualization
- **Kube-state-metrics**: Kubernetes metrics

### Communication
- **Slack**: Primary communication
- **Zoom**: War room meetings
- **Status Page**: Customer notifications
- **Email**: Formal communications

## Training & Onboarding

### New On-Call Engineer
1. Shadow primary for 1 week
2. Review all runbooks
3. Test alerting systems
4. Handle low-severity incidents
5. Solo on-call with mentor

### Ongoing Training
- Monthly incident drills
- Quarterly runbook updates
- Annual training refreshers
- Cross-team knowledge sharing

## Emergency Procedures

### Major Outage
1. Declare incident (SEV-0)
2. Activate war room
3. Customer communication
4. Executive updates
5. Recovery coordination

### Security Incident
1. Isolate affected systems
2. Preserve evidence
3. Notify security team
4. Customer notification
5. Regulatory compliance

### Data Loss
1. Stop affected services
2. Assess impact
3. Initiate recovery
4. Customer communication
5. Prevent recurrence

## Appendix

### A. Contact List
[Detailed contact information]

### B. Runbook Checklist
[Quick reference checklists]

### C. Alert Configuration
[Prometheus rules and thresholds]

### D. Dashboard Links
[Grafana dashboard URLs]

---

*Document Version: 1.0*
*Last Updated: 2024-12-22*
*Next Review: 2025-01-22*
*Owner: SRE Team*

340
docs/operator/security.md
Normal file
@@ -0,0 +1,340 @@

# AITBC Security Documentation

This document outlines the security architecture, threat model, and implementation details for the AITBC platform.

## Overview

AITBC implements defense-in-depth security across multiple layers:
- Network security with TLS termination
- API authentication and authorization
- Secrets management and encryption
- Infrastructure security best practices
- Monitoring and incident response

## Threat Model

### Threat Actors

| Actor | Motivation | Capabilities | Impact |
|-------|-----------|--------------|--------|
| External attacker | Financial gain, disruption | Network access, exploits | High |
| Malicious insider | Data theft, sabotage | Internal access | Critical |
| Competitor | IP theft, market manipulation | Sophisticated attacks | High |
| Casual user | Accidental misuse | Limited knowledge | Low |

### Attack Vectors

1. **Network Attacks**
   - Man-in-the-middle (MITM) attacks
   - DDoS attacks
   - Network reconnaissance

2. **API Attacks**
   - Unauthorized access to marketplace
   - API key leakage
   - Rate limiting bypass
   - Injection attacks

3. **Infrastructure Attacks**
   - Container escape
   - Pod-to-pod attacks
   - Secrets exfiltration
   - Supply chain attacks

4. **Blockchain-Specific Attacks**
   - 51% attacks on consensus
   - Transaction replay attacks
   - Smart contract exploits
   - Miner collusion

### Security Controls

| Control | Implementation | Mitigates |
|---------|----------------|-----------|
| TLS 1.3 | cert-manager + ingress | MITM, eavesdropping |
| API Keys | X-API-Key header | Unauthorized access |
| Rate Limiting | slowapi middleware | DDoS, abuse |
| Network Policies | Kubernetes NetworkPolicy | Pod-to-pod attacks |
| Secrets Mgmt | Kubernetes Secrets + SealedSecrets | Secrets exfiltration |
| RBAC | Kubernetes RBAC | Privilege escalation |
| Monitoring | Prometheus + AlertManager | Incident detection |

## Security Architecture

### Network Security

#### TLS Termination
```yaml
# Ingress configuration with TLS
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.3"
spec:
  tls:
    - hosts:
        - api.aitbc.io
      secretName: api-tls
```

#### Certificate Management
- Uses cert-manager for automatic certificate provisioning
- Supports Let's Encrypt for production
- Internal CA for development environments
- Automatic renewal 30 days before expiry (see the verification sketch below)
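
Renewal can be verified from the Certificate resources and the certificate actually being served. A quick check, assuming the secret name `api-tls` from the ingress example above:

```bash
# Certificate resources and their readiness
kubectl get certificates -A
kubectl describe certificate api-tls | grep -A5 Status

# Expiry date of the certificate stored in the TLS secret
kubectl get secret api-tls -o jsonpath='{.data.tls\.crt}' | base64 -d | \
  openssl x509 -noout -subject -enddate
```
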
### API Security

#### Authentication
- API key-based authentication for all services
- Keys stored in Kubernetes Secrets
- Per-service key rotation policies
- Audit logging for all authenticated requests

#### Authorization
- Role-based access control (RBAC)
- Resource-level permissions
- Rate limiting per API key
- IP whitelisting for sensitive operations

#### API Key Format
```
Header: X-API-Key: aitbc_prod_ak_1a2b3c4d5e6f7g8h9i0j
```
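
A sketch of minting a key in this format and storing it where the coordinator can read it; the secret name `coordinator-api-keys` reuses the example later in this document, while the random-suffix length and the `/v1/health` path are assumptions:

```bash
# Generate a key with the documented prefix scheme
API_KEY="aitbc_prod_ak_$(openssl rand -hex 10)"

# Store (or update) it in the coordinator's key secret
kubectl create secret generic coordinator-api-keys \
  --from-literal=api-key-prod="$API_KEY" \
  --dry-run=client -o yaml | kubectl apply -f -

# Call the API with the key
curl -H "X-API-Key: $API_KEY" https://api.aitbc.io/v1/health
```
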
### Secrets Management

#### Kubernetes Secrets
- Secrets are only base64-encoded by default (not encrypted)
- Encrypted at rest when etcd encryption is enabled
- Access controlled via RBAC

#### SealedSecrets (Recommended for Production)
- Client-side encryption of secrets
- GitOps friendly
- Decryption possible only by the in-cluster controller

#### Secret Rotation
- Automated rotation every 90 days
- Zero-downtime rotation for services (see the sketch below)
- Audit trail of all rotations
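
A zero-downtime rotation sketch, assuming the service accepts both the old and the new key during the overlap window and reads them from the `coordinator-api-keys` secret (secret, key, and deployment names are assumptions):

```bash
NEW_KEY="aitbc_prod_ak_$(openssl rand -hex 10)"

# 1. Add the new key alongside the old one
kubectl patch secret coordinator-api-keys --type merge \
  -p '{"stringData":{"api-key-prod-next":"'"$NEW_KEY"'"}}'

# 2. Restart so pods pick up both keys, then migrate clients
kubectl rollout restart deployment/coordinator
kubectl rollout status deployment/coordinator

# 3. Remove the old key once clients have switched
kubectl patch secret coordinator-api-keys --type json \
  -p '[{"op":"remove","path":"/data/api-key-prod"}]'
```
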
## Implementation Details

### 1. TLS Configuration

#### Coordinator API
```yaml
# Helm values for coordinator
ingress:
  enabled: true
  annotations:
    cert-manager.io/cluster-issuer: letsencrypt-prod
    nginx.ingress.kubernetes.io/force-ssl-redirect: "true"
    nginx.ingress.kubernetes.io/ssl-protocols: "TLSv1.3"
  tls:
    - secretName: coordinator-tls
      hosts:
        - api.aitbc.io
```

#### Blockchain Node RPC
```
# WebSocket with TLS
wss://api.aitbc.io:8080/ws
```

### 2. API Authentication Middleware

#### Coordinator API Implementation
```python
from fastapi import FastAPI, Request, Security, HTTPException
from fastapi.responses import JSONResponse
from fastapi.security import APIKeyHeader

app = FastAPI()
api_key_header = APIKeyHeader(name="X-API-Key", auto_error=True)


def verify_key(api_key) -> bool:
    """Check the key against the configured key store (e.g. a Kubernetes Secret)."""
    ...


async def verify_api_key(api_key: str = Security(api_key_header)):
    # Dependency for route-level protection
    if not verify_key(api_key):
        raise HTTPException(status_code=403, detail="Invalid API key")
    return api_key


@app.middleware("http")
async def auth_middleware(request: Request, call_next):
    # Require an API key on all versioned API routes
    if request.url.path.startswith("/v1/"):
        api_key = request.headers.get("X-API-Key")
        if not verify_key(api_key):
            # Exceptions raised inside middleware bypass FastAPI's handlers,
            # so return the error response directly
            return JSONResponse(status_code=403, content={"detail": "Invalid or missing API key"})
    return await call_next(request)
```

### 3. Secrets Management Setup

#### SealedSecrets Installation
```bash
# Install sealed-secrets controller
helm repo add sealed-secrets https://bitnami-labs.github.io/sealed-secrets
helm install sealed-secrets sealed-secrets/sealed-secrets -n kube-system

# Create a sealed secret
kubeseal --format yaml < secret.yaml > sealed-secret.yaml
```

#### Example Secret Structure
```yaml
apiVersion: bitnami.com/v1alpha1
kind: SealedSecret
metadata:
  name: coordinator-api-keys
spec:
  encryptedData:
    api-key-prod: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEQAx...
    api-key-dev: AgBy3i4OJSWK+PiTySYZZA9rO43cGDEQAx...
```

### 4. Network Policies

#### Default Deny Policy
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
spec:
  podSelector: {}
  policyTypes:
    - Ingress
    - Egress
```

#### Service-Specific Policies
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: coordinator-api-netpol
spec:
  podSelector:
    matchLabels:
      app: coordinator-api
  policyTypes:
    - Ingress
    - Egress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: ingress-nginx
      ports:
        - protocol: TCP
          port: 8011
```

## Security Best Practices

### Development Environment
- Use 127.0.0.2 for local development (not 0.0.0.0)
- Separate API keys for dev/staging/prod
- Enable debug logging only in development
- Use self-signed certificates for local TLS

### Production Environment
- Enable all security headers
- Implement comprehensive logging
- Use external secret management
- Regular security audits
- Penetration testing quarterly

### Monitoring and Alerting

#### Security Metrics
- Failed authentication attempts
- Unusual API usage patterns
- Certificate expiry warnings
- Secret access audits

#### Alert Rules
```yaml
- alert: HighAuthFailureRate
  expr: rate(auth_failures_total[5m]) > 10
  for: 2m
  labels:
    severity: warning
  annotations:
    summary: "High authentication failure rate detected"

- alert: CertificateExpiringSoon
  expr: cert_certificate_expiry_time < time() + 86400 * 7
  for: 1h
  labels:
    severity: critical
  annotations:
    summary: "Certificate expires in less than 7 days"
```

## Incident Response

### Security Incident Categories
1. **Critical**: Data breach, system compromise
2. **High**: Service disruption, privilege escalation
3. **Medium**: Suspicious activity, policy violation
4. **Low**: Misconfiguration, minor issue

### Response Procedures
1. **Detection**: Automated alerts, manual monitoring
2. **Assessment**: Impact analysis, containment
3. **Remediation**: Patch, rotate credentials, restore
4. **Post-mortem**: Document, improve controls

### Emergency Contacts
- Security Team: security@aitbc.io
- On-call Engineer: +1-555-SECURITY
- Incident Commander: incident@aitbc.io

## Compliance

### Data Protection
- GDPR compliance for EU users
- CCPA compliance for California users
- Data retention policies
- Right to deletion implementation

### Auditing
- Quarterly security audits
- Annual penetration testing
- Continuous vulnerability scanning
- Third-party security assessments

## Security Checklist

### Pre-deployment
- [ ] All API endpoints require authentication
- [ ] TLS certificates valid and properly configured
- [ ] Secrets encrypted and access-controlled
- [ ] Network policies implemented
- [ ] RBAC configured correctly
- [ ] Monitoring and alerting active
- [ ] Backup encryption enabled
- [ ] Security headers configured

### Post-deployment
- [ ] Security testing completed
- [ ] Documentation updated
- [ ] Team trained on procedures
- [ ] Incident response tested
- [ ] Compliance verified

## References

- [OWASP API Security Top 10](https://owasp.org/www-project-api-security/)
- [Kubernetes Security Best Practices](https://kubernetes.io/docs/concepts/security/)
- [NIST Cybersecurity Framework](https://www.nist.gov/cyberframework)
- [CERT Coordination Center](https://www.cert.org/)

## Security Updates

This document is updated regularly. Last updated: 2024-12-22

For questions or concerns, contact the security team at security@aitbc.io