feat: add marketplace metrics, privacy features, and service registry endpoints

- Add Prometheus metrics for marketplace API throughput and error rates with new dashboard panels - Implement confidential transaction models with encryption support and access control - Add key management system with registration, rotation, and audit logging - Create services and registry routers for service discovery and management - Integrate ZK proof generation for privacy-preserving receipts - Add metrics instru
2025-12-22 10:33:23 +01:00
parent d98b2c7772
commit c8be9d7414
260 changed files with 59033 additions and 351 deletions
--- a/infra/README.md
+++ b/infra/README.md
@ -0,0 +1,158 @@
+# AITBC Infrastructure Templates
+
+This directory contains Terraform and Helm templates for deploying AITBC services across dev, staging, and production environments.
+
+## Directory Structure
+
+```
+infra/
+├── terraform/                 # Infrastructure as Code
+│   ├── modules/              # Reusable Terraform modules
+│   │   └── kubernetes/       # EKS cluster module
+│   └── environments/         # Environment-specific configurations
+│       ├── dev/
+│       ├── staging/
+│       └── prod/
+└── helm/                     # Helm Charts
+    ├── charts/               # Application charts
+    │   ├── coordinator/      # Coordinator API chart
+    │   ├── blockchain-node/  # Blockchain node chart
+    │   └── monitoring/       # Monitoring stack (Prometheus, Grafana)
+    └── values/               # Environment-specific values
+        ├── dev.yaml
+        ├── staging.yaml
+        └── prod.yaml
+```
+
+## Quick Start
+
+### Prerequisites
+
+- Terraform >= 1.0
+- Helm >= 3.0
+- kubectl configured for your cluster
+- AWS CLI configured (for EKS)
+
+### Deploy Development Environment
+
+1. **Provision Infrastructure with Terraform:**
+   ```bash
+   cd infra/terraform/environments/dev
+   terraform init
+   terraform apply
+   ```
+
+2. **Configure kubectl:**
+   ```bash
+   aws eks update-kubeconfig --name aitbc-dev --region us-west-2
+   ```
+
+3. **Deploy Applications with Helm:**
+   ```bash
+   # Add required Helm repositories
+   helm repo add bitnami https://charts.bitnami.com/bitnami
+   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
+   helm repo add grafana https://grafana.github.io/helm-charts
+   helm repo update
+   
+   # Deploy monitoring stack
+   helm install monitoring ../../helm/charts/monitoring -f ../../helm/values/dev.yaml
+   
+   # Deploy coordinator API
+   helm install coordinator ../../helm/charts/coordinator -f ../../helm/values/dev.yaml
+   ```
+
+### Environment Configurations
+
+#### Development
+- 1 replica per service
+- Minimal resource allocation
+- Public EKS endpoint enabled
+- 7-day metrics retention
+
+#### Staging
+- 2-3 replicas per service
+- Moderate resource allocation
+- Autoscaling enabled
+- 30-day metrics retention
+- TLS with staging certificates
+
+#### Production
+- 3+ replicas per service
+- High resource allocation
+- Full autoscaling configuration
+- 90-day metrics retention
+- TLS with production certificates
+- Network policies enabled
+- Backup configuration enabled
+
+## Monitoring
+
+The monitoring stack includes:
+- **Prometheus**: Metrics collection and storage
+- **Grafana**: Visualization dashboards
+- **AlertManager**: Alert routing and notification
+
+Access Grafana:
+```bash
+kubectl port-forward svc/monitoring-grafana 3000:3000
+# Open http://localhost:3000
+# Default credentials: admin/admin (check values files for environment-specific passwords)
+```
+
+## Scaling Guidelines
+
+Based on benchmark results (`apps/blockchain-node/scripts/benchmark_throughput.py`):
+
+- **Coordinator API**: Scale horizontally at ~500 TPS per node
+- **Blockchain Node**: Scale horizontally at ~1000 TPS per node
+- **Wallet Daemon**: Scale based on concurrent users
+
+## Security Considerations
+
+- Private subnets for all application workloads
+- Network policies restrict traffic between services
+- Secrets managed via Kubernetes Secrets
+- TLS termination at ingress level
+- Pod Security Policies enforced in production
+
+## Backup and Recovery
+
+- Automated daily backups of PostgreSQL databases
+- EBS snapshots for persistent volumes
+- Cross-region replication for production data
+- Restore procedures documented in runbooks
+
+## Cost Optimization
+
+- Use Spot instances for non-critical workloads
+- Implement cluster autoscaling
+- Right-size resources based on metrics
+- Schedule non-production environments to run only during business hours
+
+## Troubleshooting
+
+Common issues and solutions:
+
+1. **Helm chart fails to install:**
+   - Check if all dependencies are added
+   - Verify kubectl context is correct
+   - Review values files for syntax errors
+
+2. **Prometheus not scraping metrics:**
+   - Verify ServiceMonitor CRDs are installed
+   - Check service annotations
+   - Review network policies
+
+3. **High memory usage:**
+   - Review resource limits in values files
+   - Check for memory leaks in applications
+   - Consider increasing node size
+
+## Contributing
+
+When adding new services:
+1. Create a new Helm chart in `helm/charts/`
+2. Add environment-specific values in `helm/values/`
+3. Update monitoring configuration to include new service metrics
+4. Document any special requirements in this README