AITBC Infrastructure Templates
This directory contains Terraform and Helm templates for deploying AITBC services across dev, staging, and production environments.
Directory Structure
infra/
├── terraform/                # Infrastructure as Code
│   ├── modules/              # Reusable Terraform modules
│   │   └── kubernetes/       # EKS cluster module
│   └── environments/         # Environment-specific configurations
│       ├── dev/
│       ├── staging/
│       └── prod/
└── helm/                     # Helm Charts
    ├── charts/               # Application charts
    │   ├── coordinator/      # Coordinator API chart
    │   ├── blockchain-node/  # Blockchain node chart
    │   └── monitoring/       # Monitoring stack (Prometheus, Grafana)
    └── values/               # Environment-specific values
        ├── dev.yaml
        ├── staging.yaml
        └── prod.yaml
Quick Start
Prerequisites
- Terraform >= 1.0
- Helm >= 3.0
- kubectl configured for your cluster
- AWS CLI configured (for EKS)
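Before deploying, it can help to confirm that the four CLIs listed above are actually on the PATH. The following is a minimal sketch, not part of the repository: the function names `missing_tools` and `check_prereqs` are hypothetical, and the check covers presence only, not the minimum versions.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; names are illustrative and not part of this repository.

# Print the name of every tool in the argument list that is not on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      printf '%s\n' "$tool"
    fi
  done
}

# Check the four CLIs this README lists as prerequisites.
# Presence only: version minimums (Terraform >= 1.0, Helm >= 3.0) are not verified.
check_prereqs() {
  local missing
  missing=$(missing_tools terraform helm kubectl aws)
  if [ -n "$missing" ]; then
    printf 'Missing prerequisites:\n%s\n' "$missing" >&2
    return 1
  fi
  echo "All prerequisites found."
}
```

Running `check_prereqs` returns a non-zero status and lists the missing tools on stderr if any are absent.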
Deploy Development Environment
- Provision infrastructure with Terraform:
    cd infra/terraform/environments/dev
    terraform init
    terraform apply
- Configure kubectl:
    aws eks update-kubeconfig --name aitbc-dev --region us-west-2
- Deploy applications with Helm (chart and values paths are relative to infra/terraform/environments/dev, the directory entered in the first step):
    # Add required Helm repositories
    helm repo add bitnami https://charts.bitnami.com/bitnami
    helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
    helm repo add grafana https://grafana.github.io/helm-charts
    helm repo update
    # Deploy monitoring stack
    helm install monitoring ../../../helm/charts/monitoring -f ../../../helm/values/dev.yaml
    # Deploy coordinator API
    helm install coordinator ../../../helm/charts/coordinator -f ../../../helm/values/dev.yaml
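The same Helm steps apply to staging and production with a different values file. As a sketch of how that could be wrapped, the helpers below pick the values file by environment name and use `helm upgrade --install`, so re-running a deployment upgrades an existing release instead of failing. The function names are hypothetical, and the paths assume the command is run from the repository root with `infra/` directly beneath it.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; assumes the working directory is the repository root.

# Map an environment name to its values file, rejecting unknown names.
values_file() {
  case "$1" in
    dev|staging|prod) printf 'infra/helm/values/%s.yaml\n' "$1" ;;
    *) printf 'unknown environment: %s\n' "$1" >&2; return 1 ;;
  esac
}

# Install or upgrade one chart for one environment.
# The release name doubles as the chart directory name, as in the steps above.
deploy_chart() {
  local release="$1" environment="$2" values
  values=$(values_file "$environment") || return 1
  helm upgrade --install "$release" "infra/helm/charts/$release" -f "$values"
}
```

For example, `deploy_chart monitoring dev` followed by `deploy_chart coordinator dev` mirrors the two Helm deployments shown above.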
Environment Configurations
Development
- 1 replica per service
- Minimal resource allocation
- Public EKS endpoint enabled
- 7-day metrics retention
Staging
- 2-3 replicas per service
- Moderate resource allocation
- Autoscaling enabled
- 30-day metrics retention
- TLS with staging certificates
Production
- 3+ replicas per service
- High resource allocation
- Full autoscaling configuration
- 90-day metrics retention
- TLS with production certificates
- Network policies enabled
- Backup configuration enabled
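The replica counts and retention periods above can be captured as a small lookup, for instance to sanity-check a values file in CI. This is only a sketch: `env_profile` is a hypothetical name, and the replica figure is the minimum of each range listed above (2 for staging's "2-3", 3 for production's "3+").

```shell
#!/usr/bin/env bash
# Hypothetical helper; values are taken from the environment lists above.

# Print "<minimum replicas> <metrics retention in days>" for an environment.
env_profile() {
  case "$1" in
    dev)     echo "1 7" ;;
    staging) echo "2 30" ;;
    prod)    echo "3 90" ;;
    *) printf 'unknown environment: %s\n' "$1" >&2; return 1 ;;
  esac
}
```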
Monitoring
The monitoring stack includes:
- Prometheus: Metrics collection and storage
- Grafana: Visualization dashboards
- AlertManager: Alert routing and notification
Access Grafana:
kubectl port-forward svc/monitoring-grafana 3000:3000
# Open http://localhost:3000
# Default credentials: admin/admin (check values files for environment-specific passwords)
Scaling Guidelines
Based on benchmark results (apps/blockchain-node/scripts/benchmark_throughput.py):
- Coordinator API: scale out when load approaches ~500 TPS per node
- Blockchain Node: scale out when load approaches ~1000 TPS per node
- Wallet Daemon: Scale based on concurrent users
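Reading the benchmark figures above as the sustained capacity of one replica, the replica count for a target load is a ceiling division. The sketch below shows that arithmetic; `replicas_needed` is a hypothetical helper, and it leaves out headroom for spikes, which a real sizing exercise would likely add.

```shell
#!/usr/bin/env bash
# Hypothetical helper illustrating the sizing arithmetic.

# Replicas needed for a target load, rounded up, never fewer than one.
# Arguments: target TPS, TPS one replica can sustain.
replicas_needed() {
  local target_tps="$1" per_replica_tps="$2" n
  n=$(( (target_tps + per_replica_tps - 1) / per_replica_tps ))
  if [ "$n" -lt 1 ]; then
    n=1
  fi
  echo "$n"
}
```

With the figures above, `replicas_needed 1200 500` gives 3 Coordinator API replicas and `replicas_needed 2500 1000` gives 3 blockchain node replicas.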
Security Considerations
- Private subnets for all application workloads
- Network policies restrict traffic between services
- Secrets managed via Kubernetes Secrets
- TLS termination at ingress level
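- Pod Security Policies enforced in production (PodSecurityPolicy was removed in Kubernetes 1.25; clusters on newer versions need Pod Security Admission or an equivalent policy controller)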
Backup and Recovery
- Automated daily backups of PostgreSQL databases
- EBS snapshots for persistent volumes
- Cross-region replication for production data
- Restore procedures documented in runbooks
Cost Optimization
- Use Spot instances for non-critical workloads
- Implement cluster autoscaling
- Right-size resources based on metrics
- Schedule non-production environments to run only during business hours
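Scheduling non-production environments comes down to a time-window check that a cron job or CI schedule can act on. The sketch below assumes a window of Monday to Friday, 08:00 to 17:59 in the machine's local time; both the window and the function names are illustrative assumptions, not settings from this repository.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; the business-hours window is an assumption.

# Succeed when the given ISO weekday (1=Mon .. 7=Sun) and hour (0-23)
# fall inside the assumed window: Monday-Friday, 08:00-17:59.
in_business_hours() {
  local weekday="$1" hour="$2"
  [ "$weekday" -ge 1 ] && [ "$weekday" -le 5 ] &&
    [ "$hour" -ge 8 ] && [ "$hour" -lt 18 ]
}

# Print the desired state ("up" or "down") for a non-production
# environment at the current local time.
desired_state() {
  if in_business_hours "$(date +%u)" "$(date +%H)"; then
    echo up
  else
    echo down
  fi
}
```

A scheduled job could call `desired_state` and scale the dev or staging node group up or down accordingly.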
Troubleshooting
Common issues and solutions:
- Helm chart fails to install:
  - Check if all dependencies are added
  - Verify kubectl context is correct
  - Review values files for syntax errors
- Prometheus not scraping metrics:
  - Verify ServiceMonitor CRDs are installed
  - Check service annotations
  - Review network policies
- High memory usage:
  - Review resource limits in values files
  - Check for memory leaks in applications
  - Consider increasing node size
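The checks above map onto standard kubectl and helm commands. The functions below group them by symptom as a starting point; the function names are hypothetical, the ServiceMonitor check assumes the Prometheus Operator CRDs, and `kubectl top` only works when metrics-server is installed in the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helpers wrapping standard kubectl/helm commands.

# Helm chart fails to install. Arguments: chart directory, values file.
diagnose_helm() {
  kubectl config current-context   # confirm the active cluster context
  helm repo list                   # confirm the required repositories are added
  helm dependency list "$1"        # show chart dependencies and their status
  helm lint "$1" -f "$2"           # catch template and values syntax errors
}

# Prometheus not scraping metrics.
diagnose_scraping() {
  kubectl get crd servicemonitors.monitoring.coreos.com   # is the CRD installed?
  kubectl get servicemonitors --all-namespaces            # which targets are declared?
  kubectl get networkpolicies --all-namespaces            # could a policy block scrapes?
}

# High memory usage (requires metrics-server).
diagnose_memory() {
  kubectl top pods --all-namespaces --sort-by=memory      # heaviest pods first
  kubectl top nodes                                       # per-node usage
}
```

For example, `diagnose_helm infra/helm/charts/coordinator infra/helm/values/dev.yaml` runs the install checks for the coordinator chart from the repository root.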
Contributing
When adding new services:
- Create a new Helm chart in helm/charts/
- Add environment-specific values in helm/values/
- Update monitoring configuration to include new service metrics
- Document any special requirements in this README