# AITBC Infrastructure Templates

This directory contains Terraform and Helm templates for deploying AITBC services across dev, staging, and production environments.

## Directory Structure

```
infra/
├── terraform/                 # Infrastructure as Code
│   ├── modules/               # Reusable Terraform modules
│   │   └── kubernetes/        # EKS cluster module
│   └── environments/          # Environment-specific configurations
│       ├── dev/
│       ├── staging/
│       └── prod/
└── helm/                      # Helm Charts
    ├── charts/                # Application charts
    │   ├── coordinator/       # Coordinator API chart
    │   ├── blockchain-node/   # Blockchain node chart
    │   └── monitoring/        # Monitoring stack (Prometheus, Grafana)
    └── values/                # Environment-specific values
        ├── dev.yaml
        ├── staging.yaml
        └── prod.yaml
```

## Quick Start

### Prerequisites

- Terraform >= 1.0
- Helm >= 3.0
- kubectl configured for your cluster
- AWS CLI configured (for EKS)

### Deploy Development Environment

1. **Provision Infrastructure with Terraform:**

   ```bash
   cd infra/terraform/environments/dev
   terraform init
   terraform apply
   ```

2. **Configure kubectl:**

   ```bash
   aws eks update-kubeconfig --name aitbc-dev --region us-west-2
   ```

3. **Deploy Applications with Helm:**

   ```bash
   # Paths below are relative to infra/terraform/environments/dev (step 1)

   # Add required Helm repositories
   helm repo add bitnami https://charts.bitnami.com/bitnami
   helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
   helm repo add grafana https://grafana.github.io/helm-charts
   helm repo update

   # Deploy monitoring stack
   helm install monitoring ../../../helm/charts/monitoring -f ../../../helm/values/dev.yaml

   # Deploy coordinator API
   helm install coordinator ../../../helm/charts/coordinator -f ../../../helm/values/dev.yaml
   ```

### Environment Configurations

#### Development

- 1 replica per service
- Minimal resource allocation
- Public EKS endpoint enabled
- 7-day metrics retention

#### Staging

- 2-3 replicas per service
- Moderate resource allocation
- Autoscaling enabled
- 30-day metrics retention
- TLS with staging certificates

#### Production

- 3+ replicas per service
- High resource allocation
- Full autoscaling configuration
- 90-day metrics retention
- TLS with production certificates
- Network policies enabled
- Backup configuration enabled

## Monitoring

The monitoring stack includes:

- **Prometheus**: Metrics collection and storage
- **Grafana**: Visualization dashboards
- **Alertmanager**: Alert routing and notification

Access Grafana:

```bash
kubectl port-forward svc/monitoring-grafana 3000:3000
# Open http://localhost:3000
# Default credentials: admin/admin (check values files for environment-specific passwords)
```

## Scaling Guidelines

Based on benchmark results (`apps/blockchain-node/scripts/benchmark_throughput.py`):

- **Coordinator API**: Scale horizontally at ~500 TPS per node
- **Blockchain Node**: Scale horizontally at ~1000 TPS per node
- **Wallet Daemon**: Scale based on concurrent users

## Security Considerations

- Private subnets for all application workloads
- Network policies restrict traffic between services
- Secrets managed via Kubernetes Secrets
- TLS termination at ingress level
- Pod Security Policies enforced in production

## Backup and Recovery

- Automated daily backups of PostgreSQL databases
- EBS snapshots for persistent volumes
- Cross-region replication for production data
- Restore procedures documented in runbooks

## Cost Optimization

- Use Spot instances for non-critical workloads
- Implement cluster autoscaling
- Right-size resources based on metrics
- Schedule non-production environments to run only during business hours

## Troubleshooting

Common issues and solutions:

1. **Helm chart fails to install:**
   - Check that all required Helm repositories have been added
   - Verify that the kubectl context is correct
   - Review values files for syntax errors

2. **Prometheus not scraping metrics:**
   - Verify that the ServiceMonitor CRDs are installed
   - Check service annotations
   - Review network policies

3. **High memory usage:**
   - Review resource limits in values files
   - Check for memory leaks in applications
   - Consider increasing node size

## Contributing

When adding new services:

1. Create a new Helm chart in `helm/charts/`
2. Add environment-specific values in `helm/values/`
3. Update monitoring configuration to include new service metrics
4. Document any special requirements in this README
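
### Sizing Replicas for a Service

When choosing replica counts for a new or existing service, the per-node throughput figures in Scaling Guidelines can be turned into a rough estimate with ceiling division. The sketch below is illustrative only: `estimate_replicas` is a hypothetical helper that does not exist in this repository, and the target TPS values are made-up inputs, not benchmark results.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of this repository): estimate how many
# replicas are needed to serve a target throughput, rounding up.
# Usage: estimate_replicas <target_tps> <per_node_tps>
estimate_replicas() {
  local target_tps=$1
  local per_node_tps=$2
  # Integer ceiling division: (a + b - 1) / b
  echo $(( (target_tps + per_node_tps - 1) / per_node_tps ))
}

# Example inputs are assumptions chosen for illustration.
estimate_replicas 1800 500    # Coordinator API at ~500 TPS per node
estimate_replicas 2500 1000   # Blockchain node at ~1000 TPS per node
```

The result is a lower bound on replica count. It leaves no headroom for traffic spikes or node failures, so the per-environment minimums listed under Environment Configurations still apply.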