
# Production Rollback Procedures
## Emergency Rollback Guide
Use these procedures when a deployment causes critical issues in production.
### Immediate Actions (First 5 minutes)
1. **Assess the Impact**
   - Check monitoring dashboards
   - Review error logs
   - Identify affected services
   - Decide whether a rollback is necessary
2. **Communicate**
   - Notify the team in #production-alerts
   - Post an update on the status page if needed
   - Record the incident start time
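Recording the incident start time (and each later action) is easier with a helper that stamps every entry the same way. A minimal sketch: `log_incident` and the `INCIDENT_LOG` variable are hypothetical, not part of the AITBC scripts.

```bash
# Hypothetical helper (not part of the AITBC scripts): append a UTC-timestamped
# line to an incident log. INCIDENT_LOG overrides the assumed default path.
log_incident() {
    local log_file="${INCIDENT_LOG:-./incident-$(date -u +%Y%m%d).log}"
    # ISO 8601 UTC timestamp, followed by the message text
    printf '%s %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$log_file"
}

# Example:
#   log_incident "Incident start: coordinator returning 5xx after deploy"
```

Using UTC avoids ambiguity when responders are in different time zones.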
### Automated Rollback (if available)
```bash
# Quick rollback to the previous version
./scripts/rollback-to-previous.sh
# Roll back to a specific version
./scripts/rollback-to-version.sh v1.2.3
```
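The contents of `rollback-to-version.sh` are not shown here. If you maintain such a wrapper, a cheap safeguard is to reject malformed or unknown tags before any service is stopped. A sketch, assuming tags follow the `vMAJOR.MINOR.PATCH` pattern used in this guide; `is_release_tag` and `tag_exists` are hypothetical names.

```bash
# Hypothetical guards for a rollback-to-version wrapper.

# Succeeds only for tags shaped like vMAJOR.MINOR.PATCH (e.g. v1.2.3).
is_release_tag() {
    [[ "$1" =~ ^v[0-9]+\.[0-9]+\.[0-9]+$ ]]
}

# Succeeds only if the tag exists in the local git repository.
# Must be run inside the deployment checkout.
tag_exists() {
    git rev-parse -q --verify "refs/tags/$1" >/dev/null
}

# Example use at the top of the wrapper:
#   if ! is_release_tag "$1" || ! tag_exists "$1"; then
#       echo "unknown release tag: $1" >&2
#       exit 1
#   fi
```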
### Manual Rollback Steps
#### 1. Stop Current Services
```bash
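# Stop all AITBC services in the reverse of the start order used in step 5,
# so dependent services stop before the ones they rely on
sudo systemctl stop aitbc-dashboard
sudo systemctl stop aitbc-miner
sudo systemctl stop aitbc-node
sudo systemctl stop aitbc-coordinator
# Stop containerized components (run from the directory containing docker-compose.yml)
sudo docker-compose down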
```
#### 2. Restore Previous Code
```bash
# Fetch tags, then list the five most recent release tags
git fetch --tags
git tag --sort=-version:refname | head -n 5
# Check out the previous stable version (v1.2.3 is an example)
git checkout v1.2.3
# Rebuild if necessary
docker-compose build --no-cache
```
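When you know which tag is currently deployed, the immediately preceding release can be picked programmatically instead of by eye. A sketch using `sort -V` for version ordering; `previous_tag` is a hypothetical helper, and the example assumes the deployed commit sits on a release tag.

```bash
# Hypothetical helper: read tags on stdin (one per line) and print the tag that
# immediately precedes $1 in version order (sort -V, so v1.10.0 > v1.9.0).
# Prints nothing if $1 is the oldest tag or is not in the list.
previous_tag() {
    sort -V | awk -v cur="$1" '
        $0 == cur { if (prev != "") print prev; exit }
        { prev = $0 }
    '
}

# Example:
#   git tag --list 'v*' | previous_tag "$(git describe --tags --abbrev=0)"
```

Version sort matters here: a plain lexical sort would place `v1.10.0` before `v1.9.0`.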
#### 3. Restore Database (if needed)
```bash
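# List available backups
aws s3 ls s3://aitbc-backups/database/
# Download the chosen backup (replace <backup-file> with a name from the listing)
aws s3 cp s3://aitbc-backups/database/<backup-file> latest_backup.dump
# Restore it; --clean --if-exists drops existing objects first so they do not conflict
pg_restore -h localhost -U postgres -d aitbc_prod --clean --if-exists latest_backup.dump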
```
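`aws s3 ls` lists objects by key name, not by date. To pick the newest backup without scanning the listing by hand, sort on its date and time columns. A sketch; `latest_s3_object` is a hypothetical helper, and it assumes object names contain no spaces.

```bash
# Hypothetical helper: read `aws s3 ls` output on stdin and print the name of
# the most recently modified object. Object lines look like
#   2026-01-20 03:00:12   48213504 aitbc_prod-20260120.dump
# Sub-prefix lines ("PRE name/") are skipped. Assumes names contain no spaces.
latest_s3_object() {
    awk '$1 != "PRE" && NF >= 4 { print $1 " " $2 " " $4 }' \
        | sort \
        | tail -n 1 \
        | awk '{ print $3 }'
}

# Example:
#   backup="$(aws s3 ls s3://aitbc-backups/database/ | latest_s3_object)"
#   aws s3 cp "s3://aitbc-backups/database/$backup" latest_backup.dump
```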
#### 4. Restore Configuration
```bash
# Restore from backup
sudo cp /etc/aitbc/backup/config.yaml /etc/aitbc/config.yaml
sudo cp /etc/aitbc/backup/.env /etc/aitbc/.env
```
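Overwriting the live files discards the configuration that was running when the incident started, which root cause analysis usually needs. A sketch that keeps a timestamped copy of each live file before restoring it; `restore_config` and the `.pre-rollback.<timestamp>` suffix are hypothetical conventions, not existing AITBC tooling.

```bash
# Hypothetical helper: restore the named files from a backup directory into the
# live directory, first saving any existing live file as
# <name>.pre-rollback.<UTC timestamp> so the failing config is preserved.
restore_config() {
    local backup_dir="$1" live_dir="$2"
    shift 2
    local stamp f
    stamp="$(date -u +%Y%m%dT%H%M%SZ)"
    for f in "$@"; do
        if [ -e "$live_dir/$f" ]; then
            cp -p "$live_dir/$f" "$live_dir/$f.pre-rollback.$stamp" || return 1
        fi
        cp -p "$backup_dir/$f" "$live_dir/$f" || return 1
    done
}

# Example (from a shell that can write /etc/aitbc):
#   restore_config /etc/aitbc/backup /etc/aitbc config.yaml .env
```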
#### 5. Restart Services
```bash
# Start services in dependency order, pausing between each
sudo systemctl start aitbc-coordinator
sleep 10
sudo systemctl start aitbc-node
sleep 10
sudo systemctl start aitbc-miner
sleep 10
sudo systemctl start aitbc-dashboard
```
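Fixed `sleep 10` pauses either waste time or are too short for a slow start. A sketch that instead polls until each unit reports active, with a timeout. `wait_until` and `start_in_order` are hypothetical helpers, and the 60-second timeout and 2-second interval are assumptions. `systemctl is-active` only reflects systemd's view of the unit; if a service accepts traffic some time after it becomes active, poll a health check instead.

```bash
# Hypothetical helper: run a command repeatedly until it succeeds or the
# timeout (seconds) expires. Returns 0 on success, 1 on timeout.
wait_until() {
    local timeout="$1" interval="$2"
    shift 2
    local deadline=$(( SECONDS + timeout ))
    until "$@"; do
        if (( SECONDS >= deadline )); then
            return 1
        fi
        sleep "$interval"
    done
}

# Start each unit in the order given and wait for systemd to report it active
# before moving on to the next one.
start_in_order() {
    local unit
    for unit in "$@"; do
        sudo systemctl start "$unit" || return 1
        wait_until 60 2 systemctl is-active --quiet "$unit" || {
            echo "timed out waiting for $unit" >&2
            return 1
        }
    done
}

# Example:
#   start_in_order aitbc-coordinator aitbc-node aitbc-miner aitbc-dashboard
```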
#### 6. Verify Rollback
```bash
# Check service status
./scripts/health-check.sh
# Run smoke tests
./scripts/smoke-test.sh
# Verify blockchain sync: a "result" of false means no sync is in progress
curl -s -X POST http://localhost:8545 \
  -H "Content-Type: application/json" \
  -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
```
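The `eth_syncing` call returns raw JSON. A small classifier makes the answer usable in scripts. This sketch assumes the AITBC node follows the standard Ethereum JSON-RPC semantics for `eth_syncing`; `sync_state` is a hypothetical name. It pattern-matches instead of parsing JSON to avoid a `jq` dependency, so it only suits compact responses like the ones shown. Note that `false` only means no sync is running, not that the node has peers or is at the network head.

```bash
# Hypothetical helper: classify an eth_syncing JSON-RPC response read on stdin.
# In the Ethereum JSON-RPC API, "result": false means no sync is in progress and
# an object (startingBlock/currentBlock/highestBlock) means the node is syncing.
# Prints "synced" (exit 0), "syncing" (exit 1) or "error" (exit 2).
sync_state() {
    local body
    body="$(cat)"
    local synced_re='"result"[[:space:]]*:[[:space:]]*false'
    local syncing_re='"result"[[:space:]]*:[[:space:]]*[{]'
    if [[ $body =~ $synced_re ]]; then
        echo synced
    elif [[ $body =~ $syncing_re ]]; then
        echo syncing
        return 1
    else
        echo error
        return 2
    fi
}

# Example:
#   curl -s -X POST http://localhost:8545 -H "Content-Type: application/json" \
#     -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}' | sync_state
```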
### Database-Specific Rollbacks
#### Partial Data Rollback
```bash
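# Back up the current state before changing anything
pg_dump -h localhost -U postgres aitbc_prod > pre-rollback-backup.sql
# Roll back specific tables in one transaction, aborting on the first error
psql -h localhost -U postgres -d aitbc_prod -v ON_ERROR_STOP=1 --single-transaction -f rollback-tables.sql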
```
#### Migration Rollback
```bash
# Check migration status
./scripts/migration-status.sh
# Roll back the last migration
./scripts/rollback-migration.sh
```
### Service-Specific Rollbacks
#### Coordinator Service
```bash
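# Restore coordinator state, keeping the current state.db for later analysis
sudo systemctl stop aitbc-coordinator
sudo cp /var/lib/aitbc/coordinator/state.db /var/lib/aitbc/coordinator/state.db.pre-rollback
sudo cp /var/lib/aitbc/coordinator/backup/state.db /var/lib/aitbc/coordinator/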
sudo systemctl start aitbc-coordinator
```
#### Blockchain Node
```bash
# Reset to the last stable block (replace 123456 with the last known-good height)
sudo systemctl stop aitbc-node
aitbc-node --reset-to-block 123456
sudo systemctl start aitbc-node
```
#### Mining Operations
```bash
# Stop mining immediately
curl -X POST http://localhost:8080/api/mining/stop
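# Reset mining state. FLUSHDB deletes every key in the selected Redis database
# (DB 0 unless -n <db> is given); confirm that database holds only mining state.
redis-cli FLUSHDB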
```
### Verification Checklist
- [ ] All services running
- [ ] Database connectivity
- [ ] API endpoints responding
- [ ] Blockchain syncing
- [ ] Mining operations (if applicable)
- [ ] Dashboard accessible
- [ ] SSL certificates valid
- [ ] Monitoring alerts cleared
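Several checklist items can be automated. A sketch of a small runner that prints each item in the same `[x]`/`[ ]` form; `check` and `run_checklist` are hypothetical, the certificate path `/etc/aitbc/tls/cert.pem` is an assumption, and the 7-day (604800-second) expiry margin is an arbitrary choice.

```bash
# Hypothetical helper: run a check command quietly and print a checklist line,
# "[x] description" on success or "[ ] description" on failure (returns 1).
check() {
    local desc="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        printf '[x] %s\n' "$desc"
    else
        printf '[ ] %s\n' "$desc"
        return 1
    fi
}

# Example wiring; returns the number of failed checks.
run_checklist() {
    local failures=0 unit
    for unit in aitbc-coordinator aitbc-node aitbc-miner aitbc-dashboard; do
        check "$unit running" systemctl is-active --quiet "$unit" || failures=$((failures + 1))
    done
    check "Database connectivity" pg_isready -h localhost -U postgres || failures=$((failures + 1))
    # Assumed certificate path; -checkend fails if the cert expires within 7 days
    check "SSL certificate valid for 7+ days" \
        openssl x509 -checkend 604800 -noout -in /etc/aitbc/tls/cert.pem || failures=$((failures + 1))
    return "$failures"
}
```

Items such as "Monitoring alerts cleared" still need a human look at the dashboards.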
### Post-Rollback Actions
1. **Root Cause Analysis**
   - Document what went wrong
   - Identify the failure point
   - Create a prevention plan
2. **Team Communication**
   - Update the incident ticket
   - Share lessons learned
   - Update runbooks
3. **Preventive Measures**
   - Add tests that would have caught the failure
   - Improve monitoring
   - Update the deployment checklist
### Contact Information
- **On-call Engineer**: [Phone/Slack]
- **Engineering Lead**: [Phone/Slack]
- **DevOps Team**: #devops-alerts
- **Management**: #management-alerts
### Escalation
1. **Level 1**: On-call engineer (first 15 minutes)
2. **Level 2**: Engineering lead (after 15 minutes)
3. **Level 3**: CTO (after 30 minutes)
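The escalation thresholds can be expressed as a function of minutes since the incident started, which is handy for a reminder script or incident bot. A sketch; `escalation_level` is hypothetical, and treating exactly 15 and 30 minutes as the start of Levels 2 and 3 is an assumption about the boundaries.

```bash
# Hypothetical helper: map minutes since incident start to an escalation level.
# Boundary assumption: minute 15 starts Level 2, minute 30 starts Level 3.
escalation_level() {
    local minutes="$1"
    if (( minutes < 15 )); then
        echo "Level 1: On-call engineer"
    elif (( minutes < 30 )); then
        echo "Level 2: Engineering lead"
    else
        echo "Level 3: CTO"
    fi
}

# Example, given the incident start as a Unix timestamp in $start_epoch:
#   escalation_level $(( ($(date +%s) - start_epoch) / 60 ))
```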
### Notes
- Always create a backup before rolling back
- Document every step during rollback
- Test in staging before production if possible
- Keep stakeholders informed throughout process