aitbc/.windsurf/skills/deploy-production/rollback-steps.md
2026-01-24 14:44:51 +01:00

Production Rollback Procedures

Emergency Rollback Guide

Use these procedures when a deployment causes critical issues in production.

Immediate Actions (First 5 minutes)

  1. Assess the Impact

    • Check monitoring dashboards
    • Review error logs
    • Identify affected services
    • Determine whether a rollback is necessary
  2. Communicate

    • Notify team in #production-alerts
    • Post an update on the status page if needed
    • Record the incident start time
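
The impact assessment above can be partly scripted. The following is a minimal sketch, assuming the four systemd units named in the manual rollback steps are the services to inspect; the function name `report_services` is hypothetical and not part of the repository.

```shell
# Units taken from the "Stop Current Services" step in this document.
SERVICES="aitbc-coordinator aitbc-node aitbc-miner aitbc-dashboard"

# Print one "<unit>: <state>" line per service.
# Returns non-zero if any unit is not active.
report_services() {
    local failed=0 svc state
    for svc in $SERVICES; do
        # `systemctl is-active` prints the state and exits non-zero
        # when the unit is anything other than active.
        state=$(systemctl is-active "$svc" 2>/dev/null) || failed=1
        echo "$svc: ${state:-unknown}"
    done
    return "$failed"
}
```

Calling `report_services` gives a quick list of affected services; `journalctl -u aitbc-node -p err --since "15 min ago"` can then narrow down the error logs for a failing unit.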

Automated Rollback (if available)

# Quick rollback to previous version
./scripts/rollback-to-previous.sh

# Rollback to specific version
./scripts/rollback-to-version.sh v1.2.3

Manual Rollback Steps

1. Stop Current Services

# Stop all AITBC services
sudo systemctl stop aitbc-coordinator
sudo systemctl stop aitbc-node
sudo systemctl stop aitbc-miner
sudo systemctl stop aitbc-dashboard
sudo docker-compose down

2. Restore Previous Code

# List the five most recent deployment tags (newest first)
git tag --sort=-version:refname | head -n 5

# Check out the previous stable version (v1.2.3 is an example; use a tag from the list above)
git checkout v1.2.3

# Rebuild if necessary
docker-compose build --no-cache
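
Picking the previous tag by eye is error-prone during an incident. As a sketch, the helper below (a hypothetical `previous_tag`, not an existing script) reads tags on stdin and prints the one that sorts directly below the currently deployed tag. It assumes plain `vMAJOR.MINOR.PATCH` tags; pre-release suffixes may sort differently under `sort -V` than under git's `version:refname`.

```shell
# Print the tag that sorts immediately below the given tag.
# Tags are read from stdin, one per line (for example from `git tag`).
previous_tag() {
    local current="$1"
    # sort -V orders version strings ascending; awk remembers the tag
    # seen just before the current one and prints it.
    sort -V | awk -v cur="$current" '
        $0 == cur { if (prev != "") print prev; exit }
        { prev = $0 }'
}
```

For example, `git tag | previous_tag v1.2.3` prints the tag to pass to `git checkout`. It prints nothing if the given tag is the oldest one or is not found.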

3. Restore Database (if needed)

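# List available backups
aws s3 ls s3://aitbc-backups/database/

# Download the chosen backup (latest_backup.dump is a placeholder; use a key from the listing)
aws s3 cp s3://aitbc-backups/database/latest_backup.dump .

# Restore the downloaded backup
pg_restore -h localhost -U postgres -d aitbc_prod latest_backup.dump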
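
A truncated or corrupt dump is easier to catch before the restore starts than halfway through it. The sketch below wraps the restore in a hypothetical `restore_dump` function that first asks `pg_restore --list` to read the archive's table of contents. It assumes a custom-format dump, and it adds `--clean --if-exists` so existing objects are dropped first; whether that is appropriate for `aitbc_prod` is a decision for the operator.

```shell
# Restore a custom-format dump only after pg_restore can read it.
# Usage: restore_dump <dump-file> <database>
restore_dump() {
    local dump="$1" db="$2"
    [ -s "$dump" ] || { echo "dump missing or empty: $dump" >&2; return 1; }
    # --list only prints the archive's table of contents;
    # it does not connect to or modify the database.
    pg_restore --list "$dump" >/dev/null \
        || { echo "unreadable dump: $dump" >&2; return 1; }
    # --clean --if-exists drops existing objects before recreating them.
    pg_restore -h localhost -U postgres -d "$db" --clean --if-exists "$dump"
}
```

Usage: `restore_dump latest_backup.dump aitbc_prod`.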

4. Restore Configuration

# Restore from backup
cp /etc/aitbc/backup/config.yaml /etc/aitbc/config.yaml
cp /etc/aitbc/backup/.env /etc/aitbc/.env
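
The `cp` commands above overwrite the live configuration, so the failing configuration is lost for later analysis. A minimal sketch of a safer variant follows; `restore_config` is a hypothetical helper, and the `.pre-rollback-<timestamp>` suffix is an assumed naming convention.

```shell
# Copy config.yaml and .env from a backup directory into the target
# directory, keeping a timestamped copy of each file being replaced.
# Usage: restore_config <backup-dir> <target-dir>
restore_config() {
    local backup_dir="$1" target_dir="$2" stamp f
    stamp=$(date +%Y%m%d-%H%M%S)
    # Refuse to start unless both backup files exist.
    for f in config.yaml .env; do
        [ -f "$backup_dir/$f" ] \
            || { echo "missing backup: $backup_dir/$f" >&2; return 1; }
    done
    for f in config.yaml .env; do
        if [ -f "$target_dir/$f" ]; then
            cp -p "$target_dir/$f" "$target_dir/$f.pre-rollback-$stamp" || return 1
        fi
        cp -p "$backup_dir/$f" "$target_dir/$f" || return 1
    done
}
```

Usage: `sudo` a shell that has sourced the function, then run `restore_config /etc/aitbc/backup /etc/aitbc`.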

5. Restart Services

# Start services in correct order
sudo systemctl start aitbc-coordinator
sleep 10
sudo systemctl start aitbc-node
sleep 10
sudo systemctl start aitbc-miner
sleep 10
sudo systemctl start aitbc-dashboard
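
A fixed `sleep 10` either waits too long or not long enough. As an alternative sketch, the hypothetical `wait_for_active` helper below polls `systemctl is-active` until the unit reports active or a timeout (60 seconds by default, an assumed value) expires.

```shell
# Poll until the unit is active or the timeout (in seconds) expires.
# Usage: wait_for_active <unit> [timeout]
wait_for_active() {
    local unit="$1" timeout="${2:-60}" waited=0
    until systemctl is-active --quiet "$unit"; do
        if [ "$waited" -ge "$timeout" ]; then
            echo "$unit not active after ${timeout}s" >&2
            return 1
        fi
        sleep 1
        waited=$((waited + 1))
    done
}
```

Each start command can then be chained, for example `sudo systemctl start aitbc-coordinator && wait_for_active aitbc-coordinator`, so the next service starts only once the previous one is up. Note that an active unit is not necessarily ready to serve requests; the health check in the next step still applies.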

6. Verify Rollback

# Check service status
./scripts/health-check.sh

# Run smoke tests
./scripts/smoke-test.sh

# Verify blockchain sync
curl -X POST http://localhost:8545 -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}'
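
The `curl` call above only prints the raw response. Assuming the node follows standard Ethereum JSON-RPC semantics, `eth_syncing` returns `"result":false` when the node is not syncing and an object with block numbers while it is catching up. The hypothetical `is_synced` helper below sketches how to turn that into an exit code.

```shell
# Succeed if the JSON-RPC response passed as $1 reports "result": false,
# i.e. the node is not currently syncing.
is_synced() {
    printf '%s' "$1" | grep -Eq '"result"[[:space:]]*:[[:space:]]*false'
}
```

Usage: capture the response with `resp=$(curl -s -X POST http://localhost:8545 -H "Content-Type: application/json" -d '{"jsonrpc":"2.0","method":"eth_syncing","params":[],"id":1}')` and then run `is_synced "$resp" && echo "node is in sync"`.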

Database-Specific Rollbacks

Partial Data Rollback

# Create backup before changes
pg_dump -h localhost -U postgres aitbc_prod > pre-rollback-backup.sql

# Rollback specific tables
psql -h localhost -U postgres -d aitbc_prod < rollback-tables.sql
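
By default `psql` keeps executing after a failed statement, which can leave the tables half rolled back. A minimal sketch of an all-or-nothing variant follows, using psql's `ON_ERROR_STOP` variable and `--single-transaction` option; the function name `apply_rollback_sql` is hypothetical.

```shell
# Apply a SQL file inside one transaction and stop at the first error,
# so a failure leaves the database unchanged.
# Usage: apply_rollback_sql <sql-file> [database]
apply_rollback_sql() {
    local sql_file="$1" db="${2:-aitbc_prod}"
    [ -r "$sql_file" ] || { echo "cannot read $sql_file" >&2; return 1; }
    psql -h localhost -U postgres -d "$db" \
         -v ON_ERROR_STOP=1 --single-transaction -f "$sql_file"
}
```

Usage: `apply_rollback_sql rollback-tables.sql`. This only helps if the SQL file does not contain its own `COMMIT` statements.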

Migration Rollback

# Check migration status
./scripts/migration-status.sh

# Rollback last migration
./scripts/rollback-migration.sh

Service-Specific Rollbacks

Coordinator Service

# Restore coordinator state
sudo systemctl stop aitbc-coordinator
cp /var/lib/aitbc/coordinator/backup/state.db /var/lib/aitbc/coordinator/
sudo systemctl start aitbc-coordinator

Blockchain Node

# Reset to last stable block
sudo systemctl stop aitbc-node
aitbc-node --reset-to-block 123456
sudo systemctl start aitbc-node

Mining Operations

# Stop mining immediately
curl -X POST http://localhost:8080/api/mining/stop

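# Reset mining state
# WARNING: FLUSHDB deletes every key in the selected Redis database (db 0 by default), not only mining keys
redis-cli FLUSHDB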
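
If the Redis database is shared with other services, a narrower reset may be preferable to `FLUSHDB`. The sketch below deletes only keys matching a pattern. The `mining:*` prefix is purely an assumption for illustration; the actual key layout used by the miner is not documented here and must be checked first. `clear_mining_keys` is a hypothetical helper.

```shell
# Delete only the keys matching a pattern instead of flushing the database.
# Usage: clear_mining_keys [pattern]   (default pattern is an assumed prefix)
clear_mining_keys() {
    local pattern="${1:-mining:*}"
    # --scan iterates with SCAN, which does not block the server like KEYS.
    redis-cli --scan --pattern "$pattern" | {
        count=0
        while IFS= read -r key; do
            [ -n "$key" ] || continue
            # </dev/null keeps redis-cli from consuming the key list on stdin.
            redis-cli DEL "$key" </dev/null >/dev/null && count=$((count + 1))
        done
        echo "deleted $count keys matching $pattern"
    }
}
```

Running `redis-cli --scan --pattern 'mining:*'` on its own first shows which keys would be removed.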

Verification Checklist

  • All services running
  • Database connectivity
  • API endpoints responding
  • Blockchain syncing
  • Mining operations (if applicable)
  • Dashboard accessible
  • SSL certificates valid
  • Monitoring alerts cleared
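
Parts of the checklist above can be run as commands and tallied. The following is a small sketch of such a runner; `run_check` and `CHECKS_FAILED` are hypothetical names, and which command backs each checklist item is left to the operator.

```shell
# Number of failed checks so far.
CHECKS_FAILED=0

# Run a command quietly and print a pass/fail line for it.
# Usage: run_check <label> <command> [args...]
run_check() {
    local label="$1"
    shift
    if "$@" >/dev/null 2>&1; then
        echo "[ OK ] $label"
    else
        echo "[FAIL] $label"
        CHECKS_FAILED=$((CHECKS_FAILED + 1))
    fi
}
```

For example, `run_check "All services running" ./scripts/health-check.sh` followed by `run_check "Node unit active" systemctl is-active --quiet aitbc-node`; the rollback counts as verified only if `CHECKS_FAILED` is still 0 at the end. Items such as cleared monitoring alerts still need a manual look.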

Post-Rollback Actions

  1. Root Cause Analysis

    • Document what went wrong
    • Identify failure point
    • Create prevention plan
  2. Team Communication

    • Update incident ticket
    • Share lessons learned
    • Update runbooks
  3. Preventive Measures

    • Add additional tests
    • Improve monitoring
    • Update deployment checklist

Contact Information

  • On-call Engineer: [Phone/Slack]
  • Engineering Lead: [Phone/Slack]
  • DevOps Team: #devops-alerts
  • Management: #management-alerts

Escalation

  1. Level 1: On-call engineer (first 15 minutes)
  2. Level 2: Engineering lead (after 15 minutes)
  3. Level 3: CTO (after 30 minutes)

Notes

  • Always create a backup before starting a rollback
  • Document every step during the rollback
  • Test the rollback in staging before production if possible
  • Keep stakeholders informed throughout the process