# Debugging Playbook

This is a collection of diagnostic checklists and debugging techniques for common issues in the AITBC system.

---

## 1. CLI Import Errors

**Symptom**: `aitbc` command crashes with `ImportError` or `ModuleNotFoundError`.

**Checklist**:
- [ ] Verify `apps/coordinator-api/src/app/services/__init__.py` exists.
- [ ] Check that `cli/aitbc_cli/commands/*` modules use correct relative imports.
- [ ] Ensure coordinator-api is importable: `python -c "import sys; sys.path.append('apps/coordinator-api/src'); from app.services import trading_surveillance"` should work.
- [ ] Run `aitbc --help` to see if base CLI loads (indicates command module issue).
- [ ] Look for absolute paths in command modules; replace with package-relative.

**Common Fixes**: See failure-archive for the hardcoded path issue.

---

## 2. Coordinator API Won't Start

**Symptom**: `uvicorn app.main:app` fails or hangs.

**Checklist**:
- [ ] Check port 8000 availability (`lsof -i:8000`).
- [ ] Verify database file exists or can be created: `apps/coordinator-api/data/`.
- [ ] Ensure `pyproject.toml` dependencies installed in active venv.
- [ ] Check logs for specific exception (traceback).
- [ ] Verify `REDIS_URL` if using broadcast; Redis must be running.

**Common Causes**:
- Missing `aiohttp` or `sqlalchemy`
- Database locked or permission denied
- Redis not running (if used)

---

## 3. Blockchain Node Not Producing Blocks

**Symptom**: RPC `/status` shows `height` not increasing.

**Checklist**:
- [ ] Is the node process running? (`ps aux | grep blockchain`)
- [ ] Check logs for consensus errors or DB errors.
- [ ] Verify ports 8006 (RPC) and 8005 (P2P) are open.
- [ ] Ensure wallet daemon running on 8015 (if needed for transactions).
- [ ] Confirm network: other peers? Running devnet with proposer account funded?
- [ ] Run `aitbc blockchain status` to see RPC response.

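
To tell a stalled node from a slow one, the `height` check in this section can be scripted. The following is a minimal sketch, not the project's own tooling: the base URL `http://localhost:8006`, the JSON response with a top-level `height` field, and the helper names `fetch_height` and `height_is_advancing` are all assumptions made for illustration.

```python
import json
import time
import urllib.request
from typing import Callable


def fetch_height(base_url: str = "http://localhost:8006", timeout: float = 5.0) -> int:
    """Read `height` from the RPC /status endpoint.

    Assumption: the endpoint returns JSON with a top-level `height` field.
    """
    with urllib.request.urlopen(f"{base_url}/status", timeout=timeout) as resp:
        return int(json.load(resp)["height"])


def height_is_advancing(
    get_height: Callable[[], int] = fetch_height,
    samples: int = 3,
    interval: float = 5.0,
) -> bool:
    """Sample the height `samples` times, `interval` seconds apart.

    Returns True as soon as a later sample exceeds the first one,
    and False if every sample is at or below the first.
    """
    first = get_height()
    for _ in range(samples - 1):
        time.sleep(interval)
        if get_height() > first:
            return True
    return False
```

Calling `height_is_advancing()` with the defaults polls the assumed local RPC endpoint three times over roughly ten seconds. Passing a custom `get_height` callable keeps the check usable if the real response shape differs.
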
**Common Causes**:
- Not initialized (`scripts/devnet_up.sh` not executed)
- Genesis proposer has no funds
- P2P connectivity not established (check Redis for gossip)

---

## 4. AI Provider Job Fails with Payment Error

**Symptom**: Provider returns 403 or says balance insufficient.

**Checklist**:
- [ ] Did buyer send funds first? (`aitbc blockchain send ...`) should precede job request.
- [ ] Check provider's balance before/after; confirm expected amount transferred.
- [ ] Verify provider and buyer are on same network (ait-devnet).
- [ ] Ensure provider's wallet daemon is running (port 8015).
- [ ] Check coordinator job URL (`--marketplace-url`) reachable.

**Resolution**: Follow the correct payment flow: buyer sends transaction, waits for confirmation, then POST /job.

---

## 5. Gitea API Calls Fail (Transient)

**Symptom**: Scripts fail with connection reset, 502, etc.

**Checklist**:
- [ ] Is Gitea instance up? Can you `curl` the API?
- [ ] Check network connectivity and DNS.
- [ ] Add retry with exponential backoff (already in `monitor-prs.py`).
- [ ] If persistent, check Gitea logs for server-side issues.

**Temporary Workaround**: Wait and re-run the script manually.

---

## 6. Redis Pub/Sub Not Delivering Messages

**Symptom**: Agents don't receive broadcast messages.

**Checklist**:
- [ ] Is Redis running? `redis-cli ping` should return PONG.
- [ ] Check that all agents use the same `REDIS_URL`.
- [ ] Verify message channel names match exactly.
- [ ] Ensure agents are subscribed before messages are published.
- [ ] Use `redis-cli SUBSCRIBE ` to debug manually.

**Note**: This is dev-only; production will use direct P2P.

---

## 7. Starlette Import Errors After Upgrade

**Symptom**: `ImportError: cannot import name 'Broadcast'`.

**Fix**: Pin Starlette to `<0.38` as documented. Alternatively, refactor to use a different broadcast mechanism (future work).

---

## 8. Test Isolation Failures

**Symptom**: Tests pass individually but fail when run together.

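
A frequent cause of this symptom is a database file that several tests share. As a minimal sketch of one way to isolate it, the context manager below gives each caller its own throwaway SQLite file. SQLite, the `jobs` table, and the names `isolated_db` and `test_insert_is_not_visible_to_other_tests` are assumptions for illustration only; the project's actual database layer may differ.

```python
import contextlib
import os
import sqlite3
import tempfile
from typing import Iterator


@contextlib.contextmanager
def isolated_db() -> Iterator[sqlite3.Connection]:
    """Yield a connection to a fresh SQLite file that is deleted afterwards."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(os.path.join(tmp, "test.db"))
        try:
            yield conn
        finally:
            # Close the connection before the temporary directory is removed.
            conn.close()


def test_insert_is_not_visible_to_other_tests() -> None:
    # Each test opens its own database, so rows never leak between tests.
    with isolated_db() as db:
        db.execute("CREATE TABLE jobs (id INTEGER)")
        db.execute("INSERT INTO jobs VALUES (1)")
        assert db.execute("SELECT COUNT(*) FROM jobs").fetchone()[0] == 1
```

The same helper can be wrapped in a function-scoped pytest fixture, so that setup and teardown run once per test instead of once per session.
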
**Checklist**:
- [ ] Look for shared resources (database files, ports, files).
- [ ] Use fixtures with `scope="function"` and proper teardown.
- [ ] Clean up after each test: close DB connections, stop servers.
- [ ] Avoid global state; inject dependencies.

**Action**: Refactor tests to be hermetic.

---

## 9. Port Conflicts

**Symptom**: `OSError: Address already in use`.

**Checklist**:
- [ ] Identify which process owns the port: `lsof -i:`.
- [ ] Kill lingering processes from previous runs.
- [ ] Use dynamic port allocation for tests if possible.
- [ ] Ensure services shut down cleanly on exit (signals).

---

## 10. Memory Conflicts (Concurrent Editing)

**Symptom**: Two agents editing the same file cause Git merge conflicts.

**Prevention**:
- Use `ai-memory/daily/` with one file per day; agents append, not edit.
- Avoid editing the same file simultaneously; coordinate via claims if necessary.
- If conflict occurs, resolve manually by merging entries; preserve both contributions.

---

## 11. Cron Jobs Not Running

**Symptom**: Expected periodic tasks not executing.

**Checklist**:
- [ ] Verify cron entries (`crontab -l` for the user).
- [ ] Check system cron logs (`/var/log/cron`, `journalctl`).
- [ ] Ensure scripts are executable and paths are absolute or correctly relative (use `cd` first).
- [ ] Redirect output to a log file for debugging: `>> /var/log/claim-task.log 2>&1`.

---

## 12. Wallet Operations Fail (Unknown Wallet)

**Symptom**: `aitbc wallet balance` returns "wallet not found".

**Checklist**:
- [ ] Has wallet been created? Use `aitbc wallet create` first.
- [ ] Check the wallet name and hostname pattern: `_simple`.
- [ ] Verify wallet daemon running on port 8015.
- [ ] Ensure RPC URL matches (coordinator API running on 8000).

---

## 13. CI Jobs Stuck / Timeout

**Symptom**: CI job runs for > 1 hour without finishing.

**Checklist**:
- [ ] Check for infinite loops or deadlocks in tests.
- [ ] Increase CI timeout if legitimate long test.
- [ ] Add `pytest -x` to fail fast on first error to identify root cause.
- [ ] Split tests into smaller batches.

---

## 14. Permission Denied on Git Operations

**Symptom**: `fatal: could not read Username` or `Permission denied (publickey)`.

**Cause**: SSH key not loaded or Gitea token not set.

**Fix**:
- Ensure SSH agent has the key (`ssh-add -l`).
- Set `GITEA_TOKEN` environment variable for API operations.
- Test with `git push` manually.

---

## 15. Merge Conflict in Claim Branch

**Symptom**: Pulling latest main into claim branch causes conflicts.

**Resolution**:
- Resolve conflicts manually; keep both sets of changes if they are independent.
- Re-run tests after resolution.
- Push resolved branch.
- Consider rebasing instead of merging to keep history linear.

---

*Add new debugging patterns as they emerge.*
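
The two checks in section 14 (SSH agent key and `GITEA_TOKEN`) can be combined into a small preflight that a script runs before attempting Git or API operations. This is a rough sketch under stated assumptions: the helper names `ssh_add_list` and `git_auth_problems` are hypothetical, and the only external behavior relied on is the documented exit status of `ssh-add -l` (0 when keys are listed, 1 when the agent holds no identities, 2 when the agent cannot be contacted).

```python
import os
import subprocess
from typing import Callable, Mapping


def ssh_add_list() -> int:
    """Return the exit status of `ssh-add -l`."""
    try:
        result = subprocess.run(["ssh-add", "-l"], capture_output=True, check=False)
        return result.returncode
    except FileNotFoundError:
        # Treat a missing ssh-add binary like an unreachable agent.
        return 2


def git_auth_problems(
    env: Mapping[str, str] = os.environ,
    agent_status: Callable[[], int] = ssh_add_list,
) -> list[str]:
    """Collect human-readable reasons why Git or Gitea API calls may be denied."""
    problems = []
    if not env.get("GITEA_TOKEN"):
        problems.append("GITEA_TOKEN is not set")
    status = agent_status()
    if status == 1:
        problems.append("SSH agent has no keys loaded (run ssh-add)")
    elif status != 0:
        problems.append("SSH agent is not reachable")
    return problems
```

An empty list from `git_auth_problems()` means both preconditions hold; it does not prove that the key or token is actually authorized on the Gitea instance, so a manual `git push` remains the final test.
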