Debugging Playbook
This is a collection of diagnostic checklists and debugging techniques for common issues in the AITBC system.
1. CLI Import Errors
Symptom: aitbc command crashes with ImportError or ModuleNotFoundError.
Checklist:
- Verify apps/coordinator-api/src/app/services/__init__.py exists.
- Check that cli/aitbc_cli/commands/* modules use correct relative imports.
- Ensure coordinator-api is importable: python -c "import sys; sys.path.append('apps/coordinator-api/src'); from app.services import trading_surveillance" should work.
- Run aitbc --help to see if base CLI loads (indicates command module issue).
- Look for absolute paths in command modules; replace with package-relative.
Common Fixes: See failure-archive for the hardcoded path issue.
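The importability check above can also be scripted so it runs from any working directory. This is a minimal sketch, not part of the AITBC codebase: check_importable is a hypothetical helper, and it only reports whether Python can locate the module once the source directory is on sys.path.

```python
import importlib
import importlib.util
import sys
from pathlib import Path


def check_importable(src_dir: str, module_name: str) -> bool:
    """Return True if module_name can be located with src_dir on sys.path."""
    src = str(Path(src_dir).resolve())
    added = src not in sys.path
    if added:
        sys.path.insert(0, src)
    try:
        importlib.invalidate_caches()
        # find_spec imports the parent packages of a dotted name, so a
        # parent package that fails to import is reported as False here.
        return importlib.util.find_spec(module_name) is not None
    except (ImportError, ValueError):
        return False
    finally:
        # Leave sys.path as we found it.
        if added:
            sys.path.remove(src)
```

For example, check_importable("apps/coordinator-api/src", "app.services.trading_surveillance") mirrors the one-liner in the checklist.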
2. Coordinator API Won't Start
Symptom: uvicorn app.main:app fails or hangs.
Checklist:
- Check port 8000 availability (lsof -i:8000).
- Verify database file exists or can be created: apps/coordinator-api/data/.
- Ensure pyproject.toml dependencies installed in active venv.
- Check logs for specific exception (traceback).
- Verify REDIS_URL if using broadcast; Redis must be running.
Common Causes:
- Missing aiohttp or sqlalchemy
- Database locked or permission denied
- Redis not running (if used)
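Where lsof is not available, the port check can be approximated from Python. The sketch below uses only the standard library; port_in_use is a hypothetical helper name, and it only detects a listener that accepts TCP connections on the given host.

```python
import socket


def port_in_use(port: int, host: str = "127.0.0.1") -> bool:
    """Return True if something accepts TCP connections on host:port."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.settimeout(0.5)
        # connect_ex returns 0 on success instead of raising an exception.
        return sock.connect_ex((host, port)) == 0
```

Calling port_in_use(8000) before starting uvicorn gives a quick yes/no answer; it does not identify the owning process, which still requires lsof or a similar tool.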
3. Blockchain Node Not Producing Blocks
Symptom: RPC /status shows height not increasing.
Checklist:
- Is the node process running? (ps aux | grep blockchain)
- Check logs for consensus errors or DB errors.
- Verify ports 8006 (RPC) and 8005 (P2P) are open.
- Ensure wallet daemon running on 8015 (if needed for transactions).
- Confirm the network state: are other peers connected? If running devnet, is the proposer account funded?
- Run aitbc blockchain status to see RPC response.
Common Causes:
- Not initialized (scripts/devnet_up.sh not executed)
- Genesis proposer has no funds
- P2P connectivity not established (check Redis for gossip)
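To confirm the symptom before working through the checklist, sample the height twice and compare. The sketch below assumes nothing about the RPC response format: get_height is a hypothetical callable that you supply (for example, one that queries /status and extracts the height), and the wait time is an arbitrary default.

```python
import time
from typing import Callable


def height_is_advancing(
    get_height: Callable[[], int],
    wait_seconds: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sample the chain height twice and report whether it increased."""
    first = get_height()
    sleep(wait_seconds)  # should exceed the expected block interval
    return get_height() > first
```

A False result only means no block arrived during the wait, so choose wait_seconds comfortably longer than the node's block time.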
4. AI Provider Job Fails with Payment Error
Symptom: Provider returns 403 or says balance insufficient.
Checklist:
- Did the buyer send funds first? The transfer (aitbc blockchain send ...) should precede the job request.
- Check provider's balance before/after; confirm expected amount transferred.
- Verify provider and buyer are on same network (ait-devnet).
- Ensure provider's wallet daemon is running (port 8015).
- Check coordinator job URL (--marketplace-url) reachable.
Resolution: Follow the correct payment flow: buyer sends transaction, waits for confirmation, then POST /job.
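The ordering in the resolution above can be sketched as follows. All three callables (send_payment, is_confirmed, post_job) are hypothetical stand-ins for whatever performs the transfer, checks confirmation, and issues POST /job in your setup; the timeout and polling interval are illustrative defaults, not AITBC values.

```python
import time
from typing import Any, Callable


def submit_paid_job(
    send_payment: Callable[[], str],
    is_confirmed: Callable[[str], bool],
    post_job: Callable[[str], Any],
    timeout: float = 60.0,
    poll_interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Any:
    """Pay first, wait for confirmation, and only then request the job."""
    tx_id = send_payment()
    deadline = time.monotonic() + timeout
    while not is_confirmed(tx_id):
        if time.monotonic() >= deadline:
            # Never post the job for an unconfirmed payment.
            raise TimeoutError(f"payment {tx_id} not confirmed in {timeout}s")
        sleep(poll_interval)
    return post_job(tx_id)
```

Keeping the three steps in one function makes it hard to request a job before the payment is confirmed, which is the failure mode described in this section.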
5. Gitea API Calls Fail (Transient)
Symptom: Scripts fail with connection reset, 502, etc.
Checklist:
- Is Gitea instance up? Can you curl the API?
- Check network connectivity and DNS.
- Add retry with exponential backoff (already in monitor-prs.py).
- If persistent, check Gitea logs for server-side issues.
Temporary Workaround: Wait and re-run the script manually.
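For scripts that do not yet retry, a generic exponential backoff wrapper might look like the sketch below. It is not the implementation in monitor-prs.py; the function name, default attempt count, delays, and the exception types treated as transient are all assumptions to adapt to your HTTP client.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    call: Callable[[], T],
    attempts: int = 5,
    base_delay: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call `call`, retrying transient failures with doubling delays."""
    for attempt in range(attempts):
        try:
            return call()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    raise ValueError("attempts must be at least 1")
```

Only errors listed in retry_on are retried, so a persistent server-side failure such as a bad request still fails immediately instead of being masked by the loop.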
6. Redis Pub/Sub Not Delivering Messages
Symptom: Agents don't receive broadcast messages.
Checklist:
- Is Redis running? redis-cli ping should return PONG.
- Check that all agents use the same REDIS_URL.
- Verify message channel names match exactly.
- Ensure agents are subscribed before messages are published.
- Use redis-cli SUBSCRIBE <channel> to debug manually.
Note: This is dev-only; production will use direct P2P.
7. Starlette Import Errors After Upgrade
Symptom: ImportError: cannot import name 'Broadcast'.
Fix: Pin Starlette to <0.38 as documented. Alternatively, refactor to use a different broadcast mechanism (future work).
8. Test Isolation Failures
Symptom: Tests pass individually but fail when run together.
Checklist:
- Look for shared resources (database files, ports, files).
- Use fixtures with scope="function" and proper teardown.
- Clean up after each test: close DB connections, stop servers.
- Avoid global state; inject dependencies.
Action: Refactor tests to be hermetic.
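One way to make a database-backed test hermetic is to give each test its own throwaway database and guarantee teardown. The sketch below uses only the standard library and assumes SQLite purely for illustration; fresh_database is a hypothetical helper, and in pytest the same body could sit inside a function-scoped fixture.

```python
import contextlib
import sqlite3
import tempfile
from pathlib import Path
from typing import Iterator


@contextlib.contextmanager
def fresh_database() -> Iterator[sqlite3.Connection]:
    """Yield a connection to a brand-new SQLite file, then clean up."""
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(Path(tmp) / "test.db")
        try:
            yield conn
        finally:
            # Close before the temporary directory is removed.
            conn.close()
```

Because every caller gets a new file in a new directory, tables created by one test cannot leak into the next, regardless of execution order.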
9. Port Conflicts
Symptom: OSError: Address already in use.
Checklist:
- Identify which process owns the port: lsof -i:<port>.
- Kill lingering processes from previous runs.
- Use dynamic port allocation for tests if possible.
- Ensure services shut down cleanly on exit (signals).
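Dynamic port allocation for tests usually means asking the operating system for an unused port by binding to port 0. A minimal sketch, with find_free_port as a hypothetical helper name:

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for a currently unused TCP port on host."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 lets the OS choose
        return sock.getsockname()[1]
```

There is a small race between releasing the port here and the test server binding it, so where the server supports it, passing port 0 to the server itself and reading back the bound port is more robust.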
10. Memory Conflicts (Concurrent Editing)
Symptom: Two agents editing the same file cause Git merge conflicts.
Prevention:
- Use ai-memory/daily/ with one file per day; agents append, not edit.
- Avoid editing the same file simultaneously; coordinate via claims if necessary.
- If conflict occurs, resolve manually by merging entries; preserve both contributions.
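The append-only convention can be enforced by routing all writes through one helper that never rewrites existing lines. In this sketch the file naming (YYYY-MM-DD.md) and the entry format are assumptions, not the project's documented layout, and append_daily_entry is a hypothetical name.

```python
import datetime
from pathlib import Path
from typing import Optional


def append_daily_entry(
    base_dir: str,
    agent: str,
    text: str,
    day: Optional[datetime.date] = None,
) -> Path:
    """Append one entry to the file for `day` without touching old lines."""
    day = day or datetime.date.today()
    path = Path(base_dir) / f"{day.isoformat()}.md"  # assumed naming scheme
    path.parent.mkdir(parents=True, exist_ok=True)
    # Mode "a" only ever adds to the end of the file.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"- [{agent}] {text}\n")
    return path
```

For example, append_daily_entry("ai-memory/daily", "agent-a", "claimed task") adds a single line to today's file and leaves earlier entries untouched.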
11. Cron Jobs Not Running
Symptom: Expected periodic tasks not executing.
Checklist:
- Verify cron entries (crontab -l for the user).
- Check system cron logs (/var/log/cron, journalctl).
- Ensure scripts are executable and paths are absolute or correctly relative (use cd first).
- Redirect output to a log file for debugging: >> /var/log/claim-task.log 2>&1.
12. Wallet Operations Fail (Unknown Wallet)
Symptom: aitbc wallet balance returns "wallet not found".
Checklist:
- Has wallet been created? Use aitbc wallet create first.
- Check the wallet name and hostname pattern: <hostname><wallet_name>_simple.
- Verify wallet daemon running on port 8015.
- Ensure RPC URL matches (coordinator API running on 8000).
13. CI Jobs Stuck / Timeout
Symptom: CI job runs for > 1 hour without finishing.
Checklist:
- Check for infinite loops or deadlocks in tests.
- Increase CI timeout if legitimate long test.
- Run pytest -x to fail fast on the first error and identify the root cause.
- Split tests into smaller batches.
14. Permission Denied on Git Operations
Symptom: fatal: could not read Username or Permission denied (publickey).
Cause: SSH key not loaded or Gitea token not set.
Fix:
- Ensure SSH agent has the key (ssh-add -l).
- Set GITEA_TOKEN environment variable for API operations.
- Test with git push manually.
15. Merge Conflict in Claim Branch
Symptom: Pulling latest main into claim branch causes conflicts.
Resolution:
- Resolve conflicts manually; keep both sets of changes if they are independent.
- Re-run tests after resolution.
- Push resolved branch.
- Consider rebasing instead of merging to keep history linear.
Add new debugging patterns as they emerge.