# Debugging Playbook

This is a collection of diagnostic checklists and debugging techniques for common issues in the AITBC system.

---

## 1. CLI Import Errors

**Symptom**: The `aitbc` command crashes with `ImportError` or `ModuleNotFoundError`.

**Checklist**:
- [ ] Verify that `apps/coordinator-api/src/app/services/__init__.py` exists.
- [ ] Check that the `cli/aitbc_cli/commands/*` modules use correct relative imports.
- [ ] Ensure coordinator-api is importable. This should succeed when run from the repository root (the path is relative): `python -c "import sys; sys.path.append('apps/coordinator-api/src'); from app.services import trading_surveillance"`.
- [ ] Run `aitbc --help`. If the base CLI loads, the problem is in a command module.
- [ ] Look for absolute paths in command modules and replace them with package-relative ones.
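
The import check from the checklist can also be scripted so that it reports the underlying error in one line instead of a full traceback. The following is a minimal sketch: the helper name `check_import` is hypothetical, and the path and module name are the ones from the checklist item above.

```python
import importlib
import sys


def check_import(module_name, extra_path=None):
    """Try to import module_name and return (ok, message).

    extra_path is appended to sys.path for the duration of the check,
    mirroring the sys.path.append(...) one-liner in the checklist.
    """
    added = False
    if extra_path and extra_path not in sys.path:
        sys.path.append(extra_path)
        added = True
    try:
        importlib.import_module(module_name)
        return True, f"{module_name}: import OK"
    except ImportError as exc:  # ModuleNotFoundError is a subclass of ImportError
        return False, f"{module_name}: {type(exc).__name__}: {exc}"
    finally:
        if added:
            sys.path.remove(extra_path)


if __name__ == "__main__":
    # Run from the repository root so the relative path resolves.
    ok, message = check_import(
        "app.services.trading_surveillance",
        extra_path="apps/coordinator-api/src",
    )
    print(message)
```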

**Common Fixes**: See failure-archive for the hardcoded path issue.

---

## 2. Coordinator API Won't Start

**Symptom**: `uvicorn app.main:app` fails or hangs.

**Checklist**:
- [ ] Check that port 8000 is available (`lsof -i:8000`).
- [ ] Verify that the database file exists or can be created under `apps/coordinator-api/data/`.
- [ ] Ensure the dependencies in `pyproject.toml` are installed in the active venv.
- [ ] Check the logs for the specific exception (traceback).
- [ ] If broadcast is in use, verify `REDIS_URL`; Redis must be running.

**Common Causes**:
- Missing `aiohttp` or `sqlalchemy`
- Database locked or permission denied
- Redis not running (if used)
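
The first two checklist items (port availability and a usable data directory) can be checked from Python before starting the server. This is a rough preflight sketch using only the standard library. The function names are hypothetical, and it assumes the API binds to `127.0.0.1`; adjust `host` if it binds elsewhere.

```python
import os
import socket
import tempfile


def port_is_free(port, host="127.0.0.1"):
    """Return True if host:port can be bound, i.e. no other socket holds it."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
        return True


def dir_is_writable(path):
    """Return True if path exists (or can be created) and accepts new files.

    Note: this creates the directory as a side effect if it is missing.
    """
    try:
        os.makedirs(path, exist_ok=True)
        with tempfile.TemporaryFile(dir=path):
            pass
    except OSError:
        return False
    return True


# Example, run from the repository root:
#   port_is_free(8000)
#   dir_is_writable("apps/coordinator-api/data")
```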

---

## 3. Blockchain Node Not Producing Blocks

**Symptom**: RPC `/status` shows `height` not increasing.

**Checklist**:
- [ ] Is the node process running? (`ps aux | grep blockchain`)
- [ ] Check the logs for consensus errors or DB errors.
- [ ] Verify that ports 8006 (RPC) and 8005 (P2P) are open.
- [ ] Ensure the wallet daemon is running on port 8015 (if needed for transactions).
- [ ] Confirm the network state: are there other peers? On devnet, is the proposer account funded?
- [ ] Run `aitbc blockchain status` to see the RPC response.

**Common Causes**:
- Not initialized (`scripts/devnet_up.sh` not executed)
- Genesis proposer has no funds
- P2P connectivity not established (check Redis for gossip)
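
To confirm the symptom rather than eyeball it, the height can be sampled a few times and compared. The sketch below assumes that `/status` on the RPC port returns a JSON object with a top-level integer `height`; that response shape, the base URL, and both function names are assumptions for illustration.

```python
import json
import time
import urllib.request


def fetch_height(base_url="http://localhost:8006"):
    """Read `height` from the RPC /status endpoint.

    Assumes the response is JSON with a top-level integer `height`.
    """
    with urllib.request.urlopen(f"{base_url}/status", timeout=5) as resp:
        return int(json.load(resp)["height"])


def height_is_advancing(get_height=fetch_height, interval=5.0, samples=3):
    """Sample the height `samples` times, `interval` seconds apart.

    Returns (advancing, heights); advancing is True if the last sample
    is greater than the first.
    """
    heights = []
    for i in range(samples):
        if i:
            time.sleep(interval)
        heights.append(get_height())
    return heights[-1] > heights[0], heights
```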

---

## 4. AI Provider Job Fails with Payment Error

**Symptom**: The provider returns 403 or reports an insufficient balance.

**Checklist**:
- [ ] Did the buyer send funds first? `aitbc blockchain send ...` should precede the job request.
- [ ] Check the provider's balance before and after; confirm the expected amount was transferred.
- [ ] Verify that provider and buyer are on the same network (ait-devnet).
- [ ] Ensure the provider's wallet daemon is running (port 8015).
- [ ] Check that the coordinator job URL (`--marketplace-url`) is reachable.
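
The ordering that the first checklist item asks for (fund first, confirm, then request the job) can be enforced in a client script. The sketch below is illustrative only: `send_funds`, `is_confirmed`, and `submit_job` are placeholder callables standing in for whatever the buyer actually uses, not AITBC APIs.

```python
import time


def pay_then_request(send_funds, is_confirmed, submit_job,
                     timeout=60.0, poll_interval=2.0):
    """Run the payment flow in order: send, wait for confirmation, submit.

    All three callables are placeholders supplied by the caller:
      send_funds()      -> transaction id
      is_confirmed(tx)  -> bool
      submit_job(tx)    -> provider response
    Raises TimeoutError if the transaction is not confirmed in time, so the
    job request is never sent against an unconfirmed payment.
    """
    tx_id = send_funds()
    deadline = time.monotonic() + timeout
    while not is_confirmed(tx_id):
        if time.monotonic() >= deadline:
            raise TimeoutError(f"transaction {tx_id} not confirmed after {timeout}s")
        time.sleep(poll_interval)
    return submit_job(tx_id)
```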

**Resolution**: Follow the correct payment flow: the buyer sends the transaction, waits for confirmation, then issues `POST /job`.

---

## 5. Gitea API Calls Fail (Transient)

**Symptom**: Scripts fail with connection resets, 502 responses, and similar transient errors.

**Checklist**:
- [ ] Is the Gitea instance up? Can you `curl` the API?
- [ ] Check network connectivity and DNS.
- [ ] Add retry with exponential backoff (already in `monitor-prs.py`).
- [ ] If the failures persist, check the Gitea logs for server-side issues.
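
For scripts that do not yet retry, exponential backoff can be added with a small wrapper. The following is a generic sketch, not the code in `monitor-prs.py`; the name `retry_with_backoff` and its parameters are hypothetical. It retries on `OSError`, which covers connection resets and the errors raised by `urllib.request.urlopen` (including HTTP 502); a real script would probably narrow that so client errors such as 401 or 404 are not retried.

```python
import random
import time


def retry_with_backoff(func, attempts=5, base_delay=1.0, max_delay=30.0,
                       retry_on=(OSError,), sleep=time.sleep):
    """Call func(); on a retryable error, wait and try again.

    The wait before retry n is base_delay * 2**n, capped at max_delay,
    plus up to 10% random jitter. The last error is re-raised once
    `attempts` calls have failed.
    """
    for attempt in range(attempts):
        try:
            return func()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(delay + random.uniform(0, delay / 10))
```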

**Temporary Workaround**: Wait and re-run the script manually.

---

## 6. Redis Pub/Sub Not Delivering Messages

**Symptom**: Agents don't receive broadcast messages.

**Checklist**:
- [ ] Is Redis running? `redis-cli ping` should return `PONG`.
- [ ] Check that all agents use the same `REDIS_URL`.
- [ ] Verify that message channel names match exactly.
- [ ] Ensure agents are subscribed before messages are published.
- [ ] Use `redis-cli SUBSCRIBE <channel>` to debug manually.
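
Two of the checklist items (subscribe before publishing, exact channel names) follow from pub/sub being fire-and-forget: a message goes only to clients subscribed to that exact channel at the moment it is published. The toy model below illustrates this in memory. It is not a Redis client, and `ToyPubSub` is a made-up class for illustration. Redis's `PUBLISH` likewise returns the number of clients that received the message, so a return value of 0 points at a missing subscriber or a channel-name mismatch.

```python
from collections import defaultdict


class ToyPubSub:
    """In-memory stand-in for fire-and-forget pub/sub (not a Redis client)."""

    def __init__(self):
        self._subscribers = defaultdict(list)  # channel -> list of inboxes

    def subscribe(self, channel):
        """Register a new inbox on channel and return it."""
        inbox = []
        self._subscribers[channel].append(inbox)
        return inbox

    def publish(self, channel, message):
        """Deliver to current subscribers only; return how many received it."""
        inboxes = self._subscribers.get(channel, [])
        for inbox in inboxes:
            inbox.append(message)
        return len(inboxes)


bus = ToyPubSub()
early = bus.publish("agents", "hello")        # nobody subscribed yet: dropped
inbox = bus.subscribe("agents")
late = bus.publish("agents", "hello again")   # delivered to one subscriber
typo = bus.publish("agent", "wrong channel")  # name mismatch: dropped
print(early, late, typo, inbox)  # prints: 0 1 0 ['hello again']
```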

**Note**: Redis pub/sub is used in development only; production will use direct P2P.

---

## 7. Starlette Import Errors After Upgrade

**Symptom**: `ImportError: cannot import name 'Broadcast'`.

**Fix**: Pin Starlette to `<0.38` as documented. Alternatively, refactor to use a different broadcast mechanism (future work).

---

## 8. Test Isolation Failures

**Symptom**: Tests pass individually but fail when run together.

**Checklist**:
- [ ] Look for shared resources (database files, ports, other files on disk).
- [ ] Use fixtures with `scope="function"` and proper teardown.
- [ ] Clean up after each test: close DB connections, stop servers.
- [ ] Avoid global state; inject dependencies.
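
A per-test resource with guaranteed teardown might look like the sketch below. It uses only the standard library so that it runs on its own; the name `fresh_database` and the choice of SQLite are assumptions, and the commented lines show one way it could be wrapped in a function-scoped pytest fixture.

```python
import contextlib
import os
import sqlite3
import tempfile


@contextlib.contextmanager
def fresh_database():
    """Yield a connection to a brand-new SQLite file, then clean up.

    Each call gets its own temporary directory, so no state leaks
    from one test to the next.
    """
    with tempfile.TemporaryDirectory(prefix="test-db-") as workdir:
        conn = sqlite3.connect(os.path.join(workdir, "test.db"))
        try:
            yield conn
        finally:
            conn.close()  # release the file before the directory is removed


# Possible pytest wiring (function scope is pytest's default):
#
#     @pytest.fixture(scope="function")
#     def db():
#         with fresh_database() as conn:
#             yield conn
```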

**Action**: Refactor tests to be hermetic.

---

## 9. Port Conflicts

**Symptom**: `OSError: Address already in use`.

**Checklist**:
- [ ] Identify which process owns the port: `lsof -i:<port>`.
- [ ] Kill lingering processes from previous runs.
- [ ] Use dynamic port allocation for tests if possible.
- [ ] Ensure services shut down cleanly on exit (handle termination signals).
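
Dynamic port allocation for tests usually means asking the operating system for an unused port by binding to port 0. A minimal sketch follows; the helper name `find_free_port` is hypothetical. The port is released before it is returned, so there is a small window in which another process could claim it. Where the server under test supports it, binding the server itself to port 0 avoids that race.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        return sock.getsockname()[1]
```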

---

## 10. Memory Conflicts (Concurrent Editing)

**Symptom**: Two agents editing the same file cause Git merge conflicts.

**Prevention**:
- Use `ai-memory/daily/` with one file per day; agents append rather than edit.
- Avoid editing the same file simultaneously; coordinate via claims if necessary.
- If a conflict occurs, resolve it manually by merging the entries, preserving both contributions.
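
The append-only convention from the first bullet can be captured in a small helper so that agents never open a daily file for rewriting. This is a sketch under assumptions: the `YYYY-MM-DD.md` file name, the entry format, and the function name `append_memory_entry` are illustrative and not taken from the repository.

```python
import datetime
import os


def append_memory_entry(agent, text, base_dir="ai-memory/daily", today=None):
    """Append one entry to today's file (one file per day); never rewrite it.

    The file name (YYYY-MM-DD.md) and the entry format are assumptions.
    """
    today = today or datetime.date.today()
    os.makedirs(base_dir, exist_ok=True)
    path = os.path.join(base_dir, f"{today.isoformat()}.md")
    with open(path, "a", encoding="utf-8") as fh:  # "a" opens in append mode
        fh.write(f"- [{agent}] {text}\n")
    return path
```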

---

## 11. Cron Jobs Not Running

**Symptom**: Expected periodic tasks are not executing.

**Checklist**:
- [ ] Verify the cron entries (`crontab -l` for the user).
- [ ] Check the system cron logs (`/var/log/cron`, `journalctl`).
- [ ] Ensure scripts are executable and that paths are absolute or correctly relative (use `cd` first).
- [ ] Redirect output to a log file for debugging: `>> /var/log/claim-task.log 2>&1`.

---

## 12. Wallet Operations Fail (Unknown Wallet)

**Symptom**: `aitbc wallet balance` returns "wallet not found".

**Checklist**:
- [ ] Has the wallet been created? Use `aitbc wallet create` first.
- [ ] Check the wallet name and hostname pattern: `<hostname><wallet_name>_simple`.
- [ ] Verify that the wallet daemon is running on port 8015.
- [ ] Ensure the RPC URL matches (coordinator API running on 8000).

---

## 13. CI Jobs Stuck / Timeout

**Symptom**: A CI job runs for more than 1 hour without finishing.

**Checklist**:
- [ ] Check for infinite loops or deadlocks in tests.
- [ ] Increase the CI timeout if the test is legitimately long-running.
- [ ] Run `pytest -x` to stop at the first failure and help identify the root cause.
- [ ] Split tests into smaller batches.
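
When tests are split into batches, each batch can be given a hard time limit so that a hung batch is reported instead of stalling the whole job. The sketch below wraps `subprocess.run` with a timeout; the helper name `run_with_timeout` is hypothetical, and the test path in the usage comment is a placeholder.

```python
import subprocess
import sys


def run_with_timeout(cmd, timeout_s):
    """Run cmd; return its exit code, or None if it exceeded timeout_s seconds."""
    try:
        return subprocess.run(cmd, timeout=timeout_s).returncode
    except subprocess.TimeoutExpired:
        # subprocess.run kills the child process before re-raising,
        # so nothing is left running here (grandchildren may survive).
        return None


# Example (the test path is a placeholder):
#   code = run_with_timeout([sys.executable, "-m", "pytest", "-x", "tests/unit"], 600)
#   if code is None:
#       print("batch timed out")
```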

---

## 14. Permission Denied on Git Operations

**Symptom**: `fatal: could not read Username` or `Permission denied (publickey)`.

**Cause**: SSH key not loaded or Gitea token not set.

**Fix**:
- Ensure the SSH agent has the key (`ssh-add -l`).
- Set the `GITEA_TOKEN` environment variable for API operations.
- Test with a manual `git push`.

---

## 15. Merge Conflict in Claim Branch

**Symptom**: Pulling the latest main into a claim branch causes conflicts.

**Resolution**:
- Resolve conflicts manually; keep both sets of changes if they are independent.
- Re-run the tests after resolution.
- Push the resolved branch.
- Consider rebasing instead of merging to keep history linear.

---

*Add new debugging patterns as they emerge.*