
Failure Archive

This archive collects known failure patterns experienced during development, along with their causes and resolutions. Agents should consult it before debugging similar symptoms.


Failure: CLI Fails to Launch Due to Hardcoded Absolute Paths

Date: 2026-03-13

Symptom: ImportError: No module named 'trading_surveillance' when running aitbc --help or any subcommand.

Cause: Multiple command modules in cli/aitbc_cli/commands/ used:

sys.path.append('/home/oib/windsurf/aitbc/apps/coordinator-api/src/app/services')

This path is user-specific and does not exist on the aitbc1 host.

Modules affected:

  • surveillance.py
  • ai_trading.py
  • ai_surveillance.py
  • advanced_analytics.py
  • regulatory.py
  • enterprise_integration.py

Resolution:

  1. Added __init__.py to apps/coordinator-api/src/app/services/ to make it a proper package.
  2. Updated each affected command module to use:
    import os
    import sys
    # from cli/aitbc_cli/commands/, three levels up is the repository root
    sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', '..', 'apps', 'coordinator-api', 'src'))
    from app.services.trading_surveillance import ...

    (or simply from app.services import <module> after path setup)
  3. Removed hardcoded fallback absolute paths.
  4. Verified: aitbc --help loads without errors; aitbc surveillance start works.

Prevention: Use package-relative imports; avoid user-specific absolute paths. Consider making coordinator-api a proper installable dependency.
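The relative-path setup from step 2 can also be written with pathlib. This is a minimal sketch, assuming the layout described above (command modules in cli/aitbc_cli/commands/, the coordinator package under apps/coordinator-api/src/); coordinator_src_dir and ensure_coordinator_on_path are hypothetical helper names, not existing project code. Checking sys.path before appending keeps repeated imports from stacking duplicate entries.

```python
import sys
from pathlib import Path


def coordinator_src_dir(module_file: str) -> Path:
    """Resolve <repo>/apps/coordinator-api/src from a file in cli/aitbc_cli/commands/.

    Assumed layout (hypothetical helper):
      parents[0] = cli/aitbc_cli/commands
      parents[1] = cli/aitbc_cli
      parents[2] = cli
      parents[3] = repository root
    """
    repo_root = Path(module_file).resolve().parents[3]
    return repo_root / "apps" / "coordinator-api" / "src"


def ensure_coordinator_on_path(module_file: str) -> Path:
    """Append the coordinator src dir to sys.path once and return it."""
    src = coordinator_src_dir(module_file)
    if str(src) not in sys.path:
        sys.path.append(str(src))
    return src
```

In a command module this would be called as ensure_coordinator_on_path(__file__) before from app.services import <module>. Making coordinator-api an installable dependency, as suggested above, would remove the need for any sys.path manipulation.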


Failure: Missing Dependency aiohttp

Symptom: ModuleNotFoundError: No module named 'aiohttp' when importing kyc_aml_providers.py.

Cause: cli/pyproject.toml did not declare aiohttp.

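Resolution: Ran poetry add aiohttp, which also records the dependency in pyproject.toml. (pip install aiohttp inside the venv works as a stopgap, but the dependency still has to be declared in pyproject.toml.)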

Prevention: Keep all dependencies declared in pyproject.toml; run the tests in a fresh environment.
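A cheap complement to a fresh-environment test run is a smoke check, run right after installing the CLI into a new venv, that every third-party module the CLI imports can actually be found. This is a minimal sketch; missing_modules and REQUIRED_MODULES are hypothetical names, the list has to be maintained by hand alongside pyproject.toml, and only top-level module names are checked.

```python
import importlib.util

# Hypothetical list: third-party modules the CLI imports at runtime.
REQUIRED_MODULES = ["aiohttp"]


def missing_modules(names):
    """Return the names from `names` that cannot be found in the current environment.

    Only top-level module names are supported; importlib.util.find_spec()
    returns None for a top-level module that is not installed.
    """
    return [name for name in names if importlib.util.find_spec(name) is None]
```

A CI step would call missing_modules(REQUIRED_MODULES) and fail if the result is non-empty.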


Failure: Package Build Fails Due to Missing README.md

Symptom: poetry build for packages/py/aitbc-agent-sdk fails with FileNotFoundError: README.md.

Cause: The package directory lacked a README.md, which some build configurations require.

Resolution: Created a minimal placeholder README.md and later expanded it with usage examples.

Prevention: Ensure each package has at least a minimal README; add a pre-commit hook to check for it.
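The pre-commit check can be very small. This sketch assumes each package sits one level below packages/py/ with its own pyproject.toml, as aitbc-agent-sdk does; packages_missing_readme and main are hypothetical names, not existing project tooling.

```python
from pathlib import Path


def packages_missing_readme(root):
    """List package dirs under `root` that have a pyproject.toml but no README.md."""
    root = Path(root)
    return sorted(
        str(pyproject.parent.relative_to(root))
        for pyproject in root.glob("*/pyproject.toml")
        if not (pyproject.parent / "README.md").is_file()
    )


def main(root="packages/py") -> int:
    """Print offending packages and return a non-zero exit code if any are found."""
    missing = packages_missing_readme(root)
    for name in missing:
        print(f"missing README.md: {root}/{name}")
    return 1 if missing else 0
```

A hook script would end with raise SystemExit(main()) so that pre-commit sees the exit code.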


Failure: Starlette Broadcast Module Missing After Upgrade

Symptom: ImportError: cannot import name 'Broadcast' from 'starlette' after upgrading Starlette to 0.38+.

Cause: Starlette removed the Broadcast module in version 0.38.

Impact: P2P gossip backend (using Redis broadcast) fails to import. Services crash on startup.

Resolution:

  • Pinned Starlette to >=0.37.2,<0.38 in pyproject.toml.
  • Added a comment explaining the pin and noting that production should replace the Redis broadcast with direct P2P.
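Beyond the pin in pyproject.toml, a service could fail fast at startup when an unsupported Starlette version is installed, instead of crashing later on the import. This is a minimal sketch; parse_release, in_pinned_range, and check_starlette_pin are hypothetical names, and the version parsing is a simplification of PEP 440 (the packaging library's Version and SpecifierSet classes are the more robust choice).

```python
import re
from importlib.metadata import PackageNotFoundError, version


def parse_release(v: str) -> tuple:
    # Keep only the numeric release segment: "0.37.2" -> (0, 37, 2).
    # Suffixes such as "rc1" or ".post1" are dropped (simplified, not full PEP 440).
    match = re.match(r"\d+(?:\.\d+)*", v)
    if match is None:
        raise ValueError(f"unparseable version: {v!r}")
    return tuple(int(part) for part in match.group(0).split("."))


def in_pinned_range(installed: str, lower: str = "0.37.2", upper: str = "0.38") -> bool:
    # Mirrors the pin ">=0.37.2,<0.38" from pyproject.toml.
    return parse_release(lower) <= parse_release(installed) < parse_release(upper)


def check_starlette_pin() -> None:
    """Raise at startup if the installed Starlette is outside the pinned range."""
    try:
        installed = version("starlette")
    except PackageNotFoundError:
        raise RuntimeError("starlette is not installed") from None
    if not in_pinned_range(installed):
        raise RuntimeError(
            f"starlette {installed} is outside the pinned range >=0.37.2,<0.38; "
            "the Redis broadcast backend requires Starlette < 0.38"
        )
```

Python compares tuples element by element, so (0, 37, 10) correctly sorts below (0, 38) even though "0.37.10" would sort above "0.38" as a string.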

Prevention: Do not upgrade Starlette without running the test suite first; track upstream deprecations.

See also: debugging-notes.md for diagnostic steps.


Failure: Docker Compose Not Found

Symptom: docker-compose: command not found even though Docker is installed.

Cause: System has Docker Compose v2 (docker compose) but not v1 (docker-compose). The project documentation referenced docker-compose.

Resolution: Updated documentation to use docker compose (or detect whichever is available). Alternatively, create a symlink or alias.

Prevention: Detect both variants in scripts; document both names.
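Detection in Python tooling could look like the following. This is a minimal sketch; compose_command is a hypothetical helper that prefers the v2 plugin (docker compose) and falls back to the v1 binary (docker-compose). The which and run parameters default to shutil.which and subprocess.run and exist only so the function can be tested without Docker installed.

```python
import shutil
import subprocess
from typing import List, Optional


def compose_command(which=shutil.which, run=subprocess.run) -> Optional[List[str]]:
    """Return the argv prefix for Docker Compose, or None if neither variant exists."""
    # Compose v2 ships as a Docker CLI plugin: "docker compose version" exits 0 if present.
    if which("docker"):
        result = run(["docker", "compose", "version"], capture_output=True)
        if result.returncode == 0:
            return ["docker", "compose"]
    # Fall back to the standalone v1 binary.
    if which("docker-compose"):
        return ["docker-compose"]
    return None
```

Callers would check for None and then run, for example, subprocess.run(cmd + ["up", "-d"], check=True).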


Failure: Test Scripts Use Absolute Paths

Symptom: run_all_tests.sh fails with "No such file or directory" for test scenario scripts located in /home/oib/windsurf/aitbc/....

Cause: Test scripts referenced a specific user's home directory, not the project root.

Resolution: Rewrote the paths relative to the script's own location using $(dirname "$0"), e.g. "$(dirname "$0")/test_scenario_a.sh".

Prevention: Never hardcode absolute paths; always compute them relative to the project root or the script's location.
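Both this failure and the CLI import failure above came from user-specific /home/... paths, so a repository-wide scan in CI could catch regressions. This is a minimal sketch; find_hardcoded_paths and the HOME_PATH pattern are hypothetical, and the pattern only covers /home/<user>/ (other machine-specific roots would need to be added).

```python
import re
from pathlib import Path

# Matches user home directories such as /home/oib/...
HOME_PATH = re.compile(r"/home/[^/\s'\"]+/")


def find_hardcoded_paths(root, suffixes=(".sh", ".py")):
    """Return (relative file, line number, line) for lines embedding a /home/<user>/ path."""
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not (path.is_file() and path.suffix in suffixes):
            continue
        text = path.read_text(errors="replace")
        for lineno, line in enumerate(text.splitlines(), start=1):
            if HOME_PATH.search(line):
                hits.append((str(path.relative_to(root)), lineno, line.strip()))
    return hits
```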


Failure: Gitea API Unstable During PR Approval

Symptom: The monitor-prs.py script fails to post approvals, with "connection reset" or 5xx errors from Gitea.

Cause: The Gitea instance may be under load or temporarily unavailable.

Resolution: Added retry logic with exponential backoff. If a request still fails after the retries, the script logs the error and skips it; the next run picks it up again.

Prevention: Make API clients resilient to transient failures.
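The retry can be a small generic wrapper around each Gitea API call. This is a minimal sketch with plain exponential backoff (no jitter, to keep it deterministic); retry_with_backoff and its parameters are hypothetical. It assumes connection resets surface as ConnectionError (ConnectionResetError is a subclass) and that the caller turns 5xx responses into one of the exception types listed in retry_on.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    fn: Callable[[], T],
    *,
    attempts: int = 4,
    base_delay: float = 0.5,
    max_delay: float = 8.0,
    retry_on: Tuple[Type[BaseException], ...] = (ConnectionError, TimeoutError),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying transient errors with delays base_delay * 2**attempt."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return fn()
        except retry_on:
            if attempt == attempts - 1:
                raise  # out of retries: let the caller log and skip
            sleep(min(max_delay, base_delay * 2 ** attempt))
    raise RuntimeError("unreachable")
```

In monitor-prs.py the approval call would be wrapped as retry_with_backoff(lambda: post_approval(pr)), with the final exception caught and logged so the next run can retry; post_approval is a placeholder name.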


Failure: Coordinator API DB Init Not Idempotent

Symptom: Running init_db() multiple times causes sqlite3.IntegrityError due to duplicate index creation.

Cause: init_db() did not catch duplicate index errors; it assumed a fresh database.

Resolution: Wrapped index creation in try/except blocks catching sqlite3.IntegrityError (or used IF NOT EXISTS where supported). This made initialization idempotent.

Impact: Coordinator can be started repeatedly without manual DB cleanup.

Prevention: Design DB initialization to be idempotent from the start.
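With Python's built-in sqlite3 module, re-running a plain CREATE INDEX on an existing index raises sqlite3.OperationalError ("index ... already exists"), so a try/except keyed on a single exception type is easy to get wrong; IF NOT EXISTS sidesteps the question entirely. This is a minimal sketch; the jobs table and index names are hypothetical, not the coordinator's actual schema.

```python
import sqlite3

# Hypothetical schema: every statement is safe to run on an existing database.
SCHEMA = [
    "CREATE TABLE IF NOT EXISTS jobs (id INTEGER PRIMARY KEY, client_id TEXT, status TEXT)",
    "CREATE INDEX IF NOT EXISTS idx_jobs_client ON jobs (client_id)",
    "CREATE INDEX IF NOT EXISTS idx_jobs_status ON jobs (status)",
]


def init_db(conn: sqlite3.Connection) -> None:
    """Create tables and indexes; calling this repeatedly is a no-op after the first run."""
    with conn:  # commits on success, rolls back on error
        for statement in SCHEMA:
            conn.execute(statement)
```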


Add new failures chronologically below.