CI Failures
This file tracks continuous integration failures, their diagnosis, and fixes. Consult when CI breaks.
CI Failure: Poetry Build Error – Missing README
Date: 2026-03-13
Symptom: Gitea Actions job fails during poetry build:
FileNotFoundError: [Errno 2] No such file or directory: 'README.md'
Package: packages/py/aitbc-agent-sdk
Cause: The package directory lacked a README.md, which Poetry expects when building a package.
Fix: Added a minimal README.md (later expanded with usage examples). Re-ran CI; build passed.
Action: Recorded in failures/failure-archive.md as "Package Build Fails Due to Missing README.md".
CI Failure: ImportError in CLI Tests
Symptom: The test job for CLI or import validation fails with:
ImportError: cannot import name 'trading_surveillance' from 'app.services'
Cause: One or more of the following: a Starlette/Broadcast version mismatch, a missing app/services/__init__.py, or import path issues.
Resolution: Ensured app/services/__init__.py exists; fixed the command module imports as described in failures/failure-archive.md; pinned the Starlette version.
CI Failure: Pytest Fails Due to Database Lock
Symptom: Intermittent test failures with sqlite3.OperationalError: database is locked.
Cause: Tests using the same SQLite file in parallel without proper isolation.
Fix: Switched to in-memory SQLite (sqlite+aiosqlite:///:memory:) for unit tests and ensured each test gets a fresh DB. Alternatively, use a file-based database with cache=shared and proper cleanup.
Action: Add test isolation to conftest.py; ensure fixtures tear down connections.
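The isolation idea above can be sketched as follows. This is a minimal illustration only: it uses the standard-library sqlite3 module rather than the sqlite+aiosqlite driver named in the fix, and the fresh_db helper and the items table are hypothetical names, not taken from the project.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def fresh_db():
    """Yield a brand-new in-memory SQLite connection and always close it."""
    # ":memory:" creates a database private to this connection,
    # so there is no shared file for parallel tests to lock.
    conn = sqlite3.connect(":memory:")
    try:
        # Hypothetical schema, created from scratch for every test.
        conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)")
        yield conn
    finally:
        # Teardown runs even if the test body raises.
        conn.close()


def count_items(conn):
    """Return the number of rows in the hypothetical items table."""
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]
```

In conftest.py, a helper like this would typically be wrapped in a pytest fixture so that each test function receives its own connection and the teardown happens automatically.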
CI Failure: Missing aiohttp Dependency
Symptom: Import error for aiohttp in kyc_aml_providers.py.
Cause: Dependency not declared in pyproject.toml.
Fix: Added aiohttp to dependencies. Pushed fix; CI passed after install.
CI Failure: Syntax Error in Sibling's PR
Symptom: monitor-prs.py auto-requests changes because py_compile fails.
Typical Cause: Simple syntax mistake (missing colon, unmatched parentheses).
Response: Comment on PR with the syntax error. Developer fixes and pushes; CI re-runs.
Note: This is expected behavior; the script is doing its job.
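For reference, the kind of check described above can be reproduced locally before pushing. The sketch below is not the code of monitor-prs.py; it only shows how the standard py_compile module reports a syntax error, and syntax_error_of is a hypothetical helper name.

```python
import py_compile


def syntax_error_of(path):
    """Return the compiler's error message for `path`, or None if it compiles."""
    try:
        # doraise=True makes py_compile raise instead of printing to stderr.
        py_compile.compile(path, doraise=True)
    except py_compile.PyCompileError as exc:
        return exc.msg
    return None
```

The returned message includes the offending line, which is usually enough to paste into the PR comment.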
CI Failure: Redis Connection Refused
Symptom: Tests that rely on Redis connectivity fail:
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.
Cause: Redis service not running in CI environment.
Fix: Either start Redis in CI job before tests, or mock Redis in tests. For integration tests that need Redis, add a service container or start Redis as a background process.
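When Redis is started as a background process, tests can still hit the same error if they begin before the server is listening. A minimal readiness check, assuming the default localhost:6379 from the error message, might look like this; wait_for_service and its timeout values are illustrative choices, not part of the project.

```python
import socket
import time


def wait_for_service(host="localhost", port=6379, timeout=10.0, interval=0.2):
    """Poll until host:port accepts TCP connections; return False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

This only confirms that the port accepts connections; it does not verify that the listener is actually Redis.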
CI Failure: Port Already in Use
Symptom: Test that starts a server fails with OSError: [Errno 98] Address already in use.
Cause: Previous test did not cleanly shut down the server; port 8006 (or other) still bound.
Fix: Ensure proper shutdown of servers in test teardown; use asyncio cancellation and wait for port release. Alternatively, use dynamic port allocation for CI.
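Dynamic port allocation can be sketched by binding to port 0, which asks the operating system to pick an unused port. The helper name find_free_port is hypothetical, and how the chosen port is passed to the server under test depends on the project.

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "any free port"
        return sock.getsockname()[1]
```

There is a small race here: the port is released when the helper returns, so another process could claim it before the test server binds. Where the server supports it, binding the server itself to port 0 and reading back the assigned port avoids that gap.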
CI Failure: Out of Memory (OOM)
Symptom: CI job killed with signal SIGKILL (exit code 137).
Cause: Building many packages or running heavy tests exceeded CI container memory limits.
Fix: Reduce parallelism; use swap if allowed; split CI into smaller jobs; optimize tests.
CI Failure: Permission Denied on Executable Scripts
Symptom: ./scripts/claim-task.py: Permission denied when cron tries to run it.
Cause: The script file lacks the executable bit (chmod +x was never applied).
Fix: Run chmod +x scripts/claim-task.py; ensure all scripts have the correct mode in the repo.
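To catch this class of failure before cron does, a small check can list scripts that lack the executable bit. The sketch below assumes a POSIX filesystem and that the scripts are *.py files in a single directory; both function names are hypothetical.

```python
import os
import stat
from pathlib import Path


def non_executable_scripts(directory, pattern="*.py"):
    """Return files matching `pattern` in `directory` without the owner-executable bit."""
    return sorted(
        path
        for path in Path(directory).glob(pattern)
        if path.is_file() and not path.stat().st_mode & stat.S_IXUSR
    )


def make_executable(path):
    """Roughly `chmod +x`: add execute permission for user, group, and others."""
    mode = os.stat(path).st_mode
    os.chmod(path, mode | stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH)
```

Calling non_executable_scripts("scripts") from the repository root would report claim-task.py in the situation described above.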
Log new CI failures chronologically.