CI Failures

This file tracks continuous integration failures, their diagnoses, and their fixes. Consult it when CI breaks.


CI Failure: Poetry Build Error Missing README

Date: 2026-03-13

Symptom: Gitea Actions job fails during poetry build:

FileNotFoundError: [Errno 2] No such file or directory: 'README.md'

Package: packages/py/aitbc-agent-sdk

Cause: The package directory lacked a README.md, which Poetry expects when building a package.

Fix: Added a minimal README.md (later expanded with usage examples). Re-ran CI; build passed.
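A pre-build check along these lines could catch the same problem in other packages before Poetry does. This is a minimal sketch, assuming every package directory has a pyproject.toml and is expected to ship a README.md next to it; the function name `packages_missing_readme` is hypothetical and not part of the repo.

```python
from pathlib import Path


def packages_missing_readme(root, readme_name="README.md"):
    """Return package directories under `root` that have a pyproject.toml but no README."""
    root = Path(root)
    missing = []
    for pyproject in sorted(root.rglob("pyproject.toml")):
        package_dir = pyproject.parent
        # Assumption: the README lives beside pyproject.toml under this exact name.
        if not (package_dir / readme_name).is_file():
            missing.append(package_dir)
    return missing
```

Calling `packages_missing_readme("packages/py")` from a CI step and failing when the list is non-empty would surface the gap with a clearer message than the FileNotFoundError.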

Action: Recorded in failures/failure-archive.md as "Package Build Fails Due to Missing README.md".


CI Failure: ImportError in CLI Tests

Symptom: Test job for cli or import validation fails with:

ImportError: cannot import name 'trading_surveillance' from 'app.services'

Cause: One of three things: a Starlette/Broadcast version mismatch, a missing app/services/__init__.py, or import path issues.

Resolution: Ensured app/services/__init__.py exists; fixed the command module imports as described in failures/failure-archive.md; pinned the Starlette version.
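The missing __init__.py case can be checked mechanically. The sketch below is one possible approach, not something the repo is known to contain: it lists directories that hold .py files but no __init__.py. The name `dirs_missing_init` is hypothetical, and directories that are intentionally namespace packages would show up as false positives.

```python
from pathlib import Path


def dirs_missing_init(root):
    """Return directories under `root` (inclusive) that hold .py files but no __init__.py."""
    root = Path(root)
    candidates = [root] + [p for p in root.rglob("*") if p.is_dir()]
    missing = []
    for directory in sorted(candidates):
        has_python = any(
            f.suffix == ".py" for f in directory.iterdir() if f.is_file()
        )
        if has_python and not (directory / "__init__.py").is_file():
            missing.append(directory)
    return missing
```

Running it against the `app` directory before the import validation job would point directly at the directory that needs the file.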


CI Failure: Pytest Fails Due to Database Lock

Symptom: Intermittent test failures with sqlite3.OperationalError: database is locked.

Cause: Tests using the same SQLite file in parallel without proper isolation.

Fix: Switched to in-memory SQLite (sqlite+aiosqlite:///:memory:) for unit tests and ensured each test gets a fresh DB. Alternatively, use a file-based database with cache=shared and proper cleanup.

Action: Add test isolation to conftest.py; ensure fixtures tear down connections.
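The isolation idea can be illustrated with the standard library alone. This sketch uses `sqlite3` directly rather than the sqlite+aiosqlite URL mentioned above, and the `items` table and the `fresh_db` name are hypothetical placeholders for the project's real schema and fixture.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def fresh_db(schema="CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT)"):
    """Yield a private in-memory SQLite connection and always close it."""
    # Each connect(":memory:") call creates a separate database, so tests
    # using this helper cannot lock or see each other's data.
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute(schema)
        yield conn
    finally:
        conn.close()
```

In conftest.py the same body could be wrapped in a pytest fixture, so that the `finally` clause handles the teardown of connections for every test.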


CI Failure: Missing aiohttp Dependency

Symptom: Import error for aiohttp in kyc_aml_providers.py.

Cause: Dependency not declared in pyproject.toml.

Fix: Added aiohttp to dependencies. Pushed fix; CI passed after install.
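An undeclared dependency tends to stay hidden on a developer machine where the package happens to be installed. A small smoke check in a clean CI environment can expose it earlier; the following is a sketch under the assumption that the list of required top-level modules is maintained by hand, and `missing_modules` is a hypothetical helper name.

```python
import importlib.util


def missing_modules(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    # find_spec returns None for a top-level module that is not installed.
    return [
        name for name in module_names
        if importlib.util.find_spec(name) is None
    ]
```

For example, `missing_modules(["aiohttp"])` returning a non-empty list right after dependency installation would indicate that pyproject.toml is missing an entry.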


CI Failure: Syntax Error in Sibling's PR

Symptom: monitor-prs.py auto-requests changes because py_compile fails.

Typical Cause: Simple syntax mistake (missing colon, unmatched parentheses).

Response: Comment on PR with the syntax error. Developer fixes and pushes; CI re-runs.

Note: This is expected behavior; the script is doing its job.
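The same check can be run locally before pushing. The internals of monitor-prs.py are not shown here, so this is only an illustration of a py_compile-based check, not that script's implementation; `syntax_errors` is a hypothetical name.

```python
import py_compile


def syntax_errors(paths):
    """Compile each file and return {path: error message} for those that fail."""
    failures = {}
    for path in paths:
        try:
            # doraise=True turns compile problems into PyCompileError
            # instead of printing them to stderr.
            py_compile.compile(str(path), doraise=True)
        except py_compile.PyCompileError as exc:
            failures[str(path)] = exc.msg
    return failures
```

An empty result means every file compiled; otherwise the messages can be pasted into the PR comment.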


CI Failure: Redis Connection Refused

Symptom: Tests that rely on Redis connectivity fail:

redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.

Cause: Redis service not running in CI environment.

Fix: Either start Redis in CI job before tests, or mock Redis in tests. For integration tests that need Redis, add a service container or start Redis as a background process.
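When Redis is started as a background process, tests can begin before it accepts connections and fail with the same Error 111. A readiness wait is one way to close that gap. The sketch below only checks that the TCP port accepts connections, not that Redis answers commands; `wait_for_port` and its defaults are hypothetical.

```python
import socket
import time


def wait_for_port(host="localhost", port=6379, timeout=10.0, interval=0.2):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline passes.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A CI step could call `wait_for_port()` after launching Redis and abort early with a clear message if it returns False.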


CI Failure: Port Already in Use

Symptom: Test that starts a server fails with OSError: [Errno 98] Address already in use.

Cause: Previous test did not cleanly shut down the server; port 8006 (or other) still bound.

Fix: Ensure proper shutdown of servers in test teardown; use asyncio cancellation and wait for port release. Alternatively, use dynamic port allocation for CI.
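Dynamic port allocation usually means binding to port 0 and letting the OS choose. A minimal sketch, with `find_free_port` as a hypothetical helper name:

```python
import socket


def find_free_port(host="127.0.0.1"):
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))
        # getsockname() reports the port the OS actually assigned.
        return sock.getsockname()[1]
```

There is a small race here: another process could take the port between this call and the server's own bind. Where the server under test supports it, passing port 0 to the server directly and reading back the assigned port avoids that window.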


CI Failure: Out of Memory (OOM)

Symptom: CI job killed with signal SIGKILL (exit code 137).

Cause: Building many packages or running heavy tests exceeded CI container memory limits.

Fix: Reduce parallelism; use swap if allowed; split CI into smaller jobs; optimize tests.
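Exit code 137 follows the shell convention of 128 plus the signal number, and signal 9 is SIGKILL. The helper below decodes such codes; it is an illustrative sketch with a hypothetical name. Note that SIGKILL does not prove an OOM kill on its own, so the runner or kernel logs should confirm it.

```python
import signal


def describe_exit_code(code):
    """Describe a shell-style exit code; codes above 128 mean 'killed by signal code - 128'."""
    if code > 128:
        signum = code - 128
        try:
            name = signal.Signals(signum).name
        except ValueError:
            # Signal number not defined on this platform.
            name = f"signal {signum}"
        return f"killed by {name}"
    return "success" if code == 0 else f"exited with status {code}"
```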


CI Failure: Permission Denied on Executable Scripts

Symptom: ./scripts/claim-task.py: Permission denied when cron tries to run it.

Cause: Script file not executable (chmod +x missing).

Fix: chmod +x scripts/claim-task.py; ensure all scripts have correct mode in repo.
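To check the mode of all scripts at once, a sketch like the following could run in CI. It assumes the scripts live in a single directory and match `*.py`; `non_executable_scripts` is a hypothetical name.

```python
import stat
from pathlib import Path

# Execute bits for owner, group, and others.
ANY_EXECUTE_BIT = stat.S_IXUSR | stat.S_IXGRP | stat.S_IXOTH


def non_executable_scripts(scripts_dir, pattern="*.py"):
    """Return files matching `pattern` that lack every execute permission bit."""
    missing = []
    for path in sorted(Path(scripts_dir).glob(pattern)):
        if path.is_file() and not (path.stat().st_mode & ANY_EXECUTE_BIT):
            missing.append(path)
    return missing
```

Because a plain chmod on one machine only helps if the mode change is committed, `git update-index --chmod=+x <file>` can be used to record the executable bit in the repository itself.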


Log new CI failures chronologically.