# CI Failures
This file tracks continuous integration (CI) failures, their diagnoses, and their fixes. Consult it when CI breaks.
---
## CI Failure: Poetry Build Error Missing README
**Date**: 2026-03-13
**Symptom**: Gitea Actions job fails during `poetry build`:
```
FileNotFoundError: [Errno 2] No such file or directory: 'README.md'
```
**Package**: `packages/py/aitbc-agent-sdk`
**Cause**: The package directory had no `README.md`, which Poetry reads at build time when `pyproject.toml` declares it as the package `readme`.
**Fix**: Added a minimal README.md (later expanded with usage examples). Re-ran CI; build passed.
**Action**: Recorded in `failures/failure-archive.md` as "Package Build Fails Due to Missing README.md".
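
A pre-build check along these lines could catch the missing file before `poetry build` runs. This is a minimal sketch, not part of the project: the function name `packages_missing_readme` is hypothetical, and it assumes every directory containing a `pyproject.toml` should also contain a `README.md`. Calling it with `Path("packages/py")` would cover the package named in this entry.

```python
from pathlib import Path


def packages_missing_readme(root: Path) -> list[Path]:
    """Return directories under root that have pyproject.toml but no README.md."""
    missing = []
    for pyproject in sorted(root.rglob("pyproject.toml")):
        package_dir = pyproject.parent
        # Assumption: each package is expected to ship a README.md next to pyproject.toml.
        if not (package_dir / "README.md").is_file():
            missing.append(package_dir)
    return missing
```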
---
## CI Failure: ImportError in CLI Tests
**Symptom**: Test job for `cli` or import validation fails with:
```
ImportError: cannot import name 'trading_surveillance' from 'app.services'
```
**Cause**: One of three candidates: a Starlette/Broadcast mismatch, a missing `app/services/__init__.py`, or import path issues.
**Resolution**: Ensured `app/services/__init__.py` exists, fixed the command module imports as described in `failures/failure-archive.md`, and pinned the Starlette version.
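
The missing `__init__.py` candidate is cheap to rule out mechanically. The sketch below is illustrative only: `dirs_missing_init` is a hypothetical helper, and it assumes every directory that holds `.py` files is meant to be a regular package. Running it against the directory that contains `app/` would flag `app/services` if its `__init__.py` disappeared again.

```python
from pathlib import Path


def dirs_missing_init(package_root: Path) -> list[Path]:
    """List directories (root included) that hold .py files but no __init__.py."""
    candidates = [package_root]
    candidates += sorted(p for p in package_root.rglob("*") if p.is_dir())
    missing = []
    for directory in candidates:
        if directory.name == "__pycache__":
            continue  # bytecode caches are not packages
        has_modules = any(directory.glob("*.py"))
        if has_modules and not (directory / "__init__.py").is_file():
            missing.append(directory)
    return missing
```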
---
## CI Failure: Pytest Fails Due to Database Lock
**Symptom**: Intermittent test failures with `sqlite3.OperationalError: database is locked`.
**Cause**: Tests using the same SQLite file in parallel without proper isolation.
**Fix**: Switched to in-memory SQLite (`sqlite+aiosqlite:///:memory:`) for unit tests and ensured each test gets a fresh DB. Alternatively, use a file-based database with `cache=shared` and proper cleanup.
**Action**: Add test isolation to `conftest.py`; ensure fixtures tear down connections.
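
The isolation idea can be sketched with the standard library alone. The example uses plain `sqlite3` rather than the project's `aiosqlite` URL, and both the `fresh_db` helper and the `jobs` table are hypothetical. In `conftest.py` the same body could be wrapped in a `pytest` fixture so that every test receives its own connection and the teardown always closes it.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def fresh_db(schema: str = ""):
    """Yield a private in-memory SQLite connection and close it afterwards."""
    conn = sqlite3.connect(":memory:")  # every ":memory:" connection is its own database
    try:
        if schema:
            conn.executescript(schema)
        yield conn
    finally:
        conn.close()  # teardown: nothing is left behind to hold a lock


SCHEMA = "CREATE TABLE jobs (id INTEGER PRIMARY KEY, name TEXT);"  # hypothetical table

with fresh_db(SCHEMA) as first, fresh_db(SCHEMA) as second:
    first.execute("INSERT INTO jobs (name) VALUES ('a')")
    # The second connection is a separate database, so it sees no rows.
    print(second.execute("SELECT COUNT(*) FROM jobs").fetchone()[0])  # prints 0
```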
---
## CI Failure: Missing aiohttp Dependency
**Symptom**: Import error for `aiohttp` in `kyc_aml_providers.py`.
**Cause**: Dependency not declared in `pyproject.toml`.
**Fix**: Added `aiohttp` to dependencies. Pushed fix; CI passed after install.
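
A fail-fast import check at the start of the CI job can surface an undeclared dependency before the test suite runs. This is a rough sketch under the assumption that the list of required top-level modules is maintained by hand; `missing_modules` is a hypothetical name, and only `aiohttp` comes from this entry.

```python
import importlib.util


def missing_modules(names: list[str]) -> list[str]:
    """Return the module names that cannot be located in this environment."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except (ImportError, ValueError):
            # Raised for dotted names with a missing parent or for broken specs.
            found = False
        if not found:
            missing.append(name)
    return missing


# "aiohttp" is taken from this entry; extend the list with the project's real imports.
print(missing_modules(["aiohttp"]))
```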
---
## CI Failure: Syntax Error in Sibling's PR
**Symptom**: `monitor-prs.py` auto-requests changes because `py_compile` fails.
**Typical Cause**: Simple syntax mistake (missing colon, unmatched parentheses).
**Response**: Comment on the PR with the syntax error. The developer fixes it and pushes; CI re-runs.
**Note**: This is expected behavior; the script is doing its job.
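
For reference, a `py_compile` check of this kind can be reproduced locally before pushing. The sketch below is not the code of `monitor-prs.py`, whose implementation is not shown here; `syntax_errors` is a hypothetical helper that compiles each file and collects the messages for those that fail.

```python
import py_compile
import tempfile
from pathlib import Path


def syntax_errors(paths: list[Path]) -> dict[Path, str]:
    """Compile each file and map the ones that fail to their error message."""
    errors = {}
    with tempfile.TemporaryDirectory() as scratch:
        for index, path in enumerate(paths):
            try:
                # Bytecode goes to a scratch file so the working tree stays clean.
                py_compile.compile(
                    str(path),
                    cfile=str(Path(scratch) / f"{index}.pyc"),
                    doraise=True,
                )
            except py_compile.PyCompileError as exc:
                errors[path] = exc.msg
    return errors
```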
---
## CI Failure: Redis Connection Refused
**Symptom**: Tests that rely on Redis connectivity fail:
```
redis.exceptions.ConnectionError: Error 111 connecting to localhost:6379. Connection refused.
```
**Cause**: Redis service not running in CI environment.
**Fix**: Either start Redis in the CI job before the tests run, or mock Redis in the tests. For integration tests that need a real Redis, add a service container or start Redis as a background process.
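
When Redis is optional for part of the suite, a reachability probe lets integration tests skip cleanly instead of failing with `Error 111`. This is a minimal sketch: `redis_reachable` is a hypothetical helper, it only checks that something accepts TCP connections on the port (it does not speak the Redis protocol), and the defaults mirror the `localhost:6379` address from the error above. With `pytest`, the result could feed a `pytest.mark.skipif` condition.

```python
import socket


def redis_reachable(host: str = "localhost", port: int = 6379, timeout: float = 0.5) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and name resolution failures.
        return False
```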
---
## CI Failure: Port Already in Use
**Symptom**: Test that starts a server fails with `OSError: [Errno 98] Address already in use`.
**Cause**: A previous test did not cleanly shut down its server, so port 8006 (or another port) was still bound.
**Fix**: Ensure servers are shut down properly in test teardown: cancel the server's `asyncio` task and wait for the port to be released. Alternatively, use dynamic port allocation for CI.
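
Dynamic port allocation can be sketched by binding to port 0 and letting the operating system pick a free port. The helper name `find_free_port` is hypothetical. Note the small race: another process could claim the port between this call and the server's own bind, so where the server API allows it, binding the server itself to port 0 is the more robust variant.

```python
import socket


def find_free_port(host: str = "127.0.0.1") -> int:
    """Ask the OS for an unused TCP port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind((host, 0))  # port 0 means "any free port"
        return sock.getsockname()[1]
```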
---
## CI Failure: Out of Memory (OOM)
**Symptom**: CI job killed with signal SIGKILL (exit code 137).
**Cause**: Building many packages or running heavy tests exceeded the CI container's memory limit.
**Fix**: Reduce parallelism; use swap if allowed; split CI into smaller jobs; optimize tests.
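
Exit code 137 follows the shell convention of 128 plus the signal number, and signal 9 is `SIGKILL`. The sketch below decodes such codes; `describe_exit_code` is a hypothetical helper. A `SIGKILL` is consistent with the OOM killer but does not prove it, so the container's memory metrics or kernel log are still worth checking.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Explain a shell-style exit code, decoding 128 + N as signal N."""
    if code > 128:
        number = code - 128
        try:
            return f"killed by {signal.Signals(number).name} (signal {number})"
        except ValueError:
            return f"killed by unknown signal {number}"
    return f"exited with status {code}"


print(describe_exit_code(137))  # on Linux: killed by SIGKILL (signal 9)
```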
---
## CI Failure: Permission Denied on Executable Scripts
**Symptom**: `./scripts/claim-task.py: Permission denied` when cron tries to run it.
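**Cause**: The script file was not executable (missing `chmod +x`).
**Fix**: `chmod +x scripts/claim-task.py`; ensure all scripts carry the correct mode in the repo, for example by committing the executable bit with `git update-index --chmod=+x scripts/claim-task.py`.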
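
A small check can list scripts that lost their executable bit before cron trips over them. This is a sketch under the assumption that every `*.py` file in the scripts directory is meant to be run directly; `non_executable_scripts` is a hypothetical helper, and calling it with `Path("scripts")` would cover `claim-task.py`.

```python
import stat
from pathlib import Path


def non_executable_scripts(scripts_dir: Path, pattern: str = "*.py") -> list[Path]:
    """Return files matching pattern whose owner-execute bit is not set."""
    return [
        path
        for path in sorted(scripts_dir.glob(pattern))
        if path.is_file() and not path.stat().st_mode & stat.S_IXUSR
    ]
```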
---
*Log new CI failures chronologically.*