# Architectural Decisions

This log records significant architectural decisions made during the AITBC project to prevent re-debating past choices.

## Format

- **Decision**: What was decided
- **Date**: YYYY-MM-DD
- **Context**: Why the decision was needed
- **Alternatives Considered**: Other options
- **Reason**: Why this option was chosen
- **Impact**: Consequences (positive and negative)

---

## Decision 1: Stability Rings for PR Review Automation

**Date**: 2026-03-15

**Context**: Need to automate PR reviews while maintaining quality. Different parts of the codebase have different risk profiles; a blanket policy would either be too strict or too lax.

**Alternatives Considered**:
1. Manual review for all PRs (high overhead, slow)
2. Path-based auto-approve with no ring classification (fragile)
3. Single threshold based on file count (too coarse)

**Reason**: Rings provide clear, scalable zones of trust. Core packages (Ring 0) require human review; lower rings can be automated. This balances safety and velocity.

**Impact**:
- Reduces review burden for non-critical changes
- Maintains rigor where it matters (packages, blockchain)
- Requires maintainers to understand and respect ring boundaries
- Automated scripts can enforce ring policies consistently

**Status**: Implemented and documented in `architecture/agent-roles.md` and `architecture/system-overview.md`

---

## Decision 2: Hierarchical Memory System (ai-memory/)

**Date**: 2026-03-15

**Context**: Existing memory was unstructured (`memory/` with hourly files per agent, `MEMORY.md` notes). This made information retrieval slow, scattered knowledge across files, and complicated coordination.

**Alternatives Considered**:
1. Single large daily file for all agents (edit conflicts, hard to parse)
2. Wiki system (external dependency, complexity)
3. Tag-based file system (ad-hoc, hard to enforce)

**Reason**: A structured hierarchy with explicit layers (daily, architecture, decisions, failures, knowledge, agents) aligns with how agents need to consume information. Clear protocols for read/write operations improve consistency.

**Impact**:
- Agents have a predictable memory layout
- Faster recall through organized documents
- Reduces hallucinations by providing reliable sources
- Encourages documentation discipline (record decisions, failures)

**Status**: Implemented; this file is part of it.

---

## Decision 3: Distributed Task Claiming via Atomic Git Branches

**Date**: 2026-03-15

**Context**: Multiple autonomous agents need to claim issues without stepping on each other. There is no central task-queue service; we rely on Git as the coordination point.

**Alternatives Considered**:
1. Gitea issue assignment API (requires locking, may race)
2. Shared JSON lock file in repo (prone to merge conflicts)
3. Cron-based claiming with sleep-and-retry (simple, but adds polling latency and can still race)

**Reason**: Atomic Git branch creation is a distributed mutex provided by Git itself. It's race-safe without extra infrastructure. Combined with a claiming script and issue labels, it yields a simple, robust system.

**Impact**:
- Eliminates duplicate work
- Allows agents to operate independently
- Easy to audit: branch names reveal claims
- Claim branches are cleaned up after PR merge/close

**Status**: Implemented in `scripts/claim-task.py`

---

## Decision 4: P2P Gossip via Redis Broadcast (Dev Only)

**Date**: 2026-03-15

**Context**: Agents need to broadcast messages to peers on the network. The initial implementation needed something quick and reliable for local development.

**Alternatives Considered**:
1. Direct peer-to-peer sockets (complex NAT traversal)
2. Central message broker with auth (more setup)
3. Multicast (limited to local network)

**Reason**: Redis pub/sub is simple to set up, reliable, and works well on a local network.
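As a minimal sketch of this dev-only broadcast path, assuming the `redis-py` client: the channel name, envelope format, and helper names below are illustrative, not the project's actual implementation.

```python
# Hedged sketch of the dev-only gossip layer over Redis pub/sub.
# GOSSIP_CHANNEL and the envelope fields are hypothetical names.
import json
import time

GOSSIP_CHANNEL = "aitbc:gossip"  # hypothetical channel name


def make_envelope(sender: str, payload: dict) -> str:
    """Wrap a payload with sender id and timestamp for broadcast."""
    return json.dumps({"sender": sender, "ts": time.time(), "payload": payload})


def parse_envelope(raw: str) -> dict:
    """Decode an envelope received from the channel."""
    return json.loads(raw)


def broadcast(redis_client, sender: str, payload: dict) -> None:
    """Publish to all subscribed peers. Fire-and-forget, no auth --
    which is exactly why this mechanism is marked dev-only."""
    redis_client.publish(GOSSIP_CHANNEL, make_envelope(sender, payload))
```

A receiving agent would subscribe via `redis_client.pubsub()` on the same channel and call `parse_envelope` on each message it gets.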
It's explicitly marked as dev-only; production will require a secure direct P2P mechanism.

**Impact**:
- Fast development iteration
- No security for internet deployment (known limitation)
- Forces future redesign for production (good constraint)

**Status**: Dev environment uses Redis; production path deferred.

---

## Decision 5: Starlette Version Pinning (<0.38)

**Date**: 2026-03-15

**Context**: Starlette removed the `Broadcast` module in version 0.38, breaking the gossip backend that depends on it.

**Alternatives Considered**:
1. Migrate to a different broadcast library (effort, risk)
2. Reimplement broadcast on top of Redis only (eventual)
3. Pin Starlette version until production P2P is ready

**Reason**: Pinning is the quickest way to restore dev environment functionality with minimal changes. The broadcast module is already dev-only; replacing it can be scheduled for production hardening.

**Impact**:
- Dev environment stable again
- Must remember to bump/remove the pin before production
- Prevents accidental upgrades that break things

**Status**: `pyproject.toml` pins `starlette>=0.37.2,<0.38`

---

## Decision 6: Use Poetry for Package Management

**Date**: Prior to 2026-03-15

**Context**: Need a consistent way to define dependencies, build packages, and manage virtualenvs across multiple packages in the monorepo.

**Alternatives Considered**:
1. pip + requirements.txt (flat, no build isolation)
2. Hatch (similar capabilities, but Poetry was chosen)
3. Custom Makefile (reinventing the wheel)

**Reason**: Poetry provides a modern, all-in-one solution: dependency resolution, virtualenv management, building, publishing. It works well with monorepos via workspace-style handling (or multiple pyproject files).

**Impact**:
- Standardized packaging
- Faster onboarding (`poetry install`)
- Some learning curve; one more tool to maintain

**Status**: Adopted across packages; ongoing.
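As a concrete illustration of Decisions 5 and 6 together, a Poetry `pyproject.toml` dependency section might carry the pin like this; the package name and Python constraint are hypothetical, only the Starlette pin is taken from the source.

```toml
[tool.poetry]
name = "aitbc-gossip"   # hypothetical package name
version = "0.1.0"

[tool.poetry.dependencies]
python = "^3.11"        # hypothetical Python constraint
# Keep below 0.38, where Starlette removed the Broadcast module
# (Decision 5); drop this ceiling once production P2P lands.
starlette = ">=0.37.2,<0.38"
```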
---

## Decision 7: Blockchain Node Separate from Coordinator

**Date**: Prior to 2026-03-15

**Context**: The system needs a ledger for payments and consensus, but also a marketplace for job matching. Should they be one service or two?

**Alternatives Considered**:
1. Monolithic service (simpler deployment but tighter coupling)
2. Separate services with a well-defined API (more flexible, scalable)
3. On-chain marketplace (too slow, costly)

**Reason**: Separation of concerns: the blockchain handles consensus and accounts; the coordinator handles marketplace logic. This allows each to evolve independently and be scaled separately.

**Impact**:
- Clear service boundaries
- Requires cross-service communication (HTTP)
- More processes to run in dev

**Status**: Two services in production (devnet).

---

*Add subsequent decisions below as they arise.*