oib/aitbc

aitbc 745f791eda refactor: improve error handling and remove hardcoded credentials

- Changed bare except clauses to specific exception types in web3_utils.py, testing.py, messages.py, and message_storage.py
- Replaced print() calls with logger in testing.py, agent_discovery.py, compliance_agent.py, coordinator.py, trading_agent.py, keys.py, escrow.py, persistent_spending_tracker.py, sync_cli.py, and client.py
- Added logger initialization using get_logger(__name__) in compliance_agent.py, coordinator.py, trading_agent.py, keys.py, escrow.py, persistent_spending_tracker.py, and client.py
- Removed hardcoded secret

2026-05-12 17:01:57 +02:00

9.1 KiB

Raw Blame History

Agent-Management Service Extraction Plan

Overview

Extract the agent-related functionality from the coordinator-api monolith into a standalone microservice while maintaining operational continuity.

Current State

Monolith: apps/coordinator-api/src/app/

Services: 46,594 LOC across 89 files
Domain layer: domain/ contains all business entities (Agent, AgentExecution, AgentStatus, etc.)
Target agent files to extract: 18 files (6 routers, 12 services)
Largest files: agent_service.py (1,159 LOC), agent_integration.py (1,117 LOC), agent_communication.py (988 LOC)

Bounded Context: Agent-Management

Responsibility: AI agent lifecycle, orchestration, performance tracking, security, and marketplace registry.

In-Scope Files:

Services (12)

services/agent_service.py (1,159 LOC)
services/agent_integration.py (1,117 LOC)
services/agent_communication.py (988 LOC)
services/agent_orchestrator.py
services/agent_performance_service.py
services/agent_security.py
services/agent_portfolio_manager.py
services/agent_service_marketplace.py
services/advanced_rl/agents.py (+ sub-agents: ppo_agent.py, rainbow_dqn_agent.py, sac_agent.py)

Routers (6)

routers/agent_router.py
routers/agent_integration_router.py
routers/agent_performance.py
routers/agent_creativity.py
routers/agent_security_router.py
routers/services.py (agent services listing endpoint)

Critical Dependencies

Domain Layer (app.domain)
- All agent services import from ..domain.agent (AgentExecution, AgentStatus, AIAgentWorkflow, etc.)
- Solution: Keep domain/ in monolith for now; new service imports via a shared-domain package to be created
- Create apps/shared-domain/src/app/domain/ as a symlink or copy that both services can import
- Long-term: Extract entire domain layer to shared-domain package
aitbc package
- Already available as root package. Use directly.
SQLModel/SQLAlchemy
- Already in dependencies via root pyproject.toml
Other monolith services
- Some monolith routers may call agent endpoints. These will need to be updated to reach the new service through an HTTP client (Phase 3 internal routing via nginx)

Implementation Steps

Step 0: Prepare Shared Domain Package (Prerequisite)

Create apps/shared-domain/src/app/domain/
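Copy the files from coordinator-api's domain/ (optionally omitting the non-agent modules)
Or simpler: symlink the entire domain directory: ln -s ../../../coordinator-api/src/app/domain apps/shared-domain/src/app/domain (a relative symlink target is resolved from the directory that contains the link, so three levels up are needed to reach apps/)
Update imports in new service to use from shared_domain.app.domain.agent import ... (a hyphen is not valid in a Python module name, so the importable package needs an underscore even though the directory is apps/shared-domain; the exact import path depends on how the package is finally laid out)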
Add shared-domain to pyproject.toml dependencies in consuming services

Recommendation: Use symlink for rapid iteration, then formalize package later.
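
The symlink step can also be scripted, which avoids getting the relative path wrong by hand. The following is a minimal sketch, assuming the directory layout named in this plan; the function name link_shared_domain is hypothetical and not part of the codebase.

```python
import os
from pathlib import Path


def link_shared_domain(repo_root: Path) -> Path:
    """Create apps/shared-domain/src/app/domain as a relative symlink to the monolith's domain/."""
    target = repo_root / "apps" / "coordinator-api" / "src" / "app" / "domain"
    link = repo_root / "apps" / "shared-domain" / "src" / "app" / "domain"
    link.parent.mkdir(parents=True, exist_ok=True)

    # A relative symlink target is resolved from the directory containing the
    # link, not from the shell's working directory, so compute it explicitly.
    relative_target = os.path.relpath(target, start=link.parent)

    # Replace a stale link from an earlier run, but never touch a real directory.
    if link.is_symlink():
        link.unlink()
    link.symlink_to(relative_target, target_is_directory=True)
    return link
```

Calling link_shared_domain(Path("/opt/aitbc")) would produce a link whose target is ../../../coordinator-api/src/app/domain, so the link stays valid if the repository is moved or cloned elsewhere.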

Step 1: Create agent-management Service Skeleton

apps/agent-management/
├── pyproject.toml
├── README.md
└── src/
    └── app/
        ├── __init__.py
        ├── main.py
        ├── core/
        │   ├── __init__.py
        │   ├── config.py (import from shared-core)
        │   ├── logging.py (import from shared-core)
        │   └── database.py (import from shared-core)
        ├── domain/ → symlink to ../../../shared-domain/src/app/domain
        ├── routers/
        │   ├── __init__.py
        │   ├── agent_router.py (copied & adapted)
        │   ├── agent_integration_router.py
        │   ├── agent_performance.py
        │   ├── agent_creativity.py
        │   ├── agent_security_router.py
        │   └── services.py
        └── services/
            ├── __init__.py
            ├── agent_service.py
            ├── agent_orchestrator.py
            ├── agent_communication.py
            ├── agent_performance_service.py
            ├── agent_security.py
            ├── agent_integration.py
            ├── agent_portfolio_manager.py
            ├── agent_service_marketplace.py
            └── advanced_rl/
                ├── __init__.py
                ├── agents.py
                └── ppo_agent.py, rainbow_dqn_agent.py, sac_agent.py
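
For orientation, core/config.py in the tree above might carry service-specific defaults along these lines. This is only a sketch: the real ServiceSettings class lives in shared-core and its fields are not shown in this plan, so the class name, the environment variable names, and the SQLite URL below are all assumptions. Only the port (8012) and the shared coordinator.db come from this document.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class AgentManagementSettings:
    """Hypothetical stand-in for the ServiceSettings class provided by shared-core."""

    service_name: str = "agent-management"
    host: str = "127.0.0.1"
    port: int = 8012
    # Assumed URL form; the plan only says the service shares coordinator.db at first.
    database_url: str = "sqlite:///coordinator.db"

    @classmethod
    def from_env(cls, env=None) -> "AgentManagementSettings":
        """Build settings from environment variables, falling back to the defaults above."""
        env = os.environ if env is None else env
        return cls(
            host=env.get("AGENT_MANAGEMENT_HOST", cls.host),
            port=int(env.get("AGENT_MANAGEMENT_PORT", cls.port)),
            database_url=env.get("AGENT_MANAGEMENT_DATABASE_URL", cls.database_url),
        )
```

Keeping the defaults in one frozen object makes it easy to point the service at a dedicated database later by changing a single environment variable.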

Step 2: Adapt Code for Service Boundaries

Changes needed per file:

Update all from ..domain.agent import X to from shared_domain.app.domain.agent import X (underscore, because a hyphen is not valid in a Python module name)
Remove any imports from other monolith services (e.g., from ..services.other_service import X)
Replace internal service calls with HTTP client calls or event bus (defer to later phase)
Update ServiceSettings to use agent-management specific defaults (port 8012)
Add health check endpoint (already in template)
Verify database setup: AgentExecution and the other agent models use the shared Base, so call Base.metadata.create_all(bind=engine) on startup
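
The import rewrite across the 18 files is mechanical and could be scripted. A minimal sketch follows, assuming the shared package ends up importable as shared_domain.app.domain (the underscore name is an assumption, since hyphens cannot appear in an import statement); the helper name rewrite_domain_imports is hypothetical.

```python
import re

# Matches "from ..domain import X" and "from ..domain.agent import X" at the
# start of a line, at any relative depth, while leaving other packages alone.
_RELATIVE_DOMAIN_IMPORT = re.compile(
    r"^([ \t]*)from \.+domain((?:\.\w+)*)[ \t]+import\b",
    re.MULTILINE,
)


def rewrite_domain_imports(source: str, package: str = "shared_domain.app.domain") -> str:
    """Point relative domain imports at the shared package; everything else is untouched."""

    def _replace(match: re.Match) -> str:
        indent, submodule = match.group(1), match.group(2)
        return f"{indent}from {package}{submodule} import"

    return _RELATIVE_DOMAIN_IMPORT.sub(_replace, source)
```

Imports from other monolith services (for example from ..services.other_service import X) are deliberately left as they are, since those need a design decision rather than a textual substitution.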

Special Case: advanced_rl/

These are AI model inference services. Consider moving to ai-models service instead.
For now, keep in agent-management to maintain functionality.

Step 3: Update Monolith to Proxy Requests (During Transition)

Option A: Nginx Routing

Add nginx upstream for agent-management on port 8012
Change coordinator-api routes for /api/v1/agent/* to proxy to agent-management
Monolith no longer handles agent endpoints

Option B: In-app Redirection

Keep routers in monolith but replace handlers with HTTPClient calls to new service
More gradual migration but adds latency

Recommendation: Option A - cleaner separation, easier to rollback.
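
For comparison, Option B would mean each monolith handler relays its request to the new service. A rough sketch of such a relay using only the standard library is shown below; the function name and its parameters are hypothetical, and a real handler would also need to carry over authentication headers and query strings.

```python
import urllib.error
import urllib.request

AGENT_MANAGEMENT_URL = "http://127.0.0.1:8012"  # port taken from this plan

# Internal traffic should not pass through any system-configured HTTP proxy.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def forward_to_agent_management(method, path, body=None, headers=None,
                                base_url=AGENT_MANAGEMENT_URL, timeout=5.0):
    """Relay one request to the new service and return (status_code, response_body)."""
    request = urllib.request.Request(
        url=base_url.rstrip("/") + path,
        data=body,
        method=method,
        headers=dict(headers or {}),
    )
    try:
        with _OPENER.open(request, timeout=timeout) as response:
            return response.status, response.read()
    except urllib.error.HTTPError as error:
        # 4xx/5xx responses are still valid upstream answers: pass them through.
        return error.code, error.read()
```

Every agent request would then cost an extra in-process hop plus a second HTTP round trip, which is the latency penalty noted above and part of why Option A is preferred.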

Step 4: Create Systemd Service

/etc/systemd/system/aitbc-agent-management.service
[Unit]
Description=AITBC Agent Management Service
After=network.target

[Service]
Type=simple
User=aitbc
WorkingDirectory=/opt/aitbc/apps/agent-management
Environment=PATH=/opt/aitbc/venv/bin
Environment=PYTHONPATH=/opt/aitbc
# --app-dir src assumes the src/ layout from Step 1, so that app.main is importable
ExecStart=/opt/aitbc/venv/bin/uvicorn app.main:app --app-dir src --host 127.0.0.1 --port 8012
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target

Step 5: Database Migration

Agent domain models likely already have tables defined via SQLModel
In main.py startup event, call Base.metadata.create_all(bind=engine) to ensure tables exist
Ensure the new service uses same database as monolith (coordinator.db) initially
Later: separate database (Phase 8)
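
Before relying on create_all, it may be worth checking which agent tables already exist in the shared database. The sketch below assumes coordinator.db is a SQLite file (suggested by its name but not confirmed here); the helper name and any table names passed to it are hypothetical and should be taken from the actual models.

```python
import sqlite3


def missing_tables(connection: sqlite3.Connection, expected) -> list:
    """Return the expected table names that are not present in the database, sorted."""
    rows = connection.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'"
    ).fetchall()
    existing = {name for (name,) in rows}
    return sorted(set(expected) - existing)
```

An empty result means the new service can start against the shared database without creating anything; a non-empty result is the list of tables that create_all (or a proper migration) would need to add.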

Step 6: Integration Testing

Start agent-management service
Verify health endpoint: curl http://localhost:8012/health
Test agent creation via API
Verify coordinator-api can still access agent data (through new service or direct DB if keeping shared DB)
Run existing integration tests against new service
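
The health check above can be wrapped in a small wait loop so that test runs do not race the service startup. This is a sketch using only the standard library; the function name, retry count, and delays are assumptions, while the URL is the one used in the curl command.

```python
import time
import urllib.error
import urllib.request

# Bypass any system-configured proxy, since the service listens on localhost.
_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def wait_for_health(url="http://localhost:8012/health", attempts=10, delay=1.0, timeout=2.0):
    """Poll the health endpoint until it answers 200; return False if it never does."""
    for attempt in range(attempts):
        try:
            with _OPENER.open(url, timeout=timeout) as response:
                if response.status == 200:
                    return True
        except (urllib.error.URLError, OSError):
            pass  # not listening yet, timed out, or answered with an error status
        if attempt < attempts - 1:
            time.sleep(delay)
    return False
```

A test runner could call wait_for_health() once after starting the service and abort early with a clear message if it returns False.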

Step 7: Update Coordinator-API

Remove the 18 extracted files from monolith
Remove domain/agent related imports from remaining monolith services if they now use agent-management API
Update any remaining references to agent endpoints to use HTTP client or nginx proxy

Step 8: Documentation & Monitoring

Update README with agent-management API docs
Add metrics endpoint if enabled
Update deployment scripts

Rollback Plan

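Keep the extracted monolith files recoverable: move them aside rather than deleting them outright, with git history as a backstop
Keep both nginx routing configurations available so the upstream can be switched back to the monolith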
Database shared initially, so data is accessible to both
Systemd service can be disabled; monolith still runs

Success Criteria

Agent-management service starts and health check passes on port 8012
Can create/query agents via API
Existing coordinator-api functionality that depends on agents still works
No errors in logs during integration test
Systemd service auto-restarts on failure

Open Questions

RL Agents: Should advanced_rl be part of agent-management or ai-models?
- Recommendation: Keep in agent-management for now (AI agent inference is part of agent runtime). Can split later if ai-models becomes a separate inference service.
Database: Separate or shared?
- Phase 1: Shared (same coordinator.db) for simplicity
- Phase 8: Split to dedicated agent-management database
Cross-service calls: Currently agent integration uses other services directly (imports). Need to replace with HTTP or event bus.
- Defer until Phase 8 (Final Integration) to avoid breaking existing flow
Domain extraction: The domain models are currently in monolith. Should we extract entire domain to a package?
- Immediate need: Create shared-domain package (symlink) to break import cycle
- Future: Extract domain to true package with independent version

Timeline Estimate

Step 0 (shared-domain): 2h
Step 1 (skeleton): 4h
Step 2 (adaptation): 8h (bulk of work - fixing imports, resolving dependencies)
Step 3 (nginx routing): 2h
Step 4 (systemd): 1h
Step 5 (DB): 1h
Step 6 (testing): 4h
Step 7 (monolith cleanup): 4h
Step 8 (docs): 2h

Total: ~28 hours (3-4 days)
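
The total can be checked directly from the per-step figures. The conversion to days below assumes an 8-hour working day, which is not stated in the plan.

```python
# Per-step estimates in hours, copied from the list above.
estimates_hours = {
    "Step 0 (shared-domain)": 2,
    "Step 1 (skeleton)": 4,
    "Step 2 (adaptation)": 8,
    "Step 3 (nginx routing)": 2,
    "Step 4 (systemd)": 1,
    "Step 5 (DB)": 1,
    "Step 6 (testing)": 4,
    "Step 7 (monolith cleanup)": 4,
    "Step 8 (docs)": 2,
}

total_hours = sum(estimates_hours.values())
working_days = total_hours / 8  # assumed 8-hour day

print(f"{total_hours} hours, about {working_days} working days")
```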

Risks

Hidden dependencies on other monolith services may cause runtime import errors
Domain models may have cross-references that require co-migration
Database migrations may be needed if agent tables don't exist yet
Existing integration tests may fail and need updating
Breaking changes if API contracts differ from original
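
The first risk can be reduced by scanning the extracted files for relative imports that reach back into the monolith before the service is ever started. The sketch below uses the standard ast module; the function name is hypothetical, and it assumes the extracted modules are identified by their file names without the .py suffix.

```python
import ast


def hidden_dependencies(source: str, extracted_modules) -> list:
    """List (line, import statement) pairs that point at code left behind in the monolith."""
    extracted = set(extracted_modules)
    found = []
    for node in ast.walk(ast.parse(source)):
        # Only parent-relative imports ("from ..x import y") can reach sibling packages.
        if not isinstance(node, ast.ImportFrom) or node.level < 2:
            continue
        parts = (node.module or "").split(".")
        if parts[0] == "domain":
            continue  # covered by the shared-domain package
        if parts[0] in ("services", "routers") and len(parts) > 1 and parts[1] in extracted:
            continue  # moves together with this file
        names = ", ".join(alias.name for alias in node.names)
        found.append((node.lineno, f"from {'.' * node.level}{node.module or ''} import {names}"))
    return sorted(found)
```

Running this over each of the 18 files with the set of extracted module names gives a concrete list of cross-service calls to replace, rather than discovering them as import errors at runtime.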
