Files
aitbc/docs/10_plan/02_implementation/backend-implementation-roadmap.md
oib 037a9aacc0 docs: remove outdated planning documents and consolidate milestone documentation
- Delete obsolete next milestone plan (00_nextMileston.md) with outdated Q2 2026 targets
- Delete global marketplace launch strategy (06_global_marketplace_launch.md) with superseded Q2 2026 plans
- Remove duplicate planning documentation and outdated status indicators
- Clean up planning directory structure for current development phase
- Consolidate strategic planning into active documentation
2026-03-05 14:07:08 +01:00

9.6 KiB

Backend Endpoint Implementation Roadmap - March 5, 2026

Overview

The AITBC CLI is now fully functional with proper authentication, error handling, and command structure. However, several key backend endpoints are missing, preventing full end-to-end functionality. This roadmap outlines the required backend implementations.

🎯 Current Status

CLI Status: 97% Complete

  • Authentication: Working (API keys configured)
  • Command Structure: Complete (all commands implemented)
  • Error Handling: Robust (proper error messages)
  • File Operations: Working (JSON/CSV parsing, templates)

⚠️ Backend Limitations: Missing Endpoints

  • Job Submission: /v1/jobs endpoint not implemented
  • Agent Operations: /v1/agents/* endpoints not implemented
  • Swarm Operations: /v1/swarm/* endpoints not implemented
  • Various Client APIs: History, blocks, receipts endpoints missing

🛠️ Required Backend Implementations

Priority 1: Core Job Management (High Impact)

1.1 Job Submission Endpoint

Endpoint: POST /v1/jobs Purpose: Submit inference jobs to the coordinator Required Features:

@app.post("/v1/jobs", response_model=JobView, status_code=201)
async def submit_job(
    req: JobCreate,
    request: Request,
    session: SessionDep,
    client_id: str = Depends(require_client_key()),
) -> JobView:

Implementation Requirements:

  • Validate job payload (type, prompt, model)
  • Queue job for processing
  • Return job ID and initial status
  • Support TTL (time-to-live) configuration
  • Rate limiting per client

1.2 Job Status Endpoint

Endpoint: GET /v1/jobs/{job_id} Purpose: Check job execution status Required Features:

  • Return current job state (queued, running, completed, failed)
  • Include progress information for long-running jobs
  • Support real-time status updates

1.3 Job Result Endpoint

Endpoint: GET /v1/jobs/{job_id}/result Purpose: Retrieve completed job results Required Features:

  • Return job output and metadata
  • Include execution time and resource usage
  • Support result caching

1.4 Job History Endpoint

Endpoint: GET /v1/jobs/history Purpose: List job history with filtering Required Features:

  • Pagination support
  • Filter by status, date range, job type
  • Include job metadata and results

Priority 2: Agent Management (Medium Impact)

2.1 Agent Workflow Creation

Endpoint: POST /v1/agents/workflows Purpose: Create AI agent workflows Required Features:

@app.post("/v1/agents/workflows", response_model=AgentWorkflowView)
async def create_agent_workflow(
    workflow: AgentWorkflowCreate,
    session: SessionDep,
    client_id: str = Depends(require_client_key()),
) -> AgentWorkflowView:

2.2 Agent Execution

Endpoint: POST /v1/agents/workflows/{agent_id}/execute Purpose: Execute agent workflows Required Features:

  • Workflow execution engine
  • Resource allocation
  • Execution monitoring

2.3 Agent Status & Receipts

Endpoints:

  • GET /v1/agents/executions/{execution_id}
  • GET /v1/agents/executions/{execution_id}/receipt Purpose: Monitor agent execution and get verifiable receipts

Priority 3: Swarm Intelligence (Medium Impact)

3.1 Swarm Join Endpoint

Endpoint: POST /v1/swarm/join Purpose: Join agent swarms for collective optimization Required Features:

@app.post("/v1/swarm/join", response_model=SwarmJoinView)
async def join_swarm(
    swarm_data: SwarmJoinRequest,
    session: SessionDep,
    client_id: str = Depends(require_client_key()),
) -> SwarmJoinView:

3.2 Swarm Coordination

Endpoint: POST /v1/swarm/coordinate Purpose: Coordinate swarm task execution Required Features:

  • Task distribution
  • Result aggregation
  • Consensus mechanisms

Priority 4: Enhanced Client Features (Low Impact)

4.1 Job Management

Endpoints:

  • DELETE /v1/jobs/{job_id} (Cancel job)
  • GET /v1/jobs/{job_id}/receipt (Job receipt)
  • GET /v1/explorer/receipts (List receipts)

4.2 Payment System

Endpoints:

  • POST /v1/payments (Create payment)
  • GET /v1/payments/{payment_id}/status (Payment status)
  • GET /v1/payments/{payment_id}/receipt (Payment receipt)

4.3 Block Integration

Endpoint: GET /v1/explorer/blocks Purpose: List recent blocks for client context

🏗️ Implementation Strategy

Phase 1: Core Job System (Week 1-2)

  1. Job Submission API

    • Implement basic job queue
    • Add job validation and routing
    • Create job status tracking
  2. Job Execution Engine

    • Connect to AI model inference
    • Implement job processing pipeline
    • Add result storage and retrieval
  3. Testing & Validation

    • End-to-end job submission tests
    • Performance benchmarking
    • Error handling validation

Phase 2: Agent System (Week 3-4)

  1. Agent Workflow Engine

    • Workflow definition and storage
    • Execution orchestration
    • Resource management
  2. Agent Integration

    • Connect to AI agent frameworks
    • Implement agent communication
    • Add monitoring and logging

Phase 3: Swarm Intelligence (Week 5-6)

  1. Swarm Coordination

    • Implement swarm algorithms
    • Add task distribution logic
    • Create result aggregation
  2. Swarm Optimization

    • Performance tuning
    • Load balancing
    • Fault tolerance

Phase 4: Enhanced Features (Week 7-8)

  1. Payment Integration

    • Payment processing
    • Escrow management
    • Receipt generation
  2. Advanced Features

    • Batch job optimization
    • Template system integration
    • Advanced filtering and search

📊 Technical Requirements

Database Schema Updates

-- Jobs Table
CREATE TABLE jobs (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    client_id VARCHAR(255) NOT NULL,
    type VARCHAR(50) NOT NULL,
    payload JSONB NOT NULL,
    status VARCHAR(20) DEFAULT 'queued',
    result JSONB,
    created_at TIMESTAMP DEFAULT NOW(),
    updated_at TIMESTAMP DEFAULT NOW(),
    ttl_seconds INTEGER DEFAULT 900
);

-- Agent Workflows Table
CREATE TABLE agent_workflows (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    name VARCHAR(255) NOT NULL,
    description TEXT,
    workflow_definition JSONB NOT NULL,
    client_id VARCHAR(255) NOT NULL,
    created_at TIMESTAMP DEFAULT NOW()
);

-- Swarm Members Table
CREATE TABLE swarm_members (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    swarm_id UUID NOT NULL,
    agent_id VARCHAR(255) NOT NULL,
    role VARCHAR(50) NOT NULL,
    capability VARCHAR(100),
    joined_at TIMESTAMP DEFAULT NOW()
);

Service Dependencies

  1. AI Model Integration: Connect to Ollama or other inference services
  2. Message Queue: Redis/RabbitMQ for job queuing
  3. Storage: Database for job and agent state
  4. Monitoring: Metrics and logging for observability

API Documentation

  • OpenAPI/Swagger specifications
  • Request/response examples
  • Error code documentation
  • Rate limiting information

🔧 Development Environment Setup

Local Development

# Start coordinator API with job endpoints
cd /opt/aitbc/apps/coordinator-api
.venv/bin/python -m uvicorn app.main:app --reload --port 8000

# Test with CLI
aitbc client submit --prompt "test" --model gemma3:1b

Testing Strategy

  1. Unit Tests: Individual endpoint testing
  2. Integration Tests: End-to-end workflow testing
  3. Load Tests: Performance under load
  4. Security Tests: Authentication and authorization

📈 Success Metrics

Phase 1 Success Criteria

  • Job submission working end-to-end
  • 100+ concurrent job support
  • <2s average job submission time
  • 99.9% uptime for job APIs

Phase 2 Success Criteria

  • Agent workflow creation and execution
  • Multi-agent coordination working
  • Agent receipt generation
  • Resource utilization optimization

Phase 3 Success Criteria

  • Swarm join and coordination
  • Collective optimization results
  • Swarm performance metrics
  • Fault tolerance testing

Phase 4 Success Criteria

  • Payment system integration
  • Advanced client features
  • Full CLI functionality
  • Production readiness

🚀 Deployment Plan

Staging Environment

  1. Infrastructure Setup: Deploy to staging cluster
  2. Database Migration: Apply schema updates
  3. Service Configuration: Configure all endpoints
  4. Integration Testing: Full workflow testing

Production Deployment

  1. Blue-Green Deployment: Zero-downtime deployment
  2. Monitoring Setup: Metrics and alerting
  3. Performance Tuning: Optimize for production load
  4. Documentation Update: Update API documentation

📝 Next Steps

Immediate Actions (This Week)

  1. Implement Job Submission: Start with basic /v1/jobs endpoint
  2. Database Setup: Create required tables and indexes
  3. Testing Framework: Set up automated testing
  4. CLI Integration: Test with existing CLI commands

Short Term (2-4 Weeks)

  1. Complete Job System: Full job lifecycle management
  2. Agent System: Basic agent workflow support
  3. Performance Optimization: Optimize for production load
  4. Documentation: Complete API documentation

Long Term (1-2 Months)

  1. Swarm Intelligence: Full swarm coordination
  2. Advanced Features: Payment system, advanced filtering
  3. Production Deployment: Full production readiness
  4. Monitoring & Analytics: Comprehensive observability

Summary: The CLI is 97% complete and ready for production use. The main remaining work is implementing the backend endpoints to support full end-to-end functionality. This roadmap provides a clear path to 100% completion.