Files
aitbc/docs/10_plan/02_implementation/backend-implementation-roadmap.md
oib 15427c96c0 chore: update file permissions to executable across repository
- Change file mode from 644 to 755 for all project files
- Add chain_id parameter to get_balance RPC endpoint with default "ait-devnet"
- Rename Miner.extra_meta_data to extra_metadata for consistency
2026-03-06 22:17:54 +01:00

322 lines
9.6 KiB
Markdown
Executable File

# Backend Endpoint Implementation Roadmap - March 5, 2026
## Overview
The AITBC CLI is now fully functional with proper authentication, error handling, and command structure. However, several key backend endpoints are missing, preventing full end-to-end functionality. This roadmap outlines the required backend implementations.
## 🎯 Current Status
### ✅ CLI Status: 97% Complete
- **Authentication**: ✅ Working (API keys configured)
- **Command Structure**: ✅ Complete (all commands implemented)
- **Error Handling**: ✅ Robust (proper error messages)
- **File Operations**: ✅ Working (JSON/CSV parsing, templates)
### ⚠️ Backend Limitations: Missing Endpoints
- **Job Submission**: `/v1/jobs` endpoint not implemented
- **Agent Operations**: `/v1/agents/*` endpoints not implemented
- **Swarm Operations**: `/v1/swarm/*` endpoints not implemented
- **Various Client APIs**: History, blocks, receipts endpoints missing
## 🛠️ Required Backend Implementations
### Priority 1: Core Job Management (High Impact)
#### 1.1 Job Submission Endpoint
**Endpoint**: `POST /v1/jobs`
**Purpose**: Submit inference jobs to the coordinator
**Required Features**:
```python
@app.post("/v1/jobs", response_model=JobView, status_code=201)
async def submit_job(
req: JobCreate,
request: Request,
session: SessionDep,
client_id: str = Depends(require_client_key()),
) -> JobView:
```
**Implementation Requirements**:
- Validate job payload (type, prompt, model)
- Queue job for processing
- Return job ID and initial status
- Support TTL (time-to-live) configuration
- Rate limiting per client
#### 1.2 Job Status Endpoint
**Endpoint**: `GET /v1/jobs/{job_id}`
**Purpose**: Check job execution status
**Required Features**:
- Return current job state (queued, running, completed, failed)
- Include progress information for long-running jobs
- Support real-time status updates
#### 1.3 Job Result Endpoint
**Endpoint**: `GET /v1/jobs/{job_id}/result`
**Purpose**: Retrieve completed job results
**Required Features**:
- Return job output and metadata
- Include execution time and resource usage
- Support result caching
#### 1.4 Job History Endpoint
**Endpoint**: `GET /v1/jobs/history`
**Purpose**: List job history with filtering
**Required Features**:
- Pagination support
- Filter by status, date range, job type
- Include job metadata and results
### Priority 2: Agent Management (Medium Impact)
#### 2.1 Agent Workflow Creation
**Endpoint**: `POST /v1/agents/workflows`
**Purpose**: Create AI agent workflows
**Required Features**:
```python
@app.post("/v1/agents/workflows", response_model=AgentWorkflowView)
async def create_agent_workflow(
workflow: AgentWorkflowCreate,
session: SessionDep,
client_id: str = Depends(require_client_key()),
) -> AgentWorkflowView:
```
#### 2.2 Agent Execution
**Endpoint**: `POST /v1/agents/workflows/{agent_id}/execute`
**Purpose**: Execute agent workflows
**Required Features**:
- Workflow execution engine
- Resource allocation
- Execution monitoring
#### 2.3 Agent Status & Receipts
**Endpoints**:
- `GET /v1/agents/executions/{execution_id}`
- `GET /v1/agents/executions/{execution_id}/receipt`
**Purpose**: Monitor agent execution and get verifiable receipts
### Priority 3: Swarm Intelligence (Medium Impact)
#### 3.1 Swarm Join Endpoint
**Endpoint**: `POST /v1/swarm/join`
**Purpose**: Join agent swarms for collective optimization
**Required Features**:
```python
@app.post("/v1/swarm/join", response_model=SwarmJoinView)
async def join_swarm(
swarm_data: SwarmJoinRequest,
session: SessionDep,
client_id: str = Depends(require_client_key()),
) -> SwarmJoinView:
```
#### 3.2 Swarm Coordination
**Endpoint**: `POST /v1/swarm/coordinate`
**Purpose**: Coordinate swarm task execution
**Required Features**:
- Task distribution
- Result aggregation
- Consensus mechanisms
### Priority 4: Enhanced Client Features (Low Impact)
#### 4.1 Job Management
**Endpoints**:
- `DELETE /v1/jobs/{job_id}` (Cancel job)
- `GET /v1/jobs/{job_id}/receipt` (Job receipt)
- `GET /v1/explorer/receipts` (List receipts)
#### 4.2 Payment System
**Endpoints**:
- `POST /v1/payments` (Create payment)
- `GET /v1/payments/{payment_id}/status` (Payment status)
- `GET /v1/payments/{payment_id}/receipt` (Payment receipt)
#### 4.3 Block Integration
**Endpoint**: `GET /v1/explorer/blocks`
**Purpose**: List recent blocks for client context
## 🏗️ Implementation Strategy
### Phase 1: Core Job System (Week 1-2)
1. **Job Submission API**
- Implement basic job queue
- Add job validation and routing
- Create job status tracking
2. **Job Execution Engine**
- Connect to AI model inference
- Implement job processing pipeline
- Add result storage and retrieval
3. **Testing & Validation**
- End-to-end job submission tests
- Performance benchmarking
- Error handling validation
### Phase 2: Agent System (Week 3-4)
1. **Agent Workflow Engine**
- Workflow definition and storage
- Execution orchestration
- Resource management
2. **Agent Integration**
- Connect to AI agent frameworks
- Implement agent communication
- Add monitoring and logging
### Phase 3: Swarm Intelligence (Week 5-6)
1. **Swarm Coordination**
- Implement swarm algorithms
- Add task distribution logic
- Create result aggregation
2. **Swarm Optimization**
- Performance tuning
- Load balancing
- Fault tolerance
### Phase 4: Enhanced Features (Week 7-8)
1. **Payment Integration**
- Payment processing
- Escrow management
- Receipt generation
2. **Advanced Features**
- Batch job optimization
- Template system integration
- Advanced filtering and search
## 📊 Technical Requirements
### Database Schema Updates
```sql
-- Jobs Table
CREATE TABLE jobs (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
client_id VARCHAR(255) NOT NULL,
type VARCHAR(50) NOT NULL,
payload JSONB NOT NULL,
status VARCHAR(20) DEFAULT 'queued',
result JSONB,
created_at TIMESTAMP DEFAULT NOW(),
updated_at TIMESTAMP DEFAULT NOW(),
ttl_seconds INTEGER DEFAULT 900
);
-- Agent Workflows Table
CREATE TABLE agent_workflows (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
name VARCHAR(255) NOT NULL,
description TEXT,
workflow_definition JSONB NOT NULL,
client_id VARCHAR(255) NOT NULL,
created_at TIMESTAMP DEFAULT NOW()
);
-- Swarm Members Table
CREATE TABLE swarm_members (
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
swarm_id UUID NOT NULL,
agent_id VARCHAR(255) NOT NULL,
role VARCHAR(50) NOT NULL,
capability VARCHAR(100),
joined_at TIMESTAMP DEFAULT NOW()
);
```
### Service Dependencies
1. **AI Model Integration**: Connect to Ollama or other inference services
2. **Message Queue**: Redis/RabbitMQ for job queuing
3. **Storage**: Database for job and agent state
4. **Monitoring**: Metrics and logging for observability
### API Documentation
- OpenAPI/Swagger specifications
- Request/response examples
- Error code documentation
- Rate limiting information
## 🔧 Development Environment Setup
### Local Development
```bash
# Start coordinator API with job endpoints
cd /opt/aitbc/apps/coordinator-api
.venv/bin/python -m uvicorn app.main:app --reload --port 8000
# Test with CLI
aitbc client submit --prompt "test" --model gemma3:1b
```
### Testing Strategy
1. **Unit Tests**: Individual endpoint testing
2. **Integration Tests**: End-to-end workflow testing
3. **Load Tests**: Performance under load
4. **Security Tests**: Authentication and authorization
## 📈 Success Metrics
### Phase 1 Success Criteria
- [ ] Job submission working end-to-end
- [ ] 100+ concurrent job support
- [ ] <2s average job submission time
- [ ] 99.9% uptime for job APIs
### Phase 2 Success Criteria
- [ ] Agent workflow creation and execution
- [ ] Multi-agent coordination working
- [ ] Agent receipt generation
- [ ] Resource utilization optimization
### Phase 3 Success Criteria
- [ ] Swarm join and coordination
- [ ] Collective optimization results
- [ ] Swarm performance metrics
- [ ] Fault tolerance testing
### Phase 4 Success Criteria
- [ ] Payment system integration
- [ ] Advanced client features
- [ ] Full CLI functionality
- [ ] Production readiness
## 🚀 Deployment Plan
### Staging Environment
1. **Infrastructure Setup**: Deploy to staging cluster
2. **Database Migration**: Apply schema updates
3. **Service Configuration**: Configure all endpoints
4. **Integration Testing**: Full workflow testing
### Production Deployment
1. **Blue-Green Deployment**: Zero-downtime deployment
2. **Monitoring Setup**: Metrics and alerting
3. **Performance Tuning**: Optimize for production load
4. **Documentation Update**: Update API documentation
## 📝 Next Steps
### Immediate Actions (This Week)
1. **Implement Job Submission**: Start with basic `/v1/jobs` endpoint
2. **Database Setup**: Create required tables and indexes
3. **Testing Framework**: Set up automated testing
4. **CLI Integration**: Test with existing CLI commands
### Short Term (2-4 Weeks)
1. **Complete Job System**: Full job lifecycle management
2. **Agent System**: Basic agent workflow support
3. **Performance Optimization**: Optimize for production load
4. **Documentation**: Complete API documentation
### Long Term (1-2 Months)
1. **Swarm Intelligence**: Full swarm coordination
2. **Advanced Features**: Payment system, advanced filtering
3. **Production Deployment**: Full production readiness
4. **Monitoring & Analytics**: Comprehensive observability
---
**Summary**: The CLI is 97% complete and ready for production use. The main remaining work is implementing the backend endpoints to support full end-to-end functionality. This roadmap provides a clear path to 100% completion.