---
description: Atomic OpenClaw error detection and recovery procedures with deterministic outputs
title: openclaw-error-handler
version: 1.0
---

# OpenClaw Error Handler

## Purpose

Detect, diagnose, and recover from errors in OpenClaw agent operations using systematic error-handling and recovery procedures.

## Activation

Trigger when the user requests error handling: error diagnosis, recovery procedures, error analysis, or system health checks.

## Input

```json
{
  "operation": "detect|diagnose|recover|analyze",
  "agent": "agent_name",
  "error_type": "execution|communication|configuration|timeout|unknown",
  "error_context": "string (optional)",
  "recovery_strategy": "auto|manual|rollback|retry"
}
```

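A caller can reject malformed requests against the Input contract above before any work starts. The sketch below is a minimal Python check. It assumes every field except `error_context` is required, because that is the only field the schema marks as optional. The function name `validate_request` is hypothetical and is not part of OpenClaw.

```python
from typing import Any

# Allowed values, copied from the Input schema.
OPERATIONS = {"detect", "diagnose", "recover", "analyze"}
ERROR_TYPES = {"execution", "communication", "configuration", "timeout", "unknown"}
RECOVERY_STRATEGIES = {"auto", "manual", "rollback", "retry"}


def validate_request(request: dict[str, Any]) -> list[str]:
    """Return a list of problems; an empty list means the request is well-formed."""
    problems: list[str] = []
    if request.get("operation") not in OPERATIONS:
        problems.append(f"invalid operation: {request.get('operation')!r}")
    agent = request.get("agent")
    if not isinstance(agent, str) or not agent:
        problems.append("agent must be a non-empty string")
    if request.get("error_type") not in ERROR_TYPES:
        problems.append(f"invalid error_type: {request.get('error_type')!r}")
    # error_context is optional, but must be a string when present.
    if "error_context" in request and not isinstance(request["error_context"], str):
        problems.append("error_context must be a string")
    if request.get("recovery_strategy") not in RECOVERY_STRATEGIES:
        problems.append(f"invalid recovery_strategy: {request.get('recovery_strategy')!r}")
    return problems
```

For example, `validate_request({"operation": "recover", "agent": "main", "error_type": "execution", "recovery_strategy": "retry"})` returns an empty list.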
## Output

```json
{
  "summary": "Error handling operation completed successfully",
  "operation": "detect|diagnose|recover|analyze",
  "agent": "agent_name",
  "error_detected": {
    "type": "string",
    "severity": "critical|high|medium|low",
    "timestamp": "number",
    "context": "string"
  },
  "diagnosis": {
    "root_cause": "string",
    "affected_components": ["component1", "component2"],
    "impact_assessment": "string"
  },
  "recovery_applied": {
    "strategy": "string",
    "actions_taken": ["action1", "action2"],
    "success": "boolean"
  },
  "issues": [],
  "recommendations": [],
  "confidence": 1.0,
  "execution_time": "number",
  "validation_status": "success|partial|failed"
}
```

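A result can be checked against the Output schema before it is returned. The hypothetical `check_result` helper below does two things:

- It verifies that every top-level key is present.
- It checks that the enumerated fields use the documented values.

It assumes `error_detected` and `recovery_applied` are JSON objects. Treating `confidence` as a score in `[0.0, 1.0]` is also an assumption, since the schema only shows the value `1.0`.

```python
from typing import Any

SEVERITIES = {"critical", "high", "medium", "low"}
VALIDATION_STATUSES = {"success", "partial", "failed"}
REQUIRED_KEYS = {
    "summary", "operation", "agent", "error_detected", "diagnosis",
    "recovery_applied", "issues", "recommendations", "confidence",
    "execution_time", "validation_status",
}


def check_result(result: dict[str, Any]) -> list[str]:
    """Return schema problems in a result dict; an empty list means it conforms."""
    missing = sorted(REQUIRED_KEYS - set(result))
    if missing:
        return [f"missing key: {key}" for key in missing]
    problems: list[str] = []
    if result["error_detected"].get("severity") not in SEVERITIES:
        problems.append("error_detected.severity is not a known level")
    if not isinstance(result["recovery_applied"].get("success"), bool):
        problems.append("recovery_applied.success must be a boolean")
    # Assumption: confidence is a score between 0.0 and 1.0.
    confidence = result["confidence"]
    if not isinstance(confidence, (int, float)) or not 0.0 <= confidence <= 1.0:
        problems.append("confidence must be a number in [0.0, 1.0]")
    elapsed = result["execution_time"]
    if not isinstance(elapsed, (int, float)) or elapsed < 0:
        problems.append("execution_time must be a non-negative number")
    if result["validation_status"] not in VALIDATION_STATUSES:
        problems.append("validation_status must be success, partial, or failed")
    return problems
```

Run against the expected output example later in this document, `check_result` returns an empty list.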
## Process

### 1. Analyze

- Scan agent logs for errors
- Identify error patterns
- Assess error severity
- Determine error scope

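The Analyze step can be approximated by a single pass over log lines. The sketch makes two hypothetical assumptions, because the layout of the logs under `/var/log/aitbc/` is not specified here:

- Each log line has the format `<timestamp> <LEVEL> <component>: <message>`.
- Log levels map onto the skill's severity scale as shown in `LEVEL_TO_SEVERITY`.

Messages that differ only in numbers are grouped into one pattern. The set of components seen is reported as the error scope.

```python
import re
from collections import Counter

# Hypothetical line format: "<timestamp> <LEVEL> <component>: <message>".
LINE_RE = re.compile(r"\b(CRITICAL|ERROR|WARNING)\s+(\w+):\s*(.*)")
# Assumed mapping from log level to this skill's severity scale.
LEVEL_TO_SEVERITY = {"CRITICAL": "critical", "ERROR": "high", "WARNING": "medium"}
SEVERITY_RANK = {"low": 0, "medium": 1, "high": 2, "critical": 3}


def scan_log(lines):
    """Collect error patterns, the worst severity seen, and the affected components."""
    patterns = Counter()
    components = set()
    worst = None
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # not an error-level line
        level, component, message = match.groups()
        # Replace digits so "timeout after 30s" and "timeout after 45s" form one pattern.
        patterns[re.sub(r"\d+", "N", message)] += 1
        components.add(component)
        severity = LEVEL_TO_SEVERITY[level]
        if worst is None or SEVERITY_RANK[severity] > SEVERITY_RANK[worst]:
            worst = severity
    return {"patterns": patterns.most_common(), "severity": worst, "scope": sorted(components)}
```

In practice `lines` could be an open file handle, so large logs are streamed rather than loaded at once.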
### 2. Diagnose

- Analyze root cause
- Trace error propagation
- Identify affected components
- Assess impact

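For the Diagnose step, one common approach is to model components as a dependency graph:

- A failing component whose own dependencies are healthy is a root-cause candidate.
- A breadth-first walk over that component's dependents traces how the error propagates.

The component names and the `DEPENDS_ON` map below are hypothetical. They are loosely modeled on the `p2p_network` and `transaction_processor` components in the expected output example.

```python
from collections import deque

# Hypothetical dependency map: component -> components it depends on.
DEPENDS_ON = {
    "p2p_network": [],
    "transaction_processor": ["p2p_network"],
    "api_gateway": ["transaction_processor"],
}


def likely_root_causes(depends_on, failing):
    """A failing component is a root-cause candidate if none of its dependencies fail."""
    failing = set(failing)
    return sorted(c for c in failing if not failing.intersection(depends_on.get(c, [])))


def trace_impact(depends_on, root):
    """Breadth-first walk from `root` to every component that transitively depends on it."""
    dependents = {}
    for component, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, []).append(component)
    affected, seen, queue = [root], {root}, deque([root])
    while queue:
        for downstream in dependents.get(queue.popleft(), []):
            if downstream not in seen:
                seen.add(downstream)
                affected.append(downstream)
                queue.append(downstream)
    return affected
```

In a dependency cycle where every member fails, `likely_root_causes` returns no candidate. That case would need manual diagnosis.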
### 3. Execute Recovery

- Select recovery strategy
- Apply recovery actions
- Monitor recovery progress
- Validate recovery success

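The `retry` strategy in the expected output example (increase the timeout threshold, then retry) can be sketched as a loop that grows the timeout after each `TimeoutError`. The following are all assumptions:

- `operation` is a callable that takes a timeout and raises `TimeoutError` when it runs out of time.
- The default timeout, growth factor, attempt count, and delay are illustrative values.

The returned dict mirrors the `recovery_applied` object from the Output schema.

```python
import time


def retry_with_growing_timeout(operation, timeout=10.0, attempts=3, factor=2.0, delay=1.0):
    """Call operation(timeout) up to `attempts` times, growing the timeout after each TimeoutError."""
    actions = []
    for attempt in range(1, attempts + 1):
        try:
            operation(timeout)
        except TimeoutError:
            actions.append(f"attempt {attempt} timed out at {timeout:g}s")
            timeout *= factor  # increase the timeout threshold before retrying
            if attempt < attempts:
                time.sleep(delay)  # brief pause between attempts
            continue
        actions.append(f"attempt {attempt} succeeded with timeout {timeout:g}s")
        return {"strategy": "retry", "actions_taken": actions, "success": True}
    return {"strategy": "retry", "actions_taken": actions, "success": False}
```

Capping the total number of attempts keeps a persistent fault from turning into an endless retry loop. When `success` is false, the caller can move on to another strategy.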
### 4. Validate

- Verify error resolution
- Check system stability
- Validate agent functionality
- Confirm no side effects

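The Validate step ends in a `validation_status`. The sketch below shows one way to derive it. It assumes each validation listed above (error resolved, system stable, agent functional, no side effects) is exposed as a zero-argument probe that returns `True` on success. Mapping some failures to `partial` and all failures to `failed` is this sketch's own convention; the skill does not specify it.

```python
from typing import Callable


def validate_recovery(checks: dict[str, Callable[[], bool]]) -> tuple[str, list[str]]:
    """Run named post-recovery probes and map the outcome to a validation_status."""
    if not checks:
        raise ValueError("at least one validation check is required")
    failed = []
    for name, check in checks.items():
        try:
            ok = bool(check())
        except Exception:
            ok = False  # a probe that crashes counts as a failed check
        if not ok:
            failed.append(name)
    if not failed:
        return "success", failed
    if len(failed) < len(checks):
        return "partial", failed
    return "failed", failed
```

The probes could be named `error_resolved`, `system_stable`, `agent_functional`, and `no_side_effects`. Those names are hypothetical.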
## Constraints

- **MUST NOT** modify critical system files
- **MUST NOT** exceed 60 seconds for error diagnosis
- **MUST** preserve error logs for analysis
- **MUST** validate the recovery plan before applying it
- **MUST** roll back on recovery failure

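Two of the constraints above combine into a guarded apply: validate the recovery plan before applying it, and roll back when recovery fails. In this sketch, `validate_plan`, `snapshot`, `apply`, and `restore` are hypothetical callables supplied by the caller. A `validate_plan` implementation is also the natural place to reject actions that would touch critical system files. The sketch does not enforce the 60-second diagnosis limit.

```python
def guarded_recovery(plan, validate_plan, snapshot, apply, restore):
    """Validate a recovery plan, apply it step by step, and roll back if any step fails."""
    problems = validate_plan(plan)
    if problems:
        # Validate before applying: an invalid plan is never started.
        return {"success": False, "actions_taken": [], "issues": list(problems)}
    saved = snapshot()  # state to return to if recovery fails
    actions = []
    try:
        for action in plan:
            apply(action)
            actions.append(action)
    except Exception as exc:
        # Roll back on recovery failure: restore the pre-recovery state.
        restore(saved)
        actions.append("rollback")
        return {"success": False, "actions_taken": actions, "issues": [str(exc)]}
    return {"success": True, "actions_taken": actions, "issues": []}
```

Taking the snapshot only after validation passes means rejected plans cost nothing.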
## Environment Assumptions

- Agent logs accessible at `/var/log/aitbc/`
- Error tracking system functional
- Recovery procedures documented
- Agent state persistence available
- System monitoring active

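Of the Environment Assumptions above, only log access can be checked locally. The error tracking, persistence, and monitoring services would need their own probes. Below is a minimal preflight check, assuming the caller may override the default log directory. The function name `preflight` is hypothetical.

```python
import os


def preflight(log_dir="/var/log/aitbc/"):
    """Check the locally verifiable environment assumptions; return a list of failures."""
    failures = []
    if not os.path.isdir(log_dir):
        failures.append(f"log directory missing: {log_dir}")
    # Reading files inside a directory needs both read and execute permission on it.
    elif not os.access(log_dir, os.R_OK | os.X_OK):
        failures.append(f"log directory not readable: {log_dir}")
    return failures
```

An empty list means the log directory exists and is readable by the current user.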
## Error Handling

- Recovery failure → Attempt alternative recovery strategy
- Multiple errors → Prioritize by severity
- Unknown error type → Apply generic recovery procedure
- System instability → Emergency rollback

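The Error Handling rules above can be combined into one loop:

1. Sort the errors most-severe first.
2. Walk a per-type chain of fallback strategies.
3. Route unknown error types to a generic chain.
4. Stop for an emergency rollback if the system becomes unstable.

The `FALLBACKS` table and the `attempt` and `system_stable` callables are hypothetical placeholders for real recovery and health-check hooks.

```python
SEVERITY_ORDER = {"critical": 0, "high": 1, "medium": 2, "low": 3}

# Hypothetical fallback chains per error type; "generic" covers unknown types.
FALLBACKS = {
    "execution": ["retry", "rollback", "manual"],
    "communication": ["retry", "manual"],
    "configuration": ["rollback", "manual"],
    "timeout": ["retry", "rollback"],
    "generic": ["retry", "manual"],
}


def handle_errors(errors, attempt, system_stable):
    """Handle errors most-severe first, falling back through strategies on failure."""
    actions = []
    for err in sorted(errors, key=lambda e: SEVERITY_ORDER[e["severity"]]):
        # Unknown error types fall back to the generic chain.
        chain = FALLBACKS.get(err["type"], FALLBACKS["generic"])
        for strategy in chain:
            ok = attempt(err, strategy)
            actions.append(f"{err['type']}:{strategy}:{'ok' if ok else 'failed'}")
            if ok:
                break
            if not system_stable():
                # System instability: stop and hand over to an emergency rollback.
                return {"actions_taken": actions, "emergency_rollback": True}
    return {"actions_taken": actions, "emergency_rollback": False}
```

Python's `sorted` is stable, so errors of equal severity are handled in the order they were reported.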
## Example Usage Prompt

```
Diagnose and recover from execution errors in main agent
```

## Expected Output Example

```json
{
  "summary": "Error diagnosed and recovered successfully in main agent",
  "operation": "recover",
  "agent": "main",
  "error_detected": {
    "type": "execution",
    "severity": "high",
    "timestamp": 1775811500,
    "context": "Transaction processing timeout during blockchain sync"
  },
  "diagnosis": {
    "root_cause": "Network latency causing P2P sync timeout",
    "affected_components": ["p2p_network", "transaction_processor"],
    "impact_assessment": "Delayed transaction processing, no data loss"
  },
  "recovery_applied": {
    "strategy": "retry",
    "actions_taken": ["Increased timeout threshold", "Retried transaction processing"],
    "success": true
  },
  "issues": [],
  "recommendations": ["Monitor network latency for future occurrences", "Consider implementing adaptive timeout"],
  "confidence": 1.0,
  "execution_time": 18.3,
  "validation_status": "success"
}
```

## Model Routing Suggestion

**Reasoning Model** (Claude Sonnet, GPT-4)

- Complex error diagnosis
- Root cause analysis
- Recovery strategy selection
- Impact assessment

**Performance Notes**

- **Execution Time**: 5-30 seconds for detection, 15-45 seconds for diagnosis, 10-60 seconds for recovery
- **Memory Usage**: <150 MB for error handling operations
- **Network Requirements**: Agent communication for error context
- **Concurrency**: Safe to handle errors on different agents sequentially