Some checks failed
- Mark Phase 2 as completed with all 11/11 atomic skills created - Update skill counts: AITBC skills (6/6), OpenClaw skills (5/5) - Move aitbc-node-coordinator and aitbc-analytics-analyzer from remaining to completed - Update Phase 3 status from PLANNED to IN PROGRESS - Add Gitea-based node synchronization documentation (replaces SCP) - Clarify two-node architecture with same port (8006) on different I
4.0 KiB
4.0 KiB
description, title, version
| description | title | version |
|---|---|---|
| Atomic OpenClaw error detection and recovery procedures with deterministic outputs | openclaw-error-handler | 1.0 |
OpenClaw Error Handler
Purpose
Detect, diagnose, and recover from errors in OpenClaw agent operations with systematic error handling and recovery procedures.
Activation
Trigger when user requests error handling: error diagnosis, recovery procedures, error analysis, or system health checks.
Input
{
"operation": "detect|diagnose|recover|analyze",
"agent": "agent_name",
"error_type": "execution|communication|configuration|timeout|unknown",
"error_context": "string (optional)",
"recovery_strategy": "auto|manual|rollback|retry"
}
Output
{
"summary": "Error handling operation completed successfully",
"operation": "detect|diagnose|recover|analyze",
"agent": "agent_name",
"error_detected": {
"type": "string",
"severity": "critical|high|medium|low",
"timestamp": "number",
"context": "string"
},
"diagnosis": {
"root_cause": "string",
"affected_components": ["component1", "component2"],
"impact_assessment": "string"
},
"recovery_applied": {
"strategy": "string",
"actions_taken": ["action1", "action2"],
"success": "boolean"
},
"issues": [],
"recommendations": [],
"confidence": 1.0,
"execution_time": "number",
"validation_status": "success|partial|failed"
}
Process
1. Analyze
- Scan agent logs for errors
- Identify error patterns
- Assess error severity
- Determine error scope
2. Diagnose
- Analyze root cause
- Trace error propagation
- Identify affected components
- Assess impact
3. Execute Recovery
- Select recovery strategy
- Apply recovery actions
- Monitor recovery progress
- Validate recovery success
4. Validate
- Verify error resolution
- Check system stability
- Validate agent functionality
- Confirm no side effects
Constraints
- MUST NOT modify critical system files
- MUST NOT exceed 60 seconds for error diagnosis
- MUST preserve error logs for analysis
- MUST validate recovery before applying
- MUST rollback on recovery failure
Environment Assumptions
- Agent logs accessible at
/var/log/aitbc/ - Error tracking system functional
- Recovery procedures documented
- Agent state persistence available
- System monitoring active
Error Handling
- Recovery failure → Attempt alternative recovery strategy
- Multiple errors → Prioritize by severity
- Unknown error type → Apply generic recovery procedure
- System instability → Emergency rollback
Example Usage Prompt
Diagnose and recover from execution errors in main agent
Expected Output Example
{
"summary": "Error diagnosed and recovered successfully in main agent",
"operation": "recover",
"agent": "main",
"error_detected": {
"type": "execution",
"severity": "high",
"timestamp": 1775811500,
"context": "Transaction processing timeout during blockchain sync"
},
"diagnosis": {
"root_cause": "Network latency causing P2P sync timeout",
"affected_components": ["p2p_network", "transaction_processor"],
"impact_assessment": "Delayed transaction processing, no data loss"
},
"recovery_applied": {
"strategy": "retry",
"actions_taken": ["Increased timeout threshold", "Retried transaction processing"],
"success": true
},
"issues": [],
"recommendations": ["Monitor network latency for future occurrences", "Consider implementing adaptive timeout"],
"confidence": 1.0,
"execution_time": 18.3,
"validation_status": "success"
}
Model Routing Suggestion
Reasoning Model (Claude Sonnet, GPT-4)
- Complex error diagnosis
- Root cause analysis
- Recovery strategy selection
- Impact assessment
Performance Notes
- Execution Time: 5-30 seconds for detection, 15-45 seconds for diagnosis, 10-60 seconds for recovery
- Memory Usage: <150MB for error handling operations
- Network Requirements: Agent communication for error context
- Concurrency: Safe for sequential error handling on different agents