Files
aitbc/.windsurf/skills/openclaw-error-handler.md
aitbc 084dcdef31
Some checks failed
Security Scanning / security-scan (push) Has been cancelled
Documentation Validation / validate-docs (push) Has been cancelled
Integration Tests / test-service-integration (push) Has been cancelled
Python Tests / test-python (push) Has been cancelled
docs: update refactoring summary and mastery plan to reflect completion of all 11 atomic skills
- Mark Phase 2 as completed with all 11/11 atomic skills created
- Update skill counts: AITBC skills (6/6), OpenClaw skills (5/5)
- Move aitbc-node-coordinator and aitbc-analytics-analyzer from remaining to completed
- Update Phase 3 status from PLANNED to IN PROGRESS
- Add Gitea-based node synchronization documentation (replaces SCP)
- Clarify two-node architecture with same port (8006) on different I
2026-04-10 12:46:09 +02:00

152 lines
4.0 KiB
Markdown

---
description: Atomic OpenClaw error detection and recovery procedures with deterministic outputs
title: openclaw-error-handler
version: 1.0
---
# OpenClaw Error Handler
## Purpose
Detect, diagnose, and recover from errors in OpenClaw agent operations with systematic error handling and recovery procedures.
## Activation
Trigger when user requests error handling: error diagnosis, recovery procedures, error analysis, or system health checks.
## Input
```json
{
"operation": "detect|diagnose|recover|analyze",
"agent": "agent_name",
"error_type": "execution|communication|configuration|timeout|unknown",
"error_context": "string (optional)",
"recovery_strategy": "auto|manual|rollback|retry"
}
```
## Output
```json
{
"summary": "Error handling operation completed successfully",
"operation": "detect|diagnose|recover|analyze",
"agent": "agent_name",
"error_detected": {
"type": "string",
"severity": "critical|high|medium|low",
"timestamp": "number",
"context": "string"
},
"diagnosis": {
"root_cause": "string",
"affected_components": ["component1", "component2"],
"impact_assessment": "string"
},
"recovery_applied": {
"strategy": "string",
"actions_taken": ["action1", "action2"],
"success": "boolean"
},
"issues": [],
"recommendations": [],
"confidence": 1.0,
"execution_time": "number",
"validation_status": "success|partial|failed"
}
```
## Process
### 1. Analyze
- Scan agent logs for errors
- Identify error patterns
- Assess error severity
- Determine error scope
### 2. Diagnose
- Analyze root cause
- Trace error propagation
- Identify affected components
- Assess impact
### 3. Execute Recovery
- Select recovery strategy
- Apply recovery actions
- Monitor recovery progress
- Validate recovery success
### 4. Validate
- Verify error resolution
- Check system stability
- Validate agent functionality
- Confirm no side effects
## Constraints
- **MUST NOT** modify critical system files
- **MUST NOT** exceed 60 seconds for error diagnosis
- **MUST** preserve error logs for analysis
- **MUST** validate recovery before applying
- **MUST** rollback on recovery failure
## Environment Assumptions
- Agent logs accessible at `/var/log/aitbc/`
- Error tracking system functional
- Recovery procedures documented
- Agent state persistence available
- System monitoring active
## Error Handling
- Recovery failure → Attempt alternative recovery strategy
- Multiple errors → Prioritize by severity
- Unknown error type → Apply generic recovery procedure
- System instability → Emergency rollback
## Example Usage Prompt
```
Diagnose and recover from execution errors in main agent
```
## Expected Output Example
```json
{
"summary": "Error diagnosed and recovered successfully in main agent",
"operation": "recover",
"agent": "main",
"error_detected": {
"type": "execution",
"severity": "high",
"timestamp": 1775811500,
"context": "Transaction processing timeout during blockchain sync"
},
"diagnosis": {
"root_cause": "Network latency causing P2P sync timeout",
"affected_components": ["p2p_network", "transaction_processor"],
"impact_assessment": "Delayed transaction processing, no data loss"
},
"recovery_applied": {
"strategy": "retry",
"actions_taken": ["Increased timeout threshold", "Retried transaction processing"],
"success": true
},
"issues": [],
"recommendations": ["Monitor network latency for future occurrences", "Consider implementing adaptive timeout"],
"confidence": 1.0,
"execution_time": 18.3,
"validation_status": "success"
}
```
## Model Routing Suggestion
**Reasoning Model** (Claude Sonnet, GPT-4)
- Complex error diagnosis
- Root cause analysis
- Recovery strategy selection
- Impact assessment
**Performance Notes**
- **Execution Time**: 5-30 seconds for detection, 15-45 seconds for diagnosis, 10-60 seconds for recovery
- **Memory Usage**: <150MB for error handling operations
- **Network Requirements**: Agent communication for error context
- **Concurrency**: Safe for sequential error handling on different agents