aitbc/.windsurf/skills/openclaw-error-handler.md

---
description: Atomic OpenClaw error detection and recovery procedures with deterministic outputs
title: openclaw-error-handler
version: 1.0
---

# OpenClaw Error Handler

## Purpose
Detect, diagnose, and recover from errors in OpenClaw agent operations with systematic error handling and recovery procedures.

## Activation
Trigger when user requests error handling: error diagnosis, recovery procedures, error analysis, or system health checks.

## Input
```json
{
  "operation": "detect|diagnose|recover|analyze",
  "agent": "agent_name",
  "error_type": "execution|communication|configuration|timeout|unknown",
  "error_context": "string (optional)",
  "recovery_strategy": "auto|manual|rollback|retry"
}
```

## Output
```json
{
  "summary": "Error handling operation completed successfully",
  "operation": "detect|diagnose|recover|analyze",
  "agent": "agent_name",
  "error_detected": {
    "type": "string",
    "severity": "critical|high|medium|low",
    "timestamp": "number",
    "context": "string"
  },
  "diagnosis": {
    "root_cause": "string",
    "affected_components": ["component1", "component2"],
    "impact_assessment": "string"
  },
  "recovery_applied": {
    "strategy": "string",
    "actions_taken": ["action1", "action2"],
    "success": "boolean"
  },
  "issues": [],
  "recommendations": [],
  "confidence": 1.0,
  "execution_time": "number",
  "validation_status": "success|partial|failed"
}
```

## Process

### 1. Analyze
- Scan agent logs for errors
- Identify error patterns
- Assess error severity
- Determine error scope

### 2. Diagnose
- Analyze root cause
- Trace error propagation
- Identify affected components
- Assess impact

### 3. Execute Recovery
- Select recovery strategy
- Apply recovery actions
- Monitor recovery progress
- Validate recovery success

### 4. Validate
- Verify error resolution
- Check system stability
- Validate agent functionality
- Confirm no side effects

## Constraints
- **MUST NOT** modify critical system files
- **MUST NOT** exceed 60 seconds for error diagnosis
- **MUST** preserve error logs for analysis
- **MUST** validate recovery before applying
- **MUST** rollback on recovery failure

## Environment Assumptions
- Agent logs accessible at `/var/log/aitbc/`
- Error tracking system functional
- Recovery procedures documented
- Agent state persistence available
- System monitoring active

## Error Handling
- Recovery failure → Attempt alternative recovery strategy
- Multiple errors → Prioritize by severity
- Unknown error type → Apply generic recovery procedure
- System instability → Emergency rollback

## Example Usage Prompt

```
Diagnose and recover from execution errors in main agent
```

## Expected Output Example

```json
{
  "summary": "Error diagnosed and recovered successfully in main agent",
  "operation": "recover",
  "agent": "main",
  "error_detected": {
    "type": "execution",
    "severity": "high",
    "timestamp": 1775811500,
    "context": "Transaction processing timeout during blockchain sync"
  },
  "diagnosis": {
    "root_cause": "Network latency causing P2P sync timeout",
    "affected_components": ["p2p_network", "transaction_processor"],
    "impact_assessment": "Delayed transaction processing, no data loss"
  },
  "recovery_applied": {
    "strategy": "retry",
    "actions_taken": ["Increased timeout threshold", "Retried transaction processing"],
    "success": true
  },
  "issues": [],
  "recommendations": ["Monitor network latency for future occurrences", "Consider implementing adaptive timeout"],
  "confidence": 1.0,
  "execution_time": 18.3,
  "validation_status": "success"
}
```

## Model Routing Suggestion

**Reasoning Model** (Claude Sonnet, GPT-4)
- Complex error diagnosis
- Root cause analysis
- Recovery strategy selection
- Impact assessment

**Performance Notes**
- **Execution Time**: 5-30 seconds for detection, 15-45 seconds for diagnosis, 10-60 seconds for recovery
- **Memory Usage**: <150MB for error handling operations
- **Network Requirements**: Agent communication for error context
- **Concurrency**: Safe for sequential error handling on different agents