Files
aitbc/.windsurf/skills/openclaw-error-handler.md
aitbc 084dcdef31
Some checks failed
Security Scanning / security-scan (push) Has been cancelled
Documentation Validation / validate-docs (push) Has been cancelled
Integration Tests / test-service-integration (push) Has been cancelled
Python Tests / test-python (push) Has been cancelled
docs: update refactoring summary and mastery plan to reflect completion of all 11 atomic skills
- Mark Phase 2 as completed with all 11/11 atomic skills created
- Update skill counts: AITBC skills (6/6), OpenClaw skills (5/5)
- Move aitbc-node-coordinator and aitbc-analytics-analyzer from remaining to completed
- Update Phase 3 status from PLANNED to IN PROGRESS
- Add Gitea-based node synchronization documentation (replaces SCP)
- Clarify two-node architecture with same port (8006) on different I
2026-04-10 12:46:09 +02:00

4.0 KiB

description, title, version
description title version
Atomic OpenClaw error detection and recovery procedures with deterministic outputs openclaw-error-handler 1.0

OpenClaw Error Handler

Purpose

Detect, diagnose, and recover from errors in OpenClaw agent operations with systematic error handling and recovery procedures.

Activation

Trigger when user requests error handling: error diagnosis, recovery procedures, error analysis, or system health checks.

Input

{
  "operation": "detect|diagnose|recover|analyze",
  "agent": "agent_name",
  "error_type": "execution|communication|configuration|timeout|unknown",
  "error_context": "string (optional)",
  "recovery_strategy": "auto|manual|rollback|retry"
}

Output

{
  "summary": "Error handling operation completed successfully",
  "operation": "detect|diagnose|recover|analyze",
  "agent": "agent_name",
  "error_detected": {
    "type": "string",
    "severity": "critical|high|medium|low",
    "timestamp": "number",
    "context": "string"
  },
  "diagnosis": {
    "root_cause": "string",
    "affected_components": ["component1", "component2"],
    "impact_assessment": "string"
  },
  "recovery_applied": {
    "strategy": "string",
    "actions_taken": ["action1", "action2"],
    "success": "boolean"
  },
  "issues": [],
  "recommendations": [],
  "confidence": 1.0,
  "execution_time": "number",
  "validation_status": "success|partial|failed"
}

Process

1. Analyze

  • Scan agent logs for errors
  • Identify error patterns
  • Assess error severity
  • Determine error scope

2. Diagnose

  • Analyze root cause
  • Trace error propagation
  • Identify affected components
  • Assess impact

3. Execute Recovery

  • Select recovery strategy
  • Apply recovery actions
  • Monitor recovery progress
  • Validate recovery success

4. Validate

  • Verify error resolution
  • Check system stability
  • Validate agent functionality
  • Confirm no side effects

Constraints

  • MUST NOT modify critical system files
  • MUST NOT exceed 60 seconds for error diagnosis
  • MUST preserve error logs for analysis
  • MUST validate recovery before applying
  • MUST rollback on recovery failure

Environment Assumptions

  • Agent logs accessible at /var/log/aitbc/
  • Error tracking system functional
  • Recovery procedures documented
  • Agent state persistence available
  • System monitoring active

Error Handling

  • Recovery failure → Attempt alternative recovery strategy
  • Multiple errors → Prioritize by severity
  • Unknown error type → Apply generic recovery procedure
  • System instability → Emergency rollback

Example Usage Prompt

Diagnose and recover from execution errors in main agent

Expected Output Example

{
  "summary": "Error diagnosed and recovered successfully in main agent",
  "operation": "recover",
  "agent": "main",
  "error_detected": {
    "type": "execution",
    "severity": "high",
    "timestamp": 1775811500,
    "context": "Transaction processing timeout during blockchain sync"
  },
  "diagnosis": {
    "root_cause": "Network latency causing P2P sync timeout",
    "affected_components": ["p2p_network", "transaction_processor"],
    "impact_assessment": "Delayed transaction processing, no data loss"
  },
  "recovery_applied": {
    "strategy": "retry",
    "actions_taken": ["Increased timeout threshold", "Retried transaction processing"],
    "success": true
  },
  "issues": [],
  "recommendations": ["Monitor network latency for future occurrences", "Consider implementing adaptive timeout"],
  "confidence": 1.0,
  "execution_time": 18.3,
  "validation_status": "success"
}

Model Routing Suggestion

Reasoning Model (Claude Sonnet, GPT-4)

  • Complex error diagnosis
  • Root cause analysis
  • Recovery strategy selection
  • Impact assessment

Performance Notes

  • Execution Time: 5-30 seconds for detection, 15-45 seconds for diagnosis, 10-60 seconds for recovery
  • Memory Usage: <150MB for error handling operations
  • Network Requirements: Agent communication for error context
  • Concurrency: Safe for sequential error handling on different agents