aitbc/.windsurf/skills/log-monitor.md
2026-04-22 14:20:49 +02:00

description: Autonomous AI skill for monitoring journalctl and logfiles across all AITBC nodes
title: AITBC Log Monitor
version: 1.0

AITBC Log Monitor Skill

Purpose

Autonomous AI skill for real-time monitoring of journalctl logs and AITBC logfiles across all nodes (aitbc, aitbc1, gitea-runner). Provides error detection, alerting, and cross-node log correlation for aitbc-* systemd services and application logs.

Activation

Activate this skill when:

  • Real-time log monitoring is needed across all AITBC nodes
  • Error detection and alerting are required for aitbc-* services
  • Cross-node log correlation is needed for troubleshooting
  • Service health monitoring is required
  • Log analysis for debugging or investigation is needed

Input Schema

{
  "monitoring_mode": {
    "type": "string",
    "enum": ["realtime", "historical", "error_only", "full"],
    "description": "Monitoring mode for logs"
  },
  "services": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Specific aitbc-* services to monitor (empty = all services)"
  },
  "nodes": {
    "type": "array",
    "items": {"type": "string", "enum": ["aitbc", "aitbc1", "gitea-runner", "all"]},
    "description": "Nodes to monitor (default: all)"
  },
  "log_paths": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Additional log paths to monitor in /var/log/aitbc/"
  },
  "error_keywords": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Keywords to trigger error alerts (default: ERROR, CRITICAL, FAILED, exception)"
  },
  "alert_threshold": {
    "type": "integer",
    "default": 5,
    "minimum": 3,
    "description": "Number of errors before triggering alert (minimum 3, see Constraints)"
  },
  "duration": {
    "type": ["integer", "null"],
    "description": "Monitoring duration in seconds (null = indefinite)"
  }
}

Output Schema

{
  "monitoring_status": {
    "type": "string",
    "enum": ["active", "completed", "stopped", "error"]
  },
  "nodes_monitored": {
    "type": "array",
    "items": {"type": "string"}
  },
  "services_monitored": {
    "type": "array",
    "items": {"type": "string"}
  },
  "error_summary": {
    "type": "object",
    "properties": {
      "total_errors": {"type": "integer"},
      "by_service": {"type": "object"},
      "by_node": {"type": "object"},
      "recent_errors": {"type": "array"}
    }
  },
  "alerts_triggered": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "timestamp": {"type": "string"},
        "node": {"type": "string"},
        "service": {"type": "string"},
        "message": {"type": "string"},
        "severity": {"type": "string"}
      }
    }
  },
  "log_samples": {
    "type": "object",
    "description": "Sample log entries from each service"
  },
  "recommendations": {
    "type": "array",
    "items": {"type": "string"}
  }
}

Process

1. Discover aitbc-* Services

# Get list of all aitbc-* services on each node
echo "=== aitbc services ==="
systemctl list-units --all | grep "aitbc-"

echo "=== aitbc1 services ==="
ssh aitbc1 'systemctl list-units --all | grep "aitbc-"'

echo "=== gitea-runner services ==="
ssh gitea-runner 'systemctl list-units --all | grep "aitbc-"'
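
The discovery output can be reduced to bare unit names for use in later steps. The following is a minimal sketch, not part of the skill itself: `list_aitbc_units` is a hypothetical helper name, and it assumes only that unit names in `systemctl list-units` output end in `.service` and start with `aitbc-`.

```shell
# Hypothetical helper: read `systemctl list-units` output on stdin and
# print the unique aitbc-* service unit names, one per line.
# Every column is scanned so that a leading status marker (shown for
# failed units) does not hide the unit name.
list_aitbc_units() {
    awk '{
        for (i = 1; i <= NF; i++) {
            if ($i ~ /^aitbc-.*\.service$/) { print $i; break }
        }
    }' | sort -u
}

# Usage (local node, then a remote node):
#   systemctl list-units --all --type=service --no-pager | list_aitbc_units
#   ssh aitbc1 'systemctl list-units --all --type=service --no-pager' | list_aitbc_units
```

The resulting names could populate `services_monitored` in the output.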

2. Start Journalctl Monitoring (Real-time)

# Monitor all aitbc-* services on each node in parallel
# (ssh -n redirects stdin from /dev/null, which backgrounded ssh requires)
journalctl -f -u "aitbc-*" --no-pager > /tmp/aitbc-journalctl.log 2>&1 &
JOURNALCTL_PID=$!

ssh -n aitbc1 'journalctl -f -u "aitbc-*" --no-pager' > /tmp/aitbc1-journalctl.log 2>&1 &
AITBC1_PID=$!

ssh -n gitea-runner 'journalctl -f -u "aitbc-*" --no-pager' > /tmp/gitea-runner-journalctl.log 2>&1 &
GITEA_RUNNER_PID=$!

3. Monitor Application Logfiles

# Monitor /var/log/aitbc/ logfiles on each node
# (tail -F keeps following a file after it is rotated; ssh -n detaches stdin)
tail -F /var/log/aitbc/*.log > /tmp/aitbc-applogs.log 2>&1 &
APPLOGS_PID=$!

ssh -n aitbc1 'tail -F /var/log/aitbc/*.log' > /tmp/aitbc1-applogs.log 2>&1 &
AITBC1_APPLOGS_PID=$!

ssh -n gitea-runner 'tail -F /var/log/aitbc/*.log' > /tmp/gitea-runner-applogs.log 2>&1 &
GITEA_RUNNER_APPLOGS_PID=$!

4. Error Detection and Alerting

# Monitor logs for error keywords; alert each time a node reaches the threshold
ALERT_THRESHOLD=5
alert_on_errors() {
    count=0
    while IFS= read -r line; do
        echo "[ERROR] $1: $line"
        count=$((count + 1))
        if [ "$count" -ge "$ALERT_THRESHOLD" ]; then
            echo "[ALERT] $1: $count errors reached the alert threshold"
            count=0
        fi
    done
}

tail -f /tmp/aitbc-journalctl.log | grep -E --line-buffered "(ERROR|CRITICAL|FAILED|exception)" | alert_on_errors aitbc &
tail -f /tmp/aitbc1-journalctl.log | grep -E --line-buffered "(ERROR|CRITICAL|FAILED|exception)" | alert_on_errors aitbc1 &
tail -f /tmp/gitea-runner-journalctl.log | grep -E --line-buffered "(ERROR|CRITICAL|FAILED|exception)" | alert_on_errors gitea-runner &
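
When `error_keywords` is supplied, the hard-coded pattern has to be built from the list. One possible sketch follows; `build_error_pattern` is a hypothetical helper, and it assumes keywords should match literally, so regex metacharacters are escaped. With no arguments it falls back to the default keywords from the input schema.

```shell
# Hypothetical helper: join keyword arguments into one extended regex
# for `grep -E`, escaping ERE metacharacters so each keyword is literal.
# The body runs in a subshell so its variables stay local.
build_error_pattern() (
    # Fall back to the documented defaults when no keywords are given.
    [ "$#" -eq 0 ] && set -- ERROR CRITICAL FAILED exception
    pattern=""
    for kw in "$@"; do
        escaped=$(printf '%s' "$kw" | sed 's/[][\.|$(){}?+*^]/\\&/g')
        pattern="${pattern:+$pattern|}$escaped"
    done
    printf '(%s)\n' "$pattern"
)

# Usage:
#   pattern=$(build_error_pattern "state root mismatch" "P2P handshake")
#   tail -f /tmp/aitbc-journalctl.log | grep -E --line-buffered "$pattern"
```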

5. Cross-Node Log Correlation

# Correlate events across nodes by timestamp
# Example: detect if a service fails on all nodes simultaneously
# Check for common error patterns across nodes
# Identify propagation of errors from one node to another
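
The correlation step is only described in comments, so here is one way it could be sketched. It is not a definitive implementation. It assumes each per-node buffer holds `journalctl -o short-iso` lines (ISO 8601 timestamp in the first column, which the real-time commands above would need `-o short-iso` to produce), that node clocks are synchronized, and that "simultaneously" means within the same calendar minute. `correlate_errors` is a hypothetical helper name.

```shell
# Hypothetical helper: print each minute in which error lines appeared on
# two or more nodes. Arguments are node=logfile pairs.
correlate_errors() (
    for pair in "$@"; do
        node=${pair%%=*}
        file=${pair#*=}
        # Emit "<minute> <node>" for every error line in this node's log.
        grep -E "(ERROR|CRITICAL|FAILED|exception)" "$file" |
            awk -v node="$node" '{ print substr($1, 1, 16), node }'
    done | sort -u |
        # After sort -u each (minute, node) pair is unique, so the count
        # per minute is the number of distinct nodes with errors.
        awk '{ n[$1]++; nodes[$1] = nodes[$1] " " $2 }
             END { for (m in n) if (n[m] >= 2) print m ":" nodes[m] }' |
        sort
)

# Usage:
#   correlate_errors aitbc=/tmp/aitbc-journalctl.log aitbc1=/tmp/aitbc1-journalctl.log
```

Fixed one-minute buckets miss events that straddle a minute boundary; a sliding window would catch those at the cost of more code.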

6. Historical Log Analysis (if requested)

# Analyze recent logs for the default error keywords
journalctl -u "aitbc-*" --since "1 hour ago" --no-pager | grep -E "(ERROR|CRITICAL|FAILED|exception)"
ssh aitbc1 'journalctl -u "aitbc-*" --since "1 hour ago" --no-pager' | grep -E "(ERROR|CRITICAL|FAILED|exception)"
ssh gitea-runner 'journalctl -u "aitbc-*" --since "1 hour ago" --no-pager' | grep -E "(ERROR|CRITICAL|FAILED|exception)"
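
To turn historical output into per-service counts for `error_summary.by_service`, the matching lines can be tallied. This is a sketch under stated assumptions: the input is in `journalctl -o short-iso` format, and the syslog identifier in the third column (for example `aitbc-agent-daemon[123]:`) is taken as a stand-in for the service name, which may not hold for every unit. `count_errors_by_source` is a hypothetical helper.

```shell
# Hypothetical helper: read journalctl short-iso lines on stdin and print
# "<count> <identifier>" for lines matching the default error keywords,
# highest count first.
count_errors_by_source() {
    grep -E "(ERROR|CRITICAL|FAILED|exception)" |
        awk '{
            id = $3
            sub(/(\[[0-9]+\])?:$/, "", id)   # strip optional "[pid]" and ":"
            c[id]++
        }
        END { for (id in c) print c[id], id }' |
        sort -k1,1nr -k2,2
}

# Usage:
#   journalctl -u "aitbc-*" --since "1 hour ago" --no-pager -o short-iso | count_errors_by_source
```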

7. Stop Monitoring

# Kill background processes when monitoring duration expires
kill $JOURNALCTL_PID $AITBC1_PID $GITEA_RUNNER_PID
kill $APPLOGS_PID $AITBC1_APPLOGS_PID $GITEA_RUNNER_APPLOGS_PID
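
The stop step can be made tolerant of monitors that have already exited and tied to the requested duration. A minimal sketch, assuming the PID variables from steps 2 and 3 are set in the same shell; `stop_monitors` and `DURATION` are hypothetical names.

```shell
# Hypothetical helper: terminate every PID given as an argument, ignoring
# processes that have already exited, then reap them.
stop_monitors() {
    for pid in "$@"; do
        kill "$pid" 2>/dev/null || true
    done
    for pid in "$@"; do
        wait "$pid" 2>/dev/null || true
    done
}

# Usage sketch: clean up when the script exits, then run for DURATION seconds.
#   trap 'stop_monitors $JOURNALCTL_PID $AITBC1_PID $GITEA_RUNNER_PID \
#         $APPLOGS_PID $AITBC1_APPLOGS_PID $GITEA_RUNNER_APPLOGS_PID' EXIT
#   sleep "$DURATION"
```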

Common aitbc-* Services

Primary Services

  • aitbc-blockchain-node.service - Main blockchain node
  • aitbc-blockchain-p2p.service - P2P network service
  • aitbc-blockchain-rpc.service - RPC API service
  • aitbc-agent-daemon.service - Agent listener daemon
  • aitbc-agent-coordinator.service - Agent coordinator
  • aitbc-agent-registry.service - Agent registry

Secondary Services

  • aitbc-marketplace.service - Marketplace service
  • aitbc-gpu-miner.service - GPU mining service
  • aitbc-monitor.service - System monitoring

Logfile Locations

Application Logs

  • /var/log/aitbc/blockchain-communication-test.log
  • /var/log/aitbc/blockchain-test-errors.log
  • /var/log/aitbc/training*.log
  • /var/log/aitbc/service_monitoring.log
  • /var/log/aitbc/service_alerts.log

Service-Specific Logs

  • /var/log/aitbc/blockchain-node/
  • /var/log/aitbc/agent-coordinator/
  • /var/log/aitbc/agent-registry/
  • /var/log/aitbc/gpu-marketplace/

Error Patterns to Monitor

Critical Errors

  • "FileNotFoundError" - Missing configuration or data files
  • "Permission denied" - File permission issues
  • "Connection refused" - Network connectivity issues
  • "state root mismatch" - Blockchain state corruption
  • "provided invalid or self node_id" - P2P identity conflicts

Warning Patterns

  • "Large sync gap" - Blockchain sync issues
  • "Contract endpoints not available" - Service unavailability
  • "Memory limit exceeded" - Resource exhaustion
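
The two pattern lists above map naturally onto the `severity` field of an alert. A minimal sketch of that mapping, using only the strings listed; `classify_line` is a hypothetical helper, and the `NONE` value for unmatched lines is an assumption.

```shell
# Hypothetical helper: print CRITICAL, WARNING, or NONE for one log line,
# based on the patterns listed in this section.
classify_line() {
    case $1 in
        *FileNotFoundError*|*"Permission denied"*|*"Connection refused"*|\
        *"state root mismatch"*|*"provided invalid or self node_id"*)
            echo CRITICAL ;;
        *"Large sync gap"*|*"Contract endpoints not available"*|*"Memory limit exceeded"*)
            echo WARNING ;;
        *)
            echo NONE ;;
    esac
}

# Usage:
#   classify_line "FileNotFoundError: /var/lib/aitbc/keystore/.agent_daemon_password"
```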

Constraints

  • Maximum monitoring duration: 24 hours unless renewed
  • Cannot monitor more than 50 concurrent log streams
  • Alert threshold cannot be lower than 3 to avoid false positives
  • Must preserve log integrity - cannot modify original logs
  • Monitoring should not impact system performance significantly
  • SSH connections must be established and working for remote nodes
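
The numeric limits above can be checked before any monitor is started. The following sketch assumes an empty duration stands for the `null` (indefinite) case and leaves renewal handling out of scope; `validate_request` is a hypothetical helper.

```shell
# Hypothetical helper: check a request against the numeric constraints.
# Arguments: alert threshold, duration in seconds (empty = indefinite),
# number of concurrent log streams.
# Prints "ok", or the first violated limit with a non-zero return status.
validate_request() {
    threshold=$1
    duration=$2
    streams=$3
    if [ "$threshold" -lt 3 ]; then
        echo "alert_threshold must be at least 3"
        return 1
    fi
    if [ -n "$duration" ] && [ "$duration" -gt 86400 ]; then
        echo "duration exceeds 24 hours (86400 s)"
        return 1
    fi
    if [ "$streams" -gt 50 ]; then
        echo "more than 50 concurrent log streams requested"
        return 1
    fi
    echo "ok"
}

# Usage: three nodes with a journal stream and an application-log stream each.
#   validate_request 5 1800 6
```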

Environment Assumptions

  • SSH access to aitbc1 and gitea-runner configured
  • Log directory: /var/log/aitbc/
  • Systemd services: aitbc-* pattern
  • Journalctl available on all nodes
  • Sufficient disk space for log buffering
  • Network connectivity between nodes for cross-node correlation

Error Handling

SSH Connection Failure

  • Log connection error
  • Mark node as unavailable
  • Continue monitoring other nodes
  • Alert user about connectivity issue
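
A pre-flight check along these lines could implement the behaviour above. It is a sketch, assuming key-based SSH so that `BatchMode=yes` fails fast instead of prompting for a password; `available_nodes` is a hypothetical helper and the 5-second timeout is an arbitrary choice.

```shell
# Hypothetical helper: print the nodes that accept a non-interactive SSH
# login; report unreachable ones on stderr and keep going.
available_nodes() {
    for node in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$node" true </dev/null 2>/dev/null; then
            echo "$node"
        else
            echo "[WARN] $node is unreachable over SSH; skipping it" >&2
        fi
    done
}

# Usage:
#   for node in $(available_nodes aitbc1 gitea-runner); do
#       ssh -n "$node" 'journalctl -f -u "aitbc-*" --no-pager' > "/tmp/$node-journalctl.log" 2>&1 &
#   done
```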

Service Not Found

  • Skip missing services gracefully
  • Log service not found warning
  • Continue monitoring available services

Log File Access Denied

  • Log permission error
  • Check file permissions
  • Alert user if critical logs inaccessible

Buffer Overflow

  • Monitor log buffer size
  • Rotate buffers if needed
  • Alert if disk space insufficient
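
Buffer rotation could be sketched as below. This applies only to the temporary buffers under /tmp, never to the original logs. It assumes the monitor writing to the buffer uses append redirection (`>>`); a writer started with `>` keeps its old file offset after truncation and would leave a gap in the file. `trim_buffer` is a hypothetical helper.

```shell
# Hypothetical helper: if a buffer file is larger than max_bytes, keep only
# its last keep_lines lines. The file is rewritten in place (same inode) so
# an appending writer continues to write into it.
trim_buffer() {
    file=$1
    max_bytes=$2
    keep_lines=$3
    size=$(($(wc -c < "$file")))
    if [ "$size" -gt "$max_bytes" ]; then
        tail -n "$keep_lines" "$file" > "$file.tmp" && cat "$file.tmp" > "$file"
        rm -f "$file.tmp"
    fi
}

# Usage: cap a buffer at roughly 100 MB, keeping the newest 10000 lines.
#   trim_buffer /tmp/aitbc1-journalctl.log 104857600 10000
```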

Example Usage Prompts

Basic Monitoring

"Monitor all aitbc-* services on all nodes in real-time mode."

Error-Only Monitoring

"Monitor for errors only across aitbc and aitbc1 nodes."

Specific Services

"Monitor aitbc-blockchain-node and aitbc-agent-daemon services on all nodes."

Historical Analysis

"Analyze the last 2 hours of logs for errors across all nodes."

Duration-Limited Monitoring

"Monitor all services for 30 minutes and report error summary."

Custom Error Keywords

"Monitor for 'state root mismatch' and 'P2P handshake' errors across all nodes."

Expected Output Example

{
  "monitoring_status": "completed",
  "nodes_monitored": ["aitbc", "aitbc1", "gitea-runner"],
  "services_monitored": ["aitbc-blockchain-node.service", "aitbc-blockchain-p2p.service", "aitbc-agent-daemon.service"],
  "error_summary": {
    "total_errors": 12,
    "by_service": {
      "aitbc-blockchain-node.service": 5,
      "aitbc-agent-daemon.service": 7
    },
    "by_node": {
      "aitbc": 3,
      "aitbc1": 9,
      "gitea-runner": 0
    },
    "recent_errors": [
      {
        "timestamp": "2026-04-22T14:10:15",
        "node": "aitbc1",
        "service": "aitbc-agent-daemon.service",
        "message": "FileNotFoundError: /var/lib/aitbc/keystore/.agent_daemon_password",
        "severity": "CRITICAL"
      }
    ]
  },
  "alerts_triggered": [
    {
      "timestamp": "2026-04-22T14:10:15",
      "node": "aitbc1",
      "service": "aitbc-agent-daemon.service",
      "message": "Agent daemon service failed due to missing keystore file",
      "severity": "CRITICAL"
    }
  ],
  "log_samples": {
    "aitbc-blockchain-node.service": "Latest 10 log entries...",
    "aitbc-agent-daemon.service": "Latest 10 log entries..."
  },
  "recommendations": [
    "Check keystore directory on aitbc1",
    "Verify agent daemon service configuration",
    "Monitor for additional file permission errors"
  ]
}

Model Routing

  • Fast Model: Use for basic monitoring and error detection
  • Reasoning Model: Use for complex log correlation, root cause analysis, cross-node pattern detection

Performance Notes

  • Memory Usage: ~100-200MB for log buffering
  • Network Impact: Minimal for journalctl, moderate for log file tailing
  • CPU Usage: Low for grep-based filtering, moderate for complex correlation
  • Disk Usage: Temporary log buffers (~50-100MB per node)
  • Latency: Near real-time for journalctl (~1-2s delay)