oib/aitbc


aitbc 607ad2d434


Remove debug logging and update default chain to ait-mainnet

2026-04-22 14:20:49 +02:00

11 KiB


description: Autonomous AI skill for monitoring journalctl and logfiles across all AITBC nodes
title: AITBC Log Monitor
version: 1.0

AITBC Log Monitor Skill

Purpose

Autonomous AI skill for real-time monitoring of journalctl logs and AITBC logfiles across all nodes (aitbc, aitbc1, gitea-runner). Provides error detection, alerting, and cross-node log correlation for aitbc-* systemd services and application logs.

Activation

Activate this skill when:

Real-time log monitoring is needed across all AITBC nodes
Error detection and alerting is required for aitbc-* services
Cross-node log correlation is needed for troubleshooting
Service health monitoring is required
Log analysis for debugging or investigation is needed

Input Schema

{
  "monitoring_mode": {
    "type": "string",
    "enum": ["realtime", "historical", "error_only", "full"],
    "description": "Monitoring mode for logs"
  },
  "services": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Specific aitbc-* services to monitor (empty = all services)"
  },
  "nodes": {
    "type": "array",
    "items": {"type": "string", "enum": ["aitbc", "aitbc1", "gitea-runner", "all"]},
    "description": "Nodes to monitor (default: all)"
  },
  "log_paths": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Additional log paths to monitor in /var/log/aitbc/"
  },
  "error_keywords": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Keywords to trigger error alerts (default: ERROR, CRITICAL, FAILED, exception)"
  },
  "alert_threshold": {
    "type": "integer",
    "default": 5,
    "minimum": 3,
    "description": "Number of errors on a node before an alert is triggered (minimum 3)"
  },
  "duration": {
    "type": ["integer", "null"],
    "description": "Monitoring duration in seconds (null = indefinite)"
  }
}
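The error_keywords and alert_threshold inputs have to become a grep pattern and a numeric threshold before the process can use them. A minimal bash sketch, assuming keywords are passed as plain strings; build_error_regex and validate_threshold are hypothetical helper names, and the defaults mirror the schema descriptions and the Constraints section.

```shell
# Join error_keywords into one grep -E alternation, falling back to the defaults.
build_error_regex() {
    local IFS='|'                                   # "$*" joins arguments with '|'
    if [ "$#" -eq 0 ]; then
        set -- ERROR CRITICAL FAILED exception      # documented default keywords
    fi
    # Keywords are joined verbatim: regex metacharacters inside them are not escaped
    printf '(%s)\n' "$*"
}

# Apply the schema default (5) and reject values below the minimum of 3.
validate_threshold() {
    local threshold=${1:-5}
    if [ "$threshold" -lt 3 ]; then
        echo "alert_threshold must be at least 3 (got $threshold)" >&2
        return 1
    fi
    echo "$threshold"
}
```

For the "Custom Error Keywords" prompt, `build_error_regex 'state root mismatch' 'P2P handshake'` yields `(state root mismatch|P2P handshake)`.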

Output Schema

{
  "monitoring_status": {
    "type": "string",
    "enum": ["active", "completed", "stopped", "error"]
  },
  "nodes_monitored": {
    "type": "array",
    "items": {"type": "string"}
  },
  "services_monitored": {
    "type": "array",
    "items": {"type": "string"}
  },
  "error_summary": {
    "type": "object",
    "properties": {
      "total_errors": {"type": "integer"},
      "by_service": {"type": "object"},
      "by_node": {"type": "object"},
      "recent_errors": {"type": "array"}
    }
  },
  "alerts_triggered": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "timestamp": {"type": "string"},
        "node": {"type": "string"},
        "service": {"type": "string"},
        "message": {"type": "string"},
        "severity": {"type": "string"}
      }
    }
  },
  "log_samples": {
    "type": "object",
    "description": "Sample log entries from each service"
  },
  "recommendations": {
    "type": "array",
    "items": {"type": "string"}
  }
}
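The error_summary.by_node field can be filled from the per-node buffer files written in step 2. A minimal bash sketch, assuming the <node>-journalctl.log naming used there and the default error keywords; error_counts_by_node is a hypothetical helper, and it counts matching lines rather than distinct errors.

```shell
# Print a JSON object mapping node name -> number of matching log lines.
error_counts_by_node() {
    local file node n first=1
    printf '{'
    for file in "$@"; do
        node=${file##*/}                    # drop the directory
        node=${node%-journalctl.log}        # drop the suffix, leaving the node name
        # grep -c prints 0 (exit 1) on no match; a missing file yields no output
        n=$(grep -cE '(ERROR|CRITICAL|FAILED|exception)' "$file" 2>/dev/null) || true
        [ -n "$n" ] || n=0
        [ "$first" -eq 1 ] || printf ', '
        printf '"%s": %d' "$node" "$n"
        first=0
    done
    printf '}\n'
}
```

Calling it with /tmp/aitbc-journalctl.log, /tmp/aitbc1-journalctl.log and /tmp/gitea-runner-journalctl.log produces an object shaped like the by_node example in the Expected Output Example.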

Process

1. Discover aitbc-* Services

# Get list of all aitbc-* services on each node
echo "=== aitbc services ==="
systemctl list-units --all | grep "aitbc-"

echo "=== aitbc1 services ==="
ssh aitbc1 'systemctl list-units --all | grep "aitbc-"'

echo "=== gitea-runner services ==="
ssh gitea-runner 'systemctl list-units --all | grep "aitbc-"'
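The three discovery commands differ only in whether they run locally or over SSH. A minimal bash sketch that factors this out, assuming (as the commands above do) that the script runs on the aitbc node itself; run_on and discover_services are hypothetical helpers.

```shell
# Run a shell command on a node: locally for "aitbc", over SSH for the others.
run_on() {
    local node=$1 cmd=$2
    if [ "$node" = "aitbc" ]; then
        bash -c "$cmd"
    else
        # BatchMode fails instead of prompting for a password; ConnectTimeout bounds the wait
        ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" "$cmd"
    fi
}

# Print the aitbc-* unit names (first column) found on each node.
discover_services() {
    local node
    for node in "$@"; do
        echo "=== $node services ==="
        # The pipeline's status is awk's, so the warning fires only when ssh itself fails
        run_on "$node" "systemctl list-units --all --no-legend --plain 'aitbc-*' | awk '{print \$1}'" \
            || echo "[WARN] $node: could not list services"
    done
}
```

Usage: `discover_services aitbc aitbc1 gitea-runner`. Passing the pattern to `systemctl list-units` directly avoids matching "aitbc-" in unrelated unit descriptions, which the grep version can do.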

2. Start Journalctl Monitoring (Real-time)

# Monitor all aitbc-* services on each node in parallel
# (ssh -n keeps backgrounded sessions from reading the local terminal's stdin)
journalctl -f -u "aitbc-*" --no-pager > /tmp/aitbc-journalctl.log 2>&1 &
JOURNALCTL_PID=$!

ssh -n aitbc1 'journalctl -f -u "aitbc-*" --no-pager' > /tmp/aitbc1-journalctl.log 2>&1 &
AITBC1_PID=$!

ssh -n gitea-runner 'journalctl -f -u "aitbc-*" --no-pager' > /tmp/gitea-runner-journalctl.log 2>&1 &
GITEA_RUNNER_PID=$!

3. Monitor Application Logfiles

# Monitor /var/log/aitbc/ logfiles on each node
# (tail -F reopens files after log rotation; the *.log glob is expanded once, at start)
tail -F /var/log/aitbc/*.log > /tmp/aitbc-applogs.log 2>&1 &
APPLOGS_PID=$!

ssh -n aitbc1 'tail -F /var/log/aitbc/*.log' > /tmp/aitbc1-applogs.log 2>&1 &
AITBC1_APPLOGS_PID=$!

ssh -n gitea-runner 'tail -F /var/log/aitbc/*.log' > /tmp/gitea-runner-applogs.log 2>&1 &
GITEA_RUNNER_APPLOGS_PID=$!

4. Error Detection and Alerting

# Monitor logs for error keywords and raise an alert when a node reaches the threshold
ERROR_PATTERN='(ERROR|CRITICAL|FAILED|exception)'
ALERT_THRESHOLD=5   # alert_threshold input (minimum 3)

watch_errors() {
    local node=$1 logfile=$2 count=0
    tail -f "$logfile" | grep -E --line-buffered "$ERROR_PATTERN" | while IFS= read -r line; do
        echo "[ERROR] $node: $line"
        count=$((count + 1))
        if [ "$count" -ge "$ALERT_THRESHOLD" ]; then
            echo "[ALERT] $node: $count errors reached alert threshold ($ALERT_THRESHOLD)"
            count=0   # reset, so the next alert needs another $ALERT_THRESHOLD errors
        fi
    done
}

watch_errors aitbc /tmp/aitbc-journalctl.log &
watch_errors aitbc1 /tmp/aitbc1-journalctl.log &
watch_errors gitea-runner /tmp/gitea-runner-journalctl.log &

5. Cross-Node Log Correlation

# Correlate events across nodes by timestamp
# Example: detect if a service fails on all nodes simultaneously
# Check for common error patterns across nodes
# Identify propagation of errors from one node to another
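The correlation goals above are only comments. One narrow starting point is to find error messages that appear on more than one node. A minimal bash sketch, assuming journalctl's default "short" output ("Apr 22 14:10:15 host ident[pid]: message") and the buffer names from step 2; correlate_errors is a hypothetical helper. It groups by normalized message text only and ignores timing, so it detects shared failure modes, not simultaneity or propagation order.

```shell
# Print "<count> nodes: <node list> | <message>" for errors seen on 2+ nodes.
correlate_errors() {
    local file node
    for file in "$@"; do
        node=${file##*/}
        node=${node%-journalctl.log}
        grep -E '(ERROR|CRITICAL|FAILED|exception)' "$file" \
            | sed -E 's/^([^:]*:){3} ?//' \
            | sed -E 's/[0-9]+/N/g' \
            | sort -u \
            | awk -v n="$node" '{ print n "\t" $0 }'
        # sed #1 strips "Mon DD HH:MM:SS host ident[pid]: "; sed #2 masks numbers
    done | awk -F'\t' '
        { nodes[$2] = nodes[$2] " " $1; count[$2]++ }
        END { for (m in count) if (count[m] > 1) print count[m] " nodes:" nodes[m] " | " m }'
}
```

Usage: `correlate_errors /tmp/aitbc-journalctl.log /tmp/aitbc1-journalctl.log /tmp/gitea-runner-journalctl.log`. Masking digits lets "Connection refused to 10.0.0.2" and "Connection refused to 10.0.0.3" count as the same error, at the cost of occasionally merging messages that differ only in numbers.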

6. Historical Log Analysis (if requested)

# Analyze recent logs for patterns (same default keywords as real-time monitoring)
journalctl -u "aitbc-*" --since "1 hour ago" --no-pager | grep -E "(ERROR|CRITICAL|FAILED|exception)"
ssh aitbc1 'journalctl -u "aitbc-*" --since "1 hour ago" --no-pager' | grep -E "(ERROR|CRITICAL|FAILED|exception)"
ssh gitea-runner 'journalctl -u "aitbc-*" --since "1 hour ago" --no-pager' | grep -E "(ERROR|CRITICAL|FAILED|exception)"

7. Stop Monitoring

# Kill background processes when monitoring duration expires
kill $JOURNALCTL_PID $AITBC1_PID $GITEA_RUNNER_PID
kill $APPLOGS_PID $AITBC1_APPLOGS_PID $GITEA_RUNNER_APPLOGS_PID
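Step 7 assumes something stops the monitors when the duration expires. A minimal bash sketch of that timer, assuming the monitors were started from the same shell so that `wait` can reap them; stop_after is a hypothetical helper.

```shell
# Sleep for the requested duration, then stop and reap the given background PIDs.
stop_after() {
    local seconds=$1
    shift
    sleep "$seconds"
    kill "$@" 2>/dev/null   # ignore monitors that have already exited
    wait "$@" 2>/dev/null   # reap them so no zombie processes remain
    return 0
}
```

For the 30-minute example prompt: `stop_after 1800 $JOURNALCTL_PID $AITBC1_PID $GITEA_RUNNER_PID $APPLOGS_PID $AITBC1_APPLOGS_PID $GITEA_RUNNER_APPLOGS_PID`. When duration is null, skip the call; the 24-hour limit in Constraints corresponds to 86400 seconds.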

Common aitbc-* Services

Primary Services

aitbc-blockchain-node.service - Main blockchain node
aitbc-blockchain-p2p.service - P2P network service
aitbc-blockchain-rpc.service - RPC API service
aitbc-agent-daemon.service - Agent listener daemon
aitbc-agent-coordinator.service - Agent coordinator
aitbc-agent-registry.service - Agent registry

Secondary Services

aitbc-marketplace.service - Marketplace service
aitbc-gpu-miner.service - GPU mining service
aitbc-monitor.service - System monitoring

Logfile Locations

Application Logs

/var/log/aitbc/blockchain-communication-test.log
/var/log/aitbc/blockchain-test-errors.log
/var/log/aitbc/training*.log
/var/log/aitbc/service_monitoring.log
/var/log/aitbc/service_alerts.log

Service-Specific Logs

/var/log/aitbc/blockchain-node/
/var/log/aitbc/agent-coordinator/
/var/log/aitbc/agent-registry/
/var/log/aitbc/gpu-marketplace/
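The step 3 glob (/var/log/aitbc/*.log) only matches top-level files, so the service-specific directories above are not followed. A minimal bash sketch that collects them recursively, assuming those directories hold files ending in .log; list_aitbc_logs is a hypothetical helper.

```shell
# Print every *.log file under the log root, including subdirectories, sorted.
list_aitbc_logs() {
    local root=${1:-/var/log/aitbc}
    find "$root" -type f -name '*.log' 2>/dev/null | sort
}
```

Usage: `mapfile -t logs < <(list_aitbc_logs)`, then `tail -F "${logs[@]}"` if the array is non-empty (tail reads stdin when given no files).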

Error Patterns to Monitor

Critical Errors

"FileNotFoundError" - Missing configuration or data files
"Permission denied" - File permission issues
"Connection refused" - Network connectivity issues
"state root mismatch" - Blockchain state corruption
"provided invalid or self node_id" - P2P identity conflicts

Warning Patterns

"Large sync gap" - Blockchain sync issues
"Contract endpoints not available" - Service unavailability
"Memory limit exceeded" - Resource exhaustion
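The two pattern lists can be applied per log line to assign a severity. A minimal bash sketch, assuming case-sensitive substring matches are enough for these messages; classify_line and the NONE label are hypothetical.

```shell
# Echo CRITICAL, WARNING or NONE for one log line, based on the patterns above.
classify_line() {
    case "$1" in
        *FileNotFoundError*|*"Permission denied"*|*"Connection refused"*|\
        *"state root mismatch"*|*"provided invalid or self node_id"*)
            echo CRITICAL ;;
        *"Large sync gap"*|*"Contract endpoints not available"*|*"Memory limit exceeded"*)
            echo WARNING ;;
        *)
            echo NONE ;;
    esac
}
```

Usage: `while IFS= read -r line; do echo "$(classify_line "$line"): $line"; done < /tmp/aitbc1-journalctl.log`.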

Constraints

Maximum monitoring duration: 24 hours unless renewed
Cannot monitor more than 50 concurrent log streams
Alert threshold cannot be lower than 3 to avoid false positives
Must preserve log integrity - cannot modify original logs
Monitoring should not impact system performance significantly
SSH connections must be established and working for remote nodes

Environment Assumptions

Key-based (non-interactive) SSH access to aitbc1 and gitea-runner is configured; backgrounded ssh sessions cannot answer password prompts
Log directory: /var/log/aitbc/
Systemd services: aitbc-* pattern
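Journalctl available on all nodes, and the monitoring user can read the system journal (root or a member of the systemd-journal group)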
Sufficient disk space for log buffering
Network connectivity between nodes for cross-node correlation

Error Handling

SSH Connection Failure

Log connection error
Mark node as unavailable
Continue monitoring other nodes
Alert user about connectivity issue
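The SSH failure steps above can run as a pre-flight check before monitoring starts. A minimal bash sketch, assuming the script runs on aitbc (so that node needs no SSH probe); check_nodes and the AVAILABLE_NODES/UNAVAILABLE_NODES arrays are hypothetical.

```shell
# Probe each node over SSH; record reachable and unreachable nodes separately.
check_nodes() {
    AVAILABLE_NODES=()
    UNAVAILABLE_NODES=()
    local node
    for node in "$@"; do
        if [ "$node" = "aitbc" ]; then
            AVAILABLE_NODES+=("$node")   # local node: no SSH needed
            continue
        fi
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$node" true >/dev/null 2>&1; then
            AVAILABLE_NODES+=("$node")
        else
            echo "[WARN] SSH to $node failed; marking it unavailable" >&2
            UNAVAILABLE_NODES+=("$node")
        fi
    done
}
```

After `check_nodes aitbc aitbc1 gitea-runner`, start streams only for "${AVAILABLE_NODES[@]}" and report "${UNAVAILABLE_NODES[@]}" to the user.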

Service Not Found

Skip missing services gracefully
Log service not found warning
Continue monitoring available services

Log File Access Denied

Log permission error
Check file permissions
Alert user if critical logs inaccessible

Buffer Overflow

Monitor log buffer size
Rotate buffers if needed
Alert if disk space insufficient
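The disk-space alert can be a simple df check on the buffer directory. A minimal bash sketch; check_buffer_space is a hypothetical helper and its 500 MB default floor is an assumption, not a documented limit.

```shell
# Alert (exit 1) when free space in the buffer directory drops below min_kb.
check_buffer_space() {
    local dir=${1:-/tmp} min_kb=${2:-512000}
    local avail_kb
    # df -Pk prints one line per filesystem; column 4 is available 1K blocks
    avail_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    if [ "$avail_kb" -lt "$min_kb" ]; then
        echo "[ALERT] only ${avail_kb} KB free in $dir (need $min_kb KB)" >&2
        return 1
    fi
    echo "[OK] ${avail_kb} KB free in $dir"
}
```

The current buffer size can be read with `du -ch /tmp/*-journalctl.log /tmp/*-applogs.log | tail -n 1`, which prints the total line.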

Example Usage Prompts

Basic Monitoring

"Monitor all aitbc-* services on all nodes in real-time mode."

Error-Only Monitoring

"Monitor for errors only across aitbc and aitbc1 nodes."

Specific Services

"Monitor aitbc-blockchain-node and aitbc-agent-daemon services on all nodes."

Historical Analysis

"Analyze the last 2 hours of logs for errors across all nodes."

Duration-Limited Monitoring

"Monitor all services for 30 minutes and report error summary."

Custom Error Keywords

"Monitor for 'state root mismatch' and 'P2P handshake' errors across all nodes."

Expected Output Example

{
  "monitoring_status": "completed",
  "nodes_monitored": ["aitbc", "aitbc1", "gitea-runner"],
  "services_monitored": ["aitbc-blockchain-node.service", "aitbc-blockchain-p2p.service", "aitbc-agent-daemon.service"],
  "error_summary": {
    "total_errors": 12,
    "by_service": {
      "aitbc-blockchain-node.service": 5,
      "aitbc-agent-daemon.service": 7
    },
    "by_node": {
      "aitbc": 3,
      "aitbc1": 9,
      "gitea-runner": 0
    },
    "recent_errors": [
      {
        "timestamp": "2026-04-22T14:10:15",
        "node": "aitbc1",
        "service": "aitbc-agent-daemon.service",
        "message": "FileNotFoundError: /var/lib/aitbc/keystore/.agent_daemon_password",
        "severity": "CRITICAL"
      }
    ]
  },
  "alerts_triggered": [
    {
      "timestamp": "2026-04-22T14:10:15",
      "node": "aitbc1",
      "service": "aitbc-agent-daemon.service",
      "message": "Agent daemon service failed due to missing keystore file",
      "severity": "CRITICAL"
    }
  ],
  "log_samples": {
    "aitbc-blockchain-node.service": "Latest 10 log entries...",
    "aitbc-agent-daemon.service": "Latest 10 log entries..."
  },
  "recommendations": [
    "Check keystore directory on aitbc1",
    "Verify agent daemon service configuration",
    "Monitor for additional file permission errors"
  ]
}

Model Routing

Fast Model: Use for basic monitoring and error detection
Reasoning Model: Use for complex log correlation, root cause analysis, cross-node pattern detection

Performance Notes

Memory Usage: ~100-200MB for log buffering
Network Impact: Minimal for journalctl, moderate for log file tailing
CPU Usage: Low for grep-based filtering, moderate for complex correlation
Disk Usage: Temporary log buffers (~50-100MB per node)
Latency: Near real-time for journalctl (~1-2s delay)

Related Skills

blockchain-troubleshoot-recovery - For troubleshooting based on log findings
gitea-runner-log-debugger - For CI-specific log debugging
aitbc-node-coordinator - For cross-node coordination during issues

Related Workflows

AITBC System Architecture Audit - System-wide audit including log analysis
