---
description: Autonomous AI skill for monitoring journalctl and logfiles across all AITBC nodes
title: AITBC Log Monitor
version: 1.0
---

# AITBC Log Monitor Skill

## Purpose
Autonomous AI skill for real-time monitoring of journalctl logs and AITBC logfiles across all nodes (aitbc, aitbc1, gitea-runner). Provides error detection, alerting, and cross-node log correlation for aitbc-* systemd services and application logs.

## Activation
Activate this skill when:
- Real-time log monitoring is needed across all AITBC nodes
- Error detection and alerting are required for aitbc-* services
- Cross-node log correlation is needed for troubleshooting
- Service health monitoring is required
- Log analysis for debugging or investigation is needed

## Input Schema
```json
{
  "monitoring_mode": {
    "type": "string",
    "enum": ["realtime", "historical", "error_only", "full"],
    "description": "Monitoring mode for logs"
  },
  "services": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Specific aitbc-* services to monitor (empty = all services)"
  },
  "nodes": {
    "type": "array",
    "items": {"type": "string", "enum": ["aitbc", "aitbc1", "gitea-runner", "all"]},
    "description": "Nodes to monitor (default: all)"
  },
  "log_paths": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Additional log paths to monitor in /var/log/aitbc/"
  },
  "error_keywords": {
    "type": "array",
    "items": {"type": "string"},
    "description": "Keywords to trigger error alerts (default: ERROR, CRITICAL, FAILED, exception)"
  },
  "alert_threshold": {
    "type": "integer",
    "default": 5,
    "minimum": 3,
    "description": "Number of errors before triggering alert"
  },
  "duration": {
    "type": ["integer", "null"],
    "description": "Monitoring duration in seconds (null = indefinite)"
  }
}
```

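As a minimal sketch of how `error_keywords` might feed the `grep -E` filters used later in the process, the helper below joins keywords into a single alternation and falls back to the documented defaults when none are given. The name `build_error_pattern` is hypothetical, and each keyword is assumed to be a valid extended regular expression.

```bash
# Hypothetical helper: join error keywords into one grep -E alternation.
# With no arguments it falls back to the defaults named in the schema.
build_error_pattern() {
  if [ "$#" -eq 0 ]; then
    set -- ERROR CRITICAL FAILED exception
  fi
  local IFS='|'
  # "$*" joins the arguments with the first character of IFS.
  printf '(%s)\n' "$*"
}
```

For instance, `grep -E --line-buffered "$(build_error_pattern 'state root mismatch' 'P2P handshake')"` would filter for the two custom keywords only.
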
## Output Schema
```json
{
  "monitoring_status": {
    "type": "string",
    "enum": ["active", "completed", "stopped", "error"]
  },
  "nodes_monitored": {
    "type": "array",
    "items": {"type": "string"}
  },
  "services_monitored": {
    "type": "array",
    "items": {"type": "string"}
  },
  "error_summary": {
    "type": "object",
    "properties": {
      "total_errors": {"type": "integer"},
      "by_service": {"type": "object"},
      "by_node": {"type": "object"},
      "recent_errors": {"type": "array"}
    }
  },
  "alerts_triggered": {
    "type": "array",
    "items": {
      "type": "object",
      "properties": {
        "timestamp": {"type": "string"},
        "node": {"type": "string"},
        "service": {"type": "string"},
        "message": {"type": "string"},
        "severity": {"type": "string"}
      }
    }
  },
  "log_samples": {
    "type": "object",
    "description": "Sample log entries from each service"
  },
  "recommendations": {
    "type": "array",
    "items": {"type": "string"}
  }
}
```

## Process

### 1. Discover aitbc-* Services
```bash
# Get list of all aitbc-* services on each node
echo "=== aitbc services ==="
systemctl list-units --all | grep "aitbc-"

echo "=== aitbc1 services ==="
ssh aitbc1 'systemctl list-units --all | grep "aitbc-"'

echo "=== gitea-runner services ==="
ssh gitea-runner 'systemctl list-units --all | grep "aitbc-"'
```

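The discovery output above still contains load and state columns. A minimal sketch of reducing it to bare unit names (for example to fill `services_monitored`) is shown here; `extract_aitbc_units` is a hypothetical name, and the `--type=service --no-legend --plain` flags in the commented invocation are an assumed refinement rather than part of the original commands.

```bash
# Hypothetical helper: read `systemctl list-units` output on stdin and print
# the unique aitbc-* service unit names, one per line.
extract_aitbc_units() {
  awk '{
    # Scan all fields so a leading status bullet on failed units is skipped.
    for (i = 1; i <= NF; i++)
      if ($i ~ /^aitbc-.*\.service$/) { print $i; break }
  }' | sort -u
}

# Assumed invocation (not run here):
#   systemctl list-units --all --type=service --no-legend --plain 'aitbc-*' | extract_aitbc_units
#   ssh aitbc1 "systemctl list-units --all --type=service --no-legend --plain 'aitbc-*'" | extract_aitbc_units
```
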
### 2. Start Journalctl Monitoring (Real-time)
```bash
# Monitor all aitbc-* services on each node in parallel
journalctl -f -u "aitbc-*" --no-pager > /tmp/aitbc-journalctl.log 2>&1 &
JOURNALCTL_PID=$!

# ssh -n redirects stdin from /dev/null so the backgrounded session
# does not try to read from the terminal
ssh -n aitbc1 'journalctl -f -u "aitbc-*" --no-pager' > /tmp/aitbc1-journalctl.log 2>&1 &
AITBC1_PID=$!

ssh -n gitea-runner 'journalctl -f -u "aitbc-*" --no-pager' > /tmp/gitea-runner-journalctl.log 2>&1 &
GITEA_RUNNER_PID=$!
```

### 3. Monitor Application Logfiles
```bash
# Monitor /var/log/aitbc/ logfiles on each node
tail -f /var/log/aitbc/*.log > /tmp/aitbc-applogs.log 2>&1 &
APPLOGS_PID=$!

# ssh -n keeps the backgrounded sessions from reading the terminal's stdin
ssh -n aitbc1 'tail -f /var/log/aitbc/*.log' > /tmp/aitbc1-applogs.log 2>&1 &
AITBC1_APPLOGS_PID=$!

ssh -n gitea-runner 'tail -f /var/log/aitbc/*.log' > /tmp/gitea-runner-applogs.log 2>&1 &
GITEA_RUNNER_APPLOGS_PID=$!
```

### 4. Error Detection and Alerting
```bash
# Monitor logs for error keywords and raise an alert at the threshold
ALERT_THRESHOLD=5

watch_errors() {
  local node="$1" count=0 line
  tail -f "/tmp/${node}-journalctl.log" \
    | grep -E --line-buffered "(ERROR|CRITICAL|FAILED|exception)" \
    | while IFS= read -r line; do
        echo "[ALERT] ${node}: ${line}"
        # Increment error counter
        count=$((count + 1))
        # Trigger alert once the threshold is reached
        if [ "$count" -eq "$ALERT_THRESHOLD" ]; then
          echo "[THRESHOLD] ${node}: ${count} errors detected"
        fi
      done
}

watch_errors aitbc &
watch_errors aitbc1 &
watch_errors gitea-runner &
```

### 5. Cross-Node Log Correlation
```bash
# Correlate events across nodes by timestamp
# Example: detect if a service fails on all nodes simultaneously
# Check for common error patterns across nodes
# Identify propagation of errors from one node to another
```

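The correlation step is described only in comments, so the following is one possible sketch rather than the skill's actual method. It buckets error lines by minute and reports the minutes in which at least two log files contain errors. It assumes the logs were captured with `journalctl -o short-iso` so that each line starts with an ISO 8601 timestamp (the capture commands in step 2 use the default format), and `correlate_minutes` is a hypothetical name.

```bash
# Hypothetical helper: print each minute in which error lines appear in at
# least two of the given log files, followed by the number of files affected.
# Assumes every line starts with an ISO 8601 timestamp (journalctl -o short-iso).
correlate_minutes() {
  awk '
    /ERROR|CRITICAL|FAILED|exception/ {
      minute = substr($1, 1, 16)            # YYYY-MM-DDTHH:MM
      if (!((minute, FILENAME) in seen)) {  # count each file once per minute
        seen[minute, FILENAME] = 1
        nodes[minute]++
      }
    }
    END {
      for (m in nodes)
        if (nodes[m] >= 2) print m, nodes[m]
    }
  ' "$@" | sort
}
```

A call such as `correlate_minutes /tmp/aitbc-journalctl.log /tmp/aitbc1-journalctl.log /tmp/gitea-runner-journalctl.log` would then list candidate time windows for closer inspection. Minute-level bucketing is a deliberate simplification: it tolerates small clock skew between nodes but can miss events that straddle a minute boundary.
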
### 6. Historical Log Analysis (if requested)
```bash
# Analyze recent logs for patterns
journalctl -u "aitbc-*" --since "1 hour ago" --no-pager | grep -E "(ERROR|CRITICAL|FAILED)"
ssh aitbc1 'journalctl -u "aitbc-*" --since "1 hour ago" --no-pager' | grep -E "(ERROR|CRITICAL|FAILED)"
ssh gitea-runner 'journalctl -u "aitbc-*" --since "1 hour ago" --no-pager' | grep -E "(ERROR|CRITICAL|FAILED)"
```

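To turn the raw matches above into per-service counts (as in `error_summary.by_service`), a sketch along these lines may help. It assumes journalctl's default short output format, where the fifth whitespace-separated field is the syslog identifier; that identifier often matches the service name but is not guaranteed to equal the systemd unit name. `count_errors_by_ident` is a hypothetical name.

```bash
# Hypothetical helper: count error lines per syslog identifier from journalctl
# output in its default short format ("Mon DD HH:MM:SS host ident[pid]: msg").
count_errors_by_ident() {
  grep -E "(ERROR|CRITICAL|FAILED)" | awk '{
    ident = $5
    sub(/\[[0-9]+\]:$/, "", ident)   # strip a trailing "[pid]:"
    sub(/:$/, "", ident)             # strip a bare trailing colon
    count[ident]++
  }
  END { for (i in count) print count[i], i }' | sort -k1,1nr -k2
}

# Assumed invocation (not run here):
#   journalctl -u "aitbc-*" --since "1 hour ago" --no-pager | count_errors_by_ident
```
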
### 7. Stop Monitoring
```bash
# Kill background processes when monitoring duration expires
kill $JOURNALCTL_PID $AITBC1_PID $GITEA_RUNNER_PID
kill $APPLOGS_PID $AITBC1_APPLOGS_PID $GITEA_RUNNER_APPLOGS_PID
```

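One way to make the shutdown robust against interruptions is to collect the PIDs in one place and stop them from a `trap`. The sketch below assumes that approach; `MONITOR_PIDS`, `register_monitor`, `stop_monitors`, and `DURATION` are hypothetical names, not part of the original skill.

```bash
# Hypothetical helpers: track background monitor PIDs and stop them together.
MONITOR_PIDS=""

register_monitor() {
  MONITOR_PIDS="$MONITOR_PIDS $1"
}

stop_monitors() {
  local pid
  for pid in $MONITOR_PIDS; do
    # Ignore PIDs that have already exited.
    kill "$pid" 2>/dev/null || true
  done
  MONITOR_PIDS=""
}

# Assumed usage (not run here):
#   journalctl -f -u "aitbc-*" --no-pager > /tmp/aitbc-journalctl.log 2>&1 &
#   register_monitor $!
#   trap stop_monitors EXIT INT TERM   # clean up even if the session is interrupted
#   sleep "$DURATION"                  # DURATION in seconds, from the input schema
```
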
## Common aitbc-* Services

### Primary Services
- aitbc-blockchain-node.service - Main blockchain node
- aitbc-blockchain-p2p.service - P2P network service
- aitbc-blockchain-rpc.service - RPC API service
- aitbc-agent-daemon.service - Agent listener daemon
- aitbc-agent-coordinator.service - Agent coordinator
- aitbc-agent-registry.service - Agent registry

### Secondary Services
- aitbc-marketplace.service - Marketplace service
- aitbc-gpu-miner.service - GPU mining service
- aitbc-monitor.service - System monitoring

## Logfile Locations

### Application Logs
- /var/log/aitbc/blockchain-communication-test.log
- /var/log/aitbc/blockchain-test-errors.log
- /var/log/aitbc/training*.log
- /var/log/aitbc/service_monitoring.log
- /var/log/aitbc/service_alerts.log

### Service-Specific Logs
- /var/log/aitbc/blockchain-node/
- /var/log/aitbc/agent-coordinator/
- /var/log/aitbc/agent-registry/
- /var/log/aitbc/gpu-marketplace/

## Error Patterns to Monitor

### Critical Errors
- "FileNotFoundError" - Missing configuration or data files
- "Permission denied" - File permission issues
- "Connection refused" - Network connectivity issues
- "state root mismatch" - Blockchain state corruption
- "provided invalid or self node_id" - P2P identity conflicts

### Warning Patterns
- "Large sync gap" - Blockchain sync issues
- "Contract endpoints not available" - Service unavailability
- "Memory limit exceeded" - Resource exhaustion

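The two pattern lists above could be applied to individual log lines roughly as follows. This is a sketch only: `classify_line` is a hypothetical name, matching is plain substring matching, and the `NONE` fallback label is an assumption.

```bash
# Hypothetical helper: map a log line to a severity using the patterns above.
classify_line() {
  case "$1" in
    *"FileNotFoundError"*|*"Permission denied"*|*"Connection refused"*|\
    *"state root mismatch"*|*"provided invalid or self node_id"*)
      echo "CRITICAL" ;;
    *"Large sync gap"*|*"Contract endpoints not available"*|*"Memory limit exceeded"*)
      echo "WARNING" ;;
    *)
      echo "NONE" ;;
  esac
}
```

The resulting label could populate the `severity` field of an alert entry.
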
## Constraints
- Maximum monitoring duration: 24 hours unless renewed
- Cannot monitor more than 50 concurrent log streams
- Alert threshold cannot be lower than 3 to avoid false positives
- Must preserve log integrity - cannot modify original logs
- Monitoring should not impact system performance significantly
- SSH connections must be established and working for remote nodes

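The numeric limits in this list can be checked before any monitor is started. The sketch below assumes the duration is given in seconds (so 24 hours is 86400) and that the caller has already counted its log streams; `validate_request` is a hypothetical name.

```bash
# Hypothetical helper: check requested settings against the constraints above.
# Arguments: alert threshold, duration in seconds, number of log streams.
validate_request() {
  local threshold="$1" duration="$2" streams="$3"
  if [ "$threshold" -lt 3 ]; then
    echo "alert_threshold must be at least 3" >&2
    return 1
  fi
  if [ "$duration" -gt 86400 ]; then
    echo "duration exceeds 24 hours; renew the session instead" >&2
    return 1
  fi
  if [ "$streams" -gt 50 ]; then
    echo "too many concurrent log streams (max 50)" >&2
    return 1
  fi
  return 0
}
```

For example, three nodes with one journalctl stream and one application-log stream each give six streams, so `validate_request 5 1800 6` succeeds.
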
## Environment Assumptions
- SSH access to aitbc1 and gitea-runner configured
- Log directory: /var/log/aitbc/
- Systemd services: aitbc-* pattern
- Journalctl available on all nodes
- Sufficient disk space for log buffering
- Network connectivity between nodes for cross-node correlation

## Error Handling

### SSH Connection Failure
- Log connection error
- Mark node as unavailable
- Continue monitoring other nodes
- Alert user about connectivity issue

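The SSH failure steps above might look like this in practice. The sketch probes each remote node before monitoring starts and keeps only the reachable ones. `available_nodes`, `ssh_probe`, and the overridable `PROBE` variable are hypothetical names, and the 5-second timeout is an assumed value.

```bash
# Hypothetical helper: probe each remote node and print the reachable ones.
# Unreachable nodes are reported on stderr and skipped, so monitoring can
# continue on the rest. PROBE can be overridden, e.g. for testing.
PROBE="${PROBE:-ssh_probe}"

ssh_probe() {
  # BatchMode avoids hanging on a password prompt; ConnectTimeout bounds the wait.
  ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$1" true
}

available_nodes() {
  local node
  for node in "$@"; do
    if "$PROBE" "$node" >/dev/null 2>&1; then
      echo "$node"
    else
      echo "[WARN] $node unreachable over SSH; marking as unavailable" >&2
    fi
  done
}

# Assumed usage (not run here):
#   for node in $(available_nodes aitbc1 gitea-runner); do ... start monitors ... done
```
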
### Service Not Found
- Skip missing services gracefully
- Log service not found warning
- Continue monitoring available services

### Log File Access Denied
- Log permission error
- Check file permissions
- Alert user if critical logs inaccessible

### Buffer Overflow
- Monitor log buffer size
- Rotate buffers if needed
- Alert if disk space insufficient

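For the buffer overflow case, a size check on the temporary buffer files is one simple starting point. The sketch below only detects an oversized buffer and leaves the rotation policy to the caller; `check_buffer` is a hypothetical name and the limit is supplied in bytes.

```bash
# Hypothetical helper: succeed if a buffer file is within the size limit
# (in bytes), otherwise warn on stderr and return non-zero.
check_buffer() {
  local file="$1" limit="$2" size
  # Arithmetic expansion strips any padding some wc implementations add.
  size=$(( $(wc -c < "$file") ))
  if [ "$size" -gt "$limit" ]; then
    echo "[WARN] $file is $size bytes (limit $limit)" >&2
    return 1
  fi
  return 0
}

# Assumed usage (not run here), with a 100 MiB limit per buffer:
#   check_buffer /tmp/aitbc-journalctl.log $((100 * 1024 * 1024)) || echo "rotate or stop"
```
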
## Example Usage Prompts

### Basic Monitoring
"Monitor all aitbc-* services on all nodes in real-time mode."

### Error-Only Monitoring
"Monitor for errors only across aitbc and aitbc1 nodes."

### Specific Services
"Monitor aitbc-blockchain-node and aitbc-agent-daemon services on all nodes."

### Historical Analysis
"Analyze the last 2 hours of logs for errors across all nodes."

### Duration-Limited Monitoring
"Monitor all services for 30 minutes and report error summary."

### Custom Error Keywords
"Monitor for 'state root mismatch' and 'P2P handshake' errors across all nodes."

## Expected Output Example
```json
{
  "monitoring_status": "completed",
  "nodes_monitored": ["aitbc", "aitbc1", "gitea-runner"],
  "services_monitored": ["aitbc-blockchain-node.service", "aitbc-blockchain-p2p.service", "aitbc-agent-daemon.service"],
  "error_summary": {
    "total_errors": 12,
    "by_service": {
      "aitbc-blockchain-node.service": 5,
      "aitbc-agent-daemon.service": 7
    },
    "by_node": {
      "aitbc": 3,
      "aitbc1": 9,
      "gitea-runner": 0
    },
    "recent_errors": [
      {
        "timestamp": "2026-04-22T14:10:15",
        "node": "aitbc1",
        "service": "aitbc-agent-daemon.service",
        "message": "FileNotFoundError: /var/lib/aitbc/keystore/.agent_daemon_password",
        "severity": "CRITICAL"
      }
    ]
  },
  "alerts_triggered": [
    {
      "timestamp": "2026-04-22T14:10:15",
      "node": "aitbc1",
      "service": "aitbc-agent-daemon.service",
      "message": "Agent daemon service failed due to missing keystore file",
      "severity": "CRITICAL"
    }
  ],
  "log_samples": {
    "aitbc-blockchain-node.service": "Latest 10 log entries...",
    "aitbc-agent-daemon.service": "Latest 10 log entries..."
  },
  "recommendations": [
    "Check keystore directory on aitbc1",
    "Verify agent daemon service configuration",
    "Monitor for additional file permission errors"
  ]
}
```

## Model Routing
- **Fast Model**: Use for basic monitoring and error detection
- **Reasoning Model**: Use for complex log correlation, root cause analysis, cross-node pattern detection

## Performance Notes
- **Memory Usage**: ~100-200MB for log buffering
- **Network Impact**: Minimal for journalctl, moderate for log file tailing
- **CPU Usage**: Low for grep-based filtering, moderate for complex correlation
- **Disk Usage**: Temporary log buffers (~50-100MB per node)
- **Latency**: Near real-time for journalctl (~1-2s delay)

## Related Skills
- [blockchain-troubleshoot-recovery](/blockchain-troubleshoot-recovery.md) - For troubleshooting based on log findings
- [gitea-runner-log-debugger](/gitea-runner-log-debugger.md) - For CI-specific log debugging
- [aitbc-node-coordinator](/aitbc-node-coordinator.md) - For cross-node coordination during issues

## Related Workflows
- [AITBC System Architecture Audit](/workflows/aitbc-system-architecture-audit.md) - System-wide audit including log analysis