docs: add multi-node journalctl real-time monitoring workflow

- Add comprehensive workflow for real-time monitoring across all three AITBC nodes
- Document 4 monitoring modes: single node, multi-node, filtered, and pattern-specific
- Include quick start commands for warning/error and error-only monitoring
- Add 3 advanced monitoring scripts: timestamps, error counter with alerts, log aggregator
- Document common monitoring scenarios: config changes, chain-specific, block production, RPC bootstrap
- Add

2026-05-09 20:30:44 +02:00


Multi-Node Journalctl Real-Time Monitoring

This workflow monitors systemd journal logs in real time across all three AITBC nodes (aitbc, aitbc1, gitea-runner), filtered to warnings and errors.

Prerequisites

Required Setup

SSH access to all three nodes (aitbc, aitbc1, gitea-runner)
systemd services running on all nodes
Working directory: /opt/aitbc
journalctl access on all nodes

Node Configuration

aitbc (hub for ait-mainnet): localhost
aitbc1 (hub for ait-testnet): ssh aitbc1
gitea-runner (follower): ssh gitea-runner
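
The node layout above can be captured in a small dispatcher so that later commands do not need to repeat the local-versus-SSH distinction. This is a minimal sketch, assuming bash and assuming the node names double as SSH host aliases; `run_on_node` is a hypothetical helper name, not part of the existing workflow.

```shell
#!/bin/bash
# Hypothetical helper: run a command on a node selected by name.
# Assumption: "aitbc" is the local host; any other name is an SSH alias
# (aitbc1, gitea-runner) as listed under Node Configuration.
run_on_node() {
    local node=$1
    shift
    if [ "$node" = "aitbc" ]; then
        # Local node: run the command directly
        "$@"
    else
        # Remote node: printf %q (bash) quotes each argument so that
        # patterns such as 'aitbc-*' reach the remote shell unexpanded
        ssh "$node" "$(printf '%q ' "$@")"
    fi
}

# Example calls (commented out because they need the real nodes):
# run_on_node aitbc  journalctl -n 5 --no-pager -u 'aitbc-*'
# run_on_node aitbc1 journalctl -n 5 --no-pager -u 'aitbc-*'
```

The quoting step matters mainly for the unit glob: without it, the remote shell would be free to expand `aitbc-*` against files in the remote working directory.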

Real-Time Monitoring Modes

Mode 1: Single Node Real-Time Monitoring

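Monitor the blockchain node service on aitbc (local node):

journalctl -fu aitbc-blockchain-node.service

Monitor the same service, warnings and errors only:

journalctl -fu aitbc-blockchain-node.service -p warning

A single -p level selects that level and everything more severe, so -p warning already includes errors. The journalctl man page documents one level or one FROM..TO range per -p, so a single level is used here instead of repeating the flag.

Monitor all aitbc services:

journalctl -fu 'aitbc-*' -p warning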

Mode 2: Multi-Node Real-Time Monitoring

Monitor all blockchain services on all nodes (parallel):

# Terminal 1: aitbc
journalctl -fu 'aitbc-*' -p warning

# Terminal 2: aitbc1
ssh aitbc1 'journalctl -fu "aitbc-*" -p warning'

# Terminal 3: gitea-runner
ssh gitea-runner 'journalctl -fu "aitbc-*" -p warning'

Monitor with node identification:

# aitbc (--output=cat prints only the message text, without timestamp or hostname)
echo "=== MONITORING aitbc ===" && journalctl -fu 'aitbc-*' -p warning --output=cat

# aitbc1
echo "=== MONITORING aitbc1 ===" && ssh aitbc1 'journalctl -fu "aitbc-*" -p warning --output=cat'

# gitea-runner
echo "=== MONITORING gitea-runner ===" && ssh gitea-runner 'journalctl -fu "aitbc-*" -p warning --output=cat'

Mode 3: Filtered Monitoring

Monitor errors and more severe levels only:

# aitbc
journalctl -fu 'aitbc-*' -p err

# aitbc1
ssh aitbc1 'journalctl -fu "aitbc-*" -p err'

# gitea-runner
ssh gitea-runner 'journalctl -fu "aitbc-*" -p err'

Monitor warnings and errors:

# aitbc
journalctl -fu 'aitbc-*' -p warning

# aitbc1
ssh aitbc1 'journalctl -fu "aitbc-*" -p warning'

# gitea-runner
ssh gitea-runner 'journalctl -fu "aitbc-*" -p warning'

Monitor with time filter:

# Show the last hour of logs, then follow new logs
journalctl -fu 'aitbc-*' --since "1 hour ago" -p warning

Mode 4: Pattern-Specific Monitoring

Monitor for specific error patterns:

# Monitor for sync errors
journalctl -fu 'aitbc-*' | grep -i "sync\|error"

# Monitor for RPC bootstrap issues
journalctl -fu 'aitbc-*' | grep -i "bootstrap\|genesis"

# Monitor for P2P issues
journalctl -fu 'aitbc-*' | grep -i "p2p\|peer\|connection"

Multi-node pattern monitoring:

# aitbc
journalctl -fu 'aitbc-*' | grep -i "sync\|error"

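# aitbc1 (--line-buffered makes grep flush each match; over SSH its output is a pipe and would otherwise arrive in delayed blocks)
ssh aitbc1 'journalctl -fu "aitbc-*" | grep --line-buffered -i "sync\|error"'

# gitea-runner
ssh gitea-runner 'journalctl -fu "aitbc-*" | grep --line-buffered -i "sync\|error"'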
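
The pattern filters above can also be applied on the local side of the SSH connection, which keeps one filter definition for all nodes. The following is a sketch only; `filter_logs` is a hypothetical name, and it assumes a grep that supports `--line-buffered` (GNU and BSD grep do).

```shell
#!/bin/bash
# Hypothetical helper: keep only lines from stdin that match a
# case-insensitive extended regular expression.
# --line-buffered flushes every match immediately, which matters when the
# output is piped onward instead of written to a terminal.
filter_logs() {
    grep --line-buffered -i -E -- "$1"
}

# Example wiring (commented out because it follows the journal indefinitely):
# journalctl -fu 'aitbc-*' | filter_logs 'sync|error'
# ssh aitbc1 'journalctl -fu "aitbc-*"' | filter_logs 'p2p|peer|connection'
```

With `-E` the alternation is written as `sync|error` instead of the escaped `sync\|error` form used with basic regular expressions.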

Quick Start Commands

Quick All-Node Warning/Error Monitor

# Start monitoring all nodes for warnings and errors
echo "=== Starting multi-node warning/error monitoring ==="

# Each monitor is started on its own line so that $! is the PID of
# journalctl/ssh itself rather than of a wrapping subshell
journalctl -fu 'aitbc-*' -p warning &
AITBC_PID=$!

ssh aitbc1 'journalctl -fu "aitbc-*" -p warning' &
AITBC1_PID=$!

ssh gitea-runner 'journalctl -fu "aitbc-*" -p warning' &
GITEA_PID=$!

# Report PIDs of the background monitors
echo "Monitoring started. PIDs: aitbc=$AITBC_PID, aitbc1=$AITBC1_PID, gitea-runner=$GITEA_PID"
echo "Press Ctrl+C to stop all monitors"

# Stop all monitors on exit
trap "kill $AITBC_PID $AITBC1_PID $GITEA_PID 2>/dev/null; echo 'Monitoring stopped'" EXIT

wait

Quick Error-Only Monitor

# Monitor errors (and more severe levels) across all nodes
echo "=== Starting multi-node error-only monitoring ==="

# Each monitor is started on its own line so that $! is the PID of
# journalctl/ssh itself rather than of a wrapping subshell
journalctl -fu 'aitbc-*' -p err &
AITBC_PID=$!

ssh aitbc1 'journalctl -fu "aitbc-*" -p err' &
AITBC1_PID=$!

ssh gitea-runner 'journalctl -fu "aitbc-*" -p err' &
GITEA_PID=$!

# Report PIDs of the background monitors
echo "Error monitoring started. PIDs: aitbc=$AITBC_PID, aitbc1=$AITBC1_PID, gitea-runner=$GITEA_PID"
echo "Press Ctrl+C to stop all monitors"

# Stop all monitors on exit
trap "kill $AITBC_PID $AITBC1_PID $GITEA_PID 2>/dev/null; echo 'Error monitoring stopped'" EXIT

wait
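
The two quick-start variants differ only in the priority passed to journalctl, so they could be folded into one parameterized launcher. This is a sketch under assumptions: `monitor_cmd` and `start_all_monitors` are hypothetical names, and the node aliases are taken from Node Configuration.

```shell
#!/bin/bash
# Hypothetical helper: build the follow-mode journalctl command line for a
# given priority (defaults to "warning", which also includes errors).
monitor_cmd() {
    printf "journalctl -fu 'aitbc-*' -p %s" "${1:-warning}"
}

# Hypothetical launcher: start one monitor per node and stop them together.
start_all_monitors() {
    local cmd pids=""
    cmd=$(monitor_cmd "$1")

    bash -c "$cmd" &            # local node (aitbc)
    pids="$pids $!"
    ssh aitbc1 "$cmd" &         # remote hub
    pids="$pids $!"
    ssh gitea-runner "$cmd" &   # remote follower
    pids="$pids $!"

    echo "Monitoring started. PIDs:$pids"
    # $pids is expanded now, while the PIDs are known
    trap "kill $pids 2>/dev/null; echo 'Monitoring stopped'" EXIT
    wait
}

# Usage (commented out because it runs until interrupted):
# start_all_monitors warning
# start_all_monitors err
```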

Advanced Monitoring Scripts

Script 1: Multi-Node Monitor with Timestamps

#!/bin/bash
# multi-node-monitor.sh - Real-time monitoring with timestamps

echo "=== Multi-Node Journalctl Monitor with Timestamps ==="
echo "Press Ctrl+C to stop monitoring"
echo ""

# Function to poll a single node every 5 seconds and print its latest
# entries under a timestamped header.
# Usage: monitor_node NAME COMMAND [ARGS...]
monitor_node() {
    local node_name=$1
    local timestamp
    shift

    while true; do
        timestamp=$(date '+%Y-%m-%d %H:%M:%S')
        echo "[$timestamp] $node_name"
        "$@"
        sleep 5
    done
}

# Journal query shared by all nodes: last 5 entries at warning level or more
# severe; -q suppresses journalctl's informational lines
QUERY="journalctl -u aitbc-blockchain-node.service -n 5 --no-pager -q -p warning"

# Start monitors in background ($QUERY is left unquoted locally so it splits into arguments)
monitor_node "aitbc" $QUERY &
MONITOR1=$!

monitor_node "aitbc1" ssh aitbc1 "$QUERY" &
MONITOR2=$!

monitor_node "gitea-runner" ssh gitea-runner "$QUERY" &
MONITOR3=$!

trap "kill $MONITOR1 $MONITOR2 $MONITOR3 2>/dev/null; echo 'Monitoring stopped'" EXIT

wait

Script 2: Error Counter with Alerts

#!/bin/bash
# error-counter.sh - Count errors and alert on threshold

ERROR_THRESHOLD=10
CHECK_INTERVAL=30

echo "=== Error Counter with Alerts ==="
echo "Threshold: $ERROR_THRESHOLD errors in $CHECK_INTERVAL seconds"
echo ""

while true; do
    echo "=== Error Count Check $(date '+%Y-%m-%d %H:%M:%S') ==="
    
    # Count errors on each node (-q suppresses journalctl's informational lines so they are not counted)
    aitbc_errors=$(journalctl -u aitbc-blockchain-node.service --since "$CHECK_INTERVAL seconds ago" -p err --no-pager -q | wc -l)
    # The remote commands are double-quoted so that $CHECK_INTERVAL expands locally before being sent over SSH
    aitbc1_errors=$(ssh aitbc1 "journalctl -u aitbc-blockchain-node.service --since '$CHECK_INTERVAL seconds ago' -p err --no-pager -q" 2>/dev/null | wc -l)
    gitea_errors=$(ssh gitea-runner "journalctl -u aitbc-blockchain-node.service --since '$CHECK_INTERVAL seconds ago' -p err --no-pager -q" 2>/dev/null | wc -l)
    
    echo "aitbc errors: $aitbc_errors"
    echo "aitbc1 errors: $aitbc1_errors"
    echo "gitea-runner errors: $gitea_errors"
    
    # Alert when a node's count reaches or exceeds the threshold
    if [ "$aitbc_errors" -ge "$ERROR_THRESHOLD" ]; then
        echo "⚠️  ALERT: aitbc error count ($aitbc_errors) reached threshold ($ERROR_THRESHOLD)"
    fi
    
    if [ "$aitbc1_errors" -ge "$ERROR_THRESHOLD" ]; then
        echo "⚠️  ALERT: aitbc1 error count ($aitbc1_errors) reached threshold ($ERROR_THRESHOLD)"
    fi
    
    if [ "$gitea_errors" -ge "$ERROR_THRESHOLD" ]; then
        echo "⚠️  ALERT: gitea-runner error count ($gitea_errors) reached threshold ($ERROR_THRESHOLD)"
    fi
    
    echo ""
    sleep $CHECK_INTERVAL
done
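
The three near-identical threshold checks in the script could be expressed as one function that also reports the breach through its exit status, which makes it usable from other scripts. This is a sketch only; `check_threshold` is a hypothetical name and the message wording is illustrative.

```shell
#!/bin/bash
# Hypothetical helper: compare one node's error count with a threshold.
# Prints an alert and returns 1 when the count reaches or exceeds the
# threshold; prints nothing and returns 0 otherwise.
check_threshold() {
    local node=$1
    local count=$2
    local threshold=$3

    if [ "$count" -ge "$threshold" ]; then
        echo "ALERT: $node error count ($count) reached threshold ($threshold)"
        return 1
    fi
    return 0
}

# Usage inside the polling loop (variable names as in the script above):
# check_threshold aitbc        "$aitbc_errors"  "$ERROR_THRESHOLD"
# check_threshold aitbc1       "$aitbc1_errors" "$ERROR_THRESHOLD"
# check_threshold gitea-runner "$gitea_errors"  "$ERROR_THRESHOLD"
```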

Script 3: Real-Time Log Aggregator

#!/bin/bash
# log-aggregator.sh - Aggregate logs from all nodes in real-time

echo "=== Real-Time Log Aggregator ==="
echo "Press Ctrl+C to stop aggregation"
echo ""

# Create named pipes for each node inside a private temporary directory
# (mktemp -d avoids the name race of mktemp -u)
PIPE_DIR=$(mktemp -d)
PIPE_AITBC="$PIPE_DIR/aitbc"
PIPE_AITBC1="$PIPE_DIR/aitbc1"
PIPE_GITEA="$PIPE_DIR/gitea-runner"

mkfifo "$PIPE_AITBC" "$PIPE_AITBC1" "$PIPE_GITEA"

# Function to read from pipe and add prefix
read_pipe() {
    local prefix=$1
    local pipe=$2
    
    # IFS= and -r keep leading whitespace and backslashes in log lines intact
    while IFS= read -r line; do
        printf '[%s] %s\n' "$prefix" "$line"
    done < "$pipe"
}

# Start journalctl for each node and pipe to named pipes
journalctl -fu 'aitbc-*' -p warning > "$PIPE_AITBC" &
PID_AITBC=$!

ssh aitbc1 'journalctl -fu "aitbc-*" -p warning' > "$PIPE_AITBC1" &
PID_AITBC1=$!

ssh gitea-runner 'journalctl -fu "aitbc-*" -p warning' > "$PIPE_GITEA" &
PID_GITEA=$!

# Start readers for each pipe
read_pipe "aitbc" "$PIPE_AITBC" &
READER1=$!

read_pipe "aitbc1" "$PIPE_AITBC1" &
READER2=$!

read_pipe "gitea-runner" "$PIPE_GITEA" &
READER3=$!

# Cleanup function: stop writers and readers, then remove the pipes
cleanup() {
    kill $PID_AITBC $PID_AITBC1 $PID_GITEA $READER1 $READER2 $READER3 2>/dev/null
    rm -rf "$PIPE_DIR"
    echo "Log aggregation stopped"
}

trap cleanup EXIT

wait

Common Monitoring Scenarios

Scenario 1: Monitor After Configuration Change

# Monitor all nodes for 5 minutes after making changes
timeout 300 bash -c '
journalctl -fu "aitbc-*" -p warning &
AITBC_PID=$!

ssh aitbc1 "journalctl -fu \"aitbc-*\" -p warning" &
AITBC1_PID=$!

ssh gitea-runner "journalctl -fu \"aitbc-*\" -p warning" &
GITEA_PID=$!

trap "kill $AITBC_PID $AITBC1_PID $GITEA_PID 2>/dev/null" EXIT

wait
'
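
When a time-boxed monitor like the one above is wrapped in a script, note that `timeout` exits with status 124 once the limit is reached. For a follow-mode monitor that is the normal outcome, so a wrapper may want to treat it as success. A minimal sketch, with `run_for` as a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: run a command for at most N seconds.
# timeout(1) exits with 124 when the time limit is hit; for a monitor that
# never ends on its own this is expected, so it is mapped to success.
# Any other exit status of the command is passed through unchanged.
run_for() {
    local seconds=$1
    local status=0
    shift

    timeout "$seconds" "$@" || status=$?
    if [ "$status" -eq 124 ]; then
        return 0
    fi
    return "$status"
}

# Usage (commented out because it needs journalctl and runs for 5 minutes):
# run_for 300 journalctl -fu 'aitbc-*' -p warning
```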

Scenario 2: Monitor Specific Chain

# Monitor for chain-specific issues
# aitbc (ait-mainnet hub)
journalctl -fu 'aitbc-*' | grep -i "mainnet\|chain=ait-mainnet"

# aitbc1 (ait-testnet hub)
ssh aitbc1 'journalctl -fu "aitbc-*" | grep --line-buffered -i "testnet\|chain=ait-testnet"'

Scenario 3: Monitor Block Production

# Monitor block production issues
journalctl -fu 'aitbc-*' | grep -i "block.*production\|proposer\|proposed"

# Monitor for sync issues
journalctl -fu 'aitbc-*' | grep -i "sync\|import\|bulk"

Scenario 4: Monitor RPC Bootstrap

# Monitor RPC bootstrap activity
journalctl -fu 'aitbc-*' | grep -i "bootstrap\|genesis\|rpc"

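# Monitor the remote nodes as well (run each command in its own terminal)
ssh aitbc1 'journalctl -fu "aitbc-*" | grep --line-buffered -i "bootstrap\|genesis\|rpc"'
ssh gitea-runner 'journalctl -fu "aitbc-*" | grep --line-buffered -i "bootstrap\|genesis\|rpc"'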

Journalctl Priority Levels

Understanding priority levels for filtering:

emerg (0): System is unusable
alert (1): Action must be taken immediately
crit (2): Critical conditions
err (3): Error conditions
warning (4): Warning conditions
notice (5): Normal but significant condition
info (6): Informational messages
debug (7): Debug-level messages

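Common filtering combinations (a single level selects that level and everything more severe; FROM..TO selects an exact range):

-p err: Errors and more severe (emerg, alert, crit, err)
-p warning: Warnings and more severe, which includes errors
-p notice: Notices, warnings, errors, and more severe
-p crit..err: Critical and error levels only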
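
Given a single level, journalctl's -p option shows that level and all more severe ones. The sketch below spells out which levels a given name selects; `levels_for` is a hypothetical helper written for illustration and does not call journalctl.

```shell
#!/bin/bash
# Hypothetical helper: list the priority levels that "journalctl -p LEVEL"
# selects, i.e. LEVEL itself plus every more severe level.
levels_for() {
    local wanted=$1
    local name
    local selected=""

    # Levels in order of decreasing severity (numeric 0 through 7)
    for name in emerg alert crit err warning notice info debug; do
        selected="${selected:+$selected }$name"
        if [ "$name" = "$wanted" ]; then
            echo "$selected"
            return 0
        fi
    done

    echo "unknown priority: $wanted" >&2
    return 1
}

levels_for warning   # prints: emerg alert crit err warning
levels_for err       # prints: emerg alert crit err
```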

Troubleshooting Monitoring Issues

SSH Connection Issues

# Test SSH connectivity before monitoring
ssh aitbc1 'echo "Connection OK"'
ssh gitea-runner 'echo "Connection OK"'
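
For unattended use, the connectivity test can be made non-interactive so that it fails quickly instead of waiting for a password prompt or a hung connection. This is a sketch under assumptions: `check_nodes` is a hypothetical name, and the `SSH_CMD` override exists only so the function can be exercised without real hosts. `BatchMode` and `ConnectTimeout` are standard OpenSSH client options.

```shell
#!/bin/bash
# Hypothetical helper: report which nodes are reachable over SSH.
# BatchMode=yes disables password prompts; ConnectTimeout=5 bounds the wait.
# SSH_CMD is an illustrative override for testing (defaults to ssh).
check_nodes() {
    local node
    local failed=0

    for node in "$@"; do
        if "${SSH_CMD:-ssh}" -o BatchMode=yes -o ConnectTimeout=5 "$node" true 2>/dev/null; then
            echo "$node: OK"
        else
            echo "$node: UNREACHABLE"
            failed=1
        fi
    done
    return "$failed"
}

# Usage (commented out because it needs the real nodes):
# check_nodes aitbc1 gitea-runner || echo "Fix connectivity before monitoring"
```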

Permission Issues

# Check journalctl access
journalctl -n 1 --no-pager
ssh aitbc1 'journalctl -n 1 --no-pager'
ssh gitea-runner 'journalctl -n 1 --no-pager'

Service Not Running

# Check if services are running before monitoring
systemctl status aitbc-blockchain-node.service
ssh aitbc1 'systemctl status aitbc-blockchain-node.service'
ssh gitea-runner 'systemctl status aitbc-blockchain-node.service'

Best Practices

Use priority filtering to reduce noise: -p warning (warnings, errors, and more severe)
Monitor in separate terminals for different nodes
Use grep patterns for specific issue types
Set timeouts for monitoring sessions to avoid indefinite runs
Use cleanup traps to stop background processes
Test connectivity before starting multi-node monitoring
Use meaningful timestamps when aggregating logs from multiple sources
Focus on specific services when troubleshooting known issues

Related Skills

multi-node-log-check - Comprehensive log checking workflow
aitbc-blockchain-troubleshooting - Blockchain troubleshooting procedures
aitbc-configuration-management - Configuration management and validation
