Node Monitoring
Monitor your blockchain node performance and health.
Dashboard
aitbc-chain dashboard
Shows:
- Block height
- Peers connected
- Mempool size
- CPU/Memory/GPU usage
- Network traffic
Prometheus Metrics
# Enable metrics
aitbc-chain metrics --port 9090
Available metrics:
- aitbc_block_height - Current block height
- aitbc_peers_count - Number of connected peers
- aitbc_mempool_size - Transactions in mempool
- aitbc_block_production_time - Block production time
- aitbc_cpu_usage - CPU utilization
- aitbc_memory_usage - Memory utilization
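The metric names above can be read programmatically once scraped. The following is a minimal Python sketch, assuming the node serves them in the standard Prometheus text exposition format and that samples carry no labels. The `parse_prometheus_text` helper is hypothetical, and the sample values are invented for illustration, not real node output.

```python
def parse_prometheus_text(text: str) -> dict[str, float]:
    """Parse unlabelled samples from Prometheus text exposition format."""
    samples = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and "# HELP" / "# TYPE" comment lines.
        if not line or line.startswith("#"):
            continue
        parts = line.split()
        if len(parts) < 2:
            continue
        name, value = parts[0], parts[1]
        try:
            samples[name] = float(value)
        except ValueError:
            # Ignore lines whose value is not numeric.
            continue
    return samples


# Illustrative payload only; the values are made up for this example.
SAMPLE = """\
# HELP aitbc_block_height Current block height
# TYPE aitbc_block_height gauge
aitbc_block_height 1024
aitbc_peers_count 8
aitbc_mempool_size 42
"""

metrics = parse_prometheus_text(SAMPLE)
print(metrics["aitbc_peers_count"])
```

For production use, a maintained Prometheus client library is likely a better choice than hand parsing, since it also handles labels and timestamps.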
Coordinator API Metrics
The coordinator API now exposes a JSON metrics endpoint for dashboard consumption in addition to the Prometheus /metrics endpoint.
Live JSON Metrics
curl http://localhost:8000/v1/metrics
Includes:
- API request and error counters
- Average API response time
- Cache hit/miss and hit-rate data
- Lightweight process memory and CPU snapshot
- Alert threshold evaluation state
- Alert delivery result metadata
Dashboard Flow
The web dashboard at /opt/aitbc/website/dashboards/metrics.html consumes:
- GET /v1/metrics for live JSON metrics
- GET /v1/health for API health-state checks
- GET /metrics for Prometheus-compatible scraping
Alert Configuration
Set Alerts
# Low peers alert
aitbc-chain alert --metric peers --threshold 3 --action notify
# High mempool alert
aitbc-chain alert --metric mempool --threshold 5000 --action notify
# Sync delay alert
aitbc-chain alert --metric sync_delay --threshold 100 --action notify
Alert Actions
| Action | Description |
|---|---|
| notify | Send notification |
| restart | Restart node |
| pause | Pause block production |
Log Monitoring
# Real-time logs
aitbc-chain logs --tail
# Search logs
aitbc-chain logs --grep "error" --since "1h"
# Export logs
aitbc-chain logs --export /var/log/aitbc-chain/
Health Checks
# Run health check
aitbc-chain health
# Detailed report
aitbc-chain health --detailed
Checks:
- Disk space
- Memory
- P2P connectivity
- RPC availability
- Database sync
Coordinator Metrics Verification
Verify JSON Metrics Endpoint
# Check live JSON metrics for dashboard consumption
curl http://localhost:8000/v1/metrics | jq
Expected fields:
- api_requests - Total API request count
- api_errors - Total API error count
- error_rate_percent - Calculated error rate percentage
- avg_response_time_ms - Average API response time
- cache_hit_rate_percent - Cache hit rate percentage
- alerts - Alert threshold evaluation states
- alert_delivery - Alert delivery result metadata
- uptime_seconds - Service uptime in seconds
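The field check can also be scripted. This is a small Python sketch, assuming the expected fields appear as top-level keys of the JSON object returned by /v1/metrics (the actual nesting is not confirmed here). The `missing_fields` helper and the sample payload are hypothetical.

```python
import json

# Field names taken from the "Expected fields" list above.
EXPECTED_FIELDS = (
    "api_requests",
    "api_errors",
    "error_rate_percent",
    "avg_response_time_ms",
    "cache_hit_rate_percent",
    "alerts",
    "alert_delivery",
    "uptime_seconds",
)


def missing_fields(payload: dict) -> list[str]:
    """Return the expected field names that are absent from the payload."""
    return [name for name in EXPECTED_FIELDS if name not in payload]


# Illustrative, deliberately incomplete payload (values are invented).
sample = json.loads('{"api_requests": 120, "api_errors": 3, "uptime_seconds": 3600}')
print(missing_fields(sample))
```

In practice the payload would come from the curl command above, for example by piping its output into a script that calls `json.load(sys.stdin)`.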
Verify Prometheus Metrics
# Check Prometheus-compatible metrics
curl http://localhost:8000/metrics
Verify Alert History
# Get recent production alerts (requires admin key)
curl -H "X-API-Key: your-admin-key" \
"http://localhost:8000/agents/integration/production/alerts?limit=10" | jq
Filter by severity:
curl -H "X-API-Key: your-admin-key" \
"http://localhost:8000/agents/integration/production/alerts?severity=critical" | jq
Verify Dashboard Access
# Open the metrics dashboard in a browser
# File location: /opt/aitbc/website/dashboards/metrics.html
The dashboard polls:
- GET /v1/metrics for live JSON metrics
- GET /v1/health for API health-state checks
- GET /metrics for Prometheus-compatible scraping
Troubleshooting
Metrics Not Updating
If /v1/metrics shows stale or zeroed metrics:
- Check middleware is active
  - Verify request metrics middleware is registered in app/main.py
  - Check that metrics_collector is imported and used
- Check cache stats integration
  - Verify cache_manager.get_stats() is called in the metrics endpoint
  - Check that cache manager is properly initialized
- Check system snapshot capture
  - Verify capture_system_snapshot() is not raising exceptions
  - Check that os.getloadavg() and resource module are available on your platform
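To check platform support for the system snapshot, a short Python probe can help. This sketch does not reproduce capture_system_snapshot(); the `snapshot_probe` function is a hypothetical stand-in that only reports whether os.getloadavg() and the resource module work where the coordinator runs. Both are Unix-oriented: os.getloadavg() is not available on Windows, and resource is a Unix-only standard-library module.

```python
import os

try:
    import resource  # Unix-only standard-library module
except ImportError:
    resource = None


def snapshot_probe() -> dict:
    """Report which system-snapshot data sources work on this platform."""
    result = {"load_average": None, "max_rss": None, "errors": []}
    try:
        # Raises AttributeError where the function does not exist,
        # and OSError if the load average cannot be obtained.
        result["load_average"] = os.getloadavg()
    except (AttributeError, OSError) as exc:
        result["errors"].append(f"os.getloadavg unavailable: {exc}")
    if resource is None:
        result["errors"].append("resource module unavailable")
    else:
        # ru_maxrss is reported in kilobytes on Linux and bytes on macOS.
        result["max_rss"] = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
    return result


print(snapshot_probe())
```

A non-empty `errors` list suggests the snapshot portion of /v1/metrics may be zeroed or missing on that host.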
Alert Delivery Not Working
If alerts are not being delivered:
- Check webhook configuration
  - Verify AITBC_ALERT_WEBHOOK_URL environment variable is set
  - Test webhook URL with a simple curl POST request
  - Check webhook server logs for incoming requests
- Check alert suppression
  - Alert dispatcher uses 5-minute cooldown by default
  - Check if alerts are being suppressed due to recent deliveries
  - Verify cooldown logic in alert_dispatcher._is_suppressed()
- Check alert history
  - Use /agents/integration/production/alerts to see recent alert attempts
  - Check delivery_status field: sent, suppressed, or failed
  - Check error field for failed deliveries
- Check log fallback
  - If webhook URL is not configured, alerts fall back to log output
  - Check coordinator API logs for warning messages about alerts
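When reasoning about suppressed alerts, it can help to see how a cooldown check typically behaves. The sketch below is not the code behind alert_dispatcher._is_suppressed(); `CooldownTracker` is a hypothetical class that assumes suppression is tracked per alert key, using the 5-minute default mentioned above.

```python
import time

COOLDOWN_SECONDS = 5 * 60  # the 5-minute default mentioned above


class CooldownTracker:
    """Hypothetical helper: remembers the last delivery time per alert key."""

    def __init__(self, cooldown=COOLDOWN_SECONDS, clock=time.monotonic):
        self._cooldown = cooldown
        self._clock = clock
        self._last_sent = {}

    def is_suppressed(self, key: str) -> bool:
        """True if this key was delivered less than one cooldown ago."""
        last = self._last_sent.get(key)
        return last is not None and self._clock() - last < self._cooldown

    def record_delivery(self, key: str) -> None:
        self._last_sent[key] = self._clock()


# Usage with a fake clock so the timing is easy to follow.
now = [0.0]
tracker = CooldownTracker(clock=lambda: now[0])
tracker.record_delivery("error_rate")
now[0] = 120.0  # two minutes later: still inside the cooldown window
print(tracker.is_suppressed("error_rate"))
now[0] = 301.0  # just over five minutes later: cooldown has elapsed
print(tracker.is_suppressed("error_rate"))
```

Under this model, an alert that fires repeatedly within five minutes would show up in the history as suppressed rather than sent, which is expected behavior and not a delivery failure.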
Dashboard Not Loading
If the metrics dashboard is not displaying data:
- Check API endpoints are accessible
  - Verify /v1/metrics returns valid JSON
  - Verify /v1/health returns healthy status
  - Check browser console for CORS or network errors
- Check dashboard file path
  - Ensure dashboard is served from correct location
  - Verify static file serving is configured in web server
- Check browser console
  - Look for JavaScript errors
  - Check for failed API requests
  - Verify polling interval is reasonable (default 5 seconds)
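The endpoint checks can be run outside the browser to separate API problems from dashboard problems. This is a rough Python sketch using only the standard library; `probe_endpoints` is a hypothetical helper, the base URL is taken from the curl examples on this page, and the shape of the /v1/health response is not assumed (only the HTTP status is checked). It cannot detect CORS issues, which only appear in the browser.

```python
import json
import urllib.request

# (path, whether the body is expected to be JSON)
ENDPOINTS = (("/v1/metrics", True), ("/v1/health", False), ("/metrics", False))


def probe_endpoints(base_url="http://localhost:8000", opener=urllib.request.urlopen):
    """Return {path: problem description, or None if the check passed}."""
    report = {}
    for path, expect_json in ENDPOINTS:
        try:
            with opener(base_url + path, timeout=5) as response:
                body = response.read()
                status = response.status
            if status != 200:
                report[path] = f"HTTP {status}"
                continue
            if expect_json:
                json.loads(body)  # raises ValueError on invalid JSON
            report[path] = None
        except (OSError, ValueError) as exc:
            # URLError/HTTPError are OSError subclasses; bad JSON is ValueError.
            report[path] = f"{type(exc).__name__}: {exc}"
    return report


if __name__ == "__main__":
    for path, problem in probe_endpoints().items():
        print(path, "OK" if problem is None else problem)
```

If all three paths report OK here but the dashboard still shows no data, the cause is more likely on the browser side (CORS, static file serving, or JavaScript errors).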
Alert Thresholds Not Triggering
If alerts should trigger but do not:
- Verify threshold values
  - Error rate threshold: 1%
  - Average response time threshold: 500ms
  - Memory usage threshold: 90%
  - Cache hit rate threshold: 70%
- Check metrics calculation
  - Verify metrics are being collected correctly
  - Check that response times are recorded in seconds (not milliseconds)
  - Verify cache hit rate calculation includes both hits and misses
- Check alert evaluation logic
  - Verify get_alert_states() is called during metrics collection
  - Check that alert states are included in /v1/metrics response
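A simplified model of threshold evaluation can make these checks concrete. The Python sketch below is not the coordinator's get_alert_states() implementation. It uses the threshold values listed above, but the comparison directions (alert when above the limit, or below it for cache hit rate) and the `memory_percent` field name are assumptions.

```python
# Threshold values from the list above; comparison directions are assumed.
THRESHOLDS = {
    "error_rate_percent": ("above", 1.0),
    "avg_response_time_ms": ("above", 500.0),
    "memory_percent": ("above", 90.0),  # hypothetical field name
    "cache_hit_rate_percent": ("below", 70.0),
}


def evaluate(metrics: dict) -> dict[str, bool]:
    """Return {metric name: True if its alert would trigger}."""
    states = {}
    for name, (direction, limit) in THRESHOLDS.items():
        value = metrics.get(name)
        if value is None:
            continue  # metric not reported, so there is nothing to evaluate
        states[name] = value > limit if direction == "above" else value < limit
    return states


# A 0.75 s response compared against a 500 ms limit without conversion
# does not trigger; the same value converted to milliseconds does.
print(evaluate({"avg_response_time_ms": 0.75}))
print(evaluate({"avg_response_time_ms": 0.75 * 1000}))
```

The two calls illustrate why a unit mismatch between recorded values and the threshold can keep an alert from firing even when the service is slow.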
Next
- Quick Start - Get started
- Configuration - Configure your node
- Operations - Day-to-day ops