Files
aitbc/docs/apps/infrastructure/monitor.md
aitbc 522655ef92
Some checks failed
API Endpoint Tests / test-api-endpoints (push) Successful in 10s
Blockchain Synchronization Verification / sync-verification (push) Failing after 3s
CLI Tests / test-cli (push) Failing after 4s
Documentation Validation / validate-docs (push) Successful in 8s
Documentation Validation / validate-policies-strict (push) Successful in 4s
Integration Tests / test-service-integration (push) Successful in 38s
Multi-Node Blockchain Health Monitoring / health-check (push) Successful in 2s
P2P Network Verification / p2p-verification (push) Successful in 3s
Security Scanning / security-scan (push) Successful in 40s
Smart Contract Tests / test-solidity (map[name:aitbc-token path:packages/solidity/aitbc-token]) (push) Successful in 15s
Smart Contract Tests / lint-solidity (push) Successful in 8s
Move blockchain app READMEs to centralized documentation
- Relocate blockchain-event-bridge README content to docs/apps/blockchain/blockchain-event-bridge.md
- Relocate blockchain-explorer README content to docs/apps/blockchain/blockchain-explorer.md
- Replace app READMEs with redirect notices pointing to new documentation location
- Consolidate documentation in central docs/ directory for better organization
2026-04-23 12:24:48 +02:00

4.8 KiB

Monitor

Status

Operational

Overview

System monitoring and alerting service for tracking application health, performance metrics, and generating alerts for critical events.

Architecture

Core Components

  • Health Check Service: Periodic health checks for all services
  • Metrics Collector: Collects performance metrics from applications
  • Alert Manager: Manages alert rules and notifications
  • Dashboard: Web dashboard for monitoring visualization
  • Log Aggregator: Aggregates logs from all services
  • Notification Service: Sends alerts via email, Slack, etc.

Quick Start (End Users)

Prerequisites

  • Python 3.13+
  • Access to application endpoints
  • Notification service credentials (email, Slack webhook)

Installation

cd /opt/aitbc/apps/monitor
.venv/bin/pip install -r requirements.txt

Configuration

Set environment variables in .env:

MONITOR_INTERVAL=60
ALERT_EMAIL=admin@example.com
SLACK_WEBHOOK=https://hooks.slack.com/services/...
PROMETHEUS_URL=http://localhost:9090

Running the Service

.venv/bin/python main.py

Access Dashboard

Open http://localhost:8080 in a browser to access the monitoring dashboard.

Developer Guide

Development Setup

  1. Clone the repository
  2. Create virtual environment: python -m venv .venv
  3. Install dependencies: pip install -r requirements.txt
  4. Configure monitoring targets
  5. Run tests: pytest tests/

Project Structure

monitor/
├── src/
│   ├── health_check/        # Health check service
│   ├── metrics_collector/   # Metrics collection
│   ├── alert_manager/       # Alert management
│   ├── dashboard/           # Web dashboard
│   ├── log_aggregator/      # Log aggregation
│   └── notification/        # Notification service
├── tests/                   # Test suite
└── pyproject.toml           # Project configuration

Testing

# Run all tests
pytest tests/

# Run health check tests
pytest tests/test_health_check.py

# Run alert manager tests
pytest tests/test_alerts.py

API Reference

Health Checks

Run Health Check

GET /api/v1/monitor/health/{service_name}

Get All Health Status

GET /api/v1/monitor/health

Add Health Check Target

POST /api/v1/monitor/health/targets
Content-Type: application/json

{
  "service_name": "string",
  "endpoint": "http://localhost:8000/health",
  "interval": 60,
  "timeout": 10
}

Metrics

Get Metrics

GET /api/v1/monitor/metrics?service=blockchain-node

Query Prometheus

POST /api/v1/monitor/metrics/query
Content-Type: application/json

{
  "query": "up{job=\"blockchain-node\"}",
  "range": "1h"
}

Alerts

Create Alert Rule

POST /api/v1/monitor/alerts/rules
Content-Type: application/json

{
  "name": "high_cpu_usage",
  "condition": "cpu_usage > 80",
  "duration": 300,
  "severity": "warning|critical",
  "notification": "email|slack"
}

Get Active Alerts

GET /api/v1/monitor/alerts/active

Acknowledge Alert

POST /api/v1/monitor/alerts/{alert_id}/acknowledge

Logs

Query Logs

POST /api/v1/monitor/logs/query
Content-Type: application/json

{
  "service": "blockchain-node",
  "level": "ERROR",
  "time_range": "1h",
  "query": "error"
}

Get Log Statistics

GET /api/v1/monitor/logs/stats?service=blockchain-node

Configuration

Environment Variables

  • MONITOR_INTERVAL: Interval for health checks (default: 60s)
  • ALERT_EMAIL: Email address for alert notifications
  • SLACK_WEBHOOK: Slack webhook for notifications
  • PROMETHEUS_URL: Prometheus server URL
  • LOG_RETENTION_DAYS: Log retention period (default: 30 days)
  • ALERT_COOLDOWN: Alert cooldown period (default: 300s)

Monitoring Targets

  • Services: List of services to monitor
  • Endpoints: Health check endpoints for each service
  • Intervals: Check intervals for each service

Alert Rules

  • CPU Usage: Alert when CPU usage exceeds threshold
  • Memory Usage: Alert when memory usage exceeds threshold
  • Disk Usage: Alert when disk usage exceeds threshold
  • Service Down: Alert when service is unresponsive

Troubleshooting

Health check failing: Verify service endpoint and network connectivity.

Alerts not triggering: Check alert rule configuration and notification settings.

Metrics not collecting: Verify Prometheus integration and service metrics endpoints.

Logs not appearing: Check log aggregation configuration and service log paths.

Security Notes

  • Secure access to monitoring dashboard
  • Use authentication for API endpoints
  • Encrypt alert notification credentials
  • Implement role-based access control
  • Regularly review alert rules
  • Monitor for unauthorized access attempts