Monitor

Status

✅ Operational

Overview

System monitoring and alerting service that tracks application health and performance metrics and generates alerts for critical events.

Architecture

Core Components

Health Check Service: Periodic health checks for all services
Metrics Collector: Collects performance metrics from applications
Alert Manager: Manages alert rules and notifications
Dashboard: Web dashboard for monitoring visualization
Log Aggregator: Aggregates logs from all services
Notification Service: Sends alerts via email, Slack, etc.

Quick Start (End Users)

Prerequisites

Python 3.13+
Access to application endpoints
Notification service credentials (email, Slack webhook)

Installation

cd /opt/aitbc/apps/monitor
.venv/bin/pip install -r requirements.txt

Configuration

Set environment variables in .env:

MONITOR_INTERVAL=60
ALERT_EMAIL=admin@example.com
SLACK_WEBHOOK=https://hooks.slack.com/services/...
PROMETHEUS_URL=http://localhost:9090

Running the Service

.venv/bin/python main.py

Access Dashboard

Open http://localhost:8080 in a browser to access the monitoring dashboard.

Developer Guide

Development Setup

Clone the repository
Create a virtual environment: python -m venv .venv
Install dependencies: .venv/bin/pip install -r requirements.txt
Configure monitoring targets
Run tests: pytest tests/

Project Structure

monitor/
├── src/
│   ├── health_check/        # Health check service
│   ├── metrics_collector/   # Metrics collection
│   ├── alert_manager/       # Alert management
│   ├── dashboard/           # Web dashboard
│   ├── log_aggregator/      # Log aggregation
│   └── notification/        # Notification service
├── tests/                   # Test suite
└── pyproject.toml           # Project configuration

Testing

# Run all tests
pytest tests/

# Run health check tests
pytest tests/test_health_check.py

# Run alert manager tests
pytest tests/test_alerts.py

API Reference

Health Checks

Run Health Check

GET /api/v1/monitor/health/{service_name}

Get All Health Status

GET /api/v1/monitor/health

Add Health Check Target

POST /api/v1/monitor/health/targets
Content-Type: application/json

{
  "service_name": "string",
  "endpoint": "http://localhost:8000/health",
  "interval": 60,
  "timeout": 10
}
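
A minimal sketch of registering a target from Python, using only the standard library. BASE_URL is an assumption (the README gives the dashboard address but not the API host), and build_add_target_request is a hypothetical helper name. The request is only constructed here, not sent.

```python
import json
import urllib.request

# Assumption: the API is served from the same host and port as the dashboard.
BASE_URL = "http://localhost:8080"


def build_add_target_request(service_name, endpoint, interval=60, timeout=10):
    """Construct (but do not send) the POST request for a new health check target."""
    if interval <= 0 or timeout <= 0:
        raise ValueError("interval and timeout must be positive numbers of seconds")
    payload = {
        "service_name": service_name,
        "endpoint": endpoint,
        "interval": interval,
        "timeout": timeout,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/monitor/health/targets",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_add_target_request("blockchain-node", "http://localhost:8000/health")
# To actually send it: urllib.request.urlopen(req, timeout=10)
```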

Metrics

Get Metrics

GET /api/v1/monitor/metrics?service=blockchain-node

Query Prometheus

POST /api/v1/monitor/metrics/query
Content-Type: application/json

{
  "query": "up{job=\"blockchain-node\"}",
  "range": "1h"
}
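
The backslashes in the body above are JSON escaping for the quotes inside the PromQL selector. When building the body in Python it is usually safer to let json.dumps handle that escaping, as in this sketch. The helper names up_selector and build_query_body are hypothetical.

```python
import json


def up_selector(job: str) -> str:
    """Return a PromQL selector such as up{job="blockchain-node"}."""
    # Escape backslashes and double quotes so the label value stays valid PromQL.
    escaped = job.replace("\\", "\\\\").replace('"', '\\"')
    return f'up{{job="{escaped}"}}'


def build_query_body(promql: str, time_range: str = "1h") -> str:
    """Serialize the request body; json.dumps adds the JSON-level escaping."""
    return json.dumps({"query": promql, "range": time_range})


body = build_query_body(up_selector("blockchain-node"))
```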

Alerts

Create Alert Rule

POST /api/v1/monitor/alerts/rules
Content-Type: application/json

{
  "name": "high_cpu_usage",
  "condition": "cpu_usage > 80",
  "duration": 300,
  "severity": "warning|critical",
  "notification": "email|slack"
}

Get Active Alerts

GET /api/v1/monitor/alerts/active

Acknowledge Alert

POST /api/v1/monitor/alerts/{alert_id}/acknowledge

Logs

Query Logs

POST /api/v1/monitor/logs/query
Content-Type: application/json

{
  "service": "blockchain-node",
  "level": "ERROR",
  "time_range": "1h",
  "query": "error"
}
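
To make the four fields concrete, here is a small in-memory sketch of what such a query could mean. The semantics are assumptions, not documented behavior: exact match on service and level, a time window ending now, and a case-insensitive substring match on the message. The record shape (service, level, timestamp, message) is hypothetical.

```python
from datetime import datetime, timedelta, timezone

_UNITS = {"m": "minutes", "h": "hours", "d": "days"}


def parse_time_range(value: str) -> timedelta:
    """Convert strings such as '30m', '1h' or '7d' into a timedelta."""
    return timedelta(**{_UNITS[value[-1]]: int(value[:-1])})


def filter_logs(records, service, level, time_range, query, now=None):
    """Return the records matching all four criteria."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - parse_time_range(time_range)
    needle = query.lower()
    return [
        r for r in records
        if r["service"] == service
        and r["level"] == level
        and r["timestamp"] >= cutoff
        and needle in r["message"].lower()
    ]
```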

Get Log Statistics

GET /api/v1/monitor/logs/stats?service=blockchain-node

Configuration

Environment Variables

MONITOR_INTERVAL: Interval between health checks, in seconds (default: 60)
ALERT_EMAIL: Email address for alert notifications
SLACK_WEBHOOK: Slack webhook URL for notifications
PROMETHEUS_URL: Prometheus server URL
LOG_RETENTION_DAYS: Log retention period, in days (default: 30)
ALERT_COOLDOWN: Alert cooldown period, in seconds (default: 300)
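
The README does not spell out what the cooldown applies to. One common interpretation, sketched here as an assumption, is that a repeat notification for the same alert is suppressed until ALERT_COOLDOWN seconds have passed since the last one that was sent. CooldownGate is a hypothetical name.

```python
class CooldownGate:
    """Suppress repeat notifications for the same alert within a cooldown window."""

    def __init__(self, cooldown_seconds: float = 300):
        self.cooldown = cooldown_seconds
        self._last_sent: dict[str, float] = {}  # alert name -> time of last notification

    def should_notify(self, alert_name: str, now: float) -> bool:
        last = self._last_sent.get(alert_name)
        if last is not None and now - last < self.cooldown:
            return False  # still cooling down; do not record this attempt
        self._last_sent[alert_name] = now
        return True
```

Tracking the timestamp per alert name means one noisy rule does not silence unrelated alerts.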

Monitoring Targets

Services: List of services to monitor
Endpoints: Health check endpoints for each service
Intervals: Check intervals for each service

Alert Rules

CPU Usage: Alert when CPU usage exceeds threshold
Memory Usage: Alert when memory usage exceeds threshold
Disk Usage: Alert when disk usage exceeds threshold
Service Down: Alert when service is unresponsive

Troubleshooting

Health check failing: Verify service endpoint and network connectivity.

Alerts not triggering: Check alert rule configuration and notification settings.

Metrics not collecting: Verify Prometheus integration and service metrics endpoints.

Logs not appearing: Check log aggregation configuration and service log paths.
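
For the first item, a quick way to check an endpoint independently of the monitor is a standalone probe like the sketch below. It uses only the standard library. The function name probe and the 10 second default timeout (mirroring the timeout in the target example) are assumptions.

```python
import urllib.error
import urllib.request


def probe(url: str, timeout: float = 10.0) -> tuple[bool, str]:
    """Return (ok, detail) for a single GET against a health endpoint."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return True, f"HTTP {response.status}"
    except urllib.error.HTTPError as exc:
        # The service answered, but with an error status (4xx or 5xx).
        return False, f"HTTP {exc.code}"
    except ValueError as exc:
        # Malformed URL, for example a missing scheme.
        return False, f"invalid URL: {exc}"
    except OSError as exc:
        # Covers URLError, refused connections, DNS failures and timeouts.
        return False, f"unreachable: {exc}"


# Example: probe("http://localhost:8000/health")
```

An "HTTP 5xx" detail points at the service itself, while "unreachable" points at the network path or a wrong host or port.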

Security Notes

Secure access to the monitoring dashboard
Use authentication for API endpoints
Encrypt alert notification credentials
Implement role-based access control
Regularly review alert rules
Monitor for unauthorized access attempts
