Monitor
Status
✅ Operational
Overview
System monitoring and alerting service that tracks application health and performance metrics and generates alerts for critical events.
Architecture
Core Components
- Health Check Service: Periodic health checks for all services
- Metrics Collector: Collects performance metrics from applications
- Alert Manager: Manages alert rules and notifications
- Dashboard: Web dashboard for monitoring visualization
- Log Aggregator: Aggregates logs from all services
- Notification Service: Sends alerts via email, Slack, etc.
Quick Start (End Users)
Prerequisites
- Python 3.13+
- Access to application endpoints
- Notification service credentials (email, Slack webhook)
Installation
cd /opt/aitbc/apps/monitor
.venv/bin/pip install -r requirements.txt
Configuration
Set environment variables in .env:
MONITOR_INTERVAL=60
ALERT_EMAIL=admin@example.com
SLACK_WEBHOOK=https://hooks.slack.com/services/...
PROMETHEUS_URL=http://localhost:9090
Running the Service
.venv/bin/python main.py
Access Dashboard
Open http://localhost:8080 in a browser to access the monitoring dashboard.
Developer Guide
Development Setup
- Clone the repository
- Create virtual environment:
python -m venv .venv
- Install dependencies:
pip install -r requirements.txt
- Configure monitoring targets
- Run tests:
pytest tests/
Project Structure
monitor/
├── src/
│ ├── health_check/ # Health check service
│ ├── metrics_collector/ # Metrics collection
│ ├── alert_manager/ # Alert management
│ ├── dashboard/ # Web dashboard
│ ├── log_aggregator/ # Log aggregation
│ └── notification/ # Notification service
├── tests/ # Test suite
└── pyproject.toml # Project configuration
Testing
# Run all tests
pytest tests/
# Run health check tests
pytest tests/test_health_check.py
# Run alert manager tests
pytest tests/test_alerts.py
API Reference
Health Checks
Run Health Check
GET /api/v1/monitor/health/{service_name}
Get All Health Status
GET /api/v1/monitor/health
Add Health Check Target
POST /api/v1/monitor/health/targets
Content-Type: application/json
{
"service_name": "string",
"endpoint": "http://localhost:8000/health",
"interval": 60,
"timeout": 10
}
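As a rough sketch of how a client might register a target, the snippet below builds the POST request with the Python standard library. The base URL `http://localhost:8080` and the helper name `build_add_target_request` are assumptions for illustration; this README does not state where the API is served or whether authentication is required.

```python
import json
import urllib.request

# Assumption: the API shares the dashboard's host and port; adjust as needed.
BASE_URL = "http://localhost:8080"


def build_add_target_request(service_name, endpoint, interval=60, timeout=10):
    """Build (but do not send) a POST request that registers a health check target."""
    payload = {
        "service_name": service_name,
        "endpoint": endpoint,
        "interval": interval,
        "timeout": timeout,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/api/v1/monitor/health/targets",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_add_target_request("blockchain-node", "http://localhost:8000/health")
print(req.get_method(), req.full_url)
```

Passing the returned object to `urllib.request.urlopen(req, timeout=10)` would send it; the request is only constructed here so the example runs without a live service.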
Metrics
Get Metrics
GET /api/v1/monitor/metrics?service=blockchain-node
Query Prometheus
POST /api/v1/monitor/metrics/query
Content-Type: application/json
{
"query": "up{job=\"blockchain-node\"}",
"range": "1h"
}
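The backslashes in the body above are JSON escaping of the double quotes that PromQL label matchers require. A small sketch, using a hypothetical helper `build_metrics_query`, shows how serializing with `json.dumps` produces that escaping without writing it by hand:

```python
import json


def build_metrics_query(job, time_range="1h"):
    """Return the JSON body for POST /api/v1/monitor/metrics/query."""
    # PromQL label matchers use double quotes; json.dumps escapes them.
    promql = f'up{{job="{job}"}}'
    return json.dumps({"query": promql, "range": time_range})


print(build_metrics_query("blockchain-node"))
# prints {"query": "up{job=\"blockchain-node\"}", "range": "1h"}
```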
Alerts
Create Alert Rule
POST /api/v1/monitor/alerts/rules
Content-Type: application/json
{
"name": "high_cpu_usage",
"condition": "cpu_usage > 80",
"duration": 300,
"severity": "warning|critical",
"notification": "email|slack"
}
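This README does not define how `condition` and `duration` interact. One plausible reading is that the condition must hold continuously for `duration` seconds before the alert fires. The sketch below illustrates that reading only; the `AlertRule` class, the `metric operator threshold` condition grammar, and the unit of seconds are all assumptions.

```python
import operator
from dataclasses import dataclass
from typing import Optional

_OPS = {">": operator.gt, ">=": operator.ge, "<": operator.lt, "<=": operator.le}


@dataclass
class AlertRule:
    name: str
    condition: str   # e.g. "cpu_usage > 80"
    duration: float  # assumed: seconds the condition must hold before firing
    breached_since: Optional[float] = None

    def observe(self, metrics: dict, now: float) -> bool:
        """Record one sample; return True once the breach has lasted `duration`."""
        metric, op, threshold = self.condition.split()
        breached = _OPS[op](metrics[metric], float(threshold))
        if not breached:
            self.breached_since = None  # condition cleared, reset the timer
            return False
        if self.breached_since is None:
            self.breached_since = now
        return now - self.breached_since >= self.duration


rule = AlertRule("high_cpu_usage", "cpu_usage > 80", duration=300)
print(rule.observe({"cpu_usage": 95}, now=0))    # False: breach just started
print(rule.observe({"cpu_usage": 95}, now=300))  # True: held for 300 seconds
```

Parsing the condition into an operator lookup avoids calling `eval` on rule text supplied through the API.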
Get Active Alerts
GET /api/v1/monitor/alerts/active
Acknowledge Alert
POST /api/v1/monitor/alerts/{alert_id}/acknowledge
Logs
Query Logs
POST /api/v1/monitor/logs/query
Content-Type: application/json
{
"service": "blockchain-node",
"level": "ERROR",
"time_range": "1h",
"query": "error"
}
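To make the query fields concrete, here is a minimal sketch of how such a query could be matched against in-memory log records. It is not the service's implementation: the record keys (`service`, `level`, `timestamp`, `message`), the case-insensitive substring match, and the `<number><s|m|h|d>` format for `time_range` are assumptions.

```python
from datetime import datetime, timedelta, timezone

_UNITS = {"s": "seconds", "m": "minutes", "h": "hours", "d": "days"}


def parse_time_range(text):
    """Convert a string such as "1h" or "30m" to a timedelta (assumed format)."""
    return timedelta(**{_UNITS[text[-1]]: int(text[:-1])})


def filter_logs(records, query, now=None):
    """Return records matching the service, level, time window, and substring."""
    now = now or datetime.now(timezone.utc)
    cutoff = now - parse_time_range(query["time_range"])
    needle = query["query"].lower()
    return [
        r for r in records
        if r["service"] == query["service"]
        and r["level"] == query["level"]
        and r["timestamp"] >= cutoff
        and needle in r["message"].lower()
    ]
```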
Get Log Statistics
GET /api/v1/monitor/logs/stats?service=blockchain-node
Configuration
Environment Variables
- MONITOR_INTERVAL: Interval for health checks (default: 60s)
- ALERT_EMAIL: Email address for alert notifications
- SLACK_WEBHOOK: Slack webhook for notifications
- PROMETHEUS_URL: Prometheus server URL
- LOG_RETENTION_DAYS: Log retention period (default: 30 days)
- ALERT_COOLDOWN: Alert cooldown period (default: 300s)
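A minimal sketch of reading these variables with the documented defaults follows. The `MonitorConfig` class and `load_config` function are hypothetical names, and the sketch assumes the values from `.env` have already been exported into the process environment.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class MonitorConfig:
    monitor_interval: int          # seconds
    alert_email: Optional[str]
    slack_webhook: Optional[str]
    prometheus_url: Optional[str]
    log_retention_days: int
    alert_cooldown: int            # seconds


def load_config(env: Mapping[str, str] = os.environ) -> MonitorConfig:
    """Read the documented variables, falling back to the documented defaults."""
    return MonitorConfig(
        monitor_interval=int(env.get("MONITOR_INTERVAL", "60")),
        alert_email=env.get("ALERT_EMAIL"),
        slack_webhook=env.get("SLACK_WEBHOOK"),
        prometheus_url=env.get("PROMETHEUS_URL"),
        log_retention_days=int(env.get("LOG_RETENTION_DAYS", "30")),
        alert_cooldown=int(env.get("ALERT_COOLDOWN", "300")),
    )
```

Accepting the environment as a parameter lets tests pass a plain dictionary instead of mutating `os.environ`.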
Monitoring Targets
- Services: List of services to monitor
- Endpoints: Health check endpoints for each service
- Intervals: Check intervals for each service
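Per-service intervals imply that each target is checked on its own schedule. The sketch below shows one way to decide which targets are due at a given moment; the `Target` class and `due_targets` function are illustrative names, not part of the service.

```python
from dataclasses import dataclass


@dataclass
class Target:
    service_name: str
    endpoint: str
    interval: float                       # seconds between checks
    last_checked: float = float("-inf")   # never checked yet


def due_targets(targets, now):
    """Return the targets whose interval has elapsed since their last check."""
    return [t for t in targets if now - t.last_checked >= t.interval]
```

A polling loop could call `due_targets(targets, time.monotonic())`, run the checks, and then set `last_checked` on each target it processed.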
Alert Rules
- CPU Usage: Alert when CPU usage exceeds threshold
- Memory Usage: Alert when memory usage exceeds threshold
- Disk Usage: Alert when disk usage exceeds threshold
- Service Down: Alert when service is unresponsive
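When a rule stays in breach, a cooldown such as `ALERT_COOLDOWN` keeps it from sending a notification on every evaluation. The following is a sketch of that idea under the assumption that the cooldown applies per alert name; `CooldownGate` is a hypothetical helper.

```python
class CooldownGate:
    """Suppress repeat notifications for the same alert within a cooldown window."""

    def __init__(self, cooldown_seconds=300):
        self.cooldown = cooldown_seconds
        self._last_sent = {}  # alert name -> time of last notification

    def should_notify(self, alert_name, now):
        """Return True and record the send time if the alert is outside its cooldown."""
        last = self._last_sent.get(alert_name)
        if last is not None and now - last < self.cooldown:
            return False
        self._last_sent[alert_name] = now
        return True
```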
Troubleshooting
Health check failing: Verify service endpoint and network connectivity.
Alerts not triggering: Check alert rule configuration and notification settings.
Metrics not collecting: Verify Prometheus integration and service metrics endpoints.
Logs not appearing: Check log aggregation configuration and service log paths.
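For the first item, a quick way to verify an endpoint independently of the monitor is to probe it directly. This is a sketch using only the standard library; the `probe` function and its `opener` parameter (which exists so the function can be tested without a network) are assumptions for illustration.

```python
import urllib.error
import urllib.request


def probe(url, timeout=10, opener=urllib.request.urlopen):
    """Return (ok, detail) for a single GET against a health endpoint."""
    try:
        with opener(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300, f"HTTP {resp.status}"
    except urllib.error.HTTPError as exc:            # server answered with 4xx/5xx
        return False, f"HTTP {exc.code}"
    except (urllib.error.URLError, OSError) as exc:  # DNS failure, refused, timeout
        return False, f"unreachable: {exc}"
```

For example, `probe("http://localhost:8000/health")` distinguishes a service that responds with an error status from one that cannot be reached at all.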
Security Notes
- Secure access to monitoring dashboard
- Use authentication for API endpoints
- Encrypt alert notification credentials
- Implement role-based access control
- Regularly review alert rules
- Monitor for unauthorized access attempts