# Monitor
## Status
✅ Operational
## Overview
A system monitoring and alerting service that tracks application health and performance metrics and generates alerts for critical events.
## Architecture
### Core Components
- **Health Check Service**: Periodic health checks for all services
- **Metrics Collector**: Collects performance metrics from applications
- **Alert Manager**: Manages alert rules and notifications
- **Dashboard**: Web dashboard for monitoring visualization
- **Log Aggregator**: Aggregates logs from all services
- **Notification Service**: Sends alerts via email, Slack, etc.
## Quick Start (End Users)
### Prerequisites
- Python 3.13+
- Access to application endpoints
- Notification service credentials (email, Slack webhook)
### Installation
```bash
cd /opt/aitbc/apps/monitor
.venv/bin/pip install -r requirements.txt
```
### Configuration
Set environment variables in `.env`:
```bash
MONITOR_INTERVAL=60
ALERT_EMAIL=admin@example.com
SLACK_WEBHOOK=https://hooks.slack.com/services/...
PROMETHEUS_URL=http://localhost:9090
```
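The variables above could be read at startup along these lines. This is a minimal sketch, not the service's actual loader: `MonitorConfig` and `load_config` are hypothetical names, and only the `MONITOR_INTERVAL` default of 60 comes from this document; the other values fall back to empty strings.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class MonitorConfig:
    """Hypothetical container for the settings listed in `.env`."""
    interval_seconds: int
    alert_email: str
    slack_webhook: str
    prometheus_url: str


def load_config(env=None):
    """Build a MonitorConfig from a mapping (defaults to os.environ)."""
    env = os.environ if env is None else env
    return MonitorConfig(
        # 60 is the documented default for MONITOR_INTERVAL.
        interval_seconds=int(env.get("MONITOR_INTERVAL", "60")),
        # No defaults are documented for these, so fall back to "".
        alert_email=env.get("ALERT_EMAIL", ""),
        slack_webhook=env.get("SLACK_WEBHOOK", ""),
        prometheus_url=env.get("PROMETHEUS_URL", ""),
    )
```

Passing a plain dictionary instead of `os.environ` keeps the loader easy to test without touching the real environment.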
### Running the Service
```bash
.venv/bin/python main.py
```
### Access Dashboard
Open `http://localhost:8080` in a browser to access the monitoring dashboard.
## Developer Guide
### Development Setup
1. Clone the repository
2. Create and activate a virtual environment: `python -m venv .venv && source .venv/bin/activate`
3. Install dependencies: `pip install -r requirements.txt`
4. Configure monitoring targets
5. Run tests: `pytest tests/`
### Project Structure
```
monitor/
├── src/
│   ├── health_check/       # Health check service
│   ├── metrics_collector/  # Metrics collection
│   ├── alert_manager/      # Alert management
│   ├── dashboard/          # Web dashboard
│   ├── log_aggregator/     # Log aggregation
│   └── notification/       # Notification service
├── tests/                  # Test suite
└── pyproject.toml          # Project configuration
```
### Testing
```bash
# Run all tests
pytest tests/
# Run health check tests
pytest tests/test_health_check.py
# Run alert manager tests
pytest tests/test_alerts.py
```
## API Reference
### Health Checks
#### Run Health Check
```http
GET /api/v1/monitor/health/{service_name}
```
#### Get All Health Status
```http
GET /api/v1/monitor/health
```
#### Add Health Check Target
```http
POST /api/v1/monitor/health/targets
Content-Type: application/json

{
  "service_name": "string",
  "endpoint": "http://localhost:8000/health",
  "interval": 60,
  "timeout": 10
}
```
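As an illustration, the request above can be built with the Python standard library. This is a sketch under assumptions: `build_add_target_request` is a hypothetical helper, and `BASE_URL` assumes the API is served on the same host and port as the dashboard, which this document does not confirm.

```python
import json
import urllib.request

# Assumption: the API shares the dashboard's host and port.
BASE_URL = "http://localhost:8080"


def build_add_target_request(service_name, endpoint, interval=60, timeout=10,
                             base_url=BASE_URL):
    """Return a POST request that registers a health check target."""
    payload = {
        "service_name": service_name,
        "endpoint": endpoint,
        "interval": interval,  # seconds between checks
        "timeout": timeout,    # seconds before a check is abandoned
    }
    return urllib.request.Request(
        url=f"{base_url}/api/v1/monitor/health/targets",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The request is only constructed here; sending it would be a call such as `urllib.request.urlopen(request, timeout=10)` against a running monitor instance.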
### Metrics
#### Get Metrics
```http
GET /api/v1/monitor/metrics?service=blockchain-node
```
#### Query Prometheus
```http
POST /api/v1/monitor/metrics/query
Content-Type: application/json

{
  "query": "up{job=\"blockchain-node\"}",
  "range": "1h"
}
```
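The escaped quotes in the `query` field are easy to get wrong when the body is written by hand. A small sketch of building it programmatically, where `build_metrics_query` is a hypothetical helper and the `up{job="..."}` selector is taken from the example above:

```python
import json


def build_metrics_query(job, time_range="1h"):
    """Return the JSON body for a metrics query on one job."""
    # Doubled braces produce literal { and } inside an f-string.
    promql = f'up{{job="{job}"}}'
    # json.dumps escapes the inner double quotes automatically.
    return json.dumps({"query": promql, "range": time_range})
```

Letting `json.dumps` handle the escaping avoids malformed bodies when a label value itself contains quotes or backslashes.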
### Alerts
#### Create Alert Rule
```http
POST /api/v1/monitor/alerts/rules
Content-Type: application/json

{
  "name": "high_cpu_usage",
  "condition": "cpu_usage > 80",
  "duration": 300,
  "severity": "warning|critical",
  "notification": "email|slack"
}
```
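One plausible reading of `condition` together with `duration` is that the rule fires only after the condition has held continuously for that many seconds. The sketch below illustrates that reading; it is not the Alert Manager's actual logic, and `ThresholdRule` is a hypothetical class. It also assumes `warning|critical` means that one of the two values is chosen.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: fire when a value stays above a threshold."""
    name: str
    threshold: float
    duration: float                 # seconds the breach must persist
    severity: str = "warning"       # assumed: "warning" or "critical"
    breach_started: Optional[float] = None

    def observe(self, value: float, now: float) -> bool:
        """Record one sample; return True if the rule should fire."""
        if value <= self.threshold:
            # Condition no longer holds, so reset the timer.
            self.breach_started = None
            return False
        if self.breach_started is None:
            self.breach_started = now
        return now - self.breach_started >= self.duration
```

With `ThresholdRule("high_cpu_usage", threshold=80, duration=300)`, a single spike above 80 does not fire; only five minutes of consecutive samples above the threshold do.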
#### Get Active Alerts
```http
GET /api/v1/monitor/alerts/active
```
#### Acknowledge Alert
```http
POST /api/v1/monitor/alerts/{alert_id}/acknowledge
```
### Logs
#### Query Logs
```http
POST /api/v1/monitor/logs/query
Content-Type: application/json

{
  "service": "blockchain-node",
  "level": "ERROR",
  "time_range": "1h",
  "query": "error"
}
```
#### Get Log Statistics
```http
GET /api/v1/monitor/logs/stats?service=blockchain-node
```
## Configuration
### Environment Variables
- `MONITOR_INTERVAL`: Interval for health checks (default: 60s)
- `ALERT_EMAIL`: Email address for alert notifications
- `SLACK_WEBHOOK`: Slack webhook for notifications
- `PROMETHEUS_URL`: Prometheus server URL
- `LOG_RETENTION_DAYS`: Log retention period (default: 30 days)
- `ALERT_COOLDOWN`: Alert cooldown period (default: 300s)
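To make the effect of `ALERT_COOLDOWN` concrete, here is a minimal sketch of cooldown-based suppression. It assumes the cooldown applies per alert name, which this document does not specify, and `AlertCooldown` is a hypothetical class.

```python
class AlertCooldown:
    """Hypothetical helper: suppress repeat notifications per alert."""

    def __init__(self, cooldown_seconds=300.0):
        # 300 matches the documented ALERT_COOLDOWN default.
        self.cooldown_seconds = cooldown_seconds
        self._last_sent = {}

    def should_notify(self, alert_name, now):
        """Return True if a notification may be sent at time `now`."""
        last = self._last_sent.get(alert_name)
        if last is not None and now - last < self.cooldown_seconds:
            return False  # still inside the cooldown window
        self._last_sent[alert_name] = now
        return True
```

Under this reading, an alert that keeps firing produces at most one notification every 300 seconds, while different alerts are throttled independently.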
### Monitoring Targets
- **Services**: List of services to monitor
- **Endpoints**: Health check endpoints for each service
- **Intervals**: Check intervals for each service
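A target with these three properties, and a single check against it, might look like the following. This is a sketch rather than the Health Check Service's implementation: `Target` and `probe` are hypothetical names, the field names mirror the "Add Health Check Target" request body, and treating any 2xx response as healthy is an assumption.

```python
import urllib.request
from dataclasses import dataclass


@dataclass(frozen=True)
class Target:
    """Hypothetical monitoring target."""
    service_name: str
    endpoint: str
    interval: int = 60  # seconds between checks
    timeout: int = 10   # seconds before a check is abandoned


def probe(target, opener=urllib.request.urlopen):
    """Return "up" for a 2xx response and "down" otherwise."""
    try:
        with opener(target.endpoint, timeout=target.timeout) as response:
            return "up" if 200 <= response.status < 300 else "down"
    except OSError:
        # URLError, HTTPError and timeouts are all OSError subclasses.
        return "down"
```

The `opener` parameter exists so that the check can be exercised in tests without a live endpoint; in normal use the default `urllib.request.urlopen` is called.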
### Alert Rules
- **CPU Usage**: Alert when CPU usage exceeds threshold
- **Memory Usage**: Alert when memory usage exceeds threshold
- **Disk Usage**: Alert when disk usage exceeds threshold
- **Service Down**: Alert when service is unresponsive
## Troubleshooting
**Health check failing**: Verify service endpoint and network connectivity.
**Alerts not triggering**: Check alert rule configuration and notification settings.
**Metrics not collecting**: Verify Prometheus integration and service metrics endpoints.
**Logs not appearing**: Check log aggregation configuration and service log paths.
## Security Notes
- Secure access to monitoring dashboard
- Use authentication for API endpoints
- Encrypt alert notification credentials
- Implement role-based access control
- Regularly review alert rules
- Monitor for unauthorized access attempts