Files
oib d98b2c7772 Based on the repository's commit message style and the changes in the diff, here's an appropriate commit message:
```
feat: add websocket tests, PoA metrics, marketplace endpoints, and enhanced observability

- Add comprehensive websocket tests for blocks and transactions streams including multi-subscriber and high-volume scenarios
- Extend PoA consensus with per-proposer block metrics and rotation tracking
- Add latest block interval gauge and RPC error spike alerting
- Enhance mock coordinator
2025-12-22 07:55:09 +01:00

44 lines
2.5 KiB
Markdown
Raw Permalink Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Blockchain Node Observability
This directory contains Prometheus and Grafana assets for the devnet environment. The stack relies on the HTTP `/metrics` endpoint exposed by:
1. The blockchain node API (`http://127.0.0.1:8080/metrics`).
2. The mock coordinator/miner exporter (`http://127.0.0.1:8090/metrics`).
## Files
- `prometheus.yml` Scrapes both blockchain node and mock coordinator/miner metrics.
- `grafana-dashboard.json` Panels for block interval (including latest interval gauge), RPC throughput, miner activity, coordinator receipt flow, gossip queue/subscriber/publication metrics, and PoA proposer visibility (rotation counts, blocks proposed per proposer).
- `alerts.yml` Alertmanager rules highlighting proposer stalls, miner errors, and coordinator receipt drop-offs.
- `gossip-recording-rules.yml` Prometheus recording rules that derive queue/subscriber gauges and publication rates from gossip metrics.
## Usage
```bash
# Launch Prometheus using the sample config
prometheus --config.file=apps/blockchain-node/observability/prometheus.yml
# Import the dashboard JSON into Grafana
grafana-cli dashboards import apps/blockchain-node/observability/grafana-dashboard.json
# Run Alertmanager with the example rules
alertmanager --config.file=apps/blockchain-node/observability/alerts.yml
# Reload Prometheus and Alertmanager after tuning thresholds
kill -HUP $(pgrep prometheus)
kill -HUP $(pgrep alertmanager)
```
> **Tip:** The devnet helper `scripts/devnet_up.sh` seeds the metrics endpoints. After running it, both scrape targets will begin emitting data in under a minute.
## Gossip Observability
Recent updates instrumented the gossip broker with Prometheus counters and gauges. Key metrics surfaced via the recording rules and dashboard include:
- `gossip_publications_rate_per_sec` and `gossip_broadcast_publications_rate_per_sec` per-second publication throughput for in-memory and broadcast backends.
- `gossip_publications_topic_rate_per_sec` topic-level publication rate time series (Grafana panel “Gossip Publication Rate by Topic”).
- `gossip_queue_size_by_topic` instantaneous queue depth per topic (“Gossip Queue Depth by Topic”).
- `gossip_subscribers_by_topic`, `gossip_subscribers_total`, `gossip_broadcast_subscribers_total` subscriber counts (“Gossip Subscriber Counts”).
Use these panels to monitor convergence/back-pressure during load tests (for example with `scripts/ws_load_test.py`) when running against a Redis-backed broadcast backend.