Monitoring & Alerting

Setting up comprehensive monitoring with Discord, email, and dashboard alerts.

monitoring, alerts, discord, email

Effective monitoring is the backbone of a reliable Flux node hosting operation. Without proper monitoring and alerting, you're flying blind — node failures, benchmark degradations, and reward losses can go unnoticed for hours or days. This guide covers building a comprehensive monitoring stack.

Essential Metrics to Monitor

Every FluxNode generates metrics that indicate its health, performance, and reward status. Here are the critical ones:

Metric	Why It Matters	Alert Threshold
Node status	CONFIRMED = earning rewards	Any status change
Benchmark status	Must pass to stay active	Any benchmark failure
Uptime %	Affects PNR eligibility	Below 97%
EPS score	CPU performance benchmark	Below minimum for tier
Disk usage	Full disks crash nodes	Above 85%
RAM usage	High RAM usage causes swapping	Above 90%
Node rank	Lower rank = closer to the next reward	Sudden jumps
FluxOS version	Outdated versions may fail	Behind latest by 1+
Last reward	Confirms node is earning	No reward for 2x expected interval
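
The disk and RAM thresholds in the table can be checked locally with standard tools. The following is a minimal sketch, assuming GNU `df` (for `--output=pcent`), `free` from procps, and an English locale; the function names are illustrative and not part of FluxOS.

```bash
#!/bin/bash
# Thresholds taken from the table above.
DISK_LIMIT=85
RAM_LIMIT=90

# Succeeds (exit 0) when integer percentage $1 is strictly above limit $2.
over_limit() {
  [ "$1" -gt "$2" ]
}

# Root filesystem usage as an integer percentage (GNU df).
disk_usage_pct() {
  df --output=pcent / | tail -n 1 | tr -dc '0-9'
}

# Used RAM as an integer percentage, computed from the "Mem:" row of free.
ram_usage_pct() {
  free | awk '/^Mem:/ { printf "%d", ($3 / $2) * 100 }'
}

# Prints one line per breached threshold; prints nothing when healthy.
report_usage() {
  local disk ram
  disk=$(disk_usage_pct)
  ram=$(ram_usage_pct)
  if over_limit "$disk" "$DISK_LIMIT"; then echo "DISK ${disk}% > ${DISK_LIMIT}%"; fi
  if over_limit "$ram" "$RAM_LIMIT"; then echo "RAM ${ram}% > ${RAM_LIMIT}%"; fi
}
```

Calling `report_usage` on the node and forwarding any output to your alert channel covers two rows of the table; the remaining metrics come from the FluxOS API rather than from the operating system.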

Built-in FluxOS Monitoring

Every FluxNode has a built-in web UI accessible at http://YOUR_IP:16126. This dashboard provides real-time information about your node including:

•Node Status — current blockchain confirmation status
•Benchmark Results — latest EPS, RAM, SSD, and network scores
•Connected Peers — how many other nodes you're connected to
•Running Apps — Docker containers deployed on the node
•Resource Usage — current CPU, RAM, and disk utilization
•Flux Daemon Info — blockchain sync status and chain height

FluxOS also exposes an HTTP API on port 16127 that scripts can query for node status:

Query node status via FluxOS API

# Get node status
curl http://YOUR_IP:16127/flux/info

# Get benchmark status
curl http://YOUR_IP:16127/benchmark/getbenchmarks

# Get FluxOS version
curl http://YOUR_IP:16127/flux/version

# Get running apps
curl http://YOUR_IP:16127/apps/installedapps
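
A script usually needs a pass/fail answer rather than raw JSON. The sketch below assumes that FluxOS wraps each response in an envelope of the form `{"status":"success","data":...}`; confirm this against a real response from your node first. The helper names `api_ok` and `check_endpoint` are hypothetical, and a proper JSON parser would be more robust than the `grep` match used here.

```bash
#!/bin/bash
# Assumption: FluxOS responses look like {"status":"success","data":...}.

# Reads a JSON body on stdin; succeeds when it reports status "success".
api_ok() {
  grep -Eq '"status"[[:space:]]*:[[:space:]]*"success"'
}

# Fetches an API path from a node and prints OK or FAIL for it.
# $1 = node IP, $2 = path such as /flux/version
check_endpoint() {
  local body
  if body=$(curl -sf --max-time 15 "http://$1:16127$2") \
      && printf '%s' "$body" | api_ok; then
    echo "OK   $2"
  else
    echo "FAIL $2"
  fi
}
```

For example, `check_endpoint YOUR_IP /flux/info` prints a single line that is easy to log or to grep for failures.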

External Monitoring Tools

FluxNodes.net

FluxNodes.net is a community-operated network explorer that provides a comprehensive view of all Flux nodes. You can look up any node by IP address or Zel ID to check its status, benchmarks, rank, and reward history. It's invaluable for verifying node health from an external perspective.

UptimeRobot / HetrixTools

Third-party uptime monitoring services can ping your node's FluxOS API endpoint and alert you when it becomes unreachable. This catches network-level issues that the node itself can't report.

1. Create an account. Sign up for a free plan on UptimeRobot (50 monitors free) or HetrixTools (15 monitors free).
2. Add HTTP monitors. Monitor http://YOUR_IP:16127/flux/info; this endpoint returns node info when FluxOS is running.
3. Set check interval. 5-minute intervals are sufficient for most providers. Premium plans offer 1-minute intervals.
4. Configure alerts. Set up email, SMS, or webhook notifications for downtime events.

Discord Webhook Alerts

Discord webhooks are a popular and free way to get real-time alerts in a team channel. Here's how to set them up:

1. Create a Discord channel. Create a dedicated #node-alerts channel in your Discord server.
2. Create a webhook. Channel Settings → Integrations → Webhooks → New Webhook. Copy the webhook URL.
3. Write a monitoring script. Create a bash script that checks node status via the FluxOS API and sends alerts to the webhook when issues are detected.
4. Schedule with cron. Run the script every 5 minutes via a cron job on a separate monitoring server (not the node itself).

Simple Discord alert script (check_node.sh)

#!/bin/bash
# Alert a Discord channel when the FluxOS API stops responding.
NODE_IP="YOUR_IP"
WEBHOOK_URL="YOUR_DISCORD_WEBHOOK_URL"

# Check if FluxOS API responds (curl reports 000 when the connection fails)
STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
  --connect-timeout 10 --max-time 20 \
  "http://$NODE_IP:16127/flux/info")

if [ "$STATUS" != "200" ]; then
  # Inner quotes are escaped so the payload is valid JSON
  curl -s -H "Content-Type: application/json" \
    -d "{\"content\": \"⚠️ **Node Alert**: $NODE_IP is unreachable (HTTP $STATUS)\"}" \
    "$WEBHOOK_URL"
fi

Cron job (runs every 5 minutes)

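# Add to crontab: crontab -e (make the script executable first: chmod +x /home/user/check_node.sh)
# The log path must be writable by the user that owns the crontab
*/5 * * * * /home/user/check_node.sh >> /home/user/node-monitor.log 2>&1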
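
Run every five minutes, a check like this posts a new Discord message for as long as the node stays down. One way to reduce the noise is to alert only when the status changes. The sketch below is an illustrative helper, not part of FluxOS or Discord; `STATE_DIR` and `record_status` are hypothetical names, and a node with no recorded history is assumed to have been healthy (HTTP 200).

```bash
#!/bin/bash
# Hypothetical helper: remembers the last status per node in a small file so
# that an alert fires only when the status changes.
STATE_DIR="${STATE_DIR:-/tmp/node-monitor}"

# $1 = node IP, $2 = current status (for example an HTTP code).
# Prints "changed" or "same", then records the current status.
record_status() {
  local file previous
  mkdir -p "$STATE_DIR"
  file="$STATE_DIR/$1.status"
  # With no state file yet, assume the node was previously healthy.
  previous=$(cat "$file" 2>/dev/null || echo 200)
  printf '%s\n' "$2" > "$file"
  if [ "$previous" = "$2" ]; then echo "same"; else echo "changed"; fi
}

# Usage inside a check script:
#   if [ "$(record_status "$NODE_IP" "$STATUS")" = "changed" ]; then
#     ...send the webhook message...
#   fi
```

Because a change in either direction reports `changed`, the same test can also drive a recovery message when the node comes back.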

Fleet-Wide Monitoring Dashboard

When managing 10+ nodes, individual monitoring becomes impractical. You need a centralized dashboard that shows the health of your entire fleet at a glance.

•Fluxme.io Dashboard — the built-in monitoring on this platform shows fleet-wide status, alerts, and performance metrics
•Custom Grafana setup — for advanced providers: collect metrics with Prometheus, visualize with Grafana. Query FluxOS API from each node and aggregate.
•Spreadsheet tracking — for smaller fleets: maintain a simple spreadsheet with node IPs, status, last benchmark, last reward, expiry date
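
Between a spreadsheet and a full Grafana stack, a small script can already summarize a fleet. The following sketch reads node IPs from standard input and prints one tab-separated status line per node. The function names and the idea of a `nodes.txt` file are assumptions for illustration; only the `/flux/info` endpoint comes from this guide.

```bash
#!/bin/bash
# Prints the HTTP code returned by a node's /flux/info endpoint (000 on failure).
probe_node() {
  curl -s -o /dev/null -w "%{http_code}" --connect-timeout 10 --max-time 20 \
    "http://$1:16127/flux/info" </dev/null
}

# Reads IPs from stdin, skipping blank lines and lines starting with #.
# Prints "<ip><TAB>UP" or "<ip><TAB>DOWN (<code>)".
# Returns non-zero if at least one node is down.
fleet_status() {
  local ip code failures=0
  while IFS= read -r ip; do
    case "$ip" in ''|'#'*) continue ;; esac
    code=$(probe_node "$ip")
    if [ "$code" = "200" ]; then
      printf '%s\tUP\n' "$ip"
    else
      printf '%s\tDOWN (%s)\n' "$ip" "$code"
      failures=$((failures + 1))
    fi
  done
  [ "$failures" -eq 0 ]
}
```

Running `fleet_status < nodes.txt` from the monitoring server gives a quick overview, and the exit code can decide whether an alert is sent.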

Automated Remediation

For common, well-understood issues, automated remediation can save significant time:

•Auto-restart FluxOS — if the FluxOS service stops, a systemd watchdog or script can restart it automatically
•Disk cleanup — automated scripts to clean Docker images, logs, and temporary files when disk usage exceeds 80%
•Benchmark recovery — if a benchmark fails due to temporary load, a script can restart the daemon and force a re-benchmark
•FluxOS auto-update — scripts that check for new FluxOS versions and apply updates during maintenance windows

Always test automated remediation scripts thoroughly before deploying to production. A buggy auto-restart script can cause more downtime than it prevents. Start with monitoring-only, then add automation gradually.
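
In that spirit, the disk cleanup item could start as a dry run that only prints what it would do. This is a sketch under stated assumptions: the `DRY_RUN` switch and function names are hypothetical, the 80% limit comes from the list above, and the two cleanup commands (`docker image prune -f`, which removes dangling images only, and `journalctl --vacuum-time=7d`) are examples to adapt to your own setup.

```bash
#!/bin/bash
# Defaults to dry-run: commands are printed, not executed, until DRY_RUN=0.
DRY_RUN="${DRY_RUN:-1}"
CLEANUP_LIMIT=80   # percent

# Runs a command, or only prints it when DRY_RUN is 1.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "DRY-RUN: $*"
  else
    "$@"
  fi
}

# $1 = current disk usage as an integer percentage.
# Above the limit, prunes dangling Docker images and journal logs older than 7 days.
cleanup_if_needed() {
  if [ "$1" -gt "$CLEANUP_LIMIT" ]; then
    run docker image prune -f
    run journalctl --vacuum-time=7d
  else
    echo "disk at $1%, no cleanup needed"
  fi
}
```

Reviewing the dry-run output for a few days before setting `DRY_RUN=0` is one way to move from monitoring-only to automation gradually.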

Escalation Procedures

Define clear escalation paths for different severity levels:

Severity	Example	Response	Escalation
Critical	Multiple nodes offline	Immediate investigation	Wake up on-call engineer
High	Single node benchmark failure	Within 1 hour	Notify lead engineer
Medium	Disk usage above 85%	Within 4 hours	Standard ticket
Low	Minor version behind	Within 24 hours	Add to maintenance queue
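
If alerts are generated by scripts, the table can be encoded once so that every message carries the expected response. The function below is an illustrative sketch; its name and the lowercase severity keys are assumptions, while the response text is taken from the table.

```bash
#!/bin/bash
# Maps a severity level to the response window and escalation from the table.
# Prints the text on stdout; returns 1 for an unknown severity.
escalation_for() {
  case "$1" in
    critical) echo "Immediate investigation; wake up on-call engineer" ;;
    high)     echo "Within 1 hour; notify lead engineer" ;;
    medium)   echo "Within 4 hours; standard ticket" ;;
    low)      echo "Within 24 hours; add to maintenance queue" ;;
    *)        echo "Unknown severity: $1" >&2; return 1 ;;
  esac
}
```

An alert script could then append `$(escalation_for high)` to its message so the recipient sees the expected response alongside the problem.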