Monitoring & Alerting
Setting up comprehensive monitoring with Discord, email, and dashboard alerts.
Effective monitoring is the backbone of a reliable Flux node hosting operation. Without proper monitoring and alerting, you're flying blind: node failures, benchmark degradations, and reward losses can go unnoticed for hours or days. This guide covers building a comprehensive monitoring stack.
Essential Metrics to Monitor
Every FluxNode generates metrics that indicate its health, performance, and reward status. Here are the critical ones:
| Metric | Why It Matters | Alert Threshold |
|---|---|---|
| Node status | CONFIRMED = earning rewards | Any status change |
| Benchmark status | Must pass to stay active | Any benchmark failure |
| Uptime % | Affects PNR eligibility | Below 97% |
| EPS score | CPU performance benchmark | Below minimum for tier |
| Disk usage | Full disks crash nodes | Above 85% |
| RAM usage | High RAM usage causes swapping | Above 90% |
| Node rank | Lower rank = closer to the next reward | Sudden jumps |
| FluxOS version | Outdated versions may fail | Behind latest by 1+ |
| Last reward | Confirms node is earning | No reward for 2x expected interval |
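As a rough illustration of how the disk and RAM thresholds in the table could be checked on a Linux host, here is a minimal sketch. The function names (`exceeds`, `disk_usage_pct`, `ram_usage_pct`, `report`) are hypothetical helpers invented for this example, not part of FluxOS. The RAM helper assumes a kernel that exposes `MemAvailable` in `/proc/meminfo`.

```bash
#!/bin/bash
# Thresholds taken from the metrics table (percent).
DISK_LIMIT=85
RAM_LIMIT=90

# exceeds VALUE LIMIT -> succeeds when VALUE is strictly above LIMIT
exceeds() {
  [ "$1" -gt "$2" ]
}

# disk_usage_pct MOUNTPOINT -> prints the used percentage as an integer
disk_usage_pct() {
  df -P "$1" | awk 'NR==2 { gsub("%", "", $5); print $5 }'
}

# ram_usage_pct -> prints used RAM as an integer percentage (Linux only)
ram_usage_pct() {
  awk '/^MemTotal:/ {t=$2} /^MemAvailable:/ {a=$2}
       END { printf "%d\n", (t - a) * 100 / t }' /proc/meminfo
}

# report NAME VALUE LIMIT -> prints an ALERT or OK line for one metric
report() {
  if exceeds "$2" "$3"; then
    echo "ALERT $1 ${2}% (limit ${3}%)"
  else
    echo "OK $1 ${2}%"
  fi
}
```

A typical call would be `report disk "$(disk_usage_pct /)" "$DISK_LIMIT"` or `report ram "$(ram_usage_pct)" "$RAM_LIMIT"`, with the output forwarded to whichever alert channel you use.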
Built-in FluxOS Monitoring
Every FluxNode has a built-in web UI accessible at http://YOUR_IP:16126. This dashboard provides real-time information about your node including:
- Node Status: current blockchain confirmation status
- Benchmark Results: latest EPS, RAM, SSD, and network scores
- Connected Peers: how many other nodes you're connected to
- Running Apps: Docker containers deployed on the node
- Resource Usage: current CPU, RAM, and disk utilization
- Flux Daemon Info: blockchain sync status and chain height
The same information is available programmatically through the FluxOS API on port 16127. For example:
Query node status via FluxOS API
# Get node status
curl http://YOUR_IP:16127/flux/info
# Get benchmark status
curl http://YOUR_IP:16127/benchmark/getbenchmarks
# Get FluxOS version
curl http://YOUR_IP:16127/flux/version
# Get running apps
curl http://YOUR_IP:16127/apps/installedapps
External Monitoring Tools
FluxNodes.net
FluxNodes.net is a community-operated network explorer that provides a comprehensive view of all Flux nodes. You can look up any node by IP address or Zel ID to check its status, benchmarks, rank, and reward history. It's invaluable for verifying node health from an external perspective.
UptimeRobot / Hetrixtools
Third-party uptime monitoring services can ping your node's FluxOS API endpoint and alert you when it becomes unreachable. This catches network-level issues that the node itself can't report.
1. Create an account. Sign up for a free plan on UptimeRobot (50 monitors free) or Hetrixtools (15 monitors free).
2. Add HTTP monitors. Monitor http://YOUR_IP:16127/flux/info. This endpoint returns node info when FluxOS is running.
3. Set the check interval. 5-minute intervals are sufficient for most providers. Premium plans offer 1-minute intervals.
4. Configure alerts. Set up email, SMS, or webhook notifications for downtime events.
Discord Webhook Alerts
Discord webhooks are a popular and free way to get real-time alerts in a team channel. Here's how to set them up:
1. Create a Discord channel. Create a dedicated #node-alerts channel in your Discord server.
2. Create a webhook. Go to Channel Settings → Integrations → Webhooks → New Webhook, then copy the webhook URL.
3. Write a monitoring script. Create a bash script that checks node status via the FluxOS API and sends alerts to the webhook when issues are detected.
4. Schedule with cron. Run the script every 5 minutes via a cron job on a separate monitoring server (not the node itself).
Simple Discord alert script (check_node.sh)
#!/bin/bash
NODE_IP="YOUR_IP"
WEBHOOK_URL="YOUR_DISCORD_WEBHOOK_URL"
# Check if the FluxOS API responds; curl reports 000 when no connection is made
STATUS=$(curl -s -o /dev/null -w "%{http_code}" \
  --connect-timeout 10 --max-time 20 \
  "http://$NODE_IP:16127/flux/info")
if [ "$STATUS" != "200" ]; then
  # Inner quotes are escaped so the payload is valid JSON
  curl -s -H "Content-Type: application/json" \
    -d "{\"content\": \"⚠️ **Node Alert**: $NODE_IP is unreachable (HTTP $STATUS)\"}" \
    "$WEBHOOK_URL"
fi
Cron job (runs every 5 minutes)
# Add to crontab: crontab -e
# The log file must be writable by the user that owns the crontab
*/5 * * * * /home/user/check_node.sh >> /var/log/node-monitor.log 2>&1
Fleet-Wide Monitoring Dashboard
When managing 10+ nodes, individual monitoring becomes impractical. You need a centralized dashboard that shows the health of your entire fleet at a glance.
- Fluxme.io Dashboard: the built-in monitoring on this platform shows fleet-wide status, alerts, and performance metrics
- Custom Grafana setup: for advanced providers. Collect metrics with Prometheus and visualize them in Grafana by querying the FluxOS API on each node and aggregating the results.
- Spreadsheet tracking: for smaller fleets. Maintain a simple spreadsheet with node IPs, status, last benchmark, last reward, and expiry date.
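Before committing to a full dashboard, a fleet-wide check can be approximated with a short script that probes every node and prints a summary. The sketch below assumes a hypothetical `nodes.txt` file with one IP per line. `probe_node` and `fleet_summary` are illustrative names, and the probe uses the same `/flux/info` endpoint as the single-node examples in this guide.

```bash
#!/bin/bash
# probe_node IP -> prints the HTTP status of the FluxOS /flux/info endpoint
# (curl prints 000 when it cannot connect)
probe_node() {
  curl -s -o /dev/null -w "%{http_code}" \
    --connect-timeout 10 --max-time 20 \
    "http://$1:16127/flux/info"
}

# fleet_summary -> reads IPs from stdin, prints one line per node plus totals
fleet_summary() {
  local up=0 down=0 ip code
  # The "|| [ -n ... ]" part handles a last line without a trailing newline
  while read -r ip || [ -n "$ip" ]; do
    case "$ip" in
      ''|'#'*) continue ;;   # skip blank lines and comments
    esac
    code=$(probe_node "$ip")
    if [ "$code" = "200" ]; then
      up=$((up + 1))
      echo "UP   $ip"
    else
      down=$((down + 1))
      echo "DOWN $ip (HTTP $code)"
    fi
  done
  echo "total=$((up + down)) up=$up down=$down"
}
```

Run it as `fleet_summary < nodes.txt`. The final totals line is easy to grep from cron output or to forward to a webhook.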
Automated Remediation
For common, well-understood issues, automated remediation can save significant time:
- Auto-restart FluxOS: if the FluxOS service stops, a systemd watchdog or script can restart it automatically
- Disk cleanup: automated scripts to clean Docker images, logs, and temporary files when disk usage exceeds 80%
- Benchmark recovery: if a benchmark fails due to temporary load, a script can restart the daemon and force a re-benchmark
- FluxOS auto-update: scripts that check for new FluxOS versions and apply updates during maintenance windows
Always test automated remediation scripts thoroughly before deploying to production. A buggy auto-restart script can cause more downtime than it prevents. Start with monitoring-only, then add automation gradually.
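In that spirit, one way to introduce the disk cleanup idea gradually is a script that defaults to a dry run and only prints what it would do. This is a minimal sketch, not a vetted remediation tool. The `DRY_RUN` variable and the function names are hypothetical, and the 200M journal size is an arbitrary example value. `docker image prune -af` removes images that no container references, so confirm that is acceptable on your nodes before disabling the dry run.

```bash
#!/bin/bash
CLEANUP_LIMIT=80           # percent, matching the disk cleanup threshold above
DRY_RUN="${DRY_RUN:-1}"    # 1 = only print commands (default), 0 = execute them

# run CMD... -> executes the command, or just prints it in dry-run mode
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "would run: $*"
  else
    "$@"
  fi
}

# needs_cleanup USED_PCT -> succeeds when usage is above the limit
needs_cleanup() {
  [ "$1" -gt "$CLEANUP_LIMIT" ]
}

# cleanup_if_needed USED_PCT -> runs the cleanup steps only above the limit
cleanup_if_needed() {
  if needs_cleanup "$1"; then
    run docker image prune -af           # remove images no container uses
    run journalctl --vacuum-size=200M    # trim the systemd journal
  else
    echo "disk at ${1}%, nothing to do"
  fi
}
```

Calling `cleanup_if_needed 90` in the default mode prints the two commands without running them. Setting `DRY_RUN=0` switches to real execution once the output has been reviewed.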
Escalation Procedures
Define clear escalation paths for different severity levels:
| Severity | Example | Response | Escalation |
|---|---|---|---|
| Critical | Multiple nodes offline | Immediate investigation | Wake up on-call engineer |
| High | Single node benchmark failure | Within 1 hour | Notify lead engineer |
| Medium | Disk usage above 85% | Within 4 hours | Standard ticket |
| Low | Minor version behind | Within 24 hours | Add to maintenance queue |
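If alerts are generated by scripts, the escalation table can be encoded directly so that every message carries its expected response. The sketch below is one possible encoding. `escalation_for` and `format_alert` are hypothetical names, and the severity labels are assumed to be passed in lowercase.

```bash
#!/bin/bash
# escalation_for SEVERITY -> prints "response | escalation" from the table
escalation_for() {
  case "$1" in
    critical) echo "immediate investigation | wake up on-call engineer" ;;
    high)     echo "within 1 hour | notify lead engineer" ;;
    medium)   echo "within 4 hours | standard ticket" ;;
    low)      echo "within 24 hours | add to maintenance queue" ;;
    *)        echo "unknown severity: $1" >&2; return 1 ;;
  esac
}

# format_alert SEVERITY MESSAGE -> prints a tagged alert line with its action
format_alert() {
  local action label
  action=$(escalation_for "$1") || return 1
  label=$(printf '%s' "$1" | tr '[:lower:]' '[:upper:]')
  printf '[%s] %s -> %s\n' "$label" "$2" "$action"
}
```

For example, `format_alert medium "Disk usage above 85%"` produces a single line that can be posted to a webhook or written to a ticket.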