We manage hundreds of servers at Canadian Web Hosting. When a customer calls at 3 AM because their site is down, the first question is always the same: what changed? Without monitoring, that question turns into 45 minutes of SSH-ing into boxes and reading logs. With monitoring, it’s a 30-second glance at a dashboard.

If you’re running anything in production — a web app, a database, a container stack — you need monitoring. But the open-source landscape is overwhelming. Prometheus? Zabbix? Netdata? Checkmk? They all claim to be the best. We’ve deployed most of them for customers over the years, so here’s what we’ve actually learned.

What Self-Hosted Monitoring Gets You

Before comparing tools, let’s be clear about what monitoring solves:

  • Downtime detection — know before your customers do
  • Capacity planning — see trends before you hit limits
  • Root cause analysis — correlate CPU, memory, disk, and network when something breaks
  • Compliance evidence — SOC 2 and PCI DSS require monitoring and alerting

Self-hosting your monitoring means your data stays in Canada (or wherever you choose), you control retention, and there are no per-host fees that balloon as you scale.

The Comparison: 12 Monitoring Tools, Tested

Tool            | Best For                           | Min RAM | Architecture
Prometheus      | Metrics + alerting (cloud-native)  | 2 GB    | Pull-based, PromQL
Grafana         | Dashboards + visualization         | 512 MB  | Query layer (pairs with everything)
Zabbix          | Enterprise infra monitoring        | 4 GB    | Agent-based, auto-discovery
Netdata         | Real-time per-host metrics         | 256 MB  | Per-node agent, zero config
Checkmk         | Traditional IT (network + servers) | 2 GB    | Nagios-derived, agent + SNMP
Nagios          | Legacy check-based monitoring      | 1 GB    | Plugin-based, active checks
Icinga 2        | Modern Nagios replacement          | 2 GB    | Cluster-capable, REST API
Uptime Kuma     | Simple uptime + status pages       | 256 MB  | Node.js, SQLite
VictoriaMetrics | Long-term Prometheus storage       | 1 GB    | Prometheus-compatible, better compression
LibreNMS        | Network device monitoring          | 2 GB    | SNMP-based, auto-discovery
Sensu Go        | Pipeline-based observability       | 2 GB    | Agent + event pipeline
Monit           | Process watchdog                   | 32 MB   | Lightweight, auto-restart

Prometheus + Grafana: The Modern Standard

If you’re starting fresh, this is probably the right answer. Prometheus scrapes metrics from your services every 15 seconds, stores them in a time-series database, and fires alerts through Alertmanager. Grafana gives you the dashboards.
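A minimal scrape configuration makes the pull model concrete. This is a hedged sketch, not a production config: the target hostnames (alertmanager, node-exporter) assume the containers share a Docker network, as in a typical Compose deployment.

```yaml
# prometheus.yml — minimal sketch
global:
  scrape_interval: 15s      # how often Prometheus pulls metrics from each target
  evaluation_interval: 15s  # how often alerting rules are evaluated

alerting:
  alertmanagers:
    - static_configs:
        - targets: ["alertmanager:9093"]

scrape_configs:
  - job_name: prometheus     # Prometheus monitors itself
    static_configs:
      - targets: ["localhost:9090"]
  - job_name: node           # host metrics via node-exporter
    static_configs:
      - targets: ["node-exporter:9100"]
```

Every 15 seconds Prometheus makes an HTTP GET to each target's /metrics endpoint; targets never push, which is why the server needs network reachability to everything it monitors.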

Why it wins:

  • PromQL is powerful once you learn it — percentile calculations, rate functions, label filtering
  • Massive ecosystem: exporters exist for Docker, MySQL, PostgreSQL, Redis, Nginx, and hundreds more
  • Cloud-native: designed for containers and Kubernetes from the start
  • Alertmanager handles routing, silencing, and deduplication
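To show what alerting looks like in practice, here is a sketch of a rules file (loaded via the rule_files setting in prometheus.yml). The file name and thresholds are illustrative; the metrics (up, node_filesystem_avail_bytes) are standard Prometheus and node-exporter series.

```yaml
# alerts.yml — hypothetical rules file; adjust thresholds to taste
groups:
  - name: basic
    rules:
      - alert: InstanceDown
        expr: up == 0          # the scrape itself failed
        for: 2m                # only fire after 2 minutes of failures
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} has been unreachable for 2 minutes"

      - alert: DiskAlmostFull
        expr: (node_filesystem_avail_bytes{fstype!~"tmpfs|overlay"}
               / node_filesystem_size_bytes) < 0.10
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% disk space left on {{ $labels.instance }}"
```

Prometheus evaluates the expressions on every evaluation interval and forwards firing alerts to Alertmanager, which handles the routing, silencing, and deduplication mentioned above.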

Where it struggles:

  • Not a logs solution — you need a separate logging stack (Loki, ELK, syslog-ng)
  • Single-node by default — for long-term storage, add VictoriaMetrics or Thanos
  • Pull-based model means you need network access to every target

Production setup (Docker Compose):

version: "3.8"
services:
  prometheus:
    image: prom/prometheus:latest
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=90d'   # keep 90 days of metrics
    ports:
      - "9090:9090"
    restart: unless-stopped

  grafana:
    image: grafana/grafana:latest
    volumes:
      - grafana_data:/var/lib/grafana
    ports:
      - "3000:3000"
    environment:
      # Change this before first boot — see the hardening checklist
      - GF_SECURITY_ADMIN_PASSWORD=changeme
    restart: unless-stopped

  # Host-level metrics (CPU, memory, disk, network) for this machine
  node-exporter:
    image: prom/node-exporter:latest
    pid: host
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
    restart: unless-stopped

  alertmanager:
    image: prom/alertmanager:latest
    volumes:
      # Routing and receiver config (email, Slack, etc.) lives here
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
    ports:
      - "9093:9093"
    restart: unless-stopped

volumes:
  prometheus_data:
  grafana_data:

Key Exporters Worth Adding

Exporter          | What It Monitors                   | Port
node-exporter     | CPU, memory, disk, network         | 9100
cAdvisor          | Container metrics                  | 8080
mysqld-exporter   | MySQL/MariaDB queries, connections | 9104
postgres-exporter | PostgreSQL stats, locks            | 9187
blackbox-exporter | HTTP/TCP/ICMP probes (uptime)      | 9115
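The blackbox-exporter is the odd one out: Prometheus doesn't scrape the target site directly, it asks the exporter to probe it. That requires a relabeling stanza, sketched here with a placeholder target URL; the exporter hostname assumes it runs as a container on the same network.

```yaml
# prometheus.yml fragment — probe external endpoints through blackbox-exporter
scrape_configs:
  - job_name: blackbox
    metrics_path: /probe
    params:
      module: [http_2xx]            # expect an HTTP 2xx response
    static_configs:
      - targets:
          - https://example.com     # the site you actually want checked
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target   # pass the URL as ?target=
      - source_labels: [__param_target]
        target_label: instance         # keep the URL as the instance label
      - target_label: __address__
        replacement: blackbox-exporter:9115  # scrape the exporter itself
```

The pattern looks backwards at first, but it lets one exporter probe any number of endpoints while Prometheus keeps a sensible instance label per site.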

Zabbix: The Enterprise Workhorse

Zabbix has been around since 2001. It’s not flashy, but it monitors everything: servers, network gear, SNMP devices, IPMI, JMX, cloud APIs. If you have a mixed fleet of physical servers, switches, and VMs, Zabbix handles it all in one place.

Why teams choose it:

  • Auto-discovery finds new hosts and services automatically
  • Built-in alerting with escalation chains (email → Slack → PagerDuty)
  • Template library covers thousands of device types
  • Agent-based with low overhead per monitored host

The trade-offs:

  • Web UI feels dated compared to Grafana dashboards
  • MySQL/PostgreSQL backend needs tuning at scale (500+ hosts)
  • Configuration is template-heavy — steep learning curve
  • Needs 4+ GB RAM for the server itself
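On the agent side, the per-host footprint is genuinely small. A minimal agent config is only a few lines; the IP and hostname below are placeholders you would replace with your own.

```ini
# /etc/zabbix/zabbix_agentd.conf — minimal sketch
Server=10.0.0.5             # Zabbix server allowed to poll this agent (passive checks)
ServerActive=10.0.0.5       # server the agent pushes results to (active checks)
Hostname=web01.example.com  # must match the host name configured in the Zabbix UI
```

Most of the real work happens server-side in templates, which is where the learning curve mentioned above comes from.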

Netdata: Instant Visibility, Zero Config

Netdata is the fastest way to see what’s happening on a server. Install the agent, open port 19999, and you get 2,000+ metrics collected at per-second granularity. No configuration needed.

What makes it unique:

  • Installs in 30 seconds: bash <(curl -Ss https://get.netdata.cloud/kickstart.sh)
  • Per-second resolution (most tools do 15-60 second intervals)
  • Anomaly detection built in — flags unusual patterns automatically
  • Extremely low overhead (~2% CPU, ~100 MB RAM)

Limitations:

  • Per-host dashboards — no centralized multi-host view without Netdata Cloud (SaaS)
  • Short default retention (depends on RAM allocated to dbengine)
  • Not designed for alerting pipelines or complex routing
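The SaaS-free workaround for the per-host limitation is Netdata's built-in streaming: child agents forward metrics to a self-hosted parent node that keeps the history and serves the dashboards. A hedged sketch, with a placeholder API key (any UUID works, shared between parent and children):

```ini
# /etc/netdata/stream.conf on each child node
[stream]
    enabled = yes
    destination = parent.example.com:19999
    api key = 11111111-2222-3333-4444-555555555555

# /etc/netdata/stream.conf on the parent — accept children using this key
[11111111-2222-3333-4444-555555555555]
    enabled = yes
```

With streaming in place the children can run nearly stateless, and only the parent needs meaningful disk for retention.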

Uptime Kuma: Simple Status Pages

Not every project needs a full observability stack. If you just need to know “is it up?” and want a clean status page for customers, Uptime Kuma is the answer.

  • HTTP, TCP, DNS, Docker, and ping monitors
  • Beautiful status pages you can share with clients
  • Notifications via Slack, Discord, Telegram, email, and 90+ integrations
  • Runs on 256 MB RAM — fits on the smallest VPS
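Deployment is a one-service Compose file; a sketch, with data persisted in a named volume so monitor history survives container upgrades:

```yaml
# docker-compose.yml for Uptime Kuma
services:
  uptime-kuma:
    image: louislam/uptime-kuma:1
    ports:
      - "3001:3001"          # web UI and status pages
    volumes:
      - uptime-kuma_data:/app/data   # SQLite database and config
    restart: unless-stopped

volumes:
  uptime-kuma_data:
```

Browse to port 3001, create the admin account, and add monitors through the UI; there is no config file to maintain.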

VictoriaMetrics: When Prometheus Runs Out of Disk

VictoriaMetrics is a drop-in replacement for Prometheus’s storage engine. It speaks PromQL, accepts Prometheus remote-write, and compresses data 7-10x better. If you’re keeping 6+ months of metrics, this saves significant disk space.

  • Drop-in: point Prometheus remote_write at VictoriaMetrics, done
  • Single binary, no dependencies
  • Handles millions of time series on modest hardware
  • Also works standalone (without Prometheus) using vmagent
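Wiring it in really is that short. The fragment below assumes VictoriaMetrics runs as a container named victoriametrics on the same network (its default HTTP port is 8428):

```yaml
# prometheus.yml fragment — ship samples to VictoriaMetrics as they arrive
remote_write:
  - url: http://victoriametrics:8428/api/v1/write
```

```yaml
# docker-compose.yml fragment — the VictoriaMetrics service itself
  victoriametrics:
    image: victoriametrics/victoria-metrics:latest
    command:
      - '-retentionPeriod=12'   # keep 12 months of data
    volumes:
      - vm_data:/victoria-metrics-data
    restart: unless-stopped
```

You can then keep Prometheus's local retention short (say, 15 days) and point Grafana at VictoriaMetrics for anything older, since it answers PromQL queries directly.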

The Legacy Options: Nagios, Cacti, Observium

We still see these on older infrastructure. They work, but there’s rarely a reason to deploy them fresh in 2026:

  • Nagios — the original. Plugin-based, check scripts, CGI web UI. Icinga 2 is the modern fork with clustering and a REST API.
  • Cacti — SNMP-based graphing. Good for network bandwidth charts, limited beyond that.
  • Observium — network monitoring. Community edition is free but feature-limited. LibreNMS is the actively-maintained fork.

If you’re running Nagios today, consider migrating to Checkmk (which runs Nagios plugins) or Icinga 2 (which has a Nagios-compatible config format).

How to Choose: Decision Tree

Scenario                        | Recommended Stack         | Why
Containers + microservices      | Prometheus + Grafana      | Built for dynamic, ephemeral workloads
Mixed fleet (servers + network) | Zabbix                    | Agent + SNMP + auto-discovery
Quick single-server visibility  | Netdata                   | Zero config, instant dashboards
Simple uptime monitoring        | Uptime Kuma               | Lightweight, clean status pages
Long-term metrics storage       | VictoriaMetrics + Grafana | Better compression, PromQL compatible
Process watchdog                | Monit                     | 32 MB RAM, auto-restarts crashed services
Legacy Nagios migration         | Checkmk or Icinga 2       | Plugin-compatible, modern features
Network devices only            | LibreNMS                  | SNMP-native, auto-discovery
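The process-watchdog row deserves a concrete example, because Monit's whole configuration fits in a few lines. A sketch for keeping Nginx alive (paths follow common Debian/Ubuntu defaults; adjust for your distro):

```ini
# /etc/monit/monitrc fragment — restart nginx if it dies or stops answering
check process nginx with pidfile /var/run/nginx.pid
  start program = "/usr/sbin/service nginx start"
  stop program  = "/usr/sbin/service nginx stop"
  if failed port 80 protocol http for 3 cycles then restart
```

Monit polls on a fixed cycle (30 seconds by default), so "3 cycles" here means roughly 90 seconds of failed checks before it intervenes.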

Hosting Requirements

Monitoring tools range from tiny (Monit at 32 MB) to resource-hungry (Zabbix at 4+ GB). Here’s what we recommend for a production deployment:

Stack                               | CPU     | RAM    | Storage
Prometheus + Grafana (small)        | 2 cores | 4 GB   | 50 GB SSD
Prometheus + Grafana (100+ targets) | 4 cores | 8 GB   | 200 GB SSD
Zabbix (enterprise)                 | 4 cores | 8 GB   | 100 GB SSD
Netdata (per host)                  | 1 core  | 1 GB   | 10 GB
Uptime Kuma                         | 1 core  | 512 MB | 5 GB
VictoriaMetrics (long-term)         | 2 cores | 4 GB   | 500 GB SSD

A Canadian Web Hosting Cloud VPS handles Prometheus + Grafana comfortably starting at the 4 GB tier. For Zabbix monitoring 500+ hosts, a dedicated server gives you the I/O headroom the database needs.

Running monitoring for compliance (SOC 2, PCI DSS)? Our infrastructure is SOC 2 Type II certified, and our Managed Security team can help with the alerting and audit trail requirements.

Hardening Checklist

Whichever tool you choose, lock it down before exposing it:

  • Reverse proxy — put Nginx or Caddy in front with TLS. Never expose Prometheus or Grafana directly on port 9090/3000.
  • Authentication — enable auth on Grafana (default admin/admin is a gift to attackers). Use OAuth or LDAP if possible.
  • Firewall — restrict exporter ports (9100, 9090, etc.) to your monitoring server’s IP only.
  • Retention policy — set --storage.tsdb.retention.time in Prometheus. 90 days is a good default; use VictoriaMetrics for longer.
  • Backups — snapshot Prometheus data and Grafana dashboards. A monitoring system that loses its history is useless for trend analysis.
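For the reverse-proxy item, here is a sketch of an Nginx server block fronting Grafana. The hostname and certificate paths are placeholders; the second location block covers Grafana Live, which uses WebSockets.

```nginx
# /etc/nginx/conf.d/grafana.conf — hedged sketch, adjust names and cert paths
server {
    listen 443 ssl;
    server_name grafana.example.com;

    ssl_certificate     /etc/letsencrypt/live/grafana.example.com/fullchain.pem;
    ssl_certificate_key /etc/letsencrypt/live/grafana.example.com/privkey.pem;

    location / {
        proxy_pass http://127.0.0.1:3000;   # Grafana listening on localhost only
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
    }

    location /api/live/ {
        proxy_pass http://127.0.0.1:3000;
        proxy_set_header Upgrade $http_upgrade;   # WebSocket upgrade headers
        proxy_set_header Connection "upgrade";
        proxy_set_header Host $host;
    }
}
```

Pair this with a firewall rule that blocks direct access to port 3000 from outside, so the proxy is the only way in.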

Need help with the initial setup or ongoing management? Our Managed Support team handles monitoring deployments — we’ll configure alerting, dashboards, and retention so you can focus on your application.

What’s Next

Monitoring is only half the picture. You also need centralized logging to correlate metrics with events. When a CPU spike happens, logs tell you why. We cover logging stacks (Loki, ELK, Graylog) in a separate comparison.

For container-heavy environments, pair your monitoring with proper Docker troubleshooting practices and VPS hardening to keep everything stable.