Server monitoring is one of those things everyone agrees is important and nobody wants to set up. The good news: you only need a few metrics to catch 90% of problems before they escalate. The bad news: most default monitoring setups track the wrong things or alert on noise.

The four resources that matter

Every server problem eventually shows up as one of these:

  1. CPU — sustained high usage, I/O wait, or single-core saturation
  2. Memory — running out of RAM, swapping, or OOM killer events
  3. Disk — running out of space or inodes, high I/O latency
  4. Network — bandwidth saturation, packet loss, connection limits

Everything else (load average, context switches, open files) is a symptom of one of these four.

Quick CLI checks

Before installing anything, learn to read the standard Linux tools:

# CPU: top processes, load, I/O wait
top -bn1 | head -20

# Memory: read the "available" column, which already accounts for reclaimable buffers/cache
free -h

# Disk usage: watch for >85% on any partition
df -h

# Disk I/O: look for high await (latency)
iostat -x 1 3

# Network: number of listening TCP/UDP sockets, then a summary of socket counts
ss -Htuln | wc -l
ss -s

# Inodes: often overlooked, same consequences as disk full
df -i

The most common silent killer is inode exhaustion on small-file workloads (email servers, session storage, cache directories).
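Here is a minimal sketch of how the df checks above could be turned into a pass/fail test. The function name over_threshold is made up for this example, and it assumes the POSIX output format of df -P, where the fifth column is the percentage and the sixth is the mount point (mount points containing spaces would need extra handling).

```shell
#!/bin/sh
# Print every filesystem whose usage meets or exceeds a threshold.
# Reads `df -P` or `df -Pi` output on stdin; $1 is the threshold in percent.
# over_threshold is a hypothetical helper name, not a standard tool.
over_threshold() {
  awk -v limit="$1" '
    NR == 1 { next }                  # skip the header row
    {
      pct = $5                        # Capacity (df -P) or IUse% (df -Pi)
      sub(/%$/, "", pct)              # drop the trailing percent sign
      # Filesystems without inodes report "-" and are skipped here
      if (pct ~ /^[0-9]+$/ && pct + 0 >= limit + 0)
        print $6 " " pct "%"          # mount point and usage
    }
  '
}
```

On a live system you would pipe in the real output: df -P | over_threshold 85 for space, and df -Pi | over_threshold 85 for inodes. Empty output means nothing is over the limit.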

Setting up automated monitoring

Option 1: Netdata (beginner-friendly, zero config)

# One-line install
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && sh /tmp/netdata-kickstart.sh

Netdata gives you a real-time dashboard with hundreds of metrics, auto-detection of services, and pre-configured alarms. It is the best option if you want monitoring without configuration work.

Option 2: Prometheus + Node Exporter + Grafana (power user)

# Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.9.0/node_exporter-1.9.0.linux-amd64.tar.gz
tar xzf node_exporter-1.9.0.linux-amd64.tar.gz
sudo mv node_exporter-1.9.0.linux-amd64/node_exporter /usr/local/bin/

# Run as systemd service
sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network.target

[Service]
ExecStart=/usr/local/bin/node_exporter
Restart=always

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter

This exposes metrics on port 9100 for Prometheus to scrape, and Grafana visualises them. It takes more setup, but almost every part of it can be customised.
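To get a feel for what node_exporter exposes before wiring up Prometheus, you can read the metrics endpoint directly. The sketch below is an illustration rather than part of any official tooling: fs_used_pct is a made-up helper that pulls node_filesystem_size_bytes and node_filesystem_avail_bytes for one mount point out of the text output and prints the percentage used. It assumes one series per mount point.

```shell
#!/bin/sh
# Print the percentage of space used on one mount point, computed from
# node_exporter's text output on stdin. $1 is the mount point, e.g. "/".
# fs_used_pct is a hypothetical helper name used only in this example.
fs_used_pct() {
  awk -v mp="$1" '
    BEGIN { want = "mountpoint=\"" mp "\"" }     # exact label to look for
    index($0, want) == 0 { next }                # ignore other mount points
    /^node_filesystem_size_bytes[{]/  { size  = $NF }
    /^node_filesystem_avail_bytes[{]/ { avail = $NF }
    END {
      if (size > 0) printf "%.0f\n", (size - avail) * 100 / size
      else exit 1                                # mount point not found
    }
  '
}
```

With the service running, curl -s http://localhost:9100/metrics | fs_used_pct / should print a number close to what df reports. It can differ slightly because the "avail" metric excludes blocks reserved for root.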

Option 3: Monit (lightweight process + resource monitoring)

sudo apt install monit

Monit is tiny and watches specific processes. It restarts them if they stop and can alert you. Good as a second layer alongside something like Netdata.

What to alert on

Do not alert on “CPU over 80%”. Alert on:

Metric           | Threshold                         | Why
Disk usage       | > 85%                             | You still have time to react
Disk usage       | > 95%                             | Critical: clear logs or expand immediately
Memory + swap    | swap > 1 GB AND memory < 10% free | System is thrashing
CPU iowait       | > 30% for 5+ minutes              | Disk bottleneck, not CPU
Service down     | port not listening                | Simple, unambiguous
SSL expiry       | < 14 days                         | No excuse for expired certs
Disk inode usage | > 85%                             | Silent disk-full scenario
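The combined memory and swap rule is the least obvious one to implement, so here is a rough sketch of it. It reads /proc/meminfo style input, and it makes two assumptions that are mine rather than the table's: "memory free" is taken to mean MemAvailable, and 1 GB is treated as 1048576 kB. The name mem_state is made up for the example.

```shell
#!/bin/sh
# Print "thrashing" when more than 1 GB of swap is in use AND less than
# 10% of RAM is available, otherwise print "ok".
# Reads /proc/meminfo format on stdin (all values are in kB).
# mem_state is a hypothetical helper name used only in this example.
mem_state() {
  awk '
    $1 == "MemTotal:"     { total  = $2 }
    $1 == "MemAvailable:" { avail  = $2 }
    $1 == "SwapTotal:"    { stotal = $2 }
    $1 == "SwapFree:"     { sfree  = $2 }
    END {
      swap_used_kb = stotal - sfree
      low_mem = (total > 0 && avail * 10 < total)   # under 10% available
      if (swap_used_kb > 1048576 && low_mem) print "thrashing"
      else print "ok"
    }
  '
}
```

On a real server this would be called as mem_state < /proc/meminfo, for example from a cron job that sends a mail when the result is not "ok". Requiring both conditions is the point: swap use on its own is often harmless, and low free memory on its own is normal on a busy box.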

Setting up alerts with Monit

# /etc/monit/monitrc
set mailserver smtp.example.com
set alert your@email.com

check filesystem root with path /
  if space usage > 85% then alert
  if inode usage > 85% then alert

check system example.com
  if loadavg (1min) > 8 for 5 cycles then alert
  if memory usage > 90% then alert
  if swap usage > 25% then alert

check process nginx with pidfile /var/run/nginx.pid
  start program = "/usr/bin/systemctl start nginx"
  stop program = "/usr/bin/systemctl stop nginx"
  if failed port 80 protocol http then restart

Disk monitoring specifics

The fastest way to fill a disk on a web server:

  1. Log files: set up logrotate properly
  2. WordPress backup plugins: they store backups in wp-content/uploads/ and never clean up
  3. MySQL binary logs: set expire_logs_days (binlog_expire_logs_seconds on MySQL 8.0 and later)
  4. Session files: PHP sessions that are never cleaned up
  5. Docker images: run docker system prune -f weekly

# Find what's using space
du -sh /* 2>/dev/null | sort -rh | head -10

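# Find large files (-xdev stays on the root filesystem, so /proc and other mounts are skipped)
find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null

# Check largest directories under /var
du -sh /var/* 2>/dev/null | sort -rh | head -10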

Practical monitoring workflow

  1. Install Netdata or node_exporter on every server
  2. Set up disk space and SSL expiry alerts (two of the most common preventable outages)
  3. Add service health checks (is nginx/mysql/php-fpm running?)
  4. Review dashboards weekly — look for trends, not spikes
  5. Add more specific alerts only when you get burnt by something
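Disk alerts are covered by the Monit config earlier, so the piece of step 2 still missing is the certificate check. One possible way to do it is sketched below, assuming openssl and GNU date are installed. Both function names are made up for this example, and the 14-day limit comes from the alert table.

```shell
#!/bin/sh
# Whole days between two Unix timestamps ($1 = now, $2 = expiry).
days_between() {
  echo $(( ($2 - $1) / 86400 ))
}

# Days until the certificate served by host $1 (port $2, default 443) expires.
# Hypothetical helper: needs openssl, GNU date, and network access to the host.
cert_days_left() {
  host=$1
  port=${2:-443}
  # Fetch the certificate and keep only the date after "notAfter="
  end=$(openssl s_client -servername "$host" -connect "$host:$port" \
          </dev/null 2>/dev/null |
        openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
  [ -n "$end" ] || return 1            # no certificate could be read
  days_between "$(date +%s)" "$(date -d "$end" +%s)"
}
```

A daily cron job could then run something like: days=$(cert_days_left example.com) && [ "$days" -lt 14 ] && echo "certificate expires in $days days". Checking the certificate the server actually presents, rather than a file on disk, also catches the case where a renewal succeeded but the web server was never reloaded.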

The goal is not to monitor everything. It is to know about problems before your users do.