Server monitoring is one of those things everyone agrees is important and nobody wants to set up. The good news: you only need a few metrics to catch 90% of problems before they escalate. The bad news: most default monitoring setups track the wrong things or alert on noise.
The four resources that matter
Every server problem eventually shows up as one of these:
- CPU — sustained high usage, I/O wait, or single-core saturation
- Memory — running out of RAM, swapping, or OOM killer events
- Disk — running out of space or inodes, high I/O latency
- Network — bandwidth saturation, packet loss, connection limits
Everything else (load average, context switches, open files) is a symptom of one of these four.
Quick CLI checks
Before installing anything, learn to read the standard Linux tools:
# CPU: top processes, load, I/O wait
top -bn1 | head -20
# Memory: read the "available" column, not "free" (buffers/cache are reclaimable)
free -h
# Disk usage: watch for >85% on any partition
df -h
# Disk I/O: look for high await (latency)
iostat -x 1 3
# Network: number of listening TCP/UDP sockets, then socket totals by state
ss -Htuln | wc -l
ss -s
# Inodes: often overlooked, same consequences as disk full
df -i
The most common silent killer is inode exhaustion on small-file workloads (email servers, session storage, cache directories).
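When df -i shows a filesystem running low on inodes, the next question is which directory holds all the small files. The following is a minimal sketch, not a standard tool: inode_hogs is a hypothetical helper name, and it assumes find, sort, and head are available. It counts the entries under each immediate subdirectory of a path and lists the largest first.

```shell
# inode_hogs [ROOT] [N]: count entries (files, directories, symlinks) under
# each immediate subdirectory of ROOT and print the N largest counts.
# Hypothetical helper for illustration; every entry costs one inode.
inode_hogs() {
    local root="${1:-/var}"
    local limit="${2:-10}"
    local dir count
    for dir in "$root"/*/; do
        [ -d "$dir" ] || continue
        # -xdev keeps find on one filesystem; the count includes $dir itself
        count=$(find "$dir" -xdev 2>/dev/null | wc -l | tr -d ' ')
        printf '%s\t%s\n' "$count" "${dir%/}"
    done | sort -rn | head -n "$limit"
}
```

Calling inode_hogs /var 5 as root prints the five subdirectories of /var with the most entries. Repeat on the top result to drill down.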
Setting up automated monitoring
Option 1: Netdata (beginner-friendly, zero config)
# One-line install
wget -O /tmp/netdata-kickstart.sh https://get.netdata.cloud/kickstart.sh && sh /tmp/netdata-kickstart.sh
Netdata gives you a real-time dashboard with hundreds of metrics, auto-detection of services, and pre-configured alarms. It is the best option if you want monitoring without configuration work.
Option 2: Prometheus + Node Exporter + Grafana (power user)
# Install node_exporter
wget https://github.com/prometheus/node_exporter/releases/download/v1.9.0/node_exporter-1.9.0.linux-amd64.tar.gz
tar xzf node_exporter-1.9.0.linux-amd64.tar.gz
sudo mv node_exporter-1.9.0.linux-amd64/node_exporter /usr/local/bin/
# Run as systemd service
sudo tee /etc/systemd/system/node_exporter.service << 'EOF'
[Unit]
Description=Prometheus Node Exporter
After=network.target
[Service]
ExecStart=/usr/local/bin/node_exporter
Restart=always
[Install]
WantedBy=multi-user.target
EOF
sudo systemctl daemon-reload
sudo systemctl enable --now node_exporter
node_exporter exposes metrics at http://<host>:9100/metrics, which Prometheus scrapes on a schedule and Grafana visualises. This takes more setup than Netdata, but you control every query, dashboard, and alert rule.
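Before wiring up Prometheus, it can help to confirm that node_exporter is actually serving data. The sketch below assumes the exporter runs on localhost:9100 and that curl and awk are installed. metric_value is a hypothetical helper name; it only handles samples without labels, such as node_load1 or node_memory_MemAvailable_bytes.

```shell
# metric_value NAME: read Prometheus text-format metrics on stdin and print
# the value of the first sample whose name matches NAME exactly.
# Hypothetical helper; label-free metrics only (e.g. node_load1).
metric_value() {
    awk -v name="$1" '$1 == name { print $2; exit }'
}

# Example (assumes node_exporter is listening on localhost:9100):
#   curl -s http://localhost:9100/metrics | metric_value node_load1
```

If the command prints nothing, either the service is not running or the metric name is wrong. Checking sudo systemctl status node_exporter is a reasonable next step.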
Option 3: Monit (lightweight process + resource monitoring)
sudo apt install monit
Monit is tiny and watches specific processes. It restarts them if they stop and can alert you. Good as a second layer alongside something like Netdata.
What to alert on
Do not alert on “CPU over 80%”. A busy CPU is often just a server doing its job, and the alert turns into noise. Alert on:
| Metric | Threshold | Why |
|---|---|---|
| Disk usage | > 85% | You still have time to react |
| Disk usage | > 95% | Critical — clear logs or expand immediately |
| Memory + swap | swap > 1GB AND memory < 10% free | System is thrashing |
| CPU iowait | > 30% for 5+ minutes | Disk bottleneck, not CPU |
| Service down | port not listening | Simple, unambiguous |
| SSL expiry | < 14 days | No excuse for expired certs |
| Disk inode usage | > 85% | Silent disk-full scenario |
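The SSL expiry row is easy to check from a script. The following is a rough sketch under a few assumptions: GNU date (for -d) and the openssl CLI are installed, and the host serves its certificate on port 443. days_until and cert_days_left are hypothetical helper names, not part of any tool mentioned above.

```shell
# days_until "DATE" [NOW_EPOCH]: whole days from now (or NOW_EPOCH) until DATE.
# Requires GNU date for -d. Hypothetical helper for illustration.
days_until() {
    local expiry_epoch now_epoch
    expiry_epoch=$(date -u -d "$1" +%s) || return 1
    now_epoch="${2:-$(date -u +%s)}"
    echo $(( (expiry_epoch - now_epoch) / 86400 ))
}

# cert_days_left HOST [PORT]: days until the certificate served by HOST expires.
cert_days_left() {
    local host="$1"
    local port="${2:-443}"
    local enddate
    # openssl prints "notAfter=Jan 15 00:00:00 2030 GMT"; keep the part after "="
    enddate=$(openssl s_client -connect "$host:$port" -servername "$host" \
                  </dev/null 2>/dev/null \
              | openssl x509 -noout -enddate 2>/dev/null | cut -d= -f2)
    [ -n "$enddate" ] || return 1
    days_until "$enddate"
}

# Example: warn when fewer than 14 days remain
#   [ "$(cert_days_left example.com)" -lt 14 ] && echo "certificate expires soon"
```

Run it daily from cron and send the warning wherever your other alerts go. The 14-day threshold matches the table above.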
Setting up alerts with Monit
# /etc/monit/monitrc
set mailserver smtp.example.com
set alert your@email.com

check filesystem root with path /
    if space usage > 85% then alert
    if inode usage > 85% then alert

check system example.com
    if loadavg (1min) > 8 for 5 cycles then alert
    if memory usage > 90% then alert
    if swap usage > 25% then alert

check process nginx with pidfile /var/run/nginx.pid
    start program = "/usr/bin/systemctl start nginx"
    stop program = "/usr/bin/systemctl stop nginx"
    if failed port 80 protocol http then restart
Disk monitoring specifics
The fastest ways to fill a disk on a web server:
- Log files: set up logrotate properly
- WordPress backup plugins: they store backups in wp-content/uploads/ and never clean up
- MySQL binary logs: set expire_logs_days (on MySQL 8.0 and later, binlog_expire_logs_seconds)
- Session files: PHP sessions that are never cleaned up
- Docker images: run docker system prune -f weekly
# Find what's using space
du -sh /* 2>/dev/null | sort -rh | head -10
# Find large files
find / -xdev -type f -size +100M -exec ls -lh {} + 2>/dev/null
# Check largest directories
du -sh /var/* | sort -rh | head -10
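To catch a filling disk before it needs the commands above, the 85% threshold can be checked from cron. This is a minimal sketch assuming POSIX-format df output (df -P) and mount points without spaces. over_threshold is a hypothetical helper name.

```shell
# over_threshold LIMIT: read `df -P` output on stdin and print
# "mountpoint usage%" for every filesystem whose usage is above LIMIT percent.
# Hypothetical helper; assumes mount points contain no spaces.
over_threshold() {
    awk -v limit="$1" 'NR > 1 {
        pct = $5
        sub(/%/, "", pct)            # "90%" -> "90"
        if (pct + 0 > limit + 0) print $6, pct "%"
    }'
}

# Example: list filesystems above 85% space usage
#   df -P | over_threshold 85
```

On GNU coreutils, df -Pi prints inode usage in the same column layout, so the same function should work for the inode threshold as well.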
Practical monitoring workflow
- Install Netdata or node_exporter on every server
- Set up disk space and SSL expiry alerts (the two most common outages)
- Add service health checks (is nginx/mysql/php-fpm running?)
- Review dashboards weekly — look for trends, not spikes
- Add more specific alerts only when you get burnt by something
The goal is not to monitor everything. It is to know about problems before your users do.