Symptom: Your Server is Unresponsive and You Don’t Know Why
You SSH into your VPS and everything feels sluggish. Commands take seconds to respond. Your website loads slowly or times out. Maybe you got an alert from your monitoring system—or worse, a customer complaint.
The top command shows load numbers, but what do they actually mean? Is it CPU? Memory? Disk I/O? A runaway process?
This guide walks you through diagnosing high server load systematically, so you can identify the root cause and fix it before it escalates.
Quick Diagnosis: Check Load Average First
Run this immediately:
uptime
You’ll see something like:
load average: 4.52, 3.21, 2.89
These three numbers represent the average number of processes waiting for CPU time over the last 1, 5, and 15 minutes.
Rule of thumb: If load average consistently exceeds your number of CPU cores, your server is overloaded. On a 4-core VPS, a load of 5+ indicates contention.
Check your CPU cores:
nproc
If nproc returns 4 and your load is 8, you have twice as many processes competing for CPU as the server can handle.
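As a sketch, this comparison can be scripted. The `is_overloaded` helper below is a hypothetical name, not a standard tool; it reads the 1-minute load from /proc/loadavg and compares it to the core count:

```shell
# Hypothetical helper: compare a load average to the core count
is_overloaded() {
  # $1 = load average, $2 = number of cores
  awk -v l="$1" -v c="$2" 'BEGIN { if (l > c) print "overloaded"; else print "ok" }'
}

is_overloaded "$(awk '{print $1}' /proc/loadavg)" "$(nproc)"
```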
Root Cause #1: CPU Saturation
Symptoms
- Load average high (above CPU cores)
- %Cpu(s) in top shows 90%+ in "us" (user) or "sy" (system)
- Commands feel slow even for simple operations
How to Confirm
top -b -n1 | head -20
Look at the CPU line. High %us means user processes are consuming CPU. High %sy means kernel operations (often I/O).
Find the culprit:
ps aux --sort=-%cpu | head -10
This shows the top 10 CPU-consuming processes.
Fix
Single runaway process:
kill <PID>
If the process ignores SIGTERM, force it:
kill -9 <PID>
Consistent high CPU from an app:
- Optimize the application code
- Scale vertically (upgrade to more CPU cores)
- Scale horizontally (add another server and load balance)
Cryptominer or malware:
If an unknown process is consuming CPU, investigate immediately:
ls -l /proc/<PID>/exe
This shows the actual binary path. If it’s suspicious, kill it and audit your security.
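A few more /proc entries round out the picture. In this sketch, $$ (the current shell) stands in for the suspect PID so the commands are safe to try as-is:

```shell
# Inspect a process via /proc; $$ (this shell) stands in for the suspect PID
PID=$$
readlink /proc/$PID/exe                 # binary path; a "(deleted)" suffix is a red flag
tr '\0' ' ' < /proc/$PID/cmdline; echo  # full command line
readlink /proc/$PID/cwd                 # working directory (miners often run from /tmp)
```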
Verify the Fix
uptime && top -b -n1 | head -5
Load should drop within seconds of killing a runaway process.
Root Cause #2: Memory Exhaustion
Symptoms
- Load average high but CPU not fully utilized
- free shows almost no available memory
- System feels "swap-thrashing" (periodic freezes)
- dmesg shows “Out of memory” messages
How to Confirm
free -h
Look at the “available” column, not “free”. If available is under 10% of total, you’re memory-constrained.
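The same check can be scripted from /proc/meminfo, which is where free gets its numbers. A sketch:

```shell
# Percentage of RAM still available, from /proc/meminfo
avail_pct() {
  awk '/^MemTotal/ {t=$2} /^MemAvailable/ {a=$2} END {print int(a * 100 / t)}' /proc/meminfo
}

echo "available: $(avail_pct)% of total RAM"
```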
Find memory hogs:
ps aux --sort=-%mem | head -10
Check for OOM kills:
dmesg -T | grep -i "out of memory" | tail -20
Fix
Single process consuming too much:
kill <PID>
Escalate to kill -9 <PID> only if it doesn't exit.
MySQL/PostgreSQL using too much:
Adjust buffer pool sizes in config:
# MySQL: /etc/mysql/mysql.conf.d/mysqld.cnf
innodb_buffer_pool_size = 1G # Set to ~70% of available RAM
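The 70% figure can be computed rather than guessed. The `buffer_pool_setting` helper below is hypothetical, and the 70%-of-total heuristic assumes a server dedicated to the database:

```shell
# Hypothetical helper: print a buffer pool setting sized at ~70% of RAM
buffer_pool_setting() {
  # $1 = total RAM in MB
  echo "innodb_buffer_pool_size = $(( $1 * 70 / 100 ))M"
}

buffer_pool_setting "$(awk '/^MemTotal/ {print int($2/1024)}' /proc/meminfo)"
```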
Multiple apps competing:
- Move some services to another VPS
- Upgrade to more RAM
- Add swap (temporary fix, not ideal for production):
fallocate -l 2G /swapfile
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
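Swap created this way disappears at reboot. If you decide to keep it (again, not ideal for production), a line in /etc/fstab makes it persistent:

```
/swapfile none swap sw 0 0
```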
Verify the Fix
free -h && uptime
Available memory should increase, load should stabilize.
Root Cause #3: Disk I/O Bottleneck
Symptoms
- Load average high, CPU shows high %wa (I/O wait)
- Commands involving disk access are slow
- iostat shows high await times
How to Confirm
iostat -x 1 5
Look for:
- %util near 100% = disk is saturated
- await over 50ms = slow response times
Find processes doing I/O:
iotop -o
(Requires sudo and apt install iotop)
Fix
Database doing heavy reads:
- Add indexes to reduce full table scans
- Enable query caching
- Move database to dedicated storage
Log files filling disk:
du -sh /var/log/* | sort -rh | head -10
Set up log rotation or send logs to centralized logging.
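As a sketch of the rotation setup, a logrotate policy for a hypothetical app (the app name and log path are placeholders):

```
# /etc/logrotate.d/myapp -- hypothetical app name and path
/var/log/myapp/*.log {
    weekly
    rotate 4
    compress
    missingok
    notifempty
}
```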
Storage is genuinely slow:
- Upgrade to SSD if using HDD
- Move to NVMe for high-I/O workloads
Verify the Fix
iostat -x 1 3
%util should drop below 80%, await under 20ms for SSD.
Root Cause #4: Network Saturation
Symptoms
- Load normal, but services feel slow
- High network traffic on monitoring
- Packet loss or high latency
How to Confirm
iftop -n
(Requires sudo and apt install iftop)
Or use:
nload
Check for bandwidth saturation. If you’re hitting your VPS network cap, traffic will queue.
Fix
- Enable compression for web traffic (gzip/brotli)
- Use a CDN for static assets
- Upgrade to higher bandwidth tier
- Rate-limit abusive clients
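For the compression step, a minimal nginx sketch; these directives go in the http {} block, and the values are illustrative:

```
gzip on;
gzip_min_length 1024;
gzip_types text/css application/javascript application/json image/svg+xml;
```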
Root Cause #5: Zombie or Defunct Processes
Symptoms
- Load higher than expected for actual work
- ps aux shows processes with status Z
How to Confirm
ps aux | awk '$8 ~ /Z/ {print}'
If this returns results, you have zombie processes.
Fix
Zombies can’t be killed directly—they’re already dead. You must restart their parent process.
ps -o ppid= -p <zombie_PID>
Find the parent PID and restart that service. If init (PID 1) is the parent, a reboot may be necessary.
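Both steps can be combined: list each zombie alongside its parent PID so you know which service to restart. A sketch; the output is just the header when there are no zombies:

```shell
# Print PID, parent PID, state, and command for every zombie process
ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'
```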
Diagnostic Flowchart
Follow this sequence to systematically identify the cause:
1. Run uptime. Is load above CPU cores?
   - No: check network, not load. End of guide.
   - Yes: continue to step 2.
2. Run top -b -n1 | head -5
   - High %wa (I/O wait)? → Disk I/O issue (Root Cause #3)
   - High %us or %sy? → Continue to step 3.
3. Run free -h
   - Available memory under 10%? → Memory issue (Root Cause #2)
   - Memory OK? → CPU saturation (Root Cause #1)
4. Still unclear? Check for zombies (Root Cause #5) or network saturation (Root Cause #4).
Prevention: Monitoring That Catches Problems Early
Reactive debugging is stressful. Set up proactive monitoring:
1. Install Node Exporter for Metrics
docker run -d --name node-exporter \
--net="host" \
--pid="host" \
-v "/:/host:ro,rslave" \
quay.io/prometheus/node-exporter:latest \
--path.rootfs=/host
This exposes CPU, memory, disk, and network metrics.
2. Set Up Prometheus + Grafana
See our Grafana setup guide for a complete monitoring stack.
3. Configure Alerts
Set alerts for:
- Load > 80% of CPU cores for 5 minutes
- Memory available < 10%
- Disk utilization > 90%
- Disk await > 100ms
This way, you catch issues before customers do.
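As a sketch, the first alert expressed as a Prometheus rule; this assumes node_exporter metric names, and the threshold matches the list above:

```
groups:
  - name: server-load
    rules:
      - alert: HighLoad
        # 5-minute load above 80% of the core count, sustained for 5 minutes
        expr: node_load5 > 0.8 * count by (instance) (node_cpu_seconds_total{mode="idle"})
        for: 5m
        labels:
          severity: warning
```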
4. Use CWH Managed Monitoring
Not ready to self-host a monitoring stack? Canadian Web Hosting offers managed monitoring with proactive alerting—our team watches your infrastructure 24/7 and escalates before problems become outages.
When to Escalate
Sometimes the issue is beyond your control:
- Hypervisor problems: If your VPS host is oversubscribed, even an idle VM will feel slow. Contact support.
- Hardware failure: Repeated disk errors or kernel panics may indicate failing hardware. Open a ticket.
- DDoS attack: If traffic is 100x normal and you can’t filter it, you need network-level protection.
CWH’s Managed Security team can help with DDoS mitigation and incident response.
Recommended Hosting for Diagnostics and Production
You need root access to run the diagnostic commands in this guide. That means shared hosting won’t work—you need a VPS or dedicated server.
For small workloads: A Cloud VPS with 2-4 CPU cores and 4-8GB RAM is sufficient for most self-hosted apps.
For production databases: Consider a dedicated server or Enterprise Cloud with guaranteed resources and automatic failover.
For high I/O workloads: NVMe storage is essential—traditional SSDs will bottleneck under heavy database load.
All CWH VPS plans include Canadian data centres (Vancouver, Toronto), SOC 2 Type II compliance, and 24/7 support.
Related Troubleshooting Guides
- Why Your Docker Container Keeps Restarting — container-specific diagnostics
- Fix Slow WordPress Admin — WordPress-specific performance
- Why Your WooCommerce Store is Slow — e-commerce performance troubleshooting
- When Graylog Disk Fills Up — logging stack recovery
- tini: The Missing Init System for Docker Containers — preventing zombie processes