Downtime has a pattern. It rarely arrives without warning — it builds from small, observable signals that nobody noticed until the situation became urgent. A PHP-FPM process consuming twice its normal memory. A disk filling steadily for three days. A service that has been failing and restarting silently for a week.
Expensive monitoring platforms catch these signals. So do free ones. And so do the native tools already installed on every Ubuntu VPS — if you know how to use them.
This guide gives you a practical, lightweight monitoring workflow using journalctl, htop, and a small custom health script. No additional software to install, no monthly fees, and enough visibility to catch most problems before they reach your users.
Why native tools are enough to start
Many teams deploy elaborate monitoring stacks (Datadog, Grafana, Prometheus) before they have enough traffic to need them. These tools are excellent — but they require setup time, ongoing maintenance, and often a learning curve.
For a VPS running a business site, API, or web application, the native Linux toolset covers the fundamentals:
- What processes are using resources right now? (htop)
- What did a service log around the time a problem occurred? (journalctl)
- Is anything failing silently? (systemctl)
- Is disk or memory under pressure? (df, free)
Once you have mastered these, adding a hosted monitoring service on top makes sense. Starting there first often just adds complexity without clarity.
Tool 1: htop — real-time process monitoring
htop is an interactive process viewer. Unlike top, which ships with Ubuntu, htop uses color, allows mouse interaction, and shows a more readable view of CPU cores and memory.
Install it if not present:
```bash
sudo apt install htop -y
```
Run:
```bash
htop
```
Reading the htop display:
The top section shows CPU bars (one per core) and memory/swap bars. If one or two CPU bars are consistently maxed out while others are idle, you likely have a single-threaded process creating a bottleneck — a database query, a stuck cron job, or a runaway PHP process.
The process list below sorts by CPU usage by default. Press M to sort by memory instead. Press F9 to send a signal to a selected process. Press q to exit.
Key things to look for:
- A single process using >80% of one CPU core persistently — investigate that process
- Swap usage above zero — your server is memory-constrained and using slow disk as RAM
- Total RAM usage above 90% — at risk of OOM (Out of Memory) killer terminating processes
- Many zombie processes (`Z` status) — usually a sign of a poorly managed parent process
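The same signals can be checked non-interactively (over a slow SSH session, or from a script) with standard procps tools; a quick sketch:

```bash
# Top five processes by resident memory
ps aux --sort=-%mem | head -n 6

# Top five by CPU
ps aux --sort=-%cpu | head -n 6

# Count zombie processes (state "Z"); "|| true" keeps set -e scripts happy
ps -eo stat= | grep -c '^Z' || true
```

These are handy when htop is unavailable or when you want to capture a snapshot into a log.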
Tool 2: journalctl — reading service logs
journalctl reads logs from systemd's journal — the central log store for every service managed by systemd. This includes Nginx, PHP-FPM, MySQL, your application services, and system events.
View the last 100 lines from a specific service:
```bash
sudo journalctl -u nginx -n 100 --no-pager
sudo journalctl -u php8.3-fpm -n 100 --no-pager
sudo journalctl -u mysql -n 100 --no-pager
```
Follow a service log in real time (useful during active debugging):
```bash
sudo journalctl -u nginx -f
```
View logs from a specific time window:
```bash
# Last 2 hours
sudo journalctl -u nginx --since "2 hours ago"

# Specific window
sudo journalctl -u nginx --since "2026-04-09 14:00" --until "2026-04-09 15:00"
```
View all system logs since last boot:
```bash
sudo journalctl -b
```
Filter for errors only across all services:
```bash
sudo journalctl -p err --since "today"
```
This last command is one of the most useful for routine checks. It shows every error-level log entry from all services today, across a single view. Running this once a day takes two minutes and can surface problems invisible in normal operation.
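To see at a glance which service is producing today's errors, the emitting unit can be tallied from the output. A rough sketch, assuming the default short output format, where the fifth whitespace-separated field is the unit name (e.g. `nginx[123]:`):

```bash
# Group today's error-level entries by the emitting process
sudo journalctl -p err --since "today" --no-pager \
  | awk '{print $5}' | sort | uniq -c | sort -rn | head
```

A service at the top of this list with dozens of entries is usually the right place to start digging.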
Tool 3: systemctl — check for failing services
A service that crashes and restarts does not always show obvious symptoms. The site may appear to be working intermittently, or you may see occasional errors that are hard to reproduce. Checking for failed services takes seconds:
```bash
systemctl --failed
```
If this shows any services, investigate immediately:
```bash
sudo systemctl status service-name --no-pager
```
The status output shows the last few log lines and the exit code, which usually tells you why the service is failing.
For services that are restarting frequently, check the restart count:
```bash
sudo journalctl -u service-name --since "today" | grep -i "start\|stop\|fail"
```
A service that has restarted 40 times today is a problem worth understanding, even if it seems functional at the moment.
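On systemd 235 and newer, the restart count is also tracked directly as the `NRestarts` unit property, which avoids grepping logs. A sketch, using the same `service-name` placeholder as above; the fallback to 0 is my addition so the snippet degrades gracefully where the property is unavailable:

```bash
# Automatic restarts since the unit last reached a stable state (systemd >= 235).
# --value prints just the number, without the "NRestarts=" prefix.
restarts=$(systemctl show service-name -p NRestarts --value 2>/dev/null || echo 0)
restarts=${restarts:-0}
if [ "$restarts" -gt 5 ]; then
  echo "WARNING: service-name restarted ${restarts} times"
fi
```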
Tool 4: disk and inode checks
Check disk space:
```bash
df -h
```
Low disk space is one of the most common causes of sudden failures — web servers stop accepting uploads, databases stop writing, logs stop recording. Without an alert, most people notice only when usage hits 100%.
A better approach: check at 80% and act. Set up an alert (covered below) rather than checking manually.
Check inode usage:
```bash
df -i
```
This is the check most people skip until a mysterious failure occurs. Inodes are filesystem metadata entries — each file consumes one. You can run out of inodes while still having gigabytes of free space, which causes errors like "no space left on device" even though df -h looks fine.
High inode usage is common on mail servers (many small files), cache directories, or servers that generate large numbers of temporary files. If your inode usage is above 80%, audit which directories have the most files:
```bash
find / -xdev -printf '%h\n' 2>/dev/null | sort | uniq -c | sort -rn | head -20
```
Tool 5: a simple health check script
Manual checks help you investigate. Automated checks protect you while you are not watching.
Create a lightweight health check script:
```bash
nano ~/health-check.sh
```
```bash
#!/usr/bin/env bash
set -e

HOSTNAME=$(hostname)
DATE=$(date '+%Y-%m-%d %H:%M')
CPU_IDLE=$(top -bn1 | awk '/^%Cpu/ {print $8}')
MEM_USED=$(free | awk '/Mem:/ {printf("%.0f", $3/$2*100)}')
DISK_USED=$(df / | awk 'NR==2 {gsub("%","",$5); print $5}')
SWAP_USED=$(free | awk '/Swap:/ {if ($2 > 0) printf("%.0f", $3/$2*100); else print "0"}')

echo "[$DATE] $HOSTNAME | CPU idle: ${CPU_IDLE}% | RAM: ${MEM_USED}% | Disk: ${DISK_USED}% | Swap: ${SWAP_USED}%"

# Alert thresholds
if [ "$MEM_USED" -gt 90 ]; then
  echo "WARNING: RAM usage at ${MEM_USED}%"
fi
if [ "$DISK_USED" -gt 80 ]; then
  echo "WARNING: Disk usage at ${DISK_USED}%"
fi
if [ "$SWAP_USED" -gt 10 ]; then
  echo "WARNING: Swap in use at ${SWAP_USED}% — RAM may be insufficient"
fi
```
Make executable:
```bash
chmod +x ~/health-check.sh
```
Schedule it to run hourly and log results:
```bash
crontab -e
```
Add:
```cron
0 * * * * /home/ubuntu/health-check.sh >> /var/log/health-check.log 2>&1
```
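One common gotcha: /var/log is normally writable only by root, so the cron job will silently fail to append until the log file exists and is owned by the crontab's user. Shown here for a hypothetical `ubuntu` user, matching the path in the cron line; substitute your own username:

```bash
# Pre-create the log file and hand it to the user whose crontab runs the script
sudo touch /var/log/health-check.log
sudo chown ubuntu:ubuntu /var/log/health-check.log
```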
Review the log weekly:
```bash
tail -50 /var/log/health-check.log
```
Trend lines in this log are more useful than any single reading. CPU idle dropping from 70% to 40% over two weeks is a signal worth acting on before it reaches 10%.
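The trend is easy to pull out of the log without any graphing tool; this sketch extracts the disk column in the `Disk: NN%` format that health-check.sh writes:

```bash
# Disk-usage series from the last 20 samples, oldest to newest
grep -o 'Disk: [0-9]*%' /var/log/health-check.log | tail -20
```

The same pattern works for the RAM or Swap fields by changing the grep pattern.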
Adding external uptime monitoring
The health script monitors resources. External uptime monitoring checks whether your server is actually reachable from the outside — a different failure mode.
UptimeRobot offers free monitoring that checks your site every 5 minutes from multiple locations and sends email or SMS alerts immediately on failure. It takes 5 minutes to set up and gives you a 90-day history of uptime and response time.
For more granular monitoring — per-minute checks, multi-location testing, Core Web Vitals tracking — paid services like Better Uptime or Pingdom are worth the cost for production systems that directly affect revenue.
A practical monitoring routine
Once your tools are in place:
- Daily: `journalctl -p err --since "today"` — 2 minutes, catches silent errors
- Weekly: Review `health-check.log` for trends; run `systemctl --failed`
- Monthly: Check disk growth rate; review PHP-FPM and Nginx error logs for patterns
This routine takes under 30 minutes per month and gives you the observability to catch most problems before they escalate.
Final recommendation
Good monitoring is not about the most sophisticated tooling — it is about the consistent habit of looking at the right signals. The tools in this guide are already on your server. The barrier is just building the routine of using them.
Start with the health check script and external uptime monitoring. Add journalctl reviews when something behaves unexpectedly. Graduate to a full monitoring platform when your traffic and revenue justify the additional tooling investment.