What "uptime monitoring" actually means
Uptime monitoring answers two different questions, and you need both to run a reliable service:
- Is my server reachable from the outside? (external/black-box monitoring) — does a real client over the internet get a healthy response?
- Is my server healthy on the inside? (internal/white-box monitoring) — CPU, memory, disk, and individual services like Nginx, MySQL, or your app process.
External checks catch outages your users see. Internal checks catch the causes — a disk filling up, a runaway process, swap thrashing — often before they turn into an outage. This guide sets up both on a Linux VPS, plus alerting that reaches you in seconds. Commands assume Ubuntu 22.04/24.04 or Debian on a Skyline Cloud server; adapt package names for AlmaLinux/RHEL.
Step 1 — External uptime checks
The simplest external check is an HTTP request from a machine other than your server. Run this from a second host (or your laptop) to confirm the site answers:
curl -sS -o /dev/null -w "HTTP %{http_code} in %{time_total}s\n" \
https://example.com/
A healthy result looks like HTTP 200 in 0.184s. To monitor continuously without writing code, use a hosted checker such as UptimeRobot or a self-hosted one like Uptime Kuma. Self-hosting keeps your monitoring data in the Kingdom — useful for PDPL and NCA alignment.
Run Uptime Kuma on a separate, small VPS (never the server it watches) using Docker:
docker run -d --restart=always \
-p 3001:3001 \
-v uptime-kuma:/app/data \
--name uptime-kuma louislam/uptime-kuma:1
Open http://<monitor-ip>:3001, create your admin account, then Add New Monitor:
- Monitor Type: HTTP(s)
- URL: https://example.com/health (use a lightweight health endpoint, not the homepage)
- Heartbeat Interval: 60 seconds
- Retries: 2 (avoids alerting on a single blip)
- Accepted Status Codes: 200-299
Always monitor a dedicated /health endpoint that confirms your app and its database are up, not just that the web server returns a page.
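If you want a scripted version of the same check, for example from cron on the monitoring host, the monitor settings above can be approximated in a few lines of shell. This is a minimal sketch, not how Uptime Kuma works internally: the function names `status_ok` and `check_url`, the 10-second timeout, and the 1-second pause between attempts are illustrative assumptions.

```shell
# Sketch: scripted external check mirroring the monitor settings above.
# Function names, timeout, and pause length are illustrative assumptions.

# Succeed only when the status code is in the accepted 200-299 range.
status_ok() {
  case "$1" in
    2[0-9][0-9]) return 0 ;;
    *) return 1 ;;
  esac
}

# check_url URL [RETRIES]: probe up to 1 + RETRIES times and succeed on
# the first accepted status code.
check_url() {
  url="$1"
  retries="${2:-2}"
  attempt=0
  while [ "$attempt" -le "$retries" ]; do
    # curl reports 000 when it received no HTTP response at all.
    code=$(curl -sS -o /dev/null -m 10 -w '%{http_code}' "$url" 2>/dev/null) || code=000
    if status_ok "$code"; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep 1
  done
  return 1
}
```

Usage: `check_url https://example.com/health 2 || echo "health check failed"`. Run it from a host other than the one being checked, for the same reason the monitor lives on a separate VPS.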
Step 2 — Internal health with node_exporter and a check script
For a single server, you do not need a full Prometheus stack. A short script run by cron or a systemd timer covers the essentials. Create /usr/local/bin/health-check.sh:
#!/usr/bin/env bash
set -euo pipefail
THRESH_DISK=90 # percent
THRESH_MEM=90 # percent
WEBHOOK="https://hooks.example.com/your-webhook"
alert() {
curl -fsS -X POST -H 'Content-Type: application/json' \
-d "{\"text\":\"[$(hostname)] $1\"}" "$WEBHOOK" || true
}
# Disk usage on /
disk=$(df --output=pcent / | tail -1 | tr -dc '0-9')
[ "$disk" -ge "$THRESH_DISK" ] && alert "Disk at ${disk}% on /"
# Memory usage
mem=$(free | awk '/Mem:/ {printf "%d", $3/$2*100}')
[ "$mem" -ge "$THRESH_MEM" ] && alert "Memory at ${mem}%"
# Critical service must be active
for svc in nginx mysql; do
systemctl is-active --quiet "$svc" || alert "Service $svc is DOWN"
done
Make it executable and test it:
sudo chmod +x /usr/local/bin/health-check.sh
sudo /usr/local/bin/health-check.sh
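If the memory alert turns out to be noisy, one possible variant measures memory pressure from MemAvailable in /proc/meminfo, the kernel's own estimate of how much memory new workloads can claim without swapping. The sketch below is an optional alternative to the `free` one-liner in the script, and the function name `mem_used_pct` is an assumption made for illustration.

```shell
# Sketch: percentage of RAM that is NOT available, from /proc/meminfo.
# mem_used_pct [FILE] reads /proc/meminfo unless another file is given
# (the optional argument exists only to make the function easy to test).
mem_used_pct() {
  awk '/^MemTotal:/     { total = $2 }
       /^MemAvailable:/ { avail = $2 }
       END {
         if (total > 0) printf "%d\n", (total - avail) * 100 / total
         else exit 1
       }' "${1:-/proc/meminfo}"
}
```

In the script, `mem=$(mem_used_pct)` could replace the `free | awk` line, with the threshold comparison left unchanged.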
Schedule it with a systemd timer, which is more reliable than cron for logging and missed-run handling. Create /etc/systemd/system/health-check.service:
[Unit]
Description=Server health check
[Service]
Type=oneshot
ExecStart=/usr/local/bin/health-check.sh
And /etc/systemd/system/health-check.timer:
[Unit]
Description=Run health check every 2 minutes
[Timer]
OnCalendar=*:0/2
Persistent=true
[Install]
WantedBy=timers.target
Enable it:
sudo systemctl daemon-reload
sudo systemctl enable --now health-check.timer
systemctl list-timers health-check.timer
For richer metrics and history, install Prometheus node_exporter, which exposes CPU, memory, disk, and network as metrics on port 9100:
sudo apt update && sudo apt install -y prometheus-node-exporter
sudo systemctl enable --now prometheus-node-exporter
curl -s http://localhost:9100/metrics | head
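The metrics endpoint returns plain text with one `name value` pair per line, so a script can read a single value without a Prometheus server. The following is a minimal sketch under that assumption. The helper name `metric_value` is hypothetical, and it handles only unlabeled metrics such as `node_load1`, not lines that carry `{...}` labels.

```shell
# Sketch: print the value of one unlabeled metric from exporter output.
# metric_value NAME reads the metrics text on stdin and fails if NAME
# is not present. Comment lines (# HELP, # TYPE) never match because
# their first field is "#".
metric_value() {
  awk -v name="$1" '
    $1 == name && !found { value = $2; found = 1 }
    END { if (found) print value; else exit 1 }'
}
```

Usage: `curl -s http://localhost:9100/metrics | metric_value node_load1` prints the 1-minute load average.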
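Bind it to localhost or restrict port 9100 in your firewall so the metrics are not public. The rule below admits only the monitoring host; it protects the port only if ufw is enabled with its default deny-incoming policy: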
sudo ufw allow from <monitor-ip> to any port 9100 proto tcp
Step 3 — Alerting that actually reaches you
An alert is only useful if it arrives fast and through more than one channel. Configure at least two so a single failing provider does not silence you.
| Channel | Best for | Latency | Notes |
|---|---|---|---|
| Email | Audit trail, non-urgent | Seconds–minutes | Use a real SMTP service, not the server itself |
| Webhook (Slack/Teams) | Team visibility | Seconds | Easy to wire into the script above |
| SMS / push | True emergencies | Seconds | Reserve for "site is down" only |
Send alert email through a proper SMTP relay — never rely on the monitored server's own mail, because if the server is down it cannot warn you. Use business email hosting or any SMTP provider. A minimal msmtp-based email alert:
sudo apt install -y msmtp msmtp-mta
printf 'Subject: ALERT %s\n\n%s\n' "$(hostname)" "Disk high" \
| msmtp -a default you@example.com
In Uptime Kuma, add notifications under Settings → Notifications (Email/SMTP, Slack, Telegram, or a generic webhook) and attach them to each monitor.
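For the script-driven alerts from Step 2, the two-channel advice can be sketched as a small fan-out: try every channel and report failure only if none of them delivered. The names `send_webhook`, `send_email`, `notify`, and `MAIL_TO` are assumptions made for illustration. The webhook URL and address are the placeholders used earlier, and the message is assumed to contain no double quotes, since it is not JSON-escaped.

```shell
# Sketch: send one alert through every channel; fail only if all fail.
WEBHOOK="${WEBHOOK:-https://hooks.example.com/your-webhook}"
MAIL_TO="${MAIL_TO:-you@example.com}"

# Post the message to the chat webhook (10-second timeout).
send_webhook() {
  curl -fsS -m 10 -X POST -H 'Content-Type: application/json' \
    -d "{\"text\":\"$1\"}" "$WEBHOOK" >/dev/null
}

# Mail the message through the msmtp relay configured above.
send_email() {
  printf 'Subject: ALERT %s\n\n%s\n' "$(hostname)" "$1" \
    | msmtp -a default "$MAIL_TO"
}

# notify MESSAGE: attempt each channel in turn, counting deliveries.
notify() {
  delivered=0
  for channel in send_webhook send_email; do
    if "$channel" "$1"; then
      delivered=$((delivered + 1))
    fi
  done
  [ "$delivered" -gt 0 ]
}
```

Because each channel is attempted regardless of how the previous one fared, a webhook outage does not stop the email from going out.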
Step 4 — Tune thresholds and avoid alert fatigue
Bad alerting is worse than none — people learn to ignore it. Follow these rules:
- Require 2+ failed checks before alerting (the Retries setting) to ignore transient blips.
- Alert on symptoms users feel, like HTTP 5xx or high latency, not every internal metric.
- Set sane thresholds: 90% disk, 90% memory sustained, latency above your normal p95.
- Send a recovery notification so you know when the issue clears.
- Review alerts monthly and delete or tune anything that cried wolf.
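The first and fourth rules can also be applied to the Step 2 script, which otherwise alerts on every run while a problem persists. The sketch below keeps a failure counter per check in a state file. The function name `record_result`, the `STATE_DIR` path, and the default threshold of 2 are assumptions made for illustration.

```shell
# Sketch: alert once after N consecutive failures, then once on recovery.
STATE_DIR="${STATE_DIR:-/var/tmp/health-check}"

# record_result NAME ok|fail [THRESHOLD]
# Prints "alert" when consecutive failures first reach THRESHOLD,
# "recovered" when a check that had alerted passes again, and
# nothing in every other case.
record_result() {
  name="$1"
  result="$2"
  threshold="${3:-2}"
  mkdir -p "$STATE_DIR"
  file="$STATE_DIR/$name.count"
  count=0
  if [ -f "$file" ]; then
    count=$(cat "$file")
  fi
  if [ "$result" = "fail" ]; then
    count=$((count + 1))
    echo "$count" > "$file"
    if [ "$count" -eq "$threshold" ]; then
      echo "alert"
    fi
  else
    if [ "$count" -ge "$threshold" ]; then
      echo "recovered"
    fi
    echo 0 > "$file"
  fi
  return 0
}
```

A caller would branch on the output, for example sending "Disk at 93% on /" only when `record_result disk fail` prints `alert`, and a short all-clear message when a later `record_result disk ok` prints `recovered`.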
Verify the whole pipeline
Test the alert path end to end before you rely on it. Temporarily lower a threshold, or briefly stop a monitored service during a quiet period:
sudo systemctl stop nginx # triggers the service-down alert
# confirm the alert arrives, then:
sudo systemctl start nginx
If the alert lands in your inbox and chat within a couple of minutes, your monitoring is real. An untested alert pipeline is the same as no monitoring.
Wrapping up
You now have external uptime checks, internal health monitoring via a systemd timer and node_exporter, and multi-channel alerting that you have actually verified. Keep your monitoring host separate from the servers it watches, keep the data in-Kingdom for PDPL and NCA compliance, and revisit thresholds as traffic grows.
Need a reliable, in-Kingdom VPS to host your apps and your monitoring stack — with local Arabic support and transparent pricing? Create your Skyline Cloud account and deploy in minutes.