TLS certificate expiry on self-hosted services
The single most common "my self-hosted thing broke" cause isn't a software bug. It's a TLS certificate that expired because the auto-renewal cron quietly stopped running three months ago. This post is about making that stop happening.
Why this keeps going wrong
- Let's Encrypt certs are 90 days. That's short by design, to keep you renewing. It also means every configuration mistake shows up ~85 days later, long after you'd remember what you changed.
- Renewal crons are silent on failure. certbot renew writes to a log nobody reads unless the cert is already dead.
- Reverse proxies cache the old cert. Caddy and Traefik usually get this right. Raw nginx + certbot doesn't always reload on renewal; you end up serving an expired cert even though the on-disk file is fresh.
- DNS-01 renewals depend on API tokens. Cloudflare rotated the required API scopes a year ago; every Cloudflare-DNS-01 setup older than that is silently broken.
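The stale-proxy case in the list above can be checked directly by comparing the fingerprint of the certificate on disk with the one the server actually presents. This is a minimal sketch, not a definitive implementation: the helper names (disk_fp, served_fp, check_reload), the host, and the certbot-style path in the usage comment are all assumptions for illustration.

```shell
# Fingerprint of a PEM certificate file on disk.
disk_fp() {
  openssl x509 -noout -fingerprint -sha256 -in "$1"
}

# Fingerprint of the certificate a host actually serves on port 443.
served_fp() {
  echo | openssl s_client -servername "$1" -connect "$1:443" 2>/dev/null \
    | openssl x509 -noout -fingerprint -sha256
}

# Compare two fingerprints.
# Prints "unknown" if either is empty, "in-sync" if equal, "stale" otherwise.
check_reload() {
  if [ -z "$1" ] || [ -z "$2" ]; then
    echo "unknown"
  elif [ "$1" = "$2" ]; then
    echo "in-sync"
  else
    echo "stale"
  fi
}

# Usage (hypothetical host and path):
# check_reload "$(disk_fp /etc/letsencrypt/live/git.example.com/cert.pem)" \
#              "$(served_fp git.example.com)"
```

If this reports "stale", one option is certbot's --deploy-hook flag, for example certbot renew --deploy-hook "systemctl reload nginx", which runs the given command after each successful renewal.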
The 60-30-14 rule
Alert on three progressively louder thresholds:
- 60 days: info-level notification. Probably fine, but a heads-up if you're going on holiday.
- 30 days: warning. Something's probably wrong with auto-renewal. Investigate within a week.
- 14 days: critical. Get up and fix this today.
Most renewals happen around 30 days remaining (Let's Encrypt's default), so a 30-day alert that doesn't clear within 48 hours means something is actually broken — not just "hasn't triggered yet."
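As a rough sketch, the three thresholds map onto a single function that turns "days remaining" into a severity level. The function name severity_for_days is hypothetical, and treating each threshold as "at or below" is an assumption; adjust the comparisons if you prefer strict less-than.

```shell
# Map days-until-expiry to a severity level using the 60/30/14 thresholds.
# Assumption: a threshold fires when days remaining is at or below it.
severity_for_days() {
  if [ "$1" -le 14 ]; then
    echo critical
  elif [ "$1" -le 30 ]; then
    echo warning
  elif [ "$1" -le 60 ]; then
    echo info
  else
    echo ok
  fi
}

# Usage:
# severity_for_days 20
```

Checking the tightest threshold first means each cert produces exactly one level, so a cert at 10 days pages you once rather than firing all three alerts.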
How to actually monitor it
Option A: run openssl from cron
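# Uses BSD/macOS date (-j -f). On GNU/Linux, replace the date call with: date -d "$expiry" +%s
for host in grafana.example.com git.example.com plex.example.com; do
  # Fetch the certificate the host serves and extract its notAfter date
  expiry=$(echo | openssl s_client -servername "$host" -connect "$host:443" 2>/dev/null \
    | openssl x509 -noout -enddate | cut -d= -f2)
  # An empty result means the connection or parse failed; alert instead of computing nonsense
  if [ -z "$expiry" ]; then
    echo "ALERT: could not read certificate for $host" | mail -s "TLS expiry" you@example.com
    continue
  fi
  days=$(( ($(date -j -f "%b %e %H:%M:%S %Y %Z" "$expiry" +%s) - $(date +%s)) / 86400 ))
  if [ "$days" -lt 30 ]; then
    echo "ALERT: $host expires in $days days" | mail -s "TLS expiry" you@example.com
  fi
done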
Works. Writes itself out of your memory in six months.
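The date arithmetic in the script is the least portable part, since BSD/macOS date and GNU date parse timestamps with different flags. A possible way to isolate that, sketched under the assumption that one of those two date implementations is installed (BusyBox date is not handled); to_epoch and days_until are hypothetical helper names.

```shell
# Convert an openssl notAfter string (e.g. "Jan  1 00:00:00 2030 GMT") to epoch seconds.
to_epoch() {
  if date --version >/dev/null 2>&1; then
    # Only GNU date accepts --version; it parses free-form dates with -d
    date -u -d "$1" +%s
  else
    # BSD/macOS date needs an explicit input format
    date -j -u -f "%b %e %H:%M:%S %Y %Z" "$1" +%s
  fi
}

# Whole days from now until the given notAfter string (negative if already expired).
days_until() {
  echo $(( ($(to_epoch "$1") - $(date +%s)) / 86400 ))
}

# Usage:
# days_until "Jan  1 00:00:00 2030 GMT"
```

If you only need a yes/no answer, openssl x509 -noout -checkend SECONDS is an alternative that avoids date parsing entirely: it exits non-zero when the cert expires within that many seconds.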
Option B: Noxen
Every Noxen scan inspects the TLS certificate on each TLS-capable open port (443, 465, 636, 993, 995, 8443, 9443), parses the full X.509 structure, and emits findings for:
- Expires in < 14 days (HIGH).
- Expires in < 30 days (MEDIUM).
- Weak signature algorithm (SHA-1, MD5).
- RSA key < 2048 bits.
- Self-signed with no SAN.
- Negotiated protocol is TLS 1.0 / 1.1 / SSLv3.
- Negotiated cipher is RC4 / 3DES / a CBC-mode suite (that is, not an AEAD mode such as GCM).
The diff-from-yesterday view highlights when a cert has renewed — so you don't have to guess whether the renewal cron ran. Absence of a "cert expiry changed" entry after 85 days is itself the alert: "renewal should have happened by now; it didn't."
Longer-term fixes
- Switch to Caddy. Automatic TLS is the default; there's no separate renewal job to forget. For most homelab reverse-proxy use, Caddy removes the whole problem class.
- Use DNS-01 wherever possible. HTTP-01 requires port 80 open, which breaks when your ISP blocks it or your LAN-only service isn't reachable from the internet.
- Pin certbot to DNS-01 with a long-lived API token. For Cloudflare: a scoped token with just DNS Edit + Zone Read on the specific zone.
- Monitor renewals, not just expiry. If you know renewal happens at T-30 days, alert on "cert didn't renew" at T-25, not on "cert expired" at T-0.
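The last item, alerting on "didn't renew" rather than "expired", can be sketched by remembering the expiry date seen on the previous run and comparing it with today's. This is an illustrative sketch only: renewal_status, the state-file layout, and the 25-day cutoff (taken from the T-25 figure above) are assumptions, not a prescribed design.

```shell
# Compare the current notAfter with the value recorded on the previous run.
# $1 = state file path, $2 = current notAfter string, $3 = days remaining
# Prints: renewed (expiry changed), overdue (unchanged and <= 25 days left), or waiting.
renewal_status() {
  state_file=$1
  current=$2
  days_left=$3
  previous=""
  if [ -f "$state_file" ]; then
    previous=$(cat "$state_file")
  fi
  # Record today's value for the next run
  printf '%s\n' "$current" > "$state_file"
  if [ -n "$previous" ] && [ "$previous" != "$current" ]; then
    echo renewed
  elif [ "$days_left" -le 25 ]; then
    echo overdue
  else
    echo waiting
  fi
}

# Usage (hypothetical state path):
# renewal_status /var/tmp/git.example.com.notafter "Jun  1 12:00:00 2030 GMT" 24
```

Run once a day per host, "overdue" is the early signal: renewal should have happened by now and the expiry date has not moved.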
The deeper point
Expiring certs are a visible failure mode. Every single one of them was preceded by something silent: a renewal that didn't run, a token that got rotated, a config that didn't reload. The fix is to make the silent thing loud — a week-by-week monitoring loop that notices "hey, this should have changed by now, why hasn't it?"
That's the diff-from-yesterday pattern, applied to TLS. And it generalises: every silent failure is fixable once you can answer, quickly, "what should have changed and didn't?"