Proxmox security checklist — VE web UI, cluster comms, and storage hardening
Proxmox VE is the hypervisor sitting under most serious homelabs. That position is the whole problem. Compromise a single web app and you lose the app; compromise the PVE host and you lose every VM, every container, every dataset, and every backup that was reachable from that node. This checklist is the eleven things to get right on a Proxmox box before the rest of the fleet starts depending on it being trustworthy.
The shape of the audit borrows from the pre-exposure hardening checklist and the 30-minute homelab security baseline, but Proxmox has its own quirks: the web UI on port 8006, two parallel auth realms, corosync, privileged versus unprivileged LXC, and a kernel that ships from Proxmox itself rather than upstream Debian. Each step below is a single audit-style check with the exact command and the fix line next to it.
Step 1 — Treat the PVE host as your fleet's blast radius
The single mental shift that fixes most Proxmox misconfiguration
is this: root on PVE means root on every guest.
The hypervisor sees raw block storage, raw memory, raw network
taps. A privileged container escape, a qemu CVE, or a stolen
root@pam session gives an attacker the kind of
position that no in-guest hardening can recover from.
Which means a Proxmox box does not get the same threat model as a single Linux server. It is closer in spirit to a small domain controller. Don't run other workloads on it. Don't ssh into it casually. Don't expose its web UI. Don't accept "convenient" deviations on it that you would accept on a throwaway guest. The blast radius justifies the paranoia.
Everything below follows from that — the UI lockdown, the realm split, the corosync isolation, the backup posture. If a step feels excessive, recheck against the blast radius. It usually isn't.
Step 2 — Keep the web UI on port 8006 off the public internet
The PVE web UI listens on https://<host>:8006/
by default. Treat that port the same way you would treat a
Kubernetes API server: never reachable from the public internet,
no exceptions. The same logic from exposed admin surfaces
applies, only worse: a Pi-hole panel compromise costs you DNS;
a Proxmox panel compromise costs you everything underneath it.
Confirm what's listening:
# On the PVE host:
ss -tlnp | grep -E ':(8006|3128|5900|22)\b'
# From off-network:
nmap -Pn -p 8006,22,3128,5900-5999 your.pve.public.ip
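The local half of that check can be scripted. A minimal sketch, assuming the column layout that `ss -tlnH` prints on current iproute2 (local address and port in the fourth field); the function name `wildcard_listeners` is made up for this example:

```shell
# Reads `ss -tlnH` output on stdin and prints every listener on the
# given port that is bound to all interfaces (0.0.0.0, * or [::]).
wildcard_listeners() {
  port="$1"
  awk -v p="$port" '
    {
      addr = $4                          # "Local Address:Port" column
      n = split(addr, parts, ":")
      if (parts[n] != p) next            # different port, skip
      host = substr(addr, 1, length(addr) - length(parts[n]) - 1)
      if (host == "0.0.0.0" || host == "*" || host == "[::]") print addr
    }'
}

# Usage on the PVE host:
#   ss -tlnH | wildcard_listeners 8006
```

No output means the port is closed or bound to a specific address. Any output means the firewall is the only thing keeping that service off your other interfaces.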
If port 8006 is reachable from outside your LAN, fix it now. Two patterns work well:
- LAN-only + VPN. Bind the UI to a management interface, firewall everything else, and reach it over Tailscale, WireGuard, or a Cloudflare Tunnel private app. This is the right answer for a homelab.
- Reverse proxy + auth proxy. Front the UI with Caddy or Nginx terminating a real Let's Encrypt cert, then layer Authelia / Authentik in front. The proxy enforces SSO and 2FA before any request reaches pveproxy.
For the certificate itself, use the built-in ACME integration rather than the self-signed default — the self-signed cert conditions you and every cluster member to click through TLS warnings, which is exactly the wrong habit:
# Register an ACME account (Let's Encrypt):
pvenode acme account register default you@example.com
# Configure the domain on this node (DNS or HTTP-01):
pvenode config set --acme domains=pve.lan.example.com
# Order the certificate:
pvenode acme cert order
Renewals are automatic from then on. Schedule a certificate expiry alert at 30 days out anyway — ACME can fail quietly when DNS or rate limits get in the way.
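For the expiry alert, `openssl x509 -checkend` is enough to drive a cron job. A minimal sketch: the function name is illustrative, and the certificate path in the usage comment is an assumption about where your node keeps the ACME-issued cert, so confirm it on your own host first.

```shell
# Succeeds (exit 0) when the certificate in $1 expires within $2 days.
# A missing or unreadable file also counts as "needs attention".
cert_expires_within() {
  cert="$1"
  days="$2"
  # -checkend N exits 0 if the cert is still valid N seconds from now,
  # so invert it to make "expiring soon" the success case.
  ! openssl x509 -checkend "$((days * 86400))" -noout -in "$cert" >/dev/null 2>&1
}

# Example (path is an assumption; adjust to your node):
#   cert_expires_within /etc/pve/local/pveproxy-ssl.pem 30 && echo "renew soon"
```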
Step 3 — Enable two-factor on every privileged user
Proxmox supports TOTP and WebAuthn natively in the realm config.
There is no reason not to use it. A leaked
root@pam password without 2FA is full hypervisor
access; the same password with WebAuthn is a phishing
annoyance and nothing more.
In the UI: Datacenter → Permissions → Two Factor.
Enable TOTP on root@pam first, then on every
account holding the Administrator role. WebAuthn
with a hardware key is the higher tier — use it on the
account you'd be most upset to lose.
For the CLI side, audit which accounts even exist:
cat /etc/pve/user.cfg
# user:<username>@<realm>:<enable>:<expire>:<firstname>:<lastname>:<email>:<comment>:<keys>:
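# List users with their realm, enabled flag and expiry:
pveum user list
# TFA entries are stored in /etc/pve/priv/tfa.cfg; recent releases
# can also list them per user with: pveum user tfa list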
Anything in user.cfg you don't recognise is a
finding. Disable, then delete. The "I'll keep it around in
case" instinct is the one that bites you nine months later.
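Recognising accounts by eye stops working once the list grows. A minimal sketch that compares user.cfg against an allowlist you maintain yourself: it relies on the colon-separated field layout shown in the comment above, and both the function name and the allowlist file are hypothetical.

```shell
# unknown_users CFG ALLOWLIST
# Prints every enabled user ID in CFG that is not listed (one per line)
# in ALLOWLIST. Disabled accounts are skipped here; review them separately.
unknown_users() {
  cfg="$1"
  allow="$2"
  awk -F: '$1 == "user" && $3 == "1" { print $2 }' "$cfg" |
    grep -vxF -f "$allow" || true    # grep exits 1 when nothing is left
}

# Usage (allowlist path is an example):
#   unknown_users /etc/pve/user.cfg /root/pve-known-users.txt
```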
Step 4 — Audit the PAM realm versus the PVE realm
Proxmox has two parallel auth realms and the difference matters.
root@pam is the Linux root user — same
shell, same uid 0, full host access. root@pve (or
any @pve account) lives entirely inside Proxmox's
own user database and can only do what PVE permissions allow.
For day-to-day operations, do not use root@pam.
Create a per-admin account in the PVE realm with the
Administrator role, enable 2FA on it, and use
that. Keep root@pam for the rare cases where
Linux-level access is genuinely required — and ssh in for
those rather than logging in through the web UI.
# Create a PVE-realm admin, then set its password
# (pveum passwd prompts interactively):
pveum useradd admin@pve -comment "primary admin"
pveum passwd admin@pve
pveum aclmod / -user admin@pve -role Administrator
# Disable web-UI login for root if you want belt-and-braces:
# (then keep ssh access via key auth as the only path)
Then walk /etc/pve/user.cfg, which holds the acl:
entries alongside users, groups and roles (pveum acl list
prints the same grants), and confirm every grant
maps to a person you still trust at a role you still want them
to hold. ACL drift on a long-running cluster is its own
species of bug.
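The grants that matter most are the Administrator ones. A minimal sketch that pulls them out of user.cfg, assuming ACL lines follow the `acl:<propagate>:<path>:<users,groups,tokens>:<roles>:` layout; verify that against your own file before trusting the output. The function name is made up.

```shell
# admin_grants CFG
# Prints "<path> <who>" for every ACL entry that includes the
# Administrator role. Group names appear with a leading "@".
admin_grants() {
  awk -F: '$1 == "acl" && $5 ~ /(^|,)Administrator(,|$)/ { print $3, $4 }' "$1"
}

# Usage:
#   admin_grants /etc/pve/user.cfg
```

Anything granted at path `/` applies to the whole datacenter when propagation is on, so that list should be short enough to recite from memory.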
Step 5 — Separate and encrypt cluster communication
A multi-node Proxmox cluster gossips state via corosync, and a flat single-network deployment is the default. That works, but it puts cluster heartbeat, replication and management on the same wire as your guest traffic. Two failure modes follow: a noisy guest can starve corosync into split-brain, and a compromised guest on the same VLAN gets a clearer view of cluster traffic than it should.
The fix is a dedicated cluster network: a
separate L2 segment, ideally physical, for corosync only.
Configure it at pvecm create time on a new
cluster (the --link0 address), or migrate via /etc/pve/corosync.conf
and a careful rolling update on an existing one.
# Cluster status (run on any node):
pvecm status
# Inspect the active corosync config:
cat /etc/pve/corosync.conf
# Look for: ring0_addr on the dedicated subnet,
# transport: knet (the corosync 3 default, so it may not be listed),
# and encryption enabled: secauth: on, or explicit
# crypto_cipher / crypto_hash values.
Inter-node API traffic is TLS-protected; still, check that
pveproxy on every node is serving the ACME-issued cert from
Step 2, not a stale self-signed one. pvecm status should also
report Quorate: Yes, with every expected node in the
membership list and the expected vote total.
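Checking that every node's ring address sits on the dedicated segment is easy to automate. A minimal sketch, assuming the nodes use literal IPv4 addresses for `ring0_addr` (hostnames would need resolving first) and that a simple prefix such as `10.10.10.` identifies your corosync subnet; both the function name and that prefix are examples.

```shell
# ring_addrs_outside CONF PREFIX
# Prints every ring0_addr in CONF that does not start with PREFIX.
ring_addrs_outside() {
  conf="$1"
  prefix="$2"
  awk -v pre="$prefix" '
    $1 == "ring0_addr:" && index($2, pre) != 1 { print $2 }
  ' "$conf"
}

# Usage (prefix is an example):
#   ring_addrs_outside /etc/pve/corosync.conf 10.10.10.
```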
Step 6 — Lock down storage permissions
Proxmox happily attaches NFS, iSCSI, CIFS, ZFS, Ceph, GlusterFS and plain directory storage. Every one of those has its own auth model, and the defaults are not all sensible. The checklist:
- NFS: never no_root_squash, never insecure, and never an anonymous export. Bind exports to the cluster subnet only and use NFSv4 with Kerberos (sec=krb5p) where possible.
- iSCSI: CHAP authentication on every target. Default unauthenticated targets are the textbook finding.
- CIFS/SMB: SMB3 minimum, signing required, per-share user accounts, no guest access.
- ZFS: dataset-level aclmode=passthrough is convenient and bites you. Set explicit per-dataset permissions. Encrypt sensitive datasets with zfs create -o encryption=aes-256-gcm and protect the loaded key in a Proxmox key file with restricted mode 0400.
- Ceph: dedicated cephx user per pool, never client.admin for day-to-day pools. Restrict the user's caps to exactly the pools they need.
Spot-check listening storage ports the same way you'd check any other admin surface:
# NFS, iSCSI, Ceph, GlusterFS — should be on the storage VLAN only:
ss -tlnp | grep -E ':(111|2049|3260|3300|6789|24007)\b'
# Confirm no anonymous NFS exports:
showmount -e localhost
exportfs -v
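The NFS rules from the list above can be checked mechanically on whichever machine holds the exports file. A minimal sketch that reads the classic `/etc/exports` syntax (`path client(options)`); the function name is made up, and it deliberately ignores anything defined only under `/etc/exports.d/`.

```shell
# risky_exports FILE
# Prints export lines that use no_root_squash, the "insecure" option,
# or a wildcard (*) client. Comment and blank lines are ignored.
risky_exports() {
  grep -vE '^[[:space:]]*(#|$)' "$1" |
    grep -E 'no_root_squash|[(,]insecure[,)]|[[:space:]]\*\(' || true
}

# Usage:
#   risky_exports /etc/exports
```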
Step 7 — Prefer unprivileged LXC and isolated VMs
Privileged LXC containers share uid 0 with the host. The kernel is the only thing between a process inside the container and the rest of the cluster. Unprivileged containers map uid 0 inside the container to a high host uid (commonly 100000+), so even a full container compromise leaves the attacker as a low-privilege host user.
Default every new container to unprivileged. The UI checkbox is on by default in recent PVE versions; verify it.
# Create an unprivileged container explicitly
# (replace the * with the exact template file name from: pveam list local):
pct create 200 local:vztmpl/debian-12-standard_*.tar.zst \
--unprivileged 1 \
--features nesting=0 \
--hostname app01 \
--net0 name=eth0,bridge=vmbr0,ip=dhcp \
--storage local-lvm
# Check existing containers:
pct list
for id in $(pct list | awk 'NR>1 {print $1}'); do
echo -n "CT $id unprivileged="; pct config $id | grep -c '^unprivileged: 1'
done
Keep AppArmor enabled on the host
(aa-status) and leave the default
lxc-container-default-cgns profile in place.
Disable features: nesting=1 unless you genuinely
need to run Docker inside the container — nested cgroups are
the kind of surface where escapes like
CVE-2024-21626 (runc working-directory
breakout) bite hardest.
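Both flags, privileged mode and nesting, can be read out of `pct config` in one pass. A minimal sketch, assuming the `unprivileged: 1` and `features: ... nesting=1` lines appear in that output as they do in the container config; the function name is illustrative.

```shell
# Reads `pct config <id>` output on stdin and prints one finding per line:
# "privileged" if unprivileged: 1 is absent, "nesting" if nesting=1 is set.
ct_findings() {
  awk '
    /^unprivileged: 1$/       { unpriv = 1 }
    /^features:.*nesting=1/   { nesting = 1 }
    END {
      if (!unpriv) print "privileged"
      if (nesting) print "nesting"
    }'
}

# Usage across all containers:
#   for id in $(pct list | awk 'NR>1 {print $1}'); do
#     pct config "$id" | ct_findings | sed "s/^/CT $id: /"
#   done
```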
For VMs, prefer Q35 + OVMF + secure boot where the guest OS supports it, and disable any qemu device you don't use. Each attached device is one more chunk of qemu code carrying its own CVE history. The patching gap on self-hosted services applies just as hard to qemu as it does to anything inside a guest.
Step 8 — Run encrypted, off-cluster backups and test restore
Proxmox Backup Server (PBS) is the right answer here, not vzdump-to-local-disk. Three reasons:
- Deduplicated and incremental. Reasonable nightly cadence stays cheap.
- Client-side encryption. The PBS datastore never sees plaintext blocks. A stolen PBS disk is opaque.
- Verification jobs. PBS re-reads stored chunks on a schedule and flags any that don't checksum. Detect bit-rot before the restore drill.
The deployment that survives a serious incident:
- PBS on a separate physical box (or a remote site).
- Client-side encryption with the key kept off the PVE cluster — a paper copy in a safe is unironically a sound idea.
- A prune policy that keeps daily for 14 days, weekly for 8 weeks, monthly for 12 months, yearly for 5 years. Tune to your storage budget.
- A garbage-collection job after every prune.
- A verification job weekly.
- A documented restore drill at least quarterly. Restore one VM and one CT to a scratch node and confirm the guest boots. Untested backups are not backups.
# On the PVE node, show backup jobs:
cat /etc/pve/jobs.cfg
# On PBS, list datastores and verify schedule:
proxmox-backup-manager datastore list
proxmox-backup-manager verify-job list
Step 9 — Stay on a security-tracked apt cadence
The pve-enterprise repo requires a subscription
and carries the most heavily tested packages. The
pve-no-subscription repo is free; packages reach it
earlier and with less validation, which is why Proxmox does
not recommend it for production. Both are signed by Proxmox,
and both end up shipping the same packages.
The risk is not the choice of repo. The risk is forgetting
to update at all. Configure exactly one repo, disable the
others, and run apt on a weekly cadence
minimum:
# Confirm exactly one PVE repo is active:
grep -r '^deb' /etc/apt/sources.list /etc/apt/sources.list.d/
# Weekly maintenance:
apt update
apt list --upgradable
apt full-upgrade
pveversion -v # confirm proxmox-ve and the kernel packages are at the expected versions
# Reboot if pve-kernel updated:
systemctl reboot
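The "exactly one repo" rule is easy to turn into a check. A minimal sketch that only understands the classic one-line `deb ...` entries in `.list` files; deb822-style `.sources` files are not handled, and the function name is made up.

```shell
# active_pve_repos FILE...
# Prints the distinct Proxmox repo components found on uncommented
# "deb" lines. More than one line of output means more than one repo.
active_pve_repos() {
  grep -hE '^deb[[:space:]]' "$@" 2>/dev/null |
    grep -oE 'pve-enterprise|pve-no-subscription|pvetest' |
    sort -u || true
}

# Usage:
#   active_pve_repos /etc/apt/sources.list /etc/apt/sources.list.d/*.list
```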
unattended-upgrades works on Proxmox the same
way it does on Debian, but be deliberate about it on a
hypervisor — a surprise kernel reboot on a node carrying live
VMs is a worse outcome than a one-week patching delay.
Configure it to install security updates only and to
not auto-reboot; do the reboot in a
maintenance window.
This is the same patching gap argument from continuous CVE scanning vs patching — the goal is not to patch faster, it's to know the gap exists.
Step 10 — Track upstream CVE hygiene for qemu, lxc and the kernel
Proxmox's exposure to upstream CVEs is broader than a typical Debian box. You inherit kernel CVEs, qemu CVEs, lxc CVEs, runc CVEs (for nested container workloads), corosync CVEs, and the Perl stack the management interface runs on. Two canonical examples worth keeping in mind:
- CVE-2024-21626: runc working-directory breakout. Affects hosts running runc-based containers, which on Proxmox usually means Docker inside a nesting-enabled LXC or inside a VM. Patched in runc 1.1.12.
- CVE-2022-0185: Linux kernel filesystem context heap overflow. Local root via unprivileged user namespaces. The kind of upstream-kernel CVE that ships in a long-running PVE box without anyone noticing until a feed flags it.
The audit command on PVE is pveversion -v after
every upgrade. Save the output. Diff next month. Anything
that didn't change for a kernel release cycle is suspect:
pveversion -v
# proxmox-ve: 8.x.x (running kernel: 6.x.x-pve)
# pve-manager: 8.x.x
# pve-kernel-6.x: 8.x.x
# qemu-server: 8.x.x
# libpve-storage-perl: 8.x.x
# ...
# Cross-reference the kernel and qemu versions against your
# distro CVE tracker (the relevant Debian tracker for Proxmox).
For per-distro live trackers covering the Debian base that Proxmox sits on, the Debian 12 CVE tracker is the right cross-reference. It rebuilds daily and shows the exact source-package fix version per release.
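The save-and-diff habit for `pveversion -v` takes one small helper. A minimal sketch that lists the packages whose line is identical in two saved snapshots, meaning the version did not move between them; the function name and the snapshot file names are hypothetical.

```shell
# unchanged_packages OLD NEW
# Prints the package name for every line that appears verbatim in both
# saved `pveversion -v` outputs.
unchanged_packages() {
  grep -Fxf "$1" "$2" | cut -d: -f1 || true
}

# Usage (file names are examples):
#   pveversion -v > /root/pveversion-$(date +%Y-%m).txt
#   unchanged_packages /root/pveversion-2024-05.txt /root/pveversion-2024-06.txt
```

An unchanged package is not automatically a problem, since not every package gets a release every month. It is a prompt to confirm that nothing was held back.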
Step 11 — Re-audit on a quarterly cadence
Proxmox state changes constantly. New VMs get added, old ones
get deleted but their snapshots linger. Storage gets attached
for "just one project" and stays for two years. A colleague
gets added to /etc/pve/user.cfg for a migration
and never gets pruned. The firewall picks up a rule for a
service that was decommissioned six months ago.
One-and-done audits don't catch any of this. The realistic cadence is quarterly: every three months, walk the whole checklist again with fresh eyes. The monthly homelab checklist is the lighter touch — the Proxmox pass is the heavier one.
A reasonable quarterly pass is:
- External port scan of the PVE host and any cluster members. Anything new is a finding.
- cat /etc/pve/user.cfg (accounts and acl: entries live in the same file). Any unfamiliar account or grant is a finding.
- pvecm status. Any non-quorate or stale node is a finding.
- pveversion -v compared to last quarter.
- Storage attachments: drop the ones for projects that no longer exist.
- A restore drill from PBS to scratch hardware.
- Walk every LXC container's unprivileged: 1 flag.
Common mistakes
- Exposing the web UI through a Cloudflare Tunnel without an auth proxy. The tunnel terminates TLS and routes — it does not authenticate. Add Authelia, Authentik, or Cloudflare Access on top.
- Using root@pam for daily work. Every click in the UI as root@pam puts a full-host credential at risk if your laptop is compromised. Use per-user PVE-realm admin accounts with 2FA.
- Disabling AppArmor "because it broke a container once". The container is wrong; the profile is doing its job. Carve a narrower exception or switch to a VM.
- Storing PBS backups on the same hardware as the cluster. A power-supply fire takes both. PBS goes on a separate box or off-site.
- Trusting vzdump snapshots as backups. They live on the same storage as the guest. Power failure mid-write, ZFS pool corruption, or a ransomware-style attack against the host takes them with it.
- Skipping the restore drill. Until you boot a restored guest end-to-end, you have a folder of binary blobs, not a backup.
FAQ
Does any of this change if I'm running a single-node Proxmox box, not a cluster?
Step 5 (cluster comms) and parts of Step 4 (PVE-realm admin accounts for delegation) get smaller, but everything else applies identically. The blast-radius argument doesn't depend on cluster membership: a single PVE node still sits below every guest it runs.
Is the no-subscription repo really fine?
Yes, with one caveat: its packages get less testing before release than the enterprise repo's, so an occasional regression is the price. For a homelab, that's acceptable. For a paying-customer-facing production cluster, pay for the subscription, partly for the better-tested packages and partly for the support channel.
Should I run Docker directly on the PVE host?
No. Run Docker inside an unprivileged LXC, or inside a VM, or ideally inside a VM running a dedicated container host like Talos or Fedora CoreOS. Mixing Docker's iptables rules, bridge networks, and storage drivers with PVE's is a rich source of drift and CVE exposure.
What's the single most overlooked item?
Step 8's restore drill. Everyone configures PBS; very few people actually restore from it before the day they need to. Treat the restore drill as part of the backup, not as a separate exercise.
Where Noxen fits in
Noxen detects Proxmox VE among the ~70 admin surfaces it
catalogues, flags an exposed port 8006 the moment it
appears, audits the SSH configuration on the underlying
Debian host, and matches the installed package set against
a signed daily CVE feed — kernel, qemu, lxc, runc included.
Scheduled scans catch the drift between quarterly audits,
so a stale account in user.cfg or a regression
to PasswordAuthentication yes shows up the
morning after, not six months later.
It runs from your Mac over SSH, with no agent on the PVE host. The same audit you'd otherwise run by hand from this checklist, every night, with diffs.
Try Noxen — $79 one-time, agentless, diff-from-yesterday reports for your Proxmox cluster and the Linux fleet underneath it.