Container breakout in homelab Proxmox — CVE-2024-21626 explained
You scrolled past a forum thread, copy-pasted a one-liner to
pull a community LXC template, and a minute later you had a
working Jellyfin container on your Proxmox box. In a parallel
universe where that template was hostile and your
runc was a few weeks out of date, the container's
first action would have been to write a payload into
/root/.ssh/authorized_keys on the host. This
post is about the bug that made that scenario real for several
weeks in early 2024, why homelab Proxmox installs were
especially exposed, whether unprivileged LXC saves you (mostly
yes), and how to be sure your machines are clean today.
What runc does and why one CVE here affects everything
runc is the OCI-compliant container runtime that
sits underneath almost everything called "containers" on a
Linux box. When Docker starts a container, when containerd
starts a Kubernetes pod, when Podman runs a rootless workload,
when Proxmox boots an LXC, the program that actually allocates
the cgroup, mounts the rootfs, pivots into the new namespace,
and execs the entrypoint is runc (or another OCI
runtime that implements the same steps, such as
crun). It's the most concentrated point of trust in the
container ecosystem: a bug in runc is a bug in
every container engine that uses it.
That centralisation is the reason a CVE in a 25,000-line Go
program ends up mattering for users who have never heard of
runc. The Docker user pulls an image; somewhere down the
stack, runc applies that image as a chroot, mount namespace,
pid namespace, network namespace, cgroup configuration, and
finally an execve() into the entrypoint. Each
step is a syscall sequence that has to leave no host-side
artefacts visible to the workload. The "Leaky Vessels" cluster
of CVEs is about one of those steps leaving an artefact
behind.
The Leaky Vessels chain (Jan 2024 disclosure)
On 2024-01-31 the runc maintainers shipped 1.1.12, and Snyk
Security Labs (research lead Rory McNamara) disclosed four
CVEs they had grouped under the name "Leaky Vessels":
CVE-2024-21626 in runc, plus
CVE-2024-23651,
CVE-2024-23652,
and CVE-2024-23653
in BuildKit (the engine behind docker build). The
BuildKit bugs need a malicious Dockerfile to fire;
the runc bug needs only a malicious image. The runc one is the
worst of the four for most homelabs because most homelab boxes
pull more images than they build.
The mechanism in CVE-2024-21626 is, at the intuition level,
uncomfortably simple. While runc is setting up the
container it opens an internal file descriptor against a
host-side path (one of the cgroup or rootfs directories it
needs to manipulate before pivoting). That descriptor is
supposed to be closed — or marked close-on-exec — before the
container's first process starts. In 1.1.11 and earlier, under
a specific sequencing of calls, it wasn't. The descriptor
survived the pivot into the container's rootfs and was
inherited by the container's first process.
That alone wouldn't be an escape if the descriptor just sat
idle. The escape uses a property of Linux's
/proc filesystem: an entry under
/proc/self/fd/N resolves through the kernel's
open-file table, not the calling process's chroot. The file it
points to is wherever it was opened from — which, for the
leaked descriptor, is the host filesystem. A process inside
the container can run chdir("/proc/self/fd/N"),
and its current working directory is now somewhere on the
host, outside the chroot. From there, every relative path it
uses (../../etc/shadow,
../../root/.ssh/authorized_keys) lands on the
host.
The OCI image format lets the image author set both the
working directory (WorkingDir in the image config) and
the entrypoint, which means the entire escape can be encoded
in the image. The host operator never gets a chance to
intervene between "container starts" and "host file written."
Snyk published a proof-of-concept image; we deliberately don't
reproduce the primitive here. The mechanism is what matters
for understanding the threat model.
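On the defensive side, the fact that the working directory travels inside the image means it can be read before anything runs. The following is a minimal sketch, not a complete detector: the function name workdir_looks_hostile and the path patterns are assumptions made for illustration, and a clean result says nothing about the rest of the image. Pulling an image does not execute it, so inspecting first is safe.

```shell
#!/usr/bin/env bash
# Heuristic only: flag a working directory that resolves through a
# process file-descriptor entry. The function name and the pattern
# list are hypothetical, chosen for this sketch.
workdir_looks_hostile() {
  case "$1" in
    /proc/self/fd/*|/proc/[0-9]*/fd/*|/dev/fd/*) return 0 ;;
    *) return 1 ;;
  esac
}

# Usage (assumes the docker CLI is installed and the image is pulled
# but not yet started; "some/image:tag" is a placeholder):
#   wd=$(docker image inspect --format '{{.Config.WorkingDir}}' some/image:tag)
#   workdir_looks_hostile "$wd" && echo "suspicious WorkingDir: $wd"
```

A check like this is a tripwire for one known pattern. Patching the runtime remains the actual fix.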
Why homelab Proxmox users are exposed in a way enterprises usually aren't
Enterprises have an image-provenance story, even if it's a
lazy one. There's a private registry; images come from a CI
pipeline that the security team has at least glanced at;
pulling random-username/jellyfin:latest from
Docker Hub onto a production node would be a meeting. None of
that applies to a homelab Proxmox install.
The homelab pattern, repeated across thousands of
r/homelab / r/selfhosted threads, is roughly: "I want
Jellyfin, I want it now, what's the easiest way?" The
reply links a community LXC template, a TurnKey appliance, a
bash one-liner that pipes curl into sh,
or a forum-hosted tarball. Sometimes the source is
reputable (the Proxmox Helper Scripts project, well-known
Docker Hub maintainers). Often it isn't, and the homelab
operator has neither the time nor the tools to tell the
difference.
In that environment, "what could a hostile image actually do?"
stops being theoretical. Pre-Leaky-Vessels, the answer was
"a lot of things inside the container, but it can't touch
the host." Between January 31 2024 and whenever your
Proxmox box pulled the
pve-container / lxc-pve update the
following month, the answer was "plus, optionally, full root on the
host."
The parallel with the xz backdoor, disclosed two months later, is worth noting. Both incidents are supply-chain problems whose mitigation is "trust your inputs", but the attacker's vehicle is different. xz / liblzma was a compromise of a transitive build dependency; Leaky Vessels is a flaw in the runtime that executes whatever you pulled. Different layer, same homelab-shaped lesson: you can audit neither every image nor every library, so the defence has to be a continuously refreshed CVE feed plus a habit of patching the runtime layer aggressively.
Does unprivileged LXC save you?
Short answer for this specific CVE: yes, almost completely. Longer answer with caveats: stop reading at "yes" only if every container you've ever started on the box was unprivileged. Otherwise patch.
Unprivileged LXC uses Linux user namespaces to map the
container's root (UID 0 inside) to a high
unprivileged UID on the host (commonly UID 100000 or higher).
A process inside the container that thinks it's writing
/root/.ssh/authorized_keys as UID 0 is actually
operating as UID 100000 on the host, which can't write to
/root at all. The Leaky Vessels primitive still
arrives in the host directory — but the syscall that would
modify a host file gets denied by ordinary Unix permissions
before it does any damage. The blast radius collapses to "the
attacker can read whatever the unprivileged UID can read,"
which is much less than full host root.
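The mapping itself is plain arithmetic. As a rough sketch, assuming a single range equivalent to the LXC config line "lxc.idmap = u 0 100000 65536" (a common default, but check your own container config), the host UID for a given container UID can be computed like this. The function name host_uid_for and its argument order are invented for the example.

```shell
#!/usr/bin/env bash
# Translate a container UID to its host UID under one idmap range.
# Arguments: container UID, container range start, host range start, count.
# Illustrative helper only; not part of any Proxmox or LXC tooling.
host_uid_for() {
  local ct_uid=$1 ct_start=$2 host_start=$3 count=$4
  if [ "$ct_uid" -lt "$ct_start" ] || [ "$ct_uid" -ge $((ct_start + count)) ]; then
    echo "unmapped"
    return 1
  fi
  echo $((host_start + ct_uid - ct_start))
}

host_uid_for 0 0 100000 65536      # container root: prints 100000
host_uid_for 1000 0 100000 65536   # ordinary user inside: prints 101000
```

Container root lands on host UID 100000, which owns nothing under /root, and that is why the write is refused.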
Three reasons that's still not a substitute for patching:
- You probably have some privileged containers.
Proxmox defaults to unprivileged in modern releases, but any
container created with Unprivileged container: No
in the GUI, or any Docker container run by a daemon that has no
userns-remap configured, runs as root mapped to host root. One privileged container is enough to break the argument.
- The descriptor still leaks information.
Even when writes fail, the leaked path lets a hostile
workload enumerate host files that the unprivileged UID can
read. Depending on file modes, that can include
/etc/passwd, package manifests, and other material useful for the next stage of an attack.
- The next breakout CVE won't respect the same boundary.
Unprivileged LXC blocks this specific primitive; future
runtime CVEs may not be neutralised the same way. The
durable defence is "patch
runc on the same cadence as the kernel."
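To find out whether the first caveat applies to a given node, one option is to walk the container configs. This sketch assumes the usual Proxmox convention that pct config prints "unprivileged: 1" for unprivileged containers and omits the line (or prints 0) otherwise. The helper name is_privileged_config is made up for the example.

```shell
#!/usr/bin/env bash
# Read the text of `pct config <vmid>` on stdin and exit 0 when the
# container is privileged, i.e. no "unprivileged: 1" line is present.
# Hypothetical helper written for this sketch.
is_privileged_config() {
  ! grep -Eq '^unprivileged:[[:space:]]*1[[:space:]]*$'
}

# Walk every container on this node. Skipped on machines without pct.
if command -v pct >/dev/null 2>&1; then
  for vmid in $(pct list | awk 'NR > 1 { print $1 }'); do
    if pct config "$vmid" | is_privileged_config; then
      echo "CT $vmid is privileged"
    fi
  done
fi
```

An empty result means every container on the node is unprivileged today. It does not cover containers that were deleted before you looked.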
How to check your homelab is patched
Three checks, in order of speed:
1. Read the runc version directly
runc --version
Anything < 1.1.12 from upstream is vulnerable. Distro
packaging will append a backport suffix
(1.1.5+ds1-1+deb12u1 on Debian 12, for example) —
the suffix is the part that tells you the security backport
landed, even though the upstream version number looks older.
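For plain upstream version strings, the comparison can be scripted. This is a minimal sketch that relies on GNU sort -V. The helper name version_lt is an assumption, and the check deliberately knows nothing about distro backport suffixes, so an "older" result on Debian or RHEL means "go read the package changelog", not "vulnerable".

```shell
#!/usr/bin/env bash
# Exit 0 when version $1 sorts strictly before version $2.
# Uses GNU `sort -V`; only meaningful for plain upstream versions.
version_lt() {
  [ "$1" != "$2" ] &&
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$1" ]
}

FIXED="1.1.12"

if command -v runc >/dev/null 2>&1; then
  # The first line of `runc --version` reads "runc version X.Y.Z".
  installed=$(runc --version | awk 'NR == 1 { print $3 }')
  if version_lt "$installed" "$FIXED"; then
    echo "runc $installed is older than $FIXED; check for a distro backport"
  else
    echo "runc $installed is at or above $FIXED"
  fi
fi
```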
2. Read what Docker / containerd is actually using
docker info | grep -iE 'runtime|runc version'
containerd --version
Docker bundles its own runc binary; that's the
one that will execute when you run a container, even if a
different runc is on the host
$PATH. docker info prints the
runtime its daemon is configured against. On most homelab
installs the bundled and host versions match — when they
don't, trust docker info.
3. Read the Proxmox stack
pveversion -v | grep -E 'pve-container|lxc-pve|libcrun|runc'
Proxmox VE 8.x landed the fix as a pve-container
and lxc-pve update in February 2024, distributed
through both pve-enterprise and
pve-no-subscription repositories. If you haven't
run apt-get update && apt-get dist-upgrade
on your PVE node since early 2024, you're vulnerable. The
Proxmox security
checklist walks through the safe upgrade sequence for a
single-node and a cluster setup.
Per-distro fix versions at a glance
- Debian 12 (bookworm): fixed in runc 1.1.5+ds1-1+deb12u1.
- Ubuntu 24.04 and 22.04 LTS: fixed via the USN-6644-1 series.
- Rocky / RHEL 9: fixed via the RHSA-2024:0606 erratum.
- Rocky / RHEL 8: fixed via the RHSA-2024:0607 erratum.
- Proxmox VE 8.x: updated pve-container and lxc-pve packages from February 2024 onward.
The broader "you can never audit every image" reality
Step back from the specific CVE and the homelab container story gets uncomfortable. A typical self-hosted stack pulls from somewhere between five and fifty distinct image sources: LinuxServer.io for media tooling, official Docker Hub images for databases, community LXC templates for whatever the GUI offers, GitHub Container Registry for niche projects, occasionally a tarball from a project's release page. Each one is a separate trust decision, and the homelab operator made almost none of them deliberately.
The honest answer is that you can't audit every image, and pretending otherwise is theatre. What you can do is shrink the exploit-window — the time between an attacker landing a malicious image (or a runtime CVE like this one) and your tooling noticing — by combining three things:
- A short patch cycle on the runtime layer. Auto-apply distro security updates so that runc, containerd, Docker, kernel, and lxc-pve all roll forward without you thinking about it. The patching gap covers why "I'll patch when I have time" isn't a strategy.
- Continuous CVE matching against installed package versions. This is the gap Noxen is built to close — SSH-based package inventory plus distro-aware CVE matching, refreshed daily. Agentless scanning explains why pulling the data over SSH beats installing an agent on every node.
- Pin known-good image sources. Treat Docker Hub like an FTP site you don't fully trust: pin digest hashes for the images you actually use (image@sha256:…), pull them through a local registry mirror, and review the diff before bumping a digest. Not glamorous, but it's the only thing that meaningfully reduces image-supply-chain blast radius.
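A quick way to see how far a box is from the digest-pinning habit is to list image references that carry no sha256 digest. The sketch below is only a string check on the reference. The function names is_digest_pinned and report_unpinned are invented here, and the check says nothing about whether the pinned digest is itself trustworthy.

```shell
#!/usr/bin/env bash
# Exit 0 when an image reference ends in a full sha256 digest.
# Illustrative helper; the name is not from any existing tool.
is_digest_pinned() {
  printf '%s\n' "$1" | grep -Eq '@sha256:[0-9a-f]{64}$'
}

# Read image references on stdin, one per line, and report the
# ones that are referenced by tag only.
report_unpinned() {
  local ref
  while IFS= read -r ref; do
    [ -n "$ref" ] || continue
    is_digest_pinned "$ref" || echo "not pinned: $ref"
  done
}

# Usage (assumes the docker CLI is available):
#   docker ps --format '{{.Image}}' | report_unpinned
```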
Hardening beyond the patch
A patched runc closes this specific bug. The next
one will need a defence-in-depth posture to mitigate. Five
things worth doing while you're already on the box:
- Default to unprivileged LXC. Proxmox VE 8.x makes this the default when creating containers through the GUI; verify each existing container's Options → Features → Unprivileged container setting and migrate any that are still privileged unless you have a documented reason. See Step 7 of the Proxmox checklist.
- Enable user namespaces for Docker too. Set "userns-remap": "default" in /etc/docker/daemon.json. Same mechanism as unprivileged LXC, different runtime.
- Confirm AppArmor or SELinux is enforcing. Proxmox ships with AppArmor profiles for LXC; Docker ships with a default profile (docker-default) and Podman with SELinux. If you've disabled either to make something work, re-enable it with a narrower override rather than leaving it off.
- Stay on cgroup v2. v2's unified hierarchy closed entire classes of escape patterns; modern Proxmox and most current distros default to it. Older boxes still pinned to v1 (cgroupfs hybrid mode) are worth migrating.
- Don't run the container engine on the same box as anything else valuable. If a host's job is "run containers," then a runtime breakout on it costs you the container hosts but not the password manager VM or the backup target. The 30-minute homelab baseline has the broader segmentation argument.
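Two of the items above can be spot-checked from a shell. This is a hedged sketch: cgroup_mode and has_userns_remap are hypothetical helper names, and the daemon.json test is a plain text match rather than real JSON parsing. It relies on stat -fc %T reporting cgroup2fs for /sys/fs/cgroup on a pure cgroup v2 host and tmpfs on a v1 or hybrid one.

```shell
#!/usr/bin/env bash
# Map the filesystem type of /sys/fs/cgroup to a cgroup mode.
cgroup_mode() {
  case "$1" in
    cgroup2fs) echo "v2" ;;
    tmpfs)     echo "v1-or-hybrid" ;;
    *)         echo "unknown" ;;
  esac
}

# Exit 0 when the daemon.json text on stdin sets a userns-remap value.
# Text match only; a commented-out or nested key would fool it.
has_userns_remap() {
  grep -Eq '"userns-remap"[[:space:]]*:[[:space:]]*"[^"]+"'
}

echo "cgroup mode: $(cgroup_mode "$(stat -fc %T /sys/fs/cgroup 2>/dev/null)")"

if [ -r /etc/docker/daemon.json ]; then
  if has_userns_remap < /etc/docker/daemon.json; then
    echo "userns-remap is set"
  else
    echo "userns-remap is not set"
  fi
fi
```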
Where Noxen fits in
Noxen scans every Linux host you enrol over SSH, inventories
installed packages — including runc,
containerd, docker-ce, and
Proxmox's pve-container / lxc-pve —
and matches versions against a signed daily CVE feed. A host
running a vulnerable runc shows up as a high-severity finding
with a link to the
CVE-2024-21626 reference
card and the recommended fix version. It won't audit
container images for you — that's a different problem, and an
honest tool draws the line — but it will make sure no
vulnerable runc sits unpatched on a homelab box for weeks at
a time. $79 one-time, Mac-native, no SaaS round-trip; see the
pricing page for what's in the box.