
protect binfmt_misc from cross-distro wipe at shutdown #40621

Merged
benhillis merged 1 commit into master from user/benhill/binfmt_protect on May 27, 2026


@benhillis benhillis commented May 21, 2026

Member

Summary of the Pull Request

Fixes the cross-distro binfmt_misc wipe at systemd shutdown. When any systemd-enabled WSL2 distro terminates, systemd-shutdown's disable_binfmt() writes -1 to /proc/sys/fs/binfmt_misc/status, which clears the entire kernel-global registry. Because WSL distros share that registry, every other running distro loses WSLInterop and Windows interop breaks with:

/bin/bash: line 1: /mnt/c/Windows/system32/cmd.exe: cannot execute binary file: Exec format error

This change bind-mounts a read-only file over /proc/sys/fs/binfmt_misc/status in each per-distro mount namespace so the wipe write fails with EROFS and systemd-shutdown continues normally. Per-entry registration/unregistration is unaffected.

PR Checklist

Detailed Description of the Pull Request / Additional comments

Root cause. binfmt_misc is a single kernel-global registry shared across the WSL VM (WSL distros do not isolate it via a user namespace). systemd-shutdown calls disable_binfmt() during clean shutdown which writes -1 to /proc/sys/fs/binfmt_misc/status; that one write clears every entry in the registry — including WSLInterop — for every running distro.

Fix. Each per-distro init bind-mounts /run/wsl/binfmt-status-lock (a regular file containing enabled\n) over /proc/sys/fs/binfmt_misc/status, then remounts the bind-mount read-only. The mount lives in the per-distro mount namespace and is inherited by systemd. When systemd-shutdown later runs, the write to /status fails with EROFS; systemd's binfmt_mounted_and_writable() helper deliberately tolerates this case, so systemd-shutdown logs a warning and continues normally. Reads of /status still return enabled\n, so callers that probe for binfmt_misc availability keep working. Per-entry unregister (echo -1 > /proc/sys/fs/binfmt_misc/<name>) and runtime registration (echo ... > /proc/sys/fs/binfmt_misc/register) target different files and are unaffected.

If the read-only remount fails, the writable bind-mount is detached so we don't leave a writable shadow over the real /status.
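The two-step mount described above can be sketched roughly as follows. This is a minimal illustration, not the PR's `LockBinfmtStatusReadOnly()`: the helper name `BindMountReadOnly` and its return convention are assumptions, and the real function also handles the idempotency cases. It relies only on the Linux `mount(2)` and `umount2(2)` calls.

```cpp
#include <cerrno>
#include <sys/mount.h>

// Hypothetical helper (assumed name), not the code from init.cpp.
// Returns 0 on success, or the errno of the step that failed.
int BindMountReadOnly(const char* lockFile, const char* target)
{
    // Step 1: shadow `target` with `lockFile` in the current mount namespace.
    if (mount(lockFile, target, nullptr, MS_BIND, nullptr) != 0)
    {
        return errno;
    }

    // Step 2: a bind mount becomes read-only only through a separate remount.
    if (mount(nullptr, target, nullptr, MS_BIND | MS_REMOUNT | MS_RDONLY, nullptr) != 0)
    {
        const int error = errno;
        // Do not leave a writable shadow behind: lazily detach the bind mount.
        umount2(target, MNT_DETACH);
        return error;
    }

    return 0;
}
```

Under these assumptions, a caller would pass `/run/wsl/binfmt-status-lock` as `lockFile` and `/proc/sys/fs/binfmt_misc/status` as `target`, and would need sufficient privileges (CAP_SYS_ADMIN) in the distro's mount namespace.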

The existing [boot] protectBinfmt wsl.conf key (default true) now controls the bind-mount and remains as a kill switch for users who want to manage binfmt_misc themselves.

WSLInterop is also re-registered from mini_init with the F (fix-binary) flag so the kernel opens the interpreter at registration time and the entry remains valid across mount namespaces.
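For readers unfamiliar with the registration format, the sketch below assembles a binfmt_misc registration line from its fields (`:name:type:offset:magic:mask:interpreter:flags`, per the kernel's binfmt_misc documentation). The struct, the function name, and the example values are illustrative assumptions; the actual macro in binfmt.h may carry additional flags or a different interpreter path.

```cpp
#include <string>

// Field layout of a binfmt_misc registration line:
//   :name:type:offset:magic:mask:interpreter:flags
struct BinfmtEntry
{
    std::string name;
    char type;               // 'M' = match on magic bytes, 'E' = match on extension
    std::string offset;      // empty means offset 0
    std::string magic;       // bytes to match, e.g. "MZ" for PE executables
    std::string mask;        // empty means no mask
    std::string interpreter; // program the kernel runs for matching files
    std::string flags;       // "F" = open the interpreter at registration time
};

// Hypothetical helper (assumed name): joins the fields with ':' separators.
std::string BuildRegistration(const BinfmtEntry& entry)
{
    return ":" + entry.name + ":" + std::string(1, entry.type) + ":" + entry.offset + ":" +
           entry.magic + ":" + entry.mask + ":" + entry.interpreter + ":" + entry.flags;
}
```

With the assumed values `{"WSLInterop", 'M', "", "MZ", "", "/init", "F"}`, the helper yields `:WSLInterop:M::MZ::/init:F`, which is the kind of string that would be written to /proc/sys/fs/binfmt_misc/register.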

What this does not change. A distro can still override its own WSLInterop entry locally (e.g., via /usr/lib/binfmt.d/dummy.conf). The fix only prevents one distro from wiping the registry for everyone else.

Validation Steps Performed

Built locally on Windows and deployed init + initrd.img to C:\Program Files\WSL\tools\. Ran each test individually with the deployed bits:

  • BinfmtStatusIsLocked: passes. Verifies /status is a mountpoint, echo -1 > /status fails with EROFS, WSLInterop survives the wipe attempt, /register and per-entry unregister still work, interop still functions, and the protectBinfmt=false kill switch removes the bind-mount.
  • BinfmtSurvivesDistroTermination: passes. Imports a systemd-enabled peer distro, terminates it (triggering systemd shutdown), asserts the primary distro's cmd.exe interop still works and the WSLInterop entry retains the F flag.
  • Interop, Systemd* (System, User, Disabled, NoClearTmpUnit, KillInitTerminatesDistro), InitReadonly, InitPermissions, WslConfWarnings: all pass.
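As an illustration of the "/status is a mountpoint" check in the first test, one possible approach is to look for the path in the mount-point column of /proc/self/mountinfo. The sketch below is an assumption about how such a check could be written; it is not the code in UnitTests.cpp, and `IsMountPoint` is a hypothetical name. It takes the mountinfo text as a parameter so it can be exercised without a live system.

```cpp
#include <sstream>
#include <string>

// Returns true if `path` appears as a mount point (the 5th field) in text
// that follows the /proc/self/mountinfo format.
bool IsMountPoint(const std::string& mountInfo, const std::string& path)
{
    std::istringstream lines(mountInfo);
    std::string line;
    while (std::getline(lines, line))
    {
        std::istringstream fields(line);
        std::string field;
        // Fields: mount ID, parent ID, major:minor, root, mount point, ...
        for (int index = 0; index < 5 && (fields >> field); ++index)
        {
            if (index == 4 && field == path)
            {
                return true;
            }
        }
    }
    return false;
}
```

In practice the first argument would be the contents of /proc/self/mountinfo read inside the distro, and the second would be /proc/sys/fs/binfmt_misc/status.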
Copilot AI review requested due to automatic review settings May 21, 2026 18:58

Copilot AI left a comment

Contributor

Pull request overview

This PR addresses a WSL2/systemd shutdown regression where systemd-shutdown can clear the kernel-global binfmt_misc registry (by writing -1 to /proc/sys/fs/binfmt_misc/status), breaking interop in other concurrently running distros. The fix hardens each distro’s mount namespace by bind-mounting a read-only lock file over /proc/sys/fs/binfmt_misc/status, and updates WSLInterop registration to use the F (fix-binary) flag, with new unit tests covering both the mechanism and an end-to-end cross-distro scenario.

Changes:

  • Add LockBinfmtStatusReadOnly() to bind-mount a read-only file over /proc/sys/fs/binfmt_misc/status (per distro mount namespace) to block registry wipes at shutdown.
  • Register WSLInterop with the F flag in VM paths to keep the interpreter resolved across mount namespaces.
  • Add/replace Windows unit tests validating the lock behavior and regression coverage across distro termination.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| test/windows/UnitTests.cpp | Replaces prior systemd/binfmt coverage with two tests: mechanism validation and cross-distro regression scenario. |
| src/linux/init/main.cpp | Switches WSLInterop registration string to a VM-specific macro that includes the F flag. |
| src/linux/init/init.cpp | Removes prior systemd service override generation and introduces LockBinfmtStatusReadOnly() invoked during systemd boot. |
| src/linux/init/binfmt.h | Adds a VM-specific interop registration macro using F (fix-binary) flag and documents flag behavior. |
Comment thread src/linux/init/init.cpp Outdated
Comment thread src/linux/init/init.cpp Outdated
Comment thread test/windows/UnitTests.cpp Outdated
Comment thread test/windows/UnitTests.cpp Outdated
@benhillis benhillis force-pushed the user/benhill/binfmt_protect branch 2 times, most recently from f70720d to 11d86eb Compare May 21, 2026 19:54
Copilot AI review requested due to automatic review settings May 21, 2026 19:54
@benhillis benhillis changed the title fix(init): protect binfmt_misc from cross-distro wipe at shutdown May 21, 2026

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

Comment thread src/linux/init/init.cpp
Comment thread src/linux/init/init.cpp
Comment thread test/windows/UnitTests.cpp
@benhillis benhillis force-pushed the user/benhill/binfmt_protect branch from 11d86eb to d24c5fa Compare May 21, 2026 20:20
Windows interop in every running WSL2 distro silently breaks whenever a
sibling systemd-enabled distro shuts down, surfacing to users as:

    /bin/bash: line 1: /mnt/c/Windows/system32/cmd.exe:
    cannot execute binary file: Exec format error

Root cause: `systemd-shutdown` calls `disable_binfmt()` during clean
shutdown, which writes `-1` to `/proc/sys/fs/binfmt_misc/status`.
binfmt_misc is a single kernel-global registry shared across the WSL VM
(distros do not isolate it via a user namespace), so that one write wipes
every entry -- including WSLInterop -- for every running distro.

Fix: each per-distro init bind-mounts a read-only file over
`/proc/sys/fs/binfmt_misc/status` in its own mount namespace before
exec'ing the distro's init. systemd-shutdown's wipe write then fails with
EROFS; systemd logs a warning and continues normally (its
`binfmt_mounted_and_writable()` helper deliberately tolerates this
case). Per-entry unregister (`echo -1 > .../<name>`) and runtime
registration (`echo ... > .../register`) target different files and are
unaffected, so callers retain full control over their own binfmt entries.

`LockBinfmtStatusReadOnly` is idempotent: it bails early if binfmt_misc
isn't mounted, no-ops if `/status` already resolves to our lock file,
and recovers from a stale foreign mount via `umount2(MNT_DETACH)`
followed by a retry. The existing `[boot] protectBinfmt` wsl.conf key
(default true) now controls the bind-mount and acts as a kill switch for
users who want to manage binfmt_misc themselves.

WSLInterop is also re-registered from mini_init with the `F`
(fix-binary) flag so the interpreter is opened at registration time and
remains valid across mount namespaces.

Tests:
  * `BinfmtStatusIsLocked` -- mechanism test: `/status` is its own
    mountpoint, writes fail with EROFS, WSLInterop survives the wipe
    attempt, /register and per-entry unregister still work, and the
    `protectBinfmt=false` kill switch removes the bind-mount.
  * `BinfmtSurvivesDistroTermination` -- end-to-end regression test:
    imports a systemd-enabled peer distro, terminates it, and asserts
    that the primary distro's Windows interop still works.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
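
The idempotency rules listed for `LockBinfmtStatusReadOnly` in the commit message above can be read as a small decision table. The sketch below encodes that table as a pure function; the enum, the function name, and the three boolean inputs are hypothetical and only mirror the prose, not the actual control flow in init.cpp.

```cpp
// Possible outcomes, mirroring the cases named in the commit message.
enum class LockAction
{
    Skip,           // binfmt_misc is not mounted: nothing to protect
    AlreadyLocked,  // /status already resolves to our lock file: no-op
    DetachAndRetry, // a stale foreign mount shadows /status: detach, then bind
    Bind            // nothing is mounted on /status yet: bind the lock file
};

// Hypothetical decision helper (assumed name and inputs).
LockAction DecideLockAction(bool binfmtMounted, bool statusIsMountPoint, bool statusIsOurLockFile)
{
    if (!binfmtMounted)
    {
        return LockAction::Skip;
    }
    if (statusIsMountPoint && statusIsOurLockFile)
    {
        return LockAction::AlreadyLocked;
    }
    if (statusIsMountPoint)
    {
        return LockAction::DetachAndRetry;
    }
    return LockAction::Bind;
}
```

Keeping the decision separate from the mount calls is a choice made for this sketch, so that each case can be checked without root privileges.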
Copilot AI review requested due to automatic review settings May 23, 2026 00:01
@benhillis benhillis force-pushed the user/benhill/binfmt_protect branch from d24c5fa to 26c5835 Compare May 23, 2026 00:01

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated no new comments.

@benhillis benhillis marked this pull request as ready for review May 23, 2026 01:54
@benhillis benhillis requested a review from a team as a code owner May 23, 2026 01:54

@OneBlue OneBlue left a comment

Collaborator

LGTM. This is the 3rd iteration of us trying to resolve this issue. Hopefully at some point we'll have kernel support to namespace the binfmt_misc interpreters, but until then I think this is the best we'll be able to do.

One small caveat of this approach is that if a distro has an explicit systemd mount over binfmt_misc, it would override the read-only mount, but AFAIK distros don't do that.

@benhillis benhillis merged commit f67086e into master May 27, 2026
12 checks passed
@benhillis benhillis deleted the user/benhill/binfmt_protect branch May 27, 2026 22:32

Labels

None yet

3 participants