Protection opt-out: --allow-degraded / --disable per-protection by dzerik · Pull Request #71 · multikernel/sandlock

dzerik · 2026-05-27T07:37:07Z

Fixes #17.

Implements the opt-out polarity we agreed on in the design ack comment on #17: default behaviour is Strict for every protection, and two new builder methods (allow_degraded(Protection) and disable(Protection)) opt out per-protection. The result is that callers on a v5 kernel (e.g. Rocky/RHEL 9.6, Fedora 41) can write a single line, .disable(Protection::SignalScope).disable(Protection::AbstractUnixSocketScope), and get the v5-level FS + REFER + truncate + TCP + ioctl-dev sandbox without the two v6 IPC scopes, exactly as you described it in your first comment on this issue.

The hard MIN_ABI = 6 floor in landlock.rs is gone; with the default ProtectionPolicy::strict_all() on a v6 host every protection still resolves to Active, so the pre-refactor floor is preserved exactly. The constant itself stays for downstream backwards-compat (it now expresses "minimum ABI when every protection is in Strict").
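For reference, the per-variant floors quoted in this thread (REFER v2, truncate v3, TCP v4, ioctl-dev v5, both scopes v6) can be sketched as a standalone Rust enum. This is a hypothetical mirror, not the crate's code: the discriminant order follows the CLI help text, and `strict_all_floor` is an illustrative helper. It shows why `MIN_ABI = 6` still reads correctly as "minimum ABI when every protection is Strict": it is simply the largest per-variant floor.

```rust
/// Hypothetical mirror of `Protection`; discriminants follow the CLI help-text order.
#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
#[repr(C)]
pub enum Protection {
    FsRefer = 0,
    FsTruncate = 1,
    NetTcp = 2,
    FsIoctlDev = 3,
    SignalScope = 4,
    AbstractUnixSocketScope = 5,
}

impl Protection {
    /// Every variant, in discriminant order.
    pub const fn all() -> [Protection; 6] {
        use Protection::*;
        [FsRefer, FsTruncate, NetTcp, FsIoctlDev, SignalScope, AbstractUnixSocketScope]
    }

    /// Lowest Landlock ABI version that provides this protection.
    pub const fn min_abi(self) -> u32 {
        match self {
            Protection::FsRefer => 2,
            Protection::FsTruncate => 3,
            Protection::NetTcp => 4,
            Protection::FsIoctlDev => 5,
            Protection::SignalScope | Protection::AbstractUnixSocketScope => 6,
        }
    }
}

/// Kept for backwards compatibility: the floor when every protection is Strict.
pub const MIN_ABI: u32 = 6;

/// Under strict_all() every protection must be available, so the effective
/// floor is the highest per-variant floor (illustrative helper).
pub fn strict_all_floor() -> u32 {
    Protection::all().iter().map(|p| p.min_abi()).max().unwrap_or(1)
}
```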

Layers

Same RFC-chain shape as #43 / #46 / #54. Commit prefixes mark the boundary:

core (8 commits, 15b09ce..30ad30c): Protection enum and its per-variant min_abi(); ProtectionState (Strict / Degradable / Disabled) and ProtectionPolicy; ProtectionStatus runtime view; Resolved 4-way (Active / Degraded / Disabled / StrictlyUnavailable) at the syscall boundary; Sandbox::protection_policy field defaulting to strict_all(); confine_inner walks Protection::all() and returns ConfinementError::ProtectionUnavailable { protection, required_abi, host_abi } for any strict + unavailable combination; compute_fs_mask / compute_net_mask / compute_scope_mask derive Landlock attrs from the resolution; Sandbox::active_protections() exposes the runtime view; sandlock check learns a per-protection availability table.

ffi (1 commit, 265b3c1): C ABI for Protection, two builder setters with move-semantics, and sandlock_protection_min_abi() introspection. The C header declares the discriminants and the new functions.

python (1 commit, 53af1d1): Protection IntEnum re-exported at the package top level; allow_degraded and disable kwargs on the Sandbox dataclass (last-write-wins to mirror ProtectionPolicy::set); ctypes bindings call through to the C ABI.

cli (2 commits, b443597..2be594b): sandlock check extended with the per-protection availability table; sandlock run learns --allow-degraded <name> and --disable <name> (repeatable; case-insensitive kebab-case).

docs (1 commit, a43c1d6): a "Protection opt-out" section in both docs/extension-handlers.md (Rust) and docs/python-handlers.md (Python), and a one-line README pointer.

maintainer-lens follow-up (3 commits, ceae31c..0d5e5fa, added after a deep code review pass): FFI input validation so an out-of-range discriminant from C or Python is rejected at the boundary instead of triggering UB at a Rust match over a #[repr(C)] enum; canonical-name rename (the previous Protection::AbstractUnixScope was missing the noun Socket and didn't agree with the Python ABSTRACT_UNIX_SOCKET_SCOPE spelling — the four bindings now all use AbstractUnixSocketScope / abstract-unix-socket-scope, with the old CLI spelling kept as an alias); 14 mask-contract tests asserting the actual Landlock attribute bits produced by each compute_*_mask for each (host ABI, ProtectionState) cell, plus a compute_scope_mask precondition docstring and debug_assert!.

ci (1 commit, 8c1d36f): ubuntu-22.04 added to the Rust matrix so the v3 path is exercised on a real kernel on every push; a Report Landlock ABI step prints the host's sandlock check output to each job's log for visibility.

Public API surface added

Trying to keep this minimal per your standing #36 priority. Everything new under sandlock_core::

- Protection (enum, 6 variants, each carrying its kernel ABI floor); Protection::min_abi(); Protection::all()
- ProtectionState (enum); ProtectionPolicy (struct + strict_all()/state()/iter(); set() is #[doc(hidden)] pub so the FFI tests can drive resolution directly)
- ProtectionStatus (enum, 4-way runtime view)
- Sandbox::active_protections() (runtime accessor)
- SandboxBuilder::allow_degraded(Protection) -> Self; SandboxBuilder::disable(Protection) -> Self
- Sandbox::protection_policy (public field, mirrors the rest of Sandbox)
- ConfinementError::ProtectionUnavailable { protection, required_abi, host_abi } (new variant on the existing enum)
- landlock::compute_fs_mask / compute_net_mask (already pub for downstream tests in this repo; compute_scope_mask deliberately stayed pub(crate))

C ABI: sandlock_protection_t enum (6 named discriminants), sandlock_protection_min_abi(uint32_t) -> uint32_t, sandlock_sandbox_builder_allow_degraded, sandlock_sandbox_builder_disable. Setter functions take uint32_t for the discriminant (not the enum type) so an out-of-range value is rejected at the boundary; min_abi(unknown) returns 0 as a sentinel.
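A minimal sketch of the boundary check, assuming discriminants 0..=5 in the help-text order. `try_protection_from_raw` and `sandlock_protection_min_abi` reuse the names from this PR, but the bodies are illustrative: the raw `u32` is matched as an integer, so no invalid `Protection` value is ever constructed, and unknown input falls through to the `0` sentinel.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protection {
    FsRefer,
    FsTruncate,
    NetTcp,
    FsIoctlDev,
    SignalScope,
    AbstractUnixSocketScope,
}

impl Protection {
    pub const fn min_abi(self) -> u32 {
        match self {
            Protection::FsRefer => 2,
            Protection::FsTruncate => 3,
            Protection::NetTcp => 4,
            Protection::FsIoctlDev => 5,
            Protection::SignalScope | Protection::AbstractUnixSocketScope => 6,
        }
    }
}

/// Map an untrusted integer from C or Python onto a variant.
/// Matching on `u32` (not on the enum) means arbitrary bits never become an enum value.
pub fn try_protection_from_raw(raw: u32) -> Option<Protection> {
    match raw {
        0 => Some(Protection::FsRefer),
        1 => Some(Protection::FsTruncate),
        2 => Some(Protection::NetTcp),
        3 => Some(Protection::FsIoctlDev),
        4 => Some(Protection::SignalScope),
        5 => Some(Protection::AbstractUnixSocketScope),
        _ => None,
    }
}

/// Sketch of the exported introspection call: 0 means "unknown discriminant",
/// which cannot collide with a real floor because every floor is at least 2.
pub extern "C" fn sandlock_protection_min_abi(raw: u32) -> u32 {
    try_protection_from_raw(raw).map_or(0, Protection::min_abi)
}
```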

Python: Protection IntEnum re-exported; two new kwargs on the Sandbox dataclass.

Three states per protection

| State | Capable host | Incapable host | Use case |
|---|---|---|---|
| `Strict` (default) | Active | `ConfinementError::ProtectionUnavailable` at build/run | matches the pre-refactor `MIN_ABI=6` behaviour |
| `Degradable` (`allow_degraded`) | Active | silently skipped (observable via `active_protections()` and `sandlock check`) | "use the protection where the kernel has it, don't fail the build" |
| `Disabled` (`disable`) | not enforced even though available | not enforced | "this workload genuinely needs the capability the protection blocks" |

Disabled deliberately works on a capable kernel — per your answer to question 3 in the design thread.
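The three states map onto a small policy type. The sketch below assumes a `HashMap` that stores only deviations from Strict (an absent key reads as Strict) and a builder that moves `self`; `SandboxBuilder` here is a hypothetical stand-in that holds nothing but the policy. Because `set` overwrites, a later call for the same protection wins, which is the last-write-wins rule the Python kwargs mirror.

```rust
use std::collections::HashMap;

#[derive(Clone, Copy, Debug, PartialEq, Eq, Hash)]
pub enum Protection {
    FsRefer,
    FsTruncate,
    NetTcp,
    FsIoctlDev,
    SignalScope,
    AbstractUnixSocketScope,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProtectionState {
    Strict,
    Degradable,
    Disabled,
}

/// Only deviations from Strict are stored; a missing key means Strict.
#[derive(Clone, Debug, Default)]
pub struct ProtectionPolicy {
    overrides: HashMap<Protection, ProtectionState>,
}

impl ProtectionPolicy {
    pub fn strict_all() -> Self {
        Self::default()
    }

    pub fn state(&self, p: Protection) -> ProtectionState {
        self.overrides.get(&p).copied().unwrap_or(ProtectionState::Strict)
    }

    /// Overwrites any earlier state for `p` (last write wins).
    pub fn set(&mut self, p: Protection, s: ProtectionState) {
        self.overrides.insert(p, s);
    }
}

/// Hypothetical builder showing the opt-out polarity; each call consumes and returns self.
#[derive(Default)]
pub struct SandboxBuilder {
    policy: ProtectionPolicy,
}

impl SandboxBuilder {
    pub fn allow_degraded(mut self, p: Protection) -> Self {
        self.policy.set(p, ProtectionState::Degradable);
        self
    }

    pub fn disable(mut self, p: Protection) -> Self {
        self.policy.set(p, ProtectionState::Disabled);
        self
    }

    pub fn policy(&self) -> &ProtectionPolicy {
        &self.policy
    }
}
```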

Validation

Tests: 301 lib (includes 14 new mask_contract_tests asserting Landlock bits per cell), 18 integration (tests/integration/test_protection.rs covers the policy-state and resolve() mechanics), 10 FFI integration, 12 Python (tests/test_protection.py). The mask-contract tests catch the bug class that the original 18 integration tests miss — i.e. a regression that would mis-compute handled_access_fs or scoped would now fail a test instead of silently degrading the sandbox.

VM matrix (full protocol with reproducer recipe attached out-of-band; this is the relevant table):

| Distro | Kernel | `sandlock check` ABI | Default strict | Smallest `--disable` that produces `exit=0` |
|---|---|---|---|---|
| Rocky 9.6 ← #17 reporter's env | 5.14.0-570.17.1.el9_6 | v5 honest | fails with `required protection SignalScope is not available: host Landlock ABI is v5, requires v6` | `--disable signal-scope --disable abstract-unix-socket-scope` |
| Rocky 9.7 | 5.14.0-611.5.1.el9_7 | v6 reported (RHEL backport) | fails inside `landlock_create_ruleset` with `EINVAL`: the backport reports v6 but does not provide v5/v6 attrs (see finding F1 below) | `--disable fs-ioctl-dev --disable signal-scope --disable abstract-unix-socket-scope` |
| Ubuntu 22.04 | 5.15.0-179-generic | v1 in this multipass image | fails with `required protection FsRefer is not available: host Landlock ABI is v1, requires v2` | full `--disable` of every v2+ protection |
| Debian 12 bookworm | 6.1.0-48-cloud-amd64 | v2 (build-clipped) | fails on v3+ requirements | full `--disable` of every v3+ protection |
| Fedora 41 | 6.11.4-301.fc41 | v5 honest, vanilla | fails on the two v6 scopes | `--disable signal-scope --disable abstract-unix-socket-scope` (also exercised `--allow-degraded` for the same two; same `exit=0`) |

Every cell runs a stock git clone of this branch (no local patches; the seccomp fallback fix already merged in #63 is now in main), cargo build --release, cargo test --release --lib -p sandlock-core, the integration tests, and then a sandlock run of /usr/bin/true with the listed --disable flags.
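On kernels that report their ABI honestly, the last column follows from the floors alone: under `strict_all()` the protections that must be opted out are exactly those whose floor exceeds the host ABI. The helper below is a hypothetical illustration of that rule, not part of sandlock. It deliberately does not capture the Rocky 9.7 row, where the backport reports v6 but `fs-ioctl-dev` must also be disabled (finding F1).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protection {
    FsRefer,
    FsTruncate,
    NetTcp,
    FsIoctlDev,
    SignalScope,
    AbstractUnixSocketScope,
}

impl Protection {
    pub const ALL: [Protection; 6] = [
        Protection::FsRefer,
        Protection::FsTruncate,
        Protection::NetTcp,
        Protection::FsIoctlDev,
        Protection::SignalScope,
        Protection::AbstractUnixSocketScope,
    ];

    pub const fn min_abi(self) -> u32 {
        match self {
            Protection::FsRefer => 2,
            Protection::FsTruncate => 3,
            Protection::NetTcp => 4,
            Protection::FsIoctlDev => 5,
            Protection::SignalScope | Protection::AbstractUnixSocketScope => 6,
        }
    }

    /// Kebab-case name as accepted by `--disable` / `--allow-degraded`.
    pub const fn cli_name(self) -> &'static str {
        match self {
            Protection::FsRefer => "fs-refer",
            Protection::FsTruncate => "fs-truncate",
            Protection::NetTcp => "net-tcp",
            Protection::FsIoctlDev => "fs-ioctl-dev",
            Protection::SignalScope => "signal-scope",
            Protection::AbstractUnixSocketScope => "abstract-unix-socket-scope",
        }
    }
}

/// Hypothetical helper: names that must be opted out on an honest host of `host_abi`.
pub fn required_opt_outs(host_abi: u32) -> Vec<&'static str> {
    Protection::ALL
        .iter()
        .filter(|p| p.min_abi() > host_abi)
        .map(|p| p.cli_name())
        .collect()
}
```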

Two findings worth flagging

F1 — RHEL 9.7 reports ABI v6 but the kernel does not provide it. The version returned by landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION) is 6 on 9.7, but the actual ruleset creation with v5/v6 attrs fails with EINVAL. The opt-out covers it (the user --disables the affected protections and the ruleset assembles cleanly), but it does mean the sandlock check ABI line cannot be trusted as a capability statement on backport distros. Per-protection probing at confine_inner is the reliable signal, which is what this PR already does. Not requesting a change — flagged for context.

F2 — Initial pre-rebase implementation of compute_fs_mask only masked off Disabled protections, not Degraded ones. A test against compute_fs_mask(v4, policy_with_degradable_ioctl_dev) would have asserted the IOCTL_DEV bit absent and gotten it present, then landlock_create_ruleset would have failed with EINVAL because v4 doesn't know the bit. Fixed in bf9490d (Disabled | Degraded matched together), pinned by the new fs_mask_degraded_protections_get_masked_off_on_low_abi_host test in 0d5e5fa.
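The fixed rule is "set a bit only if its protection resolves to Active", so Disabled and Degraded are dropped together. A simplified sketch, assuming a hypothetical `FsPolicy` with one state per FS protection in place of the real `ProtectionPolicy`; the bit values are the Landlock `LANDLOCK_ACCESS_FS_*` constants from `linux/landlock.h`.

```rust
/// Landlock filesystem access bits (linux/landlock.h).
pub const ACCESS_FS_REFER: u64 = 1 << 13;
pub const ACCESS_FS_TRUNCATE: u64 = 1 << 14;
pub const ACCESS_FS_IOCTL_DEV: u64 = 1 << 15;
/// The thirteen ABI v1 rights occupy bits 0..=12.
pub const ACCESS_FS_V1: u64 = (1 << 13) - 1;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProtectionState {
    Strict,
    Degradable,
    Disabled,
}

/// Hypothetical simplified policy: one state per FS protection.
pub struct FsPolicy {
    pub refer: ProtectionState,
    pub truncate: ProtectionState,
    pub ioctl_dev: ProtectionState,
}

/// True only when the protection resolves to Active.
/// Strict + unavailable is rejected upstream; Degradable + unavailable is Degraded.
fn active(state: ProtectionState, host_abi: u32, min_abi: u32) -> bool {
    match state {
        ProtectionState::Disabled => false,
        ProtectionState::Strict | ProtectionState::Degradable => host_abi >= min_abi,
    }
}

pub fn compute_fs_mask(host_abi: u32, p: &FsPolicy) -> u64 {
    let mut mask = ACCESS_FS_V1;
    for (state, min_abi, bit) in [
        (p.refer, 2, ACCESS_FS_REFER),
        (p.truncate, 3, ACCESS_FS_TRUNCATE),
        (p.ioctl_dev, 5, ACCESS_FS_IOCTL_DEV),
    ] {
        // Disabled and Degraded both leave the bit out, so the kernel never sees
        // a right it does not know (the EINVAL described above).
        if active(state, host_abi, min_abi) {
            mask |= bit;
        }
    }
    mask
}
```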

CI

ubuntu-22.04 added to .github/workflows/ci.yml. With the existing matrix [ubuntu-latest, ubuntu-24.04-arm], both runners report ABI v6 or higher, so the v3/v4 path was never reached by real-kernel CI. A Report Landlock ABI step prints the host ABI to the job log on every runner for visibility; verified on the first fork-internal dispatch:

| Runner | Image kernel | `sandlock check` ABI |
|---|---|---|
| ubuntu-22.04 | `6.8.0-azure` | v4 |
| ubuntu-24.04-arm | `6.17.0-azure` | v6 |
| ubuntu-latest | `6.17.0-azure` | v7 |

(Ubuntu LTS labels actually ship Azure-specific kernels, so ubuntu-22.04 is not stock 5.15 — see actions/runner-images/images/ubuntu/Ubuntu2204-Readme.md.)

Known coverage gap — Landlock ABI v5 is unreachable on any GitHub-hosted runner today. The hosted Ubuntu images jump from v4 (22.04) to v6+ (24.04), so the FsIoctlDev-only code path (a kernel that has v5 but not v6 — exactly the production fleet shape on Rocky 9.6 / Fedora 41) cannot be exercised against a real landlock_create_ruleset syscall in this CI. The v5 cells are covered by the synthetic-ABI landlock::mask_contract_tests (which run on every runner) and by the out-of-band VM matrix on Rocky 9.6 (kernel 5.14.0-570.x.el9_6) and Fedora 41 (kernel 6.11.4). If you want real-kernel v5 coverage in CI we'd need a self-hosted runner pointed at a v5 box; happy to advise on configuration, but the infrastructure decision is yours.

The integration tests are split per cell: on ubuntu-22.04 only test_protection runs (the policy/resolution-mechanics subset that uses a synthetic ABI and is host-ABI independent). The remaining integration suite runs on ≥v6 runners — those tests fundamentally assume a v6+ host because they construct default Sandbox::builder() whose ProtectionPolicy::strict_all() requires every Protection to resolve to Active. Refactoring them to adapt to whatever the host can provide is a separate task; the v3/v4 path is exercised here through the new landlock::mask_contract_tests (which run on every cell) plus the out-of-band VM matrix above.

workflow_dispatch is added to the triggers so future manual reruns don't need a push commit.

Scope discipline — what is NOT in this PR

Overlayfs / branchfs COW backends — untouched; this PR is entirely Landlock-side.
Any change to syscalls outside the Landlock attribute computation.

Reference

Issue with design ack: Support for landlock ABI v5? #17
Precursor seccomp fallback (merged 2026-05-26): seccomp: fall back to NEW_LISTENER on kernels without WAIT_KILLABLE_RECV #63

congwang-mk

Thanks for the PR! Nice work.

congwang-mk · 2026-05-30T00:29:34Z

 See `crates/sandlock-ffi/tests/c/handler_smoke.c` for the canonical
 end-to-end example.

+## Protection opt-out


This looks unrelated to docs/extension-handlers.md, better to move it to docs/sandbox-reference.md?

Moved to docs/sandbox-reference.md. Also repointed the README anchor that referenced the old location.

congwang-mk · 2026-05-30T00:29:58Z

  was actually dispatched — the supervisor handles cleanup on all
  paths.

+## Protection opt-out


Ditto, maybe move to python/README.md?

Moved to python/README.md.

congwang-mk · 2026-05-30T00:30:54Z

+    /// state arrives with the public builder API in a later change.
+    /// Deserialized sandboxes get `ProtectionPolicy::default()`, which
+    /// is identical to `strict_all()`.
+    #[serde(skip)]


Are you sure this is safe? I am not sure, worth a double check.

Traced this — you're right to flag it. The #[serde(skip)] interacts with checkpoint, not just the profile path.

Sandbox derives Serialize/Deserialize, and that derive is used by Checkpoint::save/load via bincode (checkpoint.rs:455 serialize, :547 deserialize) — distinct from the TOML profile path, which goes through a separate ProfileInput struct that never sees protection_policy. So policy.dat does not carry protection_policy, and Checkpoint::load(...).policy.protection_policy is always strict_all() regardless of the original sandbox.

Severity today is limited: Checkpoint::load reconstructs the struct but nothing re-confines from .policy automatically (there's no restore-and-reconfine path, and no CLI restore/resume). So today it's silent metadata loss, not a live break. But it's latent: once a restore-and-reconfine path lands — the natural endpoint of checkpoint/restore — it will silently apply strict_all() instead of the original opt-out, breaking restore on exactly the v5 hosts this feature exists to serve.

Root cause is the now-stale doc comment you flagged separately: the skip was justified by "no builder API yet to make policy non-default." This PR adds that builder API, so the justification is gone.

There's a design fork on the fix, and it's your call on the checkpoint contract:

A — store in checkpoint: drop the skip, derive Serialize/Deserialize on Protection/ProtectionState/ProtectionPolicy (plain enums + a HashMap, bincode-clean), so the checkpoint is self-contained. Caveat: this shifts the bincode field layout, so existing policy.dat files become unreadable — acceptable under the pre-1.0 no-backcompat stance, but worth naming.

B — re-supply on restore: keep protection_policy out of the checkpoint entirely; a future restore takes the policy as an argument and validates it against the current host ABI. Cleaner separation (checkpoint = process state, policy = orthogonal, validated fresh), but it changes the eventual restore signature.

I lean A as the least-invasive fix that stops the data loss now, but B may fit your checkpoint model better. Which do you prefer?

It seems A is cleaner and more portable?

Done (option A). Removed the skip; derived Serialize/Deserialize on Protection/ProtectionState/ProtectionPolicy so the policy rides in the checkpoint. Added a destructive test (protection_policy_survives_bincode_round_trip) that fails if the skip is restored. As noted: this shifts the bincode field layout, so pre-existing policy.dat files won't load — the pre-1.0 break we discussed.

congwang-mk · 2026-05-30T00:32:57Z

+        (ProtectionState::Degradable, true) => Resolved::Active,
+        (ProtectionState::Degradable, false) => Resolved::Degraded,
+    }
+}


There are two resolver types modeling the same thing:

landlock::Resolved { Active, Degraded, Disabled, StrictlyUnavailable } via landlock::resolve()

protection::ProtectionStatus { Active, Degraded, Disabled, Unavailable } via
ProtectionStatus::resolve()

The two resolve functions are identical match arms:

match (policy.state(p), available) {
(Disabled, _) => …Disabled,
(Strict, true) => …Active,
(Strict, false) => …{Strictly}Unavailable,
(Degradable, true) => …Active,
(Degradable, false)=> …Degraded,
}

Agreed — the two are the same five-way mapping with one variant renamed (StrictlyUnavailable vs Unavailable). The split was unintentional: ProtectionStatus is the public-facing view (returned by active_protections()), Resolved grew as an internal helper for the mask computation, and they drifted into duplicate resolve() bodies.

Consolidation options:

(a) Drop Resolved entirely; use ProtectionStatus everywhere, including the internal mask path. One enum, one resolve(). Smallest surface — nothing internal-only remains.

(b) Keep both names but have one resolve() and a From conversion, if you want the internal/public type distinction preserved.

I lean (a) — there's no behavioural difference between the two, so a single public ProtectionStatus + single resolve() is the least surface to maintain. Any reason to keep an internal-only Resolved that I'm missing?

Yes, (a) is better

Done (option a). Dropped landlock::Resolved entirely; ProtectionStatus is the single resolver, used by the mask-compute path too. StrictlyUnavailable folded into Unavailable.
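The consolidated resolver is the five-arm match quoted in the review, returning the four-variant public enum. Sketch below; the `(state, available)` signature is an assumption for illustration, where `available` stands for `host_abi >= protection.min_abi()`.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProtectionState {
    Strict,
    Degradable,
    Disabled,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProtectionStatus {
    Active,
    Degraded,
    Disabled,
    Unavailable,
}

impl ProtectionStatus {
    /// Single resolver: five arms, four outcomes.
    pub fn resolve(state: ProtectionState, available: bool) -> Self {
        match (state, available) {
            (ProtectionState::Disabled, _) => ProtectionStatus::Disabled,
            (ProtectionState::Strict, true) => ProtectionStatus::Active,
            // Formerly Resolved::StrictlyUnavailable; confine fails on this.
            (ProtectionState::Strict, false) => ProtectionStatus::Unavailable,
            (ProtectionState::Degradable, true) => ProtectionStatus::Active,
            (ProtectionState::Degradable, false) => ProtectionStatus::Degraded,
        }
    }
}
```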

congwang-mk · 2026-05-30T00:34:57Z


+    /// Allow the named protection to degrade silently if the host kernel ABI lacks support.
+    /// Repeatable. Accepted values: fs-refer, fs-truncate, net-tcp, fs-ioctl-dev,
+    /// signal-scope, abstract-unix-scope-socket.


Nit: abstract-unix-scope-socket -> abstract-unix-socket-scope.

Fixed — help text now reads abstract-unix-socket-scope.

congwang-mk · 2026-05-30T00:35:35Z

+    /// Per-protection enforcement policy. Default
+    /// (`ProtectionPolicy::strict_all()`) preserves the historical hard
+    /// `MIN_ABI = 6` behaviour. Builder methods to deviate from
+    /// strict-all are added in a follow-up.


Fixed — rewrote the comment. The "added in a follow-up" note was stale (the builder API is in this PR), and the field is now serialized (see the checkpoint thread).

congwang-mk · 2026-05-30T00:39:18Z

    branches: [main]
  pull_request:
    branches: [main]
+  workflow_dispatch:


Is this related?

You're right — out of scope here. Reverted ci.yml to match main; I'll send the ubuntu-22.04 matrix addition as a separate CI-focused PR.

congwang-mk · 2026-05-30T00:43:02Z

+        "net-tcp" => Ok(Protection::NetTcp),
+        "fs-ioctl-dev" => Ok(Protection::FsIoctlDev),
+        "signal-scope" => Ok(Protection::SignalScope),
+        "abstract-unix-socket-scope" | "abstract-unix-scope-socket" => {


Any reason to keep this abstract-unix-scope-socket?

Removed — no reason to keep it. The old spelling never shipped in a release, so nothing depends on it. parse_protection now accepts only abstract-unix-socket-scope.
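With the alias gone, the parser reduces to one arm per canonical name. A sketch assuming the case-insensitive matching described for the `--allow-degraded` / `--disable` flags; the error wording is hypothetical.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protection {
    FsRefer,
    FsTruncate,
    NetTcp,
    FsIoctlDev,
    SignalScope,
    AbstractUnixSocketScope,
}

/// Accept only the canonical kebab-case names, ignoring ASCII case.
pub fn parse_protection(s: &str) -> Result<Protection, String> {
    match s.to_ascii_lowercase().as_str() {
        "fs-refer" => Ok(Protection::FsRefer),
        "fs-truncate" => Ok(Protection::FsTruncate),
        "net-tcp" => Ok(Protection::NetTcp),
        "fs-ioctl-dev" => Ok(Protection::FsIoctlDev),
        "signal-scope" => Ok(Protection::SignalScope),
        "abstract-unix-socket-scope" => Ok(Protection::AbstractUnixSocketScope),
        other => Err(format!(
            "unknown protection `{other}`; expected one of: fs-refer, fs-truncate, \
             net-tcp, fs-ioctl-dev, signal-scope, abstract-unix-socket-scope"
        )),
    }
}
```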

The Protection setters took `sandlock_protection_t` and matched on it exhaustively, so a C or Python caller passing an integer outside the known discriminant range (0..=5) produced undefined behaviour at the Rust match: `#[repr(C)]` enums are UB to construct from arbitrary bits. Change the three entry points (`sandlock_protection_min_abi`, `sandlock_sandbox_builder_allow_degraded`, `sandlock_sandbox_builder_disable`) to accept `u32` and route every incoming value through `try_protection_from_raw`.

Unknown values are now handled at the boundary:

- `min_abi(unknown)` returns 0, a sentinel that cannot collide with any real `min_abi()` (those start at 2).
- The builder setters return the input pointer untouched, mirroring the null-builder convention already used elsewhere in the C ABI.

The Python wrapper adds a stricter guard: an out-of-range int raises `ValueError` at the SDK boundary rather than silently no-op'ing through the FFI, because the Python contract should fail loudly on a typed mistake.

Update the C header to declare the new signatures (`uint32_t` instead of the enum type) and document the sentinel and no-op behaviour. The `sandlock_protection_t` enum is kept as a labelling type for callers who want the names; passing an enum constant still works because C implicitly converts it to `uint32_t`.

Tests:

- 3 new FFI regression tests cover the boundary: min_abi sentinel, setter no-op, and "bad call then good call" to catch builder corruption in the bad path.
- 4 new Python tests cover `ValueError` on out-of-range and negative ints, plus acceptance of well-formed plain-int inputs.
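The setter behaviour can be sketched with a hypothetical opaque builder; `Builder` and `builder_disable` stand in for the real builder type and `sandlock_sandbox_builder_disable`. Null passes through, an unknown discriminant returns the same pointer untouched, and a valid one takes ownership, applies the change, and hands a pointer back (the move-semantics convention).

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protection {
    FsRefer,
    FsTruncate,
    NetTcp,
    FsIoctlDev,
    SignalScope,
    AbstractUnixSocketScope,
}

fn try_protection_from_raw(raw: u32) -> Option<Protection> {
    match raw {
        0 => Some(Protection::FsRefer),
        1 => Some(Protection::FsTruncate),
        2 => Some(Protection::NetTcp),
        3 => Some(Protection::FsIoctlDev),
        4 => Some(Protection::SignalScope),
        5 => Some(Protection::AbstractUnixSocketScope),
        _ => None,
    }
}

/// Hypothetical opaque builder handed across the C ABI.
#[derive(Debug, Default)]
pub struct Builder {
    pub disabled: Vec<Protection>,
}

/// # Safety
/// `b` must be null or a live pointer obtained from `Box::into_raw(Box<Builder>)`.
pub unsafe extern "C" fn builder_disable(b: *mut Builder, raw: u32) -> *mut Builder {
    if b.is_null() {
        return b; // null-builder convention: pass null straight through
    }
    let Some(p) = try_protection_from_raw(raw) else {
        return b; // unknown discriminant: builder left untouched
    };
    // Take ownership, apply the change, hand ownership back to the caller.
    let mut owned = unsafe { Box::from_raw(b) };
    owned.disabled.push(p);
    Box::into_raw(owned)
}
```

The "bad call then good call" test from the commit message corresponds to calling `builder_disable(b, 99)` and then `builder_disable(b, 4)` on the returned pointer and checking that only `SignalScope` was recorded.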

The previous name omitted the noun "Socket": reading "abstract unix scope" does not parse, and the kernel constant is `LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET` (where SCOPE is the family, not part of the protection's name). The other v6 scope already uses the `Signal` + `Scope` pattern; mirror it.

Before this commit the same protection was spelled four different ways across the bindings:

| Layer | Old name |
|--------|--------------------------------|
| Rust | `Protection::AbstractUnixScope` |
| C ABI | `AbstractUnixScopeSocket` |
| Python | `ABSTRACT_UNIX_SOCKET_SCOPE` |
| CLI | `abstract-unix-scope-socket` |

After this commit all four agree on the canonical `AbstractUnixSocketScope` / `abstract-unix-socket-scope` form, which also matches the existing `Protection::SignalScope` / `signal-scope` pattern.

Updates touch:

- `Protection` enum (and every match arm in core / FFI / tests).
- C ABI: the discriminant value at index 5 is unchanged (`PROT_ABSTRACT_UNIX_SOCKET_SCOPE` / `SANDLOCK_PROTECTION_ABSTRACT_UNIX_SOCKET_SCOPE` in the header already match this spelling).
- CLI parser: the primary string is now `abstract-unix-socket-scope`; the previous `abstract-unix-scope-socket` is kept as an alias so any out-in-the-wild script still parses. Help text and error message updated to the canonical name.
- Python re-export: `Protection.ABSTRACT_UNIX_SOCKET_SCOPE` was already canonical; the IntEnum is unchanged.

No behaviour change. 287 lib + 18 integration + 10 FFI + 12 Python tests still pass.

…ondition

The 18 integration tests in `test_protection.rs` exercise policy-state storage and `resolve()` resolution mechanics, which is necessary, but they do not verify the *observable* Landlock attrs that exit `confine_inner`. A regression that mis-computes the `handled_access_fs` or `scoped` masks would have left every existing test green while silently degrading the security boundary at the syscall layer.

Add 14 unit tests for the three mask helpers (`compute_scope_mask`, `compute_fs_mask`, `compute_net_mask`) that check the actual Landlock bits produced for each (Protection, host_abi, ProtectionState) cell that matters. Tests live alongside the helpers in `landlock.rs` so they can call the `pub(crate)` `compute_scope_mask` without widening the public surface.

Coverage:

- scope_mask: strict-v6 sets both scope bits; disable(SignalScope) clears only SIGNAL; disable(AbstractUnixSocketScope) clears only ABSTRACT_UNIX_SOCKET; disable both → mask=0; Degradable scopes on a v5 host → mask=0.
- fs_mask: strict-v6 includes REFER+TRUNCATE+IOCTL_DEV; each `Disabled` clears exactly one bit; Degraded FsIoctlDev on a v4 host omits the IOCTL_DEV bit (pins the bf9490d fix).
- net_mask: handle_net=false → (0, false); strict no-wildcard → (BIND|CONNECT, false); Disabled NetTcp → (0, false); Degradable NetTcp on a v3 host → (0, false).

Also document the `compute_scope_mask` precondition explicitly: callers must filter `Resolved::StrictlyUnavailable` upstream (`confine_inner` does, via the `Protection::all()` walk). A `debug_assert!` per scope protection pins the invariant in test builds, so a future caller that forgets the upstream guard fails loudly instead of silently producing a mask=0.
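The precondition can be expressed directly in the helper. A sketch written against the post-consolidation `ProtectionStatus` (so `Unavailable` here corresponds to `Resolved::StrictlyUnavailable`); the two-argument signature is a simplification, and the bit values are the Landlock `LANDLOCK_SCOPE_*` constants.

```rust
/// Landlock scope bits (linux/landlock.h).
pub const SCOPE_ABSTRACT_UNIX_SOCKET: u64 = 1 << 0;
pub const SCOPE_SIGNAL: u64 = 1 << 1;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProtectionStatus {
    Active,
    Degraded,
    Disabled,
    Unavailable,
}

/// Precondition: callers have already rejected any Unavailable protection
/// (confine_inner does so in its Protection::all() walk).
pub fn compute_scope_mask(signal: ProtectionStatus, abstract_unix: ProtectionStatus) -> u64 {
    // Debug builds fail loudly if the upstream guard was skipped.
    debug_assert_ne!(signal, ProtectionStatus::Unavailable, "SignalScope must be filtered upstream");
    debug_assert_ne!(abstract_unix, ProtectionStatus::Unavailable, "AbstractUnixSocketScope must be filtered upstream");
    let mut mask = 0;
    if signal == ProtectionStatus::Active {
        mask |= SCOPE_SIGNAL;
    }
    if abstract_unix == ProtectionStatus::Active {
        mask |= SCOPE_ABSTRACT_UNIX_SOCKET;
    }
    mask
}
```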

`ubuntu-latest` and `ubuntu-24.04-arm` both run kernel 6.8 (Landlock ABI v4). That leaves the v3 path (FsTruncate as the highest available protection, NetTcp / FsIoctlDev / both v6 scopes unavailable) exercised only by synthetic-ABI unit tests, never by a real landlock_create_ruleset on a v3 kernel. Add `ubuntu-22.04` (kernel 5.15, ABI v3 vanilla) so the v3 path stays covered on every push and PR even as the runner images roll forward. A future regression that mishandles "v3 host: bits above v3 must not be requested" would now fail a real-kernel integration test, not just a unit test against a synthetic ABI value.

Also add a `Report Landlock ABI` step that runs `sandlock check` and prints the host's ABI line in the job log. This makes it possible to diagnose a Landlock-version-sensitive regression by glancing at the CI log without re-running the job locally.

CI matrix coverage after this commit:

- ubuntu-22.04 → kernel 5.15 → ABI v3 (new)
- ubuntu-latest → kernel 6.8 → ABI v4
- ubuntu-24.04-arm → kernel 6.8 → ABI v4 (arm64)

ABIs v5 and v6 are not yet reachable on GitHub's hosted runners (stock ubuntu-latest is below 6.7 / 6.12); the per-protection availability matrix for v5 and v6 is still covered by the synthetic-ABI unit tests in `landlock::mask_contract_tests` and the out-of-band VM matrix protocol.

Header is now cbindgen-generated (upstream switched in multikernel#87). The manual header edits from the original Protection commits are dropped; the C ABI for the Protection setters is regenerated from the #[no_mangle] Rust definitions instead. CI verifies the committed header matches a fresh generation.

Drop the landlock::Resolved enum and the free landlock::resolve() function; the protection module's ProtectionStatus enum and its resolve() associated function now drive both the internal mask-compute path and the public runtime accessor. One enum, one resolver. The StrictlyUnavailable variant maps onto ProtectionStatus::Unavailable. ProtectionStatus::resolve is promoted to #[doc(hidden)] pub so the synthetic-ABI integration tests can drive it directly, mirroring the access the removed landlock::resolve previously offered.

The Sandbox.protection_policy field was #[serde(skip)], so a checkpoint silently dropped the policy and a restored sandbox reset to strict_all(). On a host where a disable() opt-out was required (e.g. a v5 kernel that cannot provide a v6 scope) this broke restore: confine then failed with ProtectionUnavailable. Derive Serialize/Deserialize on Protection, ProtectionState and ProtectionPolicy and remove the serde(skip) so the policy is stored in the checkpoint. Update the now-stale field doc (the builder API shipped in this PR and the field is serialized). Add a destructive round-trip test asserting a disabled protection survives bincode ser/de instead of resetting to Strict.

The --allow-degraded/--disable help text listed the old abstract-unix-scope-socket spelling; correct it to the canonical abstract-unix-socket-scope (matching LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET). parse_protection accepted abstract-unix-scope-socket as a backward-compat alias. It never shipped in a release, so there is nothing to stay compatible with — remove it and accept only the canonical name.

…ME.md The Protection opt-out documentation belongs with the Sandbox reference, not the extension-handlers guide. Move the Rust section from extension-handlers.md to sandbox-reference.md (adding the per-protection ABI floor table and the checkpoint-persistence note), and the Python section from python-handlers.md to python/README.md. Repoint the top-level README and the Python cross-reference at the new home.

The branch added ubuntu-22.04 to the test matrix, a workflow_dispatch trigger, and a Landlock-ABI report step. That CI-matrix work is out of scope for this PR; restore ci.yml to upstream's version and send the matrix change as a separate follow-up.

dzerik · 2026-06-02T11:01:11Z

Pushed an update addressing the review. Also rebased onto current main — the branch had drifted ~55 commits behind (overlayfs/BranchFS removal, the syscalls-crate adoption, and the cbindgen header generation all landed since the original PR).

- #3 (serde skip): option A. protection_policy now rides in the checkpoint; Serialize/Deserialize derived on the Protection types; destructive round-trip test added. Pre-1.0 policy.dat format break noted.
- #4 (dual resolver): option a. Dropped landlock::Resolved; ProtectionStatus is the single resolver.
- #5 / #8 (CLI naming): help text and parser now use the canonical abstract-unix-socket-scope; the never-shipped alias is gone.
- #6 (stale comment): rewritten.
- #1 / #2 (docs): Protection reference moved to docs/sandbox-reference.md and python/README.md.
- #7 (CI): reverted; the ubuntu-22.04 matrix will come as a separate CI PR.
- cbindgen: manual header edits removed; sandlock.h is regenerated from the #[no_mangle] defs and the committed header matches a fresh generation.

Verified: 334 core lib + 19 integration (incl. the new checkpoint test) + 10 FFI + 12 Python, all green. Smoke-tested end-to-end on five real kernels spanning ABI v1/v2/v5/v6 (Ubuntu 22.04, Debian 12, Fedora 41, Rocky 9.6, Rocky 9.7) — --disable produces a working sandbox on each, and the checkpoint round-trip test passes on all of them.

congwang-mk · 2026-06-02T22:19:26Z

Blocking: `disable()` of an FS protection fails with EINVAL when the sandbox has writable paths

Following up on the protection opt-out. disable(Protection::FsRefer) (and FsTruncate, FsIoctlDev) does not just have surprising semantics, it causes confine to hard-fail whenever the sandbox has any writable path, which is nearly every real sandbox. I verified this against the kernel docs, the code path, and a runtime repro on a v7 host.

Root cause

compute_fs_mask removes the disabled bit from handled_access_fs, but the per-path access masks are policy-independent and still request that bit:

landlock.rs:423: let fs_write_mask = write_access(abi); includes REFER (v2+), TRUNCATE (v3+), and IOCTL_DEV (v5+).
landlock.rs:433: add_path_rule(.., fs_write_mask) passes that mask to the kernel for every writable directory.

landlock_add_rule(2) returns EINVAL when "rule_attr->allowed_access is not a subset of the ruleset handled accesses." So with disable(FsRefer) the ruleset no longer handles REFER, yet the writable-path rule still grants REFER, the subset check fails, and confinement aborts.

This mirrors the net path, which the PR handled correctly by gating rule installation on net_tcp_active. The FS path rules did not get the matching treatment, so only NetTcp and the two scope protections behave; the three FS protections do not.

Runtime repro (forked child per case, kernel 6.18, Landlock ABI v7)

[baseline_strict_write_tmp]          restrict_self: Operation not permitted   (reached restrict_self; /tmp rule installed fine)
[disable_fsrefer_write_tmp]          add path rule for "/tmp": Invalid argument (os error 22)   EINVAL
[disable_fstruncate_write_tmp]       add path rule for "/tmp": Invalid argument (os error 22)   EINVAL
[disable_fsioctldev_write_tmp]       add path rule for "/tmp": Invalid argument (os error 22)   EINVAL
[disable_fsrefer_no_writable_paths]  restrict_self: Operation not permitted   (no EINVAL)

(The baseline EPERM at restrict_self is just the bare-fork harness not setting PR_SET_NO_NEW_PRIVS; the real sandbox-launch path does. The point is that the baseline gets past add_path_rule, while the three disable cases die at it with EINVAL, and the no-writable-path case does not.)

Why the existing tests miss it

The mask_contract_tests exercise compute_fs_mask in isolation and never install a path rule, so the handled-vs-rule inconsistency is invisible to them.
The VM matrix only --disabled the two scope protections (plus fs-ioctl-dev on the Rocky 9.7 backport box) and ran /usr/bin/true with no --fs-write, so the writable-path code path was never exercised with a disabled FS bit.

Suggested fix

Intersect the per-path masks with the resolved handled mask, so a rule is a subset by construction:

let fs_write_mask = write_access(abi) & handled_access_fs;
// and where ACCESS_FILE is applied to non-directory paths in add_path_rule:
//   allowed_access = (access & ACCESS_FILE) & handled_access_fs

A regression test should call confine_filesystem (forked) with disable(FsRefer) plus fs_write("/tmp") and assert success, since no current test installs a path rule under a non-default policy.
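The suggested intersection can be sketched as follows. `write_access` below is a hypothetical stand-in (all v1 rights plus the ABI-gated REFER, TRUNCATE and IOCTL_DEV), and `ACCESS_FILE` is assumed to be the kernel's file-only set (EXECUTE, WRITE_FILE, READ_FILE, TRUNCATE, IOCTL_DEV). Masking with `handled` last makes every rule a subset of the ruleset's handled accesses by construction, which is exactly the condition `landlock_add_rule(2)` checks.

```rust
/// Landlock filesystem access bits (linux/landlock.h).
pub const EXECUTE: u64 = 1 << 0;
pub const WRITE_FILE: u64 = 1 << 1;
pub const READ_FILE: u64 = 1 << 2;
pub const REFER: u64 = 1 << 13;
pub const TRUNCATE: u64 = 1 << 14;
pub const IOCTL_DEV: u64 = 1 << 15;
/// Rights that apply to non-directory paths (assumed to match the kernel's file-only set).
pub const ACCESS_FILE: u64 = EXECUTE | WRITE_FILE | READ_FILE | TRUNCATE | IOCTL_DEV;

/// Hypothetical stand-in for sandlock's write_access(abi).
pub fn write_access(abi: u32) -> u64 {
    let mut m = (1u64 << 13) - 1; // all v1 rights (illustrative)
    if abi >= 2 {
        m |= REFER;
    }
    if abi >= 3 {
        m |= TRUNCATE;
    }
    if abi >= 5 {
        m |= IOCTL_DEV;
    }
    m
}

/// Mask actually passed to landlock_add_rule: always a subset of `handled`.
pub fn rule_mask(requested: u64, handled: u64, is_dir: bool) -> u64 {
    let access = if is_dir { requested } else { requested & ACCESS_FILE };
    access & handled
}
```

With `disable(FsRefer)`, `handled` lacks REFER, so the `/tmp` rule no longer requests it and the subset check passes.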

Separate semantic note on FsRefer

Even after the mask fix, disable(FsRefer) cannot do what the disable() rustdoc promises ("workloads that legitimately need the capability the protection blocks"). Per landlock(7), REFER is "the only access right which is denied by default by any ruleset, even if the right is not specified as handled at ruleset creation time," and "the only way to make a ruleset grant this right is to explicitly allow it for a specific directory by adding a matching rule." So in sandlock's model, REFER being Active (handled, and granted on writable paths via write_access) is what permits controlled cross-directory rename within writable areas; removing it can only make the sandbox stricter, never looser. disable(FsRefer) is therefore close to meaningless and is a footgun. Worth either a doc caveat or rejecting disable(FsRefer) outright. The other five protections map onto the disable/degrade model correctly.

congwang-mk · 2026-06-02T22:21:26Z

```diff
-    if abi < MIN_ABI {
-        return Err(SandlockError::Runtime(
-            crate::error::SandboxRuntimeError::Confinement(
-                ConfinementError::InsufficientAbi {
```

Can InsufficientAbi be removed along with it?

Yes — removed. InsufficientAbi had zero construction or match sites after the per-protection rewrite in this PR; ProtectionUnavailable (carrying protection + required_abi + host_abi) fully replaced the old global "ABI too low" path. Build + 334 lib tests green without it.

The per-protection availability resolution (ProtectionUnavailable, carrying protection + required_abi + host_abi) replaced the old global "ABI too low" check during the Protection work. InsufficientAbi has had zero construction or match sites since then — it was orphaned by the confine_inner per-protection rewrite. Remove it.
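The replacement for the global `abi < MIN_ABI` check can be sketched as a per-protection resolution step. The type names (`Protection`, `ProtectionState`, `Resolved`, and `ProtectionUnavailable { protection, required_abi, host_abi }`) follow the PR description. The variant list, the `resolve` and `check` bodies, and the precedence of Disabled over availability are illustrative assumptions. The per-variant minimum ABIs are Landlock's: REFER v2, TRUNCATE v3, TCP v4, IOCTL_DEV v5, and the two scopes v6.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
enum Protection { FsRefer, FsTruncate, NetTcp, FsIoctlDev, SignalScope, AbstractUnixSocketScope }

impl Protection {
    /// Landlock ABI that introduced the underlying access right or scope.
    fn min_abi(self) -> u32 {
        match self {
            Protection::FsRefer => 2,
            Protection::FsTruncate => 3,
            Protection::NetTcp => 4,
            Protection::FsIoctlDev => 5,
            Protection::SignalScope | Protection::AbstractUnixSocketScope => 6,
        }
    }
}

#[derive(Clone, Copy, Debug, PartialEq)]
enum ProtectionState { Strict, Degradable, Disabled }

#[derive(Clone, Copy, Debug, PartialEq)]
enum Resolved { Active, Degraded, Disabled, StrictlyUnavailable }

#[derive(Debug, PartialEq)]
enum ConfinementError {
    ProtectionUnavailable { protection: Protection, required_abi: u32, host_abi: u32 },
}

/// Resolve one protection against the host ABI (assumed: Disabled wins).
fn resolve(p: Protection, state: ProtectionState, host_abi: u32) -> Resolved {
    let available = host_abi >= p.min_abi();
    match (state, available) {
        (ProtectionState::Disabled, _) => Resolved::Disabled,
        (_, true) => Resolved::Active,
        (ProtectionState::Degradable, false) => Resolved::Degraded,
        (ProtectionState::Strict, false) => Resolved::StrictlyUnavailable,
    }
}

/// Fail on the first Strict protection the host cannot enforce.
fn check(policy: &[(Protection, ProtectionState)], host_abi: u32) -> Result<(), ConfinementError> {
    for &(p, state) in policy {
        if resolve(p, state, host_abi) == Resolved::StrictlyUnavailable {
            return Err(ConfinementError::ProtectionUnavailable {
                protection: p,
                required_abi: p.min_abi(),
                host_abi,
            });
        }
    }
    Ok(())
}
```

Because the error names the protection and both ABIs, a caller on a v5 host learns exactly which protection to disable. A separate ABI-wide error variant has nothing left to carry.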

congwang-mk reviewed May 30, 2026


dzerik added 23 commits June 2, 2026 13:19

core: introduce Protection enum with per-variant ABI floor

c4d071d

core: add ProtectionState and ProtectionPolicy with Strict default

047c14f

core: add Sandbox::protection_policy field defaulting to strict-all

f0b8b9e

core: add ConfinementError::ProtectionUnavailable variant

1c6067b

core: per-protection availability resolution in confine_inner

6d6c38d

core: mask Degraded fs protections; consolidate net_wildcard computation

014746e

cli: extend 'sandlock check' with per-protection availability report

85bbc5d

core: add Sandbox::active_protections() runtime accessor

ce11c8f

core: SandboxBuilder::allow_degraded and ::disable polarity-out methods

93a398a

ffi: C ABI for Protection + allow_degraded / disable builders

75b3693

python: Sandbox allow_degraded / disable kwargs + Protection IntEnum

b9a69a4

cli: --allow-degraded and --disable flags for sandlock run

5540dd2

docs: document Protection opt-out (allow_degraded / disable)

40ff410

dzerik force-pushed the follow-up-c-protection-foundation branch from 22a7dc9 to ebc6c79 Compare June 2, 2026 10:46

congwang-mk reviewed Jun 2, 2026


dzerik commented May 27, 2026


congwang-mk left a comment


dzerik commented Jun 2, 2026


congwang-mk commented Jun 2, 2026

Blocking: disable() of an FS protection fails with EINVAL when the sandbox has writable paths


