Protection opt-out: --allow-degraded / --disable per-protection #71

Open
dzerik wants to merge 24 commits into multikernel:main from dzerik:follow-up-c-protection-foundation

Contributor

@dzerik dzerik commented May 27, 2026

Fixes #17.

Implements the opt-out polarity we agreed on in the design ack comment on #17: default behaviour is Strict for every protection, and two new builder methods (allow_degraded(Protection) and disable(Protection)) opt out per-protection. The result is that callers on a v5 kernel (RHEL 9, Ubuntu 22.04, etc.) can write a single line — .disable(Protection::SignalScope).disable(Protection::AbstractUnixSocketScope) — and get the v5-level FS + REFER + truncate + TCP + ioctl-dev sandbox without the two v6 IPC scopes, exactly as you described it in your first comment on this issue.

The hard MIN_ABI = 6 floor in landlock.rs is gone; with the default ProtectionPolicy::strict_all() on a v6 host every protection still resolves to Active, so the pre-refactor floor is preserved exactly. The constant itself stays for downstream backwards-compat (it now expresses "minimum ABI when every protection is in Strict").

Layers

Same RFC-chain shape as #43 / #46 / #54. Commit prefixes mark the boundary:

core (8 commits, 15b09ce..30ad30c): Protection enum and its per-variant min_abi(); ProtectionState (Strict / Degradable / Disabled) and ProtectionPolicy; ProtectionStatus runtime view; Resolved 4-way (Active / Degraded / Disabled / StrictlyUnavailable) at the syscall boundary; Sandbox::protection_policy field defaulting to strict_all(); confine_inner walks Protection::all() and returns ConfinementError::ProtectionUnavailable { protection, required_abi, host_abi } for any strict + unavailable combination; compute_fs_mask / compute_net_mask / compute_scope_mask derive Landlock attrs from the resolution; Sandbox::active_protections() exposes the runtime view; sandlock check learns a per-protection availability table.

ffi (1 commit, 265b3c1): C ABI for Protection, two builder setters with move-semantics, and sandlock_protection_min_abi() introspection. The C header declares the discriminants and the new functions.

python (1 commit, 53af1d1): Protection IntEnum re-exported at the package top level; allow_degraded and disable kwargs on the Sandbox dataclass (last-write-wins to mirror ProtectionPolicy::set); ctypes bindings call through to the C ABI.

cli (2 commits, b443597..2be594b): sandlock check extended with the per-protection availability table; sandlock run learns --allow-degraded <name> and --disable <name> (repeatable; case-insensitive kebab-case).

docs (1 commit, a43c1d6): a "Protection opt-out" section in both docs/extension-handlers.md (Rust) and docs/python-handlers.md (Python), and a one-line README pointer.

maintainer-lens follow-up (3 commits, ceae31c..0d5e5fa, added after a deep code review pass): FFI input validation so an out-of-range discriminant from C or Python is rejected at the boundary instead of triggering UB at a Rust match over a #[repr(C)] enum; canonical-name rename (the previous Protection::AbstractUnixScope was missing the noun Socket and didn't agree with the Python ABSTRACT_UNIX_SOCKET_SCOPE spelling — the four bindings now all use AbstractUnixSocketScope / abstract-unix-socket-scope, with the old CLI spelling kept as an alias); 14 mask-contract tests asserting the actual Landlock attribute bits produced by each compute_*_mask for each (host ABI, ProtectionState) cell, plus a compute_scope_mask precondition docstring and debug_assert!.

ci (1 commit, 8c1d36f): ubuntu-22.04 added to the Rust matrix so a pre-v6 path (v4 on the hosted image, per the runner table in the CI section) is exercised on a real kernel on every push; a Report Landlock ABI step prints the host's sandlock check output to each job's log for visibility.

Public API surface added

Trying to keep this minimal per your standing #36 priority. Everything new lives under the sandlock_core crate:

  • Protection (enum, 6 variants, each tied to a kernel ABI floor); Protection::min_abi(); Protection::all()
  • ProtectionState (enum); ProtectionPolicy (struct + strict_all()/state()/iter(); set() is #[doc(hidden)] pub so the FFI-tests can drive resolution directly)
  • ProtectionStatus (enum, 4-way runtime view)
  • Sandbox::active_protections() (runtime accessor)
  • SandboxBuilder::allow_degraded(Protection) -> Self; SandboxBuilder::disable(Protection) -> Self
  • Sandbox::protection_policy (public field, mirrors the rest of Sandbox)
  • ConfinementError::ProtectionUnavailable { protection, required_abi, host_abi } (existing enum variant)
  • landlock::compute_fs_mask / compute_net_mask (already pub for downstream tests in this repo; compute_scope_mask deliberately stayed pub(crate))

C ABI: sandlock_protection_t enum (6 named discriminants), sandlock_protection_min_abi(uint32_t) -> uint32_t, sandlock_sandbox_builder_allow_degraded, sandlock_sandbox_builder_disable. Setter functions take uint32_t for the discriminant (not the enum type) so an out-of-range value is rejected at the boundary; min_abi(unknown) returns 0 as a sentinel.
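
The boundary check described above can be sketched roughly as follows. This is an illustration, not the PR's code: only the discriminant 5 for AbstractUnixSocketScope is stated in this PR, so the other discriminant values are assumed, and the signature of try_protection_from_raw and the helper name min_abi_or_sentinel are guesses made for the example.

```rust
/// Illustrative mirror of the six protections. Discriminants 0 through 4
/// are assumed for this sketch; only index 5 is stated in the PR.
#[repr(u32)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protection {
    FsRefer = 0,
    FsTruncate = 1,
    NetTcp = 2,
    FsIoctlDev = 3,
    SignalScope = 4,
    AbstractUnixSocketScope = 5,
}

impl Protection {
    /// Lowest Landlock ABI version that provides this protection.
    pub fn min_abi(self) -> u32 {
        match self {
            Protection::FsRefer => 2,
            Protection::FsTruncate => 3,
            Protection::NetTcp => 4,
            Protection::FsIoctlDev => 5,
            Protection::SignalScope | Protection::AbstractUnixSocketScope => 6,
        }
    }
}

/// Map a raw integer from C or Python to a Protection without ever
/// materialising an invalid enum value.
pub fn try_protection_from_raw(raw: u32) -> Option<Protection> {
    match raw {
        0 => Some(Protection::FsRefer),
        1 => Some(Protection::FsTruncate),
        2 => Some(Protection::NetTcp),
        3 => Some(Protection::FsIoctlDev),
        4 => Some(Protection::SignalScope),
        5 => Some(Protection::AbstractUnixSocketScope),
        _ => None,
    }
}

/// Hypothetical stand-in for the introspection entry point: unknown input
/// maps to 0, which no real floor uses because real floors start at 2.
pub fn min_abi_or_sentinel(raw: u32) -> u32 {
    try_protection_from_raw(raw).map_or(0, Protection::min_abi)
}
```

Taking a plain integer and converting explicitly keeps the match exhaustive over valid variants only, so a bad value from a foreign caller is a recoverable None instead of undefined behaviour.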

Python: Protection IntEnum re-exported; two new kwargs on the Sandbox dataclass.

Three states per protection

| State | Capable host | Incapable host | Use case |
|-------|--------------|----------------|----------|
| Strict (default) | Active | ConfinementError::ProtectionUnavailable at build/run | matches the pre-refactor MIN_ABI=6 behaviour |
| Degradable (allow_degraded) | Active | silently skipped (observable via active_protections() and sandlock check) | "use the protection where the kernel has it, don't fail the build" |
| Disabled (disable) | not enforced even though available | not enforced | "this workload genuinely needs the capability the protection blocks" |

Disabled deliberately works on a capable kernel — per your answer to question 3 in the design thread.
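
The three-state table can be read as a small pure function. A minimal sketch, assuming availability is decided by comparing ABI numbers (the PR itself notes that the reported ABI is not always trustworthy, see F1); the free function resolve and its parameters are illustrative and not the crate's actual signature:

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProtectionState {
    Strict,
    Degradable,
    Disabled,
}

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProtectionStatus {
    Active,
    Degraded,
    Disabled,
    Unavailable,
}

/// Resolve one protection from its configured state, its ABI floor,
/// and the ABI the host reports.
pub fn resolve(state: ProtectionState, required_abi: u32, host_abi: u32) -> ProtectionStatus {
    // Assumption for this sketch: a protection is available exactly when
    // the host ABI reaches the protection's floor.
    let available = host_abi >= required_abi;
    match (state, available) {
        // Disabled wins even on a capable kernel.
        (ProtectionState::Disabled, _) => ProtectionStatus::Disabled,
        (ProtectionState::Strict, true) => ProtectionStatus::Active,
        // Strict on an incapable host is the case that surfaces as an error.
        (ProtectionState::Strict, false) => ProtectionStatus::Unavailable,
        (ProtectionState::Degradable, true) => ProtectionStatus::Active,
        (ProtectionState::Degradable, false) => ProtectionStatus::Degraded,
    }
}
```

For a v6-only protection on a v5 host, Strict yields Unavailable, Degradable yields Degraded, and Disabled yields Disabled regardless of the host.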

Validation

Tests: 301 lib (includes 14 new mask_contract_tests asserting Landlock bits per cell), 18 integration (tests/integration/test_protection.rs covers the policy-state and resolve() mechanics), 10 FFI integration, 12 Python (tests/test_protection.py). The mask-contract tests catch the bug class that the original 18 integration tests miss — i.e. a regression that would mis-compute handled_access_fs or scoped would now fail a test instead of silently degrading the sandbox.

VM matrix (full protocol with reproducer recipe attached out-of-band; this is the relevant table):

| Distro | Kernel | sandlock check ABI | Default strict | Smallest --disable that produces exit=0 |
|--------|--------|--------------------|----------------|------------------------------------------|
| Rocky 9.6 (#17 reporter's env) | 5.14.0-570.17.1.el9_6 | v5 honest | fails with required protection SignalScope is not available: host Landlock ABI is v5, requires v6 | --disable signal-scope --disable abstract-unix-socket-scope |
| Rocky 9.7 | 5.14.0-611.5.1.el9_7 | v6 reported (RHEL backport) | fails inside landlock_create_ruleset with EINVAL; backport reports v6 but does not provide v5/v6 attrs (see finding F1 below) | --disable fs-ioctl-dev --disable signal-scope --disable abstract-unix-socket-scope |
| Ubuntu 22.04 | 5.15.0-179-generic | v1 in this multipass image | fails with required protection FsRefer is not available: host Landlock ABI is v1, requires v2 | full --disable of every v2+ protection |
| Debian 12 bookworm | 6.1.0-48-cloud-amd64 | v2 (build-clipped) | fails on v3+ requirements | full --disable of every v3+ protection |
| Fedora 41 | 6.11.4-301.fc41 | v5 honest, vanilla | fails on the two v6 scopes | --disable signal-scope --disable abstract-unix-socket-scope (also exercised --allow-degraded for the same two; same exit=0) |

Every cell runs a stock git clone of this branch (no local patches; the seccomp fallback fix already merged in #63 is now in main), cargo build --release, cargo test --release --lib -p sandlock-core, the integration tests, and then a sandlock run of /usr/bin/true with the listed --disable flags.

Two findings worth flagging

F1 — RHEL 9.7 reports ABI v6 but the kernel does not provide it. The version returned by landlock_create_ruleset(NULL, 0, LANDLOCK_CREATE_RULESET_VERSION) is 6 on 9.7, but the actual ruleset creation with v5/v6 attrs fails with EINVAL. The opt-out covers it (the user --disables the affected protections and the ruleset assembles cleanly), but it does mean the sandlock check ABI line cannot be trusted as a capability statement on backport distros. Per-protection probing at confine_inner is the reliable signal, which is what this PR already does. Not requesting a change — flagged for context.
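
For context on F1, the version probe is a single syscall. A minimal sketch, assuming Linux and the syscall number 444 for landlock_create_ruleset (the value on x86_64 and aarch64); reported_landlock_abi is a hypothetical name, not a sandlock function:

```rust
use std::os::raw::{c_long, c_void};

// Assumed syscall number for landlock_create_ruleset (x86_64, aarch64).
const SYS_LANDLOCK_CREATE_RULESET: c_long = 444;
// Flag that turns the call into a pure version query.
const LANDLOCK_CREATE_RULESET_VERSION: c_long = 1 << 0;

#[cfg(target_os = "linux")]
extern "C" {
    // libc's variadic syscall(2) wrapper; std already links libc on Linux.
    fn syscall(number: c_long, ...) -> c_long;
}

/// Ask the kernel which Landlock ABI version it reports.
/// Returns None when the call fails (Landlock missing, disabled, or blocked).
#[cfg(target_os = "linux")]
pub fn reported_landlock_abi() -> Option<u32> {
    // attr = NULL and size = 0 are required when querying the version.
    let ret = unsafe {
        syscall(
            SYS_LANDLOCK_CREATE_RULESET,
            std::ptr::null::<c_void>(),
            0 as c_long,
            LANDLOCK_CREATE_RULESET_VERSION,
        )
    };
    if ret >= 0 {
        Some(ret as u32)
    } else {
        None
    }
}

/// Non-Linux fallback so the sketch still compiles elsewhere.
#[cfg(not(target_os = "linux"))]
pub fn reported_landlock_abi() -> Option<u32> {
    None
}
```

As F1 shows, the number returned here only says what the kernel reports; a ruleset that requests v5/v6 attributes can still fail with EINVAL on a backport kernel.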

F2 — Initial pre-rebase implementation of compute_fs_mask only masked off Disabled protections, not Degraded ones. A test against compute_fs_mask(v4, policy_with_degradable_ioctl_dev) would have asserted the IOCTL_DEV bit absent and gotten it present, then landlock_create_ruleset would have failed with EINVAL because v4 doesn't know the bit. Fixed in bf9490d (Disabled | Degraded matched together), pinned by the new fs_mask_degraded_protections_get_masked_off_on_low_abi_host test in 0d5e5fa.
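
The F2 bug class can be illustrated with a reduced model of the mask computation. This is a sketch under assumptions: fs_mask_sketch and keep_if_active are hypothetical names with a simplified signature (the real compute_fs_mask is described as taking the host ABI and the policy), and only the three protection-gated bits are modelled, using their Landlock UAPI values.

```rust
// Landlock UAPI bit values for the three FS rights gated by protections.
pub const ACCESS_FS_REFER: u64 = 1 << 13;
pub const ACCESS_FS_TRUNCATE: u64 = 1 << 14;
pub const ACCESS_FS_IOCTL_DEV: u64 = 1 << 15;

#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum ProtectionStatus {
    Active,
    Degraded,
    Disabled,
    Unavailable,
}

/// Hypothetical helper: keep `bit` in `mask` only if the protection is Active.
fn keep_if_active(mask: u64, bit: u64, status: ProtectionStatus) -> u64 {
    match status {
        ProtectionStatus::Active => mask,
        // The pre-fix code cleared the bit only for Disabled. Degraded must be
        // cleared too, because the host kernel does not know the bit.
        ProtectionStatus::Disabled | ProtectionStatus::Degraded => mask & !bit,
        // Strict + unavailable is expected to be rejected before this point;
        // clearing the bit here is just a conservative choice for the sketch.
        ProtectionStatus::Unavailable => mask & !bit,
    }
}

/// Build a handled-access mask from always-handled `base` bits plus the
/// three protection-gated bits, each filtered by its resolved status.
pub fn fs_mask_sketch(
    base: u64,
    refer: ProtectionStatus,
    truncate: ProtectionStatus,
    ioctl_dev: ProtectionStatus,
) -> u64 {
    let mut mask = base | ACCESS_FS_REFER | ACCESS_FS_TRUNCATE | ACCESS_FS_IOCTL_DEV;
    mask = keep_if_active(mask, ACCESS_FS_REFER, refer);
    mask = keep_if_active(mask, ACCESS_FS_TRUNCATE, truncate);
    mask = keep_if_active(mask, ACCESS_FS_IOCTL_DEV, ioctl_dev);
    mask
}
```

With FsIoctlDev resolved to Degraded, the IOCTL_DEV bit is absent from the result while REFER and TRUNCATE stay set, which is the property the pinned test is described as asserting.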

CI

ubuntu-22.04 added to .github/workflows/ci.yml. With the existing matrix [ubuntu-latest, ubuntu-24.04-arm] both runners now report ABI v6 or higher, so the v3/v4 path was unreached by real-kernel CI. A Report Landlock ABI step prints the host ABI to the job log on every runner for visibility — verified on the first fork-internal dispatch:

| Runner | Image kernel | sandlock check ABI |
|--------|--------------|--------------------|
| ubuntu-22.04 | 6.8.0-azure | v4 |
| ubuntu-24.04-arm | 6.17.0-azure | v6 |
| ubuntu-latest | 6.17.0-azure | v7 |

(Ubuntu LTS labels actually ship Azure-specific kernels, so ubuntu-22.04 is not stock 5.15 — see actions/runner-images/images/ubuntu/Ubuntu2204-Readme.md.)

Known coverage gap — Landlock ABI v5 is unreachable on any GitHub-hosted runner today. The hosted Ubuntu images jump from v4 (22.04) to v6+ (24.04), so the FsIoctlDev-only code path (a kernel that has v5 but not v6 — exactly the production fleet shape on Rocky 9.6 / Fedora 41) cannot be exercised against a real landlock_create_ruleset syscall in this CI. The v5 cells are covered by the synthetic-ABI landlock::mask_contract_tests (which run on every runner) and by the out-of-band VM matrix on Rocky 9.6 (kernel 5.14.0-570.x.el9_6) and Fedora 41 (kernel 6.11.4). If you want real-kernel v5 coverage in CI we'd need a self-hosted runner pointed at a v5 box; happy to advise on configuration, but the infrastructure decision is yours.

The integration tests are split per cell: on ubuntu-22.04 only test_protection runs (the policy/resolution-mechanics subset that uses a synthetic ABI and is host-ABI independent). The remaining integration suite runs on ≥v6 runners — those tests fundamentally assume a v6+ host because they construct default Sandbox::builder() whose ProtectionPolicy::strict_all() requires every Protection to resolve to Active. Refactoring them to adapt to whatever the host can provide is a separate task; the v3/v4 path is exercised here through the new landlock::mask_contract_tests (which run on every cell) plus the out-of-band VM matrix above.

workflow_dispatch is added to the triggers so future manual reruns don't need a push commit.

Scope discipline — what is NOT in this PR

  • Overlayfs / branchfs COW backends — untouched; this PR is entirely Landlock-side.
  • Any change to syscalls outside the Landlock attribute computation.

Reference

Contributor

@congwang-mk congwang-mk left a comment

Thanks for the PR! Nice work.

Comment thread docs/extension-handlers.md Outdated
See `crates/sandlock-ffi/tests/c/handler_smoke.c` for the canonical
end-to-end example.

## Protection opt-out
Contributor

This looks unrelated to docs/extension-handlers.md, better to move it to docs/sandbox-reference.md?

Contributor Author

Moved to docs/sandbox-reference.md. Also repointed the README anchor that referenced the old location.

Comment thread docs/python-handlers.md Outdated
was actually dispatched — the supervisor handles cleanup on all
paths.

## Protection opt-out
Contributor

Ditto, maybe move to python/README.md ?

Contributor Author

Moved to python/README.md.

Comment thread crates/sandlock-core/src/sandbox.rs Outdated
/// state arrives with the public builder API in a later change.
/// Deserialized sandboxes get `ProtectionPolicy::default()`, which
/// is identical to `strict_all()`.
#[serde(skip)]
Contributor

Are you sure this is safe? I am not sure, worth a double check.

Contributor Author

Traced this — you're right to flag it. The #[serde(skip)] interacts with checkpoint, not just the profile path.

Sandbox derives Serialize/Deserialize, and that derive is used by Checkpoint::save/load via bincode (checkpoint.rs:455 serialize, :547 deserialize) — distinct from the TOML profile path, which goes through a separate ProfileInput struct that never sees protection_policy. So policy.dat does not carry protection_policy, and Checkpoint::load(...).policy.protection_policy is always strict_all() regardless of the original sandbox.

Severity today is limited: Checkpoint::load reconstructs the struct but nothing re-confines from .policy automatically (there's no restore-and-reconfine path, and no CLI restore/resume). So today it's silent metadata loss, not a live break. But it's latent: once a restore-and-reconfine path lands — the natural endpoint of checkpoint/restore — it will silently apply strict_all() instead of the original opt-out, breaking restore on exactly the v5 hosts this feature exists to serve.

Root cause is the now-stale doc comment you flagged separately: the skip was justified by "no builder API yet to make policy non-default." This PR adds that builder API, so the justification is gone.

There's a design fork on the fix, and it's your call on the checkpoint contract:

  • A — store in checkpoint: drop the skip, derive Serialize/Deserialize on Protection/ProtectionState/ProtectionPolicy (plain enums + a HashMap, bincode-clean), so the checkpoint is self-contained. Caveat: this shifts the bincode field layout, so existing policy.dat files become unreadable — acceptable under the pre-1.0 no-backcompat stance, but worth naming.
  • B — re-supply on restore: keep protection_policy out of the checkpoint entirely; a future restore takes the policy as an argument and validates it against the current host ABI. Cleaner separation (checkpoint = process state, policy = orthogonal, validated fresh), but it changes the eventual restore signature.

I lean A as the least-invasive fix that stops the data loss now, but B may fit your checkpoint model better. Which do you prefer?

Contributor

It seems A is cleaner and more portable?

Contributor Author

Done (option A). Removed the skip; derived Serialize/Deserialize on Protection/ProtectionState/ProtectionPolicy so the policy rides in the checkpoint. Added a destructive test (protection_policy_survives_bincode_round_trip) that fails if the skip is restored. As noted: this shifts the bincode field layout, so pre-existing policy.dat files won't load — the pre-1.0 break we discussed.

Comment thread crates/sandlock-core/src/landlock.rs Outdated
(ProtectionState::Degradable, true) => Resolved::Active,
(ProtectionState::Degradable, false) => Resolved::Degraded,
}
}
Contributor

There are two resolver types modeling the same thing:

  • landlock::Resolved { Active, Degraded, Disabled, StrictlyUnavailable } via landlock::resolve()
  • protection::ProtectionStatus { Active, Degraded, Disabled, Unavailable } via ProtectionStatus::resolve()

The two resolve functions are identical match arms:

match (policy.state(p), available) {
    (Disabled, _)       => …Disabled,
    (Strict, true)      => …Active,
    (Strict, false)     => …{Strictly}Unavailable,
    (Degradable, true)  => …Active,
    (Degradable, false) => …Degraded,
}

Contributor Author

Agreed — the two are the same five-way mapping with one variant renamed (StrictlyUnavailable vs Unavailable). The split was unintentional: ProtectionStatus is the public-facing view (returned by active_protections()), Resolved grew as an internal helper for the mask computation, and they drifted into duplicate resolve() bodies.

Consolidation options:

  • (a) Drop Resolved entirely; use ProtectionStatus everywhere, including the internal mask path. One enum, one resolve(). Smallest surface — nothing internal-only remains.
  • (b) Keep both names but have one resolve() and a From conversion, if you want the internal/public type distinction preserved.

I lean (a) — there's no behavioural difference between the two, so a single public ProtectionStatus + single resolve() is the least surface to maintain. Any reason to keep an internal-only Resolved that I'm missing?

Contributor

Yes, (a) is better

Contributor Author

Done (option a). Dropped landlock::Resolved entirely; ProtectionStatus is the single resolver, used by the mask-compute path too. StrictlyUnavailable folded into Unavailable.

Comment thread crates/sandlock-cli/src/main.rs Outdated

/// Allow the named protection to degrade silently if the host kernel ABI lacks support.
/// Repeatable. Accepted values: fs-refer, fs-truncate, net-tcp, fs-ioctl-dev,
/// signal-scope, abstract-unix-scope-socket.
Contributor

Nit: abstract-unix-scope-socket -> abstract-unix-socket-scope.

Contributor Author

Fixed — help text now reads abstract-unix-socket-scope.

Comment thread crates/sandlock-core/src/sandbox.rs Outdated
/// Per-protection enforcement policy. Default
/// (`ProtectionPolicy::strict_all()`) preserves the historical hard
/// `MIN_ABI = 6` behaviour. Builder methods to deviate from
/// strict-all are added in a follow-up.
Contributor

Stale?

Contributor Author

Fixed — rewrote the comment. The "added in a follow-up" note was stale (the builder API is in this PR), and the field is now serialized (see the checkpoint thread).

Comment thread .github/workflows/ci.yml Outdated
branches: [main]
pull_request:
branches: [main]
workflow_dispatch:
Contributor

Is this related?

Contributor Author

You're right — out of scope here. Reverted ci.yml to match main; I'll send the ubuntu-22.04 matrix addition as a separate CI-focused PR.

Comment thread crates/sandlock-cli/src/main.rs Outdated
"net-tcp" => Ok(Protection::NetTcp),
"fs-ioctl-dev" => Ok(Protection::FsIoctlDev),
"signal-scope" => Ok(Protection::SignalScope),
"abstract-unix-socket-scope" | "abstract-unix-scope-socket" => {
Contributor

Any reason to keep this abstract-unix-scope-socket ?

Contributor Author

Removed — no reason to keep it. The old spelling never shipped in a release, so nothing depends on it. parse_protection now accepts only abstract-unix-socket-scope.
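
For reference, a parser with the behaviour described in this thread (canonical names only, case-insensitive) might look roughly like the sketch below. The match arms follow the names quoted in the review snippet and help text; the String error type and its message are assumptions made for the example.

```rust
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
pub enum Protection {
    FsRefer,
    FsTruncate,
    NetTcp,
    FsIoctlDev,
    SignalScope,
    AbstractUnixSocketScope,
}

/// Parse a kebab-case protection name, ignoring ASCII case.
/// Only canonical spellings are accepted; there is no legacy alias.
pub fn parse_protection(name: &str) -> Result<Protection, String> {
    match name.to_ascii_lowercase().as_str() {
        "fs-refer" => Ok(Protection::FsRefer),
        "fs-truncate" => Ok(Protection::FsTruncate),
        "net-tcp" => Ok(Protection::NetTcp),
        "fs-ioctl-dev" => Ok(Protection::FsIoctlDev),
        "signal-scope" => Ok(Protection::SignalScope),
        "abstract-unix-socket-scope" => Ok(Protection::AbstractUnixSocketScope),
        // Hypothetical error text; the real CLI message may differ.
        other => Err(format!("unknown protection: {other}")),
    }
}
```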

dzerik added 23 commits June 2, 2026 13:19
The Protection setters took `sandlock_protection_t` and matched on it
exhaustively, so a C or Python caller passing an integer outside the
known discriminant range (0..=5) produced undefined behaviour at the
Rust match — `#[repr(C)]` enums are UB to construct from arbitrary
bits.

Change the three entry-points (`sandlock_protection_min_abi`,
`sandlock_sandbox_builder_allow_degraded`,
`sandlock_sandbox_builder_disable`) to accept `u32` and route every
incoming value through `try_protection_from_raw`. Unknown values
are now handled at the boundary:

- `min_abi(unknown)` returns 0 — a sentinel that cannot collide with
  any real `min_abi()` (those start at 2).
- The builder setters return the input pointer untouched, mirroring
  the null-builder convention already used elsewhere in the C ABI.

The Python wrapper adds a stricter guard: an out-of-range int raises
`ValueError` at SDK boundary rather than silently no-op'ing through
the FFI, because the Python contract should fail loudly on a typed
mistake.

Update the C header to declare the new signatures (`uint32_t`
instead of the enum type) and document the sentinel and no-op
behaviour. The `sandlock_protection_t` enum is kept as a labelling
type for callers who want the names; passing an enum constant still
works because C implicitly promotes to `uint32_t`.

Tests:
- 3 new FFI regression tests cover the boundary: min_abi sentinel,
  setter no-op, and "bad call then good call" to catch builder
  corruption in the bad path.
- 4 new Python tests cover ValueError on out-of-range, negative,
  and well-formed plain-int inputs.
The previous name omitted the noun "Socket" — reading "abstract unix
scope" does not parse, and the kernel constant is
`LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET` (where SCOPE is the family, not
part of the protection's name). The other v6 scope already uses the
`Signal` + `Scope` pattern; mirror it.

Before this commit the same protection was spelled four different
ways across the bindings:

| Layer  | Old name                       |
|--------|--------------------------------|
| Rust   | `Protection::AbstractUnixScope` |
| C ABI  | `AbstractUnixScopeSocket`       |
| Python | `ABSTRACT_UNIX_SOCKET_SCOPE`    |
| CLI    | `abstract-unix-scope-socket`    |

After this commit all four agree on the canonical
`AbstractUnixSocketScope` / `abstract-unix-socket-scope` form, which
also matches the existing `Protection::SignalScope` /
`signal-scope` pattern.

Updates touch:
- `Protection` enum (and every match arm in core / FFI / tests).
- C ABI: the discriminant value at index 5 is unchanged
  (`PROT_ABSTRACT_UNIX_SOCKET_SCOPE` / `SANDLOCK_PROTECTION_ABSTRACT_UNIX_SOCKET_SCOPE`
  in the header already match this spelling).
- CLI parser: the primary string is now `abstract-unix-socket-scope`;
  the previous `abstract-unix-scope-socket` is kept as an alias so
  any out-in-the-wild script still parses. Help text and error
  message updated to the canonical name.
- Python re-export: `Protection.ABSTRACT_UNIX_SOCKET_SCOPE` was
  already canonical; the IntEnum is unchanged.

No behaviour change. 287 lib + 18 integration + 10 FFI + 12 Python
tests still pass.
…ondition

The 18 integration tests in `test_protection.rs` exercise policy-state
storage and `resolve()` resolution mechanics — necessary, but they do
not verify the *observable* Landlock attrs that exit `confine_inner`.
A regression that mis-computes the `handled_access_fs` or `scoped`
masks would have left every existing test green while silently
degrading the security boundary at the syscall layer.

Add 14 unit tests for the three mask helpers (`compute_scope_mask`,
`compute_fs_mask`, `compute_net_mask`) that check the actual
Landlock bits produced for each (Protection, host_abi,
ProtectionState) cell that matters. Tests live alongside the
helpers in `landlock.rs` so they can call the `pub(crate)`
`compute_scope_mask` without widening the public surface.

Coverage:

- scope_mask: strict-v6 sets both scope bits; disable(SignalScope)
  clears only SIGNAL; disable(AbstractUnixSocketScope) clears only
  ABSTRACT_UNIX_SOCKET; disable both → mask=0; Degradable scopes on
  a v5 host → mask=0.
- fs_mask: strict-v6 includes REFER+TRUNCATE+IOCTL_DEV; each
  `Disabled` clears exactly one bit; Degraded FsIoctlDev on a v4
  host omits the IOCTL_DEV bit (pins the bf9490d fix).
- net_mask: handle_net=false → (0, false); strict no-wildcard →
  (BIND|CONNECT, false); Disabled NetTcp → (0, false); Degradable
  NetTcp on a v3 host → (0, false).

Also document the `compute_scope_mask` precondition explicitly:
callers must filter `Resolved::StrictlyUnavailable` upstream
(`confine_inner` does, via the `Protection::all()` walk). A
`debug_assert!` per scope protection pins the invariant in test
builds, so a future caller that forgets the upstream guard fails
loudly instead of silently producing a mask=0.
`ubuntu-latest` and `ubuntu-24.04-arm` both run kernel 6.8 — Landlock
ABI v4. That leaves the v3 path (FsTruncate as the highest available
protection, NetTcp / FsIoctlDev / both v6 scopes unavailable)
exercised only by synthetic-ABI unit tests, never by a real
landlock_create_ruleset on a v3 kernel.

Add `ubuntu-22.04` (kernel 5.15, ABI v3 vanilla) so the v3 path stays
covered on every push and PR even as the runner images roll forward.
A future regression that mishandles "v3 host: bits above v3 must not
be requested" would now fail a real-kernel integration test, not
just a unit test against a synthetic ABI value.

Also add a `Report Landlock ABI` step that runs `sandlock check` and
prints the host's ABI line in the job log. This makes it possible to
diagnose a Landlock-version-sensitive regression by glancing at the
CI log without re-running the job locally.

CI matrix coverage after this commit:
- ubuntu-22.04        → kernel 5.15 → ABI v3 (new)
- ubuntu-latest       → kernel 6.8  → ABI v4
- ubuntu-24.04-arm    → kernel 6.8  → ABI v4 (arm64)

ABIs v5 and v6 are not yet reachable on GitHub's hosted runners
(stock ubuntu-latest is below 6.7 / 6.12); the per-protection
availability matrix for v5 and v6 is still covered by the synthetic-
ABI unit tests in `landlock::mask_contract_tests` and the
out-of-band VM matrix protocol.
Header is now cbindgen-generated (upstream switched in multikernel#87). The manual
header edits from the original Protection commits are dropped; the C ABI
for the Protection setters is regenerated from the #[no_mangle] Rust
definitions instead. CI verifies the committed header matches a fresh
generation.
Drop the landlock::Resolved enum and the free landlock::resolve()
function; the protection module's ProtectionStatus enum and its
resolve() associated function now drive both the internal mask-compute
path and the public runtime accessor. One enum, one resolver.

The StrictlyUnavailable variant maps onto ProtectionStatus::Unavailable.
ProtectionStatus::resolve is promoted to #[doc(hidden)] pub so the
synthetic-ABI integration tests can drive it directly, mirroring the
access the removed landlock::resolve previously offered.
The Sandbox.protection_policy field was #[serde(skip)], so a checkpoint
silently dropped the policy and a restored sandbox reset to strict_all().
On a host where a disable() opt-out was required (e.g. a v5 kernel that
cannot provide a v6 scope) this broke restore: confine then failed with
ProtectionUnavailable.

Derive Serialize/Deserialize on Protection, ProtectionState and
ProtectionPolicy and remove the serde(skip) so the policy is stored in
the checkpoint. Update the now-stale field doc (the builder API shipped
in this PR and the field is serialized). Add a destructive round-trip
test asserting a disabled protection survives bincode ser/de instead of
resetting to Strict.
The --allow-degraded/--disable help text listed the old
abstract-unix-scope-socket spelling; correct it to the canonical
abstract-unix-socket-scope (matching LANDLOCK_SCOPE_ABSTRACT_UNIX_SOCKET).

parse_protection accepted abstract-unix-scope-socket as a backward-compat
alias. It never shipped in a release, so there is nothing to stay
compatible with — remove it and accept only the canonical name.
…ME.md

The Protection opt-out documentation belongs with the Sandbox reference,
not the extension-handlers guide. Move the Rust section from
extension-handlers.md to sandbox-reference.md (adding the per-protection
ABI floor table and the checkpoint-persistence note), and the Python
section from python-handlers.md to python/README.md. Repoint the
top-level README and the Python cross-reference at the new home.
The branch added ubuntu-22.04 to the test matrix, a workflow_dispatch
trigger, and a Landlock-ABI report step. That CI-matrix work is out of
scope for this PR; restore ci.yml to upstream's version and send the
matrix change as a separate follow-up.
@dzerik dzerik force-pushed the follow-up-c-protection-foundation branch from 22a7dc9 to ebc6c79 Compare June 2, 2026 10:46
Contributor Author

dzerik commented Jun 2, 2026

Pushed an update addressing the review. Also rebased onto current main — the branch had drifted ~55 commits behind (overlayfs/BranchFS removal, the syscalls-crate adoption, and the cbindgen header generation all landed since the original PR).

Verified: 334 core lib + 19 integration (incl. the new checkpoint test) + 10 FFI + 12 Python, all green. Smoke-tested end-to-end on five real kernels spanning ABI v1/v2/v5/v6 (Ubuntu 22.04, Debian 12, Fedora 41, Rocky 9.6, Rocky 9.7) — --disable produces a working sandbox on each, and the checkpoint round-trip test passes on all of them.

@congwang-mk
Contributor

Blocking: disable() of an FS protection fails with EINVAL when the sandbox has writable paths

Following up on the protection opt-out. disable(Protection::FsRefer) (and FsTruncate, FsIoctlDev) does not just have surprising semantics; it makes confine hard-fail whenever the sandbox has any writable path, which is nearly every real sandbox. I verified this against the kernel docs, the code path, and a runtime repro on a v7 host.

Root cause

compute_fs_mask removes the disabled bit from handled_access_fs, but the per-path access masks are policy-independent and still request that bit:

  • landlock.rs:423: let fs_write_mask = write_access(abi); includes REFER (v2+), TRUNCATE (v3+), and IOCTL_DEV (v5+).
  • landlock.rs:433: add_path_rule(.., fs_write_mask) passes that mask to the kernel for every writable directory.

landlock_add_rule(2) returns EINVAL when "rule_attr->allowed_access is not a subset of the ruleset handled accesses." So with disable(FsRefer) the ruleset no longer handles REFER, yet the writable-path rule still grants REFER, the subset check fails, and confinement aborts.

The net path is subject to the same subset rule, and the PR handled it correctly there by gating rule installation on net_tcp_active. The FS path rules did not get the matching treatment, so only NetTcp and the two scope protections behave correctly; the three FS protections do not.
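
To make the mismatch concrete, here is a minimal model of the subset check, not the actual sandlock code. The bit values are the LANDLOCK_ACCESS_FS_* constants from the kernel UAPI header; write_access is a simplified stand-in that carries only the bits relevant to this bug, and check_rule and confine_before_fix are hypothetical helper names.

```rust
// Bit values of the LANDLOCK_ACCESS_FS_* constants in <linux/landlock.h>.
const WRITE_FILE: u64 = 1 << 1;
const REFER: u64 = 1 << 13; // ABI v2+
const TRUNCATE: u64 = 1 << 14; // ABI v3+
const IOCTL_DEV: u64 = 1 << 15; // ABI v5+

/// Simplified stand-in for write_access(abi): only the bits relevant here.
fn write_access(abi: u32) -> u64 {
    let mut mask = WRITE_FILE;
    if abi >= 2 {
        mask |= REFER;
    }
    if abi >= 3 {
        mask |= TRUNCATE;
    }
    if abi >= 5 {
        mask |= IOCTL_DEV;
    }
    mask
}

/// Hypothetical model of the landlock_add_rule(2) consistency check:
/// every bit in `allowed` must also be set in `handled`.
fn check_rule(handled: u64, allowed: u64) -> Result<(), &'static str> {
    if (allowed & !handled) != 0 {
        return Err("EINVAL");
    }
    Ok(())
}

/// Models the pre-fix flow: the handled mask drops the disabled bit,
/// but the per-path rule mask is still the full write_access(abi).
fn confine_before_fix(abi: u32, disabled: u64) -> Result<(), &'static str> {
    let handled = write_access(abi) & !disabled;
    let rule = write_access(abi);
    check_rule(handled, rule)
}
```

In this model, confine_before_fix(7, 0) succeeds while confine_before_fix(7, REFER) fails with EINVAL, which is the same split the runtime repro shows between the baseline and the disable cases.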

Runtime repro (forked child per case, kernel 6.18, Landlock ABI v7)

[baseline_strict_write_tmp]          restrict_self: Operation not permitted   (reached restrict_self; /tmp rule installed fine)
[disable_fsrefer_write_tmp]          add path rule for "/tmp": Invalid argument (os error 22)   EINVAL
[disable_fstruncate_write_tmp]       add path rule for "/tmp": Invalid argument (os error 22)   EINVAL
[disable_fsioctldev_write_tmp]       add path rule for "/tmp": Invalid argument (os error 22)   EINVAL
[disable_fsrefer_no_writable_paths]  restrict_self: Operation not permitted   (no EINVAL)

(The baseline EPERM at restrict_self is just the bare-fork harness not setting PR_SET_NO_NEW_PRIVS; the real sandbox-launch path does. The point is that the baseline gets past add_path_rule, while the three disable cases die at it with EINVAL, and the no-writable-path case does not.)

Why the existing tests miss it

  • The mask_contract_tests exercise compute_fs_mask in isolation and never install a path rule, so the handled-vs-rule inconsistency is invisible to them.
  • The VM matrix passed --disable only for the two scope protections (plus fs-ioctl-dev on the Rocky 9.7 backport box) and ran /usr/bin/true with no --fs-write, so the writable-path code path was never exercised with a disabled FS bit.

Suggested fix

Intersect the per-path masks with the resolved handled mask, so a rule is a subset by construction:

let fs_write_mask = write_access(abi) & handled_access_fs;
// and where ACCESS_FILE is applied to non-directory paths in add_path_rule:
//   allowed_access = (access & ACCESS_FILE) & handled_access_fs

A regression test should call confine_filesystem (forked) with disable(FsRefer) plus fs_write("/tmp") and assert success, since no current test installs a path rule under a non-default policy.
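
A minimal sketch of that intersection as a standalone helper, under stated assumptions: rule_mask is a hypothetical name, not a function in landlock.rs, and ACCESS_FILE is assumed to match the kernel's file-only access set. Returning None for an empty mask is an added precaution, because landlock_add_rule(2) rejects an empty allowed_access with ENOMSG, so a rule whose bits were all disabled would need to be skipped rather than installed.

```rust
// Bit values of the LANDLOCK_ACCESS_FS_* constants in <linux/landlock.h>.
const EXECUTE: u64 = 1 << 0;
const WRITE_FILE: u64 = 1 << 1;
const READ_FILE: u64 = 1 << 2;
const REFER: u64 = 1 << 13;
const TRUNCATE: u64 = 1 << 14;
const IOCTL_DEV: u64 = 1 << 15;

// Assumption: the access rights that are meaningful on a non-directory path.
const ACCESS_FILE: u64 = EXECUTE | WRITE_FILE | READ_FILE | TRUNCATE | IOCTL_DEV;

/// Hypothetical helper: derive the mask for one path rule so that it is a
/// subset of `handled` by construction. Returns None when nothing is left,
/// so the caller can skip the rule instead of passing an empty mask.
fn rule_mask(requested: u64, handled: u64, is_dir: bool) -> Option<u64> {
    // Non-directory paths only accept file-level rights.
    let scoped = if is_dir {
        requested
    } else {
        requested & ACCESS_FILE
    };
    // Drop every bit the ruleset does not handle (for example a disabled one).
    let allowed = scoped & handled;
    if allowed == 0 {
        None
    } else {
        Some(allowed)
    }
}
```

With REFER removed from handled, a writable-directory request that includes REFER comes back without it, and a request consisting only of REFER comes back as None.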

Separate semantic note on FsRefer

Even after the mask fix, disable(FsRefer) cannot do what the disable() rustdoc promises ("workloads that legitimately need the capability the protection blocks"). Per landlock(7), REFER is "the only access right which is denied by default by any ruleset, even if the right is not specified as handled at ruleset creation time," and "the only way to make a ruleset grant this right is to explicitly allow it for a specific directory by adding a matching rule." So in sandlock's model, REFER being Active (handled, and granted on writable paths via write_access) is what permits controlled cross-directory rename within writable areas; removing it can only make the sandbox stricter, never looser. disable(FsRefer) is therefore close to meaningless and is a footgun. Worth either a doc caveat or rejecting disable(FsRefer) outright. The other five protections map onto the disable/degrade model correctly.
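
If rejecting is the route taken, the check could be as small as the sketch below. Everything here is hypothetical: check_disable is not an existing sandlock function, the error type is a plain String for brevity, and the Protection enum is re-declared locally with the six variant names used in this PR so that the snippet compiles on its own.

```rust
// Local re-declaration for illustration only; the real enum lives in sandlock.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum Protection {
    FsRefer,
    FsTruncate,
    FsIoctlDev,
    NetTcp,
    SignalScope,
    AbstractUnixSocketScope,
}

/// Hypothetical validation hook for disable(): refuse FsRefer, because
/// leaving REFER unhandled can only make the sandbox stricter.
fn check_disable(protection: Protection) -> Result<(), String> {
    match protection {
        Protection::FsRefer => Err(
            "disable(FsRefer) is rejected: REFER is denied by default even when \
             unhandled, so disabling it cannot loosen the sandbox"
                .to_string(),
        ),
        // The other five protections map onto the disable model as expected.
        _ => Ok(()),
    }
}
```

A doc caveat on disable() would be the lighter alternative if an error at build time is considered too strict for existing callers.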

if abi < MIN_ABI {
    return Err(SandlockError::Runtime(
        crate::error::SandboxRuntimeError::Confinement(
            ConfinementError::InsufficientAbi {
Contributor

Can InsufficientAbi be removed as well?

Contributor Author

Yes — removed. InsufficientAbi had zero construction or match sites after the per-protection rewrite in this PR; ProtectionUnavailable (carrying protection + required_abi + host_abi) fully replaced the old global "ABI too low" path. Build + 334 lib tests green without it.

The per-protection availability resolution (ProtectionUnavailable,
carrying protection + required_abi + host_abi) replaced the old global
"ABI too low" check during the Protection work. InsufficientAbi has
had zero construction or match sites since then — it was orphaned by
the confine_inner per-protection rewrite. Remove it.

Development

Successfully merging this pull request may close these issues.

Support for landlock ABI v5?