Skip to content

feat: add cgroup CPU isolation for sandboxed walltime runs#427

Open
GuillaumeLagrange wants to merge 2 commits into
mainfrom
cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes
Open

feat: add cgroup CPU isolation for sandboxed walltime runs#427
GuillaumeLagrange wants to merge 2 commits into
mainfrom
cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes

Conversation

@GuillaumeLagrange

Copy link
Copy Markdown
Contributor

Add an alternative to systemd-run where the caller can prepare cgroups and forward the info through CODSPEED_WALLTIME_ISOLATION=CGROUP:/path/to/cgroup.
The runner expects this group to have two leaves: support and bench.
When spawning the bench process, the runner

  • self places into support group
  • spawns the benchmark process and have it move itself in the bench group

@greptile-apps

greptile-apps Bot commented Jun 29, 2026

Copy link
Copy Markdown

Greptile Summary

This PR adds hook-based CPU isolation for walltime benchmark runs. The main changes are:

  • New Isolation lifecycle for none, hook, and systemd modes.
  • Hook setup, benchmark wrapping, and teardown moved into isolation.rs.
  • Profiler wrapping now uses a requires_sudo decision.
  • Perf and Samply updated to match the new profiler contract.

Confidence Score: 4/5

This is close, but the hook setup failure path should be fixed before merging.

  • The profiler sudo contract is updated consistently across the changed call sites.
  • Hook mode can continue after pre-bench setup fails.
  • That can produce walltime results with the process tree placed in the wrong cgroup.

src/executor/wall_time/isolation.rs

Important Files Changed

Filename Overview
src/executor/wall_time/executor.rs Resolves isolation once, keeps it alive for teardown, and passes the sudo requirement through the run path.
src/executor/wall_time/isolation.rs Owns hook and systemd isolation selection, but the pre-bench hook failure path can leave hook isolation only partly applied.
src/executor/wall_time/profiler/mod.rs Updates the profiler trait contract to describe sudo requirements instead of isolation state.
src/executor/wall_time/profiler/perf/mod.rs Updates perf wrapping to use the new sudo requirement flag.
src/executor/wall_time/profiler/samply/mod.rs Updates Samply wrapping to use the new sudo requirement flag.

Reviews (4): Last reviewed commit: "refactor(walltime): make the runner mech..." | Re-trigger Greptile

Comment thread src/executor/wall_time/isolation.rs Outdated
Comment thread src/executor/wall_time/isolation.rs Outdated
@codspeed-hq

codspeed-hq Bot commented Jun 29, 2026

Copy link
Copy Markdown

Merging this PR will not alter performance

⚠️ Unknown Walltime execution environment detected

Using the Walltime instrument on standard Hosted Runners will lead to inconsistent data.

For the most accurate results, we recommend using CodSpeed Macro Runners: bare-metal machines fine-tuned for performance measurement consistency.

✅ 7 untouched benchmarks


Comparing cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes (757ec17) with main (7ce2c98)

Open in CodSpeed

@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes branch from a15e7ed to 40f9325 Compare June 29, 2026 15:51
Comment thread src/executor/wall_time/isolation.rs Outdated
IsolationMode::Cgroup { scope_dir } => {
// Move the runner (and the profiler it later spawns) onto the
// system cores before wrapping the benchmark for the bench cores.
place_runner_in_support(scope_dir)?;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Support leaf race In cgroup mode, this moves the runner into <scope>/support/cgroup.procs before the code waits for any leaf to exist. If the privileged setup creates support and bench asynchronously, this write can fail with a missing support/cgroup.procs path even though the cgroup layout would be ready shortly after. The run then exits before the benchmark starts, so the support leaf needs the same readiness handling as the bench leaf.

The walltime executor pins the benchmark to dedicated cores with
`systemd-run --scope --slice=codspeed.slice`, but that needs a reachable host
systemd — absent inside the macro-agent sandbox, where the runner is PID 1 of
a private namespace with no host authority.

Add a `Cgroup` isolation mode for that case, selected by
`CODSPEED_ISOLATION=CGROUP:<dir>`. The privileged side delegates that scope
cgroup already split and pinned into `all` / `support` / `bench` leaves; the
runner just relocates with two plain `cgroup.procs` writes — itself into
`support`, so it and the profiler stay off the bench cores, and the benchmark
into `bench` via a small `bash` shim. No cpuset arithmetic and no privilege:
the benchmark stays in the profiler's process subtree, so it records
unprivileged.

Unlike the systemd scope, this keeps the benchmark a descendant of the
profiler, so the `isolate` flag threaded through the profilers becomes
`requires_sudo`, which is true only for the systemd mode.

Refs COD-3012
Co-Authored-By: Claude <noreply@anthropic.com>
@GuillaumeLagrange GuillaumeLagrange force-pushed the cod-3012-cpu-isolation-on-macro-runner-for-sandboxed-processes branch from 40f9325 to d2f2793 Compare June 29, 2026 16:24
The runner hard-coded two isolation mechanisms (a `systemd-run` scope and
per-spawn cgroup-dir plumbing) and split the work between a `HookScriptsGuard`
that ran the pre/post-bench hooks and `isolation.rs` that did the wrapping.

Replace both with a single `Isolation` type that owns the whole lifecycle:
`resolve()` runs the pre-bench hook, `wrap_bench()` pins the leaf, and `Drop`
runs post-bench. The machine now owns all cpuset logic behind three hooks
(`codspeed-{pre,wrap,post}-bench`); the runner only invokes them and is
otherwise oblivious to how cores are attributed. Discovery is by hook presence:
an executable `codspeed-wrap-bench` selects the hook path (no sudo, the
benchmark stays a descendant of the profiler); its absence falls back to the
existing `systemd-run --scope --slice=codspeed.slice` so gen-1 images without
the hook keep working unchanged. `requires_sudo` is now true only for that
fallback.

The pre-bench hook is invoked with the runner's PID so the machine places the
runner (and the profiler it spawns) onto the system cores, leaving the runner
with no cgroup writes of its own.

Refs COD-3012
Co-Authored-By: Claude <noreply@anthropic.com>
Comment on lines +129 to +136
Ok(o) if o.status.success() => {}
Ok(o) => debug!(
"{hook} exited {}: {}",
o.status,
String::from_utf8_lossy(&o.stderr)
),
Err(e) => debug!("failed to run {hook}: {e}"),
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Hook setup can fail silently

This branch logs a failing codspeed-pre-bench hook and still selects hook isolation. When the pre hook fails because the support cgroup is not ready, permissions are wrong, or the hook install is partial, the runner continues with codspeed-wrap-bench and runs the profiler without sudo. The walltime run can then complete with the runner or profiler outside the intended support leaf, which produces incorrectly isolated benchmark results instead of a clear failure or fallback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant