ci: optimize self-hosted KubeVirt runners and CI pipeline by mangelajo · Pull Request #787 · jumpstarter-dev/jumpstarter

mangelajo · 2026-06-12T14:42:08Z

Summary

- Split KubeVirt runners into small (4Gi) and large (16Gi) VM flavors for better concurrency
- Route unit test / build jobs to arc-runner-kubevirt-small, E2E test jobs to arc-runner-kubevirt-large
- Run package tests in parallel (make test -j4 --output-sync on GNU Make, -j4 on BSD)
- Suppress log noise in CI with --log-level=CRITICAL --log-cli-level=CRITICAL via PYTEST_ADDOPTS (project defaults unchanged for local dev)
- Skip apt-get install when CI dependencies are already pre-baked in the golden image
- Increase Renode monitor connect timeout from 10s to 45s (smaller VMs need more startup time)
- Add download logging/timeout to u-boot test fixture for CI debuggability
- Detect --output-sync support via make --help instead of version string (works on macOS BSD make)

Test plan

- All python-tests matrix jobs pass (Linux small + macOS, Python 3.11/3.12/3.13)
- E2E tests pick up arc-runner-kubevirt-large label
- Log output is suppressed in CI but not locally
- macOS jobs run with -j4 without --output-sync (BSD make)
- Pre-baked dependencies skip apt-get install (check "Setup Linux dependencies" step)

🤖 Generated with Claude Code

coderabbitai · 2026-06-12T14:42:21Z

Important

Review skipped

Draft detected.

Please check the settings in the CodeRabbit UI or the .coderabbit.yaml file in this repository. To trigger a single review, invoke the @coderabbitai review command.

⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: c31365b6-4454-4fdc-b112-269ddabf32bf

You can disable this status message by setting the reviews.review_status to false in the CodeRabbit configuration file.

Walkthrough

This PR updates GitHub Actions workflow runner configurations across the E2E and Python test pipelines, migrating from ubuntu-24.04 to arc-runner-kubevirt for job execution while maintaining the existing matrix structures for architecture coverage.

Changes

Runner Infrastructure Migration

| Layer / File(s) | Summary |
| --- | --- |
| E2E workflow jobs runner migration (`.github/workflows/e2e.yaml`) | Build controller and operator image jobs, build-python-wheels, e2e-tests, and compatibility jobs switch from `ubuntu-24.04` to `arc-runner-kubevirt` in their job matrices and runner configurations. |
| Python tests workflow runner migration (`.github/workflows/python-tests.yaml`) | Pytest-matrix job updates its `runs-on` strategy from `ubuntu-24.04` to `arc-runner-kubevirt` while retaining `macos-15`. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Poem

🐰 In the CI clouds we hop and bound,
New runners found, a better ground,
Arc-runner-kubevirt, swift and true,
Let tests and builds run spry and new! ✨

🚥 Pre-merge checks | ✅ 5

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
| --- | --- | --- |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Title check | ✅ Passed | The title 'ci: optimize self-hosted KubeVirt runners and CI pipeline' is partially related to the changeset, which focuses on switching to KubeVirt runners, but overstates the scope by claiming 'optimize CI pipeline' when the changes are primarily runner migrations. |
| Description check | ✅ Passed | The PR description comprehensively aligns with the changeset, detailing infrastructure changes (KubeVirt runner migration), performance improvements (parallel testing), and CI optimizations (log suppression, timeout adjustments). |


kirkbrauer

This looks good to me, hopefully we get quite a speedup here!

mangelajo · 2026-06-15T07:31:53Z

I'm still experimenting, @kirkbrauer. The operator I'm trying for GitHub Actions on K8s + KubeVirt seems a bit slow to rotate VMs and pick up jobs. Unless we can improve that, we'll have to look for an alternative option.

Enable stderr capture for the DUT network exporter in e2e tests. The exporter was crashing with exit code 1 but its stderr was discarded, making it impossible to diagnose the failure.

Changes:

- Enable captureStderr for the DUT network exporter in BeforeAll
- Use port-based log names to avoid collisions between exporters
- Add DumpLogs in AfterAll so errors appear in test output

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

On Ubuntu, nft, dnsmasq, and sysctl live in /usr/sbin which is not in PATH for non-root users. The runtime commands work fine because they go through sudo, but the shutil.which() startup check fails. Extend the search path to include /usr/sbin and /sbin. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

The sysctl binary lives in /usr/sbin which isn't in PATH for non-root users on Ubuntu. The read-only sysctl call (without sudo) fails with FileNotFoundError. Resolve the full path at call time using the same /usr/sbin search path. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
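
The two commits above describe resolving tools through an extended search path. A minimal sketch of that idea follows; the helper name `resolve_tool`, its signature, and the fallback to the bare name are assumptions for illustration, and the driver's actual `_resolve_tool` may differ.

```python
import os
import shutil

# Extra directories named in the commit messages above.
EXTRA_SBIN_DIRS = ("/usr/sbin", "/sbin")


def resolve_tool(name, extra_dirs=EXTRA_SBIN_DIRS):
    """Return the full path of `name`, also searching `extra_dirs`.

    Hypothetical helper: falls back to the bare name when nothing is found.
    """
    current = os.environ.get("PATH", os.defpath)
    search_path = os.pathsep.join([current, *extra_dirs])
    found = shutil.which(name, path=search_path)
    return found or name
```

With this shape, a non-root call such as `subprocess.run([resolve_tool("sysctl"), "-n", key])` would no longer depend on `/usr/sbin` being in the caller's `PATH`.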

- Revert --log-level=CRITICAL from project pyproject.toml (keep local dev defaults clean)
- Add --log-level=CRITICAL and --log-cli-level=CRITICAL to PYTEST_ADDOPTS in CI workflow (suppresses both captured and live log output)
- Reduce make parallelism from -j6 to -j4 to ease resource pressure on smaller VMs

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
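
As an illustration of the CI-only override described here, the sketch below builds an environment in which `PYTEST_ADDOPTS` carries the two logging flags while any existing value is preserved. The function name `ci_pytest_env` is hypothetical; the workflow itself presumably sets the variable directly in YAML.

```python
import os

# The two pytest flags named in the commit message above.
CI_LOG_FLAGS = "--log-level=CRITICAL --log-cli-level=CRITICAL"


def ci_pytest_env(base_env=None):
    """Return a copy of the environment with the quiet-logging flags appended."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("PYTEST_ADDOPTS", "").strip()
    # Append rather than overwrite so caller-supplied options survive.
    env["PYTEST_ADDOPTS"] = f"{existing} {CI_LOG_FLAGS}".strip()
    return env
```

Because the flags travel through the environment rather than `pyproject.toml`, a local `pytest` run without the variable keeps the project defaults.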

BSD make on macOS doesn't support --output-sync and the GNU Make version check wasn't reliably detecting it. Checking make --help for the flag directly is more portable. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
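
The feature-detection approach can be sketched as follows. This is an illustration only: the CI step is presumably a shell snippet, and the function names here are made up for the example.

```python
import subprocess


def supports_output_sync(help_text):
    """True when the given `make --help` text advertises --output-sync."""
    return "--output-sync" in help_text


def make_parallel_args(jobs=4, make="make"):
    """Build parallel flags, adding --output-sync only when make supports it."""
    try:
        result = subprocess.run(
            [make, "--help"], capture_output=True, text=True, check=False
        )
        # Some make implementations print usage on stderr, so check both.
        help_text = result.stdout + result.stderr
    except OSError:
        help_text = ""
    args = [f"-j{jobs}"]
    if supports_output_sync(help_text):
        args.append("--output-sync")
    return args
```

Probing the help text avoids parsing version strings, which is the fragility this commit describes.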

The download had no timeout, no progress logging, and raw exceptions on failure — making CI failures impossible to diagnose in interleaved parallel output. Now logs download progress, sets a 120s timeout, and reports a clear pytest.fail message on network errors. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
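
A rough sketch of a download helper with the three properties listed above (progress logging, a timeout, and a readable error) might look like this. `DownloadError` and `download` are hypothetical names; in a fixture, the exception message would be handed to `pytest.fail`.

```python
import os
import shutil
import urllib.request


class DownloadError(RuntimeError):
    """Carries a readable message that a fixture could pass to pytest.fail."""


def download(url, dest, timeout=120, log=print):
    """Fetch `url` into `dest`, logging before and after; return the byte count."""
    log(f"downloading {url} -> {dest} (timeout {timeout}s)")
    try:
        # The timeout applies to each blocking socket operation,
        # not to the transfer as a whole.
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(dest, "wb") as out:
            shutil.copyfileobj(resp, out)
    except OSError as exc:  # URLError and TimeoutError are OSError subclasses
        raise DownloadError(f"download of {url} failed: {exc}") from exc
    size = os.path.getsize(dest)
    log(f"downloaded {size} bytes")
    return size
```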

With --output-sync, make buffers each target's output and flushes on completion. When a test fails, the failure details (traceback, FAILURES section) get truncated or lost entirely, making CI failures impossible to diagnose. Interleaved output is noisy but at least complete. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

When LOGS_DIR is set, each pkg-test-% target writes stdout/stderr to a separate log file and records failures via .failed markers (exit-code based, not grep). The test-report target prints full logs for failed packages and exits non-zero. Local dev behavior is unchanged. CI now sets LOGS_DIR, uploads all logs as artifacts (7-day retention), and uses fail-fast: false so all matrix jobs run to completion. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
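
The marker-based report can be pictured with the sketch below. The real `test-report` target lives in the Makefile; the `<package>.log` and `<package>.failed` file names used here are assumptions for illustration.

```python
from pathlib import Path


def report_failures(logs_dir, out=print):
    """Print the full log of every failed package; return a process exit code."""
    logs = Path(logs_dir)
    # A package counts as failed when a marker file exists, so the result
    # depends on the recorded exit code rather than on grepping log text.
    failed = sorted(p.stem for p in logs.glob("*.failed"))
    for name in failed:
        out(f"===== FAILED: {name} =====")
        log_file = logs / f"{name}.log"
        if log_file.exists():
            out(log_file.read_text(errors="replace"))
    out(f"{len(failed)} package(s) failed")
    return 1 if failed else 0
```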

rpmfile decompresses the entire zstd CPIO payload (572MB) into memory to extract a 1.6MB file. On 4Gi VMs running parallel tests this triggers the OOM killer. Use rpm2cpio + cpio on Linux which streams without buffering; fall back to rpmfile on macOS. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
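
A sketch of the streaming path, assuming the `rpm2cpio` and `cpio` binaries are installed. The function names are hypothetical and the fixture's actual code may be organized differently; on platforms where `can_stream_extract()` is false, the caller would keep using rpmfile.

```python
import shutil
import subprocess
import sys


def can_stream_extract():
    """True on Linux when both rpm2cpio and cpio are available."""
    return (
        sys.platform.startswith("linux")
        and shutil.which("rpm2cpio") is not None
        and shutil.which("cpio") is not None
    )


def extract_commands(rpm_path, member):
    """Return the argv lists for `rpm2cpio RPM | cpio -idm MEMBER`."""
    # cpio flags: -i extract, -d create directories, -m keep mtimes.
    # Member names in an RPM payload usually start with "./".
    return ["rpm2cpio", str(rpm_path)], ["cpio", "-idm", member]


def stream_extract(rpm_path, member, dest_dir):
    """Pipe the payload through cpio so it is never held in memory at once."""
    producer_cmd, consumer_cmd = extract_commands(rpm_path, member)
    producer = subprocess.Popen(producer_cmd, stdout=subprocess.PIPE)
    consumer = subprocess.Popen(consumer_cmd, stdin=producer.stdout, cwd=dest_dir)
    producer.stdout.close()  # the parent no longer needs its copy of the pipe
    consumer_rc = consumer.wait()
    producer_rc = producer.wait()
    if producer_rc != 0 or consumer_rc != 0:
        raise RuntimeError(
            f"extraction failed (rpm2cpio={producer_rc}, cpio={consumer_rc})"
        )
```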

ci: use self-hosted KubeVirt runners for amd64 jobs

eb6328c

Switch E2E and pytest amd64/Linux jobs from ubuntu-24.04 to arc-runner-kubevirt, running on self-hosted KubeVirt VMs on the beast cluster. ARM64 and macOS jobs are unchanged. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

kirkbrauer approved these changes Jun 12, 2026

mangelajo marked this pull request as draft June 15, 2026 07:31

mangelajo and others added 13 commits June 17, 2026 12:28

Fix tcpdump test mocks to accept path kwarg from shutil.which

3507d59

The which side_effect functions need to accept **kwargs since the driver now passes path= to shutil.which(). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
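
To illustrate why the mocks needed `**kwargs`: a `side_effect` function receives exactly the arguments of the patched call, so a one-argument fake raises `TypeError` once the caller adds `path=`. The names and the search path below are made up for the example.

```python
import shutil
from unittest import mock


def fake_which(cmd, *args, **kwargs):
    """Stand-in for shutil.which that tolerates mode= and path= keywords."""
    return f"/usr/sbin/{cmd}" if cmd == "tcpdump" else None


def find_tcpdump():
    # Assumed call shape: the driver passes an explicit search path.
    return shutil.which("tcpdump", path="/usr/bin:/usr/sbin:/sbin")


with mock.patch("shutil.which", side_effect=fake_which) as patched:
    assert find_tcpdump() == "/usr/sbin/tcpdump"
    patched.assert_called_once_with("tcpdump", path="/usr/bin:/usr/sbin:/sbin")
```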

Fix iproute test to mock _resolve_tool for sysctl path

8895804

The sysctl calls now resolve to full path via _resolve_tool. Mock it in tests so assertions match the bare command name. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Mock _resolve_tool in all sysctl tests for platform independence

f314b93

All tests that call get_interface_forwarding or set_interface_forwarding now mock _resolve_tool so they don't depend on whether sysctl exists on the build host. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Skip Renode download when already installed

0b06d19

On kubevirt runners with pre-baked images, Renode is already present. Skip the 200MB download and install when the binary is available. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Run package unit tests in parallel (6 jobs) in CI

ed42d55

Add pkg-test-all-parallel target that runs pkg-test-all with -j6 --output-sync so per-package test output is buffered and not interleaved. Use it in the CI workflow. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Simplify CI to use make test -j6 --output-sync directly

af51db6

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fall back to sequential make test on macOS

0351a0d

macOS ships BSD make which doesn't support --output-sync. Check for GNU Make 4+ before using parallel flags. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Enable parallel test execution on macOS (without --output-sync)

4d20995

macOS BSD make supports -j but not --output-sync. Use -j6 on both platforms so macOS also benefits from parallel execution. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Use small/large runner labels for kubevirt VMs

5564df6

Build jobs and unit tests use arc-runner-kubevirt-small (3Gi). E2E and compat tests use arc-runner-kubevirt-large (16Gi). Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Disable logging output in CI test runs

a240d4e

Use -p no:logging to suppress all log output in CI, preventing DEBUG/INFO/WARNING/ERROR noise from polluting test logs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mangelajo force-pushed the kubevirt-runners branch from 400dc29 to a240d4e Compare June 19, 2026 12:42

mangelajo and others added 3 commits June 19, 2026 15:03

Revert logging suppression — tests use caplog

d654f38

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Suppress log noise in tests with --log-level=CRITICAL

07dbbed

Prevents DEBUG/INFO/WARNING/ERROR log output from polluting test output while keeping the logging plugin active for caplog tests. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Skip apt-get install when CI dependencies are pre-baked

dfc6182

Check for each binary/package before installing. On golden images with everything pre-installed, this skips apt-get entirely. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
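
The check-before-install logic could be sketched like this. The workflow step itself is presumably shell; the binary-to-package mapping is supplied by the caller here because the real dependency list is not shown in this PR page.

```python
import shutil


def missing_packages(required, which=shutil.which):
    """Return the apt packages whose binary is not found on PATH.

    `required` maps a binary name to the package that provides it.
    """
    return sorted({pkg for binary, pkg in required.items() if which(binary) is None})


def apt_install_command(required, which=shutil.which):
    """Return an apt-get argv, or None when everything is already present."""
    missing = missing_packages(required, which)
    if not missing:
        return None  # golden image: skip apt-get entirely
    return ["sudo", "apt-get", "install", "-y", *missing]
```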

mangelajo changed the title ~~ci: use self-hosted KubeVirt runners for amd64 jobs~~ ci: optimize self-hosted KubeVirt runners and CI pipeline Jun 19, 2026

mangelajo and others added 7 commits June 19, 2026 15:33

Increase Renode monitor connect timeout from 10s to 45s

10f2849

On smaller VMs under parallel test load, Renode can take longer than 10s to start up and bind its monitor port, causing DEADLINE_EXCEEDED. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
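
For readers unfamiliar with this failure mode, the sketch below shows a generic connect-with-deadline loop over plain TCP. It is not the Renode driver's code (the function name and retry interval are assumptions); it only illustrates why a slow-starting process needs a deadline longer than its worst-case startup time.

```python
import socket
import time


def wait_for_port(host, port, timeout=45.0, interval=0.2):
    """Retry a TCP connect until it succeeds or `timeout` seconds elapse."""
    deadline = time.monotonic() + timeout
    last_error = None
    while time.monotonic() < deadline:
        remaining = max(deadline - time.monotonic(), 0.05)
        try:
            # Cap each attempt so one slow connect cannot overrun the deadline.
            return socket.create_connection((host, port), timeout=min(1.0, remaining))
        except OSError as exc:
            last_error = exc
            time.sleep(interval)
    raise TimeoutError(f"{host}:{port} not reachable after {timeout}s") from last_error
```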

Use tee for per-package test logs so output is visible during the run

7d2a217

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mangelajo and others added 5 commits June 19, 2026 17:17

Set PYTHONUNBUFFERED=1 in test log pipeline for real-time output

05e381a

Without this, Python buffers stdout when piped through tee, hiding print() diagnostics until the test finishes or hangs. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Add extraction progress logging to uboot test fixture

ba70ce5

RPM extraction hangs silently on resource-constrained VMs — add prints before and after to pinpoint where it stalls. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Install rpm2cpio and cpio if missing for uboot test extraction

af71296

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

Fix py.path.local stat call in uboot fixture — use .size() not .stat(…

bb1eb49

…).st_size Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

mangelajo wants to merge 29 commits into main from kubevirt-runners
