Skip to content

test(qemu): replace Fedora Cloud image with Alpine Linux tiny image#809

Draft
mangelajo wants to merge 1 commit into
mainfrom
use-alpine-image-for-qemu-tests
Draft

test(qemu): replace Fedora Cloud image with Alpine Linux tiny image#809
mangelajo wants to merge 1 commit into
mainfrom
use-alpine-image-for-qemu-tests

Conversation

@mangelajo

@mangelajo mangelajo commented Jun 18, 2026

Copy link
Copy Markdown
Member

Motivation

Fedora Cloud qcow2 images used in QEMU driver tests weigh ~556 MB (x86_64) + ~519 MB (aarch64) = ~1.1 GB total, taking ~1 minute to download on every CI run. Alpine Linux 3.22.4 UEFI tiny images reduce that to ~278 MB total — a 74% reduction — and download in ~6 s on GitHub Actions runners, making a cache unnecessary.

Fedora 43 Alpine 3.22.4 tiny Reduction
x86_64 556 MB 127 MB 77%
aarch64 519 MB 151 MB 71%
Total ~1,076 MB ~278 MB 74%

Alpine's nocloud tiny images support UEFI boot on both architectures and use tiny-cloud (a minimal cloud-init alternative) that reads the standard NoCloud CIDATA vfat volume the QEMU driver already generates.

Driver changes (driver.py)

tiny-cloud differs from full cloud-init in two ways:

  1. Hostname: tiny-cloud reads @hostname from the YAML key hostname in meta-data, not local-hostname. Add hostname alongside local-hostname so both implementations set it correctly.

  2. Password: tiny-cloud ignores plain_text_passwd in the users stanza. Add a runcmd entry to set the password via chpasswd. Credentials are escaped with shlex.quote() so special characters are safe. Full cloud-init images are unaffected.

hostport=0 support: Hostfwd now allows hostport=0, which tells QEMU's user-mode networking to pick a free port automatically. After QEMU starts, the driver queries the actual assigned port via QMP human-monitor-command info usernet, logs it at INFO level:

hostfwd 'ssh': resolved port 0 -> 127.0.0.1:<port> (guest port 22)

and stores it in _resolved_hostports so get_hostfwd_port() can return it to the client side.

Client changes (client.py)

shell() now checks for an ssh hostfwd entry by calling get_hostfwd_port('ssh'). If present, it opens a direct fabric.Connection to 127.0.0.1:<resolved_port>, bypassing the jumpstarter streaming layer entirely. If no ssh hostfwd is configured (KeyError), it falls back to the original FabricAdapter path over vsock.

This is necessary because Alpine uses OpenRC, not systemd, so systemd-ssh-generator never runs and sshd only listens on TCP. Fedora cloud images ship systemd-ssh-generator (systemd ≥ 256) which automatically binds sshd to AF_VSOCK port 22 inside VMs — that is why the original vsock path worked with Fedora but not Alpine.

Test changes (driver_test.py)

  • Pass hostfwd={'ssh': {'hostport': 0, 'guestport': 22}} so qemu.shell() uses the TCP path with a QEMU-assigned port; no probe socket needed.
  • Wait for bootstrap_complete: done on the serial console before attempting login, guaranteeing tiny-cloud has set the password and sshd is ready (avoids the race that existed with Fedora too, where the extra setenforce console round-trip happened to provide enough delay).
  • Press Enter if the GRUB countdown appears to skip the 10 s wait.
  • Update post-login prompt from bash [user@host ~]$ → Alpine ash host:~$.
  • Remove sudo setenforce 0 (no SELinux on Alpine).
  • Relax uname -r assertion to a non-empty check.

CI changes (python-tests.yaml)

Remove the actions/cache steps entirely. Alpine images download in ~6 s on GitHub runners (measured from CI logs), which is faster than cache restore overhead for files this size.

@mangelajo mangelajo requested a review from raballew June 18, 2026 10:36
@coderabbitai

coderabbitai Bot commented Jun 18, 2026

Copy link
Copy Markdown
Contributor

Review Change Stack

Warning

Review limit reached

@mangelajo, we couldn't start this review because you've reached your PR review rate limit.

More reviews will be available in 18 minutes and 36 seconds. Learn how PR review limits work.

Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file).

⌛ How to resolve this issue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits.

🚦 How do rate limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly.

Please see our Fair Usage Limits Policy for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 720f4279-6480-428a-ab0a-72658840bcf0

📥 Commits

Reviewing files that changed from the base of the PR and between 83153d4 and 9a25775.

📒 Files selected for processing (4)
  • .github/workflows/python-tests.yaml
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py
📝 Walkthrough

Walkthrough

Migrates the QEMU integration test from Fedora Cloud to Alpine 3.22.4 UEFI tiny while introducing dynamic SSH forwarding port resolution. The driver's cidata() gains hostname and password-initialization support via chpasswd runcmd for Alpine compatibility. Host port resolution is added via QEMU info usernet parsing, allowing hostport=0 to request ephemeral port assignment. The SSH client switches to direct TCP connections using resolved ports. The test uses port-0 ephemeral forwarding, Alpine images with cache fallback, and GRUB-aware console automation. The CI workflow downloads Alpine artifacts.

Changes

Alpine Migration with Dynamic SSH Port Resolution

Layer / File(s) Summary
Hostfwd port-0 support and dynamic resolution
python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
Hostfwd.hostport validation relaxed to allow 0 for ephemeral assignment. Qemu._resolved_hostports dict field added to store actual assigned ports. QemuPower.on() parses QEMU info usernet to resolve port-0 entries. Qemu.get_hostfwd_port() consults resolved mapping before configured values.
Cloud-init Alpine compatibility: hostname and chpasswd runcmd
python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
shlex import added for safe credential quoting. cidata() meta-data gains hostname key from self.hostname. user-data gains runcmd that pipes username:password into chpasswd to initialize user password at boot for Alpine cloud-init.
SSH client connection via direct TCP forwarding
python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py
fabric.Connection imported. QemuClient.shell() prefers direct TCP connection to 127.0.0.1:<port> after resolving SSH hostfwd port; falls back to FabricAdapter tunneling on KeyError.
Test: Alpine image and ephemeral SSH port
python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py
Configures hostfwd with hostport=0 for ephemeral SSH port selection. Switches flashed image from Fedora Cloud qcow2 to Nocloud Alpine 3.22.4 qcow2 with cached-image-first fallback to CDN download. Adds GRUB countdown detection (sends Enter on "automatically in …"); removes SELinux setenforce 0 and exact kernel version assertion; replaces with non-empty uname -r validation after login.
CI: Alpine cloud image caching and download
.github/workflows/python-tests.yaml
Replaces Fedora Cloud cache/download steps with Alpine 3.22.4-uefi-tiny cache key and downloads nocloud_alpine-3.22.4-${arch}-uefi-tiny-r0.qcow2 for aarch64 and x86_64 from Alpine CDN; cache save excluded from pull requests.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Possibly related PRs

  • jumpstarter-dev/jumpstarter#321: Originally introduced the QEMU driver and integration test scaffolding that this PR extends with hostfwd port resolution and Alpine cloud-init compatibility.
  • jumpstarter-dev/jumpstarter#402: Previously implemented hostfwd support; this PR extends it by allowing port-0 and adding runtime port resolution from QEMU's usernet state.

Poem

🐇 From Fedora's shore I bid farewell,
To Alpine's peaks where port-zero dwell.
Through GRUB's countdown, a keypress sent,
With fabric's TCP, SSH's ascent.
chpasswd whispers the password's song,
And dynamic ports guide users along! 🏔️

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 12.50% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Title check ✅ Passed The title accurately describes the main change: replacing Fedora Cloud images with Alpine Linux tiny images in QEMU driver tests.
Description check ✅ Passed The description is comprehensive and directly related to the changeset, explaining motivation, image replacement details, driver changes, test changes, and SSH connectivity updates.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch use-alpine-image-for-qemu-tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In @.github/workflows/python-tests.yaml:
- Around line 123-129: The combined actions/cache@v5 action for the Alpine cloud
images step creates a cache poisoning risk because it saves artifacts from pull
request events to a shared static cache key that's used across all branches.
Split this into two separate steps: one using actions/cache/restore@v5 that runs
on all events including pull requests, and another using actions/cache/save@v5
that only runs on trusted events (add a condition like if: github.event_name !=
'pull_request'). Apply this to the cache step with the key
alpine-3.22.4-uefi-tiny and path python/packages/jumpstarter-driver-qemu/images
to ensure the cache is restored in PRs but only saved by trusted workflows.

In
`@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py`:
- Around line 64-69: The current port allocation in the socket.bind() call at
lines 66-68 creates a race condition where the port is released immediately
after reservation, allowing another process to claim it before QEMU attempts to
bind for hostfwd. Implement a retry mechanism that attempts to reserve a port
with socket.bind() and then immediately use it with QEMU; if the bind fails due
to address already in use, catch the exception and retry with a new port
allocation. Repeat this process until a port is successfully reserved and used
without collision.

In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py`:
- Around line 536-538: The `runcmd` entry is constructing a shell command by
directly interpolating `self.username` and `self.password` into a single-quoted
string, which can be broken or exploited if these values contain single quotes,
newlines, or other special characters. Use Python's `shlex.quote()` function to
properly escape both `self.username` and `self.password` values before inserting
them into the shell command string in the runcmd list. This ensures that any
special characters in the credentials are safely handled as literal values
rather than being interpreted as shell syntax.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 1c4c2088-1fb5-49cd-8495-779f5d8a927e

📥 Commits

Reviewing files that changed from the base of the PR and between ada5fa8 and 23a0c05.

📒 Files selected for processing (3)
  • .github/workflows/python-tests.yaml
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py

Comment thread .github/workflows/python-tests.yaml Outdated
Comment thread python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py Outdated
@mangelajo mangelajo force-pushed the use-alpine-image-for-qemu-tests branch from 0001659 to e7cceca Compare June 18, 2026 11:56
@mangelajo mangelajo changed the title test: replace Fedora Cloud image with Alpine tiny image in QEMU tests test(qemu): replace Fedora Cloud image with Alpine Linux tiny image Jun 18, 2026
@mangelajo mangelajo force-pushed the use-alpine-image-for-qemu-tests branch 4 times, most recently from cb01714 to 8e61e54 Compare June 18, 2026 12:15

@raballew raballew left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On a high level, I don't like hardcoding this in so many different places plus we are completely missing out the renovate process here. I think we will need to ensure that this image gets constantly updated as well.

@mangelajo mangelajo force-pushed the use-alpine-image-for-qemu-tests branch 3 times, most recently from b2962b3 to 83153d4 Compare June 18, 2026 12:33
## Motivation

Fedora Cloud qcow2 images used in QEMU driver tests weigh ~556 MB
(x86_64) + ~519 MB (aarch64) = ~1.1 GB total, taking ~1 minute to
download on every CI run. Alpine Linux 3.22.4 UEFI tiny images reduce
that to ~127 MB + ~151 MB = ~278 MB — a 74% reduction — and download
in ~6 s on GitHub Actions runners, making a cache unnecessary.

## Image replacement

  x86_64:  556 MB → 127 MB  (77%)
  aarch64: 519 MB → 151 MB  (71%)
  Total:  ~1076 MB → ~278 MB (74%)

Alpine's nocloud tiny images support UEFI boot on both architectures
and use tiny-cloud (a minimal cloud-init alternative) that reads the
standard NoCloud CIDATA vfat volume the QEMU driver already generates.

## Driver changes (driver.py)

tiny-cloud differs from full cloud-init in two ways:

1. Hostname: tiny-cloud reads @hostname from the YAML key 'hostname'
   in meta-data, not 'local-hostname'. Add 'hostname' alongside
   'local-hostname' so both implementations set it correctly.

2. Password: tiny-cloud ignores plain_text_passwd in the users stanza.
   Add a runcmd entry to set the password via chpasswd. Credentials are
   escaped with shlex.quote() so special characters are safe.
   Full cloud-init images are unaffected.

hostport=0 support: Hostfwd now allows hostport=0, which tells QEMU's
user-mode networking to pick a free port automatically. After QEMU
starts, the driver queries the actual assigned port via QMP
'human-monitor-command info usernet', logs it at INFO level:

  hostfwd 'ssh': resolved port 0 -> 127.0.0.1:<port> (guest port 22)

and stores it in _resolved_hostports so get_hostfwd_port() can return
it to the client side.

## Client changes (client.py)

shell() now checks for an 'ssh' hostfwd entry by calling
get_hostfwd_port('ssh'). If present, it opens a direct fabric
Connection to 127.0.0.1:<resolved_port>, bypassing the jumpstarter
streaming layer entirely. If no ssh hostfwd is configured (KeyError),
it falls back to the original FabricAdapter path over vsock.

This is necessary because Alpine uses OpenRC, not systemd, so
systemd-ssh-generator never runs and sshd only listens on TCP.
Fedora cloud images ship systemd-ssh-generator (systemd >= 256) which
automatically binds sshd to AF_VSOCK port 22 inside VMs — that is why
the original vsock path worked with Fedora but not Alpine.

## Test changes (driver_test.py)

- Pass hostfwd={'ssh': {hostport=0, guestport=22}} so qemu.shell()
  uses the TCP path with a QEMU-assigned port; no probe socket needed.
- Wait for 'bootstrap_complete: done' on the serial console before
  attempting login, guaranteeing tiny-cloud has set the password and
  sshd is ready (avoids the race that existed with Fedora too, where
  the extra setenforce console round-trip happened to provide enough
  delay).
- Press Enter if the GRUB countdown appears to skip the 10 s wait.
- Update post-login prompt from bash '[user@host ~]$' to Alpine ash
  'host:~$'.
- Remove 'sudo setenforce 0' (no SELinux on Alpine).
- Relax 'uname -r' assertion to a non-empty check.

## CI changes (python-tests.yaml)

Remove the actions/cache steps entirely. Alpine images download in ~6 s
on GitHub runners (measured), which is faster than cache restore
overhead for files this size.
@mangelajo mangelajo force-pushed the use-alpine-image-for-qemu-tests branch from 83153d4 to 9a25775 Compare June 18, 2026 12:37
@mangelajo

Copy link
Copy Markdown
Member Author

On a high level, I don't like hardcoding this in so many different places plus we are completely missing out the renovate process here. I think we will need to ensure that this image gets constantly updated as well.

no real need for this, this does not go into production or anything, is just an image that we boot to run the tests. But that's an existing problem from the past, nothing new.

run: |
brew install renode/tap/renode

- name: Cache Fedora Cloud images

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the cache was actually slower than the download.

# If the driver has an 'ssh' hostfwd entry, fetch the actual host port
# (resolving any port=0 assignment) and connect directly over TCP.
# Otherwise fall back to tunnelling through the jumpstarter stream (vsock).
try:

@mangelajo mangelajo Jun 18, 2026

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we had to implement this because alpine does not setup ssh over vhost, and we didn't want to pick a port that could potentially conflict with whatever is on the testing host.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py`:
- Around line 82-97: The try/except block in the context manager is too broad
and will catch KeyError exceptions raised by the caller's code inside the with
block, causing unintended fallback behavior. Restructure this using
try/except/else pattern: wrap only the self.call("get_hostfwd_port", "ssh") call
in the try block, move the Connection creation and yield statement to an else
block (which executes only on success), and keep the FabricAdapter fallback in
the except KeyError block. This ensures that only KeyError from the
get_hostfwd_port call triggers the fallback, while caller errors are properly
propagated.
- Line 7: The hardcoded Connection(host="127.0.0.1") usage in the SSH shell
connection block (around lines 84-90) ignores the custom Hostfwd.hostaddr
setting and breaks for non-local clients. Replace this hardcoded approach by
resolving into the ssh child which is a TcpNetwork instance at line 489 that
already contains the correct hostaddr and port configuration. Reference the
FabricAdapter fallback pattern at lines 92-97 as an example of the correct
approach for properly tunneling through jumpstarter's stream instead of creating
direct connections to 127.0.0.1.

In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py`:
- Around line 385-403: After the for loop that processes usernet lines in the
hostfwd resolution block, add validation to ensure all zero-port entries were
successfully resolved. Create a set of unresolved ports by checking which
entries in zero_fwds are not present in self.parent._resolved_hostports, and
raise a RuntimeError with the list of missing keys if any are found.
Additionally, in the get_hostfwd_port() method (not shown in this diff), add a
check that raises RuntimeError if the hostport value equals 0, preventing silent
failures when unresolved ports are returned to callers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6d8109b7-9f5f-4cf2-bbc8-e2b176dad19a

📥 Commits

Reviewing files that changed from the base of the PR and between fcc4776 and 83153d4.

📒 Files selected for processing (4)
  • .github/workflows/python-tests.yaml
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py
💤 Files with no reviewable changes (1)
  • .github/workflows/python-tests.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
  • python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py

from contextlib import contextmanager

import click
from fabric import Connection

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

fd jumpstarter-driver-qemu -type d

Repository: jumpstarter-dev/jumpstarter

Length of output: 242


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py | head -100

Repository: jumpstarter-dev/jumpstarter

Length of output: 4055


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py | tail -20

Repository: jumpstarter-dev/jumpstarter

Length of output: 913


🏁 Script executed:

grep -n "Connection\|127.0.0.1\|Hostfwd" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 172


🏁 Script executed:

fd jumpstarter-driver-network -type d

Repository: jumpstarter-dev/jumpstarter

Length of output: 242


🏁 Script executed:

find . -name "*jumpstarter-driver-network*" -type d 2>/dev/null | head -5

Repository: jumpstarter-dev/jumpstarter

Length of output: 116


🏁 Script executed:

find . -name "FabricAdapter*" -o -path "*/jumpstarter_driver_network/*" -name "*.py" 2>/dev/null | head -10

Repository: jumpstarter-dev/jumpstarter

Length of output: 979


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-network/jumpstarter_driver_network/adapters/fabric.py | head -60

Repository: jumpstarter-dev/jumpstarter

Length of output: 1419


🏁 Script executed:

grep -n "Hostfwd\|hostaddr" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | head -20

Repository: jumpstarter-dev/jumpstarter

Length of output: 884


🏁 Script executed:

find . -name "*.py" -type f | xargs grep -l "Hostfwd\|hostaddr" 2>/dev/null | head -10

Repository: jumpstarter-dev/jumpstarter

Length of output: 228


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | grep -A 5 -B 5 "hostfwd\|Hostfwd" | head -50

Repository: jumpstarter-dev/jumpstarter

Length of output: 2610


🏁 Script executed:

grep -n "TcpNetwork\|get_hostfwd_port" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 398


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '571,590p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 871


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '485,495p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 443


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '455,475p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 1112


🏁 Script executed:

grep -B5 -A10 "def get_hostfwd_port" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 604


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py | sed -n '78,98p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 1024


🏁 Script executed:

grep -n "export\|get_hostfwd_port\|get_hostaddr" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 480


🏁 Script executed:

grep -n "def get_\|`@export`" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | grep -A1 "def get_"

Repository: jumpstarter-dev/jumpstarter

Length of output: 305


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '569,610p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 1847


🏁 Script executed:

grep -B5 -A10 "def get_hostfwd" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 604


🏁 Script executed:

cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '431,436p'

Repository: jumpstarter-dev/jumpstarter

Length of output: 327


Use the network child for SSH shell connections instead of hardcoding 127.0.0.1.

The Connection(host="127.0.0.1") at lines 84-90 ignores the custom Hostfwd.hostaddr setting and fails for non-local clients. When the client runs on a different machine, it connects to the client's loopback instead of the server's configured host address. The proper path is to resolve into the ssh child (a TcpNetwork instance at line 489) which already has the correct hostaddr and port. The fallback pattern using FabricAdapter at lines 92-97 demonstrates the correct approach for tunneling through jumpstarter's stream.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py` at
line 7, The hardcoded Connection(host="127.0.0.1") usage in the SSH shell
connection block (around lines 84-90) ignores the custom Hostfwd.hostaddr
setting and breaks for non-local clients. Replace this hardcoded approach by
resolving into the ssh child which is a TcpNetwork instance at line 489 that
already contains the correct hostaddr and port configuration. Reference the
FabricAdapter fallback pattern at lines 92-97 as an example of the correct
approach for properly tunneling through jumpstarter's stream instead of creating
direct connections to 127.0.0.1.

Comment on lines +82 to +97
try:
port = int(self.call("get_hostfwd_port", "ssh"))
with Connection(
host="127.0.0.1",
port=port,
user=self.username,
connect_kwargs={"password": self.password},
) as conn:
yield conn
except KeyError:
with FabricAdapter(
client=self.ssh,
user=self.username,
connect_kwargs={"password": self.password},
) as conn:
yield conn

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

# First, let's find and examine the file
find . -type f -name "client.py" | grep jumpstarter-driver-qemu

Repository: jumpstarter-dev/jumpstarter

Length of output: 147


🏁 Script executed:

# Read the client.py file to see the full context
cat -n "python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py" | head -150

Repository: jumpstarter-dev/jumpstarter

Length of output: 4941


🏁 Script executed:

# Check the imports to see if contextmanager is imported
head -20 "python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py"

Repository: jumpstarter-dev/jumpstarter

Length of output: 618


Narrow the try block to avoid catching caller KeyErrors in this context manager.

In a @contextmanager, exceptions from the caller's with body resume at the yield statement. A caller that raises KeyError inside the context will be caught by the current broad try block, triggering the fallback path unintentionally and potentially masking the original error. Move the except handler to wrap only the get_hostfwd_port call using try/except/else:

Proposed fix
+        try:
+            port = int(self.call("get_hostfwd_port", "ssh"))
+        except KeyError:
+            with FabricAdapter(
+                client=self.ssh,
+                user=self.username,
+                connect_kwargs={"password": self.password},
+            ) as conn:
+                yield conn
+        else:
             with Connection(
                 host="127.0.0.1",
                 port=port,
                 user=self.username,
                 connect_kwargs={"password": self.password},
             ) as conn:
                 yield conn
-        except KeyError:
-            with FabricAdapter(
-                client=self.ssh,
-                user=self.username,
-                connect_kwargs={"password": self.password},
-            ) as conn:
-                yield conn
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py`
around lines 82 - 97, The try/except block in the context manager is too broad
and will catch KeyError exceptions raised by the caller's code inside the with
block, causing unintended fallback behavior. Restructure this using
try/except/else pattern: wrap only the self.call("get_hostfwd_port", "ssh") call
in the try block, move the Connection creation and yield statement to an else
block (which executes only on success), and keep the FabricAdapter fallback in
the except KeyError block. This ensures that only KeyError from the
get_hostfwd_port call triggers the fallback, while caller errors are properly
propagated.

Comment on lines +385 to +403
# Resolve any hostport=0 hostfwd entries to the actual port QEMU chose.
# Parse 'info usernet': lines look like "TCP[HOST_FORWARD] fd addr port addr port ..."
# Store resolved ports on the parent so get_hostfwd_port() can return them to clients.
zero_fwds = {k: v for k, v in self.parent.hostfwd.items() if v.hostport == 0}
if zero_fwds:
usernet = await qmp.execute("human-monitor-command", {"command-line": "info usernet"})
self.logger.debug("info usernet output:\n%s", usernet)
for line in usernet.splitlines():
parts = line.split()
if len(parts) >= 6 and "HOST_FORWARD" in parts[0]:
# parts: Protocol[State] fd hostaddr hostport guestaddr guestport ...
actual_hostaddr, actual_hostport, actual_guestport = parts[2], int(parts[3]), int(parts[5])
for k, v in zero_fwds.items():
if v.hostaddr == actual_hostaddr and v.guestport == actual_guestport:
self.logger.info(
"hostfwd '%s': resolved port 0 -> %s:%d (guest port %d)",
k, actual_hostaddr, actual_hostport, actual_guestport,
)
self.parent._resolved_hostports[k] = actual_hostport

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

🧩 Analysis chain

🏁 Script executed:

find . -type f -name "driver.py" | grep jumpstarter-driver-qemu

Repository: jumpstarter-dev/jumpstarter

Length of output: 147


🏁 Script executed:

wc -l python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 149


🏁 Script executed:

sed -n '380,410p' python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 1849


🏁 Script executed:

sed -n '560,580p' python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 765


🏁 Script executed:

sed -n '430,440p' python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 382


🏁 Script executed:

sed -n '460,470p' python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py

Repository: jumpstarter-dev/jumpstarter

Length of output: 553


🏁 Script executed:

rg "_resolved_hostports" python/packages/jumpstarter-driver-qemu/

Repository: jumpstarter-dev/jumpstarter

Length of output: 623


🏁 Script executed:

rg "get_hostfwd_port" python/packages/jumpstarter-driver-qemu/

Repository: jumpstarter-dev/jumpstarter

Length of output: 498


🏁 Script executed:

rg "get_hostfwd_port" --type py

Repository: jumpstarter-dev/jumpstarter

Length of output: 498


Fail fast when a dynamic hostfwd port is unresolved.

Currently, if info usernet parsing misses a hostport=0 entry, get_hostfwd_port() returns 0 to the caller, causing SSH and other services to fail silently on an invalid port. Validate that every zero-port entry resolves during VM startup, and explicitly reject unresolved ports in get_hostfwd_port().

The proposed fix is necessary and should be applied:

  • Accumulate resolved ports in a local dict and validate all zero_fwds entries are resolved; raise RuntimeError with the list of missing keys if any are unresolved.
  • In get_hostfwd_port(), check if hostport == 0 and raise RuntimeError instead of returning the invalid port.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py`
around lines 385 - 403, After the for loop that processes usernet lines in the
hostfwd resolution block, add validation to ensure all zero-port entries were
successfully resolved. Create a set of unresolved ports by checking which
entries in zero_fwds are not present in self.parent._resolved_hostports, and
raise a RuntimeError with the list of missing keys if any are found.
Additionally, in the get_hostfwd_port() method (not shown in this diff), add a
check that raises RuntimeError if the hostport value equals 0, preventing silent
failures when unresolved ports are returned to callers.

@mangelajo mangelajo requested a review from raballew June 18, 2026 12:44
@mangelajo mangelajo marked this pull request as draft June 19, 2026 10:31
@mangelajo

Copy link
Copy Markdown
Member Author

I need to review the qemu/ssh changes more carefully

Comment on lines +86 to +91
port=port,
user=self.username,
connect_kwargs={"password": self.password},
) as conn:
yield conn
except KeyError:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay this one is a little backwards but hear me out: I think shell() method catches KeyError to fall back from tcp to vsock, but this never actually works. On the driver side, get_hostfwd_port() raises KeyError, but the gRPC transport catches all exceptions and reraises them as DriverError via StatusCode.UNKNOWN but DriverError doesn't inherit from KeyError so VMs without SSH hostfwd will crash with an unhandled DriverError instead of falling back to vsock.

Comment on lines 382 to 407
Path(self.parent._pty).unlink(missing_ok=True)
Path(self.parent._pty).symlink_to(pty)

# Resolve any hostport=0 hostfwd entries to the actual port QEMU chose.
# Parse 'info usernet': lines look like "TCP[HOST_FORWARD] fd addr port addr port ..."
# Store resolved ports on the parent so get_hostfwd_port() can return them to clients.
zero_fwds = {k: v for k, v in self.parent.hostfwd.items() if v.hostport == 0}
if zero_fwds:
usernet = await qmp.execute("human-monitor-command", {"command-line": "info usernet"})
self.logger.debug("info usernet output:\n%s", usernet)
for line in usernet.splitlines():
parts = line.split()
if len(parts) >= 6 and "HOST_FORWARD" in parts[0]:
# parts: Protocol[State] fd hostaddr hostport guestaddr guestport ...
actual_hostaddr, actual_hostport, actual_guestport = parts[2], int(parts[3]), int(parts[5])
for k, v in zero_fwds.items():
if v.hostaddr == actual_hostaddr and v.guestport == actual_guestport:
self.logger.info(
"hostfwd '%s': resolved port 0 -> %s:%d (guest port %d)",
k, actual_hostaddr, actual_hostport, actual_guestport,
)
self.parent._resolved_hostports[k] = actual_hostport

await qmp.execute("system_reset")
await qmp.disconnect()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A ValueError from malformed output or a QMP error leaves the connection open and the QEMU process orphaned. You could wrap the QMP operation block in try/finally to guarantee await qmp.disconnect() runs.

Comment on lines 431 to 437
class Hostfwd(BaseModel):
protocol: Literal["tcp"] = "tcp"
hostaddr: str = "127.0.0.1"
hostport: int = Field(ge=1, le=65535)
hostport: int = Field(ge=0, le=65535) # 0 = let QEMU pick a free port
guestport: int = Field(ge=1, le=65535)


Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If on() fails after process creation but before usernet resolution completes, get_hostfwd_port() returns a stale port. I think adding self.parent._resolved_hostports.clear() at the beginning of on() before starting the new QEMU process, and also in off() alongside the existing cleanup should resolve this.

protocol: Literal["tcp"] = "tcp"
hostaddr: str = "127.0.0.1"
hostport: int = Field(ge=1, le=65535)
hostport: int = Field(ge=0, le=65535) # 0 = let QEMU pick a free port

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] hostaddr: str = "127.0.0.1" has no format validation. The value is interpolated unsanitized into a comma-separated QEMU -netdev argument. Since QEMU uses commas to delimit netdev options, a crafted hostaddr like 127.0.0.1,hostfwd=tcp:0.0.0.0:4444-:4444 would inject an additional hostfwd rule. The hostport and guestport fields have Field(ge=..., le=...) constraints, but hostaddr accepts arbitrary strings.

Suggested fix: add a Pydantic field_validator using ipaddress.ip_address() to ensure hostaddr is a valid IP address.

AI-generated, human reviewed

Comment on lines +571 to +577
def get_hostfwd_port(self, key: str) -> int:
"""Return the actual host port for a hostfwd entry (resolves port 0 assignments)."""
if key in self._resolved_hostports:
return self._resolved_hostports[key]
if key in self.hostfwd:
return self.hostfwd[key].hostport
raise KeyError(f"hostfwd key {key!r} not found")

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

[HIGH] get_hostfwd_port() is a new exported method with three code paths (resolved port, configured port, KeyError), and the info usernet parsing logic uses positional string splitting with int(parts[3]) and int(parts[5]). Neither has any unit test. The only coverage is the full integration test that boots a real VM. Extracting the parsing logic into a standalone function (e.g., _parse_usernet_hostfwd(usernet_output, zero_fwds)) would make it straightforward to add parameterized unit tests covering all three get_hostfwd_port() code paths as well as edge cases in the usernet output format.

AI-generated, human reviewed

Comment on lines 461 to 468
flash_timeout: int = field(default=30 * 60) # 30 minutes

_tmp_dir: TemporaryDirectory = field(init=False, default_factory=TemporaryDirectory)
# Maps hostfwd key -> actual host port after QEMU resolves port 0 assignments
_resolved_hostports: dict[str, int] = field(init=False, default_factory=dict)

@classmethod
def client(cls) -> str:

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Defer TcpNetwork child creation for hostport=0 entries until after port resolution in on(), or update the child's port after resolution. Use a different key namespace for hostfwd children to avoid overwriting vsock children.

Comment on lines +392 to +403
for line in usernet.splitlines():
parts = line.split()
if len(parts) >= 6 and "HOST_FORWARD" in parts[0]:
# parts: Protocol[State] fd hostaddr hostport guestaddr guestport ...
actual_hostaddr, actual_hostport, actual_guestport = parts[2], int(parts[3]), int(parts[5])
for k, v in zero_fwds.items():
if v.hostaddr == actual_hostaddr and v.guestport == actual_guestport:
self.logger.info(
"hostfwd '%s': resolved port 0 -> %s:%d (guest port %d)",
k, actual_hostaddr, actual_hostport, actual_guestport,
)
self.parent._resolved_hostports[k] = actual_hostport

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The info usernet parser uses int(parts[3]) and int(parts[5]) without try/except. A malformed line would raise `ValueError

from contextlib import contextmanager

import click
from fabric import Connection

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

from fabric import Connection is a new import added by this PR, but fabric is not listed in dependencies.

Comment on lines +86 to +97
port=port,
user=self.username,
connect_kwargs={"password": self.password},
) as conn:
yield conn
except KeyError:
with FabricAdapter(
client=self.ssh,
user=self.username,
connect_kwargs={"password": self.password},
) as conn:
yield conn

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Narrow the try/except to only wrap the self.call("get_hostfwd_port", "ssh") call, then branch on the result.

Comment on lines 110 to +111
with qemu.shell() as s:
assert s.run("uname -r").stdout.strip() == f"6.17.1-300.fc43.{arch}"
assert s.run("uname -r").stdout.strip() != ""

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also assert the return code, e.g., result = s.run("uname -r"); assert result.ok; assert result.stdout.strip() != "".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants