test(qemu): replace Fedora Cloud image with Alpine Linux tiny image#809
test(qemu): replace Fedora Cloud image with Alpine Linux tiny image#809mangelajo wants to merge 1 commit into
Conversation
|
Warning Review limit reached
More reviews will be available in 18 minutes and 36 seconds. Learn how PR review limits work. Your organization has used up its prepaid credits, and credit purchases are no longer available. Enable the review add-on in the billing tab to keep reviews running — you're only billed for reviews past your plan's rate limits ($0.25/file). ⌛ How to resolve this issue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based credits. 🚦 How do rate limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan refill rate. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, the refill rate gradually slows as usage increases. The highest same-day bursts are limited more strictly. Please see our Fair Usage Limits Policy for further information. ℹ️ Review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (4)
📝 WalkthroughWalkthroughMigrates the QEMU integration test from Fedora Cloud to Alpine 3.22.4 UEFI tiny while introducing dynamic SSH forwarding port resolution. The driver's ChangesAlpine Migration with Dynamic SSH Port Resolution
Estimated code review effort🎯 3 (Moderate) | ⏱️ ~22 minutes Possibly related PRs
Poem
🚥 Pre-merge checks | ✅ 4 | ❌ 1❌ Failed checks (1 warning)
✅ Passed checks (4 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In @.github/workflows/python-tests.yaml:
- Around line 123-129: The combined actions/cache@v5 action for the Alpine cloud
images step creates a cache poisoning risk because it saves artifacts from pull
request events to a shared static cache key that's used across all branches.
Split this into two separate steps: one using actions/cache/restore@v5 that runs
on all events including pull requests, and another using actions/cache/save@v5
that only runs on trusted events (add a condition like if: github.event_name !=
'pull_request'). Apply this to the cache step with the key
alpine-3.22.4-uefi-tiny and path python/packages/jumpstarter-driver-qemu/images
to ensure the cache is restored in PRs but only saved by trusted workflows.
In
`@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py`:
- Around line 64-69: The current port allocation in the socket.bind() call at
lines 66-68 creates a race condition where the port is released immediately
after reservation, allowing another process to claim it before QEMU attempts to
bind for hostfwd. Implement a retry mechanism that attempts to reserve a port
with socket.bind() and then immediately use it with QEMU; if the bind fails due
to address already in use, catch the exception and retry with a new port
allocation. Repeat this process until a port is successfully reserved and used
without collision.
In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py`:
- Around line 536-538: The `runcmd` entry is constructing a shell command by
directly interpolating `self.username` and `self.password` into a single-quoted
string, which can be broken or exploited if these values contain single quotes,
newlines, or other special characters. Use Python's `shlex.quote()` function to
properly escape both `self.username` and `self.password` values before inserting
them into the shell command string in the runcmd list. This ensures that any
special characters in the credentials are safely handled as literal values
rather than being interpreted as shell syntax.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 1c4c2088-1fb5-49cd-8495-779f5d8a927e
📒 Files selected for processing (3)
.github/workflows/python-tests.yamlpython/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pypython/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py
0001659 to
e7cceca
Compare
cb01714 to
8e61e54
Compare
raballew
left a comment
There was a problem hiding this comment.
On a high level, I don't like hardcoding this in so many different places plus we are completely missing out the renovate process here. I think we will need to ensure that this image gets constantly updated as well.
b2962b3 to
83153d4
Compare
## Motivation Fedora Cloud qcow2 images used in QEMU driver tests weigh ~556 MB (x86_64) + ~519 MB (aarch64) = ~1.1 GB total, taking ~1 minute to download on every CI run. Alpine Linux 3.22.4 UEFI tiny images reduce that to ~127 MB + ~151 MB = ~278 MB — a 74% reduction — and download in ~6 s on GitHub Actions runners, making a cache unnecessary. ## Image replacement x86_64: 556 MB → 127 MB (77%) aarch64: 519 MB → 151 MB (71%) Total: ~1076 MB → ~278 MB (74%) Alpine's nocloud tiny images support UEFI boot on both architectures and use tiny-cloud (a minimal cloud-init alternative) that reads the standard NoCloud CIDATA vfat volume the QEMU driver already generates. ## Driver changes (driver.py) tiny-cloud differs from full cloud-init in two ways: 1. Hostname: tiny-cloud reads @hostname from the YAML key 'hostname' in meta-data, not 'local-hostname'. Add 'hostname' alongside 'local-hostname' so both implementations set it correctly. 2. Password: tiny-cloud ignores plain_text_passwd in the users stanza. Add a runcmd entry to set the password via chpasswd. Credentials are escaped with shlex.quote() so special characters are safe. Full cloud-init images are unaffected. hostport=0 support: Hostfwd now allows hostport=0, which tells QEMU's user-mode networking to pick a free port automatically. After QEMU starts, the driver queries the actual assigned port via QMP 'human-monitor-command info usernet', logs it at INFO level: hostfwd 'ssh': resolved port 0 -> 127.0.0.1:<port> (guest port 22) and stores it in _resolved_hostports so get_hostfwd_port() can return it to the client side. ## Client changes (client.py) shell() now checks for an 'ssh' hostfwd entry by calling get_hostfwd_port('ssh'). If present, it opens a direct fabric Connection to 127.0.0.1:<resolved_port>, bypassing the jumpstarter streaming layer entirely. If no ssh hostfwd is configured (KeyError), it falls back to the original FabricAdapter path over vsock. This is necessary because Alpine uses OpenRC, not systemd, so systemd-ssh-generator never runs and sshd only listens on TCP. Fedora cloud images ship systemd-ssh-generator (systemd >= 256) which automatically binds sshd to AF_VSOCK port 22 inside VMs — that is why the original vsock path worked with Fedora but not Alpine. ## Test changes (driver_test.py) - Pass hostfwd={'ssh': {hostport=0, guestport=22}} so qemu.shell() uses the TCP path with a QEMU-assigned port; no probe socket needed. - Wait for 'bootstrap_complete: done' on the serial console before attempting login, guaranteeing tiny-cloud has set the password and sshd is ready (avoids the race that existed with Fedora too, where the extra setenforce console round-trip happened to provide enough delay). - Press Enter if the GRUB countdown appears to skip the 10 s wait. - Update post-login prompt from bash '[user@host ~]$' to Alpine ash 'host:~$'. - Remove 'sudo setenforce 0' (no SELinux on Alpine). - Relax 'uname -r' assertion to a non-empty check. ## CI changes (python-tests.yaml) Remove the actions/cache steps entirely. Alpine images download in ~6 s on GitHub runners (measured), which is faster than cache restore overhead for files this size.
83153d4 to
9a25775
Compare
no real need for this, this does not go into production or anything, is just an image that we boot to run the tests. But that's an existing problem from the past, nothing new. |
| run: | | ||
| brew install renode/tap/renode | ||
|
|
||
| - name: Cache Fedora Cloud images |
There was a problem hiding this comment.
the cache was actually slower than the download.
| # If the driver has an 'ssh' hostfwd entry, fetch the actual host port | ||
| # (resolving any port=0 assignment) and connect directly over TCP. | ||
| # Otherwise fall back to tunnelling through the jumpstarter stream (vsock). | ||
| try: |
There was a problem hiding this comment.
we had to implement this because alpine does not setup ssh over vhost, and we didn't want to pick a port that could potentially conflict with whatever is on the testing host.
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py`:
- Around line 82-97: The try/except block in the context manager is too broad
and will catch KeyError exceptions raised by the caller's code inside the with
block, causing unintended fallback behavior. Restructure this using
try/except/else pattern: wrap only the self.call("get_hostfwd_port", "ssh") call
in the try block, move the Connection creation and yield statement to an else
block (which executes only on success), and keep the FabricAdapter fallback in
the except KeyError block. This ensures that only KeyError from the
get_hostfwd_port call triggers the fallback, while caller errors are properly
propagated.
- Line 7: The hardcoded Connection(host="127.0.0.1") usage in the SSH shell
connection block (around lines 84-90) ignores the custom Hostfwd.hostaddr
setting and breaks for non-local clients. Replace this hardcoded approach by
resolving into the ssh child which is a TcpNetwork instance at line 489 that
already contains the correct hostaddr and port configuration. Reference the
FabricAdapter fallback pattern at lines 92-97 as an example of the correct
approach for properly tunneling through jumpstarter's stream instead of creating
direct connections to 127.0.0.1.
In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py`:
- Around line 385-403: After the for loop that processes usernet lines in the
hostfwd resolution block, add validation to ensure all zero-port entries were
successfully resolved. Create a set of unresolved ports by checking which
entries in zero_fwds are not present in self.parent._resolved_hostports, and
raise a RuntimeError with the list of missing keys if any are found.
Additionally, in the get_hostfwd_port() method (not shown in this diff), add a
check that raises RuntimeError if the hostport value equals 0, preventing silent
failures when unresolved ports are returned to callers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 6d8109b7-9f5f-4cf2-bbc8-e2b176dad19a
📒 Files selected for processing (4)
.github/workflows/python-tests.yamlpython/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.pypython/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pypython/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py
💤 Files with no reviewable changes (1)
- .github/workflows/python-tests.yaml
🚧 Files skipped from review as they are similar to previous changes (1)
- python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver_test.py
| from contextlib import contextmanager | ||
|
|
||
| import click | ||
| from fabric import Connection |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
fd jumpstarter-driver-qemu -type dRepository: jumpstarter-dev/jumpstarter
Length of output: 242
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py | head -100Repository: jumpstarter-dev/jumpstarter
Length of output: 4055
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py | tail -20Repository: jumpstarter-dev/jumpstarter
Length of output: 913
🏁 Script executed:
grep -n "Connection\|127.0.0.1\|Hostfwd" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 172
🏁 Script executed:
fd jumpstarter-driver-network -type dRepository: jumpstarter-dev/jumpstarter
Length of output: 242
🏁 Script executed:
find . -name "*jumpstarter-driver-network*" -type d 2>/dev/null | head -5Repository: jumpstarter-dev/jumpstarter
Length of output: 116
🏁 Script executed:
find . -name "FabricAdapter*" -o -path "*/jumpstarter_driver_network/*" -name "*.py" 2>/dev/null | head -10Repository: jumpstarter-dev/jumpstarter
Length of output: 979
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-network/jumpstarter_driver_network/adapters/fabric.py | head -60Repository: jumpstarter-dev/jumpstarter
Length of output: 1419
🏁 Script executed:
grep -n "Hostfwd\|hostaddr" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | head -20Repository: jumpstarter-dev/jumpstarter
Length of output: 884
🏁 Script executed:
find . -name "*.py" -type f | xargs grep -l "Hostfwd\|hostaddr" 2>/dev/null | head -10Repository: jumpstarter-dev/jumpstarter
Length of output: 228
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | grep -A 5 -B 5 "hostfwd\|Hostfwd" | head -50Repository: jumpstarter-dev/jumpstarter
Length of output: 2610
🏁 Script executed:
grep -n "TcpNetwork\|get_hostfwd_port" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 398
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '571,590p'Repository: jumpstarter-dev/jumpstarter
Length of output: 871
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '485,495p'Repository: jumpstarter-dev/jumpstarter
Length of output: 443
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '455,475p'Repository: jumpstarter-dev/jumpstarter
Length of output: 1112
🏁 Script executed:
grep -B5 -A10 "def get_hostfwd_port" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 604
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py | sed -n '78,98p'Repository: jumpstarter-dev/jumpstarter
Length of output: 1024
🏁 Script executed:
grep -n "export\|get_hostfwd_port\|get_hostaddr" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 480
🏁 Script executed:
grep -n "def get_\|`@export`" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | grep -A1 "def get_"Repository: jumpstarter-dev/jumpstarter
Length of output: 305
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '569,610p'Repository: jumpstarter-dev/jumpstarter
Length of output: 1847
🏁 Script executed:
grep -B5 -A10 "def get_hostfwd" python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 604
🏁 Script executed:
cat -n python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py | sed -n '431,436p'Repository: jumpstarter-dev/jumpstarter
Length of output: 327
Use the network child for SSH shell connections instead of hardcoding 127.0.0.1.
The Connection(host="127.0.0.1") at lines 84-90 ignores the custom Hostfwd.hostaddr setting and fails for non-local clients. When the client runs on a different machine, it connects to the client's loopback instead of the server's configured host address. The proper path is to resolve into the ssh child (a TcpNetwork instance at line 489) which already has the correct hostaddr and port. The fallback pattern using FabricAdapter at lines 92-97 demonstrates the correct approach for tunneling through jumpstarter's stream.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py` at
line 7, The hardcoded Connection(host="127.0.0.1") usage in the SSH shell
connection block (around lines 84-90) ignores the custom Hostfwd.hostaddr
setting and breaks for non-local clients. Replace this hardcoded approach by
resolving into the ssh child which is a TcpNetwork instance at line 489 that
already contains the correct hostaddr and port configuration. Reference the
FabricAdapter fallback pattern at lines 92-97 as an example of the correct
approach for properly tunneling through jumpstarter's stream instead of creating
direct connections to 127.0.0.1.
| try: | ||
| port = int(self.call("get_hostfwd_port", "ssh")) | ||
| with Connection( | ||
| host="127.0.0.1", | ||
| port=port, | ||
| user=self.username, | ||
| connect_kwargs={"password": self.password}, | ||
| ) as conn: | ||
| yield conn | ||
| except KeyError: | ||
| with FabricAdapter( | ||
| client=self.ssh, | ||
| user=self.username, | ||
| connect_kwargs={"password": self.password}, | ||
| ) as conn: | ||
| yield conn |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
# First, let's find and examine the file
find . -type f -name "client.py" | grep jumpstarter-driver-qemuRepository: jumpstarter-dev/jumpstarter
Length of output: 147
🏁 Script executed:
# Read the client.py file to see the full context
cat -n "python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py" | head -150Repository: jumpstarter-dev/jumpstarter
Length of output: 4941
🏁 Script executed:
# Check the imports to see if contextmanager is imported
head -20 "python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py"Repository: jumpstarter-dev/jumpstarter
Length of output: 618
Narrow the try block to avoid catching caller KeyErrors in this context manager.
In a @contextmanager, exceptions from the caller's with body resume at the yield statement. A caller that raises KeyError inside the context will be caught by the current broad try block, triggering the fallback path unintentionally and potentially masking the original error. Move the except handler to wrap only the get_hostfwd_port call using try/except/else:
Proposed fix
+ try:
+ port = int(self.call("get_hostfwd_port", "ssh"))
+ except KeyError:
+ with FabricAdapter(
+ client=self.ssh,
+ user=self.username,
+ connect_kwargs={"password": self.password},
+ ) as conn:
+ yield conn
+ else:
with Connection(
host="127.0.0.1",
port=port,
user=self.username,
connect_kwargs={"password": self.password},
) as conn:
yield conn
- except KeyError:
- with FabricAdapter(
- client=self.ssh,
- user=self.username,
- connect_kwargs={"password": self.password},
- ) as conn:
- yield conn🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/client.py`
around lines 82 - 97, The try/except block in the context manager is too broad
and will catch KeyError exceptions raised by the caller's code inside the with
block, causing unintended fallback behavior. Restructure this using
try/except/else pattern: wrap only the self.call("get_hostfwd_port", "ssh") call
in the try block, move the Connection creation and yield statement to an else
block (which executes only on success), and keep the FabricAdapter fallback in
the except KeyError block. This ensures that only KeyError from the
get_hostfwd_port call triggers the fallback, while caller errors are properly
propagated.
| # Resolve any hostport=0 hostfwd entries to the actual port QEMU chose. | ||
| # Parse 'info usernet': lines look like "TCP[HOST_FORWARD] fd addr port addr port ..." | ||
| # Store resolved ports on the parent so get_hostfwd_port() can return them to clients. | ||
| zero_fwds = {k: v for k, v in self.parent.hostfwd.items() if v.hostport == 0} | ||
| if zero_fwds: | ||
| usernet = await qmp.execute("human-monitor-command", {"command-line": "info usernet"}) | ||
| self.logger.debug("info usernet output:\n%s", usernet) | ||
| for line in usernet.splitlines(): | ||
| parts = line.split() | ||
| if len(parts) >= 6 and "HOST_FORWARD" in parts[0]: | ||
| # parts: Protocol[State] fd hostaddr hostport guestaddr guestport ... | ||
| actual_hostaddr, actual_hostport, actual_guestport = parts[2], int(parts[3]), int(parts[5]) | ||
| for k, v in zero_fwds.items(): | ||
| if v.hostaddr == actual_hostaddr and v.guestport == actual_guestport: | ||
| self.logger.info( | ||
| "hostfwd '%s': resolved port 0 -> %s:%d (guest port %d)", | ||
| k, actual_hostaddr, actual_hostport, actual_guestport, | ||
| ) | ||
| self.parent._resolved_hostports[k] = actual_hostport |
There was a problem hiding this comment.
🧩 Analysis chain
🏁 Script executed:
find . -type f -name "driver.py" | grep jumpstarter-driver-qemuRepository: jumpstarter-dev/jumpstarter
Length of output: 147
🏁 Script executed:
wc -l python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 149
🏁 Script executed:
sed -n '380,410p' python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 1849
🏁 Script executed:
sed -n '560,580p' python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 765
🏁 Script executed:
sed -n '430,440p' python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 382
🏁 Script executed:
sed -n '460,470p' python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.pyRepository: jumpstarter-dev/jumpstarter
Length of output: 553
🏁 Script executed:
rg "_resolved_hostports" python/packages/jumpstarter-driver-qemu/Repository: jumpstarter-dev/jumpstarter
Length of output: 623
🏁 Script executed:
rg "get_hostfwd_port" python/packages/jumpstarter-driver-qemu/Repository: jumpstarter-dev/jumpstarter
Length of output: 498
🏁 Script executed:
rg "get_hostfwd_port" --type pyRepository: jumpstarter-dev/jumpstarter
Length of output: 498
Fail fast when a dynamic hostfwd port is unresolved.
Currently, if info usernet parsing misses a hostport=0 entry, get_hostfwd_port() returns 0 to the caller, causing SSH and other services to fail silently on an invalid port. Validate that every zero-port entry resolves during VM startup, and explicitly reject unresolved ports in get_hostfwd_port().
The proposed fix is necessary and should be applied:
- Accumulate resolved ports in a local dict and validate all
zero_fwdsentries are resolved; raiseRuntimeErrorwith the list of missing keys if any are unresolved. - In
get_hostfwd_port(), check ifhostport == 0and raiseRuntimeErrorinstead of returning the invalid port.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/packages/jumpstarter-driver-qemu/jumpstarter_driver_qemu/driver.py`
around lines 385 - 403, After the for loop that processes usernet lines in the
hostfwd resolution block, add validation to ensure all zero-port entries were
successfully resolved. Create a set of unresolved ports by checking which
entries in zero_fwds are not present in self.parent._resolved_hostports, and
raise a RuntimeError with the list of missing keys if any are found.
Additionally, in the get_hostfwd_port() method (not shown in this diff), add a
check that raises RuntimeError if the hostport value equals 0, preventing silent
failures when unresolved ports are returned to callers.
|
I need to review the qemu/ssh changes more carefully |
| port=port, | ||
| user=self.username, | ||
| connect_kwargs={"password": self.password}, | ||
| ) as conn: | ||
| yield conn | ||
| except KeyError: |
There was a problem hiding this comment.
Okay this one is a little backwards but hear me out: I think shell() method catches KeyError to fall back from tcp to vsock, but this never actually works. On the driver side, get_hostfwd_port() raises KeyError, but the gRPC transport catches all exceptions and reraises them as DriverError via StatusCode.UNKNOWN but DriverError doesn't inherit from KeyError so VMs without SSH hostfwd will crash with an unhandled DriverError instead of falling back to vsock.
| Path(self.parent._pty).unlink(missing_ok=True) | ||
| Path(self.parent._pty).symlink_to(pty) | ||
|
|
||
| # Resolve any hostport=0 hostfwd entries to the actual port QEMU chose. | ||
| # Parse 'info usernet': lines look like "TCP[HOST_FORWARD] fd addr port addr port ..." | ||
| # Store resolved ports on the parent so get_hostfwd_port() can return them to clients. | ||
| zero_fwds = {k: v for k, v in self.parent.hostfwd.items() if v.hostport == 0} | ||
| if zero_fwds: | ||
| usernet = await qmp.execute("human-monitor-command", {"command-line": "info usernet"}) | ||
| self.logger.debug("info usernet output:\n%s", usernet) | ||
| for line in usernet.splitlines(): | ||
| parts = line.split() | ||
| if len(parts) >= 6 and "HOST_FORWARD" in parts[0]: | ||
| # parts: Protocol[State] fd hostaddr hostport guestaddr guestport ... | ||
| actual_hostaddr, actual_hostport, actual_guestport = parts[2], int(parts[3]), int(parts[5]) | ||
| for k, v in zero_fwds.items(): | ||
| if v.hostaddr == actual_hostaddr and v.guestport == actual_guestport: | ||
| self.logger.info( | ||
| "hostfwd '%s': resolved port 0 -> %s:%d (guest port %d)", | ||
| k, actual_hostaddr, actual_hostport, actual_guestport, | ||
| ) | ||
| self.parent._resolved_hostports[k] = actual_hostport | ||
|
|
||
| await qmp.execute("system_reset") | ||
| await qmp.disconnect() | ||
|
|
There was a problem hiding this comment.
A ValueError from malformed output or a QMP error leaves the connection open and the QEMU process orphaned. You could wrap the QMP operation block in try/finally to guarantee await qmp.disconnect() runs.
| class Hostfwd(BaseModel): | ||
| protocol: Literal["tcp"] = "tcp" | ||
| hostaddr: str = "127.0.0.1" | ||
| hostport: int = Field(ge=1, le=65535) | ||
| hostport: int = Field(ge=0, le=65535) # 0 = let QEMU pick a free port | ||
| guestport: int = Field(ge=1, le=65535) | ||
|
|
||
|
|
There was a problem hiding this comment.
If on() fails after process creation but before usernet resolution completes, get_hostfwd_port() returns a stale port. I think adding self.parent._resolved_hostports.clear() at the beginning of on() before starting the new QEMU process, and also in off() alongside the existing cleanup should resolve this.
| protocol: Literal["tcp"] = "tcp" | ||
| hostaddr: str = "127.0.0.1" | ||
| hostport: int = Field(ge=1, le=65535) | ||
| hostport: int = Field(ge=0, le=65535) # 0 = let QEMU pick a free port |
There was a problem hiding this comment.
[HIGH] hostaddr: str = "127.0.0.1" has no format validation. The value is interpolated unsanitized into a comma-separated QEMU -netdev argument. Since QEMU uses commas to delimit netdev options, a crafted hostaddr like 127.0.0.1,hostfwd=tcp:0.0.0.0:4444-:4444 would inject an additional hostfwd rule. The hostport and guestport fields have Field(ge=..., le=...) constraints, but hostaddr accepts arbitrary strings.
Suggested fix: add a Pydantic field_validator using ipaddress.ip_address() to ensure hostaddr is a valid IP address.
AI-generated, human reviewed
| def get_hostfwd_port(self, key: str) -> int: | ||
| """Return the actual host port for a hostfwd entry (resolves port 0 assignments).""" | ||
| if key in self._resolved_hostports: | ||
| return self._resolved_hostports[key] | ||
| if key in self.hostfwd: | ||
| return self.hostfwd[key].hostport | ||
| raise KeyError(f"hostfwd key {key!r} not found") |
There was a problem hiding this comment.
[HIGH] get_hostfwd_port() is a new exported method with three code paths (resolved port, configured port, KeyError), and the info usernet parsing logic uses positional string splitting with int(parts[3]) and int(parts[5]). Neither has any unit test. The only coverage is the full integration test that boots a real VM. Extracting the parsing logic into a standalone function (e.g., _parse_usernet_hostfwd(usernet_output, zero_fwds)) would make it straightforward to add parameterized unit tests covering all three get_hostfwd_port() code paths as well as edge cases in the usernet output format.
AI-generated, human reviewed
| flash_timeout: int = field(default=30 * 60) # 30 minutes | ||
|
|
||
| _tmp_dir: TemporaryDirectory = field(init=False, default_factory=TemporaryDirectory) | ||
| # Maps hostfwd key -> actual host port after QEMU resolves port 0 assignments | ||
| _resolved_hostports: dict[str, int] = field(init=False, default_factory=dict) | ||
|
|
||
| @classmethod | ||
| def client(cls) -> str: |
There was a problem hiding this comment.
Defer TcpNetwork child creation for hostport=0 entries until after port resolution in on(), or update the child's port after resolution. Use a different key namespace for hostfwd children to avoid overwriting vsock children.
| for line in usernet.splitlines(): | ||
| parts = line.split() | ||
| if len(parts) >= 6 and "HOST_FORWARD" in parts[0]: | ||
| # parts: Protocol[State] fd hostaddr hostport guestaddr guestport ... | ||
| actual_hostaddr, actual_hostport, actual_guestport = parts[2], int(parts[3]), int(parts[5]) | ||
| for k, v in zero_fwds.items(): | ||
| if v.hostaddr == actual_hostaddr and v.guestport == actual_guestport: | ||
| self.logger.info( | ||
| "hostfwd '%s': resolved port 0 -> %s:%d (guest port %d)", | ||
| k, actual_hostaddr, actual_hostport, actual_guestport, | ||
| ) | ||
| self.parent._resolved_hostports[k] = actual_hostport |
There was a problem hiding this comment.
The info usernet parser uses int(parts[3]) and int(parts[5]) without try/except. A malformed line would raise `ValueError
| from contextlib import contextmanager | ||
|
|
||
| import click | ||
| from fabric import Connection |
There was a problem hiding this comment.
from fabric import Connection is a new import added by this PR, but fabric is not listed in dependencies.
| port=port, | ||
| user=self.username, | ||
| connect_kwargs={"password": self.password}, | ||
| ) as conn: | ||
| yield conn | ||
| except KeyError: | ||
| with FabricAdapter( | ||
| client=self.ssh, | ||
| user=self.username, | ||
| connect_kwargs={"password": self.password}, | ||
| ) as conn: | ||
| yield conn |
There was a problem hiding this comment.
Narrow the try/except to only wrap the self.call("get_hostfwd_port", "ssh") call, then branch on the result.
| with qemu.shell() as s: | ||
| assert s.run("uname -r").stdout.strip() == f"6.17.1-300.fc43.{arch}" | ||
| assert s.run("uname -r").stdout.strip() != "" |
There was a problem hiding this comment.
Also assert the return code, e.g., result = s.run("uname -r"); assert result.ok; assert result.stdout.strip() != "".
Motivation
Fedora Cloud qcow2 images used in QEMU driver tests weigh ~556 MB (x86_64) + ~519 MB (aarch64) = ~1.1 GB total, taking ~1 minute to download on every CI run. Alpine Linux 3.22.4 UEFI tiny images reduce that to ~278 MB total — a 74% reduction — and download in ~6 s on GitHub Actions runners, making a cache unnecessary.
Alpine's
nocloudtiny images support UEFI boot on both architectures and usetiny-cloud(a minimal cloud-init alternative) that reads the standard NoCloud CIDATA vfat volume the QEMU driver already generates.Driver changes (
driver.py)tiny-clouddiffers from full cloud-init in two ways:Hostname: tiny-cloud reads
@hostnamefrom the YAML keyhostnameinmeta-data, notlocal-hostname. Addhostnamealongsidelocal-hostnameso both implementations set it correctly.Password: tiny-cloud ignores
plain_text_passwdin theusersstanza. Add aruncmdentry to set the password viachpasswd. Credentials are escaped withshlex.quote()so special characters are safe. Full cloud-init images are unaffected.hostport=0support:Hostfwdnow allowshostport=0, which tells QEMU's user-mode networking to pick a free port automatically. After QEMU starts, the driver queries the actual assigned port via QMPhuman-monitor-command info usernet, logs it at INFO level:and stores it in
_resolved_hostportssoget_hostfwd_port()can return it to the client side.Client changes (
client.py)shell()now checks for ansshhostfwd entry by callingget_hostfwd_port('ssh'). If present, it opens a directfabric.Connectionto127.0.0.1:<resolved_port>, bypassing the jumpstarter streaming layer entirely. If no ssh hostfwd is configured (KeyError), it falls back to the originalFabricAdapterpath over vsock.This is necessary because Alpine uses OpenRC, not systemd, so
systemd-ssh-generatornever runs and sshd only listens on TCP. Fedora cloud images shipsystemd-ssh-generator(systemd ≥ 256) which automatically binds sshd toAF_VSOCKport 22 inside VMs — that is why the original vsock path worked with Fedora but not Alpine.Test changes (
driver_test.py)hostfwd={'ssh': {'hostport': 0, 'guestport': 22}}soqemu.shell()uses the TCP path with a QEMU-assigned port; no probe socket needed.bootstrap_complete: doneon the serial console before attempting login, guaranteeing tiny-cloud has set the password and sshd is ready (avoids the race that existed with Fedora too, where the extrasetenforceconsole round-trip happened to provide enough delay).[user@host ~]$→ Alpine ashhost:~$.sudo setenforce 0(no SELinux on Alpine).uname -rassertion to a non-empty check.CI changes (
python-tests.yaml)Remove the
actions/cachesteps entirely. Alpine images download in ~6 s on GitHub runners (measured from CI logs), which is faster than cache restore overhead for files this size.