
docs(gpu): drop manual KubeVirt patch step now that the platform auto-wires permittedHostDevices #556

Open
Aleksei Sviridkin (lexfrei) wants to merge 3 commits into main from feat/gpu-auto-wiring



@lexfrei Aleksei Sviridkin (lexfrei) commented May 28, 2026

Companion to cozystack/cozystack#2768.

Rewrites step 2 of the GPU Passthrough guide. Until now the page instructed operators to run kubectl edit kubevirt -n cozy-kubevirt and hand-paste a permittedHostDevices.pciHostDevices block — that is the friction that ticket #2765 asked the platform to remove. With cozystack/cozystack#2768 landed, the bundle mirrors the chosen GPU variant into the KubeVirt CR automatically: HostDevices is appended to the feature-gate list and a starter NVIDIA pciHostDevices table (Hopper, Ada Lovelace, Ampere, Turing, Volta) is rendered alongside the operator's .gpu.permittedHostDevices extensions.

The new step 2 documents:

  • The contract — what the platform auto-injects and why (HostDevices gate, NVIDIA default table, externalResourceProvider: true semantics).
  • How to verify (kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml | yq ...).
  • The escape hatch — gpu.replaceDefaults, gpu.permittedHostDevices.pciHostDevices, plus the consequence of replaceDefaults: true with an empty list (no admittable GPU VMs).
  • The manual Package-CR override path — when an operator hand-crafts cozystack.gpu-operator outside the bundle for advanced overrides, they also hand-craft cozystack.kubevirt with the matching extraFeatureGates / permittedHostDevices. The manual override takes precedence over the bundle render.
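
The extend-or-replace contract in the list above can be modelled in a few lines. This is only a sketch of the semantics as described in this PR, not the chart's actual template logic: `render_pci_host_devices`, `STARTER_DEFAULTS`, the example device IDs and the example resource names are hypothetical, and placing defaults before extensions is an assumption.

```python
from typing import Any

# Hypothetical stand-in for the platform's starter NVIDIA table. The real table
# is rendered by the bundle; this single made-up entry only keeps the sketch short.
STARTER_DEFAULTS: list[dict[str, Any]] = [
    {
        "pciVendorSelector": "10DE:0000",  # 10DE is NVIDIA's vendor ID; the device ID is made up
        "resourceName": "nvidia.com/EXAMPLE_DEFAULT",
        "externalResourceProvider": True,
    },
]


def render_pci_host_devices(
    extensions: list[dict[str, Any]],
    replace_defaults: bool = False,
    defaults: list[dict[str, Any]] = STARTER_DEFAULTS,
) -> list[dict[str, Any]]:
    """Model the assumed merge: extend the defaults, or replace them outright."""
    if replace_defaults:
        # Only the operator-supplied entries survive; an empty list stays empty.
        return list(extensions)
    # Assumed ordering: starter defaults first, operator extensions after them.
    return list(defaults) + list(extensions)
```

With `replace_defaults=True` and an empty extension list the result is empty, which corresponds to the case called out above where no GPU VM can be admitted.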

Only next/virtualization/gpu.md is touched. The released doc versions (v1.4 and earlier) describe earlier Cozystack releases that still require the manual kubectl edit, and stay as-is.

Release note

docs(gpu): the GPU Passthrough guide no longer instructs operators to manually patch the KubeVirt CR — Cozystack now auto-wires the HostDevices feature gate and a starter NVIDIA permittedHostDevices table whenever cozystack.gpu-operator is enabled in bundles.enabledPackages. Operators extend or replace the defaults via .gpu.permittedHostDevices and .gpu.replaceDefaults.

Summary by CodeRabbit

  • Documentation
    • Updated GPU virtualization guide to remove a manual edit step and describe automated configuration when the GPU package is enabled.
    • Added verification commands for rendered GPU settings and guidance to customize or replace NVIDIA defaults via package values.
    • Included upgrade instructions for prior manual setups and documented a manual override path for non-bundled deployments.

netlify Bot commented May 28, 2026

Deploy Preview for cozystack ready!

Name Link
🔨 Latest commit 3e5b504
🔍 Latest deploy log https://app.netlify.com/projects/cozystack/deploys/6a229e96ce82490008178c81
😎 Deploy Preview https://deploy-preview-556--cozystack.netlify.app


coderabbitai Bot commented May 28, 2026


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: 5b7d701f-1eb8-40fe-b5a4-48ecc633dcd2

📥 Commits

Reviewing files that changed from the base of the PR and between cabf50d and 3e5b504.

📒 Files selected for processing (1)
  • content/en/docs/next/virtualization/gpu.md
📝 Walkthrough

Walkthrough

This PR updates GPU configuration documentation for KubeVirt, replacing manual CR editing instructions with automated management via the cozystack.gpu-operator package. It adds verification commands, customization guidance via Platform Package values, upgrade instructions for existing users, and documents the manual override path for bundle opt-out scenarios.

Changes

GPU Configuration Workflow

| Layer / File(s) | Summary |
| --- | --- |
| Automatic wiring and informational note (content/en/docs/next/virtualization/gpu.md) | Describes automatic injection of the HostDevices feature gate and generation of a default spec.configuration.permittedHostDevices.pciHostDevices allowlist for NVIDIA GPUs; adds a kubectl + jq verification snippet and an informational callout that manual edits are reconciled. |
| Extend or replace NVIDIA defaults via Platform Package values (content/en/docs/next/virtualization/gpu.md) | Documents how to extend or fully replace the generated NVIDIA defaults using Platform Package values, describes replaceDefaults semantics, and gives an example with pciVendorSelector, resourceName, and externalResourceProvider. |
| Upgrade guidance from hand-edited CRs (content/en/docs/next/virtualization/gpu.md) | Provides steps to migrate previously hand-edited spec.configuration.permittedHostDevices entries into platform values (optionally using replaceDefaults: true) and guidance to validate resourceName against node-advertised values to avoid admission failures. |
| Manual Package-CR override path (content/en/docs/next/virtualization/gpu.md) | Documents the manual override workflow when opting out of bundle rendering: create a cozystack.kubevirt Package CR to supply HostDevices and permittedHostDevices, noting this manual CR takes precedence over bundle rendering. |

Estimated code review effort

🎯 1 (Trivial) | ⏱️ ~3 minutes

Possibly related issues

Poem

🐰 A rabbit's gentle paw adjusts the GPU dials,
Where auto-wired configs now guide through trials,
No more kubectl edits, no manual blues—
Defaults extended, upgrades traced with clues,
Hop on, the cluster hums and smoothly compiles.

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Description Check | ✅ Passed | Check skipped - CodeRabbit’s high-level summary is enabled. |
| Title check | ✅ Passed | The title accurately describes the main change: removing the manual kubectl edit step and replacing it with automatic platform wiring of GPU permittedHostDevices configuration. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request updates the GPU virtualization documentation to explain that Cozystack now automatically configures and wires KubeVirt when the GPU operator is enabled. It details the automatic injection of host devices, how to extend or replace NVIDIA defaults, and the manual Package-CR override path. The review feedback suggests improving command portability by replacing yq with jq in the verification step, and correcting the configuration path from components.kubevirt.values to spec.values for standalone Package CRs.

Comment on lines +115 to +116
kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml \
| yq '.spec.configuration | {featureGates: .developerConfiguration.featureGates, permittedHostDevices: .permittedHostDevices}'
medium

Using yq can sometimes lead to compatibility issues depending on whether the user has the Python-based yq (which supports full jq syntax) or the Go-based yq (which has a different expression syntax) installed.

Using kubectl ... -o json | jq ... is much more portable, standard, and guaranteed to work across different environments since jq is universally standardized.

Suggested change
- kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml \
- | yq '.spec.configuration | {featureGates: .developerConfiguration.featureGates, permittedHostDevices: .permittedHostDevices}'
+ kubectl -n cozy-kubevirt get kubevirt kubevirt -o json \
+ | jq '.spec.configuration | {featureGates: .developerConfiguration.featureGates, permittedHostDevices: .permittedHostDevices}'
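
Where neither `yq` flavour nor `jq` is available, the same projection can be done in Python. A minimal sketch, assuming the output of `kubectl -n cozy-kubevirt get kubevirt kubevirt -o json` has already been parsed into a dict; the function names are illustrative and not part of any tooling in this PR.

```python
from typing import Any


def summarize_kubevirt(cr: dict[str, Any]) -> dict[str, Any]:
    """Project a KubeVirt CR onto the two fields the verification step inspects."""
    config = cr.get("spec", {}).get("configuration", {})
    return {
        # Missing keys become None, mirroring the null that jq would print.
        "featureGates": config.get("developerConfiguration", {}).get("featureGates"),
        "permittedHostDevices": config.get("permittedHostDevices"),
    }


def has_host_devices_gate(cr: dict[str, Any]) -> bool:
    """Report whether the HostDevices feature gate is present in the CR."""
    gates = summarize_kubevirt(cr)["featureGates"] or []
    return "HostDevices" in gates
```

A caller would feed it `json.load(sys.stdin)` at the end of the `kubectl ... -o json` pipeline and print or assert on the result.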


### Manual Package-CR override path

If you opt out of bundle management and hand-craft a `cozystack.gpu-operator` Package CR directly (to apply overrides the bundle does not expose — driver settings, custom node selectors, validator / dcgmExporter tweaks), the platform does NOT auto-wire `HostDevices` or `permittedHostDevices` into the KubeVirt CR. In that flow, mirror the bundle behaviour by also creating a `cozystack.kubevirt` Package CR with `components.kubevirt.values.extraFeatureGates: [HostDevices]` and the appropriate `permittedHostDevices` block. The manual Package-CR override path takes precedence over the bundle render whenever both exist.
medium

When creating a standalone cozystack.kubevirt Package CR directly, the configuration values should be defined under spec.values rather than components.kubevirt.values. The components.<name>.values structure is used when configuring components within the umbrella cozystack-platform package.

Updating this path ensures the standalone Package CR is configured correctly.

Suggested change
- If you opt out of bundle management and hand-craft a `cozystack.gpu-operator` Package CR directly (to apply overrides the bundle does not expose — driver settings, custom node selectors, validator / dcgmExporter tweaks), the platform does NOT auto-wire `HostDevices` or `permittedHostDevices` into the KubeVirt CR. In that flow, mirror the bundle behaviour by also creating a `cozystack.kubevirt` Package CR with `components.kubevirt.values.extraFeatureGates: [HostDevices]` and the appropriate `permittedHostDevices` block. The manual Package-CR override path takes precedence over the bundle render whenever both exist.
+ If you opt out of bundle management and hand-craft a `cozystack.gpu-operator` Package CR directly (to apply overrides the bundle does not expose — driver settings, custom node selectors, validator / dcgmExporter tweaks), the platform does NOT auto-wire `HostDevices` or `permittedHostDevices` into the KubeVirt CR. In that flow, mirror the bundle behaviour by also creating a `cozystack.kubevirt` Package CR with `spec.values.extraFeatureGates: [HostDevices]` and the appropriate `permittedHostDevices` block. The manual Package-CR override path takes precedence over the bundle render whenever both exist.

Member

@kvaps Andrei Kvapil (kvaps) left a comment

Requesting changes on one thing: keep a discoverable "GPU not in the default table" escape hatch — but route it through .gpu.permittedHostDevices, not kubectl edit. The rest of the rewrite is good.

I'd rather we not leave operators without a visible manual path. Two points:

  1. The reconcile-safe manual path already lives in this PR — the "Extending or replacing the NVIDIA defaults" section (.gpu.permittedHostDevices + replaceDefaults). That's the right answer for a card not in the static table, and it survives reconcile because it flows through platform values → the KubeVirt CR template. My only ask is to make it more discoverable — e.g. a short FAQ entry / callout titled "My GPU isn't in the default table" that links to it, since operators upgrading from the old flow will look for the removed kubectl edit step.

  2. Please don't reinstate the old kubectl edit kubevirt step verbatim behind a spoiler. Post-auto-wiring that field is owned by the chart template, so a hand edit is reverted on the next Flux/Helm reconcile — keeping it as-is would be a footgun. If we show the raw CR shape at all, it should be explicitly labelled "reference only — permittedHostDevices is reconciled from platform values; edit .gpu.permittedHostDevices instead" inside the collapsible.

This ties into the upgrade-safety request on the platform side — cozystack/cozystack#2768 (and the migration breakdown in cozystack/cozystack#2768 (comment)): operators who hand-edited permittedHostDevices need a clear, persistent migration target, and .gpu.permittedHostDevices is it. Worth surfacing the same upgrade note here too.

Net: keep the manual capability, just anchor it on the persistent knob and make it easy to find.

Contributor

@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🧹 Nitpick comments (1)
content/en/docs/next/virtualization/gpu.md (1)

147-150: ⚡ Quick win

Inconsistent kubectl command pattern.

This command uses kubectl get kubevirt -n cozy-kubevirt -o yaml without specifying the resource name, then indexes into .items[0]. However, line 115 uses kubectl get kubevirt kubevirt with the explicit resource name, which returns the object directly without needing .items[] indexing.

For consistency and clarity, use the same pattern as line 115:

📝 Suggested fix for consistency
-   kubectl get kubevirt -n cozy-kubevirt -o yaml \
-     | yq '.items[0].spec.configuration.permittedHostDevices'
+   kubectl -n cozy-kubevirt get kubevirt kubevirt -o yaml \
+     | yq '.spec.configuration.permittedHostDevices'
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@content/en/docs/next/virtualization/gpu.md` around lines 147 - 150, The
kubectl command uses the list-style invocation and then indexes into .items[0],
which is inconsistent with the explicit resource call used earlier; update the
command so it targets the specific KubeVirt resource name (same pattern as the
earlier `kubectl get kubevirt kubevirt`) and remove the need for `.items[0]`
when extracting `.spec.configuration.permittedHostDevices` to keep command style
consistent and clearer.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@content/en/docs/next/virtualization/gpu.md`:
- Line 110: Update the wording around resourceName in the
spec.configuration.permittedHostDevices.pciHostDevices paragraph to reflect the
actual slug format used by nvidia-sandbox-device-plugin (v25.x): state that
resourceName slugs are typically two-component identifiers like
`nvidia.com/GA102GL_A10` or `nvidia.com/TU104GL_T4` and clarify that optional
`<form>` and `<mem>` components may be appended for more specific devices (i.e.,
`<arch>_<model>` is the common case, with optional `_<form>_<mem>` when
present); keep the note about externalResourceProvider: true and mention the
plugin as the source of these resource names.


ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: cc18be5d-d493-4e16-a7de-27048f475ce7

📥 Commits

Reviewing files that changed from the base of the PR and between ef54f10 and 5ba523c.

📒 Files selected for processing (1)
  • content/en/docs/next/virtualization/gpu.md

Comment thread content/en/docs/next/virtualization/gpu.md Outdated
…-wires permittedHostDevices

Step 2 of the GPU Passthrough guide instructed operators to
`kubectl edit kubevirt -n cozy-kubevirt` and hand-paste a
permittedHostDevices.pciHostDevices block. cozystack/cozystack#2768
removes the need for that step: when cozystack.gpu-operator is in
bundles.enabledPackages, the platform now mirrors the chosen GPU
variant into the KubeVirt CR automatically — appending HostDevices
to the feature-gate list and rendering a starter NVIDIA pciHostDevices
table covering Hopper, Ada Lovelace, Ampere, Turing and Volta.

The new step 2 documents the contract (what the platform auto-injects
and why), the verification recipe, the escape hatch via
.gpu.permittedHostDevices / .gpu.replaceDefaults, and the manual
Package-CR override path used by operators who need overrides the
bundle does not expose (driver settings, custom node selectors,
validator / dcgmExporter tweaks) — in that flow they also hand-craft
the matching cozystack.kubevirt Package CR.

Only next/virtualization/gpu.md is updated; v1.4 and earlier
describe releases that still require the manual patch and stay
as-is.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…ostDevices

The bundle now owns spec.configuration.permittedHostDevices, so the first
reconcile after upgrade overwrites manual kubectl-edit entries with the NVIDIA
default table. Tell operators to move custom entries into
.gpu.permittedHostDevices and verify each resourceName against node-advertised
names before upgrading, since the default slugs (e.g. TU104GL_T4) differ from
legacy names (e.g. TU104GL_TESLA_T4) and a mismatch silently rejects GPU VMs.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
…d portable

Add a callout that redirects operators looking for the removed
`kubectl edit kubevirt` step to the `.gpu.permittedHostDevices` knob,
linking the extend/replace and upgrade sections so the persistent
manual path stays easy to find.

Use `kubectl -o json | jq` for the verify and dump commands — matches
the convention used across the rest of the docs and avoids the Go-yq
vs Python-yq expression-syntax drift.

Correct the resourceName slug convention to `<arch>_<model>` with
optional `_<form>_<mem>` qualifiers, and note the default table is
rendered in the passthrough (vfio-pci) variant.

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
@lexfrei
Contributor Author

Andrei Kvapil (@kvaps), review addressed. The manual capability stays, but it is now anchored on .gpu.permittedHostDevices instead of kubectl edit:

  1. Added a discoverable callout "My GPU isn't in the default table — where's the old kubectl edit kubevirt step?" linking to "Extending or replacing the NVIDIA defaults".
  2. Did not reinstate the raw kubectl edit kubevirt step; the callout explains the field is now reconciled from platform values and a live edit is reverted on the next reconcile.
  3. Added an "Upgrading from a hand-edited KubeVirt CR" section mirroring the platform-side upgrade note, including the resourceName verification and the TU104GL_T4 vs TU104GL_TESLA_T4 example.

Re-requesting review.
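
The resourceName verification from point 3 can be scripted. A minimal sketch, assuming the `pciHostDevices` entries destined for `.gpu.permittedHostDevices` and the `items` array from `kubectl get nodes -o json` are already loaded; `unadvertised_resource_names` is a hypothetical helper, not something shipped with the platform.

```python
from typing import Any, Iterable


def unadvertised_resource_names(
    pci_host_devices: Iterable[dict[str, Any]],
    nodes: Iterable[dict[str, Any]],
) -> list[str]:
    """Return configured resourceNames that no node lists under status.allocatable."""
    advertised: set[str] = set()
    for node in nodes:
        # status.allocatable maps resource names (cpu, memory, device-plugin
        # resources such as nvidia.com/...) to quantities; only the keys matter here.
        advertised.update(node.get("status", {}).get("allocatable", {}))
    configured = {entry["resourceName"] for entry in pci_host_devices}
    return sorted(configured - advertised)
```

Any name it returns, such as a legacy `nvidia.com/TU104GL_TESLA_T4` on a node that advertises `nvidia.com/TU104GL_T4`, should be corrected before upgrading.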
