docs(operations): add containerized GPU workloads guide#555
Aleksei Sviridkin (lexfrei) wants to merge 1 commit into
Walkthrough

This PR adds a new operations documentation page explaining how to deploy and use the …

Changes: GPU Container Workloads Documentation
Code Review
This pull request adds a new documentation page detailing how to run containerized GPU workloads using the container variant of the cozystack.gpu-operator package. The review feedback suggests specifying the cozy-system namespace in both the kubectl patch command and the Package resource manifest to ensure they are applied to the correct namespace.
    kubectl patch packages.cozystack.io cozystack.cozystack-platform --type=json \
      -p '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "cozystack.gpu-operator"}]'
In Cozystack, the Package resources (including cozystack.cozystack-platform) are typically located in the cozy-system namespace. Running kubectl patch without specifying the namespace will fail if the user's current context is set to another namespace (like default). Adding -n cozy-system ensures the command runs successfully.
Suggested change:

Current:

    kubectl patch packages.cozystack.io cozystack.cozystack-platform --type=json \
      -p '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "cozystack.gpu-operator"}]'

Suggested:

    kubectl patch packages.cozystack.io cozystack.cozystack-platform -n cozy-system --type=json \
      -p '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "cozystack.gpu-operator"}]'
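As a hedged illustration of the suggestion above, the patch payload can be built by a small shell helper so that the package name and the namespace flag are both explicit. The function names `build_enable_patch` and `enable_package` and the `NAMESPACE` variable are hypothetical; `cozy-system` is the namespace named in the review comment and is not verified here.

```shell
#!/bin/sh
# Sketch only: helper names and the NAMESPACE default are assumptions.

# Print a JSON Patch that appends one package name to enabledPackages.
build_enable_patch() {
  printf '[{"op": "add", "path": "/spec/components/platform/values/bundles/enabledPackages/-", "value": "%s"}]' "$1"
}

# Apply the patch in an explicit namespace (cozy-system per the review comment).
enable_package() {
  kubectl patch packages.cozystack.io cozystack.cozystack-platform \
    -n "${NAMESPACE:-cozy-system}" --type=json \
    -p "$(build_enable_patch "$1")"
}
```

Running `build_enable_patch cozystack.gpu-operator` on its own prints the payload for inspection; `enable_package cozystack.gpu-operator` would issue the same patch as the suggested command.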
    apiVersion: cozystack.io/v1alpha1
    kind: Package
    metadata:
      name: cozystack.gpu-operator
    spec:
      variant: container
The Package resource needs to be created in the cozy-system namespace for the Cozystack operator to detect and reconcile it. Adding namespace: cozy-system to the metadata ensures it is applied to the correct namespace.
Suggested change:

Current:

    apiVersion: cozystack.io/v1alpha1
    kind: Package
    metadata:
      name: cozystack.gpu-operator
    spec:
      variant: container

Suggested:

    apiVersion: cozystack.io/v1alpha1
    kind: Package
    metadata:
      name: cozystack.gpu-operator
      namespace: cozy-system
    spec:
      variant: container
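A minimal sketch of how the manifest could be rendered and applied from a script, with the namespace made optional so that either form of the manifest can be produced. `render_package` and its two positional arguments are hypothetical helpers introduced for illustration, not part of Cozystack.

```shell
#!/bin/sh
# Sketch only: render_package and its arguments are hypothetical.

# Print a Package manifest for cozystack.gpu-operator.
# $1 = variant, $2 = namespace (optional; the line is omitted when empty).
render_package() {
  variant="$1"
  namespace="$2"
  echo "apiVersion: cozystack.io/v1alpha1"
  echo "kind: Package"
  echo "metadata:"
  echo "  name: cozystack.gpu-operator"
  if [ -n "$namespace" ]; then
    echo "  namespace: $namespace"
  fi
  echo "spec:"
  echo "  variant: $variant"
}
```

With a cluster available, `render_package container cozy-system | kubectl apply -f -` would apply the suggested form of the manifest; omitting the second argument renders the original form without a namespace.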
3170d45 to 8b83e54
Actionable comments posted: 0
Document the new container variant of cozystack.gpu-operator, paired with cozystack/cozystack#2766.

Covers the apt-installed-driver-and-toolkit Linux shape that the variant targets: when to pick it over the passthrough and vGPU variants, prerequisites (host driver + host nvidia-container-toolkit, validated via nvidia-smi over kubectl debug), the operator-validator host-driver auto-detect path (/host/usr/bin/nvidia-smi), Talos caveat with a pointer to the values-native-talos.yaml reference, install steps, a sample CUDA pod for verification, the variant comparison matrix, and a cross-reference to the HAMi sharing guide for tenant Kubernetes clusters.

Lands under operations/, symmetric with virtualization/gpu.md (VM passthrough on management cluster) and kubernetes/gpu-sharing.md (HAMi in tenant Kubernetes addons).

Assisted-By: Claude <noreply@anthropic.com>
Signed-off-by: Aleksei Sviridkin <f@lex.la>
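The host-driver prerequisite mentioned in the commit message can be spot-checked with a helper along these lines. This is a sketch under stated assumptions: `has_host_driver` is a hypothetical function that only tests for an executable at the path named above, and it does not reproduce how the operator validator actually performs its detection.

```shell
#!/bin/sh
# Sketch only: has_host_driver is a hypothetical helper, not operator code.

# Succeed when an executable nvidia-smi exists under the given root.
# With root=/host this checks the path named in the commit message.
has_host_driver() {
  root="${1:-/host}"
  [ -x "$root/usr/bin/nvidia-smi" ]
}
```

A node debug session started with `kubectl debug node/<node-name> -it --image=busybox` mounts the node's root filesystem at `/host`, so running `has_host_driver /host` inside it tests for the binary, and `chroot /host nvidia-smi` would then run it against the host driver. The node name and the debug image are placeholders.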
8b83e54 to b9cae43
What this PR does
Add a new operations guide describing the `container` variant of `cozystack.gpu-operator`: the architectural mode for containerized GPU workloads (CUDA pods, ML training, inference) on Linux GPU nodes that already ship the NVIDIA driver and `nvidia-container-toolkit` via the distro package manager.

The new page lands at `content/en/docs/next/operations/gpu-container-workloads.md` and rounds out the GPU documentation surface:

- … (`default` variant).
- … (`container` variant).

Content covers when to pick the variant (host driver + host toolkit prerequisite), the operator-validator host-driver auto-detect mechanism (`/host/usr/bin/nvidia-smi`), the Talos caveat with a pointer to the `examples/values-native-talos.yaml` reference, install steps with `Package` CR `variant: container`, a sample CUDA pod for verification, and a three-row variant comparison matrix.

Companion to cozystack/cozystack#2766, which adds the `container` variant itself.

Release note