docs(jep-0014): simplified virtual exporter design (meeting consensus)#810
docs(jep-0014): simplified virtual exporter design (meeting consensus)#810mangelajo wants to merge 6 commits into
Conversation
Propose a Virtual Scalable Exporter subsystem for Jumpstarter that manages pools of virtual targets with configurable autoscaling via per-provider CRDs (QEMUExporterPool, AndroidExporterPool, etc.). Co-authored-by: Cursor <cursoragent@cursor.com>
Add DD-4 explaining why per-lease parameters are not included in this JEP. The same use case is served by creating separate pools with different resource profiles, avoiding complexity across the Lease CRD, controller, pool controllers, and driver templates. Co-authored-by: Cursor <cursoragent@cursor.com>
- Clarify warm pool rationale and cold-start latency range (10-60s) - Rename minInstances/maxInstances to minWarmInstances/maxTotalInstances - Make maxTotalInstances optional (0 or omitted means unlimited) - Add Crossplane to Prior Art with rationale - Resolve scheduled leases question via existing BeginTime mechanism - Add DD-5: built-in scaling vs HPA/KEDA - Add DD-4: per-lease parameters rejected in favor of pool flavors - Add composite exporters and Corellium to Future Possibilities - Clarify instance reuse with recycleStrategy field (ExitAndReplace default) - Add language identifiers to untyped fenced code blocks - Add Apache 2.0 license footer Co-authored-by: Cursor <cursoragent@cursor.com>
New JEP files not listed in any toctree cause Sphinx build warnings, which fail the check-warnings CI job. Co-authored-by: Cursor <cursoragent@cursor.com>
Document admin, controller, and user actions for the QEMU warm-pool scenario. Adopt ExporterSet + VirtualTargetClass reference graph, simplify homogeneous QEMU pools to avoid per-instance claims, and align examples with the phased lifecycle. Co-authored-by: Cursor <cursoragent@cursor.com>
|
No actionable comments were generated in the recent review. 🎉 ℹ️ Recent review info⚙️ Run configurationConfiguration used: Organization UI Review profile: CHILL Plan: Pro Run ID: 📒 Files selected for processing (1)
✅ Files skipped from review due to trivial changes (1)
📝 WalkthroughWalkthroughAdds JEP-0014 ("Virtual Scalable Exporters", Draft), a 1578-line design document proposing a Kubernetes-native warm pool subsystem for virtual exporters. It defines ChangesJEP-0014: Virtual Scalable Exporters
Estimated code review effort🎯 2 (Simple) | ⏱️ ~15 minutes Poem
🚥 Pre-merge checks | ✅ 5✅ Passed checks (5 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 3
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
Inline comments:
In `@python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md`:
- Around line 1008-1036: The reconciliation loop pseudocode contains conditions
that check replicas < spec.maxReplicas for scaling decisions, but it does not
explicitly handle the case where maxReplicas is 0 or omitted (which per line 745
means no upper bound). Update the pseudocode in the reconciliation loop to
clarify the scaling-up conditions by adding explicit checks that skip the
maxReplicas limit when it is 0 or omitted. Specifically, modify the two scale-up
conditions (the one maintaining minAvailableReplicas and the demand-driven
scale-up) to only apply the maxReplicas comparison when maxReplicas is not 0 and
not omitted, ensuring the logic is consistent with the stated semantics.
- Around line 1182-1191: The "Unresolved Questions" section contains a question
about the exact scaling algorithm (proportional, step-based, predictive) that is
already addressed in the reconciliation loop pseudocode (lines 1013-1036), which
specifies a concrete step-based algorithm. Move the scaling algorithm question
from the "Unresolved Questions" section to the "Resolved" subsection, or
alternatively keep it in Unresolved but clarify that the algorithm strategy
(step-based) is decided while only fine-tuning details (rate limits, backoff
parameters) remain unresolved.
- Around line 500-502: The JEP document references the "native sidecar" pattern
with `restartPolicy: Always` on init containers at multiple locations (around
lines 309 and 500) but does not specify the Kubernetes version requirement. Add
explicit documentation of the minimum Kubernetes version requirement (v1.29 or
later) where the native sidecar pattern is introduced or first referenced. This
can be done by adding a note or requirement section that clearly states the
SidecarContainers feature requires Kubernetes v1.29+ to be stable and GA,
ensuring users understand the infrastructure prerequisites for implementing this
pattern.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: da4b83c6-6ed1-4e0b-b9b9-f87195a315ed
📒 Files selected for processing (2)
python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.mdpython/docs/source/contributing/jeps/index.md
| - **Exporter sidecar** (native sidecar, `restartPolicy: Always`) — starts | ||
| first, registers with `jumpstarter-controller`. | ||
| - **QEMU runtime container** — started by provisioner; exporter talks to |
There was a problem hiding this comment.
🧩 Analysis chain
🌐 Web query:
What is the current status and minimum Kubernetes version for init container restartPolicy support (KEP-3617)?
💡 Result:
The ability to specify a restartPolicy for init containers was introduced via the "SidecarContainers" feature, which became stable and enabled by default in Kubernetes v1.29 [1]. Current Status: 1. Sidecar Containers: Since Kubernetes v1.29, you can define init containers with restartPolicy: Always. These are treated as "restartable sidecar containers" that remain running for the duration of the Pod's lifecycle [1]. 2. Per-Container Restart Policy: In Kubernetes v1.34, further enhancements were introduced under the ContainerRestartRules feature (alpha). This allows for more granular control, including specifying restartPolicy and restartPolicyRules at the container level [2]. This allows, for example, an init container to be configured to run only once (restartPolicy: Never), ensuring that if it fails, the entire Pod fails, rather than repeatedly restarting the init container [2][3]. Minimum Version: - To use the basic restartable sidecar functionality (restartPolicy: Always on an init container), Kubernetes v1.29 or later is required [1]. - To use the advanced per-container restartPolicy and restartPolicyRules features, Kubernetes v1.34 or later is required (with the ContainerRestartRules feature gate enabled) [2]. KEP-3617 specifically refers to the work that enabled Sidecar Containers [4]. While initial development occurred in earlier versions (often appearing in v1.28 as a beta/feature-gated feature [4][5]), it is officially considered stable and generally available starting in v1.29 [1].
Citations:
- 1: https://kubernetes.io/docs/concepts/workloads/pods/sidecar-containers/
- 2: https://kubernetes.io/blog/2025/08/29/kubernetes-v1-34-per-container-restart-policy/
- 3: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/
- 4: Desired behavior of a startup failed restartable init container in a pod with
restartPolicyset toNeverkubernetes/kubernetes#120234 - 5: Kubernetes Skips Init Containers Beyond First During Some Pod Restarts kubernetes/kubernetes#124002
🏁 Script executed:
cd "$(git rev-parse --show-toplevel)" && \
wc -l python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.mdRepository: jumpstarter-dev/jumpstarter
Length of output: 152
🏁 Script executed:
cd "$(git rev-parse --show-toplevel)" && \
sed -n '300,320p' python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md && \
echo "---" && \
sed -n '495,510p' python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.mdRepository: jumpstarter-dev/jumpstarter
Length of output: 1731
🏁 Script executed:
cd "$(git rev-parse --show-toplevel)" && \
grep -n -i "kubernetes\|version\|k8s" python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md | head -20Repository: jumpstarter-dev/jumpstarter
Length of output: 1108
Document minimum Kubernetes version requirement for native sidecar init containers.
Lines 309 and 500 reference a "native sidecar" pattern with restartPolicy: Always on init containers. This requires Kubernetes v1.29 or later (where the SidecarContainers feature became stable and GA). The JEP should explicitly document this minimum version requirement.
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md`
around lines 500 - 502, The JEP document references the "native sidecar" pattern
with `restartPolicy: Always` on init containers at multiple locations (around
lines 309 and 500) but does not specify the Kubernetes version requirement. Add
explicit documentation of the minimum Kubernetes version requirement (v1.29 or
later) where the native sidecar pattern is introduced or first referenced. This
can be done by adding a note or requirement section that clearly states the
SidecarContainers feature requires Kubernetes v1.29+ to be stable and GA,
ensuring users understand the infrastructure prerequisites for implementing this
pattern.
| ### Reconciliation Loop | ||
|
|
||
| Each `ExporterSet` controller runs a continuous reconciliation loop, triggered by | ||
| changes to the set CR, owned Exporters, or matching Leases: | ||
|
|
||
| ```text | ||
| for each ExporterSet CR: | ||
| ownedExporters = list Exporters owned by this CR | ||
| replicas = count ownedExporters in Ready state | ||
| leasedReplicas = count ownedExporters with an active LeaseRef | ||
| availableReplicas = replicas - leasedReplicas | ||
| pendingLeases = count pending Leases matching spec.selector | ||
|
|
||
| # Invariant: maintain minAvailableReplicas warm buffer | ||
| if availableReplicas < spec.minAvailableReplicas AND replicas < spec.maxReplicas: | ||
| scale up to restore availableReplicas | ||
|
|
||
| # Demand-driven scale-up | ||
| elif pendingLeases > 0 AND replicas < spec.maxReplicas: | ||
| scale up by min(pendingLeases, spec.maxReplicas - replicas) | ||
|
|
||
| # Scale-down: excess idle replicas | ||
| elif availableReplicas > spec.minAvailableReplicas AND cooldown elapsed: | ||
| graceful scale down: | ||
| 1. set exporter.spec.enabled = false | ||
| 2. wait until leaseRef remains empty | ||
| 3. delete Pod, Exporter CR, and *VirtualTarget | ||
| (never below minAvailableReplicas) | ||
| ``` |
There was a problem hiding this comment.
Clarify maxReplicas=0 semantics in the reconciliation loop pseudocode.
Line 745 states "maxReplicas: 0 or omitted = no upper bound," but the reconciliation pseudocode (lines 1022–1027) checks replicas < spec.maxReplicas without handling the case where maxReplicas is 0 or omitted. If 0 means "no limit," the comparison should explicitly skip the limit check when maxReplicas is 0 or nil.
🔧 Suggested clarification in pseudocode
# Invariant: maintain minAvailableReplicas warm buffer
- if availableReplicas < spec.minAvailableReplicas AND replicas < spec.maxReplicas:
+ if availableReplicas < spec.minAvailableReplicas AND (spec.maxReplicas == 0 OR replicas < spec.maxReplicas):
scale up to restore availableReplicas
# Demand-driven scale-up
- elif pendingLeases > 0 AND replicas < spec.maxReplicas:
+ elif pendingLeases > 0 AND (spec.maxReplicas == 0 OR replicas < spec.maxReplicas):
scale up by min(pendingLeases, spec.maxReplicas - replicas)🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md`
around lines 1008 - 1036, The reconciliation loop pseudocode contains conditions
that check replicas < spec.maxReplicas for scaling decisions, but it does not
explicitly handle the case where maxReplicas is 0 or omitted (which per line 745
means no upper bound). Update the pseudocode in the reconciliation loop to
clarify the scaling-up conditions by adding explicit checks that skip the
maxReplicas limit when it is 0 or omitted. Specifically, modify the two scale-up
conditions (the one maintaining minAvailableReplicas and the demand-driven
scale-up) to only apply the maxReplicas comparison when maxReplicas is not 0 and
not omitted, ensuring the logic is consistent with the stated semantics.
| ## Unresolved Questions | ||
|
|
||
| - What is the exact scaling algorithm (proportional, step-based, predictive)? | ||
|
|
||
| ### Resolved | ||
|
|
||
| - **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013. | ||
| - **Lease release detection:** Controllers watch Lease objects directly. | ||
| - **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated | ||
| leases until effective. |
There was a problem hiding this comment.
Resolve or clarify the "exact scaling algorithm" unresolved question.
The "Unresolved Questions" section (line 1184) asks "What is the exact scaling algorithm (proportional, step-based, predictive)?" However, the reconciliation loop pseudocode (lines 1013–1036) already specifies a concrete step-based algorithm (scale up by min(pendingLeases, maxReplicas - replicas)). This should be moved to the "Resolved" subsection or clarified to indicate that the algorithm strategy is decided but fine-tuning details (e.g., rate limits, backoff) remain unresolved.
🔧 Suggested edit to mark resolved
## Unresolved Questions
-What is the exact scaling algorithm (proportional, step-based, predictive)?
-
### Resolved
+**Scaling algorithm:** Step-based (scale up by pending lease count, respecting `maxReplicas`; scale down after cooldown). Fine-tuning (rate limits, backoff curves) remains for implementation.
+
**Observability (JEP-0013):** ...📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| ## Unresolved Questions | |
| - What is the exact scaling algorithm (proportional, step-based, predictive)? | |
| ### Resolved | |
| - **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013. | |
| - **Lease release detection:** Controllers watch Lease objects directly. | |
| - **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated | |
| leases until effective. | |
| ## Unresolved Questions | |
| ### Resolved | |
| - **Scaling algorithm:** Step-based (scale up by pending lease count, respecting `maxReplicas`; scale down after cooldown). Fine-tuning (rate limits, backoff curves) remains for implementation. | |
| - **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013. | |
| - **Lease release detection:** Controllers watch Lease objects directly. | |
| - **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated | |
| leases until effective. |
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.
In `@python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md`
around lines 1182 - 1191, The "Unresolved Questions" section contains a question
about the exact scaling algorithm (proportional, step-based, predictive) that is
already addressed in the reconciliation loop pseudocode (lines 1013-1036), which
specifies a concrete step-based algorithm. Move the scaling algorithm question
from the "Unresolved Questions" section to the "Resolved" subsection, or
alternatively keep it in Unresolved but clarify that the algorithm strategy
(step-based) is decided while only fine-tuning details (rate limits, backoff
parameters) remain unresolved.
Simplify to a 2-CRD namespaced model with nested parameters, flash-at-lease semantics, off-cluster provisioning, and a QEMU-first implementation plan. Co-authored-by: Cursor <cursoragent@cursor.com>
|
Superseded: meeting-consensus JEP content was pushed to |
Summary
Revises JEP-0014 to reflect 2026-06 team review decisions. Supersedes the Kirk-heavy #744 discussion branch for the agreed v1 design.
VirtualTargetClass+ExporterSetonly — no typed*VirtualTargetclaim CRDsparameterson class + set with deep-merge (not flatmap[string]string)ExporterSet.spec.ttlor admin-pinned boot images in v1qemu-baremetal.jumpstarter.dev)Relationship to #744
#744 explored Kirk's full proposal (typed claims, cluster-scoped classes, etc.). This PR reflects meeting consensus and is the preferred design direction for v1. Consider closing #744 or marking it superseded once this merges.
VirtualTargetClass+ optional*VirtualTargetclaimsVirtualTargetClass+ExporterSetonlyparameters+ deep-mergeDesign decisions added/updated
VirtualTargetClass, same-namespace referencesTest plan
QEMUVirtualTarget/CorelliumVirtualTarget/AndroidVirtualTargetusage (only rejected-alternatives / DD-3 references)spec.ttlin API examples (only DD-7 rejection text)VirtualTargetClassin examples or API table