docs(jep-0014): simplified virtual exporter design (meeting consensus) #810

Closed
mangelajo wants to merge 6 commits into main from jep-0014-e2e-flow

@mangelajo mangelajo commented Jun 18, 2026

Member

Summary

Revises JEP-0014 to reflect 2026-06 team review decisions. Supersedes the Kirk-heavy #744 discussion branch for the agreed v1 design.

  • 2-CRD model: namespaced VirtualTargetClass + ExporterSet only — no typed *VirtualTarget claim CRDs
  • Dictionary config: nested parameters on class + set with deep-merge (not flat map[string]string)
  • Flash-at-lease (DD-7): warm pool provides baseline runtime; users flash/boot like physical targets — no ExporterSet.spec.ttl or admin-pinned boot images in v1
  • End-to-end flow: phased QEMU warm-pool walkthrough (admin / controller / user), flash-at-lease session, 2-CRD summary tables
  • External provisioning: off-cluster bare-metal example (qemu-baremetal.jumpstarter.dev)
  • Implementation plan: Phase 2 QEMU reference only → Phase 3 off-cluster → deferred Corellium/Android/lifecycle controller
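
The deep-merge behaviour named in the dictionary-config bullet can be sketched as follows. This is a minimal illustration, not the JEP's implementation: it assumes that values on the ExporterSet override those on the VirtualTargetClass and that non-mapping values (lists included) are replaced wholesale. The summary states neither rule, and the parameter names are hypothetical.

```python
from copy import deepcopy
from typing import Any, Mapping


def deep_merge(base: Mapping[str, Any], override: Mapping[str, Any]) -> dict[str, Any]:
    """Return a new dict in which `override` is merged into `base` key by key."""
    merged = deepcopy(dict(base))
    for key, value in override.items():
        if isinstance(value, Mapping) and isinstance(merged.get(key), Mapping):
            # Both sides are mappings: recurse instead of replacing the subtree.
            merged[key] = deep_merge(merged[key], value)
        else:
            # Scalars, lists, and type mismatches: the override wins (assumption).
            merged[key] = deepcopy(value)
    return merged


# Hypothetical parameter names, for illustration only.
class_parameters = {
    "machine": {"arch": "aarch64", "memory": "2Gi"},
    "network": {"mode": "user"},
}
set_parameters = {"machine": {"memory": "4Gi"}}

effective = deep_merge(class_parameters, set_parameters)
print(effective)
```

A flat `map[string]string` cannot express this: overriding `machine.memory` alone would require either re-stating the whole `machine` entry or inventing a key-path convention.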

Relationship to #744

#744 explored Kirk's full proposal (typed claims, cluster-scoped classes, etc.). This PR reflects meeting consensus and is the preferred design direction for v1. Consider closing #744 or marking it superseded once this merges.

| Topic | #744 (discussion) | This PR (consensus) |
| --- | --- | --- |
| Admin CRDs | VirtualTargetClass + optional `*VirtualTarget` claims | VirtualTargetClass + ExporterSet only |
| Class scope | Cluster-scoped (CSI analog) | Namespaced (same namespace as set) |
| Provider config | Typed fields + parameters map | Nested YAML parameters + deep-merge |
| Image freshness | (various) | User flash at lease; lifecycle controller deferred (DD-7) |
| v1 provisioner | Multiple backends | QEMU reference → off-cluster → rest deferred |
| E2E flow | Partial / absent | Full phased QEMU example |

Design decisions added/updated

  • DD-3: pluggable provisioner + dictionary parameters; reject typed claims
  • DD-6: namespaced VirtualTargetClass, same-namespace references
  • DD-7: defer TTL/image refresh; future cross-cutting lifecycle controller

Test plan

  • Grep: no active QEMUVirtualTarget / CorelliumVirtualTarget / AndroidVirtualTarget usage (only rejected-alternatives / DD-3 references)
  • Grep: no spec.ttl in API examples (only DD-7 rejection text)
  • Grep: no cluster-scoped VirtualTargetClass in examples or API table
  • End-to-end flow matches 2-CRD namespaced model
  • Init containers vs hooks listed under Unresolved Questions only
  • Implementation phases: QEMU-only for v1 near-term deliverables
  • Sphinx docs build clean
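
The grep checks in the test plan could be automated along these lines. This is a rough sketch under stated assumptions: the heading text used to identify the allowed sections (`"rejected alternatives"`) and the sample document are hypothetical, and the pattern covers only the identifiers named in the list above.

```python
import re

# Identifiers the test plan expects to appear only in rejection text.
FORBIDDEN = re.compile(r"\b(?:QEMU|Corellium|Android)VirtualTarget\b|\bspec\.ttl\b")
HEADING = re.compile(r"^(#{1,6})\s+(.*)$")


def find_violations(markdown: str, allowed_headings: tuple[str, ...]) -> list[tuple[int, str]]:
    """Return (line_number, match) pairs found outside the allowed sections."""
    violations: list[tuple[int, str]] = []
    allowed_level = None  # heading depth of the allowed section we are inside, if any
    for number, line in enumerate(markdown.splitlines(), start=1):
        heading = HEADING.match(line)
        if heading:
            level, title = len(heading.group(1)), heading.group(2).lower()
            if allowed_level is not None and level > allowed_level:
                continue  # sub-heading inside an allowed section
            allowed_level = level if any(name in title for name in allowed_headings) else None
            continue
        if allowed_level is not None:
            continue
        violations.extend((number, m.group(0)) for m in FORBIDDEN.finditer(line))
    return violations


sample = """## API
ExporterSet.spec.ttl is set here
## Rejected Alternatives
### Typed claims
QEMUVirtualTarget was considered
## Implementation
AndroidVirtualTarget
"""
print(find_violations(sample, allowed_headings=("rejected alternatives",)))
```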

mangelajo and others added 5 commits June 3, 2026 19:03
Propose a Virtual Scalable Exporter subsystem for Jumpstarter that
manages pools of virtual targets with configurable autoscaling via
per-provider CRDs (QEMUExporterPool, AndroidExporterPool, etc.).

Co-authored-by: Cursor <cursoragent@cursor.com>

Add DD-4 explaining why per-lease parameters are not included in this
JEP. The same use case is served by creating separate pools with
different resource profiles, avoiding complexity across the Lease CRD,
controller, pool controllers, and driver templates.

Co-authored-by: Cursor <cursoragent@cursor.com>

- Clarify warm pool rationale and cold-start latency range (10-60s)
- Rename minInstances/maxInstances to minWarmInstances/maxTotalInstances
- Make maxTotalInstances optional (0 or omitted means unlimited)
- Add Crossplane to Prior Art with rationale
- Resolve scheduled leases question via existing BeginTime mechanism
- Add DD-5: built-in scaling vs HPA/KEDA
- Add DD-4: per-lease parameters rejected in favor of pool flavors
- Add composite exporters and Corellium to Future Possibilities
- Clarify instance reuse with recycleStrategy field (ExitAndReplace default)
- Add language identifiers to untyped fenced code blocks
- Add Apache 2.0 license footer

Co-authored-by: Cursor <cursoragent@cursor.com>

New JEP files not listed in any toctree cause Sphinx build warnings,
which fail the check-warnings CI job.

Co-authored-by: Cursor <cursoragent@cursor.com>

Document admin, controller, and user actions for the QEMU warm-pool
scenario. Adopt ExporterSet + VirtualTargetClass reference graph,
simplify homogeneous QEMU pools to avoid per-instance claims, and
align examples with the phased lifecycle.

Co-authored-by: Cursor <cursoragent@cursor.com>
coderabbitai Bot commented Jun 18, 2026

Contributor

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 6b47d214-4e7d-48ad-93f6-2988227db8d1

📥 Commits

Reviewing files that changed from the base of the PR and between 355013d and 15086cc.

📒 Files selected for processing (1)
  • python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md
✅ Files skipped from review due to trivial changes (1)
  • python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md

📝 Walkthrough

Adds JEP-0014 ("Virtual Scalable Exporters", Draft), a 1578-line design document proposing a Kubernetes-native warm pool subsystem for virtual exporters. It defines VirtualTargetClass, ExporterSet, and related CRDs, specifies end-to-end flows, reconciliation behavior, and scaling semantics. The JEP index and Sphinx toctree are updated to include the new entry.

Changes

JEP-0014: Virtual Scalable Exporters

| Layer | File(s) | Summary |
| --- | --- | --- |
| Document header and index registration | `python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md`, `python/docs/source/contributing/jeps/index.md` | JEP-0014 header metadata (ID, title, author, Draft status, dates) is added; the Standards Track table and hidden Sphinx toctree in the index are updated to include the new JEP. |
| Abstract, motivation, user stories, and core abstractions | `python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md` | Describes the problem with current virtual target handling, introduces the fidelity/cost ladder and user stories, then defines VirtualTargetClass, ExporterSet, Exporter, and the container-backed sidecar pattern with YAML examples. |
| End-to-end flows and architecture overview | `python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md` | Documents warm-pool provisioning, lease acquisition, session behavior, lease release, recycling, scale-down, and demand-spike scale-up via a QEMU example; includes architecture overview, provisioner deployment models, controller reconciliation inputs/watches, and off-cluster provisioning topologies. |
| API and protocol changes | `python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md` | Specifies new CRDs with spec/status shapes, dictionary-based parameter deep-merge semantics, ExporterSet scaling subresource behavior, and the new Exporter.spec.enabled field for graceful scale-down. |
| Design decisions, reconciliation, test plan, and implementation plan | `python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md` | Covers design rationale (pool-based scaling, pluggable provisioners, v1 scope), reconciliation loop, instance state transitions, failure modes, test plan, acceptance/graduation criteria, backward compatibility, consequences, rejected alternatives, unresolved questions, and phased implementation strategy. |

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~15 minutes

Poem

🐇 Hops through the spec with delight,
A warm pool of exporters, just right!
ExporterSets scale up and down,
VirtualTargetClass wears the crown,
The JEP is drafted — review day's bright! ✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
| Check name | Status | Explanation |
| --- | --- | --- |
| Title check | ✅ Passed | The title directly and clearly summarizes the main change: documenting a simplified virtual exporter design based on team consensus, which aligns with the primary objective of revising JEP-0014 to reflect agreed v1 design decisions. |
| Description check | ✅ Passed | The description is comprehensive and directly related to the changeset, clearly explaining the design revisions (2-CRD model, dictionary config, flash-at-lease semantics, end-to-end flow, external provisioning, implementation plan) and their relationship to PR #744. |
| Docstring Coverage | ✅ Passed | No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md`:
- Around line 1008-1036: The reconciliation loop pseudocode contains conditions
that check replicas < spec.maxReplicas for scaling decisions, but it does not
explicitly handle the case where maxReplicas is 0 or omitted (which per line 745
means no upper bound). Update the pseudocode in the reconciliation loop to
clarify the scaling-up conditions by adding explicit checks that skip the
maxReplicas limit when it is 0 or omitted. Specifically, modify the two scale-up
conditions (the one maintaining minAvailableReplicas and the demand-driven
scale-up) to only apply the maxReplicas comparison when maxReplicas is not 0 and
not omitted, ensuring the logic is consistent with the stated semantics.
- Around line 1182-1191: The "Unresolved Questions" section contains a question
about the exact scaling algorithm (proportional, step-based, predictive) that is
already addressed in the reconciliation loop pseudocode (lines 1013-1036), which
specifies a concrete step-based algorithm. Move the scaling algorithm question
from the "Unresolved Questions" section to the "Resolved" subsection, or
alternatively keep it in Unresolved but clarify that the algorithm strategy
(step-based) is decided while only fine-tuning details (rate limits, backoff
parameters) remain unresolved.
- Around line 500-502: The JEP document references the "native sidecar" pattern
with `restartPolicy: Always` on init containers at multiple locations (around
lines 309 and 500) but does not specify the Kubernetes version requirement. Add
explicit documentation of the minimum Kubernetes version requirement (v1.29 or
later) where the native sidecar pattern is introduced or first referenced. This
can be done by adding a note or requirement section that clearly states the
SidecarContainers feature is enabled by default from Kubernetes v1.29 (beta) and
GA from v1.33, ensuring users understand the infrastructure prerequisites for
implementing this pattern.
ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: da4b83c6-6ed1-4e0b-b9b9-f87195a315ed

📥 Commits

Reviewing files that changed from the base of the PR and between f19e473 and 355013d.

📒 Files selected for processing (2)
  • python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md
  • python/docs/source/contributing/jeps/index.md

Comment on lines +500 to +502
- **Exporter sidecar** (native sidecar, `restartPolicy: Always`) — starts
first, registers with `jumpstarter-controller`.
- **QEMU runtime container** — started by provisioner; exporter talks to

⚠️ Potential issue | 🟡 Minor

🧩 Analysis chain

🌐 Web query:

What is the current status and minimum Kubernetes version for init container restartPolicy support (KEP-3617)?

💡 Result:

The ability to specify a restartPolicy for init containers was introduced via the "SidecarContainers" feature, which has been enabled by default since Kubernetes v1.29 (beta) and reached stable (GA) in v1.33 [1].

Current Status:

1. Sidecar Containers: Since Kubernetes v1.29, you can define init containers with restartPolicy: Always without enabling a feature gate. These are treated as "restartable sidecar containers" that remain running for the duration of the Pod's lifecycle [1].
2. Per-Container Restart Policy: In Kubernetes v1.34, further enhancements were introduced under the ContainerRestartRules feature (alpha). This allows for more granular control, including specifying restartPolicy and restartPolicyRules at the container level [2]. This allows, for example, an init container to be configured to run only once (restartPolicy: Never), ensuring that if it fails, the entire Pod fails, rather than repeatedly restarting the init container [2][3].

Minimum Version:

  • To use the basic restartable sidecar functionality (restartPolicy: Always on an init container), Kubernetes v1.29 or later is required [1].
  • To use the advanced per-container restartPolicy and restartPolicyRules features, Kubernetes v1.34 or later is required (with the ContainerRestartRules feature gate enabled) [2].

KEP-3617 specifically refers to the work that enabled Sidecar Containers [4]. The feature first appeared in v1.28 as an alpha capability behind a feature gate [4][5], was promoted to beta and enabled by default in v1.29, and is stable starting in v1.33 [1].

Citations:


🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && \
wc -l python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md

Repository: jumpstarter-dev/jumpstarter

Length of output: 152


🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && \
sed -n '300,320p' python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md && \
echo "---" && \
sed -n '495,510p' python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md

Repository: jumpstarter-dev/jumpstarter

Length of output: 1731


🏁 Script executed:

cd "$(git rev-parse --show-toplevel)" && \
grep -n -i "kubernetes\|version\|k8s" python/docs/source/contributing/jeps/JEP-0014-virtual-scalable-exporters.md | head -20

Repository: jumpstarter-dev/jumpstarter

Length of output: 1108


Document minimum Kubernetes version requirement for native sidecar init containers.

Lines 309 and 500 reference a "native sidecar" pattern with restartPolicy: Always on init containers. This requires Kubernetes v1.29 or later, where the SidecarContainers feature is beta and enabled by default (it reached GA in v1.33). The JEP should explicitly document this minimum version requirement.
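
A provisioner or install check could enforce the v1.29 minimum named in this comment with a guard like the one below. This is only a sketch: the function names are hypothetical, the version string is assumed to have the usual `v<major>.<minor>...` shape reported by clusters, and the check does not detect a cluster where the SidecarContainers feature gate has been explicitly disabled.

```python
import re

# Minimum (major, minor) for init containers with restartPolicy: Always.
MIN_NATIVE_SIDECAR = (1, 29)


def parse_k8s_version(git_version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.29.3' or 'v1.30.1+k3s1'."""
    match = re.match(r"v?(\d+)\.(\d+)", git_version)
    if match is None:
        raise ValueError(f"unrecognised Kubernetes version: {git_version!r}")
    return int(match.group(1)), int(match.group(2))


def supports_native_sidecar(git_version: str) -> bool:
    """True when the reported version meets the native-sidecar minimum."""
    return parse_k8s_version(git_version) >= MIN_NATIVE_SIDECAR


for version in ("v1.28.5", "v1.29.0", "v1.30.1+k3s1"):
    print(version, supports_native_sidecar(version))
```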

Comment on lines +1008 to +1036
### Reconciliation Loop

Each `ExporterSet` controller runs a continuous reconciliation loop, triggered by
changes to the set CR, owned Exporters, or matching Leases:

```text
for each ExporterSet CR:
    ownedExporters    = list Exporters owned by this CR
    replicas          = count ownedExporters in Ready state
    leasedReplicas    = count ownedExporters with an active LeaseRef
    availableReplicas = replicas - leasedReplicas
    pendingLeases     = count pending Leases matching spec.selector

    # Invariant: maintain minAvailableReplicas warm buffer
    if availableReplicas < spec.minAvailableReplicas AND replicas < spec.maxReplicas:
        scale up to restore availableReplicas

    # Demand-driven scale-up
    elif pendingLeases > 0 AND replicas < spec.maxReplicas:
        scale up by min(pendingLeases, spec.maxReplicas - replicas)

    # Scale-down: excess idle replicas
    elif availableReplicas > spec.minAvailableReplicas AND cooldown elapsed:
        graceful scale down:
            1. set exporter.spec.enabled = false
            2. wait until leaseRef remains empty
            3. delete Pod, Exporter CR, and *VirtualTarget
            (never below minAvailableReplicas)
```
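
The bookkeeping at the top of the quoted loop and its scale-down branch can be made concrete with a small sketch. The `Exporter` dataclass and function names here are hypothetical stand-ins for the real CRs, and the sketch only selects which idle exporters would be disabled first; it does not talk to a cluster.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Exporter:
    name: str
    ready: bool
    lease_ref: Optional[str] = None  # name of the active Lease, if any


def pool_counts(exporters: list[Exporter]) -> dict[str, int]:
    """Derive the counters used by the quoted pseudocode."""
    replicas = sum(1 for e in exporters if e.ready)
    leased = sum(1 for e in exporters if e.lease_ref is not None)
    return {
        "replicas": replicas,
        "leasedReplicas": leased,
        "availableReplicas": replicas - leased,
    }


def scale_down_candidates(
    exporters: list[Exporter], min_available: int, cooldown_elapsed: bool
) -> list[str]:
    """Pick idle exporters to disable, never dropping below min_available."""
    excess = pool_counts(exporters)["availableReplicas"] - min_available
    if not cooldown_elapsed or excess <= 0:
        return []
    idle = [e.name for e in exporters if e.ready and e.lease_ref is None]
    return idle[:excess]  # these would get spec.enabled = false before deletion


pool = [
    Exporter("a", ready=True, lease_ref="lease-1"),
    Exporter("b", ready=True),
    Exporter("c", ready=True),
    Exporter("d", ready=True),
    Exporter("e", ready=False),
]
print(pool_counts(pool))
print(scale_down_candidates(pool, min_available=1, cooldown_elapsed=True))
```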

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Clarify maxReplicas=0 semantics in the reconciliation loop pseudocode.

Line 745 states "maxReplicas: 0 or omitted = no upper bound," but the reconciliation pseudocode (lines 1022–1027) checks replicas < spec.maxReplicas without handling the case where maxReplicas is 0 or omitted. If 0 means "no limit," the comparison should explicitly skip the limit check when maxReplicas is 0 or nil.

🔧 Suggested clarification in pseudocode
  # Invariant: maintain minAvailableReplicas warm buffer
- if availableReplicas < spec.minAvailableReplicas AND replicas < spec.maxReplicas:
+ if availableReplicas < spec.minAvailableReplicas AND (spec.maxReplicas == 0 OR replicas < spec.maxReplicas):
    scale up to restore availableReplicas
  
  # Demand-driven scale-up
- elif pendingLeases > 0 AND replicas < spec.maxReplicas:
+ elif pendingLeases > 0 AND (spec.maxReplicas == 0 OR replicas < spec.maxReplicas):
    scale up by min(pendingLeases, spec.maxReplicas - replicas)
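
One detail the suggested diff leaves open: when `maxReplicas` is 0, the expression `min(pendingLeases, spec.maxReplicas - replicas)` evaluates to a negative number. A sketch that computes the remaining headroom first avoids this. The function names are hypothetical, and capping the warm-buffer branch by the same headroom is an assumption rather than something the pseudocode states.

```python
from typing import Optional


def headroom(replicas: int, max_replicas: Optional[int]) -> Optional[int]:
    """Replicas that may still be added; None means no upper bound."""
    if not max_replicas:  # 0 or omitted = unbounded
        return None
    return max(max_replicas - replicas, 0)


def scale_up_by(
    available: int,
    replicas: int,
    pending_leases: int,
    min_available: int,
    max_replicas: Optional[int] = None,
) -> int:
    """Number of instances to add in this reconcile pass."""
    if available < min_available:
        wanted = min_available - available  # restore the warm buffer
    elif pending_leases > 0:
        wanted = pending_leases  # demand-driven scale-up
    else:
        return 0
    room = headroom(replicas, max_replicas)
    return wanted if room is None else min(wanted, room)


# Unbounded pool: all five pending leases are served.
print(scale_up_by(available=0, replicas=3, pending_leases=5, min_available=0, max_replicas=0))
# Bounded pool with one slot left.
print(scale_up_by(available=0, replicas=3, pending_leases=5, min_available=0, max_replicas=4))
```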
Comment on lines +1182 to +1191
## Unresolved Questions

- What is the exact scaling algorithm (proportional, step-based, predictive)?

### Resolved

- **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013.
- **Lease release detection:** Controllers watch Lease objects directly.
- **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated
leases until effective.
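
The scheduled-leases rule quoted above (ignore future-dated leases until effective) amounts to a filter on the pending set. A minimal sketch, assuming a hypothetical `Lease` dataclass whose `begin_time` mirrors `Spec.BeginTime` and where a missing value means "start immediately":

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Optional


@dataclass
class Lease:
    name: str
    begin_time: Optional[datetime] = None  # mirrors Spec.BeginTime; None = immediate
    pending: bool = True


def effective_pending(leases: list[Lease], now: datetime) -> list[Lease]:
    """Pending leases whose begin time has been reached."""
    return [
        lease
        for lease in leases
        if lease.pending and (lease.begin_time is None or lease.begin_time <= now)
    ]


now = datetime(2026, 6, 18, 12, 0, tzinfo=timezone.utc)
leases = [
    Lease("immediate"),
    Lease("started", begin_time=now - timedelta(minutes=5)),
    Lease("tomorrow", begin_time=now + timedelta(days=1)),
    Lease("bound", pending=False),
]
print([lease.name for lease in effective_pending(leases, now)])
```

Only the filtered count would feed `pendingLeases`, so a lease booked for tomorrow does not trigger a scale-up today.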

⚠️ Potential issue | 🟡 Minor | ⚡ Quick win

Resolve or clarify the "exact scaling algorithm" unresolved question.

The "Unresolved Questions" section (line 1184) asks "What is the exact scaling algorithm (proportional, step-based, predictive)?" However, the reconciliation loop pseudocode (lines 1013–1036) already specifies a concrete step-based algorithm (scale up by min(pendingLeases, maxReplicas - replicas)). This should be moved to the "Resolved" subsection or clarified to indicate that the algorithm strategy is decided but fine-tuning details (e.g., rate limits, backoff) remain unresolved.

🔧 Suggested edit to mark resolved
 ## Unresolved Questions
 
-What is the exact scaling algorithm (proportional, step-based, predictive)?
-
 ### Resolved
 
+**Scaling algorithm:** Step-based (scale up by pending lease count, respecting `maxReplicas`; scale down after cooldown). Fine-tuning (rate limits, backoff curves) remains for implementation.
+
 **Observability (JEP-0013):** ...
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
- ## Unresolved Questions
- - What is the exact scaling algorithm (proportional, step-based, predictive)?
- ### Resolved
- - **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013.
- - **Lease release detection:** Controllers watch Lease objects directly.
- - **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated
- leases until effective.
+ ## Unresolved Questions
+ ### Resolved
+ - **Scaling algorithm:** Step-based (scale up by pending lease count, respecting `maxReplicas`; scale down after cooldown). Fine-tuning (rate limits, backoff curves) remains for implementation.
+ - **Observability (JEP-0013):** Provisioner controllers emit metrics per JEP-0013.
+ - **Lease release detection:** Controllers watch Lease objects directly.
+ - **Scheduled leases:** `Spec.BeginTime` on Lease CRs; controllers ignore future-dated
+ leases until effective.
@mangelajo mangelajo changed the title from "docs(jep-0014): end-to-end flow and revised ExporterSet model" to "docs(jep-0014): simplified virtual exporter design (meeting consensus)" Jun 19, 2026
Simplify to a 2-CRD namespaced model with nested parameters, flash-at-lease
semantics, off-cluster provisioning, and a QEMU-first implementation plan.

Co-authored-by: Cursor <cursoragent@cursor.com>
@mangelajo

Member Author

Superseded: meeting-consensus JEP content was pushed to jep-0014-virtual-scalable-exporters (continues in #744).

@mangelajo mangelajo closed this Jun 19, 2026