Fix df.loop: run loops as child sub-orchestrations and fix cross-generation child ID collision#228
Open
Copilot wants to merge 1 commit into
Copilot (AI) changed the title from "[WIP] Fix df.loop continue_as_new restart behavior" to "Fix df.loop continue_as_new restarting from graph root when loop is not root" on Jun 11, 2026
pinodeca
requested changes
Jun 12, 2026
pinodeca
left a comment
Contributor
Regarding your statement:
A sub-orchestration approach was evaluated but is blocked by a duroxide 0.1.29
limitation: ContinueAsNew drops the parent link, so the parent orchestration never
receives the child's completion.
I created this PR in duroxide to unblock that approach:
microsoft/duroxide#31
Therefore, re-evaluate and:
- if the approach is still blocked, explain why and what would have to change to unblock it.
- if the approach is unblocked, re-implement using the sub-orchestration approach and when you're done update the PR description accordingly.
a593965 to
754f1f2
Compare
This was
linked to
issues
Jun 16, 2026
This was referenced Jun 16, 2026
7 tasks
Contributor
If microsoft/duroxide#33 merges and is part of the next release, this PR can revert back to using Also, note that this PR should solve #230 and #233 (along with #227, which it initially targeted). Let's validate that by adding regression tests for the scenarios described in 230 and 233.
pinodeca
added a commit
that referenced
this pull request
Jun 27, 2026
Squash of the node-state-model proposal, the execution-id proposal, and the temporary exec-id implementation plan. Rebased onto the df.loop sub-orchestration fix (PR #228, copilot/fix-loop-restart-issue).
3a4255f to
b26327a
Compare
pinodeca
added a commit
that referenced
this pull request
Jun 28, 2026
Squash of the node-state-model proposal, the execution-id proposal, and the temporary exec-id implementation plan. Rebased onto the df.loop sub-orchestration fix (PR #228, copilot/fix-loop-restart-issue).
b26327a to
3aed223
Compare
…s_new
df.loop called continue_as_new inline in the main orchestration, so every new generation restarted from graph.root_node_id, re-executing prefix nodes on every iteration. Each df.loop() node now spawns a dedicated child sub-orchestration (execute_loop) that owns continue_as_new; the parent awaits it and runs any suffix nodes exactly once.
Relies on duroxide PR #31 (parent link preserved across continue_as_new), pulled in as a git dependency until merged and released.
Co-authored-by: copilot-swe-agent <copilot@github.com>
Co-authored-by: pinodeca <32303022+pinodeca@users.noreply.github.com>
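The behaviour change described in this commit message can be modelled with a small counting sketch. This is a toy model only: it uses no duroxide or pg_durable APIs, and every name in it (`Counts`, `run_inline`, `run_as_child`, `run_loop_child`) is hypothetical. It assumes a graph of the shape `prefix ~> loop(body) ~> suffix` and, as a further assumption, that the suffix ran once after the loop exited in the old design too.

```rust
// Toy model only: no duroxide APIs are used. All names are hypothetical.

/// Counts how often each part of `prefix ~> loop(body) ~> suffix` runs.
#[derive(Debug, Default, PartialEq)]
struct Counts {
    prefix: u32,
    body: u32,
    suffix: u32,
}

/// Old behaviour: the loop calls continue_as_new inline in the main
/// orchestration, so each generation restarts from the graph root.
fn run_inline(iterations: u32) -> Counts {
    let mut c = Counts::default();
    for generation in 1..=iterations {
        c.prefix += 1; // re-executed: the new generation starts at the root
        c.body += 1;
        if generation == iterations {
            c.suffix += 1; // loop exited, so execution continues past it
        }
        // otherwise: continue_as_new -> next generation
    }
    c
}

/// Stand-in for the child orchestration: one body run per generation.
fn run_loop_child(iterations: u32) -> u32 {
    let mut body_runs = 0;
    for _generation in 1..=iterations {
        body_runs += 1;
    }
    body_runs
}

/// New behaviour: the parent runs the prefix, awaits a child that owns all
/// iterations (and its own continue_as_new), then runs the suffix.
fn run_as_child(iterations: u32) -> Counts {
    let mut c = Counts::default();
    c.prefix += 1;
    c.body += run_loop_child(iterations);
    c.suffix += 1;
    c
}

fn main() {
    println!("inline: {:?}", run_inline(3));
    println!("child:  {:?}", run_as_child(3));
}
```

In this model the body count is the same in both designs; only the prefix count differs, which matches the "once per iteration instead of once per instance" symptom.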
3aed223 to
a348916
Compare
Problem
Two related defects in how `df.loop` drives `continue_as_new`:
- Non-root loops restarted from the graph root. `df.loop` called `continue_as_new` inline in the main orchestration, so every new generation restarted from `graph.root_node_id`, re-executing prefix nodes on every iteration. For `prefix ~> df.loop(body)`, the prefix ran once per iteration instead of once per instance.
- Loops that spawned sub-orchestrations hung forever across generations. Once loops run as child orchestrations (see below), a loop body that itself spawns sub-orchestrations (a nested `df.loop`, or a `JOIN`/`RACE` branch) would stall. Duroxide's auto-generated child instance IDs (`{parent}::sub::{event_id}`) reset their event counter on each `continue_as_new` generation, so every loop generation re-derived the same child ID and collided with the previous (now terminal) child. The collided child was acked without processing and the loop never made progress.
Approach
Loops run as child sub-orchestrations
Each `df.loop()` node spawns a dedicated child sub-orchestration (`execute_loop`, registered as `pg_durable::orchestration::execute-loop`). The child handles all iterations via `continue_as_new`; when the loop exits it returns a `SubtreeEnvelope` to the parent. The parent orchestration awaits the child and continues with any suffix nodes, so prefix nodes run exactly once.
This is made possible by duroxide PR #31, which preserves the parent link across `continue_as_new` generations in a child orchestration. The `duroxide` dependency is switched to a git dependency on the `pinodeca/continue-parent-link` branch until that PR is merged and released.
Security is preserved: `execute_loop` calls `load_function_graph` at the start of every generation (including after `continue_as_new`), re-validating `submitted_by` from the database and catching cross-iteration tampering exactly as the parent `execute()` does.
Generation-qualified child instance IDs
To fix the cross-generation collision, all sub-orchestration spawn sites (loop, join, join3 extras, race) now use `schedule_sub_orchestration_with_id` with a deterministic, generation-qualified child ID built by a new `child_instance_id()` helper. `execution_id` increments on each `continue_as_new`, so a child spawned in generation N never collides with the terminal child from generation N-1, while the ID stays deterministic across replays of the same generation.
Changes
- `Cargo.toml` / `Cargo.lock`: switches `duroxide` to a git dependency on `microsoft/duroxide` branch `pinodeca/continue-parent-link`; adds `[patch.crates-io]` so `duroxide-pg`'s transitive dependency resolves to the same version. Revert to crates.io pins once PR #31 is merged and released.
- `src/orchestrations/execute_function_graph.rs`
  - `execute_loop` (new public sub-orchestration): runs iterations via `continue_as_new`; re-calls `load_function_graph` per generation for security; returns a `SubtreeEnvelope` on exit (a break inside the body is the loop's own terminator, so the loop always exits with `Normal` control)
  - `execute_loop_node`: spawns `execute_loop` and merges results back via `parse_subtree_envelope`
  - `child_instance_id` (new helper) + `schedule_sub_orchestration_with_id` at the loop, join, and race spawn sites: fixes the cross-generation child ID collision
- `src/registry.rs`: registers `LOOP_NAME` alongside `NAME` and `SUBTREE_NAME`
- `USER_GUIDE.md`: notes that each `df.loop()` now runs as its own child orchestration (prefix runs once, suffix runs after the loop exits, and `df.instances` shows the parent plus the loop's child instance)
- `tests/e2e/sql/24_nonroot_loop.sql`: regression tests covering:
  - `prefix ~> loop`: prefix runs once, body runs N times
  - `prefix ~> loop ~> suffix`: prefix and suffix each run once
  - `df.loop(... ~> df.loop(...) ...))`: exercises the cross-generation child ID fix
  - `prefix ~> loop(body, while-condition) ~> suffix` exits via the while condition
Upgrade & Migration
No extension schema changes and no upgrade script changes. The new `.so` reads the same `df` schema as before; loop execution is entirely a runtime/orchestration change. The only dependency change is the temporary `duroxide` git pin noted above.
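As a rough illustration of the generation-qualified child IDs described under Approach, the sketch below contrasts the auto-generated `{parent}::sub::{event_id}` form with an ID that also includes `execution_id`. The exact format produced by `child_instance_id()` is not shown in this PR text, so the `::gen{execution_id}` layout, the function names, and the fixed `event_id` value are all assumptions made for the example; no duroxide APIs are called.

```rust
use std::collections::HashSet;

/// Auto-generated form quoted in the PR description: `{parent}::sub::{event_id}`.
fn auto_child_id(parent: &str, event_id: u64) -> String {
    format!("{}::sub::{}", parent, event_id)
}

/// Hypothetical generation-qualified form. The real `child_instance_id()`
/// format is not shown in the PR text; this layout is an assumption.
fn qualified_child_id(parent: &str, execution_id: u64, event_id: u64) -> String {
    format!("{}::gen{}::sub::{}", parent, execution_id, event_id)
}

/// Spawns one child per generation at the same event position and returns
/// how many spawns reused an ID that an earlier generation already used.
fn count_collisions(generations: u64, make_id: impl Fn(u64, u64) -> String) -> usize {
    let mut seen = HashSet::new();
    let mut collisions = 0;
    for execution_id in 1..=generations {
        // The event counter resets on each generation, so the spawn site
        // sees the same event_id every time (2 is an arbitrary value).
        let event_id = 2;
        if !seen.insert(make_id(execution_id, event_id)) {
            collisions += 1;
        }
    }
    collisions
}

fn main() {
    let auto = count_collisions(3, |_exec, ev| auto_child_id("loop-1", ev));
    let qualified = count_collisions(3, |exec, ev| qualified_child_id("loop-1", exec, ev));
    println!("auto-generated IDs: {} collisions", auto);
    println!("generation-qualified IDs: {} collisions", qualified);
}
```

Because the qualified ID is a pure function of `(parent, execution_id, event_id)`, replaying the same generation yields the same ID, while a new generation yields a new one.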