PLT-458: Open-loop scheduler (replace closed-loop dequeue)#48
PLT-458: Open-loop scheduler (replace closed-loop dequeue)#48bdchatham wants to merge 10 commits into
Conversation
Make transaction arrival open-loop to fix coordinated omission: tx i is issued at t₀ + i/λ independent of in-flight completion, so a slow SUT no longer slows the generator and hides backlog in latency. - sender/scheduler.go: openLoopScheduler owns t₀ and the monotonic sequence index i, derives λ from the shared rate.Limiter as a clock source (sampled per tick to honor a ramping λ; telescopes to t₀ + i/λ at fixed λ), and stamps IntendedSendTime at the true scheduled instant. - Overflow is bounded-in-flight + drop-and-count: a non-blocking semaphore TryAcquire admits the tx or drops-and-counts it; the arrival clock is never blocked on capacity (REL8/REL9 load shedding). - One rate authority preserved: the ramper still drives λ via limiter.SetLimit; the worker's busy-spin Allow() gate is replaced by a blocking Wait (closed-loop only) and disabled under open-loop. - Behind config flag arrival-model (default closed_loop, the regression baseline) + max-in-flight; arrival_model and run_txs_dropped_total are recorded at run end. - types.LoadTx gains SequenceIndex (scheduler-owned, single-write per the documented concurrency contract) for PLT-463 schedule-lag attribution; dropped txs carry zero InclusionTime and are kept out of inclusion-rate denominators. Tests: schedule-accuracy (tracks t₀ + i/λ within tolerance), clock not throttled by a slow sender (overrun dropped not blocked), ramped-λ gap shrink, and stamp-before-handoff under -race. go build + go test -race green. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… bound B1: reject open_loop without a finite positive arrival rate. With TPS=0 and no ramp, λ=rate.Inf, the inter-arrival gap collapses to 0, IntendedSendTime never advances past t₀, and the scheduler spins and drops everything. Add config.Settings.Validate (TPS>0 or --ramp-up required for open_loop) and call it after ResolveSettings; fail fast with a clear error. minScheduleRate stays as divide-by-zero/+Inf defense-in-depth only. B2: tie the in-flight permit to real send completion, not enqueue. Worker.Send (via ShardedSender) returns at enqueue, so the prior defer release() bounded enqueue backlog, not unacked sends — and dropped reflected buffer geometry. Thread a LoadTx.OnComplete hook the worker invokes after sendTransaction; the scheduler stamps it to release the permit so maxInFlight bounds true in-flight and dropped measures genuine load-shed. Enqueue-failure path completes inline. schedule_lag (PLT-463) remains the primary CO-detection gate. Also: reject unrecognized --arrival-model at config load; drop the SequenceIndex self-disambiguation claim (gate on run-level arrival_model); state the dropped-tx inclusion-denominator invariant plainly; fix relase typo. Tests: replace the synchronous fakeSender with an async enqueue-and-complete sender so the slow-sender drop test exercises production semantics; add the permit-held-until-completion guard, a conservation invariant, and config validation rejection tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
PR SummaryMedium Risk Overview Open-loop path: a new scheduler issues arrivals at t₀ + i/λ (absolute sleep, λ from the shared limiter/ramper), stamps Wiring: Large scheduler/unit and real-worker test coverage exercises schedule accuracy, conservation, determinism under drops, permit lifecycle, and prewarm pacing. Reviewed by Cursor Bugbot for commit 5399f90. Bugbot is set up for automated code reviews on this repo. Configure here. |
Omit the redundant time.Duration type from the minGap declaration in scheduler_test (inferred from time.Hour). Caught by golangci-lint (CI gate). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…-loop-scheduler # Conflicts: # main.go # sender/sharded_sender.go # sender/worker.go
Every existing scheduler test drove a fake TxSender that fired tx.OnComplete itself, so the suite stayed green even if the real Worker forgot to invoke it in runTxSender — leaking the open-loop in-flight semaphore (permits never released → the maxInFlight bound becomes meaningless). Add an httptest JSON-RPC harness (answers eth_sendRawTransaction, the only RPC the ethclient send path issues) behind the real Worker + open-loop scheduler: - Conservation on the real path: issued == completed + dropped, with completed driven by the real worker's OnComplete; handled sends matched against the real RPC server's count. - Permit released by the worker: maxInFlight=1 with a server that blocks one send holds exactly one in flight and drops the rest; releasing it resumes flow — which is only possible if the worker fires OnComplete. Both tests fail when the OnComplete invoke is removed (verified). Also clarify the scheduler doc: enqueue is async but the RPC send is synchronous, so the permit is held for the full round-trip. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The conservation test sampled `succeeded == handled` against an in-flight window: the httptest server bumps `handled` on receiving eth_sendRawTransaction, but the worker bumps `succeeded` only after SendTransaction returns and OnComplete fires. CI's slower scheduling caught that window (handled=200, succeeded=199). The dominant cause was teardown ordering, not pure sampling: the scheduler ran as the scope's main task, so the instant it exhausted the generator and returned, service.Run canceled the worker's context — aborting the last send whose 200 OK the server had already counted. That send completed with context-canceled, so completed++ but not succeeded++. Fix: run the scheduler and worker as background tasks behind a main gate that blocks until the test tears down, so the scope stays alive until quiescence. Anchor the assertion on the fixed total and require exhaustion, conservation, and equality together in one predicate, evaluated only once they all hold (a stable fixpoint, since the counters are monotonic and no new work is issued after exhaustion). Correctness depends on convergence, not the deadline. Verified: go test -race -count=50 and -count=20 -cpu=1,2,4 green; GOMAXPROCS=1 and =2 green. Falsification holds — commenting out the OnComplete invoke in runTxSender fails both tests. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Move the dense coordinated-omission / arrival-model narrative out of scheduler.go into a new sender/doc.go package doc, and lean the inline comments to terse pointers. No behavior change. The load-bearing inline notes stay (leaned) at the code they guard: the worker's OnComplete permit-release and the single-writer stamp-before- hand-off contract. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Bugbot finding 1: the scheduler called Generate() before TryAcquire, so a dropped tick still advanced the seeded generator streams and signed a tx that was then discarded. Under saturation the same seed produced a different admitted workload depending on SUT speed, violating the per-stream reproducibility contract (PLT-456). Reorder so the in-flight permit is acquired first; a dropped tick now consumes zero draws and zero signing CPU, making admitted txs a deterministic prefix of the seeded sequence. The arrival clock stays non-blocking/unthrottled. Add Admitted() so conservation anchors on scheduled-arrival counts (admitted + dropped), not generator-draw counts, since draws now occur only on admitted ticks. SequenceIndex remains the arrival-tick index i (monotonic, non-contiguous under drops) so IntendedSendTime = t0 + i/λ holds. New determinism guard test asserts admitted draws are a gapless prefix under forced drops. Bugbot finding 2: an admitted send that failed in the worker was logged but counted as neither sent nor dropped, breaking issued == sent + dropped under RPC failures. Add a failed counter; onSent increments succeeded on nil err and failed on non-nil err (both release the permit). Thread failed into the run summary (new run_txs_failed_total gauge) for goodput accounting. Conservation invariant documented and tested: scheduled == dropped + admitted, and admitted == succeeded + failed; the conservation test injects a failing send and asserts it is counted as failed. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
There was a problem hiding this comment.
Cursor Bugbot has reviewed your changes using default effort and found 1 potential issue.
There are 2 total unresolved issues (including 1 from previous review).
❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, enable autofix in the Cursor dashboard.
Reviewed by Cursor Bugbot for commit fe8df95. Configure here.
… vocabulary Trim the dropped/admitted field-comment block and DispatcherStats/RunSummary comments to at-site facts plus a pointer to doc.go's Conservation section, which owns the invariant. Unify prose to scheduled = dropped + admitted; admitted = succeeded + failed. Make Admitted()'s doc honest: test/audit only, no production consumer. Comment-only; no identifier or metric renames. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
|
Both Cursor Bugbot findings addressed (commit fe8df95, polish in 30c2578): 1. Drop path still generates transactions — fixed. The scheduler tick now acquires the in-flight permit ( 2. Failed sends break issued accounting — fixed. Added a Both verified under |
In open-loop the workers are constructed RateLimited=false because the scheduler owns the arrival clock, but the scheduler paces only the main load. Prewarm runs first over those same ungated workers, so it flooded the SUT unthrottled. Pace prewarm off the shared limiter in the dispatcher: closed-loop leaves d.limiter nil and the worker still gates, so neither phase is double-throttled and the main loop stays scheduler-paced. Correct the conservation test to the two-level invariant from doc.go: scheduled == dropped + admitted (using the scheduler counters, since a dropped tick no longer draws Generate) and admitted == succeeded + failed. Add a real-worker prewarm-rate-limit guard. Trim worker.go/dispatcher.go inline narration that doc.go owns, keeping the load-bearing notes (OnComplete CRITICAL, single-writer stamp, per-iteration context cancel, conservation counters). Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Implements PLT-458 — the core of the coordinated-omission fix. Issues at
t_i = t₀ + i/λindependent of in-flight completion, so latency no longer hides the backlog.What
sender/scheduler.go(openLoopScheduler): arrival clock att₀ + i/λ(absolute-instant sleep → drift-free), stampsIntendedSendTimeat the true scheduled instant + a per-txSequenceIndex. Workers become pure async senders (thelimiter.Allow()busy-spin is gone).LoadTx.OnCompletehook fired by the worker aftersendTransaction), sodroppedmeasures genuine load-shed, not buffer geometry.rate.Limiter. Behind a config flag (--arrival-model), with the legacy closed-loop path retained as the regression baseline.arrival_modelrecorded.Review (systems + measurement + idiom, two rounds)
The loop caught and fixed two blocking issues before merge:
TPS=0→rate.Inf→gap=0→ a degenerate constant latency anchor. Now rejected at config validation (open-loop requires finite positive λ:TPS>0or a ramp).droppedmeasured buffer geometry and a synchronous test masked it. Fixed to release at real send completion; test now uses an async sender + a directPermitHeldUntilCompletionguard. Verified: concurrency-correct (sync.Once, happens-before intact,-race -count=20clean), conservationissued == sent + droppedholds.schedule_lag(PLT-463) remains the primary CO-detection gate. Forward-note: inclusion-rate denominators must usesent, neverissued.Decision brief:
designs/sei-load-workload-modeler/PLT-458-open-loop-scheduler.md.🤖 Generated with Claude Code