
PLT-460: Distribution samplers (uniform, zipfian θ) #46

Open
bdchatham wants to merge 5 commits into main from brandon2/plt-460-distribution-samplers


@bdchatham

Implements PLT-460 — the real SampleIndex samplers behind the PLT-455 Distribution type, drawing from PLT-456 seeded sub-streams.

What

  • Uniform: O(1) draw into [0,n).
  • Zipfian(θ): YCSB precomputed-zeta generator. zeta(n, θ) is computed once per keyspace size n in O(n) and cached, so each draw is O(1). As θ→0 the distribution collapses to uniform; as θ→1 it concentrates into a heavy hotspot. Terms are summed smallest-first for numerical stability at large n.
  • Seeded via Distribution.SetStream (mirrors GasPicker.SetStream); new dist:%d:key / dist:%d:size stream ids added to the FROZEN contract (append-only, non-perturbing). Unbound → global RNG (preserves the unseeded path).

Review (systems + idiom)

  • Zipfian PMF independently validated against true Zipf; the mass on the top 1% of keys rises monotonically with θ (θ=0: 1%, θ=0.5: 9.5%, θ=0.9: 42%). Determinism + per-stream-multiset contract upheld; -race clean.
  • Fixes applied: guarded a masked eta NaN at n≤2; documented the n-must-be-stable contract; renamed a builtin-shadowing helper + a misleading field; marked the mutex-bearing struct copy-unsafe.

Decision brief: designs/sei-load-workload-modeler/PLT-460-distribution-samplers.md.

🤖 Generated with Claude Code

bdchatham and others added 2 commits June 12, 2026 07:53
Implement the real index samplers behind the frozen Distribution wire
format from PLT-455, bound to the PLT-456 seeded sub-streams.

Uniform: Stream.Uint64N(n) seeded, rand.Uint64N(n) unbound.

Zipfian: YCSB precomputed-zeta generator. zeta(n, theta) is computed
once per keyspace size n in O(n) and cached (summed smallest-term-first
for numerical stability at n=1e6); each draw is O(1). n arrives at
sample time, so the cache fills lazily and recomputes only if n changes.

Stream binding mirrors GasPicker.SetStream: Distribution.SetStream
type-switches the delegate; a nil stream falls back to the global RNG.
bindDistributionStreams wires the per-scenario streams in the generator.

FROZEN contract: added stream ids "dist:%d:key" / "dist:%d:size"
(append-only; documented in rng.go input #2 and streams.go).

Tests: chi-square uniformity; top-1% mass rising monotonically with
theta (1% -> 9.5% -> 42% -> 53%); per-stream determinism; seeded !=
unseeded binding guard; n=0 error; NaN/range sweep across theta; n=1e6
init+1000 draws bounded (~50ms). go build + go test -race green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

Guard eta against the n<=2 NaN (denom==0) by pinning it to 0; it is
provably never read at n<=2, but a NaN in cached state is a refactor
hazard. Document the n-stability contract on SampleIndex so PLT-465
cannot silently trigger per-draw O(n) zeta recomputes. Rename thetaPow2 ->
halfPowTheta (it holds 0.5^theta), inline the RNG fallback to mirror
UniformDistribution/gas.go, and mark the struct not copy-safe.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@cursor

cursor Bot commented Jun 12, 2026


PR Summary

Medium Risk
Changes workload randomness and adds frozen RNG stream ids that affect replay compatibility; zipfian math and caching are subtle but heavily tested and documented.

Overview
Replaces the PLT-455 Distribution stubs with real uniform and YCSB zipfian(θ) index samplers, wired for seeded replay like the gas pickers.

Uniform draws in [0, n) via rng.Stream.Uint64N or the global RNG when unbound. Zipfian uses precomputed zeta(n, θ) (cached per n under a mutex), O(1) draws, smallest-term-first summation, and guards for n ≤ 2 / float rounding. Distribution.SetStream binds samplers; n == 0 now returns an error instead of silently returning 0.
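The uniform path and the binding behaviour described above might look roughly like the sketch below. It is an assumption-laden illustration: `Stream` and `NewStream` are hypothetical stand-ins for the PR's `rng.Stream`, the `sampler` interface and `delegate` field name are invented here, and only the uniform case of the type switch is shown.

```go
package main

import (
	"errors"
	"fmt"
	"math/rand/v2"
)

// Stream is a hypothetical stand-in for the PR's rng.Stream.
type Stream struct{ r *rand.Rand }

// NewStream seeds a stream; the seeding scheme is an assumption.
func NewStream(seed uint64) *Stream {
	return &Stream{r: rand.New(rand.NewPCG(seed, 0))}
}

func (s *Stream) Uint64N(n uint64) uint64 { return s.r.Uint64N(n) }

var errEmptyKeyspace = errors.New("distribution: n must be > 0")

// UniformDistribution has no JSON fields; its only state is the stream.
type UniformDistribution struct{ stream *Stream }

func (u *UniformDistribution) SetStream(s *Stream) { u.stream = s }

// SampleIndex draws from [0, n). A nil stream falls back to the global RNG.
func (u *UniformDistribution) SampleIndex(n uint64) (uint64, error) {
	if n == 0 {
		return 0, errEmptyKeyspace // rand.Uint64N(0) would panic
	}
	if u.stream != nil {
		return u.stream.Uint64N(n), nil
	}
	return rand.Uint64N(n), nil
}

// sampler and Distribution are hypothetical shapes for the wrapper type.
type sampler interface {
	SampleIndex(n uint64) (uint64, error)
}

type Distribution struct{ delegate sampler }

// SetStream type-switches the delegate and binds the stream to it.
func (d *Distribution) SetStream(s *Stream) {
	switch v := d.delegate.(type) {
	case *UniformDistribution:
		v.SetStream(s)
	}
}

func main() {
	d := &Distribution{delegate: &UniformDistribution{}}
	d.SetStream(NewStream(42))
	idx, err := d.delegate.SampleIndex(1000)
	fmt.Println(idx, err)
	_, err = d.delegate.SampleIndex(0)
	fmt.Println(err)
}
```

Returning an error for `n == 0` makes an empty keyspace visible to the caller instead of yielding an index that looks valid.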

The generator calls bindDistributionStreams for each scenario’s KeyDistribution / SizeDistribution using new append-only frozen stream ids dist:%d:key and dist:%d:size. Package config/doc.go documents wire format, math, and reproducibility contracts.
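One way to see why appending stream ids can be non-perturbing: if each sub-stream is derived only from the root seed and its own id, adding a new id cannot change what existing ids produce. The sketch below illustrates that property under stated assumptions. The FNV hash, the `subStream` helper, and the use of the scenario index for `%d` are all hypothetical; the PR's actual derivation lives in rng.go and is not shown on this page.

```go
package main

import (
	"fmt"
	"hash/fnv"
	"math/rand/v2"
)

// subStream derives a generator from (root seed, stream id) only.
// Hypothetical helper: the hashing scheme is an assumption for illustration.
func subStream(rootSeed uint64, id string) *rand.Rand {
	h := fnv.New64a()
	h.Write([]byte(id))
	return rand.New(rand.NewPCG(rootSeed, h.Sum64()))
}

func main() {
	const rootSeed = 7
	scenario := 0 // assumed meaning of %d in the stream ids
	key := subStream(rootSeed, fmt.Sprintf("dist:%d:key", scenario))
	size := subStream(rootSeed, fmt.Sprintf("dist:%d:size", scenario))
	// Each id yields its own sequence; draws on one never advance the other.
	fmt.Println(key.Uint64N(1000), size.Uint64N(1000))
}
```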

Tests add determinism, seeded vs unseeded, chi-square uniform check, zipfian skew vs θ, init cost, and numerical edge cases.
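As a rough illustration of the chi-square uniformity check mentioned above, the sketch below computes the Pearson statistic over bucket counts. It is not the PR's test: `chiSquare`, the bucket count, the seed, and the threshold are assumptions. With 100 buckets there are 99 degrees of freedom, for which the critical value at p = 0.001 is roughly 148.

```go
package main

import (
	"fmt"
	"math/rand/v2"
)

// chiSquare returns the Pearson statistic for observed bucket counts
// against a uniform expectation of total/len(counts) per bucket.
func chiSquare(counts []int, total int) float64 {
	expected := float64(total) / float64(len(counts))
	stat := 0.0
	for _, c := range counts {
		d := float64(c) - expected
		stat += d * d / expected
	}
	return stat
}

func main() {
	const buckets, draws = 100, 1_000_000
	r := rand.New(rand.NewPCG(42, 0)) // fixed seed keeps the check deterministic
	counts := make([]int, buckets)
	for i := 0; i < draws; i++ {
		counts[r.Uint64N(buckets)]++
	}
	stat := chiSquare(counts, draws)
	// Approximate critical value for 99 degrees of freedom at p = 0.001.
	const critical = 148.0
	fmt.Printf("chi-square=%.1f uniform=%v\n", stat, stat < critical)
}
```

Seeding the generator matters here: an unseeded statistical test fails occasionally by chance, which makes it flaky in CI.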

Reviewed by Cursor Bugbot for commit f638310.

- Drop the pointless JSON unmarshal into the field-less UniformDistribution
  (its only field, the seeded stream, is bound via SetStream, not JSON) — SA9005.
- Remove the pointless math.IsNaN on a uint64-derived float in the zipfian test;
  the in-range assertion is the real guard against a NaN-derived draw — SA4015.

Caught by golangci-lint (CI gate); local build+test+vet did not run it.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
bdchatham and others added 2 commits June 12, 2026 13:32
Move the dense distribution narrative (zipfian zeta math, precompute-once
design, numerical stability, frozen wire format, seeded-stream reproducibility)
into a new package-level doc.go and lean distribution.go's comments to terse
pointers. No behavior change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…parallelism

- config/doc.go: 'sub-stream' was split across lines, rendering as 'sub- stream'
  under go doc; reflow to 'substream'.
- distribution.go: restore the '(no Name)' gloss on SampleIndex for parallelism
  with SetStream and the package doc.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@bdchatham bdchatham requested review from amir-deris and masih June 12, 2026 22:48