Skip to content

perf(encode): reuse the LDM producer table across dict frames#440

Merged
polaz merged 2 commits into
mainfrom
perf/compress-dict-ldm-realloc
Jun 21, 2026
Merged

perf(encode): reuse the LDM producer table across dict frames#440
polaz merged 2 commits into
mainfrom
perf/compress-dict-ldm-realloc

Conversation

@polaz

@polaz polaz commented Jun 21, 2026

Copy link
Copy Markdown
Member

Summary

Two compress-dict / LDM follow-ups on the reused-encoder path (the CoordiNode
per-label-dictionary shape):

  1. Reuse the LDM producer table across frames. On the reused-encoder
    compress-dict path, reset built a fresh LdmProducer every frame,
    re-allocating its hash table (large at btultra2), while upstream zstd reuses
    its ldmState_t. Reuse the existing allocation when the derived LDM params
    are unchanged (clear() re-zeros without allocating; only a params change or
    the first frame builds a new one).

  2. Run the wasm dict bench over all benched levels. The wasm dictionary
    bench looped [3, 19] while the non-dict bench covers [1, 3, 19, 22], so
    levels 1 and 22 had no dict coverage. Aligned to LEVELS.

Impact

  • compress-dict/level_22_btultra2_ldm timing: −3.84% (criterion, i9;
    "Performance has improved", 31.66 → 30.45 µs vs base). Byte-identical output.
  • wasm: levels 1 and 22 now exercised for dict compress/decompress; verified
    L1/L22 dict round-trip on the wasm build (L22 gives the best ratio).

Testing

  • cargo nextest run -p structured-zstd --features "hash dict_builder"823/823
    pass
    , including the dict-primed byte-identity and reused-encoder tests.
  • wasm dict round-trip verified for all four levels against the simd128 payload.

Summary by CodeRabbit

  • Tests

    • Expanded dictionary compression benchmarks to cover all compression levels for more comprehensive testing.
  • Refactor

    • Optimized compression engine efficiency by improving memory allocation reuse across operations, reducing overhead.

polaz added 2 commits June 21, 2026 17:15
On the reused-encoder compress-dict path, reset built a fresh LdmProducer every
frame, re-allocating its hash table (large at btultra2). The producer is
designed for reuse (clear() re-zeros without allocating); reuse the existing
allocation when the derived LDM params are unchanged, matching upstream zstd's
ldmState_t lifecycle. clear() keeps the table correct for the new frame.
The dictionary bench looped [3, 19] while the non-dict bench covers
[1, 3, 19, 22], so levels 1 and 22 had no dict coverage. Align to LEVELS;
verified L1/L22 dict round-trip on the wasm build.
@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown

Review Change Stack

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 020fa367-a116-47ea-b879-be668155a05c

📥 Commits

Reviewing files that changed from the base of the PR and between aed82fc and ab93dc9.

📒 Files selected for processing (2)
  • zstd-wasm/bench/bench.mjs
  • zstd/src/encoding/match_generator.rs

📝 Walkthrough

Walkthrough

The PR makes two changes: the dictionary benchmark in bench.mjs now iterates over the shared LEVELS constant (replacing a hardcoded [3, 19] subset) in both the computation and reporting loops, and MatchGeneratorDriver::reset in match_generator.rs is updated to reuse an existing LdmProducer allocation when its params() match the newly derived LDM parameters, calling clear() instead of always constructing a new producer.

Changes

Dictionary LDM reuse and benchmark coverage

Layer / File(s) Summary
Dictionary benchmark level iteration
zstd-wasm/bench/bench.mjs
Both the dictRows computation loop and the per-scenario results reporting loop now iterate over LEVELS instead of the hardcoded [3, 19] array.
LdmProducer conditional reuse in MatchGeneratorDriver::reset
zstd/src/encoding/match_generator.rs
In the HashChain backend path, derived LDM parameters are computed first; the existing producer is extracted via hc.take_ldm_producer(), its params() are compared to the derived value, and either clear() is called to reuse it or a fresh LdmProducer::new(p) is constructed before storing back via hc.set_ldm_producer.

Estimated code review effort

🎯 2 (Simple) | ⏱️ ~10 minutes

Possibly related issues

Possibly related PRs

  • structured-world/structured-zstd#397: Both PRs modify MatchGeneratorDriver::reset in match_generator.rs around HashChain LDM producer creation and reset during dictionary (primed) restore, making the LDM state management logic directly overlap.

Poem

🐇 Hop, hop, no more waste!
Old producer, same params? Clear and paste!
LEVELS now run full and wide,
Dict benchmarks bloom on every side.
No fresh alloc when old will do—
This bunny optimized for you! 🌿

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title clearly and accurately reflects the main performance optimization introduced in the PR: reusing the LDM producer table across dictionary-compressed frames instead of allocating a new one per frame.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch perf/compress-dict-ldm-realloc

Comment @coderabbitai help to get the list of available commands and usage tips.

@codecov

codecov Bot commented Jun 21, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.

📢 Thoughts on this report? Let us know!

@greptile-apps

greptile-apps Bot commented Jun 21, 2026

Copy link
Copy Markdown

Greptile Summary

This PR improves performance on the reused-encoder compress-dict path by reusing the LdmProducer allocation across frames when the derived LDM params are unchanged, mirroring upstream zstd's ldmState_t reuse. It also extends the wasm dict benchmark to cover all four compression levels (1, 3, 19, 22) instead of only 3 and 19.

  • match_generator.rs: Moves LDM param resolution outside the if let MatcherStorage::HashChain block (needed to satisfy the borrow checker), then reuses the existing producer via take/clear/set when params match, or falls back to LdmProducer::new on a first frame or params change. clear() is called explicitly for correctness even though BtMatcher::reset already did it, making it a harmless double-clear.
  • bench.mjs: Replaces the hardcoded [3, 19] loop in both dict-bench loops with the shared LEVELS = [1, 3, 19, 22] constant, giving dict compress/decompress coverage at levels 1 and 22.

Confidence Score: 5/5

Safe to merge; both changes are narrow and well-reasoned, and the Rust logic is backed by matching test coverage (823/823 pass) and byte-identical output verification.

The Rust refactor correctly handles all cases (first frame, params-unchanged reuse, params change, LDM disabled, non-BT strategy) and the borrow-checker split is a textbook Rust pattern. The double clear() is redundant but harmless. The bench change is a one-line constant substitution with no risk of regression.

No files require special attention.

Important Files Changed

Filename Overview
zstd/src/encoding/match_generator.rs Restructures the LDM producer wiring block to reuse the existing allocation when derived params are unchanged, avoiding per-frame re-allocation of the large LDM hash table on the reused-encoder dict path.
zstd-wasm/bench/bench.mjs Aligns the dict bench loop variable from a hardcoded [3, 19] to the shared LEVELS constant so levels 1 and 22 receive dict-compress/decompress coverage.

Comments Outside Diff (1)

  1. zstd/src/encoding/match_generator.rs, line 2635-2641 (link)

    P2 The surrounding comment still says "installing here gives the new frame a fresh producer," but the new behaviour may reuse the existing cleared allocation rather than always installing a freshly-allocated one. The semantic outcome is the same (a correctly initialised producer), but the comment is slightly misleading after this change.

    Note: If this suggestion doesn't match your team's coding style, reply to this and let me know. I'll remember it for next time!

Reviews (1): Last reviewed commit: "test(wasm-bench): run dict compress/deco..." | Re-trigger Greptile

@polaz

polaz commented Jun 21, 2026

Copy link
Copy Markdown
Member Author

@coderabbitai review

@coderabbitai

coderabbitai Bot commented Jun 21, 2026

Copy link
Copy Markdown
✅ Action performed

Review finished.

Note: CodeRabbit is an incremental review system and does not re-review already reviewed commits. This command is applicable only when automatic reviews are paused.

@polaz polaz merged commit dc9e3f7 into main Jun 21, 2026
28 checks passed
@polaz polaz deleted the perf/compress-dict-ldm-realloc branch June 21, 2026 16:16
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant