Optimize Generated Parser Code Size via Table Serialization #59

Merged
ehwan merged 3 commits into breaking_change from compress_parser_init_code on Jun 19, 2026
@ehwan ehwan commented Jun 19, 2026

Owner

Summary

The generated code for the Parser trait (specifically get_rules and get_states) previously made up the majority of each generated source file. The cause was verbose runtime initialization of vectors and structures (such as nested IntermediateState and ProductionRule values), which led to large compiled binaries and slow compilation.
This PR reduces compiled code size by serializing the DFA state transition tables and production rules into compact static integer slices (&[u32] and &[u8]). At runtime, these arrays are decoded lazily, on first use, inside a OnceLock initialization block.
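
The lazy decoding described above can be pictured with a minimal sketch. It assumes a data-plus-offsets layout in which `OFFSETS[i]..OFFSETS[i + 1]` delimits entry `i`. The array names are borrowed from this PR, but the contents, the element types, and the `rule_tokens` helper are hypothetical and are not the generated code.

```rust
use std::sync::OnceLock;

// Hypothetical flat encoding of three rules' token lists:
// rule 0 -> [1, 2], rule 1 -> [], rule 2 -> [3, 4, 5].
static RULE_TOKENS_DATA: &[u8] = &[1, 2, 3, 4, 5];
// OFFSETS[i]..OFFSETS[i + 1] is the range of rule i inside DATA (assumed layout).
static RULE_TOKENS_OFFSETS: &[u32] = &[0, 2, 2, 5];

/// Decodes the flat table on first call and caches the result for later calls.
pub fn rule_tokens() -> &'static [Vec<u8>] {
    static DECODED: OnceLock<Vec<Vec<u8>>> = OnceLock::new();
    DECODED.get_or_init(|| {
        RULE_TOKENS_OFFSETS
            .windows(2)
            .map(|w| RULE_TOKENS_DATA[w[0] as usize..w[1] as usize].to_vec())
            .collect()
    })
}
```

Only the two static slices need to be emitted as source text. The nested vectors are built once at runtime, and every later call returns the cached value.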

Key Changes

Emitter (rusty_lr_parser/src/emit.rs)

  • Stable Enum Representation: Decorated the generated terminal classes (*TerminalClasses) and non-terminals (*NonTerminals) enums with #[repr(usize)] to guarantee a stable memory layout.
  • Index Deserialization: Implemented a from_usize(value: usize) -> Self helper method on both enums using std::mem::transmute. The conversion itself is unchecked, so it is only sound for values that are valid discriminants.
  • Parser Table Serialization:
    • Serialized production rules into flat integer arrays (RULE_NAMES, RULE_PRECEDENCES, RULE_TOKENS_DATA, RULE_TOKENS_OFFSETS).
    • Serialized DFA states into flat integer arrays (SHIFT_TERM_DATA, SHIFT_TERM_OFFSETS, SHIFT_NONTERM_DATA, SHIFT_NONTERM_OFFSETS, REDUCE_DATA, REDUCE_OFFSETS, RULESET_DATA, RULESET_OFFSETS, CAN_ACCEPT_ERROR).
  • Safety Assertions: Added build-time bounds checks during code generation to ensure rule, symbol, and state indices do not exceed their allocated bit-widths.
  • Lazy Runtime Reconstruction: Rewrote Parser::get_rules() and Parser::get_states() to rebuild the rule list and state transition maps on demand at runtime by looping over the serialized arrays.
  • Cleanup: Removed unused closures/variables (precedence_to_stream, token_to_stream, and nonterminals_token) to keep the codebase warning-free.
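
On the emitter side, the serialization and the build-time bounds checks listed above might look roughly like the following sketch. The `flatten` function, the choice of `u8` for data and `u32` for offsets, and the panic messages are illustrative assumptions, not the code in emit.rs.

```rust
/// Flattens per-rule index lists into (data, offsets), checking that every
/// value fits in the narrower integer type chosen for storage.
pub fn flatten(rows: &[Vec<usize>]) -> (Vec<u8>, Vec<u32>) {
    let mut data = Vec::new();
    // offsets[i]..offsets[i + 1] will delimit row i inside `data`.
    let mut offsets = vec![0u32];
    for row in rows {
        for &value in row {
            // Fail at generation time rather than silently truncating.
            let narrow = u8::try_from(value)
                .unwrap_or_else(|_| panic!("index {} does not fit in u8", value));
            data.push(narrow);
        }
        let end = u32::try_from(data.len()).expect("table too large for u32 offsets");
        offsets.push(end);
    }
    (data, offsets)
}
```

Calling `flatten(&[vec![1, 2], vec![], vec![3, 4, 5]])` gives the data `[1, 2, 3, 4, 5]` and the offsets `[0, 2, 2, 5]`. An index of 256 or more aborts generation instead of producing a corrupt table.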

Verification & Validation

Code Size Reductions

This change significantly decreases the generated code footprint:

  • parser_expanded.rs: Reduced by 7,304 lines (~55% reduction).
  • json.rs (diff): Reduced by 3,328 lines.
  • calculator_u8.rs (diff): Reduced by 778 lines.
  • calculator.rs (diff): Reduced by 388 lines.

@ehwan ehwan self-assigned this Jun 19, 2026
@ehwan ehwan changed the base branch from main to breaking_change June 19, 2026 10:36

@gemini-code-assist gemini-code-assist Bot left a comment

Contributor

Code Review

This pull request refactors the parser generation and execution to serialize parser tables into flat static integer slices that are decoded lazily at runtime, significantly reducing compiled binary size. Consequently, the Parser trait methods no longer require a parser instance, and the Context API has been simplified to eliminate passing the parser to methods like feed and accept. Feedback on the changes highlights a critical memory safety concern: the newly introduced public from_usize functions on the generated enums perform unchecked std::mem::transmute operations, which can lead to Undefined Behavior if called with out-of-bounds values. It is highly recommended to add boundary assertions to these functions to ensure safety.
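
One way to act on the boundary-assertion recommendation is sketched below. The enum, its variants, and the `COUNT` constant are hypothetical stand-ins for a generated `*NonTerminals` enum. The sketch assumes discriminants are contiguous and start at 0.

```rust
#[repr(usize)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum NonTerminals {
    Expr = 0,
    Term = 1,
    Factor = 2,
}

impl NonTerminals {
    /// Number of variants; discriminants are assumed contiguous from 0.
    const COUNT: usize = 3;

    /// Converts an index back to a variant, panicking on out-of-range input.
    pub fn from_usize(value: usize) -> Self {
        assert!(value < Self::COUNT, "invalid NonTerminals index: {}", value);
        // SAFETY: the enum is #[repr(usize)] with contiguous discriminants
        // 0..COUNT, and `value` was just checked to lie in that range.
        unsafe { std::mem::transmute::<usize, Self>(value) }
    }
}
```

With the assertion in place, an out-of-range index panics with a clear message instead of producing an invalid enum value. The cost is one comparison per conversion.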

Important

The consumer version of Gemini Code Assist on GitHub is being sunset. Starting June 18, 2026, new organization installations will be blocked, and all code review activity will officially cease on July 17, 2026.
For more details on the timeline and next steps, please review the Help Documentation.

Comment thread rusty_lr_parser/src/emit.rs
Comment thread rusty_lr_parser/src/emit.rs
Comment thread scripts/diff/calculator.rs
Comment thread scripts/diff/calculator.rs
Comment thread scripts/diff/calculator_u8.rs
Comment thread scripts/diff/calculator_u8.rs
@ehwan ehwan merged commit decbb8e into breaking_change Jun 19, 2026
@ehwan ehwan deleted the compress_parser_init_code branch June 19, 2026 10:42