Optimize Generated Parser Code Size via Table Serialization by ehwan · Pull Request #59 · ehwan/RustyLR

ehwan · 2026-06-19T10:36:11Z

Summary

The generated code for the Parser trait (specifically get_rules and get_states) previously made up the majority of each generated source file. The cause was verbose runtime vector and structure initialization (such as nested IntermediateState and ProductionRule structures), which led to large compiled binaries and slow compilation.
This PR reduces the compiled code size by serializing the DFA state transition tables and production rules into compact static integer slices (&[u32] and &[u8]). At runtime, these arrays are decoded lazily, on first use, inside a OnceLock initialization block.
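As a rough illustration of the lazy decoding described above, the sketch below unpacks a flat data/offsets pair into nested rows the first time it is accessed. The names ROW_DATA, ROW_OFFSETS, and rows are hypothetical and the table contents are made up; the code actually generated by RustyLR may be laid out differently.

```rust
use std::sync::OnceLock;

// Hypothetical flat tables (not the real generated names):
// row i occupies ROW_DATA[ROW_OFFSETS[i]..ROW_OFFSETS[i + 1]].
static ROW_DATA: &[u32] = &[10, 11, 20, 30, 31, 32];
static ROW_OFFSETS: &[u32] = &[0, 2, 3, 6];

/// Decodes the flat tables into nested rows exactly once, on first access.
/// Later calls return the same cached allocation.
fn rows() -> &'static [Vec<u32>] {
    static ROWS: OnceLock<Vec<Vec<u32>>> = OnceLock::new();
    ROWS.get_or_init(|| {
        ROW_OFFSETS
            // Each adjacent pair of offsets delimits one row.
            .windows(2)
            .map(|w| ROW_DATA[w[0] as usize..w[1] as usize].to_vec())
            .collect()
    })
}
```

With this layout the source file carries only two integer slices, and the nested structure is rebuilt by a short loop instead of being spelled out as one initializer expression per entry.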

Key Changes

Emitter (`rusty_lr_parser/src/emit.rs`)

- Stable Enum Representation: Decorated the generated terminal class (*TerminalClasses) and non-terminal (*NonTerminals) enums with #[repr(usize)] to guarantee a stable memory layout.
- Index Deserialization: Added a from_usize(value: usize) -> Self helper method on both enums. It is exposed as a safe function and implemented with std::mem::transmute, so its soundness depends on the value being a valid discriminant.
- Parser Table Serialization:
  - Serialized production rules into flat integer arrays (RULE_NAMES, RULE_PRECEDENCES, RULE_TOKENS_DATA, RULE_TOKENS_OFFSETS).
  - Serialized DFA states into flat integer arrays (SHIFT_TERM_DATA, SHIFT_TERM_OFFSETS, SHIFT_NONTERM_DATA, SHIFT_NONTERM_OFFSETS, REDUCE_DATA, REDUCE_OFFSETS, RULESET_DATA, RULESET_OFFSETS, CAN_ACCEPT_ERROR).
- Safety Assertions: Added build-time bounds checks during code generation to ensure that rule, symbol, and state indices do not exceed their allocated bit widths.
- Lazy Runtime Reconstruction: Rewrote Parser::get_rules() and Parser::get_states() to rebuild the rule list and state transition maps from the serialized tables in a loop, on demand at runtime.
- Cleanup: Removed unused closures and variables (precedence_to_stream, token_to_stream, and nonterminals_token) to keep the codebase warning-free.
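The serialization and bit-width checks listed above could look roughly like the following generator-side helper. This is a minimal sketch, not the code in emit.rs: the function name flatten_u8 and the choice of u8 data with u32 offsets are assumptions made for illustration.

```rust
/// Hypothetical generator-side helper: flattens nested index rows into a
/// (data, offsets) pair. It panics at generation time if any index does not
/// fit the chosen data width (u8 here) or if the table outgrows u32 offsets.
fn flatten_u8(rows: &[Vec<usize>]) -> (Vec<u8>, Vec<u32>) {
    let mut data = Vec::new();
    // offsets[i]..offsets[i + 1] is the range of row i inside `data`.
    let mut offsets = vec![0u32];
    for row in rows {
        for &value in row {
            // Bounds check: refuse to silently truncate an index.
            let narrow = u8::try_from(value)
                .unwrap_or_else(|_| panic!("index {value} does not fit in u8"));
            data.push(narrow);
        }
        let end = u32::try_from(data.len()).expect("table too large for u32 offsets");
        offsets.push(end);
    }
    (data, offsets)
}
```

For example, the rows [1, 2], [], [255] flatten to data [1, 2, 255] and offsets [0, 2, 2, 3], while a row containing 256 aborts generation instead of emitting a truncated table.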

Verification & Validation

Code Size Reductions

This change significantly decreases the generated code footprint:

parser_expanded.rs: Reduced by 7,304 lines (~55% reduction).
json.rs (diff): Reduced by 3,328 lines.
calculator_u8.rs (diff): Reduced by 778 lines.
calculator.rs (diff): Reduced by 388 lines.

gemini-code-assist

Code Review

This pull request refactors the parser generation and execution to serialize parser tables into flat static integer slices that are decoded lazily at runtime, significantly reducing compiled binary size. Consequently, the Parser trait methods no longer require a parser instance, and the Context API has been simplified to eliminate passing the parser to methods like feed and accept. Feedback on the changes highlights a critical memory safety concern: the newly introduced public from_usize functions on the generated enums perform unchecked std::mem::transmute operations, which can lead to Undefined Behavior if called with out-of-bounds values. It is highly recommended to add boundary assertions to these functions to ensure safety.
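The boundary assertion recommended in the review might take the following shape. The enum, its variants, and the COUNT constant are hypothetical stand-ins for the generated *NonTerminals type; only the from_usize signature and the use of #[repr(usize)] with std::mem::transmute come from the PR description.

```rust
// Hypothetical stand-in for a generated non-terminal enum.
#[repr(usize)]
#[derive(Clone, Copy, Debug, PartialEq, Eq)]
enum NonTerminals {
    Expr = 0,
    Term = 1,
    Factor = 2,
}

impl NonTerminals {
    /// Number of variants; discriminants are contiguous in 0..COUNT.
    const COUNT: usize = 3;

    /// Converts a table index back into a variant.
    /// Panics on an out-of-range value instead of invoking undefined behavior.
    fn from_usize(value: usize) -> Self {
        assert!(value < Self::COUNT, "invalid NonTerminals index: {value}");
        // SAFETY: the enum is #[repr(usize)] with discriminants 0..COUNT,
        // and `value` was checked against COUNT above.
        unsafe { std::mem::transmute::<usize, Self>(value) }
    }
}
```

The assertion turns a corrupted or out-of-range index into a panic with a clear message, which keeps the function sound to expose as a safe public API.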


ehwan added 2 commits June 19, 2026 19:30

take1 (db99b40)

add comments (63f267e)

ehwan self-assigned this Jun 19, 2026

ehwan changed the base branch from main to breaking_change June 19, 2026 10:36

gemini-code-assist Bot reviewed Jun 19, 2026

Comment threads:
- rusty_lr_parser/src/emit.rs (2 threads)
- scripts/diff/calculator.rs (2 threads)
- scripts/diff/calculator_u8.rs (2 threads)

assert value max size in from_usize (3d39b2a)

ehwan merged commit decbb8e into breaking_change Jun 19, 2026

ehwan deleted the compress_parser_init_code branch June 19, 2026 10:42

ehwan merged 3 commits into breaking_change from compress_parser_init_code
