Optimize Generated Parser Code Size via Table Serialization #59
Code Review
This pull request refactors parser generation and execution so that parser tables are serialized into flat static integer slices that are decoded lazily at runtime, significantly reducing compiled binary size. As a result, the `Parser` trait methods no longer require a parser instance, and the `Context` API has been simplified so that the parser no longer has to be passed to methods such as `feed` and `accept`. The review raises one critical memory-safety concern: the newly introduced public `from_usize` functions on the generated enums perform unchecked `std::mem::transmute` operations, which cause undefined behavior if called with an out-of-range value. Adding bounds assertions to these functions is strongly recommended.
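A minimal sketch of the bounds check the review recommends, assuming a hypothetical three-variant `CalcTerminalClasses` enum whose discriminants are the default contiguous range starting at 0. The `ALL`, `COUNT`, and `try_from_usize` items are illustrative additions, not part of the code generated by this PR.

```rust
/// Hypothetical stand-in for a generated `*TerminalClasses` enum.
#[repr(usize)]
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum CalcTerminalClasses {
    Num,  // discriminant 0
    Plus, // discriminant 1
    Star, // discriminant 2
}

impl CalcTerminalClasses {
    /// Every variant, indexed by its discriminant (assumed contiguous from 0).
    pub const ALL: [Self; 3] = [Self::Num, Self::Plus, Self::Star];
    /// Number of variants.
    pub const COUNT: usize = Self::ALL.len();

    /// Checked conversion: panics on an out-of-range value instead of
    /// transmuting it into an invalid enum value (undefined behavior).
    pub fn from_usize(value: usize) -> Self {
        assert!(
            value < Self::COUNT,
            "invalid CalcTerminalClasses discriminant: {value}"
        );
        // SAFETY: `#[repr(usize)]` makes `Self` the same size as `usize`, the
        // discriminants are exactly 0..COUNT, and `value` was checked above.
        unsafe { std::mem::transmute::<usize, Self>(value) }
    }

    /// Non-panicking alternative that needs no `unsafe` at all.
    pub fn try_from_usize(value: usize) -> Option<Self> {
        Self::ALL.get(value).copied()
    }
}
```

The asserting `from_usize` keeps existing call sites unchanged and costs a single comparison per call, while `try_from_usize` sidesteps `transmute` entirely by indexing a constant table of variants. Either approach closes the out-of-range hole the review points out.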
Summary
The generated code for the `Parser` trait (specifically `get_rules` and `get_states`) previously made up the majority of each generated source file. This was caused by verbose runtime initialization of vectors and structures (such as nested `IntermediateState` and `ProductionRule` values), which led to large compiled binaries and slow compilation.
This PR reduces compiled code size by serializing the DFA state transition tables and production rules into compact static integer slices (`&[u32]` and `&[u8]`). At runtime, these slices are decoded on demand inside a `OnceLock` initialization block, so the decoding work runs at most once.
Key Changes
Emitter (`rusty_lr_parser/src/emit.rs`)
- Annotated the generated terminal-class (`*TerminalClasses`) and non-terminal (`*NonTerminals`) enums with `#[repr(usize)]` to guarantee a well-defined layout with a `usize` discriminant.
- Added a `from_usize(value: usize) -> Self` helper method to both enums, implemented with `std::mem::transmute`.
- Serialized production rules into static slices (`RULE_NAMES`, `RULE_PRECEDENCES`, `RULE_TOKENS_DATA`, `RULE_TOKENS_OFFSETS`).
- Serialized parser states into static slices (`SHIFT_TERM_DATA`, `SHIFT_TERM_OFFSETS`, `SHIFT_NONTERM_DATA`, `SHIFT_NONTERM_OFFSETS`, `REDUCE_DATA`, `REDUCE_OFFSETS`, `RULESET_DATA`, `RULESET_OFFSETS`, `CAN_ACCEPT_ERROR`).
- Rewrote `Parser::get_rules()` and `Parser::get_states()` to reconstruct the rule list and state transition maps on demand at runtime.
- Removed helpers that are no longer used (`precedence_to_stream`, `token_to_stream`, and `nonterminals_token`) to keep the codebase warning-free.
Verification & Validation
Code Size Reductions
This change significantly decreases the generated code footprint:
- `parser_expanded.rs`: Reduced by 7,304 lines (~55% reduction).
- `json.rs` (diff): Reduced by 3,328 lines.
- `calculator_u8.rs` (diff): Reduced by 778 lines.
- `calculator.rs` (diff): Reduced by 388 lines.
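On the emitter side, the flattening step can be sketched as a CSR-style (compressed sparse row) encoding: all rows are concatenated into one data slice, and an offsets slice of length `rows + 1` records where each row starts and ends. The helper names `flatten` and `emit_static` are hypothetical, and the output is built with plain string formatting only to keep the sketch dependency-free; a real emitter may construct token streams instead.

```rust
/// Flatten nested rows into CSR-style (data, offsets):
/// row i is data[offsets[i] as usize..offsets[i + 1] as usize].
fn flatten(rows: &[Vec<u32>]) -> (Vec<u32>, Vec<u32>) {
    let mut data = Vec::new();
    let mut offsets = Vec::with_capacity(rows.len() + 1);
    offsets.push(0);
    for row in rows {
        data.extend_from_slice(row);
        // Each offset marks the end of the row just appended.
        offsets.push(data.len() as u32);
    }
    (data, offsets)
}

/// Render a slice as the source text of a Rust `static` item.
fn emit_static(name: &str, values: &[u32]) -> String {
    let body: Vec<String> = values.iter().map(|v| v.to_string()).collect();
    format!("static {name}: &[u32] = &[{}];", body.join(", "))
}
```

With this layout, a table of any number of rules compiles to two literal arrays, rather than one runtime `Vec` construction per rule as in the previously generated code.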
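A rough sketch of the decode side, showing how flat `*_DATA`/`*_OFFSETS` slices can be turned back into rule and state structures inside a `OnceLock`. It is not the code this PR generates: the `Token`, `Rule`, and `State` types, the low-bit tag separating terminals from non-terminals, the `(terminal, next_state)` pair layout, the `CalcParser` type, and all table contents are assumptions for illustration. Only the table names, the `get_rules`/`get_states` method names, and the use of `OnceLock` come from the PR description.

```rust
use std::collections::HashMap;
use std::sync::OnceLock;

/// Right-hand-side symbol of a production (hypothetical representation).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Token {
    Term(usize),
    NonTerm(usize),
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Rule {
    pub lhs: usize,      // non-terminal being defined
    pub rhs: Vec<Token>, // symbols it expands to
}

#[derive(Debug)]
pub struct State {
    pub shift_term: HashMap<usize, usize>, // terminal id -> next state
}

// Assumed encoding: low bit 1 marks a non-terminal; the remaining bits hold the id.
fn decode_token(raw: u32) -> Token {
    let id = (raw >> 1) as usize;
    if raw & 1 == 1 { Token::NonTerm(id) } else { Token::Term(id) }
}

// Hypothetical tables for `E -> E '+' T | T` with E = 0, T = 1, '+' = 0.
// Rule i's symbols are RULE_TOKENS_DATA[RULE_TOKENS_OFFSETS[i]..RULE_TOKENS_OFFSETS[i + 1]].
static RULE_NAMES: &[u32] = &[0, 0];
static RULE_TOKENS_DATA: &[u32] = &[1, 0, 3, 3];
static RULE_TOKENS_OFFSETS: &[u32] = &[0, 3, 4];
// State i's shifts are (terminal, next_state) pairs in the same offset scheme;
// the values are illustrative, not a real LR automaton.
static SHIFT_TERM_DATA: &[u32] = &[0, 2, 1, 3, 1, 4];
static SHIFT_TERM_OFFSETS: &[u32] = &[0, 4, 6];

/// Associated functions only: no parser instance is needed to read the tables.
pub trait Parser {
    fn get_rules() -> &'static [Rule];
    fn get_states() -> &'static [State];
}

pub struct CalcParser;

impl Parser for CalcParser {
    fn get_rules() -> &'static [Rule] {
        static RULES: OnceLock<Vec<Rule>> = OnceLock::new();
        // The closure runs on the first call only; later calls reuse the cached Vec.
        RULES.get_or_init(|| {
            RULE_TOKENS_OFFSETS
                .windows(2)
                .zip(RULE_NAMES)
                .map(|(bounds, &lhs)| Rule {
                    lhs: lhs as usize,
                    rhs: RULE_TOKENS_DATA[bounds[0] as usize..bounds[1] as usize]
                        .iter()
                        .map(|&raw| decode_token(raw))
                        .collect(),
                })
                .collect()
        })
    }

    fn get_states() -> &'static [State] {
        static STATES: OnceLock<Vec<State>> = OnceLock::new();
        STATES.get_or_init(|| {
            SHIFT_TERM_OFFSETS
                .windows(2)
                .map(|bounds| State {
                    shift_term: SHIFT_TERM_DATA[bounds[0] as usize..bounds[1] as usize]
                        .chunks_exact(2)
                        .map(|pair| (pair[0] as usize, pair[1] as usize))
                        .collect(),
                })
                .collect()
        })
    }
}
```

Because `get_rules` and `get_states` take no `self`, the decoded tables live in `static` storage shared by every parse. This is presumably what lets `Context` methods such as `feed` and `accept` stop taking a parser argument, as the review summary notes.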