Support box Keyword in %tokentype and NonTerminal RuleTypes #56

Merged: ehwan merged 1 commit into main from autobox on Jun 19, 2026
@ehwan (Owner) commented Jun 19, 2026

Description

This PR introduces support for the box keyword in front of %tokentype and NonTerminal RuleType definitions (e.g., %tokentype box MyToken; or Expr(box MyLargeASTNode)).

Motivation

In RustyLR, all semantic values are stored in a single unified Data enum representing the parser's stack. Because a Rust enum is at least as large as its largest variant, any large type inflates the footprint of every stack slot, adding performance and memory overhead. Wrapping those types in Box manually solved this, but it was tedious because every reduce action had to wrap and unwrap the Box by hand.

This change automates the process:

  • Marking %tokentype or a rule's return type with box generates ::std::boxed::Box<Type> internally in the Data stack enum.
  • Popped values are automatically dereferenced (*val), exposing the raw unboxed types to reduce actions.
  • Reduce action results are automatically wrapped in ::std::boxed::Box::new(...) when pushed back onto the stack.
  • The parser's final return value (pop_start) is also automatically unboxed.

This keeps the data stack enum small with no manual boxing boilerplate.
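The size effect described above can be checked with a small standalone sketch. The types here (LargeAstNode, DataInline, DataBoxed) are hypothetical stand-ins chosen for illustration; they are not the enum that RustyLR generates.

```rust
use std::mem::size_of;

// Hypothetical large semantic value (an assumption, not a RustyLR type).
#[allow(dead_code)]
struct LargeAstNode {
    payload: [u64; 32], // 256 bytes of inline data
}

// Every value of this enum must be large enough for its largest variant,
// so even a slot holding `Number` pays for `LargeAstNode`.
#[allow(dead_code)]
enum DataInline {
    Number(i32),
    Node(LargeAstNode),
}

// Boxing the large variant leaves only a pointer in the enum itself;
// the payload lives on the heap.
#[allow(dead_code)]
enum DataBoxed {
    Number(i32),
    Node(Box<LargeAstNode>),
}

// Size of one stack slot when the large type is stored inline.
fn inline_slot_size() -> usize {
    size_of::<DataInline>()
}

// Size of one stack slot when the large type is stored behind a Box.
fn boxed_slot_size() -> usize {
    size_of::<DataBoxed>()
}
```

Printing inline_slot_size() and boxed_slot_size() shows the gap; the exact numbers depend on the target, but the inline enum can never be smaller than the 256-byte payload.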

Changes

rusty_lr_parser

  • nonterminal_info.rs: Added the ruletype_boxed flag to NonTerminalInfo.
  • grammar.rs:
    • Added is_tokentype_boxed to Grammar.
    • Implemented check_and_strip_box helper to detect and strip the box keyword from types.
    • Modified Grammar::from_grammar_args and is_placeholder_type to parse and strip the box prefix.
    • Added test_box_keyword_parsing to verify metadata parsing and code emission.
  • pattern.rs: Updated NonTerminalInfo helper rules to default ruletype_boxed to false.
  • emit.rs:
    • Wrapped boxed types in ::std::boxed::Box<Type> within the generated Data enum.
    • Generated automatic dereferences (*val) when popping boxed data stack values.
    • Generated automatic wrapping (::std::boxed::Box::new(...)) when pushing boxed rule outputs.
    • Handled terminal token box wrapping on push and start symbol unboxing on pop.
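To make the push and pop behavior concrete, here is a hand-written model of what the emit.rs changes describe. It is only a sketch: the Data variants, DataStack, and the helper names are assumptions made for illustration, and the code actually emitted by the generator is not shown in this PR page.

```rust
// Hypothetical semantic value for a rule whose type is marked `box`.
#[derive(Debug, PartialEq)]
struct LargeAstNode {
    children: Vec<i64>,
}

// Hand-written model of a data stack enum (variant names are assumptions).
#[derive(Debug)]
enum Data {
    Number(i64),
    Node(Box<LargeAstNode>), // the `box` type is stored behind a Box
}

struct DataStack {
    slots: Vec<Data>,
}

impl DataStack {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    fn push_number(&mut self, n: i64) {
        self.slots.push(Data::Number(n));
    }

    // Push side: the plain value is wrapped in Box::new before storage.
    fn push_node(&mut self, node: LargeAstNode) {
        self.slots.push(Data::Node(Box::new(node)));
    }

    fn pop_number(&mut self) -> Option<i64> {
        match self.slots.pop() {
            Some(Data::Number(n)) => Some(n),
            Some(other) => {
                self.slots.push(other); // wrong variant: put it back
                None
            }
            None => None,
        }
    }

    // Pop side: `*boxed` moves the value out, so callers see the plain type.
    fn pop_node(&mut self) -> Option<LargeAstNode> {
        match self.slots.pop() {
            Some(Data::Node(boxed)) => Some(*boxed),
            Some(other) => {
                self.slots.push(other); // wrong variant: put it back
                None
            }
            None => None,
        }
    }
}

// A "reduce action" written against plain types only; it never mentions Box.
fn append_child(mut node: LargeAstNode, child: i64) -> LargeAstNode {
    node.children.push(child);
    node
}

// Models one reduction: pop the operands, run the action, push the result.
fn reduce_append(stack: &mut DataStack) -> Option<()> {
    let child = stack.pop_number()?;
    let node = stack.pop_node()?;
    stack.push_node(append_child(node, child));
    Some(())
}
```

The point of the design is visible in append_child: the action takes and returns LargeAstNode directly, while all boxing stays inside the stack helpers.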

Documentation

  • SYNTAX.md: Updated the Memory Optimization with Box section to document the new box keyword syntax and provide usage examples.

@ehwan ehwan self-assigned this Jun 19, 2026

@gemini-code-assist (Bot, Contributor) left a comment

Code Review

This pull request introduces support for the box keyword to automatically box large semantic values (such as %tokentype or non-terminal RuleTypes) in the generated parser's data stack enum, optimizing memory usage. The parser generator automatically handles wrapping (Box::new) and unwrapping (*val) during reduce actions. Feedback on the pull request suggests optimizing the parsing of rule typenames in rusty_lr_parser/src/grammar.rs to avoid redundant cloning and parsing of the TokenStream by combining the checks for boxing and placeholder types.

Comment on lines +1030 to 1045
+            let is_boxed = if let Some(rt) = &rules_arg.typename {
+                let (boxed, _) = check_and_strip_box(rt.clone());
+                boxed
+            } else {
+                false
+            };
+
             let ruletype = if rules_arg.typename.is_none() {
                 None
             } else if is_placeholder_type(&rules_arg.typename) {
                 let placeholder_name = format_ident!("__rustylr_placeholder_{}", rules_arg.name.value());
                 Some(quote! { #placeholder_name })
             } else {
-                rules_arg.typename.clone()
+                let (_, stripped) = check_and_strip_box(rules_arg.typename.clone().unwrap());
+                Some(stripped)
             };

medium

The current implementation clones and parses the TokenStream multiple times for each rule's typename (once for is_boxed, once for is_placeholder_type, and once in the else branch). This is redundant and inefficient.

We can optimize this by calling check_and_strip_box once and checking for the placeholder pattern directly on the stripped stream.

            let (is_boxed, ruletype) = if let Some(rt) = &rules_arg.typename {
                let (boxed, stripped) = check_and_strip_box(rt.clone());
                let is_placeholder = {
                    let mut it = stripped.clone().into_iter();
                    if let Some(proc_macro2::TokenTree::Ident(ident)) = it.next() {
                        ident.to_string() == "_" && it.next().is_none()
                    } else {
                        false
                    }
                };
                if is_placeholder {
                    let placeholder_name = format_ident!("__rustylr_placeholder_{}", rules_arg.name.value());
                    (boxed, Some(quote! { #placeholder_name }))
                } else {
                    (boxed, Some(stripped))
                }
            } else {
                (false, None)
            };
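
The control flow of this suggestion can be exercised without proc_macro2 using a simplified string-based model. This is a sketch under stated assumptions: strip_box and resolve_rule_type are hypothetical names, types are modeled as whitespace-separated strings, and the real check_and_strip_box operates on a TokenStream whose exact behavior is not shown here.

```rust
// Simplified stand-in for check_and_strip_box: reports whether the type
// starts with the `box` keyword and returns the type without that prefix.
// (Assumption: tokens are separated by whitespace.)
fn strip_box(typename: &str) -> (bool, String) {
    let trimmed = typename.trim();
    match trimmed.split_once(char::is_whitespace) {
        Some(("box", rest)) => (true, rest.trim_start().to_string()),
        _ => (false, trimmed.to_string()),
    }
}

// Mirrors the suggested shape: strip the prefix once, then test for the
// `_` placeholder on the stripped type instead of re-parsing the original.
fn resolve_rule_type(rule_name: &str, typename: Option<&str>) -> (bool, Option<String>) {
    match typename {
        None => (false, None),
        Some(raw) => {
            let (boxed, stripped) = strip_box(raw);
            if stripped == "_" {
                let placeholder = format!("__rustylr_placeholder_{}", rule_name);
                (boxed, Some(placeholder))
            } else {
                (boxed, Some(stripped))
            }
        }
    }
}
```

Because both results come from a single call, the boxed flag and the stripped type cannot drift apart, which is the main benefit of the suggested restructuring beyond avoiding the extra clones.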

@ehwan ehwan merged commit 283d536 into main Jun 19, 2026
1 check passed
@ehwan ehwan deleted the autobox branch June 19, 2026 00:15