Skip to content

Parser: fix exponential parse time on speculative prefix parsing#2352

Open
LucaCappelletti94 wants to merge 1 commit into
apache:mainfrom
LucaCappelletti94:pathological6
Open

Parser: fix exponential parse time on speculative prefix parsing#2352
LucaCappelletti94 wants to merge 1 commit into
apache:mainfrom
LucaCappelletti94:pathological6

Conversation

@LucaCappelletti94
Copy link
Copy Markdown
Contributor

@LucaCappelletti94 LucaCappelletti94 commented May 23, 2026

parse_prefix speculatively tries the next token as a reserved-word expression head and, on failure, falls back to treating it as an unreserved identifier. Both arms can independently recurse into the same downstream position. Two complementary caches now break the 2^N exploration: an outer per-position failure cache catches chains where both arms fail at every level, and a per-arm cache on the reserved-word try_parse catches chains where the arm fails but the fallback succeeds.

Supersedes #2349 (named-arg chain) and #2351 (NOT-prefix chain). Does NOT supersede #2350 (compound-keyword .not-b chain, which lives in parse_compound_expr).

Same machine, release build, apache main 182eae81 baseline:

Input Dialect Size Before After
if(current_time(...x, n=30 PG 394 B >60s 591us
.foo(t--,i) chain, n=25 (was #2349) PG 308 B >60s 550us
x-not-b. chain, n=30 (was #2351) Generic 247 B >60s 1.45ms
case\t- chain, n=30 (new) SQLite 181 B >60s 700us
cAse#f-#9-cAse#f... fuzz artifact PG 142 B 86ms 486us

Regression tests in tests/sqlparser_common.rs and benches under sqlparser_bench cover the if(current_time and case-case shapes.

@LucaCappelletti94
Copy link
Copy Markdown
Contributor Author

Me watching yet another fuzzer crash:

image

@LucaCappelletti94
Copy link
Copy Markdown
Contributor Author

No more exponential-class crashes for a day! Maybe we are done!

Comment thread src/parser/mod.rs
Comment on lines +1835 to +1840
// Memoize failed speculative reserved-word parses. When
// the reserved arm (CASE, CURRENT_TIME, etc.) does
// exponential work but the unreserved fallback ultimately
// succeeds, the overall `parse_prefix` returns `Ok` and the
// outer cache never fires. Chains like `case-case-...c`
// need this per-arm cache to break the doubling.
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wasn't able to follow this comment, what is 'outer cache' and 'break doubling' referring to?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Outer cache is the cache in the parent call before the recursive call, and doubling is that at each layer of depth it went in before, the operations doubled (hence the previous 2^layers complexity from before)

Comment thread src/parser/mod.rs
}
let result = self.parse_prefix_inner();
if let Err(ref e) = result {
self.failed_prefix_positions.insert(start_index, e.clone());
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

wondering about memory usage, how large do we expect the caches to get in the worse case scenarios?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have a ~2GB limit on the fuzzer processes and after many billions of iterations it did not manage to OOM - it hit before the ASAN thread number limit than an OOM, so at the very least I believe it is not exponential. I can try and get a scaling though.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm yeah I'm mostly worried about the increased memory usage. A lot of deployments don't have e.g. 2GB to allocate to parsing a sql query. I think the main issue is that the map contains strings, maybe if its a much cheaper/copy object somehow we were looking to store, it might be feasible.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants