Parser: fix exponential parse time on speculative prefix parsing#2352
Parser: fix exponential parse time on speculative prefix parsing#2352LucaCappelletti94 wants to merge 1 commit into
Conversation
b6a762e to
7dcc611
Compare
|
No more exponential-class crashes for a day! Maybe we are done! |
| // Memoize failed speculative reserved-word parses. When | ||
| // the reserved arm (CASE, CURRENT_TIME, etc.) does | ||
| // exponential work but the unreserved fallback ultimately | ||
| // succeeds, the overall `parse_prefix` returns `Ok` and the | ||
| // outer cache never fires. Chains like `case-case-...c` | ||
| // need this per-arm cache to break the doubling. |
There was a problem hiding this comment.
I wasn't able to follow this comment, what is 'outer cache' and 'break doubling' referring to?
There was a problem hiding this comment.
Outer cache is the cache in the parent call before the recursive call, and doubling is that at each layer of depth it went in before, the operations doubled (hence the previous 2^layers complexity from before)
| } | ||
| let result = self.parse_prefix_inner(); | ||
| if let Err(ref e) = result { | ||
| self.failed_prefix_positions.insert(start_index, e.clone()); |
There was a problem hiding this comment.
wondering about memory usage, how large do we expect the caches to get in the worse case scenarios?
There was a problem hiding this comment.
I have a ~2GB limit on the fuzzer processes and after many billions of iterations it did not manage to OOM - it hit before the ASAN thread number limit than an OOM, so at the very least I believe it is not exponential. I can try and get a scaling though.
There was a problem hiding this comment.
hmm yeah I'm mostly worried about the increased memory usage. A lot of deployments don't have e.g. 2GB to allocate to parsing a sql query. I think the main issue is that the map contains strings, maybe if its a much cheaper/copy object somehow we were looking to store, it might be feasible.

parse_prefixspeculatively tries the next token as a reserved-word expression head and, on failure, falls back to treating it as an unreserved identifier. Both arms can independently recurse into the same downstream position. Two complementary caches now break the 2^N exploration: an outer per-position failure cache catches chains where both arms fail at every level, and a per-arm cache on the reserved-wordtry_parsecatches chains where the arm fails but the fallback succeeds.Supersedes #2349 (named-arg chain) and #2351 (NOT-prefix chain). Does NOT supersede #2350 (compound-keyword
.not-bchain, which lives inparse_compound_expr).Same machine, release build, apache main
182eae81baseline:if(current_time(...x, n=30.foo(t--,i)chain, n=25 (was #2349)x-not-b.chain, n=30 (was #2351)case\t-chain, n=30 (new)cAse#f-#9-cAse#f...fuzz artifactRegression tests in
tests/sqlparser_common.rsand benches undersqlparser_benchcover theif(current_timeandcase-caseshapes.