Apply distance constraints in rescore KNN queries (fixes #308) #309
Open
stumpylog wants to merge 1 commit into
KNN queries on vec0 tables using INDEXED BY rescore(...) silently ignored `distance <op> ?` constraints. The standard chunk-scan path filters candidates against parsed distance constraints, but the rescore dispatch (rescore_knn) never applied them.
Apply the constraints to the rescored float distances in phase 2, before top-k selection, so they target the same distance the `distance` column reports. Handle the now-possible zero-surviving-candidates case.
Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
Summary
Fixes #308. On `vec0` tables created with `INDEXED BY rescore(...)`, KNN queries silently ignored `distance <op> ?` constraints. An affected query returned the top-k by distance without filtering out rows above the threshold.
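To make the symptom and the intended behavior concrete, here is a minimal pure-Python model. It is a sketch under stated assumptions, not the extension's C code: the `top_k` helper, the `OPS` table, and the candidate data are all hypothetical, and the distances stand in for the exact rescored values that the `distance` column reports. Calling it without constraints mimics the buggy behavior; passing constraints filters before top-k selection, including an early-out when nothing survives.

```python
import heapq
import operator

# Hypothetical operator table; the PR names GE/GT/LE/LT as the constraint kinds.
OPS = {"GE": operator.ge, "GT": operator.gt, "LE": operator.le, "LT": operator.lt}


def top_k(candidates, k, constraints=()):
    """Return the k nearest (rowid, distance) pairs that satisfy every constraint.

    candidates:  iterable of (rowid, exact_distance) pairs (illustrative data).
    constraints: iterable of (op_name, value) pairs, e.g. ("LE", 0.5).
    """
    # Filter on the exact distance first, so the threshold applies to the
    # same value that is ultimately reported to the caller.
    survivors = [
        (rowid, dist)
        for rowid, dist in candidates
        if all(OPS[op](dist, value) for op, value in constraints)
    ]
    if not survivors:
        # Early-out: every candidate was rejected by a constraint.
        return []
    # Only then select the k smallest distances.
    return heapq.nsmallest(k, survivors, key=lambda pair: pair[1])


# Made-up candidates for illustration only.
candidates = [(1, 0.10), (2, 0.40), (3, 0.75), (4, 0.90)]

# Constraints ignored: row 3 is returned despite exceeding a 0.5 threshold.
unfiltered = top_k(candidates, k=3)

# Constraints applied before top-k: only rows within the threshold remain.
filtered = top_k(candidates, k=3, constraints=[("LE", 0.5)])
```

Filtering before selection matters because a filter applied after top-k would have the same effect here but would cost extra work, while a filter that is never applied (the reported bug) lets out-of-range rows fill the result.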
Root cause
The standard chunk-scan path filters candidates against the distance constraints parsed from `idxStr`/`argv`. Rescore columns dispatch to `rescore_knn`, which received those constraints but never applied them.
Fix
Apply the constraints in `rescore_knn` phase 2, after the exact float distances are computed and before top-k selection, so they target the final rescored distance (what the `distance` column reports), not the coarse quantized distance from phase 1. Candidates failing a `GE`/`GT`/`LE`/`LT` constraint are dropped, with an early-out when none survive.
Testing
Adds `test_knn_distance_constraint_le` and `test_knn_distance_constraint_lt_gt` to `tests/test-rescore.py`; both fail on `main` and pass with this change.
🤖 Generated with Claude Code