Skip to content

Implement filter search for disk path using AdpativeL#1173

Merged
dyhyfu merged 17 commits into
mainfrom
u/yaohongdeng/impl_adaptiveL_disk
Jul 1, 2026
Merged

Implement filter search for disk path using AdpativeL#1173
dyhyfu merged 17 commits into
mainfrom
u/yaohongdeng/impl_adaptiveL_disk

Conversation

@dyhyfu

@dyhyfu dyhyfu commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

This PR is to add filter search to disk path using AdpativeL filter algorithm.

Reference PR: #1131

The main changes:

  1. New filter_search() method on DiskIndexSearcher (parallel to flat_search) that constructs InlineFilterSearch::new(Knn, label_provider, adaptive_l) and hands it to the same self.index.search(...) engine as the plain Knn path.
  2. New PredicateLabelProvider that adapts the existing &dyn Fn(&u32) -> bool to QueryLabelProvider
  3. search() and search_internal() gain adaptive_l: Option. When None, behavior is identical to today. When Some(_), the graph path routes through filter_search.
  4. Three-way dispatch in search_internal: flat scan → graph + adaptive_l → plain Knn.

@dyhyfu dyhyfu requested review from a team and Copilot June 16, 2026 03:06

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR adds an AdaptiveL-enabled, inline label-filtered graph-search path for disk index search, wiring it through DiskIndexSearcher while keeping existing behavior unchanged when AdaptiveL is not provided.

Changes:

  • Add adaptive_l: Option<AdaptiveL> to DiskIndexSearcher::search / search_internal and dispatch to an InlineFilterSearch path when present.
  • Introduce a small PredicateLabelProvider adapter to bridge the existing &dyn Fn(&u32) -> bool filter predicate into QueryLabelProvider<u32>.
  • Add disk-provider tests covering (1) no-op equivalence vs plain Knn and (2) selective predicates returning only matching IDs; update callers to pass None until CLI/JSON exposes AdaptiveL.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 1 comment.

File Description
diskann-tools/src/utils/search_disk_index.rs Passes None for the new adaptive_l argument (not yet exposed via CLI).
diskann-disk/src/search/provider/disk_provider.rs Implements filter_search via InlineFilterSearch + AdaptiveL, adds adapter + tests, and extends search APIs with adaptive_l.
diskann-disk/src/build/builder/core.rs Updates test call sites to include the new adaptive_l parameter.
diskann-benchmark/src/backend/disk_index/search.rs Passes None for the new adaptive_l argument (not yet exposed via benchmark JSON).

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread diskann-disk/src/search/provider/disk_provider.rs Outdated
@magdalendobson

Copy link
Copy Markdown
Contributor

You may want to take a look at the changes that are coming to filters with #1141 and make sure your changes are compatible

Comment thread diskann-benchmark/src/disk_index/search.rs Outdated
Comment thread diskann-tools/src/utils/search_disk_index.rs Outdated
@magdalendobson magdalendobson linked an issue Jun 16, 2026 that may be closed by this pull request
Comment thread diskann-disk/src/search/provider/disk_provider.rs Outdated
Comment thread diskann-disk/src/search/provider/disk_provider.rs Outdated
Comment thread diskann-disk/src/search/provider/disk_provider.rs Outdated
Comment thread diskann-disk/src/search/provider/disk_provider.rs Outdated
Comment thread diskann-tools/src/utils/search_disk_index.rs
Comment thread diskann-disk/src/search/plan.rs Outdated
@codecov-commenter

codecov-commenter commented Jun 18, 2026

Copy link
Copy Markdown

Codecov Report

❌ Patch coverage is 89.36877% with 32 lines in your changes missing coverage. Please review.
✅ Project coverage is 89.64%. Comparing base (3fafae0) to head (2853447).

Files with missing lines Patch % Lines
diskann-tools/src/utils/search_disk_index.rs 0.00% 9 Missing ⚠️
diskann-benchmark/src/inputs/disk.rs 0.00% 8 Missing ⚠️
diskann-disk/src/search/provider/disk_provider.rs 95.91% 8 Missing ⚠️
diskann-disk/src/search/search_mode.rs 91.95% 7 Missing ⚠️

❌ Your patch status has failed because the patch coverage (89.36%) is below the target coverage (90.00%). You can increase the patch coverage or adjust the target coverage.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1173      +/-   ##
==========================================
- Coverage   89.64%   89.64%   -0.01%     
==========================================
  Files         502      503       +1     
  Lines       95500    95707     +207     
==========================================
+ Hits        85607    85792     +185     
- Misses       9893     9915      +22     
Flag Coverage Δ
miri 89.64% <89.36%> (-0.01%) ⬇️
unittests 89.30% <89.36%> (+<0.01%) ⬆️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-benchmark/src/main.rs 91.53% <ø> (ø)
diskann-disk/src/build/builder/core.rs 95.36% <100.00%> (+<0.01%) ⬆️
diskann-disk/src/search/search_mode.rs 91.95% <91.95%> (ø)
diskann-benchmark/src/inputs/disk.rs 1.44% <0.00%> (-0.06%) ⬇️
diskann-disk/src/search/provider/disk_provider.rs 93.99% <95.91%> (+0.39%) ⬆️
diskann-tools/src/utils/search_disk_index.rs 0.00% <0.00%> (ø)

... and 2 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Comment thread diskann-benchmark/src/disk_index/search.rs Outdated
@magdalendobson

Copy link
Copy Markdown
Contributor

This looks fine to me pending a couple of things:

  • Let's get the benchmark integration a little smoother by creating a wrapper struct that's json deserializable and constructs the SearchMode directly
  • Can you include a few experiments with larger filtered datasets as a sanity check for correctness/performance here?

Reconciles the new DeterminantDiversity post-processor feature from main
(#1141 follow-on) with this branch's SearchMode enum. Folds the post-
processor selection into SearchMode as a fourth variant rather than
exposing it as an orthogonal `post_processor` parameter:

  pub enum SearchMode<'a> {
      FlatScan { filter },
      Graph { filter },
      InlineFilter { predicate, adaptive_l },
      DiverseGraph { filter, params: DeterminantDiversityParams },   // new
  }

Net effect: public `search()` stays at a single `mode` parameter; main's
`SearchPostProcessorKind` enum is removed (its variants are reachable
via SearchMode constructors). The internal `DiskSearchPostProcessor`
dispatch type is kept and now drives the DiverseGraph match arm via
`index.search_with`. `DeterminantDiversityAndFilter.filter` migrates
to `Option<PostprocessFilter>` for consistency with this branch's
RerankAndFilter migration.

Migrated main's `test_disk_search_determinant_diversity` to the new
SearchMode API (now uses `SearchMode::diverse_graph(params)` and
`SearchMode::diverse_graph_filtered(predicate, params)`).

Benchmark `disk_index/search.rs` gains a 6-arm match that builds the
SearchMode from the JSON-driven (is_flat_search, has_filter,
post_processor) triple. Tools and builder pass `SearchMode::graph()`
unchanged.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
Comment thread diskann-disk/src/search/provider/disk_provider.rs Outdated

@arkrishn94 arkrishn94 left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @dyhyfu, just left a couple of comments.

Comment thread diskann-disk/src/search/provider/disk_provider.rs Outdated
Comment thread diskann-disk/src/search/provider/disk_provider.rs

@suri-kumkaran suri-kumkaran left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We are good to go after fixing disk-index benchmark input and moving towards PostprocessConfig.

Comment thread diskann-disk/src/search/provider/disk_provider.rs
Comment thread diskann-disk/src/search/search_mode.rs
@dyhyfu dyhyfu merged commit 745718c into main Jul 1, 2026
24 checks passed
@dyhyfu dyhyfu deleted the u/yaohongdeng/impl_adaptiveL_disk branch July 1, 2026 06:58
use std::{fmt, num::NonZeroUsize, path::Path};

use anyhow::Context;
#[cfg(feature = "disk-index")]

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any reason we have everything gated by #[cfg(feature = "disk-index")]?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Filtered search support on disk path

8 participants