Merged
70 commits
dcdc377
Make benchmarks stateful.
Apr 28, 2026
d60ac07
Checkpoint.
Apr 28, 2026
6cc92bd
Checkpoint.
Apr 29, 2026
f050e3d
Oh boy.
Apr 29, 2026
2d03d11
Merge remote-tracking branch 'origin/main' into mhildebr/benchmark-pl…
May 1, 2026
aa78836
Fix delimiters.
May 1, 2026
ccf891a
Revert formatting.
May 1, 2026
2e060ea
Make plugins slightly more flexible.
May 1, 2026
7a85df8
Merge branch 'main' into mhildebr/benchmark-plugins
hildebrandmw May 1, 2026
fc8008a
Add determinant-diversity plugin support on search-plugin architecture
narendatha May 4, 2026
298d1b8
Integrate determinant-diversity via disk search_with post-processor
narendatha May 4, 2026
fbd34fd
Restrict determinant-diversity to async full-precision topk
narendatha May 4, 2026
851174e
Keep single async determinant-diversity example JSON
narendatha May 4, 2026
ed0c918
Improve plugin matching resilience via phase-shape helpers
narendatha May 5, 2026
3a6aa1a
Use SearchPhaseKind::as_str in benchmark plugin kinds
narendatha May 5, 2026
ccfe4d7
remove serde defaults
narendatha May 5, 2026
01e194d
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha May 5, 2026
86883b7
minor merge fix
narendatha May 5, 2026
554bc7f
hook up actual algorithm, replace placeholder.
narendatha May 5, 2026
8c59e6f
WIP: Trait bound experiment for async determinant-diversity - HRTB pr…
narendatha May 5, 2026
b73abc8
apply mark's beautiful fix for lifetime issue
narendatha May 6, 2026
d1884c3
Fix async determinant-diversity: wire real vectors, timing metrics, r…
narendatha May 6, 2026
701ce8e
Fix CI clippy-features spherical plugin errors and apply formatting
narendatha May 6, 2026
6b935d3
Add determinant-diversity support for async and disk-index benchmarks
narendatha May 6, 2026
6f47ba3
Merge branch 'u/narendatha/det_div_plugins' of https://github.com/mic…
narendatha May 6, 2026
8acbba2
improve code coverage
narendatha May 6, 2026
a48e255
minor fix
narendatha May 6, 2026
468b5d2
cargo fmt
narendatha May 6, 2026
d9e66ba
WIP: Benchmarks refactoring - threading fix, rich struct params, as_s…
narendatha May 13, 2026
9cda6e0
Add post-processor generic parameter to KNN struct in benchmark-core
narendatha May 13, 2026
8b07676
Task 5: Create unified validation struct for DeterminantDiversity in …
narendatha May 13, 2026
8ce2130
Task 6: Add module-level documentation to determinant_diversity_post_…
narendatha May 13, 2026
309ecb3
Task 7: Add algorithmic tests to determinant_diversity_post_process.rs
narendatha May 13, 2026
3be546a
Task 8: Merge similar routines in determinant_diversity_post_process.rs
narendatha May 13, 2026
f92c3be
Task 9: Replace Vec<Vec<f32>> with Matrix for residuals storage
narendatha May 13, 2026
10b0182
Task 10: Move determinant_diversity_post_process out of async_ module
narendatha May 13, 2026
ca20a24
Refactor determinant-diversity benchmark path
narendatha May 13, 2026
f538583
cargo fmt and clippy fixes for CI
narendatha May 13, 2026
f58789d
Use shared determinant-diversity params validation
narendatha May 18, 2026
79c635f
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha May 18, 2026
3c11d36
code review comment, use a struct instead of a tuple
narendatha May 18, 2026
b65e673
Refine determinant-diversity invariants and range representation
narendatha May 18, 2026
85797ce
minor code cleanup
narendatha May 18, 2026
8aeee5a
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha Jun 2, 2026
19380a7
Address PR #1011 r3268094250: document determinant_diversity math; us…
narendatha Jun 2, 2026
63f1ab8
knn: route configured post-processors through KNN::with_postprocessor
narendatha Jun 2, 2026
2f482f9
det-div: add unit tests for edge cases and selection invariants
narendatha Jun 3, 2026
6f95f21
det-div: fence math blocks in doc comments to avoid doctest compile e…
narendatha Jun 3, 2026
bc03532
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha Jun 9, 2026
ce8db8d
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha Jun 10, 2026
125ae1b
minor fixes.
narendatha Jun 12, 2026
98b6a7d
Mark's code review comments: minor refactors
narendatha Jun 12, 2026
44c5884
refactor: determinant_diversity API - migrate from generic Vec to mat…
narendatha Jun 12, 2026
87001ad
refactor: make determinant-diversity a first-class SearchPhase variant
narendatha Jun 12, 2026
bc9e488
refactor(knn): use AsPostProcessor trait for infallible post-processo…
narendatha Jun 12, 2026
7ff4ff1
refactor(inputs): rely on plugin matching for det-div backend compati…
narendatha Jun 12, 2026
77ca168
test(disk): add det-div post-processor smoke test
narendatha Jun 12, 2026
b416b95
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha Jun 12, 2026
11c989e
fix test failure
narendatha Jun 12, 2026
855fbd2
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha Jun 13, 2026
1807bf0
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha Jun 16, 2026
1c8591d
Move DeterminantDiversityParams into determinant_diversity module
narendatha Jun 18, 2026
0c0a1b1
Merge remote-tracking branch 'origin/main' into u/narendatha/det_div_…
narendatha Jun 18, 2026
05a6b37
Remove redundant ordered_ids vec in determinant-diversity post-process
narendatha Jun 19, 2026
008f62b
Make search() take SearchPostProcessorKind directly with explicit Non…
narendatha Jun 19, 2026
3165ce4
cargo fmt
narendatha Jun 19, 2026
aaca369
Move DeterminantDiversity post-processor from diskann-benchmark to di…
narendatha Jun 19, 2026
a6652ae
Use thiserror derive for DeterminantDiversityError
narendatha Jun 19, 2026
26784b7
Add From<DeterminantDiversityError> for ANNError and use ? at callsites
narendatha Jun 19, 2026
ae4981d
minor comment fix
narendatha Jun 19, 2026
119 changes: 111 additions & 8 deletions diskann-benchmark-core/src/search/graph/knn.rs
@@ -23,7 +23,7 @@ use crate::{
};

/// A built-in helper for benchmarking the K-nearest neighbors method
-/// [`graph::DiskANNIndex::search`].
+/// [`graph::DiskANNIndex::search`] with optional post-processing support.
///
/// This is intended to be used in conjunction with [`search::search`] or
/// [`search::search_all`] and provides some basic additional metrics for
@@ -32,21 +32,31 @@ use crate::{
///
/// The provided implementation of [`Search`] accepts [`graph::search::Knn`]
/// and returns [`Metrics`] as additional output.
+///
+/// # Type Parameters
+///
+/// - `DP`: The data provider type
+/// - `T`: The query element type
+/// - `S`: The search strategy type
+/// - `PP`: Post-processor selector. Defaults to [`Defaulted`], which uses the
+/// strategy's default post-processor. Use [`KNN::with_postprocessor`] to
+/// supply an explicit post-processor.
#[derive(Debug)]
-pub struct KNN<DP, T, S>
+pub struct KNN<DP, T, S, PP = Defaulted>
where
DP: provider::DataProvider,
{
index: Arc<graph::DiskANNIndex<DP>>,
queries: Arc<Matrix<T>>,
strategy: Strategy<S>,
+post_processor: PP,
}

-impl<DP, T, S> KNN<DP, T, S>
+impl<DP, T, S> KNN<DP, T, S, Defaulted>
where
DP: provider::DataProvider,
{
-/// Construct a new [`KNN`] searcher.
+/// Construct a new [`KNN`] searcher using the strategy's default post-processor.
///
/// If `strategy` is one of the container variants of [`Strategy`], its length
/// must match the number of rows in `queries`. If this is the case, then the
@@ -68,10 +78,98 @@ where
index,
queries,
strategy,
+post_processor: Defaulted,
}))
}
}

+impl<DP, T, S, PP> KNN<DP, T, S, Forwarded<PP>>
+where
+DP: provider::DataProvider,
+{
+/// Construct a new [`KNN`] searcher with an explicit post-processor.
+///
+/// # Errors
+///
+/// Returns an error if the number of elements in `strategy` is not compatible with
+/// the number of rows in `queries`.
+pub fn with_postprocessor(
+index: Arc<graph::DiskANNIndex<DP>>,
+queries: Arc<Matrix<T>>,
+strategy: Strategy<S>,
+post_processor: PP,
+) -> anyhow::Result<Arc<Self>> {
+strategy.length_compatible(queries.nrows())?;
+
+Ok(Arc::new(Self {
+index,
+queries,
+strategy,
+post_processor: Forwarded(post_processor),
+}))
+}
+}
+
+impl<DP, T, S, PP> KNN<DP, T, S, PP>
+where
+DP: provider::DataProvider,
+{
+/// Access the index.
+pub fn index(&self) -> &Arc<graph::DiskANNIndex<DP>> {
+&self.index
+}
+}
+
+/// Resolves a post-processor for [`KNN`] given a search strategy.
+///
+/// This trait lets [`KNN`] support both "use the strategy's default post-processor"
+/// ([`Defaulted`]) and "use this explicit post-processor" ([`Forwarded`]) without
+/// duplicating the search loop.
+pub trait AsPostProcessor<'a, S, DP, T>
+where
+DP: provider::DataProvider,
+S: glue::SearchStrategy<'a, DP, T>,
+{
+/// The concrete post-processor used for a single search.
+type Processor: glue::SearchPostProcess<S::SearchAccessor, T, DP::ExternalId> + Send + Sync;
+
+/// Construct the post-processor to use for a single search.
+fn as_post_processor(&'a self, strategy: &'a S) -> Self::Processor;
+}
+
+/// Marker indicating that [`KNN`] should use the strategy's default post-processor.
+#[derive(Debug, Clone, Copy)]
+pub struct Defaulted;
+
+impl<'a, S, DP, T> AsPostProcessor<'a, S, DP, T> for Defaulted
+where
+DP: provider::DataProvider,
+S: glue::DefaultPostProcessor<'a, DP, T, DP::ExternalId>,
+{
+type Processor = S::Processor;
+
+fn as_post_processor(&'a self, strategy: &'a S) -> Self::Processor {
+strategy.default_post_processor()
+}
+}
+
+/// Wraps an explicit post-processor for use with [`KNN::with_postprocessor`].
+#[derive(Debug, Clone, Copy)]
+pub struct Forwarded<PP>(PP);
+
+impl<'a, S, DP, T, PP> AsPostProcessor<'a, S, DP, T> for Forwarded<PP>
+where
+DP: provider::DataProvider,
+S: glue::SearchStrategy<'a, DP, T>,
+PP: glue::SearchPostProcess<S::SearchAccessor, T, DP::ExternalId> + Clone + AsyncFriendly,
+{
+type Processor = PP;
+
+fn as_post_processor(&'a self, _strategy: &'a S) -> Self::Processor {
+self.0.clone()
+}
+}
+
/// Additional metrics collected during [`KNN`] search.
///
/// # Note
@@ -86,10 +184,11 @@ pub struct Metrics {
pub hops: u32,
}

-impl<DP, T, S> Search for KNN<DP, T, S>
+impl<DP, T, S, PP> Search for KNN<DP, T, S, PP>
where
DP: provider::DataProvider<Context: Default, ExternalId: search::Id>,
-S: for<'a> glue::DefaultSearchStrategy<'a, DP, &'a [T], DP::ExternalId> + Clone + AsyncFriendly,
+S: for<'a> glue::SearchStrategy<'a, DP, &'a [T]> + Clone + AsyncFriendly,
+PP: for<'a> AsPostProcessor<'a, S, DP, &'a [T]> + AsyncFriendly,
graph::search::Knn:
for<'a> graph::Search<'a, DP, S, &'a [T], Output = graph::index::SearchStats>,
T: AsyncFriendly + Clone,
@@ -117,11 +216,15 @@ where
{
let context = DP::Context::default();
let knn_search = *parameters;
+let strategy = self.strategy.get(index)?;
+let processor = self.post_processor.as_post_processor(strategy);
+
let stats = self
.index
-.search(
+.search_with(
knn_search,
-self.strategy.get(index)?,
+strategy,
+processor,
&context,
self.queries.row(index),
buffer,
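The code below is a minimal, self-contained sketch of the selector pattern used in `knn.rs`. It keeps the shape of `KNN<DP, T, S, PP = Defaulted>`, the `Defaulted`/`Forwarded` markers, and one `AsPostProcessor` resolution step inside a single shared search method. It relies on simplified stand-ins: `PostProcess`, `Strategy`, `Identity`, `Truncate`, and `Plain` are hypothetical names for illustration, not diskann APIs. It also drops the `'a` lifetime that the real trait carries so that processors can borrow from the strategy.

```rust
// Hypothetical stand-ins: `PostProcess`, `Strategy`, `Identity`, `Truncate`,
// and `Plain` are illustrative names, not diskann APIs.

/// Turns a ranked list of candidate ids into final results.
pub trait PostProcess {
    fn process(&self, candidates: &[u32]) -> Vec<u32>;
}

/// A search strategy that knows its own default post-processor.
pub trait Strategy {
    type DefaultProcessor: PostProcess;
    fn default_post_processor(&self) -> Self::DefaultProcessor;
}

/// Resolves the post-processor for one search (the role `AsPostProcessor`
/// plays in the diff, minus the `'a` lifetime).
pub trait AsPostProcessor<S: Strategy> {
    type Processor: PostProcess;
    fn as_post_processor(&self, strategy: &S) -> Self::Processor;
}

/// Marker: use the strategy's default post-processor.
#[derive(Debug, Clone, Copy)]
pub struct Defaulted;

impl<S: Strategy> AsPostProcessor<S> for Defaulted {
    type Processor = S::DefaultProcessor;
    fn as_post_processor(&self, strategy: &S) -> Self::Processor {
        strategy.default_post_processor()
    }
}

/// Wrapper: use this explicit post-processor, cloned once per search.
#[derive(Debug, Clone, Copy)]
pub struct Forwarded<PP>(PP);

impl<S: Strategy, PP: PostProcess + Clone> AsPostProcessor<S> for Forwarded<PP> {
    type Processor = PP;
    fn as_post_processor(&self, _strategy: &S) -> Self::Processor {
        self.0.clone()
    }
}

/// `PP` defaults to `Defaulted`, so existing `Knn<S>` spellings keep compiling.
pub struct Knn<S, PP = Defaulted> {
    strategy: S,
    post_processor: PP,
}

impl<S> Knn<S, Defaulted> {
    pub fn new(strategy: S) -> Self {
        Self { strategy, post_processor: Defaulted }
    }
}

impl<S, PP> Knn<S, Forwarded<PP>> {
    pub fn with_postprocessor(strategy: S, post_processor: PP) -> Self {
        Self { strategy, post_processor: Forwarded(post_processor) }
    }
}

impl<S: Strategy, PP: AsPostProcessor<S>> Knn<S, PP> {
    /// The single search body shared by both selectors.
    pub fn search(&self, candidates: &[u32]) -> Vec<u32> {
        let processor = self.post_processor.as_post_processor(&self.strategy);
        processor.process(candidates)
    }
}

/// Default processor for `Plain`: returns candidates unchanged.
#[derive(Clone)]
pub struct Identity;
impl PostProcess for Identity {
    fn process(&self, candidates: &[u32]) -> Vec<u32> {
        candidates.to_vec()
    }
}

/// Explicit processor: keeps only the first `n` candidates.
#[derive(Clone)]
pub struct Truncate(pub usize);
impl PostProcess for Truncate {
    fn process(&self, candidates: &[u32]) -> Vec<u32> {
        candidates.iter().take(self.0).copied().collect()
    }
}

pub struct Plain;
impl Strategy for Plain {
    type DefaultProcessor = Identity;
    fn default_post_processor(&self) -> Identity {
        Identity
    }
}
```

The two constructors produce different types. `Knn::new(Plain)` yields a `Knn<Plain, Defaulted>`, and `Knn::with_postprocessor(Plain, Truncate(2))` yields a `Knn<Plain, Forwarded<Truncate>>`. Both still run through the same `search` body, which is the duplication the `AsPostProcessor` doc comment says the trait avoids.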
48 changes: 48 additions & 0 deletions diskann-benchmark/example/async-determinant-diversity.json
@@ -0,0 +1,48 @@
{
"search_directories": [
"test_data/disk_index_search"
],
"jobs": [
{
"type": "graph-index-build",
"content": {
"source": {
"index-source": "Build",
"data_type": "float32",
"data": "disk_index_siftsmall_learn_256pts_data.fbin",
"distance": "squared_l2",
"max_degree": 32,
"l_build": 50,
"alpha": 1.2,
"backedge_ratio": 1.0,
"num_threads": 1,
"start_point_strategy": "medoid",
"num_insert_attempts": 1,
"saturate_inserts": false
},
"search_phase": {
"search-type": "topk-determinant-diversity",
"queries": "disk_index_sample_query_10pts.fbin",
"groundtruth": "disk_index_10pts_idx_uint32_truth_search_res.bin",
"reps": 5,
"num_threads": [
1
],
"power": 2.0,
"eta": 0.01,
"runs": [
{
"search_n": 20,
"search_l": [
20,
30,
40
],
"recall_k": 10
}
]
}
}
}
]
}
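This example enables determinant-diversity re-ranking of the top-k candidates, but this diff does not show the algorithm itself. As a rough illustration of the general idea only, the sketch below implements plain greedy determinant (volume) maximization via pivoted Gram-Schmidt. At each step it keeps the candidate whose residual, after the already-selected directions are projected out, has the largest squared norm. Several simplifications apply:

- It ignores `power`, `eta`, and any relevance weighting.
- `greedy_max_det` is a hypothetical name.
- It stores residuals in a `Vec<Vec<f32>>`, whereas the commit log notes that the PR moved residual storage to a `Matrix`.

The PR's implementation may differ in all of these respects.

```rust
/// Greedy determinant (volume) maximization via pivoted Gram-Schmidt.
/// Illustrative only: `greedy_max_det` is a hypothetical name, and this does not
/// model the PR's `power`/`eta` parameters or any relevance weighting.
///
/// Picks up to `k` indices from `vectors`. Each step selects the remaining
/// candidate whose residual (its component orthogonal to everything already
/// picked) has the largest squared norm. The product of those squared norms
/// equals the Gram determinant of the selected set.
pub fn greedy_max_det(vectors: &[Vec<f32>], k: usize) -> Vec<usize> {
    let mut residuals: Vec<Vec<f32>> = vectors.to_vec();
    let mut available = vec![true; vectors.len()];
    let mut selected = Vec::with_capacity(k.min(vectors.len()));

    while selected.len() < k {
        // Find the available candidate with the largest squared residual norm.
        // Ties keep the earlier (higher-ranked) candidate.
        let mut best: Option<(usize, f32)> = None;
        for (i, r) in residuals.iter().enumerate() {
            if !available[i] {
                continue;
            }
            let norm2: f32 = r.iter().map(|x| x * x).sum();
            if best.map_or(true, |(_, b)| norm2 > b) {
                best = Some((i, norm2));
            }
        }
        let Some((pick, norm2)) = best else { break };
        // Remaining candidates already lie (numerically) in the selected span.
        if norm2 <= 1e-12 {
            break;
        }
        available[pick] = false;
        selected.push(pick);

        // Project the new unit direction out of every remaining residual.
        let inv_norm = 1.0 / norm2.sqrt();
        let dir: Vec<f32> = residuals[pick].iter().map(|x| x * inv_norm).collect();
        for (i, r) in residuals.iter_mut().enumerate() {
            if !available[i] {
                continue;
            }
            let dot: f32 = r.iter().zip(&dir).map(|(a, b)| a * b).sum();
            for (x, d) in r.iter_mut().zip(&dir) {
                *x -= dot * d;
            }
        }
    }
    selected
}
```

Consider the vectors `[1, 0]`, `[0.99, 0.1]`, and `[0, 1]` with `k = 2`. The sketch skips the near-duplicate second vector in favor of the orthogonal third one and returns indices `[0, 2]`.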
42 changes: 42 additions & 0 deletions diskann-benchmark/example/disk-index-determinant-diversity.json
@@ -0,0 +1,42 @@
{
"search_directories": [
"test_data/disk_index_search"
],
"jobs": [
{
"type": "disk-index",
"content": {
"source": {
"disk-index-source": "Build",
"data_type": "float32",
"data": "disk_index_siftsmall_learn_256pts_data.fbin",
"distance": "squared_l2",
"dim": 128,
"max_degree": 32,
"l_build": 50,
"num_threads": 1,
"build_ram_limit_gb": 2.0,
"num_pq_chunks": 128,
"quantization_type": "FP",
"save_path": "siftsmall_index_full_det_div"
},
"search_phase": {
"queries": "disk_index_sample_query_10pts.fbin",
"groundtruth": "disk_index_10pts_idx_uint32_truth_search_res.bin",
"search_list": [10, 20, 40],
"beam_width": 4,
"recall_at": 10,
"num_threads": 1,
"is_flat_search": false,
"distance": "squared_l2",
"vector_filters_file": null,
"post_processor": {
"type": "determinant-diversity",
"power": 2.0,
"eta": 1.0
}
}
}
}
]
}
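The `post_processor` object in this disk example is optional. The `search.rs` change below reads it as an `Option<TopkPostProcessor>` and falls back to `SearchPostProcessorKind::None` when it is absent. The sketch mirrors that mapping with hypothetical stand-in types: the field names follow the JSON keys, while the `f32` field types and the derives are assumptions.

```rust
/// Hypothetical stand-in for the PR's `DeterminantDiversityParams`;
/// field names follow the JSON keys, the `f32` types are assumed.
#[derive(Debug, Clone, Copy, PartialEq)]
pub struct DeterminantDiversityParams {
    pub power: f32,
    pub eta: f32,
}

/// Config-side enum (stand-in): one variant per supported `"type"` value.
#[derive(Debug, Clone, Copy)]
pub enum TopkPostProcessor {
    DeterminantDiversity(DeterminantDiversityParams),
}

/// Searcher-side enum (stand-in) with an explicit "no post-processing" case.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum SearchPostProcessorKind {
    None,
    DeterminantDiversity(DeterminantDiversityParams),
}

/// Maps the optional config entry to the searcher's kind, as in `search.rs`.
pub fn to_search_kind(config: Option<&TopkPostProcessor>) -> SearchPostProcessorKind {
    // The closure pattern is irrefutable only while `TopkPostProcessor`
    // has a single variant.
    config.map_or(
        SearchPostProcessorKind::None,
        |TopkPostProcessor::DeterminantDiversity(params)| {
            SearchPostProcessorKind::DeterminantDiversity(*params)
        },
    )
}
```

Adding a second variant to `TopkPostProcessor` makes the closure pattern refutable, so this code stops compiling. That breakage is arguably useful, because it forces the mapping to be revisited.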
35 changes: 23 additions & 12 deletions diskann-benchmark/src/disk_index/search.rs
@@ -14,7 +14,8 @@ use diskann_benchmark_runner::{files::InputFile, utils::MicroSeconds};
use diskann_disk::{
data_model::{AdHoc, CachingStrategy},
search::provider::{
-disk_provider::DiskIndexSearcher, disk_vertex_provider_factory::DiskVertexProviderFactory,
+disk_provider::{DiskIndexSearcher, SearchPostProcessorKind},
+disk_vertex_provider_factory::DiskVertexProviderFactory,
},
storage::disk_index_reader::DiskIndexReader,
utils::{instrumentation::PerfLogger, statistics, AlignedFileReaderFactory, QueryStatistics},
@@ -32,7 +33,10 @@ use serde::{Deserialize, Serialize};

use crate::{
disk_index::json_spancollector::JsonSpanCollector,
-inputs::disk::{DiskIndexLoad, DiskSearchPhase},
+inputs::{
+disk::{DiskIndexLoad, DiskSearchPhase},
+post_processor::TopkPostProcessor,
+},
utils::{datafiles, SimilarityMeasure},
};

@@ -264,6 +268,12 @@ where
zipped.for_each_in_pool(
pool.as_ref(),
|(((((q, vf), id_chunk), dist_chunk), stats), rc)| {
+let post_processor = search_params.post_processor.as_ref().map_or(
+SearchPostProcessorKind::None,
+|TopkPostProcessor::DeterminantDiversity(params)| {
+SearchPostProcessorKind::DeterminantDiversity(*params)
+},
+);
let vector_filter = if search_params.vector_filters_file.is_none() {
None
} else {
@@ -277,20 +287,21 @@ where
l,
Some(search_params.beam_width),
vector_filter,
+post_processor,
search_params.is_flat_search,
) {
Ok(search_result) => {
*stats = search_result.stats.query_statistics;
-*rc = search_result.results.len() as u32;
-let actual_results = search_result
-.results
-.len()
-.min(search_params.recall_at as usize);
-for (i, result_item) in search_result
-.results
-.iter()
-.take(actual_results)
-.enumerate()
+let base_count = (search_result.stats.result_count as usize)
+.min(search_params.recall_at as usize)
+.min(search_result.results.len());
+
+*rc = base_count as u32;
+id_chunk.fill(0);
+dist_chunk.fill(0.0);
+
+for (i, result_item) in
+search_result.results.iter().take(base_count).enumerate()
{
id_chunk[i] = result_item.vertex_id;
dist_chunk[i] = result_item.distance;
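The new result-copying logic clamps the count three ways and zero-fills the output chunks before copying. Clamping by `stats.result_count` as well as `results.len()` avoids copying entries the search did not report. Zero-filling leaves every slot beyond `base_count` at a known value instead of whatever the buffer held before.

A minimal sketch of that step follows. It makes two assumptions:

- `ResultItem` is a hypothetical stand-in with the `vertex_id` and `distance` fields used in the diff.
- Each output chunk holds at least `recall_at` entries.

`write_results` is an illustrative name.

```rust
/// Hypothetical stand-in for one search result; field names follow the diff.
pub struct ResultItem {
    pub vertex_id: u32,
    pub distance: f32,
}

/// Copies at most `min(result_count, recall_at, results.len())` results into the
/// output chunks, zero-filling them first. Returns the number written.
/// Assumes each chunk holds at least `recall_at` entries.
pub fn write_results(
    results: &[ResultItem],
    result_count: u32,
    recall_at: u32,
    id_chunk: &mut [u32],
    dist_chunk: &mut [f32],
) -> u32 {
    let base_count = (result_count as usize)
        .min(recall_at as usize)
        .min(results.len());
    debug_assert!(id_chunk.len() >= base_count && dist_chunk.len() >= base_count);

    // Reset every slot so positions past `base_count` hold a known value.
    id_chunk.fill(0);
    dist_chunk.fill(0.0);

    for (i, item) in results.iter().take(base_count).enumerate() {
        id_chunk[i] = item.vertex_id;
        dist_chunk[i] = item.distance;
    }
    base_count as u32
}
```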