Skip to content

Standardize names and use kebab case in diskann-tools#1217

Open
magdalendobson wants to merge 32 commits into
mainfrom
users/magdalen/standardize_naming_in_tools
Open

Standardize names and use kebab case in diskann-tools#1217
magdalendobson wants to merge 32 commits into
mainfrom
users/magdalen/standardize_naming_in_tools

Conversation

@magdalendobson

Copy link
Copy Markdown
Contributor

This PR standardizes all variable names in diskann-tools. It also enforces that every commandline parameter uses kebab case.

Magdalen Manohar and others added 25 commits May 14, 2026 17:56
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR standardizes diskann-tools CLI flag naming (moving to kebab-case) and updates internal naming around query-time filtering (“vector filters” → “filter bitmaps”). It also adds a new range-groundtruth computation command and supporting utilities.

Changes:

  • Renamed filter-related parameters to filter_bitmap_file and improved mismatch handling by returning tool errors instead of panicking.
  • Standardized many clap long option names to kebab-case across multiple binaries.
  • Added range-search ground truth computation plumbing (compute_range_ground_truth_from_datafiles) and a new compute_range_groundtruth binary.

Reviewed changes

Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
diskann-tools/src/utils/search_disk_index.rs Renames filter input param and replaces an assert_eq! with a proper CLI error when filter/query counts mismatch.
diskann-tools/src/utils/ground_truth.rs Adds reusable query-bitmap builder, renames filter input param, and introduces range-groundtruth computation + writer usage.
diskann-tools/src/bin/subsample_bin.rs Converts args to explicit kebab-case long flags (notably changes previously positional args into required options).
diskann-tools/src/bin/relative_contrast.rs Renames CLI flags to kebab-case.
diskann-tools/src/bin/range_search_disk_index.rs.disabled Renames CLI flags to kebab-case in the disabled range-search binary.
diskann-tools/src/bin/random_data_generator.rs Renames CLI flags to kebab-case.
diskann-tools/src/bin/generate_synthetic_labels.rs Renames CLI flags to kebab-case.
diskann-tools/src/bin/generate_pq.rs Renames CLI flags to kebab-case.
diskann-tools/src/bin/generate_minmax.rs Renames CLI flags to kebab-case.
diskann-tools/src/bin/gen_associated_data_from_range.rs Renames CLI flag to kebab-case.
diskann-tools/src/bin/compute_specificities.rs Renames CLI flags to kebab-case.
diskann-tools/src/bin/compute_range_groundtruth.rs Adds a new CLI entrypoint for generating range-search ground truth (kebab-case flags).
diskann-tools/src/bin/compute_multivec_groundtruth.rs Renames CLI flags to kebab-case and updates filter/label flag naming.
diskann-tools/src/bin/compute_groundtruth.rs Renames filter arg to filter-bitmap-file and standardizes multiple CLI flags to kebab-case.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment on lines 107 to 109
/// Optional labels file for base vectors
#[arg(long = "base_file_labels", default_value = None)]
#[arg(long = "base-file-labels", default_value = None)]
pub base_file_labels: Option<String>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in commit 0b53bd7 by applying the suggestion: removed default_value = None from base_file_labels so omitted input maps to None.

Comment on lines +122 to 130
#[arg(long = "base-file-labels", default_value = None)]
pub base_file_labels: Option<String>,

/// File containing the query vectors in binary format
#[arg(long = "query_file", short, required = true)]
#[arg(long = "query-file", short, required = true)]
pub query_file: String,

#[arg(long = "query_file_labels", default_value = None)]
#[arg(long = "query-file-labels", default_value = None)]
pub query_file_labels: Option<String>,
Comment on lines 136 to 146
/// Filter bitmap file in the range ground truth format
#[arg(long = "filter_bitmap_file", short, default_value = None)]
#[arg(long = "filter-bitmap-file", short, default_value = None)]
pub filter_bitmap_file: Option<String>,

/// Number of ground truth nearest neigbhors to compute
#[arg(long = "recall_at", short = 'K', default_value = "10")]
#[arg(long = "recall-at", short = 'K', default_value = "10")]
pub recall_at: u32,

/// File containing the associated data in binary format
#[arg(long = "associated_data_file", required = false, default_value = None)]
#[arg(long = "associated-data-file", required = false, default_value = None)]
pub associated_data_file: Option<String>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 34c4af1 by removing required = false and default_value = None from associated_data_file, so the absent option now stays None.

Comment on lines +112 to 120
#[arg(long = "base-file-labels", default_value = None)]
pub base_file_labels: Option<String>,

/// File containing the query vectors in binary format
#[arg(long = "query_file", short, required = true)]
#[arg(long = "query-file", short, required = true)]
pub query_file: String,

#[arg(long = "query_file_labels", default_value = None)]
#[arg(long = "query-file-labels", default_value = None)]
pub query_file_labels: Option<String>,

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed in 34c4af1 by removing default_value = None from query-file-labels so the option is truly unset when omitted.

Comment thread diskann-tools/src/bin/subsample_bin.rs
@codecov-commenter

codecov-commenter commented Jul 1, 2026

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.64%. Comparing base (be48520) to head (7c5d71c).

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1217      +/-   ##
==========================================
- Coverage   89.64%   89.64%   -0.01%     
==========================================
  Files         503      503              
  Lines       95695    95695              
==========================================
- Hits        85787    85785       -2     
- Misses       9908     9910       +2     
Flag Coverage Δ
miri 89.64% <ø> (-0.01%) ⬇️
unittests 89.31% <ø> (-0.01%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-tools/src/bin/compute_groundtruth.rs 0.00% <ø> (ø)
...kann-tools/src/bin/compute_multivec_groundtruth.rs 0.00% <ø> (ø)
diskann-tools/src/bin/compute_range_groundtruth.rs 0.00% <ø> (ø)
diskann-tools/src/bin/compute_specificities.rs 0.00% <ø> (ø)
...nn-tools/src/bin/gen_associated_data_from_range.rs 0.00% <ø> (ø)
diskann-tools/src/bin/generate_minmax.rs 0.00% <ø> (ø)
diskann-tools/src/bin/generate_pq.rs 0.00% <ø> (ø)
diskann-tools/src/bin/generate_synthetic_labels.rs 0.00% <ø> (ø)
diskann-tools/src/bin/random_data_generator.rs 0.00% <ø> (ø)
diskann-tools/src/bin/relative_contrast.rs 0.00% <ø> (ø)
... and 1 more

... and 3 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@harsha-simhadri

Copy link
Copy Markdown
Contributor

@copilot can you check that these renames do not break any CI workflows or tests.

Copilot AI commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@copilot can you check that these renames do not break any CI workflows or tests.

Checked. For this PR branch, recent CI runs are action_required with 0 jobs (not test failures), and failed-job logs report none. I also verified there are no .github workflow callsites using the renamed CLI flags, and local validation passed: cargo test -p diskann-tools --lib (67/67) and cargo check -p diskann-tools --bins.

@harsha-simhadri

Copy link
Copy Markdown
Contributor

@copilot can you check whether this needs any updates to readme files or other documentation in the repo or the associated wiki on github

Copilot AI commented Jul 1, 2026

Copy link
Copy Markdown
Contributor

@copilot can you check whether this needs any updates to readme files or other documentation in the repo or the associated wiki on github

Checked. I don’t see required doc updates from these renames right now: repository Markdown docs don’t reference the renamed diskann-tools CLI flags, and the current wiki pages are high-level/research/perf content with no affected CLI usage. So no README/wiki changes appear necessary for this PR.

Magdalen Manohar added 3 commits July 1, 2026 19:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

6 participants