Standardize names and use kebab case in diskann-tools#1217
Standardize names and use kebab case in diskann-tools#1217magdalendobson wants to merge 32 commits into
Conversation
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Copilot Autofix powered by AI <175728472+Copilot@users.noreply.github.com>
There was a problem hiding this comment.
Pull request overview
This PR standardizes diskann-tools CLI flag naming (moving to kebab-case) and updates internal naming around query-time filtering (“vector filters” → “filter bitmaps”). It also adds a new range-groundtruth computation command and supporting utilities.
Changes:
- Renamed filter-related parameters to
filter_bitmap_fileand improved mismatch handling by returning tool errors instead of panicking. - Standardized many
claplong option names to kebab-case across multiple binaries. - Added range-search ground truth computation plumbing (
compute_range_ground_truth_from_datafiles) and a newcompute_range_groundtruthbinary.
Reviewed changes
Copilot reviewed 11 out of 12 changed files in this pull request and generated 5 comments.
Show a summary per file
| File | Description |
|---|---|
| diskann-tools/src/utils/search_disk_index.rs | Renames filter input param and replaces an assert_eq! with a proper CLI error when filter/query counts mismatch. |
| diskann-tools/src/utils/ground_truth.rs | Adds reusable query-bitmap builder, renames filter input param, and introduces range-groundtruth computation + writer usage. |
| diskann-tools/src/bin/subsample_bin.rs | Converts args to explicit kebab-case long flags (notably changes previously positional args into required options). |
| diskann-tools/src/bin/relative_contrast.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/range_search_disk_index.rs.disabled | Renames CLI flags to kebab-case in the disabled range-search binary. |
| diskann-tools/src/bin/random_data_generator.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/generate_synthetic_labels.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/generate_pq.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/generate_minmax.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/gen_associated_data_from_range.rs | Renames CLI flag to kebab-case. |
| diskann-tools/src/bin/compute_specificities.rs | Renames CLI flags to kebab-case. |
| diskann-tools/src/bin/compute_range_groundtruth.rs | Adds a new CLI entrypoint for generating range-search ground truth (kebab-case flags). |
| diskann-tools/src/bin/compute_multivec_groundtruth.rs | Renames CLI flags to kebab-case and updates filter/label flag naming. |
| diskann-tools/src/bin/compute_groundtruth.rs | Renames filter arg to filter-bitmap-file and standardizes multiple CLI flags to kebab-case. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| /// Optional labels file for base vectors | ||
| #[arg(long = "base_file_labels", default_value = None)] | ||
| #[arg(long = "base-file-labels", default_value = None)] | ||
| pub base_file_labels: Option<String>, |
There was a problem hiding this comment.
Fixed in commit 0b53bd7 by applying the suggestion: removed default_value = None from base_file_labels so omitted input maps to None.
| #[arg(long = "base-file-labels", default_value = None)] | ||
| pub base_file_labels: Option<String>, | ||
|
|
||
| /// File containing the query vectors in binary format | ||
| #[arg(long = "query_file", short, required = true)] | ||
| #[arg(long = "query-file", short, required = true)] | ||
| pub query_file: String, | ||
|
|
||
| #[arg(long = "query_file_labels", default_value = None)] | ||
| #[arg(long = "query-file-labels", default_value = None)] | ||
| pub query_file_labels: Option<String>, |
| /// Filter bitmap file in the range ground truth format | ||
| #[arg(long = "filter_bitmap_file", short, default_value = None)] | ||
| #[arg(long = "filter-bitmap-file", short, default_value = None)] | ||
| pub filter_bitmap_file: Option<String>, | ||
|
|
||
| /// Number of ground truth nearest neigbhors to compute | ||
| #[arg(long = "recall_at", short = 'K', default_value = "10")] | ||
| #[arg(long = "recall-at", short = 'K', default_value = "10")] | ||
| pub recall_at: u32, | ||
|
|
||
| /// File containing the associated data in binary format | ||
| #[arg(long = "associated_data_file", required = false, default_value = None)] | ||
| #[arg(long = "associated-data-file", required = false, default_value = None)] | ||
| pub associated_data_file: Option<String>, |
There was a problem hiding this comment.
Fixed in 34c4af1 by removing required = false and default_value = None from associated_data_file, so the absent option now stays None.
| #[arg(long = "base-file-labels", default_value = None)] | ||
| pub base_file_labels: Option<String>, | ||
|
|
||
| /// File containing the query vectors in binary format | ||
| #[arg(long = "query_file", short, required = true)] | ||
| #[arg(long = "query-file", short, required = true)] | ||
| pub query_file: String, | ||
|
|
||
| #[arg(long = "query_file_labels", default_value = None)] | ||
| #[arg(long = "query-file-labels", default_value = None)] | ||
| pub query_file_labels: Option<String>, |
There was a problem hiding this comment.
Fixed in 34c4af1 by removing default_value = None from query-file-labels so the option is truly unset when omitted.
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## main #1217 +/- ##
==========================================
- Coverage 89.64% 89.64% -0.01%
==========================================
Files 503 503
Lines 95695 95695
==========================================
- Hits 85787 85785 -2
- Misses 9908 9910 +2
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
|
@copilot can you check that these renames do not break any CI workflows or tests. |
Checked. For this PR branch, recent CI runs are |
|
@copilot can you check whether this needs any updates to readme files or other documentation in the repo or the associated wiki on github |
Checked. I don’t see required doc updates from these renames right now: repository Markdown docs don’t reference the renamed diskann-tools CLI flags, and the current wiki pages are high-level/research/perf content with no affected CLI usage. So no README/wiki changes appear necessary for this PR. |
…en/standardize_naming_in_tools
…om:microsoft/DiskANN into users/magdalen/standardize_naming_in_tools
This PR standardizes all variable names in diskann-tools. It also enforces that every commandline parameter uses kebab case.