Skip to content

bftree lazy neighbor init#1215

Open
JordanMaples wants to merge 2 commits into
mainfrom
jordanmaples/bftree_lazy_neighbor_init
Open

bftree lazy neighbor init#1215
JordanMaples wants to merge 2 commits into
mainfrom
jordanmaples/bftree_lazy_neighbor_init

Conversation

@JordanMaples

Copy link
Copy Markdown
Contributor

What

Make bf-tree neighbor lists initialize lazily instead of eagerly materializing
an empty list for every id at construction.

Two changes in diskann-bftree:

  1. neighbors.rs — On neighbor read, treat a bf-tree NotFound result as an
    empty adjacency list rather than an error. (Deleted / InvalidKey still error.)
  2. provider.rs — Drop the O(max_points) eager loop in new_empty that
    wrote an empty list for every id up front.

Why

The eager loop was a stop-gap for a missing "exists" query. But bf-tree already
distinguishes Found / NotFound / Deleted natively, so mapping NotFound to
an empty list (identical to the existing Found(0) case) lets neighbor lists be
created on first write.

This makes construction O(num_start_points) instead of O(max_points),
eliminating a billion-insert startup wall for large, larger-than-memory datasets,
and removes the artificial coupling that required capacity to be fully
materialized up front.

Correctness

The consolidation paths that motivated the stop-gap (consolidate_vector /
on_neighbors) read neighbor lists during delete; a NotFound→empty result is
identical to the eager-init outcome for an unwritten id. These remain covered by
the delete-and-search tests, which exercise inplace_delete end to end.

JordanMaples and others added 2 commits June 30, 2026 10:30
Treat a bf-tree `NotFound` on neighbor read as an empty adjacency list
rather than an error, and drop the O(max_points) eager initialization
that wrote an empty list for every id at construction.

The eager loop was a stop-gap for a missing "exists" query, but bf-tree
already distinguishes `Found` / `NotFound` / `Deleted` natively. Mapping
`NotFound` to an empty list (identical to the existing `Found(0)` case)
lets neighbor lists be created lazily on first write. This makes
construction O(num_start_points) instead of O(max_points) — eliminating
a billion-insert startup wall for large, larger-than-memory datasets —
and removes the artificial coupling that required capacity to be fully
materialized up front.

The consolidate paths that motivated the stop-gap remain covered by the
delete-and-search tests, which exercise inplace_delete end to end.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Add coverage for the NotFound -> empty mapping introduced by lazy
neighbor-list initialization: a serial sweep over a wide range of
never-written ids (including sparse written ids and ids beyond any
write) and a concurrent multi-thread hammer. Both assert that absent
ids yield an empty adjacency list with no errors or panics.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@JordanMaples JordanMaples requested review from a team and Copilot June 30, 2026 21:28
@JordanMaples JordanMaples changed the title Jordanmaples/bftree lazy neighbor init bftree lazy neighbor init Jun 30, 2026
@codecov-commenter

Copy link
Copy Markdown

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 89.82%. Comparing base (7718073) to head (ffa53a9).
⚠️ Report is 3 commits behind head on main.

Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #1215      +/-   ##
==========================================
- Coverage   90.85%   89.82%   -1.04%     
==========================================
  Files         489      489              
  Lines       93575    93618      +43     
==========================================
- Hits        85020    84093     -927     
- Misses       8555     9525     +970     
Flag Coverage Δ
miri 89.82% <100.00%> (-1.04%) ⬇️
unittests 89.48% <100.00%> (-1.34%) ⬇️

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
diskann-bftree/src/neighbors.rs 96.95% <100.00%> (+0.27%) ⬆️
diskann-bftree/src/provider.rs 90.43% <100.00%> (-0.03%) ⬇️

... and 40 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

@harsha-simhadri

Copy link
Copy Markdown
Contributor

nice! can you post perf and recall results for this change?

Copilot AI left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants