[handler] routing-trie operator-error family (APO-653, APO-654, APO-663) #4

Merged
dilyevsky merged 2 commits into main from dsky/apo-653-654-663-routing-trie on Jun 16, 2026

@dilyevsky

Fixes the routing-trie operator-error family. The trie indexed networks by Dst via longest-prefix match (LPM) and shared the inner srcTrie by prefix containment, which produced three defects:

  • APO-653 (S10) — a second VN with identical Src+Dst silently overwrote the first's egress. AddVirtualNetwork now rejects a route owned by a different VNI.
  • APO-654 (S11) — removing one VN blackholed another sharing a route, and emptied nodes leaked. Removal is owner-aware and reclaims a Dst entry once its last owner is gone.
  • APO-663 (S20) — a nested Dst was LPM-located into (and clobbered) the broader entry. Management now keys an exact-match dstEntries map; the data-path LPM is unchanged.

AddVirtualNetwork checks-then-adds (stores networkByID last); UpdateVirtualNetworkRoutes is atomic with rollback on conflict. The data-path lookup now holds the read lock across both Finds, closing a latent unlocked second-tier race (the correctness half of APO-675).

Tests

  • routing_trie_test.go: duplicate-reject, sibling-preserved + empty-entry reclaim, nested-Dst exact keying, atomic-update rollback. Teeth-checked (S20 fails if the old LPM-location bug is reintroduced).
  • Full suite green on darwin and Linux (OrbStack), -race clean.

Second commit

Adds routing_contention_bench_test.go — a b.RunParallel benchmark quantifying the RWMutex readerCount cache-line tax (~2.5× at 8–10 cores) that motivates the RCU follow-up tracked in APO-675. Benchmark only; not run under plain go test.
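
The shape of such a benchmark might look like the following sketch. It is not the contents of routing_contention_bench_test.go: a plain map stands in for the trie, and `lockedTable`, `snapshotTable`, and the benchmark names are hypothetical. Both variants do the same lookup work, so any difference comes from the read lock itself.

```go
package main

import (
	"sync"
	"sync/atomic"
	"testing"
)

// routes is a stand-in for the trie; both variants do identical work on it.
type routes map[uint32]uint32

// lockedTable guards the routes with an RWMutex, as on the current data path.
type lockedTable struct {
	mu sync.RWMutex
	rt routes
}

func (t *lockedTable) lookup(dst uint32) (uint32, bool) {
	t.mu.RLock() // every reader updates the mutex's shared reader count
	defer t.mu.RUnlock()
	v, ok := t.rt[dst]
	return v, ok
}

// snapshotTable publishes an immutable routes value through an atomic
// pointer; readers only load the pointer. A snapshot must be stored
// before the first lookup.
type snapshotTable struct {
	rt atomic.Pointer[routes]
}

func (t *snapshotTable) lookup(dst uint32) (uint32, bool) {
	v, ok := (*t.rt.Load())[dst]
	return v, ok
}

func BenchmarkLookupRWMutex(b *testing.B) {
	t := &lockedTable{rt: routes{1: 100}}
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			t.lookup(1)
		}
	})
}

func BenchmarkLookupSnapshot(b *testing.B) {
	t := &snapshotTable{}
	rt := routes{1: 100}
	t.rt.Store(&rt)
	b.RunParallel(func(pb *testing.PB) {
		for pb.Next() {
			t.lookup(1)
		}
	})
}
```

Placed in a `_test.go` file, benchmarks like these run only when requested, for example with `go test -bench . -cpu 1,8`, which varies GOMAXPROCS to expose the contention.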

…ing (APO-653, APO-654, APO-663)

The routing trie indexed networks by Dst via LPM and shared the inner
srcTrie by prefix containment, which produced three operator-error defects:

- APO-653 (S10): a second virtual network with an identical Src+Dst
  silently overwrote the first's egress slot. AddVirtualNetwork now
  rejects a route already owned by a different VNI.
- APO-654 (S11): removing one network blackholed another that shared a
  route, and emptied trie nodes leaked. Removal is now owner-aware and
  reclaims a Dst entry once its last owner is gone.
- APO-663 (S20): a Dst nested inside a broader network's Dst was located
  via LPM and landed in (clobbering) the broader entry. Management now
  keys an exact-match dstEntries map; the data-path LPM is unchanged.

AddVirtualNetwork checks-then-adds and stores networkByID last;
UpdateVirtualNetworkRoutes is atomic with rollback on conflict. The
data-path lookup now holds the read lock across both the outer and inner
Find, closing a latent unlocked second-tier race (see APO-675).

Adds routing_trie_test.go covering duplicate-reject, sibling-preserved +
empty-entry reclaim, nested-Dst exact keying, and atomic-update rollback;
export_test.go gains RouteLookupForTest / DstEntryCountForTest seams.

b.RunParallel benchmark of the per-packet route lookup under the
networksByAddressMu RWMutex versus a lock-free atomic.Pointer snapshot
doing byte-identical trie work. Quantifies the readerCount cache-line
bounce (~2.5x slower at 8-10 cores) that motivates the RCU fix tracked
in APO-675. Benchmark only; not run under plain `go test`.

linear-code Bot commented Jun 16, 2026

APO-653

APO-654

APO-663

@dilyevsky dilyevsky merged commit 67bd489 into main Jun 16, 2026
1 check failed
@dilyevsky dilyevsky deleted the dsky/apo-653-654-663-routing-trie branch June 16, 2026 07:43