[handler] routing-trie operator-error family (APO-653, APO-654, APO-663)#4
Merged
Merged
Conversation
…ing (APO-653, APO-654, APO-663) The routing trie indexed networks by Dst via LPM and shared the inner srcTrie by prefix containment, which produced three operator-error defects: - APO-653 (S10): a second virtual network with an identical Src+Dst silently overwrote the first's egress slot. AddVirtualNetwork now rejects a route already owned by a different VNI. - APO-654 (S11): removing one network blackholed another that shared a route, and emptied trie nodes leaked. Removal is now owner-aware and reclaims a Dst entry once its last owner is gone. - APO-663 (S20): a Dst nested inside a broader network's Dst was located via LPM and landed in (clobbering) the broader entry. Management now keys an exact-match dstEntries map; the data-path LPM is unchanged. AddVirtualNetwork checks-then-adds and stores networkByID last; UpdateVirtualNetworkRoutes is atomic with rollback on conflict. The data-path lookup now holds the read lock across both the outer and inner Find, closing a latent unlocked second-tier race (see APO-675). Adds routing_trie_test.go covering duplicate-reject, sibling-preserved + empty-entry reclaim, nested-Dst exact keying, and atomic-update rollback; export_test.go gains RouteLookupForTest / DstEntryCountForTest seams.
b.RunParallel benchmark of the per-packet route lookup under the networksByAddressMu RWMutex versus a lock-free atomic.Pointer snapshot doing byte-identical trie work. Quantifies the readerCount cache-line bounce (~2.5x slower at 8-10 cores) that motivates the RCU fix tracked in APO-675. Benchmark only; not run under plain `go test`.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Fixes the routing-trie operator-error family. The trie indexed networks by Dst via LPM and shared the inner srcTrie by prefix containment, producing three defects:
AddVirtualNetworknow rejects a route owned by a different VNI.dstEntriesmap; the data-path LPM is unchanged.AddVirtualNetworkchecks-then-adds (storesnetworkByIDlast);UpdateVirtualNetworkRoutesis atomic with rollback on conflict. The data-path lookup now holds the read lock across both Finds, closing a latent unlocked second-tier race (the correctness half of APO-675).Tests
routing_trie_test.go: duplicate-reject, sibling-preserved + empty-entry reclaim, nested-Dst exact keying, atomic-update rollback. Teeth-checked (S20 fails if the old LPM-location bug is reintroduced).-raceclean.Second commit
Adds
routing_contention_bench_test.go— ab.RunParallelbenchmark quantifying the RWMutexreaderCountcache-line tax (~2.5× at 8–10 cores) that motivates the RCU follow-up tracked in APO-675. Benchmark only; not run under plaingo test.