shortest_path: Support relationship-type filters and a minimum hop count#2442
Conversation
There was a problem hiding this comment.
Pull request overview
Adds openCypher/Neo4j-aligned enhancements to AGE’s shortest-path SRFs by supporting relationship-type filtering with multiple types and introducing a minimum hop-count constraint (including a DFS/VLE fallback when the minimum exceeds the BFS shortest distance).
Changes:
- Extend
edge_typeshandling to accept an array of relationship types and match edges whose label is in the requested set. - Add
min_hopssupport, with a VLE DFS fallback for the “hard” regime (min_hops> true shortest distance) and guardrails to cap exhaustive enumeration. - Expand regression coverage for multi-type filters, min-hops regimes, error-prefix correctness, and in-cypher Tier 1 forms.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| src/backend/utils/adt/age_vle.c | Implements multi-type label filtering, min_hops fallback search, result/materialization caps, and scratch memory context usage. |
| regress/sql/age_shortest_path.sql | Adds regression queries covering new semantics (multi-type filters, min_hops behavior, error-name prefixing, Tier 1 calls). |
| regress/expected/age_shortest_path.out | Updates expected outputs to match new behavior and added regression cases. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
aba432f to
4ff702b
Compare
4ff702b to
6813f19
Compare
Support relationship-type filters and a minimum hop count in shortest_path SRFs age_shortest_path / age_all_shortest_paths gain two related capabilities, both following openCypher / Neo4j semantics. Relationship-type filtering: the edge_types argument now accepts an array of types; an edge matches when its label is any one of the requested types. A bare string or a one-element array keeps the single-type behaviour, an empty string/array or NULL means no filter, and an unknown type matches nothing. sp_run_bfs takes an Oid set rather than a single oid, and sp_compute_paths resolves the argument into that set. Minimum hop count: the new min_hops argument is a lower bound on the path length. When it does not exceed the true shortest distance it imposes no constraint, so the normal BFS shortest-path result is returned. When it exceeds the shortest distance, BFS cannot produce a qualifying path, so the search falls back to the variable-length-edge depth-first engine (sp_minhops_fallback), which enumerates edge-distinct paths (relationship-uniqueness / trail semantics) and returns the shortest path(s) whose length is at least min_hops. This regime permits revisiting a vertex and closed walks back to the start, but never reusing an edge. A private memory context bounds the search and a cost guard caps the number of examined paths, raising PROGRAM_LIMIT_EXCEEDED (with a hint to bound the search with a maximum hop count) when the cap is exceeded. The hard regime combined with multiple relationship types is unsupported, because the VLE engine matches a single label; that case raises FEATURE_NOT_SUPPORTED. Regression coverage spans single- and multi-type filters, directed and undirected reachability, multiplicity of equal-length paths, max_hops bounds, NULL and non-existent endpoints, and both min_hops regimes, including a vertex-revisiting longer path (sp_revisit) and a closed-walk cycle back to the start (sp_tri). The in-cypher() Tier 1 call forms are exercised as well. Review feedback addressed: 1. Error messages now report the function actually called. age_shortest_path and age_all_shortest_paths share their argument-resolution helpers, which hard-coded an "age_shortest_path" prefix regardless of the caller; the caller's name is now threaded through so each function reports its own (this also corrects a mislabeled multi-type min_hops error). A new regression case (sp_errname) pins the behaviour for both functions. 2. age_all_shortest_paths now bounds the number of materialized result paths. The shortest-path DAG can contain exponentially many equal-length paths, all built up front before the first row streams; enumeration is capped at SP_MAX_RESULT_PATHS (1,000,000), raising PROGRAM_LIMIT_EXCEEDED with a hint to narrow the search, mirroring the existing min-hops candidate cap. 3. The BFS search state (visited table, frontier queue, predecessor multiset, and intermediate path arrays) now lives in a private scratch memory context that is deleted once the surviving result Datums are built in the SRF context, rather than persisting in multi_call_memory_ctx for the life of the SRF. This bounds peak memory to the result set plus one search and matches the pattern sp_minhops_fallback already used. 4. A second review round tightened memory hygiene and reporting: the pnstrdup'd relationship-type name is freed once resolved to an oid (it was retained for the life of the SRF) in both the array and scalar cases; the invalid-direction error now carries the called function's name like the other argument errors; the min-hops fallback's private context is renamed to a caller-neutral "age shortest path minhops" (it is shared by both SRFs); and the multi-type label-filter comment is corrected to note that an unknown type merely contributes no matches -- known types in the same set still match, and only an all-unknown set leaves just the zero-length path. 41/41 installcheck. Co-authored-by: Copilot <copilot@github.com> modified: regress/expected/age_shortest_path.out modified: regress/sql/age_shortest_path.sql modified: src/backend/utils/adt/age_vle.c
6813f19 to
feafc50
Compare
gregfelice
left a comment
There was a problem hiding this comment.
LGTM — approving. I read the full age_vle.c diff and traced it against the existing VLE internals; this is a careful, well-tested change.
What I verified:
-
Multi-type label filter —
sp_run_bfsnow matches an edge's label oid against the requested set. Unknown types resolve toInvalidOid(never matches a real edge), so mixed known/unknown sets work and an all-unknown set correctly leaves only the zero-length path — matching the openCypher "unknown type matches nothing" semantics. -
Min-hops fallback — the hand-built
VLE_local_contextinitializes every field the DFS engine andfree_VLE_local_contexttouch (the only unset one,edge_state_hashtable, is created bycreate_VLE_local_state_hashtable). The path-length math is consistent:gid_stack_size(dfs_path_stack)is the hop count, and(2*best_len)+1equalsbuild_VLE_path_container's(ssize*2)+1output size. -
Memory — the BFS/enumeration run in a private
scratchcontext and the surviving Datums are built in the SRF-lifetime context beforescratchis deleted. Dropping the explicithash_destroy(visited)is safe: the HTAB is created withoutHASH_CONTEXT, so its subsidiary context is a child ofscratchand is reclaimed byMemoryContextDelete(as are the collect_all predecessor lists). No per-call leak across the SRF. -
Cost guards —
SP_MAX_RESULT_PATHSandSP_MINHOPS_MAX_PATHScap the explosion cases withPROGRAM_LIMIT_EXCEEDED+ actionable hints, andCHECK_FOR_INTERRUPTSis preserved. -
Tests — good coverage: both min_hops regimes, vertex-revisit (sp_revisit), closed walk (sp_tri), edge-uniqueness, multi-type rejection, the per-function error-name fix (sp_errname), and min>max unsatisfiable.
Two non-blocking notes (no change required):
- The single-path fallback (
!collect_all) still enumerates every path in[min_hops, max_hops]because the DFS yields in arbitrary length order — bounded by the cost guard and consistent with the "provide a maximum hop count" hint, so fine; just noting it's not an early-out. free_VLE_local_context(vlelctx)immediately followed byMemoryContextDelete(tmpctx)is slightly redundant since everything lives in tmpctx, but it's harmless and mirrors the normal VLE teardown path.
Support relationship-type filters and a minimum hop count in shortest_path SRFs
age_shortest_path / age_all_shortest_paths gain two related capabilities, both following openCypher / Neo4j semantics.
Relationship-type filtering: the edge_types argument now accepts an array of types; an edge matches when its label is any one of the requested types. A bare string or a one-element array keeps the single-type behaviour, an empty string/array or NULL means no filter, and an unknown type matches nothing. sp_run_bfs takes an Oid set rather than a single oid, and sp_compute_paths resolves the argument into that set.
Minimum hop count: the new min_hops argument is a lower bound on the path length. When it does not exceed the true shortest distance it imposes no constraint, so the normal BFS shortest-path result is returned. When it exceeds the shortest distance, BFS cannot produce a qualifying path, so the search falls back to the variable-length-edge depth-first engine (sp_minhops_fallback), which enumerates edge-distinct paths (relationship-uniqueness / trail semantics) and returns the shortest path(s) whose length is at least min_hops. This regime permits revisiting a vertex and closed walks back to the start, but never reusing an edge. A private memory context bounds the search and a cost guard caps the number of examined paths, raising PROGRAM_LIMIT_EXCEEDED (with a hint to bound the search with a maximum hop count) when the cap is exceeded. The hard regime combined with multiple relationship types is unsupported, because the VLE engine matches a single label; that case raises FEATURE_NOT_SUPPORTED.
Regression coverage spans single- and multi-type filters, directed and undirected reachability, multiplicity of equal-length paths, max_hops bounds, NULL and non-existent endpoints, and both min_hops regimes, including a vertex-revisiting longer path (sp_revisit) and a closed-walk cycle back to the start (sp_tri). The in-cypher() Tier 1 call forms are exercised as well.
Review feedback addressed:
Error messages now report the function actually called. age_shortest_path and age_all_shortest_paths share their argument-resolution helpers, which hard-coded an "age_shortest_path" prefix regardless of the caller; the caller's name is now threaded through so each function reports its own (this also corrects a mislabeled multi-type min_hops error). A new regression case (sp_errname) pins the behaviour for both functions.
age_all_shortest_paths now bounds the number of materialized result paths. The shortest-path DAG can contain exponentially many equal-length paths, all built up front before the first row streams; enumeration is capped at SP_MAX_RESULT_PATHS (1,000,000), raising PROGRAM_LIMIT_EXCEEDED with a hint to narrow the search, mirroring the existing min-hops candidate cap.
The BFS search state (visited table, frontier queue, predecessor multiset, and intermediate path arrays) now lives in a private scratch memory context that is deleted once the surviving result Datums are built in the SRF context, rather than persisting in multi_call_memory_ctx for the life of the SRF. This bounds peak memory to the result set plus one search and matches the pattern sp_minhops_fallback already used.
41/41 installcheck.
Co-authored-by: Copilot copilot@github.com
modified: regress/expected/age_shortest_path.out
modified: regress/sql/age_shortest_path.sql
modified: src/backend/utils/adt/age_vle.c