Optimize id()/start_id()/end_id() with direct column access by jrgemignani · Pull Request #2295 · apache/age

jrgemignani · 2025-12-30T22:38:12Z

NOTE: This PR was created with AI tools and a human

Optimize id()/start_id()/end_id() with direct graphid column access.

Add optimization to avoid rebuilding full vertex/edge objects when only
the id, start_id, or end_id is needed. Instead of calling age_id() on a
reconstructed _agtype_build_vertex/_agtype_build_edge, directly access
the underlying graphid column. This enables PostgreSQL to use B-tree
indexes on graphid columns for significantly improved query performance.

Implementation:

Export hidden columns as raw graphid type (not agtype-wrapped) when
entities pass between clauses via export_entity_hidden_columns()
Store Var references (id_var, start_id_var, end_id_var) in
transform_entity
try_optimize_id_funcs() returns raw graphid Var for optimizable
patterns
Add cross-type comparison operators (graphid vs int8) with btree
support
Update graphid_ops operator class to include cross-type operators,
enabling Index Cond scans instead of post-scan Filter evaluation
Add graphid support to agtype_volatile_wrapper() for SET/MERGE
contexts

Optimized patterns:
MATCH (p) WHERE id(p) > 0 RETURN p
MATCH ()-[e]->() WHERE id(e) > 0 RETURN e
MATCH ()-[e]->() WHERE start_id(e) > 0 RETURN e
MATCH ()-[e]->() WHERE end_id(e) > 0 RETURN e

All regression tests passed.
Added regression tests for the new graphid operators.

Files changed:

src/include/parser/cypher_transform_entity.h: Add id_var, start_id_var,
end_id_var fields to transform_entity struct
src/backend/parser/cypher_clause.c: Export raw graphid columns
src/backend/parser/cypher_expr.c: Add try_optimize_id_funcs(),
extract_id_var_from_entity_expr(); return raw graphid Var
src/backend/utils/adt/graphid.c: Add 12 cross-type comparison functions
(graphid_eq_int8, etc.) and 2 btree comparison functions
(graphid_btree_cmp_int8, int8_btree_cmp_graphid)
src/backend/utils/adt/agtype.c: Add GRAPHIDOID to
agtype_volatile_wrapper()
sql/age_main.sql: Add cross-type operators, functions, and update
graphid_ops operator class with cross-type btree support
age--1.6.0--y.y.y.sql: Add upgrade definitions for new functions,
operators, and operator class
regress/sql/graphid.sql: Add cross-type operator and index tests
regress/sql/cypher_match.sql: Add WHERE optimization tests
regress/expected/*.out: Update expected output

Copilot

Pull request overview

This PR optimizes the execution of id(), start_id(), and end_id() functions by avoiding full vertex/edge object reconstruction. Instead of calling these functions on rebuilt _agtype_build_vertex/_agtype_build_edge expressions, the implementation directly accesses the underlying graphid columns and wraps them with graphid_to_agtype().

Key Changes:

Added hidden column export mechanism that stores id, start_id, and end_id as target entries when entities pass between clauses
Implemented cross-clause and current-clause optimization patterns in try_optimize_id_funcs()
Added validation logic to prevent stale Var references after WITH clauses

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file

File	Description
src/include/parser/cypher_transform_entity.h	Added id_var, start_id_var, end_id_var, and props_var fields to transform_entity struct to store exposed column Vars
src/backend/parser/cypher_transform_entity.c	Initialized the new Var fields to NULL in make_transform_entity
src/backend/parser/cypher_expr.c	Implemented optimization logic with try_optimize_id_funcs, extract_id_var_from_entity_expr, and find_entity_in_current_cpstate; includes unrelated whitespace changes
src/backend/parser/cypher_clause.c	Added export_entity_hidden_columns to export id columns as hidden target entries and update_entity_vars_from_rte to update entity Vars from RTE columns
regress/sql/cypher_match.sql	Added 13 test cases covering WHERE clause optimizations for both current-clause and cross-clause scenarios
regress/sql/cypher_with.sql	Added 12 test cases verifying id functions work correctly through WITH clauses, including chained WITH and aliasing
regress/expected/cypher_match.out	Updated expected output showing optimized query plans using graphid_to_agtype instead of age_id
regress/expected/cypher_with.out	Updated expected output with correct results for WITH clause id function tests

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copilot

Pull request overview

Copilot reviewed 8 out of 8 changed files in this pull request and generated 2 comments.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Optimize id()/start_id()/end_id() with direct graphid column access. NOTE: This PR is created with AI tools and a human. Add optimization to avoid rebuilding full vertex/edge objects when only the id, start_id, or end_id is needed. Instead of calling age_id() on a reconstructed _agtype_build_vertex/_agtype_build_edge, directly access the underlying graphid column. This enables PostgreSQL to use B-tree indexes on graphid columns for significantly improved query performance. Implementation: - Export hidden columns as raw graphid type (not agtype-wrapped) when entities pass between clauses via export_entity_hidden_columns() - Store Var references (id_var, start_id_var, end_id_var) in transform_entity - try_optimize_id_funcs() returns raw graphid Var for optimizable patterns - Add cross-type comparison operators (graphid vs int8) with btree support - Update graphid_ops operator class to include cross-type operators, enabling Index Cond scans instead of post-scan Filter evaluation - Add graphid support to agtype_volatile_wrapper() for SET/MERGE contexts Optimized patterns: MATCH (p) WHERE id(p) > 0 RETURN p MATCH ()-[e]->() WHERE id(e) > 0 RETURN e MATCH ()-[e]->() WHERE start_id(e) > 0 RETURN e MATCH ()-[e]->() WHERE end_id(e) > 0 RETURN e All regression tests passed. Added regression tests for the new graphid operators. Files changed: - src/include/parser/cypher_transform_entity.h: Add id_var, start_id_var, end_id_var fields to transform_entity struct - src/backend/parser/cypher_clause.c: Export raw graphid columns - src/backend/parser/cypher_expr.c: Add try_optimize_id_funcs(), extract_id_var_from_entity_expr(); return raw graphid Var - src/backend/utils/adt/graphid.c: Add 12 cross-type comparison functions (graphid_eq_int8, etc.) and 2 btree comparison functions (graphid_btree_cmp_int8, int8_btree_cmp_graphid) - src/backend/utils/adt/agtype.c: Add GRAPHIDOID to agtype_volatile_wrapper() - sql/age_main.sql: Add cross-type operators, functions, and update graphid_ops operator class with cross-type btree support - age--1.6.0--y.y.y.sql: Add upgrade definitions for new functions, operators, and operator class - regress/sql/graphid.sql: Add cross-type operator and index tests - regress/sql/cypher_match.sql: Add WHERE optimization tests - regress/expected/*.out: Update expected output

gregfelice

Reviewed with a focus on the two things that could actually go wrong here: cross-type btree index correctness and the safety of the id()/start_id()/end_id() rewrite. Both hold up — nice work. Details and a few findings below.

Cross-type btree is safe. My main concern with a new cross-type opclass is a signedness/ordering inconsistency that could produce wrong index results. But graphid is typedef int64 (and F_GRAPHIDEQ == F_INT8EQ), so graphid vs int8 is literally int64 vs int64. graphid_btree_cmp_int8 / int8_btree_cmp_graphid do plain signed compares returning -1/0/1 — a consistent total order identical to int8 and to id()'s existing agtype-integer semantics. The graphid_ops wiring is well-formed: same-type strategies 1-5, cross-type (graphid,int8) strategies 1-5, and support FUNCTION 1 for both (graphid,graphid) and (graphid,int8). So the indexed path returns the same answers as the old reconstruct-and-compare path.

Rewrite guards are appropriately conservative. try_optimize_id_funcs only fires on a bare single-part ColumnRef argument, looks the entity up only in the current cpstate (avoiding invalid varnos inside EXISTS/subqueries), and explicitly bails in EXPR_KIND_INSERT_TARGET (MERGE/CREATE). Null semantics are preserved too: an unmatched OPTIONAL MATCH entity yields a NULL graphid column, so WHERE id(p) > 0 filters it the same as id(null) > 0 -> null.

Findings (all minor / non-correctness):

Cross-clause propagation gap (also the subject of one of the Copilot notes). export_entity_hidden_columns skips entities not declared in the current clause (if (!entity->declared_in_current_clause) continue;), yet the doc comment says it re-exports previous-clause entities via their Vars. For chains of 3+ clauses the id_var likely won't propagate past the second boundary, so the optimization silently falls back to the slow path. Results stay correct — it's a missed optimization plus an inaccurate comment. Worth confirming whether multi-hop propagation is intended, and fixing the comment either way.
Memory leak (Copilot): col_name from psprintf is reassigned three times without pfree. Per-query context so it's bounded, but worth tidying.
Minor (Copilot): the misplaced/duplicated doc block before find_entity_in_current_cpstate, and exported_names appending entity_name without pstrdup (safe only while those pointers are stable).
Possible opfamily asymmetry: I don't see the reverse (int8, graphid) operators / int8_btree_cmp_graphid added to graphid_ops, so WHERE <int8col> OP id(p) may not get an index scan. The commutator covers the common 0 < id(p) form, so this is a missed-opt at most — just flagging in case both directions were intended.

Mechanical blockers before merge: the PR is currently CONFLICTING with master and needs a rebase (it dates to Dec 2025 and the parser/agtype areas it touches have moved quite a bit since), and CI is stale — worth a fresh green run after the rebase.

Overall I'd support this after a rebase, a green CI run, and the comment/leak tidy-ups; the core approach and the index-correctness story are solid.

jrgemignani requested review from MuhammadTahaNaveed and Copilot December 30, 2025 22:38

github-actions Bot added master override-stale To keep issues/PRs untouched from stale action labels Dec 30, 2025

Copilot started reviewing on behalf of jrgemignani December 30, 2025 22:38 View session

Copilot AI reviewed Dec 30, 2025

View reviewed changes

Comment thread src/backend/parser/cypher_clause.c

Comment thread src/backend/parser/cypher_expr.c

Comment thread src/backend/parser/cypher_clause.c

Comment thread src/backend/parser/cypher_clause.c

jrgemignani force-pushed the add_id_optimization branch from eac9cf1 to 3ffb136 Compare December 30, 2025 23:18

jrgemignani requested a review from Copilot December 30, 2025 23:57

Copilot started reviewing on behalf of jrgemignani December 30, 2025 23:58 View session

Copilot AI reviewed Dec 31, 2025

View reviewed changes

Comment thread src/backend/parser/cypher_clause.c

Comment thread src/backend/parser/cypher_clause.c

jrgemignani force-pushed the add_id_optimization branch from 3ffb136 to ae06570 Compare January 5, 2026 20:56

gregfelice reviewed Jun 26, 2026

View reviewed changes

gregfelice mentioned this pull request Jun 26, 2026

Propagate null through unnest and single() three-valued logic #2406

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Optimize id()/start_id()/end_id() with direct column access#2295

Optimize id()/start_id()/end_id() with direct column access#2295
jrgemignani wants to merge 1 commit into
apache:masterfrom
jrgemignani:add_id_optimization

jrgemignani commented Dec 30, 2025 •

edited

Loading

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Uh oh!

Uh oh!

Uh oh!

gregfelice left a comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

Uh oh!

Conversation

jrgemignani commented Dec 30, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Reviewed changes

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Copilot AI left a comment

Choose a reason for hiding this comment

Pull request overview

Uh oh!

Uh oh!

Uh oh!

gregfelice left a comment

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

jrgemignani commented Dec 30, 2025 •

edited

Loading