Add label_filter, keyset pagination, and timestamps to df.list_instances #278
Open
crprashant wants to merge 1 commit into
Add a new arity-disjoint overload of df.list_instances exposing label_filter, opaque keyset cursors (after_cursor/next_cursor), and created_at/completed_at, while leaving the existing 6-column function frozen to preserve binary backward compatibility (Scenario B1). Closes microsoft#87, addresses microsoft#146.

Add a partial idx_instances_label(label, created_at DESC, id) WHERE label IS NOT NULL so label-filtered pages stay seek-based instead of scanning unrelated instances, and a redundant created_at <= $ts leading conjunct on the keyset predicate for a sargable btree bound (result set unchanged). Both index definitions stay byte-identical between the fresh-install DDL and the upgrade script for Scenario A.
What

Extend `df.list_instances()` with label filtering, keyset pagination, and timestamp columns: the first-class label → instance → status path agreed on #167.

This is delivered as a new overload of `df.list_instances`, not a change to the existing function:

- `df.list_instances(status_filter text, limit_count integer)` still returns its 6 columns (`instance_id`, `label`, `function_name`, `status`, `execution_count`, `output`). All existing positional calls (`df.list_instances()`, `df.list_instances('Running')`, `df.list_instances('Running', 50)`) keep resolving to it.
- `df.list_instances(status_filter, limit_count, label_filter, after_cursor)` returns the same 6 columns plus three trailing ones: `created_at`, `completed_at`, `next_cursor`.
- `label_filter` matches `df.instances.label` exactly (NULL = no label filter).
- Ordering is `created_at DESC, id ASC`, served by the `(created_at DESC, id)` indexes added in PR 2.
- Label-filtered listing is served by a dedicated partial index `(label, created_at DESC, id) WHERE label IS NOT NULL`, so a label-scoped page stays O(rows-for-label) instead of scanning unrelated instances.
- `next_cursor` is an opaque token encoding the last row's `(created_at, id)`; passing it as `after_cursor` returns the next page via an index-served keyset range predicate (no OFFSET scan).
- A malformed `after_cursor` raises a clear error rather than silently returning wrong rows.

Files: `src/monitoring.rs` (cursor helpers + new overload), `sql/pg_durable--0.2.3--0.2.4.sql`, `src/lib.rs` (adds the label index + refines index comments), docs, e2e.

Why
This is PR 4 of the agreed, incremental monitoring-functions plan on #167. It closes #87 (`label_filter` on `list_instances`) and addresses #146 (efficient paginated listing for external clients). Keyset pagination keeps page cost constant regardless of how deep into the result set a client reads, and the new overload gives an ergonomic label → instance → status path without forking the monitoring API surface into a separate `status_by_label` function.

Design note: overload by arity (preserves B1)
Rather than changing the existing function in place (which would alter its tuple shape from 6 to 9 columns), the new capability is a separate Rust function exported under the same SQL name. This keeps the released `.so`'s `list_instances_wrapper` symbol returning 6 columns, so a customer who loads the new `.so` but has not yet run `ALTER EXTENSION UPDATE` (Scenario B1) still gets a correct result from their old 6-column SQL declaration.

The two overloads never collide: the old function's two params both default (matches arity 0–2); the new function defaults only `after_cursor`, giving it a minimum arity of 3 (matches arity 3–4). The arities are disjoint, so PostgreSQL never reports "function is not unique". pgrx derives the C symbols from the Rust function names, so the two overloads bind to distinct symbols (`list_instances_wrapper` / `list_instances_paged_wrapper`).

`created_at` and `completed_at` are existing `df.instances` columns (already populated by the worker); this PR only surfaces them. `completed_at` is set on transition to `completed` and is NULL for `failed`/`cancelled`.

Upgrade & compatibility
- `CREATE FUNCTION` in the upgrade script is byte-identical to the pgrx-generated fresh-install DDL, so the Scenario A schema snapshot matches.
- The new overload reads only the existing `df.instances` schema (`id`, `label`, `status`, `created_at`, `completed_at`), so it also runs correctly against an un-upgraded schema, just without the new index serving the keyset path.
- `completed_at` already exists in the base schema, so the upgrade script adds no `ALTER TABLE`. This PR adds a single partial index, `idx_instances_label(label, created_at DESC, id) WHERE label IS NOT NULL`, to both the fresh-install DDL (`src/lib.rs`) and the upgrade script, byte-identical so the Scenario A snapshot matches. The other two `df.instances` index definitions are unchanged.
- `0.2.4` is still unreleased, so there is no version bump.

Testing
- `scripts/test-upgrade.sh`: 36/36 (Scenario A byte-identical incl. the new `idx_instances_label`, B1 against 0.2.2 and 0.2.3, B2 data survival). The zero-arg `df.list_instances()` B1 check confirms the frozen 6-column path.
- `scripts/test-e2e-local.sh`: 38/38. `tests/e2e/sql/05_monitoring_and_explain.sql` adds coverage for: label filtering, the new timestamp columns, multi-page keyset pagination via `next_cursor`/`after_cursor`, malformed-cursor errors, and `completed_at` being NULL for a failed instance.
- `cargo fmt` / `cargo clippy --features pg17` clean.

This branch is rebased on the latest `main` (includes #276).

Scope
`label_filter` + keyset pagination + timestamps only. The truncation-policy GUC is PR 5 per the plan. Deferred as a small follow-up (out of this PR's minimal scope): aligning `df.instance_info()`'s timestamp surface with the new columns.

A performance review of this PR motivated two access-path hardenings now folded in: (1) the partial `idx_instances_label` above, so label-filtered pages don't scan unrelated instances; and (2) a redundant `created_at <= $ts` leading conjunct on the keyset predicate, giving the btree a sargable upper bound so deep pages stay seek-based rather than degrading to offset-like scans. The cursor result set is unchanged: both disjuncts of the existing predicate already imply that bound.

cc @pinodeca
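
For reviewers who want to see the keyset semantics in isolation, here is a minimal in-memory sketch in Rust. It is not the code in `src/monitoring.rs`: `Row`, `Cursor`, `encode_cursor`, `decode_cursor`, `is_after`, `is_after_bounded`, and `list_page` are hypothetical names, the hex token layout is an assumption (the description only says the cursor is opaque), and `created_at` is modelled as an integer. It pages over `created_at DESC, id ASC`, rejects a malformed token, and checks that the redundant `created_at <= $ts` bound leaves the result set unchanged.

```rust
// Illustrative only: every name here is hypothetical, and the hex token
// layout is an assumption, not the format used by the PR.

#[derive(Debug, Clone, PartialEq)]
struct Row {
    created_at: i64, // stand-in for a timestamp column
    id: String,
}

#[derive(Debug, Clone, PartialEq)]
struct Cursor {
    created_at: i64,
    id: String,
}

/// Encode (created_at, id) as hex so callers treat the token as opaque.
fn encode_cursor(c: &Cursor) -> String {
    format!("{}|{}", c.created_at, c.id)
        .bytes()
        .map(|b| format!("{:02x}", b))
        .collect()
}

/// Decode a token, returning an error for anything malformed.
fn decode_cursor(token: &str) -> Result<Cursor, String> {
    let bad = || format!("invalid after_cursor: {token:?}");
    if token.is_empty() || token.len() % 2 != 0 || !token.bytes().all(|b| b.is_ascii_hexdigit()) {
        return Err(bad());
    }
    let bytes: Vec<u8> = (0..token.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&token[i..i + 2], 16).map_err(|_| bad()))
        .collect::<Result<_, _>>()?;
    let text = String::from_utf8(bytes).map_err(|_| bad())?;
    let (ts, id) = text.split_once('|').ok_or_else(|| bad())?;
    let created_at: i64 = ts.parse().map_err(|_| bad())?;
    Ok(Cursor { created_at, id: id.to_string() })
}

/// Keyset predicate for ORDER BY created_at DESC, id ASC.
fn is_after(row: &Row, c: &Cursor) -> bool {
    row.created_at < c.created_at || (row.created_at == c.created_at && row.id > c.id)
}

/// Same predicate with the redundant `created_at <= ts` leading bound.
fn is_after_bounded(row: &Row, c: &Cursor) -> bool {
    row.created_at <= c.created_at && is_after(row, c)
}

/// Return up to `limit` rows after the cursor plus a token for the next page.
/// This sorts and filters in memory; a database would seek in an index instead.
fn list_page(rows: &[Row], limit: usize, after: Option<&str>) -> Result<(Vec<Row>, Option<String>), String> {
    let cursor = after.map(decode_cursor).transpose()?;
    let mut sorted = rows.to_vec();
    sorted.sort_by(|a, b| b.created_at.cmp(&a.created_at).then_with(|| a.id.cmp(&b.id)));
    let page: Vec<Row> = sorted
        .into_iter()
        .filter(|r| cursor.as_ref().map_or(true, |c| is_after_bounded(r, c)))
        .take(limit)
        .collect();
    let next = page
        .last()
        .map(|r| encode_cursor(&Cursor { created_at: r.created_at, id: r.id.clone() }));
    Ok((page, next))
}

fn main() {
    let rows: Vec<Row> = [(30, "a"), (20, "c"), (20, "b"), (10, "d"), (10, "e")]
        .iter()
        .map(|&(t, id)| Row { created_at: t, id: id.to_string() })
        .collect();

    // Walk pages of two rows until an empty page comes back.
    let mut seen = Vec::new();
    let mut token: Option<String> = None;
    loop {
        let (page, next) = list_page(&rows, 2, token.as_deref()).expect("valid cursor");
        if page.is_empty() {
            break;
        }
        seen.extend(page.into_iter().map(|r| r.id));
        token = next;
    }
    assert_eq!(seen, ["a", "b", "c", "d", "e"]);

    // The extra bound never changes which rows qualify.
    let c = Cursor { created_at: 20, id: "b".into() };
    assert!(rows.iter().all(|r| is_after(r, &c) == is_after_bounded(r, &c)));

    // A malformed token is rejected instead of producing a wrong page.
    assert!(list_page(&rows, 2, Some("not-a-cursor")).is_err());
    println!("paged ids: {seen:?}");
}
```

The sketch returns the next-page token alongside the page rather than as a trailing column, which is a simplification of the `next_cursor` column described above.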