Add label_filter, keyset pagination, and timestamps to df.list_instances by crprashant · Pull Request #278 · microsoft/pg_durable

crprashant · 2026-06-29T23:53:59Z

What

Extend df.list_instances() with label filtering, keyset pagination, and timestamp columns — the first-class label→instance→status path agreed on #167.

This is delivered as a new overload of df.list_instances, not a change to the existing function:

Existing 2-arg function — unchanged. df.list_instances(status_filter text, limit_count integer) still returns its 6 columns (instance_id, label, function_name, status, execution_count, output). All existing positional calls (df.list_instances(), df.list_instances('Running'), df.list_instances('Running', 50)) keep resolving to it.
New 4-arg overload df.list_instances(status_filter, limit_count, label_filter, after_cursor) returns the same 6 columns plus three trailing ones: created_at, completed_at, next_cursor.

-- Page 1: newest 50 instances labeled 'nightly-etl'
SELECT instance_id, status, created_at, next_cursor
FROM df.list_instances(NULL, 50, 'nightly-etl');

-- Page 2: pass the prior page's next_cursor as after_cursor
SELECT instance_id, status, created_at, next_cursor
FROM df.list_instances(NULL, 50, 'nightly-etl', '<next_cursor from page 1>');
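
A client that walks every page might look roughly like the sketch below. It is not code from this PR: `InstanceRow`, `drain_pages`, and the `fetch_page` closure are hypothetical names, and it assumes the last row of each page carries the `next_cursor` value to pass back as `after_cursor`. The closure stands in for whatever database call the client actually makes.

```rust
/// One row of the 9-column overload, reduced to the fields this sketch needs.
/// (Hypothetical type; the real result set has more columns.)
#[derive(Debug, Clone, PartialEq)]
pub struct InstanceRow {
    pub instance_id: String,
    pub next_cursor: Option<String>,
}

/// Calls `fetch_page(after_cursor)` repeatedly and collects every row.
/// Stops when a page is shorter than `page_size` or carries no cursor.
pub fn drain_pages<F>(page_size: usize, mut fetch_page: F) -> Vec<InstanceRow>
where
    F: FnMut(Option<&str>) -> Vec<InstanceRow>,
{
    let mut all = Vec::new();
    let mut cursor: Option<String> = None;
    loop {
        let page = fetch_page(cursor.as_deref());
        let page_len = page.len();
        // Assumption: the last row of a page holds the token for the next page.
        let next = page.last().and_then(|row| row.next_cursor.clone());
        all.extend(page);
        if page_len < page_size || next.is_none() {
            break;
        }
        cursor = next;
    }
    all
}
```

The first call passes `None`, which corresponds to the 3-argument form above; every later call passes the previous page's token, matching the 4-argument form.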

label_filter matches df.instances.label exactly; NULL means no label filter. Rows are ordered by created_at DESC, id ASC, served by the (created_at DESC, id) indexes added in PR 2. Label-filtered listing is served by a dedicated partial index, (label, created_at DESC, id) WHERE label IS NOT NULL, so a label-scoped page stays O(rows-for-label) instead of scanning unrelated instances.

next_cursor is an opaque token encoding the last row's (created_at, id). Passing it as after_cursor returns the next page via an index-served keyset range predicate, with no OFFSET scan. A malformed after_cursor raises a clear error rather than silently returning wrong rows.
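
To make the cursor idea concrete, here is a minimal sketch of an opaque token that round-trips a (created_at, id) pair and rejects malformed input. The wire format (hex of `micros:id`), the `Cursor` struct, and both function names are assumptions chosen for illustration; the actual helpers in src/monitoring.rs may encode the token differently.

```rust
/// Position of the last row on a page. Hypothetical shape: `created_at` is
/// held as microseconds since the epoch and `id` as text.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct Cursor {
    pub created_at_micros: i64,
    pub id: String,
}

/// Encodes the cursor as lowercase hex of "<micros>:<id>" (assumed format).
pub fn encode_cursor(c: &Cursor) -> String {
    let raw = format!("{}:{}", c.created_at_micros, c.id);
    raw.bytes().map(|b| format!("{:02x}", b)).collect()
}

/// Decodes a token, returning an error for anything that is not a
/// well-formed cursor instead of guessing at a position.
pub fn decode_cursor(token: &str) -> Result<Cursor, String> {
    let err = || format!("invalid after_cursor: {token:?}");
    let is_hex = token.bytes().all(|b| b.is_ascii_hexdigit());
    if token.is_empty() || token.len() % 2 != 0 || !is_hex {
        return Err(err());
    }
    // Every byte is an ASCII hex digit, so slicing by byte offset is safe.
    let bytes = (0..token.len())
        .step_by(2)
        .map(|i| u8::from_str_radix(&token[i..i + 2], 16))
        .collect::<Result<Vec<u8>, _>>()
        .map_err(|_| err())?;
    let raw = String::from_utf8(bytes).map_err(|_| err())?;
    // Split on the first ':' only, so ids containing ':' survive the round trip.
    let (ts, id) = raw.split_once(':').ok_or_else(|| err())?;
    let created_at_micros = ts.parse::<i64>().map_err(|_| err())?;
    if id.is_empty() {
        return Err(err());
    }
    Ok(Cursor { created_at_micros, id: id.to_string() })
}
```

Keeping the token opaque means clients only ever echo it back, so the encoding can change later without breaking callers.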

Files: src/monitoring.rs (cursor helpers + new overload), sql/pg_durable--0.2.3--0.2.4.sql, src/lib.rs (adds the label index + refines index comments), docs, e2e.

Why

This is PR 4 of the agreed, incremental monitoring-functions plan on #167. It closes #87 (label_filter on list_instances) and addresses #146 (efficient paginated listing for external clients). Keyset pagination keeps page cost constant regardless of how deep into the result set a client reads, and the new overload gives an ergonomic label → instance → status path without forking the monitoring API surface into a separate status_by_label function.

Design note — overload by arity (preserves B1)

Rather than changing the existing function in place (which would alter its tuple shape from 6 to 9 columns), the new capability is a separate Rust function exported under the same SQL name. This keeps the released .so's list_instances_wrapper symbol returning 6 columns, so a customer who loads the new .so but has not yet run ALTER EXTENSION UPDATE (Scenario B1) still gets a correct result from their old 6-column SQL declaration.

The two overloads never collide: the old function's two params both default (matches arity 0–2); the new function defaults only after_cursor, giving it a minimum arity of 3 (matches arity 3–4). The arities are disjoint, so PostgreSQL never reports "function is not unique". pgrx derives the C symbols from the Rust function names, so the two overloads bind to distinct symbols (list_instances_wrapper / list_instances_paged_wrapper).
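
The disjointness argument can be checked mechanically. The sketch below models only positional argument counts, not PostgreSQL's full overload resolution (types, named arguments, casts), and the `Signature` type and `candidates` function are hypothetical helpers; the symbol names are the ones quoted above.

```rust
/// A function signature reduced to what matters for arity matching.
pub struct Signature {
    pub symbol: &'static str,
    pub total_params: usize,
    pub defaulted_params: usize,
}

impl Signature {
    /// A positional call with `argc` arguments fits when it supplies at
    /// least the non-defaulted parameters and at most all of them.
    pub fn accepts(&self, argc: usize) -> bool {
        let required = self.total_params - self.defaulted_params;
        (required..=self.total_params).contains(&argc)
    }
}

/// The two overloads as described: 2 params (both defaulted) and
/// 4 params (only after_cursor defaulted).
pub const OVERLOADS: [Signature; 2] = [
    Signature { symbol: "list_instances_wrapper", total_params: 2, defaulted_params: 2 },
    Signature { symbol: "list_instances_paged_wrapper", total_params: 4, defaulted_params: 1 },
];

/// Returns every overload that a call with `argc` positional arguments fits.
pub fn candidates(argc: usize) -> Vec<&'static str> {
    OVERLOADS
        .iter()
        .filter(|s| s.accepts(argc))
        .map(|s| s.symbol)
        .collect()
}
```

Under this model every argument count from 0 through 4 yields exactly one candidate, which is the property that keeps "function is not unique" from arising.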

created_at and completed_at are existing df.instances columns (already populated by the worker); this PR only surfaces them. completed_at is set on transition to completed and is NULL for failed/cancelled.

Upgrade & compatibility

Scenario A (fresh vs upgrade parity): a fresh 0.2.4 install emits both overloads; the upgrade path keeps the unchanged 6-column function from the 0.2.3 base and the upgrade script adds only the new overload. The CREATE FUNCTION in the upgrade script is byte-identical to the pgrx-generated fresh-install DDL, so the Scenario A schema snapshot matches.
Scenario B1 (binary backward compat): the old function is frozen, so its symbol still returns 6 columns against pre-0.2.4 schemas. The new overload reads only columns present in every prior df.instances schema (id, label, status, created_at, completed_at), so it also runs correctly against an un-upgraded schema — just without the new index serving the keyset path.
One index added. completed_at already exists in the base schema, so the upgrade script adds no ALTER TABLE. This PR adds a single partial index — idx_instances_label(label, created_at DESC, id) WHERE label IS NOT NULL — to both the fresh-install DDL (src/lib.rs) and the upgrade script, byte-identical so the Scenario A snapshot matches. The other two df.instances index definitions are unchanged.

0.2.4 is still unreleased, so there is no version bump.

Testing

scripts/test-upgrade.sh — 36/36 (Scenario A byte-identical incl. the new idx_instances_label, B1 against 0.2.2 and 0.2.3, B2 data survival). The zero-arg df.list_instances() B1 check confirms the frozen 6-column path.
scripts/test-e2e-local.sh — 38/38. tests/e2e/sql/05_monitoring_and_explain.sql adds coverage for: label filtering, the new timestamp columns, multi-page keyset pagination via next_cursor/after_cursor, malformed-cursor errors, and completed_at being NULL for a failed instance.
cargo fmt / cargo clippy --features pg17 clean.

This branch is rebased on the latest main (includes #276).

Scope

label_filter + keyset pagination + timestamps only. The truncation-policy GUC is PR 5 per the plan. Deferred as a small follow-up (out of this PR's minimal scope): aligning df.instance_info()'s timestamp surface with the new columns.

A performance review of this PR motivated two access-path hardenings now folded in: (1) the partial idx_instances_label above, so label-filtered pages don't scan unrelated instances; and (2) a redundant created_at <= $ts leading conjunct on the keyset predicate, giving the btree a sargable upper bound so deep pages stay seek-based rather than degrading to offset-like scans. The cursor result set is unchanged — both disjuncts of the existing predicate already imply that bound.
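
The claim that the extra conjunct leaves the result set unchanged can be illustrated with an in-memory model of the predicate. This is a sketch, not the SQL from the PR: `Key` uses integer stand-ins for created_at and id, and the function names are hypothetical.

```rust
/// Sort key of a row or cursor, with integers standing in for the
/// real created_at and id values.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Key {
    pub created_at: i64,
    pub id: i64,
}

/// Keyset predicate for ORDER BY created_at DESC, id ASC:
/// the row sorts strictly after the cursor position.
pub fn after_cursor(row: Key, cur: Key) -> bool {
    row.created_at < cur.created_at
        || (row.created_at == cur.created_at && row.id > cur.id)
}

/// Same predicate with the redundant `created_at <= ts` leading conjunct.
pub fn after_cursor_bounded(row: Key, cur: Key) -> bool {
    row.created_at <= cur.created_at && after_cursor(row, cur)
}

/// Exhaustively compares both predicates over every (row, cursor) pair
/// whose components are drawn from `values`.
pub fn predicates_agree(values: &[i64]) -> bool {
    values.iter().all(|&rt| {
        values.iter().all(|&ri| {
            values.iter().all(|&ct| {
                values.iter().all(|&ci| {
                    let row = Key { created_at: rt, id: ri };
                    let cur = Key { created_at: ct, id: ci };
                    after_cursor(row, cur) == after_cursor_bounded(row, cur)
                })
            })
        })
    })
}
```

Both disjuncts of `after_cursor` require `row.created_at` to be less than or equal to the cursor's, so the added bound never filters out a row; it only gives the planner a simple range condition on the leading index column.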

cc @pinodeca

Add a new arity-disjoint overload of df.list_instances exposing label_filter, opaque keyset cursors (after_cursor/next_cursor), and created_at/completed_at, while leaving the existing 6-column function frozen to preserve binary backward compatibility (Scenario B1). Closes microsoft#87, addresses microsoft#146. Add a partial idx_instances_label(label, created_at DESC, id) WHERE label IS NOT NULL so label-filtered pages stay seek-based instead of scanning unrelated instances, and a redundant created_at <= $ts leading conjunct on the keyset predicate for a sargable btree bound (result set unchanged). Both index definitions stay byte-identical between the fresh-install DDL and the upgrade script for Scenario A.

crprashant force-pushed the crprashant/list-instances-filter-pagination branch from 4392cb8 to 5bb3732 on June 30, 2026 00:40

crprashant wants to merge 1 commit into microsoft:main from crprashant:crprashant/list-instances-filter-pagination
