Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
Expand Up @@ -15,6 +15,7 @@ Pre-1.0 note: while `pg_durable` is in major version `0`, minor releases may inc
- **Instance/node ID collision hardening (#129):** `df.start()` now reserves IDs with `INSERT ... ON CONFLICT DO NOTHING RETURNING id` and re-rolls the random 8-hex value on collision — instances arbitrate on the `df.instances` primary key (`id`), nodes on the new composite `PRIMARY KEY (instance_id, id)` — replacing the previous `SELECT EXISTS` pre-check. Doing the conflict check at the index level (rather than a pre-check `SELECT`) closes a TOCTOU window and, for instances, an RLS blind spot where the pre-check could not see another role's rows. `df.nodes` now uses the composite `PRIMARY KEY (instance_id, id)` instead of a global single-column key, so the random 8-hex node ID is no longer the sole cross-instance collision guard. The `update-node-status` activity now scopes its `df.nodes` update by `instance_id` (a required activity-input field) and asserts it affects exactly one row. IDs stay `VARCHAR(8)` HEX; the `0.2.3 → 0.2.4` upgrade restructures the `df.nodes` keys in place (#238).
- **Breaking for in-flight work:** the new activity-input shape changes the string duroxide records in orchestration history, and duroxide validates activity inputs by exact equality on replay, so any instance left **in flight across the 0.2.3 → 0.2.4 binary upgrade** cannot complete. Drain or cancel in-flight instances before deploying 0.2.4. The in-place `df.nodes` key restructure also takes an `ACCESS EXCLUSIVE` lock whose duration scales with table size — run the upgrade in a maintenance window. See the #129 section of `docs/upgrade-testing.md` for the full drain-before-upgrade contract.
- **`df.grant_usage()` / `df.revoke_usage()`:** dropped the explicit per-function `EXECUTE` allowlist. Schema `USAGE` on `df` is the real access gate for ordinary `df.*` functions, so the helpers now grant/revoke schema `USAGE`, the table privileges, and `EXECUTE` only on the sensitive functions (`df.http`, `df.grant_usage`, `df.revoke_usage`). Function signatures are unchanged and existing privileges are unaffected (#242).
- **`df.list_instances()` page-size cap is now a loud error (#146):** `df.list_instances()` previously truncated `limit_count` silently to a fixed ceiling of 10000. It now raises an error when `limit_count` exceeds the new `pg_durable.list_instances_max_limit` GUC (`SUSET` context, default `1000`, range `1`–`1000000`), so an over-cap request fails fast instead of returning a silently short page that is indistinguishable from "no more rows". Both the basic and paginated overloads enforce the cap; clients needing more rows should lower `limit_count` or use the paginated overload (`after_cursor`/`next_cursor`). A superuser can change the cap at runtime without a restart; by default an ordinary caller cannot.

### Removed

Expand Down
4 changes: 4 additions & 0 deletions USER_GUIDE.md
Original file line number Diff line number Diff line change
Expand Up @@ -1466,6 +1466,8 @@ Filters (`status_filter`, `label_filter`) are sticky across pages — keep passi

> **Note:** `next_cursor` advances over `df.instances` independently of the per-row execution-metadata lookup. In a brief start-up window a freshly-submitted instance can appear in `df.instances` before its execution metadata is queryable and is omitted from that page; in the rare case where *every* row of a non-final page is omitted, the page returns zero rows (so `next_cursor` can't be read) — retry shortly.

> **Page-size limit:** `limit_count` is capped by the `pg_durable.list_instances_max_limit` GUC (default `1000`). A request for more rows than the cap raises an error instead of being silently truncated — lower `limit_count` or use the paginated overload (`after_cursor`/`next_cursor`) for larger result sets. A superuser can raise the cap at runtime (`ALTER SYSTEM SET pg_durable.list_instances_max_limit = …; SELECT pg_reload_conf();`); by default ordinary callers cannot change it. See [docs/api-reference.md](docs/api-reference.md#pg_durablelist_instances_max_limit).

### Instance Details

```sql
Expand Down Expand Up @@ -1939,6 +1941,8 @@ pg_durable.max_user_connections = 10
pg_durable.execution_acquire_timeout = 30
```

> **Other GUCs:** `pg_durable.list_instances_max_limit` (SUSET context, default `1000`) caps the per-call page size of `df.list_instances()`. Unlike the connection-limit GUCs above, it is superuser-settable at runtime (no restart) and is not loaded from `postgresql.conf` at startup only. See [docs/api-reference.md](docs/api-reference.md#pg_durablelist_instances_max_limit).

### Connection Budget Formula

To calculate the total connections pg_durable will use:
Expand Down
30 changes: 29 additions & 1 deletion docs/api-reference.md
Original file line number Diff line number Diff line change
Expand Up @@ -351,7 +351,7 @@ The two overloads have non-overlapping arities (basic matches 0–2 arguments, p
| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| `status_filter` | TEXT | `NULL` (basic only) | Only instances with this status (lowercase: `pending`, `running`, `completed`, `failed`, `cancelled`). `NULL` = any. |
| `limit_count` | INTEGER | `100` (basic only) | Max rows per page (must be ≥ 1; capped at 10000). |
| `limit_count` | INTEGER | `100` (basic only) | Max rows per page (must be ≥ 1). A request above `pg_durable.list_instances_max_limit` (default 1000) raises an error instead of being silently truncated — lower `limit_count`, or use the paginated overload (`after_cursor`) for larger result sets. |
| `label_filter` | TEXT | — (required to select the paginated overload) | Only instances whose label equals this value (issue #87). `NULL` = any. |
| `after_cursor` | TEXT | `NULL` | Opaque keyset cursor from a prior page's `next_cursor`; returns the page that sorts strictly after it (issue #146). `NULL` = first page. |

Expand Down Expand Up @@ -607,3 +607,31 @@ SHOW pg_durable.enable_superuser_instances;

**Security note:** Setting this GUC to `on` in a multi-tenant environment allows any role with `BYPASSRLS` to forge `submitted_by` to a superuser OID and execute arbitrary SQL as superuser. Keep `off` unless you have a specific need and understand the risk. See [docs/superuser_guc.md](superuser_guc.md) for the full threat analysis.

---

### pg_durable.list_instances_max_limit

Maximum number of rows `df.list_instances()` returns in a single call. A request for more rows than this raises an error instead of silently truncating the result, so external clients paginate explicitly (via `after_cursor`/`next_cursor`) rather than relying on a silent cap.

| Property | Value |
|----------|-------|
| Type | `integer` |
| Default | `1000` |
| Range | `1` – `1000000` |
| Context | `SUSET` (superuser can change at runtime; no restart needed) |

Both `df.list_instances()` overloads (basic and paginated) enforce this cap. By default an ordinary (non-superuser) caller cannot raise it, so the guardrail holds from a user session — a superuser may delegate that ability with `GRANT SET ON PARAMETER pg_durable.list_instances_max_limit TO <role>`, but without that grant it stays superuser-settable only.

> **Sizing note:** `df.list_instances()` materializes up to `limit_count` rows per call, so raise the cap only as high as a single response should reasonably hold. For very large exports, prefer paging with `after_cursor`/`next_cursor` over one huge page rather than setting the cap near its maximum.

```sql
-- Inspect the current cap
SHOW pg_durable.list_instances_max_limit;

-- Raise it for an admin reporting workload (requires superuser)
ALTER SYSTEM SET pg_durable.list_instances_max_limit = 5000;
SELECT pg_reload_conf();
```

> **Behavior change (v0.2.4):** prior to v0.2.4, `df.list_instances()` silently truncated `limit_count` to 10000. It now raises an error when `limit_count` exceeds this GUC (default 1000). Callers that previously requested very large pages should lower `limit_count` or use the paginated overload (`after_cursor`/`next_cursor`).

9 changes: 9 additions & 0 deletions docs/upgrade-testing.md
Original file line number Diff line number Diff line change
Expand Up @@ -254,6 +254,15 @@ what the upgrade script handles, and any backward compatibility considerations.
- **Scenario B2 considerations:** No data migration. The `CREATE FUNCTION` adds catalog metadata only; `df.instances` rows are untouched. The new `created_at`/`completed_at` result columns are read from columns that already exist and are already populated on every prior install.
- **Dependent-object note:** Because the upgrade only adds a function (no `DROP FUNCTION`), no customer-owned object that depends on the existing two-argument `df.list_instances` is affected — there is nothing to drop or repoint.

#### `pg_durable.list_instances_max_limit` GUC — page-size cap is now a loud error (issue #146)
- **DDL change:** None. The cap is enforced entirely in the `.so`: a new `pg_durable.list_instances_max_limit` GUC (`SUSET` context, default `1000`, range `1`–`1000000`) is registered in `_PG_init` (`src/lib.rs`) and read on the `df.list_instances()` query path (`src/monitoring.rs`). There are no SQL function, table, or index changes, so this change adds no upgrade-script DDL and has no Scenario A snapshot impact.
- **Behavior change:** Both `df.list_instances()` overloads previously truncated `limit_count` silently to a fixed 10000. They now raise an error when `limit_count` exceeds the GUC (default `1000`), so an over-cap request fails fast instead of returning a silently short page. This is a runtime behavior change to a function that shipped in 0.2.2/0.2.3; it is recorded in `CHANGELOG.md` under the unreleased 0.2.4 changeset.
- **Scenario A considerations:** No schema changes — the `df` schema equivalence contract is unchanged.
- **Scenario B1 considerations:** The new `.so` works against all previous schemas. The guard runs before any SQL is issued and reads no catalog or table state — it only compares the caller's `limit_count` against an in-memory GUC value — so it is correct against a pre-0.2.4 schema that has not run `ALTER EXTENSION UPDATE`. The only visible difference against an old schema is the intended one: a large `limit_count` now errors instead of being silently capped at 10000.
- **Scenario B2 considerations:** No data migration. The GUC has no effect on schema shape or existing data.

This follows the runtime-only precedent of `pg_durable.enable_superuser_instances` below (DDL change: None, enforced entirely in the `.so`).

### v0.2.2 → v0.2.3

#### Rename duroxide provider schema to `_duroxide` for fresh installs
Expand Down
27 changes: 27 additions & 0 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -29,6 +29,16 @@ pub static EXECUTION_ACQUIRE_TIMEOUT: GucSetting<i32> = GucSetting::<i32>::new(3
/// functions are explicitly desired. See docs/superuser_guc.md.
pub static ENABLE_SUPERUSER_INSTANCES: GucSetting<bool> = GucSetting::<bool>::new(false);

/// Maximum page size (`limit_count`) accepted by `df.list_instances()`. A call
/// requesting more rows than this raises an error instead of silently truncating
/// the result, steering external clients toward keyset pagination
/// (`after_cursor`/`next_cursor`) for large result sets (issue #146). Superuser-
/// settable (Suset) so it can be tuned at runtime without a restart.
///
/// NOTE: the default (`1000`) and max (`1_000_000`) below are also documented in
/// `docs/api-reference.md` and `USER_GUIDE.md`; update those mirrors if you change them.
pub static LIST_INSTANCES_MAX_LIMIT: GucSetting<i32> = GucSetting::<i32>::new(1000);

// Module declarations
pub mod activities;
pub mod client;
Expand Down Expand Up @@ -136,6 +146,23 @@ pub extern "C-unwind" fn _PG_init() {
GucFlags::SUPERUSER_ONLY,
);

// First Suset-context GUC in this extension (the connection-limit and
// enable_superuser_instances GUCs above are Postmaster). Suset is deliberate:
// this guardrail is read on the df.list_instances() query path, so a superuser
// must be able to tune it per-session at runtime. Its long description is
// intentionally populated (unlike the c"" connection GUCs) because the
// raise-instead-of-truncate behavior is not obvious from the name alone.
GucRegistry::define_int_guc(
c"pg_durable.list_instances_max_limit",
c"Maximum number of rows df.list_instances() returns in a single call before raising an error",
c"A call to df.list_instances() with limit_count above this value raises an error instead of silently truncating the result. Clients needing more rows should use the paginated df.list_instances overload (after_cursor/next_cursor). Superusers can change this at runtime without a restart; by default ordinary callers cannot raise it.",
&LIST_INSTANCES_MAX_LIMIT,
1,
1_000_000,
GucContext::Suset,
GucFlags::default(),
);

worker::register_background_worker();
}

Expand Down
38 changes: 30 additions & 8 deletions src/monitoring.rs
Original file line number Diff line number Diff line change
Expand Up @@ -97,6 +97,29 @@ fn cursor_timestamp_well_formed(ts: &str) -> bool {
})
}

/// Enforce the configurable upper bound on a `df.list_instances()` page size.
///
/// Both `df.list_instances` overloads share this guard. A `limit_count` below 1 is
/// always rejected. When it exceeds `pg_durable.list_instances_max_limit` (default
/// 1000) the call raises a loud error rather than silently truncating to the cap:
/// a silently short page is indistinguishable from "no more rows" and breaks
/// external clients that page by `limit_count`. Such clients should use keyset
/// pagination (`after_cursor`/`next_cursor`) for large result sets (issue #146).
/// The cap is a superuser-settable (Suset) GUC, so it is tunable at runtime without
/// a restart and, by default, cannot be raised by an ordinary caller — the guardrail
/// holds for ordinary user sessions unless a superuser delegates SET on the parameter.
fn enforce_list_instances_limit(limit_count: i32) {
if limit_count < 1 {
pgrx::error!("limit_count must be at least 1");
}
let max_limit = crate::LIST_INSTANCES_MAX_LIMIT.get();
if limit_count > max_limit {
pgrx::error!(
"df.list_instances: limit_count ({limit_count}) exceeds pg_durable.list_instances_max_limit ({max_limit}); request fewer rows, or use the paginated df.list_instances overload (after_cursor/next_cursor) for large result sets"
);
}
}

/// Batch-fetch (function_name, execution_count, output) for a set of instance ids
/// from duroxide's published `<schema>.get_instance_info(TEXT)` SQL function.
///
Expand Down Expand Up @@ -214,10 +237,7 @@ pub fn list_instances(
name!(output, Option<String>),
),
> {
if limit_count < 1 {
pgrx::error!("limit_count must be at least 1");
}
let limit_count = limit_count.min(10000);
enforce_list_instances_limit(limit_count);

let pg_conn_str = postgres_connection_string();
let provider_schema = backend_duroxide_schema();
Expand Down Expand Up @@ -325,10 +345,7 @@ pub fn list_instances_paged(
name!(next_cursor, Option<String>),
),
> {
if limit_count < 1 {
pgrx::error!("limit_count must be at least 1");
}
let limit_count = limit_count.min(10000);
enforce_list_instances_limit(limit_count);

// Decode the opaque keyset cursor up front. A malformed cursor is a client
// error (the token must be passed back verbatim from a prior next_cursor), so
Expand Down Expand Up @@ -636,6 +653,11 @@ pub fn instance_executions(
if limit_count < 1 {
pgrx::error!("limit_count must be at least 1");
}
// NOTE: df.instance_executions reports a single instance's execution history
// (keyed by instance_id) — a different surface from df.list_instances, with a
// naturally small result set. It intentionally keeps this fixed min(10000)
// clamp rather than the tunable pg_durable.list_instances_max_limit GUC
// (PR5 / #146); give it its own bound if per-instance history ever needs one.
let limit_count = limit_count.min(10000);

let pg_conn_str = postgres_connection_string();
Expand Down
Loading
Loading