Extended-protocol fixes: Flush handling + prepared-statement client-cache consistency by ikeboy003 · Pull Request #953 · postgresml/pgcat

ikeboy003 · 2026-05-26T23:18:54Z

Four commits. Two distinct fixes that share a theme: pgcat losing extended-protocol state in ways the client cannot detect or recover from.

1. Flush ('H') handling — 3 commits

Drivers using describeFirst (e.g. postgres.js with prepare: false + params) send Flush between Parse/Describe and Bind/Execute. pgcat logged Unexpected code: H and dropped it, hanging the client.

Handle Flush ('H') extended-protocol message — drains buffered extended-protocol messages into the send buffer, forwards the Flush byte, keeps the server checked out for the follow-up Bind/Execute/Sync.
Read Flush responses without waiting for ReadyForQuery — Server::recv() only breaks on 'Z', which the server never sends after Flush. Adds recv_flush_response() that breaks on the actual terminal markers (RowDescription / NoData / ErrorResponse / BindComplete).
Flush response: drop double-copy parse and clone — perf: inspect framing byte without re-parsing length or copying the body for messages we only need to identify.

2. Prepared-statement client-cache desync — 1 commit

ensure_prepared_statement_is_on_server on PreparedStatementError:

Removed the entry from the client's prepared_statements map
Returned Ok(())

The client never received Close('S') so it still believed the statement was valid. The next Bind for that name missed in buffer_bind and pgcat dropped the entire TCP conn with ClientError(\"Prepared statement sN doesn't exist\").

For drivers that cache prepared statements per logical connection (tokio-postgres, asyncpg, pgx) and reuse them across many transactions, this is unrecoverable — the client has no way to know its view diverged from pgcat's.

Fix:

Keep the client cache intact — it mirrors what the client believes; dropping it without Close is a protocol-level lie.
Mark the offending server bad so the pool replaces it on next checkout.
Propagate PreparedStatementError so the caller fails this one op cleanly and retries on a fresh backend.

Test plan

cargo build clean
cargo fmt --check
postgres.js prepare: false + params: completes in ~500ms (was hanging)
Happy to add an integration test for Flush + describeFirst if you point me at the right harness

postgres.js (and likely other drivers) sends Flush between Parse/Describe and Bind/Execute when describeFirst is enabled (the default for parameterized non-prepared queries). pgcat was logging 'Unexpected code: H' and dropping the message, leaving the client hung waiting for the ParameterDescription/RowDescription response that never arrived. Add a Flush arm that drains the buffered extended-protocol messages into self.buffer (same as Sync), appends the Flush byte, sends to the server, forwards the server response, and keeps the server checked out so the subsequent Bind/Execute/Sync can complete on the same backend.

The previous commit added a Flush arm in the client loop but reused send_and_receive_loop, which calls Server::recv() — that recv loop only terminates on ReadyForQuery ('Z'). Servers do NOT send ReadyForQuery in response to Flush, so the read hung forever. Add Server::recv_flush_response() that breaks on RowDescription, NoData, ErrorResponse, or BindComplete (the terminal describe / bind response markers). Wire it into the Flush arm via a direct send + recv_flush_response call path. Verified locally: postgres.js with prepare:false + params now completes in ~500ms through pgcat (previously hung).

- recv_flush_response: skip body parse for non-S/non-1 codes (5–N byte protocol frames; framing byte is enough to dispatch). - recv_flush_response: replace self.buffer.clone() with std::mem::take to avoid one memcpy+alloc per Flush response. - Merge separate saw_terminal flag into single matches!() check on the framing byte. Net effect on cloudrest describe-heavy workload: removes one full-buffer clone and one redundant get_u8/get_i32/read_string pass per Flush. At intra-DC RTT (~0.3ms) and 10k+ qps the saved CPU is the dominant win.

…re-prepare failure When a checked-out backend rejected re-Parse of a previously-prepared statement (e.g. after schema-change cleanup, DEALLOCATE, or transient backend state), the old code removed the entry from the *client* prepared_statements HashMap and returned Ok(). This left tokio-postgres clients in an unrecoverable state: the client still believed the statement was valid (it never received a Close), so its very next Bind for that name missed in buffer_bind and the whole TCP conn dropped with 'Prepared statement sN doesn't exist'. For long-lived ETL clients (pgraft's tokio-postgres prepares many shapes via execute() and caches them per-conn by SQL), this killed every write inflight on that conn and orphaned any open COPY-IN, leaving AccessExclusiveLock on temp staging tables until session termination. New behavior: keep the client cache intact (it mirrors the client's own view), mark the misbehaving server bad so the pool replaces it, and propagate PreparedStatementError so the caller can fail this one operation and retry on a fresh backend.

Nathanael Aninweze added 4 commits May 26, 2026 19:16

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Extended-protocol fixes: Flush handling + prepared-statement client-cache consistency#953

Extended-protocol fixes: Flush handling + prepared-statement client-cache consistency#953
ikeboy003 wants to merge 4 commits into
postgresml:mainfrom
YCWagerWise:fix/extended-protocol-flush-and-prepared-cache

ikeboy003 commented May 26, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

ikeboy003 commented May 26, 2026

1. Flush ('H') handling — 3 commits

2. Prepared-statement client-cache desync — 1 commit

Test plan

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant