pthread_mutex in manage_GRAPH_global_contexts causes permanent self-deadlock on VLE queries

**Describe the bug**

The `pthread_mutex` in `manage_GRAPH_global_contexts()` can permanently self-deadlock a backend that runs VLE (variable-length edge) queries. When `ereport(ERROR)` is raised while the mutex is held (e.g., statement timeout, query cancellation, OOM), PostgreSQL's `siglongjmp` jumps to the error handler and skips `pthread_mutex_unlock()`, so the mutex stays locked for the lifetime of the backend. Any subsequent VLE query on the same backend connection then deadlocks on itself: the process hangs forever in `pthread_mutex_lock()` with the mutex's `__owner` equal to the backend's own PID.

**How are you accessing AGE (Command line, driver, etc.)?**
- psql (command line), but the bug affects any client/driver.

**What data setup do we need to do?**
```pgsql
LOAD 'age';
SET search_path = ag_catalog, "$user", public;

SELECT create_graph('test_deadlock');

SELECT * FROM cypher('test_deadlock', $$
    UNWIND range(1, 50000) AS i
    CREATE (:Node {id: i})
$$) AS (v agtype);

SELECT * FROM cypher('test_deadlock', $$
    MATCH (a:Node), (b:Node)
    WHERE b.id = a.id + 1
    CREATE (a)-[:LINK {weight: a.id}]->(b)
$$) AS (e agtype);

-- Load graph context into cache first
SELECT * FROM cypher('test_deadlock', $$
    MATCH path = (a)-[r*1..2]->(b)
    RETURN path LIMIT 1
$$) AS (path agtype);

-- Invalidate cached context by modifying the graph
SELECT * FROM cypher('test_deadlock', $$
    CREATE (:Dummy {x: 1})
$$) AS (v agtype);
```

**What is the necessary configuration info needed?**
- Any AGE version that includes PR #1881 
- PostgreSQL 16, 17, or 18
- No special configuration needed

**What is the command that caused the error?**

Repeat cache-invalidate + timeout in a loop. The timeout must hit during graph context reload (while the mutex is held). It may take a few iterations depending on machine speed.

```pgsql
-- Repeat: invalidate cache, then cancel VLE query via statement_timeout.
-- Each round has a chance of hitting the mutex-held window.
-- Once it hits, every subsequent VLE query on this connection hangs forever.

-- Round 1
SELECT * FROM cypher('test_deadlock', $$ CREATE (:T1 {x: 1}) $$) AS (v agtype);
SET statement_timeout = '1ms';
SELECT * FROM cypher('test_deadlock', $$
    MATCH path = (a)-[r*1..3]->(b) RETURN path LIMIT 1
$$) AS (path agtype);
RESET statement_timeout;

-- Round 2
SELECT * FROM cypher('test_deadlock', $$ CREATE (:T2 {x: 2}) $$) AS (v agtype);
SET statement_timeout = '1ms';
SELECT * FROM cypher('test_deadlock', $$
    MATCH path = (a)-[r*1..3]->(b) RETURN path LIMIT 1
$$) AS (path agtype);
RESET statement_timeout;

-- Round 3
SELECT * FROM cypher('test_deadlock', $$ CREATE (:T3 {x: 3}) $$) AS (v agtype);
SET statement_timeout = '1ms';
SELECT * FROM cypher('test_deadlock', $$
    MATCH path = (a)-[r*1..3]->(b) RETURN path LIMIT 1
$$) AS (path agtype);
RESET statement_timeout;

-- (add more rounds if needed)

-- Final test: if any round above hit the mutex window,
-- this query hangs forever (self-deadlock).
SELECT * FROM cypher('test_deadlock', $$
    MATCH path = (a)-[r*1..2]->(b) RETURN path LIMIT 1
$$) AS (path agtype);
-- If it returns results, add more rounds above and retry.
```

To confirm with GDB:
```bash
gdb -batch -p <hung_pid> \
  -ex "print global_graph_contexts_container.mutex_lock.__data.__owner"
# Output: $1 = <hung_pid>   (owner == self → self-deadlock)
```

**Expected behavior**

VLE queries should continue to work normally after a query error or cancellation. A statement timeout on one query should not permanently break the backend connection.

**Environment (please complete the following information):**
- AGE Version: master (also affects PG16, PG17, PG18 branches — any version with PR #1881)
- PostgreSQL Version: 16, 17, 18

**Additional context**

The mutex was introduced in PR #1881 (fix for issue #1878). However, it is both unnecessary and harmful:

1. **Unnecessary:** The protected variable is a process-local `static`, and PostgreSQL backends are single-threaded, so no concurrent access exists. The test failure in #1878 was a catalog-level race, already fixed in the same PR by replacing the `Assert` with a runtime check and by using `strndup`. For cross-backend cache invalidation, PostgreSQL syscache uses `sinval` callbacks, and AGE PR #2376 already uses lock-free `pg_atomic_uint64` version counters in shared memory for this.

2. **Harmful:** `pthread_mutex` is incompatible with PostgreSQL's error handling. `ereport(ERROR)` uses `siglongjmp` to jump directly to the error handler, bypassing all code between the error site and the handler — including `pthread_mutex_unlock()`. Once skipped, the mutex is permanently locked for that backend process, and any subsequent VLE query self-deadlocks.

We will submit a PR with a fix.
