Merge/v1.5 variegata into main by evertlammerts · Pull Request #507 · duckdb/duckdb-python

evertlammerts · 2026-06-25T12:22:50Z

No description provided.

This PR unifies Arrow exports across query result types, and makes sure we always provide the schema from within a transaction.

We are dealing with 3 Arrow export types:

- Arrow Table
- Arrow RecordBatch
- Arrow C Stream

... across 3 result types:

- MaterializedQueryResult
- ArrowQueryResult
- StreamingQueryResult

The `MaterializedQueryResult` paths are now unified. We re-feed the backing ColumnDataCollection to the engine for parallel conversion into an `ArrowQueryResult`, and then delegate to the corresponding `ArrowQueryResult` path.

The `ArrowQueryResult` paths deal with materialized data already, and we have no way to plug into the transaction that generated it. The actual fix for this is to cache the schema when creating the `ArrowQueryResult`, during `Finalize`. This is a core change that we will probably apply in v2.0. The workaround is to fetch the schema in a separate transaction. For all paths, since we are already dealing with materialized data, we create an Arrow table. Then for the streaming paths we return the corresponding stream types directly from the table.

The `StreamingQueryResult` paths always have access to a valid transaction context, and can get the Arrow schema on demand even when that requires catalog access.

As a side effect of this PR, consuming an Arrow C stream (reading from `con.sql(q).__arrow_c_stream__()`) is now lazy, i.e. not materialized. This makes consumption slower, but allows streaming much larger datasets. The materialized paths are overall a little faster, as are the streaming paths other than the C stream.
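The lazy versus materialized trade-off described above can be illustrated with a toy model in plain Python. This is only a sketch, not DuckDB's implementation: `Batch`, `produce_batches`, `consume_materialized` and `consume_lazy` are hypothetical names, and a list of integers stands in for an Arrow record batch.

```python
from typing import Iterator, List, Tuple

Batch = List[int]  # hypothetical stand-in for an Arrow record batch


def produce_batches(n_batches: int, rows_per_batch: int) -> Iterator[Batch]:
    """Yield batches one at a time, the way a streaming result would."""
    for i in range(n_batches):
        start = i * rows_per_batch
        yield list(range(start, start + rows_per_batch))


def consume_materialized(n_batches: int, rows_per_batch: int) -> Tuple[int, int]:
    """Collect every batch first, then sum.

    Returns (total, number of batches held in memory at once).
    """
    batches = list(produce_batches(n_batches, rows_per_batch))
    total = sum(sum(batch) for batch in batches)
    return total, len(batches)


def consume_lazy(n_batches: int, rows_per_batch: int) -> Tuple[int, int]:
    """Sum batch by batch, so at most one batch is alive at a time.

    Returns (total, number of batches held in memory at once).
    """
    total = 0
    held = 0
    for batch in produce_batches(n_batches, rows_per_batch):
        total += sum(batch)
        held = 1  # the previous batch is dropped before the next one arrives
    return total, held


print(consume_materialized(4, 3))
print(consume_lazy(4, 3))
```

Both consumers compute the same total. The materialized one holds all four batches at once, while the lazy one holds a single batch, which is why a lazy stream can handle datasets that do not fit in memory.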
```
┌───────────────────────────────────────────────────┬────────────────────┬───────────────────┬───────────────────┐
│ benchmark expression                              │ wall base→now (ms) │ CPU base→now (ms) │ mem base→now (MB) │
├───────────────────────────────────────────────────┼────────────────────┼───────────────────┼───────────────────┤
│ r=con.sql(q); r.execute(); r.to_arrow_table()     │ 159 → 161          │ 259 → 286         │ 847 → 875         │
│ r=con.sql(q); r.execute(); r.to_arrow_reader()    │ 161 → 144          │ 255 → 263         │ 896 → 877         │
│ r=con.sql(q); r.execute(); r.__arrow_c_stream__() │ 157 → 136          │ 282 → 235         │ 854 → 881         │
│ con.sql(q).to_arrow_table()                       │ 52 → 35            │ 267 → 244         │ 855 → 854         │
│ con.execute(q).to_arrow_table()                   │ 202 → 174          │ 212 → 193         │ 548 → 554         │
│ con.sql(q).to_arrow_reader()                      │ 186 → 175          │ 199 → 187         │ 552 → 552         │
│ con.sql(q).__arrow_c_stream__()                   │ 48 → 173           │ 250 → 189         │ 857 → 554         │
└───────────────────────────────────────────────────┴────────────────────┴───────────────────┴───────────────────┘
```
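To read the base→now figures as relative changes, a small helper is enough. The sketch below copies two rows from the benchmark table; `relative_change` is a hypothetical helper name, not part of duckdb.

```python
def relative_change(base: float, now: float) -> float:
    """Percent change from base to now; negative means a reduction."""
    return (now - base) / base * 100.0


# (wall ms, mem MB) as (base, now) pairs, copied from the benchmark table
rows = {
    "con.sql(q).to_arrow_table()": ((52, 35), (855, 854)),
    "con.sql(q).__arrow_c_stream__()": ((48, 173), (857, 554)),
}

for expr, (wall, mem) in rows.items():
    print(
        f"{expr}: wall {relative_change(*wall):+.1f}%, "
        f"mem {relative_change(*mem):+.1f}%"
    )
```

For the lazy C stream row this works out to roughly +260% wall time and -35% memory, which matches the description: slower to consume, but no longer materialized.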

Bump duckdb submodule:

- Target branch: v1.5-variegata
- Date: 2026-06-17 07:32:20
- DuckDB SHA: ceb2aef3e30c5c04cf97eea4af3990a274bd49bb
- Trigger: https://github.com/duckdb/duckdb-python/actions/runs/27671964362

Bump duckdb submodule:

- Target branch: v1.5-variegata
- Date: 2026-06-21 06:31:40
- DuckDB SHA: c4770ecba48065b691843da2e6eb9f91e3fea77b
- Trigger: https://github.com/duckdb/duckdb-python/actions/runs/27895532903

Periodic forward-merge of release-branch bugfixes into main. Notably brings in duckdb#495 "Unify arrow exports across all query result types" (the materialized slow-path lifetime / connection-GC fix and the test_arrow_refeed suite), replacing main's older SchemaCachingStreamWrapper/ArrowQueryResultStreamWrapper approach.

Submodule: external/duckdb is kept at main's pin 0361de441a (v1.5's submodule bumps discarded; git fast-forwarded the gitlink to main's newer pin).

Conflict resolution:

- .github/workflows/packaging_wheels.yml: applied both intents: v1.5's windows-2025 -> windows-2022 (consistent with targeted_test.yml) and main's ARM64-comment removal.

Adaptation for main's newer core:

- pyresult.cpp: core's ColumnDataRef now takes vector<Identifier> (not vector<string>); promote the deduplicated scan names to Identifiers explicitly in MakeColumnDataScanStatement.

Verified: clean build; tests/fast/arrow + tests/fast/udf = 2436 passed, 0 failed (incl. test_capsule_slow_path_survives_connection_gc and the new test_arrow_refeed suite).

evertlammerts and others added 15 commits June 12, 2026 23:21

Pull materialized CDCs through the engine again for arrow conversion …

426f6cc

…with a live connection / transaction

strip comments

5fd7f69

run schema fetching in same transaction as arrow data conversion and …

7ddc75f

…only once

Get the arrow schema in a separate transaction for materialized data …

c8c35eb

…only

force windows 2022 runners

3211977

Pin duckdb at release hash 08e34c447b

305369d

Bump submodule

fcf4359

[duckdb-labs bot] Bump DuckDB submodule (duckdb#496)

87c56d0

pin torch

61a9822

Bump submodule

e963043

[duckdb-labs bot] Bump DuckDB submodule (duckdb#501)

c400b90

bump submodule to June 26 nightly

4c63e39

QualifiedName and ProfilerPrintFormat

3d73752

evertlammerts merged commit 56c26cc into duckdb:main Jun 26, 2026
15 checks passed

evertlammerts deleted the merge/v1.5-variegata-into-main branch June 26, 2026 09:44

evertlammerts merged 15 commits into duckdb:main from evertlammerts:merge/v1.5-variegata-into-main
