feat: streaming FromRow mapping (stream_as) — constant-memory struct-mapped queries (#91)#94
Merged
Merged
Conversation
- Multi-chunk tests now insert 140K rows (> 2x the 65536-row chunk size) so they genuinely cross a chunk boundary and exercise the build-once-then-reuse path. At 5000 rows the whole result fit in a single chunk and the cross-chunk path was untested. - Schema-absent fallback now uses unwrap_or_default() (empty map -> per-row Missing error) in both sync and async paths, matching fetch_all_as, instead of sync silently truncating / async skipping the chunk. - Remove broken intra-doc link to nonexistent Rowset::typed_rows.
…g forms Runnable companion to docs/ROW_MAPPING.md: self-hosts a HyperProcess, seeds the products table, and maps its rows via all five forms (manual streaming, named access, hand-written FromRow, derive(FromRow), and streaming stream_as), each printing identical output. Registered in Cargo.toml and cross-referenced from the doc intro.
…n-all-examples scripts - row_mapping_forms now also demonstrates AsyncConnection::stream_as (impl Stream) alongside the sync iterator, on a small current-thread Tokio runtime; all six sections print identical output. - Register the example in run_all_examples.sh, run_all_examples.ps1, and run_examples_wsl.sh so it runs in the full example sweep.
run_examples_wsl.sh listed ~17 examples that no longer exist (reading_data, struct_mapping, query_builder, geography, catalog_and_schema, the sea_query feature example, etc.) — leftovers from a prior example layout that would fail every run. Aligned its list (and the phantom feature-examples block) with the canonical run_all_examples.sh set. Also added the two examples run_all_examples.ps1 was missing (async_parity_smoke, prepared_statements). All three scripts now run the identical set of registered hyperdb-api examples; benchmarks and the feature-gated compile_time_validation example remain intentionally excluded.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Closes #91.
fetch_all_as::<T>()was the only public API wiringFromRowinto typedstruct mapping, but it calls
fetch_allfirst — collecting all rows into aVec<Row>before any mapping occurs (memory O(total rows)), making itunusable for large or unbounded result sets. The streaming path
(
execute_query+next_chunk) only yielded untypedRows, andRowAccessor::{new,build_indices}arepub(crate), so callers couldn't wire a#[derive(FromRow)]struct into the chunk loop themselves.This PR adds Form 5: streaming
FromRowmapping — a lazy, constant-memorytyped iterator/stream:
Design
The central constraint:
RowAccessor<'a>holdsindices: &'a HashMap<&'a str, usize>whose keys borrow from theResultSchema. A lazy iterator mustpersist the index map across
next()calls, which would make it own both theschema and a map borrowing from it — a self-referential struct, not expressible
in safe Rust.
Resolution: an owned-key index map (
HashMap<String, usize>) for thestreaming path, exposed via a small crate-internal
Indices<'a>enum onRowAccessor:Indices::Borrowed(&HashMap<&str, usize>)— the existing zero-allocfetch_*_ashot path, unchanged.Indices::Owned(&HashMap<String, usize>)— the streaming path, whose map theiterator/stream owns outright.
Both look up by
&strviaHashMap'sBorrowbound, so theget/get_optgetters are agnostic and their error shapes (
Missing/Null/TypeMismatch)are byte-for-byte unchanged. No
unsafe.TypedRowIterator<'conn, T>(crate-private;stream_asreturnsimpl Iterator<Item = Result<T>>) mirrors the existingRowIteratorchunk-buffering pattern.
async_stream::try_stream!awaitsnext_chunk()in a loop;the query string is
.to_owned()so nothing borrows the&strarg across anawait.
The column-name → index map is built exactly once (on the first chunk,
guarded by
Option::is_none) and reused for every row across all chunks, soper-row mapping is O(1) in column count and total memory is O(chunk_size).
Acceptance criteria (issue #91)
Connection::stream_as::<T>(query)returns a lazy iterator ofResult<T>AsyncConnection::stream_asreturnsimpl Stream<Item = Result<T>>docs/ROW_MAPPING.mdas Form 5Error semantics
Errors surface in two places and robust callers handle both:
Result(sync) carries failures detected while opening thestream — transport/connection errors, and on the gRPC transport SQL
parse/server errors. On the default TCP transport the query streams
lazily, so a SQL error such as a missing table is reported as the first
yielded item, not by the outer
Result. (The async stream is fully lazy,so its submission error is always the first
Erritem.)Result<T>— per-row mapping failures (missing column,type mismatch, NULL in a non-
Optionfield) or a transport error hit whilefetching a later chunk.
This is documented on both methods and in the Form 5 doc.
Dependencies
async-stream = "0.3"(new) — powers the asynctry_stream!.futures-core = "0.3"(promoted from transitive to a direct dep) — providesthe
Streamtrait named in the public return type.futures = "0.3"(dev-dependency only) —StreamExt/TryStreamExtfordraining streams in tests.
Testing
11 new integration tests (sync + async parity): happy path matches
fetch_all_as, multi-chunk crossing the 65 536-row chunk boundary (140 000rows — proving build-once-then-reuse across chunks), submit error, per-row map
error, empty result, and lenient extra-column. Plus owned-path unit tests in
row_accessor.rs.Full verification (all clean):
cargo clippy --workspace --all-targets --all-features -- -D warningscargo doc -p hyperdb-api --no-deps --all-featureswithRUSTDOCFLAGS=-D warningscargo test --workspace(debug and release): 1331 passed, 0 failedasync_usage,transactions,arrow,threaded_inserterexamples run end-to-end (exit 0).Review notes
The branch went through an adversarial review pass that caught and fixed three
issues before this PR:
so they never actually crossed a chunk boundary (the cross-chunk path was
untested despite the test name). Bumped to 140 000 rows.
chunk) if a schema were ever absent after a non-empty chunk. Both now fall
back to an empty index map (→ per-row
Missingerror), matchingfetch_all_as'sunwrap_or_default().Rowset::typed_rowsmethodwas removed.