impl(bigquery): basic row parsing#5916
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces robust error handling and type conversion capabilities for BigQuery query results, including the FromSql trait for mapping database values to Rust types and a schema-aware Row parser. The reviewer feedback highlights critical correctness and performance improvements: addressing unnecessary clones and allocations during row parsing, preventing silent data corruption when handling special float values (like NaN and Infinity), ensuring consistent row-to-schema length validation, and improving error diagnostics by propagating actual field names instead of hardcoding 'unknown' in conversion errors.
|
Splitting errors into #5917. Later will split FromSql initial impl. |
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #5916 +/- ##
==========================================
- Coverage 97.88% 97.85% -0.04%
==========================================
Files 236 236
Lines 60183 60379 +196
==========================================
+ Hits 58913 59083 +170
- Misses 1270 1296 +26 ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Split FromSql trait and some basic conversion into PR #5924 |
| pub(crate) mod private { | ||
| /// A sealed trait to prevent external implementation of `ColumnIndex`. | ||
| pub trait Sealed {} |
There was a problem hiding this comment.
I think we more commonly say:
mod sealed {
pub trait ColumnIndex {}
}| "STRING" => Ok(Value::String(value)), | ||
| "BYTES" => Ok(Value::String(value)), | ||
| "TIMESTAMP" => Ok(Value::String(value)), | ||
| "DATE" => Ok(Value::String(value)), | ||
| "TIME" => Ok(Value::String(value)), | ||
| "DATETIME" => Ok(Value::String(value)), | ||
| "NUMERIC" | "BIGNUMERIC" => Ok(Value::String(value)), | ||
| "BIGINT" => Ok(Value::String(value)), | ||
| "GEOGRAPHY" => Ok(Value::String(value)), | ||
| "JSON" => Ok(Value::String(value)), | ||
| "INTERVAL" => Ok(Value::String(value)), | ||
| "RANGE" => Ok(Value::String(value)), |
There was a problem hiding this comment.
optional style nit: combine all of these? "STRING" | "BYTES" | ... | "RANGE" => Ok(Value::String(value))
| }) | ||
| .as_object() | ||
| .expect("row is an object") | ||
| .clone(); |
There was a problem hiding this comment.
optionally, you could make the json Map directly and avoid expect() and the clone():
use serde_json::{json, Map};
let raw_row = Map::from_iter([(
"f".to_string(),
json!([
{ "v": "James" },
{ "v": "272793" },
{ "v": "TRUE" },
{ "v": null },
{ "v": "64.0" },
]),
)]);| #[test_case("UNKNOWN", "value", None; "unknown type")] | ||
| fn convert_basic_type_cases(field_type: &str, value: &str, expected: Option<Value>) { | ||
| let res = convert_basic_type(value.to_string(), "test_col", field_type); | ||
| match expected { |
There was a problem hiding this comment.
Any ifs in a test are a hint that we should separate the tests. We probably want 3 tests:
convert_basic_type_cases_successconvert_basic_type_cases_conversion_failconvert_basic_type_cases_invalid_row_format
| len: self.schema.len(), | ||
| })?; | ||
|
|
||
| T::from_sql(val.clone()).map_err(|e| { |
There was a problem hiding this comment.
Aside: we could probably have an API like a:
pub fn take<T: FromSql, I: ColumnIndex>(&mut self, index: I) -> Result<T>that can extract the value without needing to clone.
It has downsides of replacing with a default value though.
I think it is not necessary if we have a zero-clone way to convert the whole row into an application struct. Just a thought.
Towards #5844