
Reject malformed vector column types with exact-match parsing #310

Open
stumpylog wants to merge 1 commit into asg017:main from stumpylog:fix/strict-vector-column-type-parsing

@stumpylog


The vec0 column-type parser matched element type names with a prefix-only sqlite3_strnicmp (comparing only the first N bytes against "float", "int8", "bit", etc.). Any identifier sharing a prefix with a real type was silently coerced to that type:

  • float16[768] → a 32-bit float column (silently doubling the intended storage)
  • bitcoin[2] → a bit column
  • typos like floaty[2] → accepted instead of rejected
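The following standalone sketch shows the failure mode described above. It assumes the old check compared only `strlen(type_name)` bytes of a NUL-terminated identifier. `ascii_strnicmp` is a hypothetical stand-in for `sqlite3_strnicmp`, and `prefix_match` is a hypothetical reduction of the parser; the real vec0 code is structured differently.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for sqlite3_strnicmp: ASCII case-insensitive
 * compare of at most n bytes; returns 0 when those bytes are equal. */
static int ascii_strnicmp(const char *a, const char *b, size_t n) {
  for (size_t i = 0; i < n; i++) {
    int ca = tolower((unsigned char)a[i]);
    int cb = tolower((unsigned char)b[i]);
    if (ca != cb) return ca - cb;
    if (ca == '\0') return 0; /* both strings ended together */
  }
  return 0;
}

/* Prefix-only check (the buggy pattern): only as many bytes as the known
 * type name has are compared, so any identifier that merely STARTS with
 * the type name is treated as a match. */
static int prefix_match(const char *ident, const char *type_name) {
  return ascii_strnicmp(ident, type_name, strlen(type_name)) == 0;
}
```

With this check, `prefix_match("float16", "float")`, `prefix_match("bitcoin", "bit")`, and `prefix_match("floaty", "float")` all return 1. Nothing ever inspects the bytes after the known prefix.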

This change also requires the identifier's length to equal the type name's length, so only exact element-type spellings parse. float32 is added as an explicit alias, since it was previously accepted via the float prefix and is a natural spelling to keep working.
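Here is a minimal sketch of exact-match parsing with the alias list from this PR (float, f32, float32, int8, i8, bit). It assumes matching stays case-insensitive, as with `sqlite3_strnicmp`, and that the identifier arrives as a pointer plus length sliced out of the column definition. The `elem_type` enum, `kElemTypes` table, and `parse_elem_type` function are hypothetical illustrations, not the actual sqlite-vec code.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical element-type enum; the real vec0 names differ. */
typedef enum {
  ELEM_INVALID = 0,
  ELEM_FLOAT32,
  ELEM_INT8,
  ELEM_BIT
} elem_type;

/* ASCII case-insensitive equality of exactly n bytes. */
static int ascii_ieq_n(const char *a, const char *b, size_t n) {
  for (size_t i = 0; i < n; i++) {
    if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
      return 0;
  }
  return 1;
}

/* Accepted spellings listed in the PR; float32 is the new explicit alias. */
static const struct {
  const char *name;
  elem_type type;
} kElemTypes[] = {
    {"float", ELEM_FLOAT32}, {"f32", ELEM_FLOAT32}, {"float32", ELEM_FLOAT32},
    {"int8", ELEM_INT8},     {"i8", ELEM_INT8},     {"bit", ELEM_BIT},
};

/* Exact match: the identifier must be the same length as a known spelling
 * AND equal to it case-insensitively. Only ident_len bytes of ident are
 * read, so it may point into a larger, non-NUL-terminated buffer. */
static elem_type parse_elem_type(const char *ident, size_t ident_len) {
  for (size_t i = 0; i < sizeof kElemTypes / sizeof kElemTypes[0]; i++) {
    size_t n = strlen(kElemTypes[i].name);
    if (ident_len == n && ascii_ieq_n(ident, kElemTypes[i].name, n))
      return kElemTypes[i].type;
  }
  return ELEM_INVALID; /* lookalikes such as float16 or bitcoin land here */
}
```

The length check alone rejects float16, bitcoin, and floaty. Keeping the spellings in a table means a future real `float16` type only needs a new entry, with no risk of colliding with the `float` prefix.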

This also unblocks adding real float16/bfloat16 types (#27), which would otherwise collide with the float prefix.

Testing

  • New tests/test-column-type-parse.py asserts valid spellings (float, f32, float32, int8, i8, bit) are accepted and lookalikes (float16, bitcoin, floaty, ...) are rejected.
  • Two tests in test-loadable.py that incidentally used the bogus float8[1] were corrected to float[1].
  • Full loadable suite passes.

The vec0 column-type parser matched element type names with a
prefix-only `sqlite3_strnicmp` (e.g. comparing the first 5 bytes
against "float"). Any identifier sharing a prefix with a real type
was silently coerced to that type: `float16[768]` became a 32-bit
`float` column, `bitcoin[2]` became `bit`, and typos like `floaty`
were accepted instead of erroring.

Compare the full identifier length so only exact element-type
spellings parse. `float32` is added as an explicit alias since it
was previously accepted via the `float` prefix and is a natural
spelling to keep working.

This also unblocks adding real `float16`/`bfloat16` types (asg017#27),
which would otherwise collide with the `float` prefix.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>