refactor(datashare-python): update artifacts write to match the new artifact spec#58
Open
ClemDoum wants to merge 7 commits into
7349dda to
e0e9ece
Compare
5b22dad to
698d4cf
Compare
pirhoo
requested changes
Jun 30, 2026
pirhoo
left a comment
Member
Thanks, I've made a few suggestions!
Co-authored-by: Pierre Romera Zhang <promera@icij.org>
…ce it's not free threaded compatible
# Conflicts:
#   workers/extract-worker/uv.dist.lock
#   workers/extract-worker/uv.lock
74be2c0 to
38f45ad
Compare
38f45ad to
adade91
Compare
pirhoo
reviewed
Jul 3, 2026
pagination_discriminator = make_enum_discriminator("type", PaginationType)
Pagination = Annotated[
    tagged_union(BasePagination.__subclasses__(), lambda x: x.type),
Member
crucial: the classvar + custom serializer route dumps type correctly, but the union still can't parse
it back. The tag extractor lambda x: x.type returns the FieldInfo (since type is a ClassVar set to
Field(...)), not the enum, so every Pagination payload fails with union_tag_invalid. Pulling the
default off the FieldInfo fixes both directions (verified round-trip on both subclasses):
Suggested change
-    tagged_union(BasePagination.__subclasses__(), lambda x: x.type),
+    tagged_union(BasePagination.__subclasses__(), lambda x: x.type.default),
A round-trip test through the Pagination adapter would lock this down.
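To make the failure mode concrete, here is a minimal stdlib-only sketch. It assumes that tagged_union calls the tag extractor on each subclass to build its dispatch table; the PR does not show its internals, so build_tag_table, parse, dump and FieldInfoStandIn are hypothetical stand-ins, and the "byte_ranges" enum value is assumed (only "filesystem" appears in the description).

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, ClassVar


class PaginationType(str, Enum):
    FILESYSTEM = "filesystem"
    BYTE_RANGES = "byte_ranges"  # assumed value, not taken from the PR


@dataclass(frozen=True)
class FieldInfoStandIn:
    """Stand-in for pydantic's FieldInfo: a wrapper that carries a default."""

    default: Any


class BasePagination:
    type: ClassVar[Any]


class FilesystemPagination(BasePagination):
    type = FieldInfoStandIn(default=PaginationType.FILESYSTEM)


class ByteRangesPagination(BasePagination):
    type = FieldInfoStandIn(default=PaginationType.BYTE_RANGES)


def build_tag_table(classes, get_tag):
    """Map each class's tag to the class (assumed tagged-union dispatch)."""
    return {get_tag(cls): cls for cls in classes}


def dump(obj) -> dict:
    """Serialize the tag as its plain enum value."""
    return {"type": obj.__class__.type.default.value}


def parse(payload: dict, table: dict):
    """Pick the class whose tag matches the payload's "type"."""
    tag = PaginationType(payload["type"])
    if tag not in table:
        raise ValueError(f"union_tag_invalid: {tag.value!r}")
    return table[tag]


subclasses = BasePagination.__subclasses__()
# Keys are the wrapper objects, so no enum tag ever matches.
broken = build_tag_table(subclasses, lambda x: x.type)
# Keys are the enum members, so dump -> parse round-trips.
fixed = build_tag_table(subclasses, lambda x: x.type.default)
```

With the fixed table, parse(dump(FilesystemPagination()), fixed) resolves to FilesystemPagination, while the same payload against the broken table raises. The real test would go through the Pagination adapter instead of these stand-ins.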
Description
Update artifact write to match the new artifact spec:

    {
      "transcription": {
        "status": "complete",
        "taskInput": {<ASRArgs>}
      },
      "structure": {
        "status": "complete",
        "pages": {
          "total": 12,
          "pagination": { "type": "filesystem" }
        },
        "taskInput": {<MarkdownContentExtractArgs>}
      }
    }

Changes
datashare-python
Added
ManifestEntry, ManifestEntryStatus, ArtifactType, TaskArgs, PaginationType, FilesystemPagination and ByteRangesPagination objects to help with artifact persistence/serialization
Changed
The write_artifact function now pops the complete status when it starts writing and sets it again once the artifacts have been updated
extract-python
Changed
extract-python>=0.7.0 to support structure artifacts paginated by byte ranges
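The write_artifact change can be pictured with the sketch below: the status is popped before any file is written and set to complete only once everything has been written, so an interrupted write never leaves a stale complete status. This is an illustration under assumptions, not the project's code: write_artifact_sketch, its signature, the manifest.json file name and the directory layout are all hypothetical.

```python
import json
import tempfile
from pathlib import Path


def write_artifact_sketch(root: Path, name: str, files: dict, task_input: dict) -> None:
    """Hypothetical stand-in for write_artifact; not the project's real signature."""
    manifest_path = root / "manifest.json"  # assumed manifest location
    manifest = json.loads(manifest_path.read_text()) if manifest_path.exists() else {}
    entry = manifest.setdefault(name, {})

    # 1. Pop the status before touching any file, so an interrupted write
    #    never leaves a stale "complete" behind.
    entry.pop("status", None)
    manifest_path.write_text(json.dumps(manifest))

    # 2. Write the artifact files.
    artifact_dir = root / name
    artifact_dir.mkdir(parents=True, exist_ok=True)
    for filename, content in files.items():
        (artifact_dir / filename).write_bytes(content)

    # 3. Record the task input and mark the entry complete again.
    entry["taskInput"] = task_input
    entry["status"] = "complete"
    manifest_path.write_text(json.dumps(manifest))


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        write_artifact_sketch(root, "structure", {"page-0.md": b"# Title"}, {})
        print(json.loads((root / "manifest.json").read_text()))
```

A reader of the manifest can then treat a missing status as "write in progress or interrupted" and only trust entries marked complete.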