refactor(datashare-python): update artifacts write to match the new artifact spec #58

Open
ClemDoum wants to merge 7 commits into main from refactor(datashare-python)/artifact-manifest

@ClemDoum ClemDoum commented Jun 29, 2026

Contributor

Description

Update artifact write to match the new artifact spec:

{
  "transcription": {
    "status": "complete",
    "taskInput": {<ASRArgs>}
  },
  "structure": {
    "status": "complete",
    "pages": {
      "total": 12,
      "pagination": { "type": "filesystem" }
    },
    "taskInput": {<MarkdownContentExtractArgs>}
  }
}

Changes

datashare-python

Added

  • added ManifestEntry, ManifestEntryStatus, ArtifactType, TaskArgs, PaginationType, FilesystemPagination, ByteRangesPagination objects in order to help with artifact persistence/serialization
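
As a rough illustration of how such objects could serialize to the manifest shape shown in the description, here is a minimal stdlib-only sketch. It is not the PR's implementation: the dataclass layout, the `Pages` name, the `to_json_dict` method, and the empty `task_input` placeholder are all assumptions made for this example. Only the JSON keys and the `"complete"` and `"filesystem"` values come from the description.

```python
import json
from dataclasses import asdict, dataclass
from enum import Enum
from typing import Any, Dict, Optional


class ManifestEntryStatus(str, Enum):
    COMPLETE = "complete"  # the only status value shown in the description


class PaginationType(str, Enum):
    FILESYSTEM = "filesystem"  # the only pagination type shown in the description


@dataclass
class FilesystemPagination:
    type: str = PaginationType.FILESYSTEM.value


@dataclass
class Pages:  # hypothetical name for the "pages" block of an entry
    total: int
    pagination: FilesystemPagination


@dataclass
class ManifestEntry:
    status: ManifestEntryStatus
    task_input: Dict[str, Any]
    pages: Optional[Pages] = None

    def to_json_dict(self) -> Dict[str, Any]:
        """Build the camelCase JSON shape used by the manifest (assumed helper)."""
        out: Dict[str, Any] = {"status": self.status.value}
        if self.pages is not None:
            out["pages"] = asdict(self.pages)
        out["taskInput"] = self.task_input
        return out


structure_entry = ManifestEntry(
    status=ManifestEntryStatus.COMPLETE,
    task_input={},  # placeholder: the real task arguments are not listed in the PR text
    pages=Pages(total=12, pagination=FilesystemPagination()),
)
manifest = {"structure": structure_entry.to_json_dict()}
manifest_json = json.dumps(manifest, indent=2)
```

Keeping the wire names (`taskInput`) in one serialization helper lets the Python side stay snake_case while the manifest keeps the spec's casing.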

Changed

  • updated the write_artifact function to pop the complete status when writing starts and to set it again once the artifacts have been updated
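
The pop-then-set ordering described above can be sketched as follows. This is a hedged, stdlib-only illustration, not the actual `write_artifact` code: the `write_artifact_sketch` name, its signature, the JSON-file manifest, and the `write_files` callback are assumptions made for the example. The point it shows is that a write that fails midway leaves the entry without a `complete` status.

```python
import json
import tempfile
from pathlib import Path
from typing import Any, Callable, Dict


def read_manifest(path: Path) -> Dict[str, Any]:
    """Load the manifest, or start from an empty one if it does not exist yet."""
    return json.loads(path.read_text()) if path.exists() else {}


def write_artifact_sketch(
    manifest_path: Path, artifact_type: str, write_files: Callable[[], None]
) -> None:
    manifest = read_manifest(manifest_path)
    entry = manifest.setdefault(artifact_type, {})
    # 1. Drop the status first so readers never see "complete" during a rewrite.
    entry.pop("status", None)
    manifest_path.write_text(json.dumps(manifest))
    # 2. Write the artifact files; if this raises, the status stays absent.
    write_files()
    # 3. Only mark the entry complete once the files have been updated.
    entry["status"] = "complete"
    manifest_path.write_text(json.dumps(manifest))


def failing_write() -> None:
    raise OSError("simulated failure while writing artifact files")


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "manifest.json"
    write_artifact_sketch(path, "structure", lambda: None)
    after_success = read_manifest(path)
    try:
        write_artifact_sketch(path, "structure", failing_write)
    except OSError:
        pass
    after_failure = read_manifest(path)
```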

extract-python

Changed

  • bumped to extract-python>=0.7.0 to support structure artifacts paginated by byte ranges

@ClemDoum ClemDoum force-pushed the refactor(datashare-python)/artifact-manifest branch from 7349dda to e0e9ece Compare June 29, 2026 14:34
@ClemDoum ClemDoum marked this pull request as ready for review June 29, 2026 14:47
@ClemDoum ClemDoum force-pushed the refactor(datashare-python)/artifact-manifest branch 2 times, most recently from 5b22dad to 698d4cf Compare June 29, 2026 15:09

@pirhoo pirhoo left a comment

Member

Thanks, I've made a few suggestions!

Comment thread datashare-python/datashare_python/utils.py
Comment thread datashare-python/datashare_python/utils.py Outdated
Comment thread datashare-python/datashare_python/utils.py Outdated
Comment thread datashare-python/datashare_python/objects.py Outdated
ClemDoum and others added 6 commits July 2, 2026 18:09
Co-authored-by: Pierre Romera Zhang <promera@icij.org>
Co-authored-by: Pierre Romera Zhang <promera@icij.org>
# Conflicts:
#	workers/extract-worker/uv.dist.lock
#	workers/extract-worker/uv.lock
@ClemDoum ClemDoum force-pushed the refactor(datashare-python)/artifact-manifest branch 3 times, most recently from 74be2c0 to 38f45ad Compare July 3, 2026 09:46
@ClemDoum ClemDoum force-pushed the refactor(datashare-python)/artifact-manifest branch from 38f45ad to adade91 Compare July 3, 2026 10:24
@ClemDoum ClemDoum requested a review from pirhoo July 3, 2026 10:25

pagination_discriminator = make_enum_discriminator("type", PaginationType)
Pagination = Annotated[
tagged_union(BasePagination.__subclasses__(), lambda x: x.type),

Member

crucial: the classvar + custom serializer route dumps type correctly, but the union still can't parse
it back. The tag extractor lambda x: x.type returns the FieldInfo (since type is a ClassVar set to
Field(...)), not the enum, so every Pagination payload fails with union_tag_invalid. Pulling the
default off the FieldInfo fixes both directions (verified round-trip on both subclasses):

Suggested change
- tagged_union(BasePagination.__subclasses__(), lambda x: x.type),
+ tagged_union(BasePagination.__subclasses__(), lambda x: x.type.default),

A round-trip test through the Pagination adapter would lock this down.
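
To make the failure mode concrete, here is a small stdlib-only sketch of the mechanism described in this comment. It does not use pydantic or the project's `tagged_union` helper: `FieldInfoStandIn` is a hypothetical stand-in for a field-metadata object, and `known_tags` is an assumed lookup table playing the role of the union's tag registry. It only shows why reading `x.type` yields the wrapper while `x.type.default` yields the enum.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any


class PaginationType(str, Enum):
    FILESYSTEM = "filesystem"


@dataclass(frozen=True)
class FieldInfoStandIn:
    """Hypothetical stand-in for a metadata object that carries a default value."""

    default: Any


class FilesystemPagination:
    # Class-level attribute set to a metadata wrapper, not to the enum itself.
    type = FieldInfoStandIn(default=PaginationType.FILESYSTEM)


def tag_before(x: Any) -> Any:
    return x.type  # returns the wrapper object


def tag_after(x: Any) -> Any:
    return x.type.default  # unwraps to the enum member


# Assumed registry mapping each tag to the class it selects.
known_tags = {PaginationType.FILESYSTEM: FilesystemPagination}

instance = FilesystemPagination()
found_before = known_tags.get(tag_before(instance))  # no match: the key is a wrapper
found_after = known_tags.get(tag_after(instance))    # matches FilesystemPagination
```

A real regression test would go through the actual Pagination adapter and assert that dumping then validating each subclass returns an equal object.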

2 participants