
feat: mid-sized structured asset creator workflow #118

Open
droarty wants to merge 4 commits into main from issue-117-mid-sized-structured-asset-creator


droarty (Owner) commented Jun 22, 2026

Closes #117. See issue for full plan.

droarty and others added 4 commits June 22, 2026 12:06
Adds a configurable workflow for importing structured data from PDFs through
a multi-turn chat with Claude. Users upload a PDF, negotiate a schema
through conversation, and Claude then extracts all records. Versioning
detects name/category conflicts immediately after the schema is proposed,
prompting the user to replace the existing asset, increment its version,
or rename it before extraction begins.
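The replace / increment / rename choice can be sketched as a pure function over the existing asset keys. This is a minimal illustration, not the PR's code: `AssetKey`, `ConflictChoice`, `resolveTargetKey`, the assumption that versions start at 1, and the assumption that "replace" targets the latest stored version are all hypothetical.

```typescript
// Hypothetical types; the PR only states that {name, category, assetVersion}
// is a compound unique key.
interface AssetKey {
  name: string;
  category: string;
  assetVersion: number;
}

type ConflictChoice =
  | { kind: "replace" }
  | { kind: "increment" }
  | { kind: "rename"; newName: string };

// Highest stored version for (name, category), or null when there is no conflict.
function latestVersion(existing: AssetKey[], name: string, category: string): number | null {
  let max: number | null = null;
  for (const a of existing) {
    if (a.name === name && a.category === category) {
      max = max === null ? a.assetVersion : Math.max(max, a.assetVersion);
    }
  }
  return max;
}

// Key the extracted records would be persisted under. Because the unique index
// covers assetVersion, "replace" has to overwrite the existing document
// rather than insert a second one with the same key.
function resolveTargetKey(
  existing: AssetKey[],
  name: string,
  category: string,
  choice: ConflictChoice,
): AssetKey {
  const latest = latestVersion(existing, name, category);
  if (latest === null) return { name, category, assetVersion: 1 }; // assumed first version
  if (choice.kind === "replace") return { name, category, assetVersion: latest }; // assumed: overwrite latest
  if (choice.kind === "increment") return { name, category, assetVersion: latest + 1 };
  // Rename: the new name may collide too, so surface that and let the user choose again.
  if (latestVersion(existing, choice.newName, category) !== null) {
    throw new Error(`"${choice.newName}" already exists in "${category}"`);
  }
  return { name: choice.newName, category, assetVersion: 1 };
}

const existing: AssetKey[] = [
  { name: "rates", category: "tariffs", assetVersion: 1 },
  { name: "rates", category: "tariffs", assetVersion: 2 },
];
console.log(resolveTargetKey(existing, "rates", "tariffs", { kind: "increment" }).assetVersion); // 3
```

Keeping the resolution pure means the same logic could run at the version-check step and again just before persisting, without touching the database.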

- New `StructuredAsset` MongoDB model with compound unique index on
  {name, category, assetVersion}
- New `file-chat` and `file-extract` AI step types in WorkflowEngine,
  backed by the Anthropic Files API (SDK beta namespace)
- `FileExtractService` wraps multi-turn file chat and one-shot extraction
- HTTP upload route `POST /api/documents/channel/:channelId/upload`
  handles Files API upload + initial AI analysis + version check
- `ChatUploadPanel` and `JsonViewer` layout components registered in
  layoutRegistry for use in any workflow config
- Workflow JSON drives the full state machine: idle → upload → schema
  discussion → version-check → confirm → extract → review → persist
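One way to picture the step order in the last bullet is as a JSON-style transition table. The step identifiers, `transitions`, `advance`, and `happyPath` are hypothetical; the actual workflow config format is not shown in this PR, and this sketch ignores the repeated chat turns inside schema discussion and any branches a real config may define.

```typescript
// Hypothetical step names mirroring the order listed above.
const steps = [
  "idle",
  "upload",
  "schema-discussion",
  "version-check",
  "confirm",
  "extract",
  "review",
  "persist",
] as const;

type Step = (typeof steps)[number];

// Each step maps to the step that follows it; "persist" is terminal here.
const transitions: Record<Step, Step | null> = {
  idle: "upload",
  upload: "schema-discussion",
  "schema-discussion": "version-check",
  "version-check": "confirm",
  confirm: "extract",
  extract: "review",
  review: "persist",
  persist: null,
};

function advance(current: Step): Step {
  const next = transitions[current];
  if (next === null) throw new Error(`"${current}" has no successor`);
  return next;
}

// Walk the happy path from idle until a terminal step is reached.
function happyPath(): Step[] {
  const path: Step[] = ["idle"];
  let s: Step = "idle";
  while (transitions[s] !== null) {
    s = advance(s);
    path.push(s);
  }
  return path;
}

console.log(happyPath().join(" → "));
// idle → upload → schema-discussion → version-check → confirm → extract → review → persist
```

Because the table is plain data, it could live in workflow JSON and be validated at load time, which matches the PR's description of config driving the state machine.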

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
MongoDB ObjectIds don't serialize to plain strings through structuredClone
+ msgpack, so doc._id arrived on the client as an object; React coerced
it to '[object Object]', causing a duplicate-key warning whenever two or
more documents were in the list.

Fix at both layers: serialize _id in QueryExecutor for all document
queries, and add String() in DocumentList as a defensive fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…results

structuredClone preserves ObjectIds as binary data, which msgpack delivers
to the client as plain objects whose toString() returns '[object Object]',
causing duplicate React keys in DocumentList. A JSON.stringify/JSON.parse
round trip calls ObjectId.toJSON() (which returns the hex string) on every
ObjectId in the doc, fixing _id fields at all nesting levels, including
state.documents[*]._id.
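A dependency-free sketch of that round trip. `FakeObjectId` is a hypothetical stand-in for bson's `ObjectId` (whose `toJSON()` also returns the hex string) so the example runs without the mongodb package, and `toWireSafe` is an illustrative name, not necessarily what QueryExecutor calls it.

```typescript
// Stand-in for bson's ObjectId: only the toJSON() behaviour matters here.
class FakeObjectId {
  readonly hex: string;
  constructor(hex: string) {
    this.hex = hex;
  }
  toJSON(): string {
    return this.hex;
  }
}

// JSON.stringify calls toJSON() on every value that defines it, at any depth,
// so a stringify/parse round trip turns each ObjectId into its hex string.
// Caveats: Dates become ISO strings and keys whose value is undefined are dropped.
function toWireSafe(doc: unknown): unknown {
  return JSON.parse(JSON.stringify(doc));
}

interface WireDoc {
  _id: string;
  name: string;
}

const state = {
  documents: [
    { _id: new FakeObjectId("665f1c2e9b1e8a3d4c2b1a01"), name: "a.pdf" },
    { _id: new FakeObjectId("665f1c2e9b1e8a3d4c2b1a02"), name: "b.pdf" },
  ],
};

// The bug: once the class prototype is gone, the id is a plain object and every
// one of them stringifies to the same React key.
const lostPrototype: object = { ...state.documents[0]._id };
console.log(String(lostPrototype)); // [object Object]

const safe = toWireSafe(state) as { documents: WireDoc[] };
console.log(safe.documents.map((d) => d._id).join(", "));
// 665f1c2e9b1e8a3d4c2b1a01, 665f1c2e9b1e8a3d4c2b1a02
```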

Also keys DocumentList on currentChannelId (always a UUID string) as a
defensive fallback.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
The worker thread (EventProcessorWorker) uses a raw MongoClient and never
calls mongoose.connect(). Importing StructuredAssetModel (Mongoose) caused
all executeQuery calls to throw silently, so initializeState and defaultView
responses were never sent, producing a permanent "Loading…" screen.

Replace check-asset-exists and persist-structured-asset with raw MongoDB
collection calls to match the rest of QueryExecutor.
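A sketch of what driver-only handlers might look like, with no Mongoose involved, so nothing depends on mongoose.connect() having run in the worker. `AssetCollection` mirrors a small subset of the driver's Collection methods (`findOne` with a `sort` option, `insertOne`); the handler functions, `AssetDoc` shape, and in-memory fake are assumptions for illustration. With the real driver, a unique index makes `insertOne` fail with a duplicate-key (E11000) error, which the fake imitates.

```typescript
// Hypothetical document shape.
interface AssetDoc {
  name: string;
  category: string;
  assetVersion: number;
  records: unknown[];
}

// Structural subset of Collection methods used below (illustrative only).
interface AssetCollection {
  findOne(
    filter: Partial<AssetDoc>,
    options?: { sort?: { assetVersion?: 1 | -1 } },
  ): Promise<AssetDoc | null>;
  insertOne(doc: AssetDoc): Promise<unknown>;
}

// Hypothetical handler behind the check-asset-exists query.
async function checkAssetExists(col: AssetCollection, name: string, category: string) {
  const latest = await col.findOne({ name, category }, { sort: { assetVersion: -1 } });
  return { exists: latest !== null, latestVersion: latest ? latest.assetVersion : null };
}

// Hypothetical handler behind persist-structured-asset.
async function persistStructuredAsset(col: AssetCollection, doc: AssetDoc): Promise<void> {
  await col.insertOne(doc);
}

// In-memory fake for the demo; only a sort on assetVersion is honoured.
class InMemoryAssets implements AssetCollection {
  private docs: AssetDoc[] = [];

  async findOne(filter: Partial<AssetDoc>, options?: { sort?: { assetVersion?: 1 | -1 } }) {
    const matches = this.docs.filter((d) =>
      (Object.keys(filter) as (keyof AssetDoc)[]).every((k) => d[k] === filter[k]),
    );
    const dir = options?.sort?.assetVersion ?? 1;
    matches.sort((a, b) => (a.assetVersion - b.assetVersion) * dir);
    return matches[0] ?? null;
  }

  async insertOne(doc: AssetDoc) {
    // Mimics the compound unique index on { name, category, assetVersion }.
    const clash = this.docs.some(
      (d) => d.name === doc.name && d.category === doc.category && d.assetVersion === doc.assetVersion,
    );
    if (clash) throw new Error("duplicate key");
    this.docs.push(doc);
    return { acknowledged: true };
  }
}

async function demo(): Promise<{ before: boolean; after: number | null }> {
  const col = new InMemoryAssets();
  const before = (await checkAssetExists(col, "rates", "tariffs")).exists;
  await persistStructuredAsset(col, { name: "rates", category: "tariffs", assetVersion: 1, records: [] });
  await persistStructuredAsset(col, { name: "rates", category: "tariffs", assetVersion: 2, records: [] });
  const after = (await checkAssetExists(col, "rates", "tariffs")).latestVersion;
  return { before, after };
}

demo().then((r) => console.log(r.before, r.after)); // false 2
```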

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>