feat(importer): store blog media as gitsheets attachments#109

Merged
themightychris merged 3 commits into main from feat/blog-media-attachments
May 30, 2026

@themightychris
Member

Summary

Before this PR, blog post bodies referenced media via `https://codeforphilly.org/thumbnail/<id>/<dim>`, so every image would break at cutover when laddr is decommissioned. This PR captures the original bytes at import time as attachments scoped to the owning blog post and rewrites body references to `/api/attachments/blog-posts/<slug>/<filename>`.

Filenames are human-readable when captions are present:

  • `2023-launchpad-kick-off-event-at-city-hall-3349.jpg`
  • `image-3127.jpg` (when caption is empty)
  • The extension comes from the response's Content-Type, so the ext→MIME inference in `/api/attachments/:key` serves the right Content-Type header.
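
As a rough illustration of the naming rule, here is a minimal sketch, not the importer's actual code. `slugify`, `extFromContentType`, `attachmentFilename`, and the `EXT_BY_MIME` table are hypothetical names; only the `image/jpeg` and `image/png` mappings and the 80-character slug cap come from this PR, and the sample caption is assumed.

```typescript
// Hypothetical Content-Type -> extension table. Only image/jpeg and
// image/png are named in the PR; the other rows are assumptions.
const EXT_BY_MIME: Record<string, string> = {
  'image/jpeg': 'jpg',
  'image/png': 'png',
  'image/gif': 'gif',
  'image/webp': 'webp',
};

// Assumed slugify: lowercase, collapse non-alphanumerics to '-', trim dashes.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, '-')
    .replace(/^-+|-+$/g, '');
}

// Returns undefined for Content-Types the table does not know.
function extFromContentType(contentType: string): string | undefined {
  // Drop parameters such as "; charset=binary" before the lookup.
  const mime = (contentType.split(';')[0] ?? '').trim().toLowerCase();
  return EXT_BY_MIME[mime];
}

// <caption-slug>-<MediaID>.<ext>, or image-<MediaID>.<ext> without a caption.
function attachmentFilename(caption: string, mediaId: number, ext: string): string {
  const slug = slugify(caption).slice(0, 80);
  return slug ? `${slug}-${mediaId}.${ext}` : `image-${mediaId}.${ext}`;
}
```

Because the serve route infers the MIME type from the extension, an unknown Content-Type is probably better treated as a failed asset than given a guessed extension. That policy is an assumption here, not something the PR states.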

Runtime thumbnail resizing (so a 200×200 card doesn't pull a full original) is deferred to #108.

How it works

  1. Translator emits placeholder URLs (`cfp-media:<mediaId>`) in the markdown body for `Item\Media` items and rewrites legacy media URLs inside `Item\Embed` HTML. It returns the record alongside a media-asset plan.
  2. Importer's pre-fetch phase fetches every distinct `(mediaId, owner)` pair in parallel (concurrency=4) from `https://<host>/media/<mediaId>/original`.
  3. Per response: Content-Type → extension → final filename → final URL. Body placeholders get substituted.
  4. Transact callback: `BlobObject.write` each artifact, `tx['blog-posts'].setAttachments(record, blobs)`, then upsert the record.

Failed media fetches log a warning and drop just the asset — the post still imports with the rest of its body.
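
The pre-fetch behaviour (dedupe, at most four requests in flight, warn and drop on failure) could be sketched as below. This is an illustration under assumptions rather than the importer's real code: `prefetchMedia`, the `Fetcher` type, and the `FetchedMedia` shape are hypothetical, and the fetch function is injected so the sketch does not depend on a particular HTTP client.

```typescript
interface MediaAsset {
  mediaId: number;
  ownerSlug: string;
  sourceUrl: string;
}

interface FetchedMedia {
  asset: MediaAsset;
  contentType: string;
  bytes: Uint8Array;
}

// Hypothetical fetch abstraction; a real importer would wrap its HTTP client.
type Fetcher = (url: string) => Promise<{ contentType: string; bytes: Uint8Array }>;

async function prefetchMedia(
  assets: MediaAsset[],
  fetcher: Fetcher,
  concurrency = 4,
  warn: (msg: string) => void = (msg) => console.warn(msg),
): Promise<FetchedMedia[]> {
  // Keep one entry per distinct (mediaId, owner) pair.
  const unique = new Map<string, MediaAsset>();
  for (const a of assets) unique.set(`${a.ownerSlug}/${a.mediaId}`, a);
  const queue = Array.from(unique.values());
  const results: FetchedMedia[] = [];

  // Each worker pulls from the shared queue until it is empty, so at most
  // `concurrency` fetches are in flight at once.
  async function worker(): Promise<void> {
    for (;;) {
      const asset = queue.shift();
      if (!asset) return;
      try {
        const { contentType, bytes } = await fetcher(asset.sourceUrl);
        results.push({ asset, contentType, bytes });
      } catch (err) {
        // A failed fetch drops only this asset; the post still imports.
        warn(`media ${asset.mediaId} for ${asset.ownerSlug} failed: ${String(err)}`);
      }
    }
  }

  const workers = Array.from({ length: Math.min(concurrency, queue.length) }, () => worker());
  await Promise.all(workers);
  return results;
}
```

Results arrive in completion order, not input order, so a caller would key them by `(mediaId, ownerSlug)` rather than rely on position.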

Test plan

  • 11 new translator cases: placeholder emission, caption-slug fallback, Embed URL scan, third-party URL pass-through, Order sorting, unknown class warning, empty items
  • 340 API + all web + shared tests pass
  • `npm run type-check && npm run lint` clean

Ship plan

After merge:

  1. Rebuild + push sandbox image
  2. Restart deployment
  3. Re-run laddr importer → `legacy-import` carries fresh content + ~215 attachments
  4. Merge `legacy-import` → `published` → hot-reload picks up

🤖 Generated with Claude Code

themightychris and others added 3 commits May 30, 2026 13:34
Capture each referenced blog media item's bytes at import time, store
as a gitsheets attachment scoped to the owning blog-posts record,
rewrite the body's media URLs to /api/attachments/:key. Filename
format: <caption-slug>-<MediaID>.<ext> or image-<MediaID>.<ext> when
caption is empty. Runtime resizing deferred to #108.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Before this PR, blog post bodies referenced media via
`https://codeforphilly.org/thumbnail/<id>/<dim>`, so every image would
break at cutover when laddr is decommissioned. This PR captures the
original bytes at import time as attachments scoped to the owning blog
post and rewrites body references to
/api/attachments/blog-posts/<slug>/<filename>.

Filename derivation:

  caption non-empty: slugify(caption).slice(0, 80) + '-' + mediaId
                     + '.' + ext           (caption-derived, readable)
  caption empty:     'image-' + mediaId + '.' + ext

Ext comes from response Content-Type (image/jpeg → .jpg, image/png →
.png, etc.). The /api/attachments/:key route already infers
Content-Type from the file extension at serve time, so the extension
is load-bearing.

Translator changes:

  - translateBlogPost now returns { record, mediaAssets } rather than
    just BlogPost. mediaAssets describes the planned attachments
    (mediaId, captionSlug, ownerSlug, sourceUrl).
  - Markdown body emits placeholder URLs (`cfp-media:<mediaId>`) for
    Media items; importer's pre-fetch phase substitutes them with
    final URLs once the file extension is known.
  - Embed HTML gets regex-scanned for codeforphilly.org/(thumbnail|
    media)/<id>/... URLs — those become placeholders too. Third-party
    URLs (YouTube iframes, external sites) pass through untouched.
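
For illustration, the Embed scan might look roughly like the sketch below. The exact pattern is an assumption (optional `www.`, numeric ids, and a URL tail that ends at whitespace or a quote), and `rewriteEmbedHtml` and `LEGACY_MEDIA_URL` are hypothetical names, not the translator's real ones.

```typescript
// Assumed pattern for legacy media URLs: thumbnail or media, a numeric id,
// then an optional tail (dimensions, "original", ...) up to a quote or space.
const LEGACY_MEDIA_URL =
  /https?:\/\/(?:www\.)?codeforphilly\.org\/(?:thumbnail|media)\/(\d+)(?:\/[^\s"'<>)]*)?/g;

function rewriteEmbedHtml(html: string): { html: string; mediaIds: number[] } {
  const mediaIds: number[] = [];
  const rewritten = html.replace(LEGACY_MEDIA_URL, (_match: string, id: string) => {
    mediaIds.push(Number(id));
    // Placeholder is swapped for the final URL once the extension is known.
    return `cfp-media:${id}`;
  });
  return { html: rewritten, mediaIds };
}

// Third-party URLs do not match the pattern and pass through untouched.
const sample =
  '<img src="https://codeforphilly.org/thumbnail/3349/200x200">' +
  '<iframe src="https://www.youtube.com/embed/abc"></iframe>';
const rewrittenSample = rewriteEmbedHtml(sample);
```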

Importer changes:

  - After translation, fetches every distinct (mediaId, owner) pair
    in parallel with concurrency=4. Failed fetches log a warning and
    drop the asset; the post still imports with the rest of its body.
  - Inside the transact callback, for each post: writes each
    artifact's bytes into the git object DB via BlobObject.write,
    calls tx['blog-posts'].setAttachments(record, blobs), then upserts
    the record (with placeholder URLs already substituted).
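
A minimal sketch of the placeholder substitution using split/join, the approach the notes in the last commit mention. `substitutePlaceholders` is a hypothetical name, and replacing longer ids first is a precaution added in this sketch, not something the PR describes.

```typescript
function substitutePlaceholders(body: string, finalUrls: Map<number, string>): string {
  // Longest ids first, so `cfp-media:31` cannot clobber part of
  // `cfp-media:3127` (a precaution assumed here, not taken from the PR).
  const ids = Array.from(finalUrls.keys()).sort(
    (a, b) => String(b).length - String(a).length,
  );
  let out = body;
  for (const id of ids) {
    const url = finalUrls.get(id);
    if (url === undefined) continue;
    // split/join replaces every occurrence without regex escaping concerns.
    out = out.split(`cfp-media:${id}`).join(url);
  }
  return out;
}

const substituted = substitutePlaceholders(
  '![a](cfp-media:31) ![b](cfp-media:3127)',
  new Map<number, string>([
    [31, '/api/attachments/blog-posts/p/image-31.png'],
    [3127, '/api/attachments/blog-posts/p/image-3127.jpg'],
  ]),
);
```

Placeholders for assets whose fetch failed have no entry in the map and are left as-is by this sketch. How the importer actually treats them is not stated in this PR.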

Runtime thumbnail resizing is deferred to #108 — this PR stores
originals only; downsizing on demand is the SPA's problem today.

Tests: 11 translator cases including placeholder emission, caption
fallback, interleaved item classes, Order sorting, unknown class
warning, Embed URL scan + third-party pass-through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

All 6 validation checkboxes ticked. The notes cover the translator
return-shape refactor ripple, the doubled payload from include=*,
the cross-post MediaID dedup via git object DB hashing, and the
split-join substitution choice.

Follow-ups: runtime thumbnail service (#108), featuredImageKey
wiring (deferred), lazy body loading (#45).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@themightychris themightychris merged commit 8c888c4 into main May 30, 2026
1 check passed
@themightychris themightychris deleted the feat/blog-media-attachments branch May 30, 2026 18:03