feat(importer): store blog media as gitsheets attachments by themightychris · Pull Request #109 · CodeForPhilly/codeforphilly-ng

themightychris · 2026-05-30T18:00:03Z

Summary

Before this PR, blog post bodies referenced media via `https://codeforphilly.org/thumbnail/<id>/<dim>`, so every image would break at cutover once laddr is decommissioned. This PR captures the original bytes at import time as attachments scoped to the owning blog post and rewrites body references to `/api/attachments/blog-posts/<slug>/<filename>`.

Filenames are human-readable when captions are present:

- `2023-launchpad-kick-off-event-at-city-hall-3349.jpg`
- `image-3127.jpg` (when the caption is empty)

The extension comes from the response Content-Type, so the ext→MIME inference in `/api/attachments/:key` serves the right Content-Type header.
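A minimal sketch of that naming rule, assuming the slug is capped at 80 characters as the commit message describes. The helper names (`extensionForContentType`, `slugifyCaption`, `attachmentFilename`) and the MIME table are hypothetical, not the importer's actual code. It also falls back to `image-<mediaId>` when a non-empty caption slugifies to nothing (for example `"!!!"`), which is an assumption layered on top of the PR.

```typescript
// Hypothetical Content-Type → extension table; unknown types map to undefined.
const EXTENSIONS: Record<string, string> = {
  "image/jpeg": "jpg",
  "image/png": "png",
  "image/gif": "gif",
  "image/webp": "webp",
  "image/svg+xml": "svg",
};

function extensionForContentType(contentType: string | null): string | undefined {
  // Drop parameters such as "; charset=binary" and normalize case.
  const mime = (contentType ?? "").split(";")[0].trim().toLowerCase();
  return EXTENSIONS[mime];
}

// Lowercase, strip accents, collapse non-alphanumeric runs to "-", trim dashes.
function slugifyCaption(caption: string): string {
  return caption
    .normalize("NFKD")
    .replace(/[\u0300-\u036f]/g, "")
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

function attachmentFilename(mediaId: number, caption: string, ext: string): string {
  // Cap at 80 chars, then drop any dash left dangling by the cut (assumption).
  const slug = slugifyCaption(caption).slice(0, 80).replace(/-+$/, "");
  return slug ? `${slug}-${mediaId}.${ext}` : `image-${mediaId}.${ext}`;
}
```

For example, `attachmentFilename(3127, "", "jpg")` yields `image-3127.jpg`, matching the empty-caption case above.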

Runtime thumbnail resizing (so a 200×200 card doesn't pull a full original) is deferred to #108.

How it works

1. The translator emits placeholder URLs (`cfp-media:<mediaId>`) in the markdown body for `Item\Media` items and rewrites legacy media URLs inside `Item\Embed` HTML. It returns the record alongside a media-asset plan.
2. The importer's pre-fetch phase fetches every distinct `(mediaId, owner)` pair in parallel (concurrency=4) from `https:///media//original`.
3. Per response: Content-Type → extension → final filename → final URL. Body placeholders are then substituted.
4. Transact callback: `BlobObject.write` each artifact, call `tx['blog-posts'].setAttachments(record, blobs)`, then upsert the record.
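The placeholder round trip described above can be sketched as follows. The `MediaItem` shape, the markdown image syntax, and both function names are assumptions for illustration. The PR notes mention a split/join substitution; this sketch swaps in a single `String.prototype.replace` pass with a `\d+` capture instead, because the greedy digit match always takes the whole ID, so `cfp-media:31` can never be substituted inside `cfp-media:3127`. Leaving unmatched placeholders untouched is also an assumption.

```typescript
// Hypothetical shape of a translated Item\Media entry.
interface MediaItem {
  mediaId: number;
  caption: string;
  order: number;
}

const PLACEHOLDER_PREFIX = "cfp-media:";

// Translator side: one markdown image per Media item, sorted by Order,
// pointing at a placeholder because the final extension isn't known yet.
function renderMediaItems(items: MediaItem[]): string {
  return items
    .slice()
    .sort((a, b) => a.order - b.order)
    .map((item) => {
      // Escape brackets so a caption can't close the alt text early.
      const alt = item.caption.replace(/[\[\]]/g, "\\$&");
      return `![${alt}](${PLACEHOLDER_PREFIX}${item.mediaId})`;
    })
    .join("\n\n");
}

// Importer side: swap each placeholder for its attachment URL. Placeholders
// whose fetch failed (no entry in finalUrls) are left as they are.
function substitutePlaceholders(body: string, finalUrls: Map<number, string>): string {
  return body.replace(/cfp-media:(\d+)/g, (placeholder: string, id: string) =>
    finalUrls.get(Number(id)) ?? placeholder,
  );
}
```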

Failed media fetches log a warning and drop just the asset — the post still imports with the rest of its body.
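The pre-fetch phase (distinct `(mediaId, owner)` pairs, at most four requests in flight, failures logged and dropped) could look roughly like this. `MediaAssetPlan`, `Fetcher`, and `prefetchMedia` are hypothetical names, not the importer's API. The fetcher is injected so the sketch stays testable; in practice it might wrap the global `fetch`, reading `res.headers.get("content-type")` and `res.arrayBuffer()`.

```typescript
// Hypothetical shapes; the PR's real plan type carries more fields.
interface MediaAssetPlan {
  mediaId: number;
  ownerSlug: string;
  sourceUrl: string;
}

interface FetchResult {
  ok: boolean;
  status: number;
  contentType: string | null;
  bytes: Uint8Array;
}

interface FetchedMedia {
  plan: MediaAssetPlan;
  contentType: string | null;
  bytes: Uint8Array;
}

type Fetcher = (url: string) => Promise<FetchResult>;

// Keep the first plan for each distinct (mediaId, ownerSlug) pair.
function distinctPlans(plans: MediaAssetPlan[]): MediaAssetPlan[] {
  const byKey = new Map<string, MediaAssetPlan>();
  for (const plan of plans) {
    const key = `${plan.mediaId}:${plan.ownerSlug}`;
    if (!byKey.has(key)) byKey.set(key, plan);
  }
  return Array.from(byKey.values());
}

// Fetch with at most `concurrency` requests in flight. A failed asset is
// logged and dropped; surviving results keep their input order.
async function prefetchMedia(
  plans: MediaAssetPlan[],
  fetcher: Fetcher,
  concurrency = 4,
): Promise<FetchedMedia[]> {
  const queue = distinctPlans(plans);
  const results: Array<FetchedMedia | undefined> = queue.map(() => undefined);
  let next = 0;

  // Each worker claims the next index synchronously (`next++`), so no two
  // workers ever take the same plan.
  async function worker(): Promise<void> {
    while (next < queue.length) {
      const index = next++;
      const plan = queue[index];
      try {
        const res = await fetcher(plan.sourceUrl);
        if (!res.ok) throw new Error(`HTTP ${res.status}`);
        results[index] = { plan, contentType: res.contentType, bytes: res.bytes };
      } catch (err) {
        console.warn(`skipping media ${plan.mediaId} for ${plan.ownerSlug}: ${String(err)}`);
      }
    }
  }

  const workerCount = Math.min(concurrency, queue.length);
  await Promise.all(Array.from({ length: workerCount }, () => worker()));
  return results.filter((r): r is FetchedMedia => r !== undefined);
}
```

A worker pool keeps exactly `concurrency` requests busy without batching, so one slow image doesn't stall the next three.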

Test plan

- 11 new translator cases: placeholder emission, caption-slug fallback, Embed URL scan, third-party URL pass-through, Order sorting, unknown class warning, empty items
- 340 API tests plus all web and shared tests pass
- `npm run type-check && npm run lint` clean

Ship plan

After merge:

1. Rebuild and push the sandbox image
2. Restart the deployment
3. Re-run the laddr importer → `legacy-import` carries fresh content + ~215 attachments
4. Merge `legacy-import` → `published` → hot-reload picks it up

🤖 Generated with Claude Code

Capture each referenced blog media item's bytes at import time, store them as a gitsheets attachment scoped to the owning blog-posts record, and rewrite the body's media URLs to `/api/attachments/:key`. Filename format: `<caption-slug>-<MediaID>.<ext>`, or `image-<MediaID>.<ext>` when the caption is empty. Runtime resizing deferred to #108.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Before this PR, blog post bodies referenced media via `https://codeforphilly.org/thumbnail/<id>/<dim>`, so every image would break at cutover once laddr is decommissioned. This PR captures the original bytes at import time as attachments scoped to the owning blog post and rewrites body references to `/api/attachments/blog-posts/<slug>/<filename>`.

Filename derivation:

- caption non-empty: `slugify(caption).slice(0, 80) + '-' + mediaId + '.' + ext` (caption-derived, readable)
- caption empty: `'image-' + mediaId + '.' + ext`

The extension comes from the response Content-Type (`image/jpeg` → `.jpg`, `image/png` → `.png`, etc.). The `/api/attachments/:key` route already infers Content-Type from the file extension at serve time, so the extension is load-bearing.

Translator changes:

- `translateBlogPost` now returns `{ record, mediaAssets }` rather than just `BlogPost`. `mediaAssets` describes the planned attachments (`mediaId`, `captionSlug`, `ownerSlug`, `sourceUrl`).
- The markdown body emits placeholder URLs (`cfp-media:<mediaId>`) for Media items; the importer's pre-fetch phase substitutes them with final URLs once the file extension is known.
- Embed HTML is regex-scanned for `codeforphilly.org/(thumbnail|media)/<id>/...` URLs, which become placeholders too. Third-party URLs (YouTube iframes, external sites) pass through untouched.

Importer changes:

- After translation, the importer fetches every distinct `(mediaId, owner)` pair in parallel with concurrency=4. Failed fetches log a warning and drop the asset; the post still imports with the rest of its body.
- Inside the transact callback, for each post it `BlobObject.write`s each artifact's bytes into the git object DB, calls `tx['blog-posts'].setAttachments(record, blobs)`, then upserts the record (with placeholder URLs already substituted).

Runtime thumbnail resizing is deferred to #108: this PR stores originals only, and downsizing on demand is the SPA's problem today.

Tests: 11 translator cases including placeholder emission, caption fallback, interleaved item classes, Order sorting, unknown class warning, Embed URL scan + third-party pass-through.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
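A sketch of the Embed HTML scan, assuming legacy URLs are absolute `http(s)://` links on `codeforphilly.org` (optionally `www.`) and that a URL ends at the next quote, whitespace, `<`, `>`, or `)`. The exact regex in the PR isn't shown, so this pattern and the `rewriteEmbedHtml` name are illustrative only; relative or other laddr URL shapes are not covered.

```typescript
// Absolute legacy laddr media URL: http(s), optional "www.", then
// /thumbnail/<id> or /media/<id>, optionally followed by a path such as
// "/300x200" or "/original". \b rejects IDs glued to letters ("3349abc").
const LEGACY_MEDIA_URL =
  /https?:\/\/(?:www\.)?codeforphilly\.org\/(?:thumbnail|media)\/(\d+)\b(?:\/[^\s"'<>)]*)?/g;

// Replace legacy media URLs with cfp-media:<id> placeholders and report
// which media IDs were found; every other URL is left untouched.
function rewriteEmbedHtml(html: string): { html: string; mediaIds: number[] } {
  const found = new Set<number>();
  const rewritten = html.replace(LEGACY_MEDIA_URL, (_url: string, id: string) => {
    found.add(Number(id));
    return `cfp-media:${id}`;
  });
  return { html: rewritten, mediaIds: Array.from(found) };
}
```

Because the host must follow the scheme directly, a YouTube iframe or a link to another site never matches and passes through byte-for-byte.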

All 6 validation checkboxes ticked. Notes cover the translator return-shape refactor ripple, the doubled payload from `include=*`, the cross-post MediaID dedup via git object DB hashing, and the split-join substitution choice. Follow-ups: runtime thumbnail service (#108), `featuredImageKey` wiring (deferred), lazy body loading (#45).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
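The cross-post dedup relies on git being content-addressed: when two posts reference the same MediaID, the importer writes identical bytes twice, and both writes resolve to a single object. As an illustration, assuming the repository uses git's default SHA-1 object format, a blob's ID is the SHA-1 of the header `blob <size>\0` followed by the content. `gitBlobId` below is a hypothetical helper, not part of gitsheets.

```typescript
import { createHash } from "node:crypto";

// Object ID git assigns to a blob in a SHA-1 repository:
// sha1("blob " + <byte length> + "\0" + <raw bytes>), hex-encoded.
function gitBlobId(bytes: Uint8Array): string {
  const header = Buffer.from(`blob ${bytes.byteLength}\0`, "utf8");
  return createHash("sha1").update(header).update(bytes).digest("hex");
}

// Identical bytes always map to the same ID, so writing a shared image
// once per owning post still leaves a single object in the repository.
const sharedImage = new Uint8Array([0x89, 0x50, 0x4e, 0x47]);
const fromPostA = gitBlobId(sharedImage);
const fromPostB = gitBlobId(Uint8Array.from(sharedImage));
console.log(fromPostA === fromPostB); // true
```

Only the tree entries differ per post (each attachment lives under its own blog-posts record); the blob they point to is shared.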

themightychris and others added 3 commits May 30, 2026 13:34

themightychris merged commit 8c888c4 into main May 30, 2026
1 check passed

themightychris deleted the feat/blog-media-attachments branch May 30, 2026 18:03

This was referenced May 30, 2026:

- fix(importer): catch inline markdown image URLs + alt laddr URL shapes #110 (Merged)
- Revive blog posts as a content-typed gitsheets sheet (revises the deferred decision) #45 (Open)

themightychris merged 3 commits into main from feat/blog-media-attachments
