fix(importer): assemble blog body from laddr's items array (closes #106) #107

Merged
themightychris merged 1 commit into main from fix/blog-body-import
May 30, 2026

Conversation

@themightychris
Member

Summary

The cutover-blog importer (PR #101) populated 138 blog posts but bodies came through empty. Root cause: laddr's `/blog?format=json` doesn't expose a `Body` field at all. The body lives in a typed `items` array on Emergence's `AbstractContent`, only surfaced when the request uses `?include=*`.
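A minimal sketch of the request the importer now has to make, assuming the standard `URL` API. The helper name `blogListUrl` is hypothetical and not taken from the importer; only the `/blog` path and the `format=json` and `include=*` parameters come from the description above.

```typescript
// Hypothetical helper: builds the laddr blog listing URL with items included.
function blogListUrl(baseUrl: string): string {
  const url = new URL('/blog', baseUrl);
  url.searchParams.set('format', 'json');
  // Without include=*, laddr omits the typed `items` array that holds the body.
  url.searchParams.set('include', '*');
  return url.toString();
}

console.log(blogListUrl('https://codeforphilly.org'));
```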

Three item classes appear in production:

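| Class | `Data` shape | Rendering |
| --- | --- | --- |
| `Item\Markdown` | string | append verbatim |
| `Item\Media` | `{ MediaID, Caption }` | `![Caption](https://codeforphilly.org/thumbnail/<id>/1920x1920)` |
| `Item\Embed` | string (raw HTML) | append as-is (CommonMark allows raw HTML blocks) |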

Items are sorted by `Order`, then joined with blank-line separators. `markdownlint`, which runs on gitsheets serialize, normalizes any drift.
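The assembly rule can be sketched roughly as follows. This is an illustration under stated assumptions, not the merged implementation: the names `RawBlogPostItem` fields beyond `Class`, `Order`, and `Data`, plus `renderItem`, `assembleBody`, and the `warn` callback, are hypothetical. The full class names and the thumbnail URL pattern are taken from the commit message.

```typescript
// Assumed item shape; only Class, Order, and Data are named in the PR.
interface RawBlogPostItem {
  Class: string;
  Order: number;
  Data: unknown;
}

// Thumbnail URL prefix as given in the commit message.
const MEDIA_BASE = 'https://codeforphilly.org/thumbnail';

// Render one item to a markdown fragment, or null if it should be skipped.
function renderItem(item: RawBlogPostItem, warn: (msg: string) => void): string | null {
  switch (item.Class) {
    case 'Emergence\\CMS\\Item\\Markdown': // raw markdown, appended verbatim
    case 'Emergence\\CMS\\Item\\Embed': // raw HTML, allowed by CommonMark
      return typeof item.Data === 'string' ? item.Data : null;
    case 'Emergence\\CMS\\Item\\Media': {
      const data = item.Data as { MediaID: number; Caption?: string | null };
      return `![${data.Caption ?? ''}](${MEDIA_BASE}/${data.MediaID}/1920x1920)`;
    }
    default:
      // Unknown classes are reported and skipped rather than failing the import.
      warn(`unknown item class: ${item.Class}`);
      return null;
  }
}

// Sort by Order, render each item, and join with blank-line separators.
// Absent or empty items yield an empty body.
function assembleBody(
  items: RawBlogPostItem[] | undefined,
  warn: (msg: string) => void = console.warn,
): string {
  return [...(items ?? [])]
    .sort((a, b) => a.Order - b.Order)
    .map((item) => renderItem(item, warn))
    .filter((part): part is string => part !== null)
    .join('\n\n');
}
```

Copying the array before sorting keeps the raw API payload untouched, which is convenient if the same object is logged or reused later in the import.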

Drops the unused `Body` field from `RawBlogPostSchema` (laddr never returned it). Adds `items` + a new `RawBlogPostItem` type.

Closes #106.

Test plan

  • 5 new translator cases: interleaved item classes (Markdown/Media/Embed in one body), Order-based sorting, absent items → empty body, unknown-class warning + skip, empty `items` array falls back gracefully
  • Orchestrator mock seeds `items` and expects `?include=*` query
  • 32 importer tests pass; full type-check + lint clean

Ship plan

After merge:

  1. Build + push new sandbox image (same `:sandbox` tag)
  2. `kubectl rollout restart deploy/codeforphilly` (image tag is mutable so a digest pull isn't automatic)
  3. Re-run `npm run -w apps/api script:import-laddr` against `codeforphilly.org` → fresh `legacy-import` commit with full bodies
  4. Merge `legacy-import` → `published` → hot-reload picks up the new content

🤖 Generated with Claude Code

Earlier import populated 138 blog posts with titles/authors/summaries
but empty bodies — laddr's /blog?format=json doesn't expose a Body
field at all. Bodies live in a typed `items` array on Emergence
AbstractContent and are only surfaced when the request uses
`?include=*`.

Three item classes appear in production:

  Emergence\CMS\Item\Markdown — Data is the raw markdown body
                                fragment. Append verbatim.
  Emergence\CMS\Item\Media    — Data is { MediaID, Caption }. Render
                                as ![Caption](https://codeforphilly.org/
                                thumbnail/<id>/1920x1920). Eventually
                                images should migrate into the data
                                repo as attachments — separate concern.
  Emergence\CMS\Item\Embed    — Data is raw HTML (iframes, etc.).
                                CommonMark allows raw HTML blocks,
                                so append as-is.

Items are sorted by Order before joining with blank-line separators.
markdownlint (run on gitsheets serialize) will normalize any drift.

Drops the unused `Body` field from RawBlogPostSchema (laddr never
returned it anyway). Adds `items` + a new RawBlogPostItem type.

Tests: 5 new translator cases (interleaved item classes, order
sorting, missing items → empty body, unknown class warning + skip,
empty items array). Updated orchestrator mock to seed items and to
expect the `?include=*` query string.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@themightychris merged commit 6ff4f58 into main May 30, 2026
1 check passed
@themightychris deleted the fix/blog-body-import branch May 30, 2026 17:06
Development

Successfully merging this pull request may close these issues.

importer: scrape blog-post bodies from laddr's HTML (JSON endpoint doesn't surface Body)
