fix(importer): catch inline markdown image URLs + alt laddr URL shapes #110

Merged: themightychris merged 1 commit into main from fix/blog-media-inline-urls on May 30, 2026


@themightychris

Summary

Sandbox smoke check after PR #109 found 25 `codeforphilly.org/...` media URLs still leaking into blog bodies. Two root causes:

  1. Inline markdown image syntax inside Markdown items: authors wrote `![alt](https://codeforphilly.org/thumbnail/...)` directly rather than using the structured Media item path. The Embed-only URL scan never reached them.
  2. Alternate URL shapes the regex didn't accept: `/media/<id>` (no trailing slash), `/media/open/<id>`, `/sitedata/<id>`.

Fix: broaden the regex to the four observed URL shapes, and apply it as a final pass over the assembled body (after items are joined). This catches the inline references in Markdown items that the per-item path can't reach. The function is renamed `rewriteEmbedHtml` → `rewriteLaddrMediaUrls` since it now applies beyond embeds.
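
As a rough illustration of what "the four observed URL shapes" could look like in a single pattern, here is a minimal sketch. The PR description does not show the real regex, so `MEDIA_URL_SKETCH` and `findLaddrMediaUrls` are hypothetical names. The sketch assumes that `<id>` is numeric, that the host is `codeforphilly.org`, and that the four shapes are `/thumbnail/`, `/media/`, `/media/open/` and `/sitedata/`.

```typescript
// Illustrative only: the PR's actual pattern is not shown in this description.
// Assumes <id> is numeric and the host is codeforphilly.org (optionally "www.").
// "media\/open" is listed before "media" so the longer namespace is tried first.
const MEDIA_URL_SKETCH =
  /https?:\/\/(?:www\.)?codeforphilly\.org\/(?:thumbnail|media\/open|media|sitedata)\/\d+[^\s)"'<>]*/g;

// Return every laddr-style media URL found in an assembled body.
function findLaddrMediaUrls(body: string): string[] {
  return body.match(MEDIA_URL_SKETCH) || [];
}

// A made-up assembled body: items joined with blank lines.
const body = [
  "![alt](https://codeforphilly.org/thumbnail/12/300x300)",
  '<img src="https://codeforphilly.org/media/34">',
  "[slides](https://codeforphilly.org/media/open/56)",
  "See https://codeforphilly.org/sitedata/78 for the logo.",
  "Unrelated: https://codeforphilly.org/projects/laddr",
].join("\n\n");

const found = findLaddrMediaUrls(body);
console.log(found);
```

Scanning the joined body rather than individual items means the inline markdown image on the first line is found even though it never passed through the Embed path. The non-media link on the last line is left alone.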

Test plan

  • 2 new translator cases: inline markdown image rewrite, alt URL shapes
  • 36 import-laddr tests pass; type-check + lint clean

🤖 Generated with Claude Code

After PR #109 landed, a sandbox smoke check found 25 codeforphilly.org
media URLs still leaking through to bodies. Two root causes:

  1. **Inline markdown image syntax inside Markdown items.** Authors
     wrote `![alt](https://codeforphilly.org/thumbnail/...)` directly
     in some posts rather than using the structured Media item path.
     The Embed-only URL scan never reached them.
  2. **Alternate URL shapes** my regex didn't accept:
       /media/<id>         (no trailing slash)
       /media/open/<id>    (legacy "open media" namespace)
       /sitedata/<id>      (older asset namespace)

Fix: rename `rewriteEmbedHtml` → `rewriteLaddrMediaUrls`, broaden the
regex to accept the four observed URL shapes, and apply it as a final
pass over the **assembled body** (after items have been joined). That
catches Markdown-item inline image references the per-item code path
can't reach.

Per-item Embed rewriting still happens — same regex, same dedup map —
because some Embeds reference media that doesn't appear elsewhere
(YouTube thumbs etc.) and we want to capture them too.
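
A minimal sketch of how the per-item Embed pass and the final assembled-body pass might share one dedup map. Everything here is hypothetical: the `Item` type, `rewriteUrls`, `assembleBody`, the `local-media/<n>` path scheme, and the regex itself (which assumes numeric ids) are illustrative stand-ins, not the importer's actual code.

```typescript
// Illustrative pattern; assumes numeric ids and the codeforphilly.org host.
const MEDIA_URL_SKETCH =
  /https?:\/\/(?:www\.)?codeforphilly\.org\/(?:thumbnail|media\/open|media|sitedata)\/\d+[^\s)"'<>]*/g;

// Hypothetical item shape; the real importer's types are not shown in this PR.
type Item = { kind: "markdown" | "embed"; content: string };

// Replace each media URL with a local path, reusing the path when a URL repeats.
function rewriteUrls(text: string, seen: Map<string, string>): string {
  return text.replace(MEDIA_URL_SKETCH, (url: string) => {
    let local = seen.get(url);
    if (local === undefined) {
      local = `local-media/${seen.size + 1}`; // hypothetical local path scheme
      seen.set(url, local);
    }
    return local;
  });
}

function assembleBody(items: Item[], seen: Map<string, string>): string {
  // Per-item pass: only Embed items are rewritten here.
  const parts = items.map((item) =>
    item.kind === "embed" ? rewriteUrls(item.content, seen) : item.content
  );
  // Final pass over the joined body catches inline references in Markdown items.
  return rewriteUrls(parts.join("\n\n"), seen);
}

const seen = new Map<string, string>();
const out = assembleBody(
  [
    { kind: "embed", content: '<img src="https://codeforphilly.org/media/34">' },
    {
      kind: "markdown",
      content:
        "![a](https://codeforphilly.org/media/34) and ![b](https://codeforphilly.org/thumbnail/12/300x300)",
    },
  ],
  seen
);
console.log(out);
```

Because both passes consult the same map, a URL first seen in an Embed and later repeated inline resolves to the same local path. Already-rewritten paths no longer match the pattern, so the final pass leaves them untouched.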

Two new tests cover the inline-markdown case and the alt URL shapes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@themightychris themightychris merged commit 2d2253d into main May 30, 2026
1 check passed
@themightychris themightychris deleted the fix/blog-media-inline-urls branch May 30, 2026 18:24