
fix(fetcher): harden against SSRF and oversized downloads (closes #201)#205

Merged: williamzujkowski merged 1 commit into main from fix/fetcher-hardening on Jun 23, 2026


@williamzujkowski (Collaborator)

Closes #201 (security review findings).

SSRF — restrict release-point links to the OLRC host

parseReleasePoints' link regex accepted an optional absolute-URL prefix:

/href="((?:https?:\/\/[^"/]+)?\/download\/releasepoints\/...

so an href pointing at any host (for example, on a compromised or MitM'd OLRC page) could become a uslmUrl that fetchXml then fetched verbatim. The only remaining gate was a valid ZIP signature, which an attacker can supply. Fix: drop the absolute-URL alternative so the parser matches only OLRC server-relative paths; every uslmUrl is therefore anchored to https://uscode.house.gov. The anti-ReDoS anchoring ([^"/]) is preserved.
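A minimal sketch of server-relative-only matching, assuming a simplified pattern: the real regex is truncated above, so OLRC_ORIGIN, LINK_RE, extractUslmUrls and the sample paths are hypothetical, not the fetcher's actual code. It shows why an absolute href can no longer produce an off-host URL: the capture must begin with /download/releasepoints/ immediately after href=", and resolving that path against the fixed OLRC origin cannot change the host.

```typescript
// Hypothetical names for illustration; not the fetcher's actual code.
const OLRC_ORIGIN = "https://uscode.house.gov";

// Only server-relative paths match: "/download/releasepoints/" must follow
// href=" directly, so absolute ("https://host/...") and protocol-relative
// ("//host/...") hrefs never match. [^"]+ followed by a literal quote is
// deterministic, so there is no catastrophic backtracking.
const LINK_RE = /href="(\/download\/releasepoints\/[^"]+)"/g;

function extractUslmUrls(html: string): string[] {
  const urls: string[] = [];
  for (const match of html.matchAll(LINK_RE)) {
    const path = match[1];
    // Keep only ZIP archives (checked outside the regex to keep it simple).
    if (path === undefined || !path.endsWith(".zip")) continue;
    // A path starting with a single "/" resolves onto OLRC_ORIGIN's host.
    const url = new URL(path, OLRC_ORIGIN);
    // Extra origin assertion (hypothetical; not described in the PR).
    if (url.origin !== OLRC_ORIGIN) continue;
    urls.push(url.href);
  }
  return urls;
}

// Illustrative page fragment: one foreign-host link, one OLRC-relative link.
const page = [
  '<a href="https://evil.example/download/releasepoints/us/pl/1/xml_usc18@1-1.zip">',
  '<a href="/download/releasepoints/us/pl/1/xml_usc18@1-1.zip">',
].join("\n");

// Logs only the OLRC-anchored URL; the evil.example link is never matched.
console.log(extractUslmUrls(page));
```

Checking the origin after resolution is redundant with the anchored pattern, but it keeps the host guarantee local to the URL-building step if the pattern is ever loosened again.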

Response-size cap — avoid runner OOM

fetchXml read the response body with no size limit, so an oversized or hostile response could exhaust the runner's memory. Fix: add MAX_DOWNLOAD_BYTES (300 MiB, generous compared with the ~100–150 MB all-titles archive) and exceedsContentLengthLimit(response). A Content-Length above the cap returns an error Result before the body is read, and a post-read buffer.length check acts as a backstop when the header is missing or unparseable. Both checks apply to the archive download and to the release-point page reads.
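A minimal sketch of the two-stage size check, assuming the WHATWG Fetch Response available in Node 18+. MAX_DOWNLOAD_BYTES and exceedsContentLengthLimit are named in the PR, but the bodies below are guesses; readCapped and the Result shape are hypothetical.

```typescript
// 300 MiB cap, as stated in the PR; the implementation below is a sketch.
const MAX_DOWNLOAD_BYTES = 300 * 1024 * 1024;

type Result<T> = { ok: true; value: T } | { ok: false; error: string };

// True only when a well-formed Content-Length header exceeds the limit.
// A missing or unparseable header returns false and defers to the
// post-read backstop in readCapped.
function exceedsContentLengthLimit(response: Response, limit = MAX_DOWNLOAD_BYTES): boolean {
  const header = response.headers.get("content-length");
  if (header === null || !/^\d+$/.test(header.trim())) return false;
  return Number(header.trim()) > limit;
}

// Hypothetical helper: reject up-front on the header, then re-check the
// actual byte count after buffering the body.
async function readCapped(response: Response, limit = MAX_DOWNLOAD_BYTES): Promise<Result<Uint8Array>> {
  if (exceedsContentLengthLimit(response, limit)) {
    return { ok: false, error: `Content-Length exceeds ${limit} bytes` };
  }
  const buffer = new Uint8Array(await response.arrayBuffer());
  if (buffer.length > limit) {
    return { ok: false, error: `body of ${buffer.length} bytes exceeds ${limit} bytes` };
  }
  return { ok: true, value: buffer };
}
```

The post-read check runs only after arrayBuffer() has buffered the whole body, so it bounds what is kept rather than what is read; aborting a streamed read once the running byte count passes the cap would bound memory more tightly.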

Tests

  • foreign-host href="https://evil.example/.../xml_usc18@1-1.zip" never yields a uslmUrl containing evil.example (always OLRC-anchored).
  • a mocked response with Content-Length over the cap yields an error Result.

Verification: pnpm --filter @civic-source/fetcher test → 127 passed; pnpm build → 8/8.

🤖 Generated with Claude Code

Two defense-in-depth fixes from the security review (#201):

SSRF: parseReleasePoints' link regex accepted an optional absolute-URL
prefix `(?:https?://[^"/]+)?`, so an href to any host on a
compromised/MitM'd OLRC page could become a uslmUrl that fetchXml then
fetched verbatim. Drop the absolute-URL alternative — the parser now
matches only OLRC server-relative paths, so every uslmUrl is anchored to
https://uscode.house.gov. Anti-ReDoS anchoring preserved.

Response size cap: fetchXml read the body with no limit, risking runner
OOM on a hostile/oversized response. Add MAX_DOWNLOAD_BYTES (300 MiB —
generous vs the ~100-150 MB all-titles archive) and a Content-Length
pre-check (exceedsContentLengthLimit) plus a post-read length backstop,
applied to the archive download and the release-point page reads.

New tests: foreign-host href never yields an off-host uslmUrl; a
Content-Length over the cap yields an error Result. fetcher: 127 tests
pass; monorepo builds.

Closes #201

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@williamzujkowski requested a review from a team as a code owner on June 23, 2026 03:52
@williamzujkowski merged commit f4fbd50 into main on Jun 23, 2026
3 checks passed


Development (issues closed by merging this pull request):

security: harden OlrcFetcher — restrict fetch host (SSRF) and cap response size

1 participant