fix(fetcher): harden against SSRF and oversized downloads (closes #201)#205
Merged
Conversation
Two defense-in-depth fixes from the security review (#201): SSRF: parseReleasePoints' link regex accepted an optional absolute-URL prefix `(?:https?://[^"/]+)?`, so an href to any host on a compromised/MitM'd OLRC page could become a uslmUrl that fetchXml then fetched verbatim. Drop the absolute-URL alternative — the parser now matches only OLRC server-relative paths, so every uslmUrl is anchored to https://uscode.house.gov. Anti-ReDoS anchoring preserved. Response size cap: fetchXml read the body with no limit, risking runner OOM on a hostile/oversized response. Add MAX_DOWNLOAD_BYTES (300 MiB — generous vs the ~100-150 MB all-titles archive) and a Content-Length pre-check (exceedsContentLengthLimit) plus a post-read length backstop, applied to the archive download and the release-point page reads. New tests: foreign-host href never yields an off-host uslmUrl; a Content-Length over the cap yields an error Result. fetcher: 127 tests pass; monorepo builds. Closes #201 Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #201 (security review findings).
SSRF — restrict release-point links to the OLRC host
parseReleasePoints' link regex accepted an optional absolute-URL prefix:so an
hrefto any host (on a compromised/MitM'd OLRC page) could become auslmUrlthatfetchXmlthen fetched verbatim — the only gate being a valid ZIP signature (attacker-suppliable). Fix: drop the absolute-URL alternative; the parser now matches only OLRC server-relative paths, so everyuslmUrlis anchored tohttps://uscode.house.gov. The anti-ReDoS anchoring ([^"/]) is preserved.Response-size cap — avoid runner OOM
fetchXmlread the body with no limit. Fix: addMAX_DOWNLOAD_BYTES(300 MiB — generous vs the ~100–150 MB all-titles archive) andexceedsContentLengthLimit(response); aContent-Lengthover the cap returns an error Result up-front, with a post-readbuffer.lengthbackstop for missing/unparseable headers. Applied to the archive download and the release-point page reads.Tests
href="https://evil.example/.../xml_usc18@1-1.zip"never yields auslmUrlcontainingevil.example(always OLRC-anchored).Content-Lengthover the cap yields an error Result.Verification:
pnpm --filter @civic-source/fetcher test→ 127 passed;pnpm build→ 8/8.🤖 Generated with Claude Code