Widen the savename self-test and cover Content-Disposition end to end #477

Open: xroche wants to merge 2 commits into master from phase1-savename-tests


xroche (Owner) commented Jul 3, 2026

Follow-up to #476 (audit P1-4). The savename type-probe is down to one implementation, but its self-test drove only one path: statuscode 200, status 0, default options. -#test=savename now takes key=value knobs plus repeatable prior=adr|fil|sav args, each of which registers an already-crawled link. 01_engine-savename.test uses them to pin what a refactor of the extension decision could break: the still-downloading mime variant (deliberately not folded into resolve_extension()), delayed naming on redirects, dedup and collision suffixes, the 8-3 modes, --strip-query dedup, the urlhack negatives, and hostile fil values (traversal, control characters, oversized names; all exercised under the ASan CI job).
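For readers unfamiliar with the knob syntax, here is a minimal Python sketch of how key=value arguments and repeatable prior=adr|fil|sav rows could be split apart. It is an illustration only: the real parsing lives in HTTrack's C self-test, and `SavenameCase` and `parse_case` are hypothetical names that do not appear in this PR.

```python
from dataclasses import dataclass, field


@dataclass
class SavenameCase:
    """Hypothetical container for one self-test row (not HTTrack's own type)."""
    knobs: dict = field(default_factory=dict)
    priors: list = field(default_factory=list)


def parse_case(args):
    """Split key=value knobs from repeatable prior=adr|fil|sav rows."""
    case = SavenameCase()
    for arg in args:
        # Split on the first '=' only, so values may themselves contain '='.
        key, sep, value = arg.partition("=")
        if not sep:
            raise ValueError(f"expected key=value, got {arg!r}")
        if key == "prior":
            parts = value.split("|")
            if len(parts) != 3:
                raise ValueError(f"prior needs adr|fil|sav, got {value!r}")
            # Each prior row models one already-crawled link.
            case.priors.append(tuple(parts))
        else:
            case.knobs[key] = value
    return case
```

Keeping prior rows in a list rather than a dict preserves their order and allows the same adr to be registered more than once, which is what a dedup or collision test needs.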

The #476 review also noted that no end-to-end test ever sends Content-Disposition. A new cdispo/ endpoint in local-server.py and 32_local-cdispo.test cover it: the attachment filename names the saved file, and a traversal filename is cut down to its last component inside the mirror.
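The expected behaviour can be modelled in a few lines of Python. This is a sketch under assumptions, not the code in local-server.py or in HTTrack: `cdispo_filename` and `last_component` are hypothetical helpers, and the header parse uses the standard library's `email.message.Message` rather than HTTrack's own parser.

```python
import posixpath
from email.message import Message


def cdispo_filename(header_value):
    """Extract the filename parameter from a Content-Disposition value."""
    msg = Message()
    msg["Content-Disposition"] = header_value
    # get_filename() returns None when no filename parameter is present.
    return msg.get_filename() or ""


def last_component(filename):
    """Reduce a possibly hostile filename to its final path component."""
    # Treat backslashes as separators too, then drop trailing slashes.
    normalized = filename.replace("\\", "/").rstrip("/")
    name = posixpath.basename(normalized)
    # A bare "." or ".." is not a usable file name.
    return "" if name in (".", "..") else name
```

With these helpers, a header such as `attachment; filename="../../evil.txt"` yields the name `evil.txt`, which is the shape of result the traversal case asserts on: the file stays inside the mirror.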

xroche and others added 2 commits July 3, 2026 09:00
… to end

-#test=savename grows key=value knobs (cdispo=, statuscode=, status=, adr=,
strip=, urlhack= and its negatives, n83=, type=) plus repeatable
prior=adr|fil|sav rows that register an already-crawled link, so the .test
can pin the still-downloading mime path, redirect delayed naming, dedup and
collision suffixing, 8-3 modes, --strip-query dedup and hostile fils
(traversal, control chars, oversized names) - the regression net for the
upcoming resolve_extension work.

A new cdispo/ endpoint in local-server.py and 32_local-cdispo.test give the
Content-Disposition branches their first end-to-end coverage, including a
traversal filename reduced to its last component.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>
…tives

The review found the e2e traversal row masked by two independent layers
(the wire parser and url_savename both strip path components), so a new
-#test=header self-test pins treathead's Content-Disposition parse alone.
Three negative rows keep dedup honest: a kept query key that differs, a
distinct URL under urlhack, and a same-basename-different-directory prior
must all produce a fresh name, not a false match. route_cdispo now reuses
send_raw via an extra_headers argument.
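
To make the dedup negatives concrete, here is a simplified Python model of the matching rule they guard. It assumes that query stripping drops a configured set of keys before comparison and that the directory is part of the identity; both are assumptions made for illustration, urlhack is not modelled, and `dedup_key` and `is_duplicate` are hypothetical names rather than HTTrack functions.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def dedup_key(url, strip_keys=frozenset()):
    """Build a comparison key: host, full path, and the kept query keys."""
    parts = urlsplit(url)
    kept = [
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in strip_keys  # stripped keys never affect identity
    ]
    return (parts.netloc.lower(), parts.path, urlencode(kept))


def is_duplicate(url, prior, strip_keys=frozenset()):
    """True only when both URLs collapse to the same dedup key."""
    return dedup_key(url, strip_keys) == dedup_key(prior, strip_keys)
```

Under this model, two URLs that differ only in a stripped key match, while a differing kept key or a different directory with the same basename must not match.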

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Signed-off-by: Xavier Roche <roche@httrack.com>