Fall back to .tar.gz sdists when no wheel is available by simonw · Pull Request #285 · simonw/tools

simonw · 2026-06-09T13:08:30Z

Both PyPI browsing tools (zip-wheel-explorer and pypi-changelog) previously
only worked with .whl files. Packages that ship only a source distribution
(e.g. apple-fm-sdk) failed with "no wheel found".

Add a fallback to the .tar.gz source distribution:

- Detect gzip via magic bytes and decompress with the native
  DecompressionStream, then parse the tar with a small parser that handles
  ustar prefixes, GNU long names and PAX extended headers.
- zip-wheel-explorer: prefer a wheel, else use the sdist; browse its files.
- pypi-changelog: include sdist-only versions and strip the leading
  "<name>-<version>/" directory so paths align across versions when diffing.

https://claude.ai/code/session_018HTvSZHvL1T3EEZ7P8vbVy
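The path-stripping step described for pypi-changelog could look roughly like the sketch below. It is an illustration only, not the code from this PR: `stripSdistRoot` is a hypothetical helper name, and it assumes every entry in the sdist sits under a single top-level directory.

```javascript
// Hypothetical helper: remove the shared top-level "<name>-<version>/"
// directory from sdist paths so the same file lines up across versions.
function stripSdistRoot(paths) {
  if (paths.length === 0) return [];
  const slash = paths[0].indexOf("/");
  if (slash === -1) return paths.slice(); // no directory component at all
  const root = paths[0].slice(0, slash + 1); // e.g. "pkg-1.0/"
  // Only strip when every entry lives under that one directory.
  if (!paths.every((p) => p.startsWith(root))) return paths.slice();
  // Drop the entry for the root directory itself, which becomes "".
  return paths.map((p) => p.slice(root.length)).filter((p) => p !== "");
}
```

With this in place, `pkg-1.0/setup.py` and `pkg-1.1/setup.py` (made-up names) both become `setup.py`, so a diff compares them as the same file.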

claude · 2026-06-09T13:08:50Z

Claude finished @simonw's task

Code Review Complete

I've conducted a thorough review of this PR that adds .tar.gz fallback support for PyPI tools. Here's my detailed analysis:

Todo List

Read and analyze pypi-changelog.html changes
Read and analyze zip-wheel-explorer.html changes
Review tar parsing implementation for correctness
Analyze gzip decompression approach
Check for security vulnerabilities
Assess performance considerations
Look for potential bugs and edge cases
Provide comprehensive review feedback

✅ Positive Aspects

1. Excellent Architecture & Design

Smart fallback strategy: Prioritizes wheels (binary distributions) but gracefully falls back to source distributions when needed
Clean separation of concerns: Tar parsing, gzip decompression, and file handling are well-separated
Consistent user experience: Both tools now handle the same distribution types seamlessly

2. Robust Tar Parser Implementation

Comprehensive format support: Handles ustar prefixes, GNU long names (L entries), and PAX extended headers (x/g entries)
Proper standard compliance: Follows tar format specifications correctly
Smart path normalization: The path stripping in pypi-changelog.html:414 ensures consistent diffing across versions
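For readers unfamiliar with the tar layout, here is a minimal sketch of a single-pass parser covering ustar prefixes and GNU long names. It is not the `parseTar()` from this PR: `parseTarSketch` and `field` are hypothetical names, PAX headers are skipped rather than applied, checksums are not verified, and base-256 size fields are not handled.

```javascript
const BLOCK = 512; // tar archives are a sequence of 512-byte blocks
const decoder = new TextDecoder();

// Read a NUL-terminated string field from a block.
function field(block, start, length) {
  const raw = block.subarray(start, start + length);
  const end = raw.indexOf(0);
  return decoder.decode(end === -1 ? raw : raw.subarray(0, end));
}

// Return [{ name, data }] for the regular files in an uncompressed tar.
function parseTarSketch(bytes) {
  const entries = [];
  let offset = 0;
  let longName = null; // set by a GNU "L" entry, applies to the next entry
  while (offset + BLOCK <= bytes.length) {
    const header = bytes.subarray(offset, offset + BLOCK);
    if (header.every((b) => b === 0)) break; // end-of-archive marker
    const size = parseInt(field(header, 124, 12).trim() || "0", 8); // octal
    const type = String.fromCharCode(header[156] || 48); // NUL means "0"
    const data = bytes.subarray(offset + BLOCK, offset + BLOCK + size);
    offset += BLOCK + Math.ceil(size / BLOCK) * BLOCK; // data is block-padded
    if (type === "L") {
      longName = field(data, 0, data.length);
      continue;
    }
    // PAX extended headers are skipped in this sketch; a fuller parser
    // would read their "path=" records and use them as the entry name.
    if (type === "x" || type === "g") continue;
    let name = field(header, 0, 100);
    // Only POSIX ustar ("ustar" + NUL) defines the prefix field at 345.
    const isPosixUstar = field(header, 257, 6) === "ustar";
    const prefix = isPosixUstar ? field(header, 345, 155) : "";
    if (prefix) name = prefix + "/" + name;
    if (longName !== null) {
      name = longName;
      longName = null;
    }
    if (type === "0") entries.push({ name, data });
  }
  return entries;
}
```

Returning `subarray` views keeps copying to a minimum, at the cost of every entry holding a reference to the full archive buffer.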

3. Modern Browser API Usage

Native gzip decompression: Uses DecompressionStream for efficient, secure decompression
Progressive enhancement: Graceful fallback with clear error messages for unsupported browsers
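As a rough sketch of the approach described here (magic-byte detection, native decompression, and a clear error for unsupported browsers), something along these lines would work. The function names `looksLikeGzip`, `gunzip` and `sdistToTarBytes` are hypothetical and are not taken from the PR.

```javascript
// True when the buffer starts with the gzip magic bytes 0x1f 0x8b.
function looksLikeGzip(bytes) {
  return bytes.length >= 2 && bytes[0] === 0x1f && bytes[1] === 0x8b;
}

// Decompress gzip data with the native DecompressionStream.
async function gunzip(bytes) {
  if (typeof DecompressionStream === "undefined") {
    throw new Error("This browser does not support DecompressionStream");
  }
  const stream = new Blob([bytes])
    .stream()
    .pipeThrough(new DecompressionStream("gzip"));
  return new Uint8Array(await new Response(stream).arrayBuffer());
}

// Hypothetical entry point: return raw tar bytes for a downloaded sdist,
// decompressing only when the data is actually gzip.
async function sdistToTarBytes(bytes) {
  return looksLikeGzip(bytes) ? gunzip(bytes) : bytes;
}
```

Checking the bytes rather than the file extension means a mislabelled or already-decompressed download is still handled sensibly.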

⚠️ Security & Safety Considerations

1. Path Traversal Protection

Current state: ✅ Already secure

No explicit path validation, but the tools only process files in-memory for display
No file system writes occur, so path traversal isn't exploitable in this context
Both tools display file paths as-is, which is appropriate for their use case

2. Resource Exhaustion Protection

Areas for improvement:

Memory bombs: Large tar files could consume excessive memory (see performance section)
Zip bombs: While JSZip has some protections, additional safeguards could help
Decompression bombs: Gzip streams with extreme compression ratios could exhaust memory

Recommendation: Consider adding file size limits during extraction
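One way to act on this recommendation is to cap the number of bytes read from the decompressed stream. The sketch below assumes a standard `ReadableStream` of `Uint8Array` chunks; `readWithLimit` is a hypothetical helper and the limit passed to it is up to the caller.

```javascript
// Read a ReadableStream fully, but abort once more than maxBytes arrive.
async function readWithLimit(stream, maxBytes) {
  const reader = stream.getReader();
  const chunks = [];
  let total = 0;
  for (;;) {
    const { done, value } = await reader.read();
    if (done) break;
    total += value.byteLength;
    if (total > maxBytes) {
      await reader.cancel(); // stop pulling further data
      throw new Error(`Decompressed data exceeds ${maxBytes} bytes`);
    }
    chunks.push(value);
  }
  // Concatenate the collected chunks into one buffer.
  const out = new Uint8Array(total);
  let offset = 0;
  for (const chunk of chunks) {
    out.set(chunk, offset);
    offset += chunk.byteLength;
  }
  return out;
}
```

It could be applied to the gzip path as `readWithLimit(blob.stream().pipeThrough(new DecompressionStream("gzip")), 50 * 1024 * 1024)`, where the 50 MB figure is an arbitrary illustrative value.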

3. Content Safety

Current state: ✅ Good protection

Proper HTML escaping with escapeHtml() function (pypi-changelog.html:207, zip-wheel-explorer.html:388)
Binary file detection prevents rendering non-text content
No eval() or dynamic code execution
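For context, the two protections named above are typically implemented along these lines. This is a generic sketch, not the `escapeHtml()` from either file: `escapeHtmlSketch` and `isProbablyBinary` are hypothetical names, and the NUL-byte heuristic and 8000-byte sample size are assumptions.

```javascript
// Escape the characters that are significant in HTML text and attributes.
// The ampersand must be replaced first so later entities are not re-escaped.
function escapeHtmlSketch(text) {
  return String(text)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}

// Heuristic: treat content as binary if a NUL byte appears in the sample.
function isProbablyBinary(bytes, sampleSize = 8000) {
  return bytes.subarray(0, sampleSize).includes(0);
}
```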

🚀 Performance Analysis

1. Strengths

Parallel processing: Both tools fetch multiple versions concurrently
Efficient parsing: Single-pass tar parser with minimal copying
Smart content handling: Binary files are detected and skipped for processing

2. Potential Issues

Large file handling: Files exceeding 512KB are marked as large but still fully loaded into memory
No streaming: Entire archives are loaded before processing begins
Memory usage: All file contents held in memory simultaneously

Suggestion: Consider implementing streaming extraction for very large packages

🐛 Potential Issues Found

1. Error Handling Gaps

In zip-wheel-explorer.html:410-413:

for (const [name, file] of Object.entries(zip.files)) {
    const content = await file.async('text');
    const rawBytes = await file.async('uint8array');
    // ...
}

Issue: If file.async('text') fails for a binary file, the error handling may not catch all edge cases.

Recommendation: Wrap individual file extraction in try-catch blocks
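A per-entry try/catch might be sketched as follows. It assumes a loaded JSZip instance (using its `files` map, the `dir` flag and `async("uint8array")`); `extractEntries` is a hypothetical name, and the result shape is invented for illustration.

```javascript
// Extract every non-directory entry from a loaded JSZip instance.
// A failure on one entry is recorded instead of aborting the whole loop.
async function extractEntries(zip) {
  const results = [];
  for (const [name, file] of Object.entries(zip.files)) {
    if (file.dir) continue; // skip directory placeholders
    try {
      // Read the bytes once; text can be decoded from them afterwards.
      const rawBytes = await file.async("uint8array");
      results.push({ name, rawBytes, error: null });
    } catch (err) {
      results.push({ name, rawBytes: null, error: String(err) });
    }
  }
  return results;
}
```

Reading only the raw bytes and decoding with `TextDecoder` when needed also avoids decompressing each entry twice.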

2. Version Comparison Robustness

pypi-changelog.html:276-297 has a custom version comparison that might not handle all PEP 440 edge cases perfectly.

Minor concern: Complex pre-release, post-release, and dev release versions might not sort correctly in all cases.
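To make the concern concrete, here is a simplified sketch of PEP 440 ordering for normalized version strings. It is not the comparison in pypi-changelog.html: `versionKey`, `compareLists` and `compareVersions` are hypothetical names, and local version labels ("+abc") and non-normalized spellings such as "1.0-alpha" are deliberately out of scope.

```javascript
// Simplified PEP 440 parser: epoch, release, pre, post and dev segments.
const VERSION_RE =
  /^(?:(\d+)!)?(\d+(?:\.\d+)*)(?:(a|b|rc)(\d+))?(?:\.post(\d+))?(?:\.dev(\d+))?$/;
const PRE_ORDER = { a: 0, b: 1, rc: 2 };

function versionKey(version) {
  const m = VERSION_RE.exec(version.trim().toLowerCase());
  if (!m) return null; // not a version this sketch understands
  const release = m[2].split(".").map(Number);
  // Trailing zeros do not affect ordering: 1.0 equals 1.0.0.
  while (release.length > 1 && release[release.length - 1] === 0) release.pop();
  const hasPre = m[3] !== undefined;
  const hasPost = m[5] !== undefined;
  const hasDev = m[6] !== undefined;
  let pre;
  if (hasPre) pre = [PRE_ORDER[m[3]], Number(m[4])];
  // A dev release with no pre/post segment sorts before any pre-release.
  else if (!hasPost && hasDev) pre = [-Infinity, 0];
  // A final or post release sorts after every pre-release.
  else pre = [Infinity, 0];
  return {
    epoch: Number(m[1] || 0),
    release,
    pre,
    post: hasPost ? Number(m[5]) : -Infinity,
    dev: hasDev ? Number(m[6]) : Infinity,
  };
}

// Compare two numeric lists element by element, padding with zeros.
function compareLists(a, b) {
  const n = Math.max(a.length, b.length);
  for (let i = 0; i < n; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    if (x !== y) return x < y ? -1 : 1;
  }
  return 0;
}

function compareVersions(a, b) {
  const ka = versionKey(a);
  const kb = versionKey(b);
  if (!ka || !kb) return a < b ? -1 : a > b ? 1 : 0; // string-order fallback
  return (
    compareLists([ka.epoch], [kb.epoch]) ||
    compareLists(ka.release, kb.release) ||
    compareLists(ka.pre, kb.pre) ||
    compareLists([ka.post], [kb.post]) ||
    compareLists([ka.dev], [kb.dev])
  );
}
```

The cases most often mis-sorted by ad hoc comparisons are dev releases (`1.0.dev1` precedes `1.0a1`) and numeric segments (`1.10` follows `1.2`).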

3. Incomplete Gzip Header Validation

Both tools detect gzip via magic bytes (0x1f 0x8b), but don't validate the full gzip header structure.

Low priority: Current approach is sufficient for the use case, but full header validation would be more robust.
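If fuller validation were wanted, the fixed 10-byte header defined in RFC 1952 could be checked as in this sketch. `validateGzipHeader` is a hypothetical helper, and it inspects only the magic bytes, compression method and reserved flag bits.

```javascript
// Check the fixed 10-byte gzip header described in RFC 1952.
function validateGzipHeader(bytes) {
  if (bytes.length < 10) {
    return { ok: false, reason: "shorter than the 10-byte header" };
  }
  if (bytes[0] !== 0x1f || bytes[1] !== 0x8b) {
    return { ok: false, reason: "bad magic bytes" };
  }
  if (bytes[2] !== 8) {
    return { ok: false, reason: "compression method is not deflate" };
  }
  // Bits 5 to 7 of the FLG byte are reserved and must be zero.
  if ((bytes[3] & 0xe0) !== 0) {
    return { ok: false, reason: "reserved flag bits are set" };
  }
  return { ok: true, reason: null };
}
```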

📝 Code Quality Assessment

1. Excellent Practices

Consistent code style across both files
Clear function naming and logical organization
Proper async/await usage throughout
Good separation of UI and data logic

2. Minor Suggestions

Constants: Magic numbers like 512 (tar block size) could be named constants
Documentation: Complex functions like parseTar() could benefit from JSDoc comments
Input validation: URL inputs could be validated more strictly

🧪 Test Coverage Concerns

Current state: No automated tests visible in these HTML files

Recommendations:

Unit tests for the tar parser with various tar file formats
Tests for edge cases like corrupted archives, empty files, and unusual paths
Integration tests with real PyPI packages that only ship source distributions

🏆 Overall Assessment

This is a high-quality implementation that successfully addresses a real user need. The code is well-structured, follows security best practices, and handles the complex tar format correctly.

Risk Level: 🟢 Low Risk

No critical security vulnerabilities
Graceful error handling prevents crashes
Limited attack surface (client-side only)

Recommendation: ✅ Approve with Minor Suggestions

The implementation is production-ready. The suggestions above are mostly optimizations and defensive programming practices rather than blocking issues.

Key Strengths:

Solves a real problem elegantly
Maintains backward compatibility
Adds significant value to both tools
Code quality is consistent with the rest of the codebase

simonw · 2026-06-09T18:32:44Z

apple-fm-sdk works in both tools now.

simonw merged commit d64b067 into main Jun 9, 2026
2 of 4 checks passed

simonw merged 1 commit into main from claude/pypi-tools-targz-fallback-63x2oj
