
Fix MAS Data-Not-Collected misclassification (fixes #1, pending live verification)#5

Draft
adamXbot wants to merge 1 commit into main from fix/mas-data-not-collected


@adamXbot

Collaborator

Fixes #1.

The bug

A Mac App Store app that declared "Data Not Collected" was rendered as "developer has not provided any details" — the opposite state. Apps with genuinely-populated labels rendered correctly, so the regression was isolated to the sparse not-collected payload.

Root cause

In AppStorePrivacyLabelFetcher.parse(html:), a positive .provided result was returned only when one of three rigid structured-JSON paths in privacyTypeItems(in:) matched. A sparse "Data Not Collected" payload that those paths missed fell through to a whole-page regex (hasNoDetailsCopy) matching the phrase "No Details Provided". Because that phrase also appears in App Store page chrome, the result was a false .noDetailsProvided. There was no positive path for the "Data Not Collected" declaration at all.
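
To make the failure mode concrete, here is a minimal sketch of how a whole-page phrase match can fire on a page that actually declares "Data Not Collected". The function below is a hypothetical stand-in (the real hasNoDetailsCopy may use a different pattern), and the HTML is invented for illustration, not captured from Apple.

```swift
import Foundation

// Hypothetical stand-in for the whole-page check; the real
// hasNoDetailsCopy may use a different pattern.
func hasNoDetailsCopySketch(_ html: String) -> Bool {
    html.range(of: "No Details Provided",
               options: [.regularExpression, .caseInsensitive]) != nil
}

// Invented page: the app declares "Data Not Collected", while unrelated
// page chrome happens to mention "No Details Provided".
let page = """
<section><h3>Data Not Collected</h3>
<p>The developer does not collect any data from this app.</p></section>
<footer>Apps marked No Details Provided have not yet submitted labels.</footer>
"""

// The match is on the footer chrome, not on the app's own declaration.
print(hasNoDetailsCopySketch(page)) // true
```

With no positive check ahead of it, a match like this is enough to report the wrong state.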

The fix (additive — the working happy path can't regress)

New ordering in parse(html:):

structured items → deep JSON sweep → positive "data not collected" text → hasNoDetailsCopy → parseFailure
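
The ordering can be sketched as a single decision function. This is an illustration only: ParseOutcome and resolve are hypothetical names, and each check is passed in as an already-computed value so that the precedence is the only logic on display. The real parse(html:) returns labels or throws rather than returning an enum.

```swift
import Foundation

// Hypothetical outcome type, used here only to show precedence.
enum ParseOutcome: Equatable {
    case provided(notCollected: Bool)
    case noDetailsProvided
    case parseFailure
}

// Each argument is the result of one check from the ordering above.
func resolve(structuredIDs: [String]?,
             deepIDs: [String]?,
             hasDataNotCollectedCopy: Bool,
             hasNoDetailsCopy: Bool) -> ParseOutcome {
    // Structured paths first, then the deep JSON sweep as a fallback.
    if let ids = structuredIDs ?? deepIDs, !ids.isEmpty {
        return .provided(notCollected: ids == ["DATA_NOT_COLLECTED"])
    }
    // Positive text is consulted before the negative text.
    if hasDataNotCollectedCopy { return .provided(notCollected: true) }
    // Only now is the "No Details Provided" phrase trusted.
    if hasNoDetailsCopy { return .noDetailsProvided }
    return .parseFailure
}

// The reported failure: no JSON match, and both phrases present in the page.
let outcome = resolve(structuredIDs: nil, deepIDs: nil,
                      hasDataNotCollectedCopy: true, hasNoDetailsCopy: true)
print(outcome == .provided(notCollected: true)) // true
```

Because every new step sits after the structured check and before hasNoDetailsCopy, inputs that already matched the structured paths take the same branch as before.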

  1. privacyTypeItemsDeep(in:) — a graph-wide fallback that recursively walks the decoded JSON for any object whose identifier is one of the four canonical privacy-type ids (sourced from PrivacyLabels.TypeIdentifier.allCases) and that carries an item-shaped key (title/detail/categories/purposes), so a bare enum/schema listing can't be mistaken for a real declaration. Deduped by identifier. The structured check becomes privacyTypeItems(in:) ?? privacyTypeItemsDeep(in:).

  2. hasDataNotCollectedCopy(html:) — a positive text fallback, checked before hasNoDetailsCopy, matching the specific phrase "does not collect any data from this app" — specific enough to avoid colliding with page chrome, unlike the terse "Data Not Collected" heading.
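
The deep sweep in step 1 can be sketched roughly as below. The function names, the use of JSONSerialization, and the sample JSON are assumptions made for illustration; the actual privacyTypeItemsDeep(in:) may decode and represent items differently. The set of accepted ids is passed in as a parameter, where the PR sources it from PrivacyLabels.TypeIdentifier.allCases.

```swift
import Foundation

// Keys that mark an object as a real item rather than a bare enum listing.
let itemShapedKeys: Set<String> = ["title", "detail", "categories", "purposes"]

// Recursively walk decoded JSON and collect objects whose `identifier` is in
// `ids` and that carry at least one item-shaped key, deduped by identifier.
func sweep(_ node: Any, ids: Set<String>,
           seen: inout Set<String>, found: inout [[String: Any]]) {
    if let object = node as? [String: Any] {
        if let id = object["identifier"] as? String,
           ids.contains(id),
           !itemShapedKeys.isDisjoint(with: object.keys),
           seen.insert(id).inserted {
            found.append(object)
        }
        for value in object.values {
            sweep(value, ids: ids, seen: &seen, found: &found)
        }
    } else if let array = node as? [Any] {
        for value in array {
            sweep(value, ids: ids, seen: &seen, found: &found)
        }
    }
}

// Hypothetical wrapper: nil when nothing item-shaped was found.
func privacyTypeItemsDeepSketch(in json: Any, ids: Set<String>) -> [[String: Any]]? {
    var seen = Set<String>(), found = [[String: Any]]()
    sweep(json, ids: ids, seen: &seen, found: &found)
    return found.isEmpty ? nil : found
}

// Invented payload: a bare schema listing plus one real item nested
// under a shelf that the structured paths would not recognise.
let json = """
{"schema": [{"identifier": "DATA_NOT_COLLECTED"}],
 "shelves": {"unknownShelf": {"items": [
   {"identifier": "DATA_NOT_COLLECTED", "title": "Data Not Collected"}]}}}
"""
let decoded = try! JSONSerialization.jsonObject(with: Data(json.utf8))
let items = privacyTypeItemsDeepSketch(in: decoded, ids: ["DATA_NOT_COLLECTED"])
print(items?.count ?? 0) // 1
```

The schema entry is skipped because it has no item-shaped key, which is the guard that keeps a bare enum listing from being read as a declaration.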

No UI change is needed — the Dashboard already renders a lone DATA_NOT_COLLECTED label correctly once the fetcher returns it.

Tests

New Tests/privacycommandCoreTests/AppStorePrivacyLabelFetcherTests.swift drives the pure parse(html:) against canned HTML, covering six cases:

  • structured labels → .provided
  • not-collected via deep JSON (item nested under an unrecognised shelf) → .provided + isExplicitlyNotCollected
  • not-collected via text fallback (empty JSON + disclaimer chrome) → .provided + isExplicitlyNotCollected
  • not-collected wins over boilerplate — chrome contains both the "does not collect any data" line and "No Details Provided" → still .provided + isExplicitlyNotCollected (this is the exact reported failure)
  • genuine no-details → throws .noDetailsProvided
  • nothing recognisable → throws .parseFailure

Verification

  • swift build
  • DEVELOPER_DIR=/Applications/Xcode.app/Contents/Developer swift test ✅ — all 48 tests pass (6 new), run from the privacycommand/ package dir (CLT lacks XCTest, so the full Xcode toolchain is required).

⚠️ Why this is a DRAFT — verification gap

These tests use synthetic HTML mirroring Apple's documented JSON shapes. The live 2026 App Store JSON shape for a real "Data Not Collected" app is NOT pinned by any fixture in this PR. The deep-JSON and text fallbacks are designed defensively, but they're validated against shapes I constructed, not against Apple's current bytes.

Before merging / closing #1, please capture a real page for a known not-collected MAS app and add a fixture test from the actual bytes:

curl -A "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_0) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15" \
  "<apps.apple.com app URL>" > page.html

Then feed those bytes through AppStorePrivacyLabelFetcher.parse(html:) in a new test asserting .provided + isExplicitlyNotCollected. That closes the loop between "matches the shapes I expect" and "matches what Apple actually ships."
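
A rough sketch of the plumbing for such a fixture check follows. Everything named here is hypothetical: ParsedLabelsSketch stands in for whatever parse(html:) really returns, and the parser is passed as a closure so the sketch stays self-contained. In the real test the closure would call AppStorePrivacyLabelFetcher.parse(html:) and the assertion would live in an XCTest case.

```swift
import Foundation

// Hypothetical stand-in for the parsed result; only the asserted flag is modelled.
struct ParsedLabelsSketch { let isExplicitlyNotCollected: Bool }

enum FixtureError: Error { case notUTF8 }

// Load captured bytes from disk and run them through a supplied parser.
func checkNotCollectedFixture(at url: URL,
                              parse: (String) throws -> ParsedLabelsSketch) throws -> Bool {
    let data = try Data(contentsOf: url)
    guard let html = String(data: data, encoding: .utf8) else {
        throw FixtureError.notUTF8
    }
    return try parse(html).isExplicitlyNotCollected
}

// Demonstration with an invented page and a toy parser (not Apple's bytes).
let fixtureURL = FileManager.default.temporaryDirectory
    .appendingPathComponent("page.html")
try! Data("<p>The developer does not collect any data from this app.</p>".utf8)
    .write(to: fixtureURL)
let fixtureOK = try! checkNotCollectedFixture(at: fixtureURL) { html in
    ParsedLabelsSketch(isExplicitlyNotCollected:
        html.contains("does not collect any data from this app"))
}
print(fixtureOK) // true
```

Reading the file as raw bytes and decoding explicitly keeps the test tied to what curl saved, rather than to a hand-edited string.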

🤖 Generated with Claude Code

A Mac App Store app that declared "Data Not Collected" was rendered as
"developer has not provided any details" — the opposite state.

Root cause: in AppStorePrivacyLabelFetcher.parse(html:), a positive
".provided" result was only returned when one of three rigid
structured-JSON paths in privacyTypeItems(in:) matched. A sparse
"Data Not Collected" payload those paths miss fell through to a
whole-page regex (hasNoDetailsCopy) that matches "No Details Provided"
— a phrase that also appears in App Store page chrome — yielding a
false .noDetailsProvided. There was no positive path for the
"Data Not Collected" declaration.

Fix (additive, so the working happy path can't regress). New ordering
in parse(html:): structured items -> deep JSON sweep -> positive
"data not collected" text -> hasNoDetailsCopy -> parseFailure.

  1. privacyTypeItemsDeep(in:): a graph-wide fallback that recursively
     walks the decoded JSON for any object whose `identifier` is one of
     the four canonical privacy-type ids AND that carries an item-shaped
     key (title/detail/categories/purposes), so a bare enum/schema
     listing can't be mistaken for a declaration. Deduped by identifier.

  2. hasDataNotCollectedCopy(html:): a positive text fallback, checked
     BEFORE hasNoDetailsCopy, matching the specific phrase "does not
     collect any data from this app" — specific enough to avoid
     colliding with page chrome, unlike the terse "Data Not Collected"
     heading.

No UI change needed; the Dashboard already renders a lone
DATA_NOT_COLLECTED label correctly once the fetcher returns it.

Tests: new AppStorePrivacyLabelFetcherTests covering structured labels,
deep-JSON not-collected, text-fallback not-collected, not-collected
winning over "No Details Provided" boilerplate (the exact reported
failure), genuine no-details, and unknown-layout parse failure.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
adamXbot force-pushed the fix/mas-data-not-collected branch from 13b014e to f974354 on June 14, 2026 05:34
Development

Successfully merging this pull request may close these issues.

[Bug]: Privacy labels from MAS don't show correctly
