Skip to content

feat(substitution): translateUniversalMacros helper + parser fix (helper 1/2: code)#2263

Open
BaiyuScope3 wants to merge 13 commits into
adcontextprotocol:mainfrom
BaiyuScope3:BaiyuScope3/universal-macro-substitution
Open

feat(substitution): translateUniversalMacros helper + parser fix (helper 1/2: code)#2263
BaiyuScope3 wants to merge 13 commits into
adcontextprotocol:mainfrom
BaiyuScope3:BaiyuScope3/universal-macro-substitution

Conversation

@BaiyuScope3

@BaiyuScope3 BaiyuScope3 commented Jun 19, 2026

Copy link
Copy Markdown
Collaborator

Seller-side universal-macro translation, plus the parser fix it surfaced.

translateUniversalMacros(pixel_url, mapping) (seller helper). Translates universal macros in a pixel URL's query-parameter values from a seller mapping. Two arms: value (literal, RFC-3986 percent-encoded via the shared encoder) and native (ad-server token, inserted raw). Parameters with an unmapped macro are dropped (reported in dropped_params / unmapped_macros); already-minted params pass through untouched; macros in key position are not translated. The result also reports suspect_native_valuesvalue entries shaped like a native token (%%…%%, {{…}}, ${…}, […]), a likely wrong-arm mapping. Docstring calls out the dropped-consent-macro privacy hazard.

Parser fix. SubstitutionObserver's residual-entity check no longer drops legitimate multi-parameter tracker URLs (the named-entity branch now requires a trailing ;, matching browser decoding) while preserving the scheme-smuggling defense. Security-reviewed; all hardening/SSRF tests pass.

undici 6.25.0 → 6.27.0 (within ^6.25.0) to clear the high-severity advisories flagged by the Security Audit lane.

Full lib suite green. Changeset included (minor).


Removed per review: the expect_universal_macro_substituted storyboard assertion and its runner wiring have been pulled from this PR. As @bokelley noted, it observes a build_creative preview, but {MEDIA_BUY_ID}/{PACKAGE_ID} resolve at serve time and aren't serialized on the wire, so it skip-dominates for conformant sellers. It's deferred to the forthcoming Live Integration (decisioning-roundtrip) check, where the alignment logic is reusable. This PR is now helper + parser fix only.

🤖 Generated with Claude Code

BaiyuScope3 and others added 10 commits June 19, 2026 14:58
Introduce StoryboardStep fields (source, source_path, macro_template,
macro_bindings) and the no_preview_surface detailed skip reason to
support the new pseudo-task. The handler reads rendered HTML from a
prior step's parsed response body via a JSON Pointer, runs
SubstitutionObserver.parse_html + match_bindings, and asserts each
configured macro binding was substituted with the correct context value.
Absence of a preview surface is a neutral skip (not_applicable), not a
conformance failure.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…nner dispatch

Import and dispatch the new handler after the webhook-assertion branch in
executeStep. Export UNIVERSAL_MACRO_ASSERTION_TASKS and
executeUniversalMacroAssertionStep from the storyboard index so callers
can identify and invoke the task independently.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…eness check

Add the new task name to the HARNESS_TASKS allowlist so the structural
completeness test does not flag it as a missing-schema task when it
appears in storyboard YAML files.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…-entity check

The residual-entity check used RESIDUAL_ENTITY_RE with an optional
trailing semicolon on the named-entity branch, so a literal query
separator like ?mb=1&pkg=2 (after &amp; decoding) matched on &pkg and
the whole URL was dropped as a smuggled entity. Browsers render &pkg
literally, so this was a false positive.

Require a trailing semicolon for the NAMED branch (browsers only decode
a fixed legacy no-semicolon set, none of which are the dangerous
URL-scheme chars : / \, which need a semicolon: &colon; &sol; &bsol;).
Keep the NUMERIC branch matching with an optional semicolon, since
browsers decode &adcontextprotocol#58 / &#x3a leniently and query keys never start with #.

Export __hasResidualEntity for direct unit testing.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
…ution assertion

Add two tests that exercise the canonical real-world case where both
{MEDIA_BUY_ID} and {PACKAGE_ID} appear in a single tracker URL with
&amp; HTML-encoding: a pass case (both macros correct) and a negative
case (pkg= wrong, mb= correct) verifying per-binding granularity.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… identifier assertion

- resolvePointer now guards against prototype chain traversal using
  POINTER_FORBIDDEN_KEYS and Object.prototype.hasOwnProperty.call, matching
  the discipline established in path.ts
- getLastPriorResult (renamed getPriorResult) now respects step.triggered_by
  when set, falling back to the last Map entry otherwise — mirrors the
  webhook-assertions.ts pattern
- Empty/absent macro_bindings now emits a failing ValidationResult explaining
  the misconfiguration instead of silently grading passed:false with zero validations
- __hasResidualEntity removed from the three package-level barrels
  (observer/index.ts, substitution/index.ts, src/lib/index.ts); tests that
  need it import via the deep path dist/lib/substitution/observer/html-parser.js

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Implements a query-parameter macro translator that accepts a
MacroMapping (native raw or RFC 3986 value) and returns the translated
URL with dropped_params/unmapped_macros metadata.  Reuses encodeUnreserved
from rfc3986.ts so produce and verify share one encoding path.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…esult shape

Reuse the module-level UNIVERSAL_MACRO constant in the replace() call instead
of inlining a duplicate literal; fix the misleading comment to accurately
describe String.replace single-pass semantics; expand TranslateResult doc
comments to note the dropped_params/unmapped_macros asymmetry; add tests for
macro tokens in key position (pass-through) and non-ASCII UTF-8 encoding.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes Jun 19, 2026

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving on the strength of the clean security verdict plus the test coverage. The encoding follow-up below is load-bearing — fix it before this assertion is ever pointed at a macro whose value isn't unreserved-safe.

The substitution invariant holds throughout: value mappings are percent-encoded, native tokens pass raw, observed values are compared verbatim. Witness, not translator.

Things I checked

  • RESIDUAL_ENTITY_RE relaxation is not a smuggling bypass. security-reviewer: clean. None of the 106 legacy no-semicolon named entities decode to a scheme-relevant char (:, /, \); the dangerous ones (&colon;, &sol;, &bsol;, &Tab;) all require the semicolon and still trip the named branch. The one borderline case — &nbsp → U+00A0 — is not stripped by the WHATWG URL parser, so it breaks scheme parsing rather than enabling javascript:. Numeric branch keeps ;? so &#58 still drops. html-parser.ts:291.
  • The regex change is internally consistent with decodeHtmlEntities. Line 299 still decodes known semicolon-less legacy entities from the limited table, so dangerous entities the SDK knows (colon, sol, Tab) decode to the real char and get caught by scheme classification — never reaching the residual check. The residual check only fires now on a complete-but-unknown &name;, which is the right shape to drop.
  • translate.ts is sound. value mappings go through the shared encodeUnreserved (RFC 3986 unreserved whitelist — &, =, :, braces all encoded, no param injection); native inserted raw by design for seller-supplied ad-server tokens. URLSearchParams correctly avoided so %%GDPR%% survives. Single-pass String.replace means a translated value containing {…} is not re-expanded. The shared global UNIVERSAL_MACRO regex is safe — only .match/.replace consume it, no .exec/.test to leave lastIndex dangling.
  • Strict-equality comparison verified against the alignment layer. match.observed_value for query positions is the raw pre-decode wire substring (alignment.ts:240-241, :312-313). For the macros this PR targets — {MEDIA_BUY_ID}, {PACKAGE_ID}, server-minted unreserved-safe IDs — raw equals encoded, so observed === expected is correct. The author's comment at the catalogBindings site documents exactly this assumption.
  • Changeset type. minor matches the surface: two new exports (universal_macro_translation + MacroMapping/TranslateResult), one new runner task, one new skip reason. Additive, no breaking change. Correct.
  • Prototype-pollution guard in resolvePointer (POINTER_FORBIDDEN_KEYS + own-property check) is the right shape for a JSON-pointer walk over an untrusted response body.

Follow-ups (non-blocking — file as issues)

  • Encoding assumption in the assertion is unenforced. universal-macro-assertions.ts builds catalogBindings with expected_encoded: expectedValue (the unencoded context value) and compares match.observed_value === expected strictly. For {MEDIA_BUY_ID}/{PACKAGE_ID} this is fine. But the field shape (macro_bindings: Array<{ macro, context_key }>) and the task name invite binding any universal macro — and a context value carrying a space, &, or non-ASCII char would be percent-encoded on the wire, so the strict compare would false-fail a conformant seller. It fails closed (never a false pass), so it's safe to ship, but before this assertion is pointed at a macro whose value isn't unreserved-safe, compare against an encoded expectation via equalUnderHexCasePolicy (the pattern assertions.ts:30 already uses) rather than assuming byte-identity. code-reviewer flagged this as a blocker; I'm downgrading it because the shipped, documented scope is correct and the failure is fail-closed and out-of-scope.
  • Error-report redaction. Failing-binding messages echo the raw expected/observed business identifiers into validation output. Confirm this honors the same error_report_payload_policy (SHA-256 redaction for non-fixture values) the existing custom-vector path uses, so real IDs don't land in CI logs.
  • Pin the smuggling invariant with a test. The residual defense rests on "no legacy no-semicolon entity decodes to a scheme char" — true today, enforced only by a comment. security-reviewer (Low): add a regression test (javascript&colonalert, javascript&nbsp:alert, javascript&#58alert) so a future NAMED_ENTITIES expansion can't silently reopen it.

Minor nits (non-blocking)

  1. noPreviewSurfaceResult sets error: detail on a passed step. universal-macro-assertions.ts returns passed: true, skipped: true alongside a populated error field. Reads as a failure to anything that keys off error being non-null. The webhook-assertion skip path is the convention to match here — worth confirming it leaves error unset on neutral skips.

Approved.

Resolves the high-severity undici advisories (GHSA-p88m-4jfj-68fv,
GHSA-vxpw-j846-p89q, GHSA-35p6-xmwp-9g52, GHSA-g8m3-5g58-fq7m) flagged by
the Security Audit CI lane. 6.27.0 is within the existing ^6.25.0 range.
npm audit --audit-level=high now passes.
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes Jun 19, 2026

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Approving. The residual-entity regex change is the load-bearing part, and it tightens a false-positive without widening the scheme-smuggling surface — fail-closed stays fail-closed.

Things I checked

  • RESIDUAL_ENTITY_RE change is security-sound (html-parser.ts:291). Requiring a trailing ; on the NAMED branch while keeping the numeric branch's ;? optional. security-reviewer verified against the WHATWG frozen legacy/no-semicolon named set: every scheme-breakout char (:, /, \, tab/newline) requires a semicolon-terminated ref (&colon;, &sol;, &bsol;, &Tab;), so the named branch still trips on all of them; the legacy no-semicolon entities decode only to &/</>/Latin-1, none usable to reconstruct javascript:. Layered smuggle (javascript&amp;#58;x) still caught — one decode pass leaves &#58;, which the unchanged numeric branch drops. The new tests at substitution-hardening.test.js:759 (keep &amp; multi-param) and :780 (&#58 numeric smuggle still dropped) pin both directions.
  • translate.ts has no injection path. value entries go through encodeUnreserved (RFC 3986 unreserved whitelist), so &/=/:/?/#/{/} are all percent-encoded — a value can't break its query slot or inject a param (a&b=ca%26b%3Dc, translate.ts:280). native entries are raw by design for ad-server tokens, and are seller-self-supplied, not cross-tenant input. Fragment split off first and never touched; single-pass replace, no nested re-expansion.
  • resolvePointer prototype-pollution guards (universal-macro-assertions.ts:461). Read-only traversal, own-property hasOwnProperty check plus POINTER_FORBIDDEN_KEYS deny-list. Belt-and-suspenders, sufficient.
  • passed gate is non-vacuous (universal-macro-assertions.ts:711): validations.length > 0 && every(...). Empty macro_bindings and missing context keys both fail-fast before observation runs.
  • Witness-not-translator holds. universal_macro_translation is a seller-side producer minting its own outgoing pixel URL — not a path that re-shapes a buyer-facing response. The assertion handler is purely observational. No fabricated fields.
  • Skip taxonomy is additive (types.ts): no_preview_surfacenot_applicable mirrors force_scenario_unsupported / fixture_seed_unsupported exactly. New detailed reason onto an existing canonical value, no enum break.
  • Changeset present and correctly typed. .changeset/universal-macro-substitution.md, minor — right for net-new additive exports (universal_macro_translation, MacroMapping, TranslateResult). package.json version not hand-edited (still 9.0.0); the package-lock.json 9.0.0-beta.319.0.0 delta is a lockfile resync against the already-released package.json, and the undici 6.25.0 → 6.27.0 bump is within the existing ^6.25.0 range.

Follow-ups (non-blocking — file as issues)

  • Unmapped-macro semantics: drop-param vs empty-string. ad-tech-protocol-expert flagged this as the one genuine policy choice in the helper. Dropping the whole param is the safe choice for a tracking pixel, but VAST 4.x and several OpenRTB macro conventions resolve unknown macros to empty string. Confirm adcontextprotocol/adcp media-buy-identifier-substitution-check actually specifies drop-param — if the spec is silent or says empty-string, this is SDK opinion baked ahead of the spec. Per CLAUDE.md triage order (spec → mock → SDK), worth a one-line spec confirmation before the next consumer leans on the behavior.
  • Confirm the cited storyboard exists upstream. The string media-buy-identifier-substitution-check appears nowhere in this repo's compliance/cache. If the driving storyboard isn't merged on adcontextprotocol/adcp yet, the assertion ships ahead of the contract it grades — fine to land, but note it.

Minor nits (non-blocking)

  1. snake_case free-function export. universal_macro_translation (translate.ts:223) breaks the module convention — every other free function out of substitution/index.ts is camelCase (encodeUnreserved, extractTrackerUrls, matchBindings); only protocol-mirroring methods on the observer use snake_case (parse_html). Cheaper to rename to translateUniversalMacros now than after it ships in a minor and renaming becomes a major.
  2. Value-only scope undocumented. Macros in key position pass through untranslated (translate.ts:244), which the key-position test confirms is intended, but the module docstring says "translates universal macros in a pixel URL's query parameters" without scoping to values. One line would close the gap.
  3. observed === expected is raw-vs-encoded (universal-macro-assertions.ts:636, :686). Correct only because business identifiers (alphanumeric + hyphen) encode to themselves — the inline comment acknowledges it. Load-bearing assumption; would silently false-fail if a bound context_key ever carried reserved chars. Leave as-is given scope, but the helper doesn't enforce the identifier shape it assumes.
  4. async with no await (universal-macro-assertions.ts:550). The observer calls are synchronous; the keyword is cosmetic for the dispatcher contract. Harmless.

Note for the maintainers: the deferred defense-in-depth (also dropping the legacy no-semicolon entity set) is correctly deferred — adding it would re-introduce the &reg= / &copy= query-param false positives this PR just fixed. The security review confirms the current boundary is sound without it.

LGTM. Follow-ups noted below.

@bokelley

Copy link
Copy Markdown
Contributor

Reviewed against the schemas alongside adcontextprotocol/adcp#5646. The helper (translate.ts) is good — LGTM on that part. The value/native discriminated union, single-pass no-re-expansion (a {MACRO} inside a substituted value is encoded as data, not re-expanded), the RFC-3986 unreserved encoder applied to the query value only, and the return-the-report contract are all right. The SubstitutionObserver parser fix holds up too — verified the loosened residual-entity rule still blocks scheme smuggling (no legacy HTML entity decodes to :///\ without a trailing ;), and the undici bump is reasonable currency.

Two helper polish items (optional, non-blocking):

  • A coding agent can silently pick the wrong arm — { value: '%%CACHEBUSTER%%' } would percent-encode an ad-server token and break it. Cheap guard: when a value matches a native-token shape (%%…%%, {{…}}, ${…}, […]), surface it in the report. A legitimate value never has that shape.
  • Drop-unmapped is the right default, but worth documenting (or a dropped_privacy_params signal) that silently dropping a consent macro on a forgotten mapping is a compliance hazard — callers should inspect unmapped_macros.

On the assertion (expect_universal_macro_substituted): suggest pulling/deferring it along with the storyboard in #5646. Short version (full reasoning over there): it reads a build_creative preview to verify {MEDIA_BUY_ID}/{PACKAGE_ID} substitution, but that's the wrong surface — those macros resolve at serve time, not build_creative, and nothing serializes the resolved tracker on the wire, so the check skip-dominates for conformant sellers. The substitution we actually want to certify is a decisioning roundtrip — a Live Integration check (RFC incoming). The good news: the alignment logic isn't wasted — it's the inverse of what the collector side of that check needs (matching macros in a received fired pixel vs. a fetched preview), so it can be repurposed there.

Fixture: to stop JS↔Python encoder drift (Python helper filed at adcontextprotocol/adcp-client-python#956), let's pin both SDKs to a language-neutral test-vector file — I have a draft vector set covering the reserved-char / native-raw / drop / no-re-expansion cases. Want me to PR it here or into the adcp repo under static/test-vectors/?

@bokelley bokelley left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Two inline suggestions on the translate.ts polish items below. Top-level reply to the rest in a moment.


Generated by Claude Code

Comment thread src/lib/substitution/translate.ts
Comment thread src/lib/substitution/translate.ts
@bokelley

Copy link
Copy Markdown
Contributor

Thanks for the detailed pass, @bokelley. Left two inline suggestions on translate.ts for the helper polish items — the compliance-note doc addition is a one-click suggestion; the native-token-shape guard is left as a sketch since it adds a field to TranslateResult and that's @BaiyuScope3's call on shape/naming, not mine to force via auto-suggestion. (Note: I don't have push access to BaiyuScope3/universal-macro-substitution — it's a fork — so I can't land either as a direct commit; suggestions are the closest I can do.)

On deferring expect_universal_macro_substituted: the critique holds up against the code — executeUniversalMacroAssertionStep resolves source_path against the prior step's parsed response (e.g. a build_creative preview), and if {MEDIA_BUY_ID}/{PACKAGE_ID} genuinely don't resolve until serve time, this step will skip-dominate (no_preview_surface) for conformant sellers rather than catch anything. That's a real scope question though — pulling 326 lines (the assertion + runner wiring + its test file) out of a green PR — and the full reasoning lives in adcp#5646, which is outside this session's repo scope to verify firsthand. I'll leave that call to you and @BaiyuScope3 rather than act on it unilaterally from here.

On the fixture: leaning toward the spec repo (adcontextprotocol/adcp under static/test-vectors/) rather than this repo — the explicit goal is JS↔Python parity, and a protocol-neutral location means neither SDK takes a dependency on the other's release cadence to consume it. But that's your call to drive since you're the one offering to author it.


Triaged by Claude Code. Session: https://claude.ai/code/session_01LgzNUkvXwbTxpVo4pm864n


Generated by Claude Code

Per maintainer review (adcontextprotocol#2263), pull the expect_universal_macro_substituted
assertion and its runner wiring: it observes a build_creative preview, but
{MEDIA_BUY_ID}/{PACKAGE_ID} resolve at serve time and aren't serialized on the
wire, so the check skip-dominates for conformant sellers. Deferred to the
forthcoming Live Integration (decisioning-roundtrip) check, where the alignment
logic is reusable.

Keep + polish the helper:
- rename universal_macro_translation -> translateUniversalMacros (camelCase, to
  match the substitution module's free-function convention)
- add suspect_native_values to flag value entries shaped like native ad-server
  tokens (%%..%%, {{..}}, ${..}, [..]) that were likely mapped to the wrong arm
- document the value-only scope and the dropped-consent-macro privacy hazard

The SubstitutionObserver residual-entity parser fix and undici bump are retained.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
@BaiyuScope3 BaiyuScope3 changed the title feat(substitution): universal-macro substitution assertion + translation helper feat(substitution): translateUniversalMacros helper + tracker-URL parser fix Jun 22, 2026
@BaiyuScope3

Copy link
Copy Markdown
Collaborator Author

Thanks @bokelley — acted on all of it:

  • Pulled expect_universal_macro_substituted (assertion + runner wiring + its test, ~800 lines removed). Your serve-time critique holds — observing a build_creative preview can't certify substitution that resolves on the wire. Deferring to the Live Integration / decisioning-roundtrip check; the macro-alignment logic carries over to the collector side there. The driving storyboard in feat(compliance): macro-identifier substitution storyboard (parked: Live Integration RFC) adcp#5646 is parked accordingly (now docs-only split out separately).
  • Renamed universal_macro_translationtranslateUniversalMacros (camelCase, before it ships in a minor).
  • Native-token guard: added suspect_native_values to TranslateResult — flags any value entry shaped like %%…%% / {{…}} / ${…} / […] so an agent that picks the wrong arm is surfaced rather than silently percent-encoding a token.
  • Privacy note: docstring now calls out that an unmapped (e.g. consent) macro is dropped, and tells callers to inspect unmapped_macros.
  • Documented the value-only scope.

On the test-vector fixture for JS↔Python parity: yes please, and adcontextprotocol/adcp under static/test-vectors/ is the right home — protocol-neutral, no cross-SDK release coupling. Happy to consume it here once it lands.

BaiyuScope3 added a commit to adcontextprotocol/adcp that referenced this pull request Jun 22, 2026
…red)

Seller-scoped storyboard asserting {MEDIA_BUY_ID}/{PACKAGE_ID} substitution in
creative tracking URLs. Parked pending the Live Integration check — the driving
assertion was pulled from @adcp/sdk (adcontextprotocol/adcp-client#2263) because a
build_creative preview can't certify serve-time substitution.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
aao-ipr-bot[bot]
aao-ipr-bot Bot previously approved these changes Jun 22, 2026

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean, contained change: a seller-side producer helper plus the parser false-positive it surfaced. The parser fix tightens the named-entity branch to require a trailing ; — fail-closed where it matters (scheme chars), fail-open only where a browser would render the bytes literally anyway.

Things I checked

  • Residual-entity regex relaxation does not reopen scheme smuggling (src/lib/substitution/observer/html-parser.ts:291). The named branch now demands ;; numeric keeps ;?. security-reviewer enumerated the full HTML5 legacy no-semicolon named-reference set (~106 names) — every one decodes to a basic/Latin-1 char, none to : / \ TAB/LF/CR. The scheme-relevant entities (&colon; &sol; &bsol; &Tab; &NewLine;) are all semicolon-mandatory and still tripped. Decode→check ordering is correct: decodeHtmlEntities runs before __hasResidualEntity (:321) at the call site. The author's claim holds.
  • __hasResidualEntity is pre-existing, not a new export (:321) — the test just imports it. No surface change there.
  • value-arm encoding cannot break out of its parameterencodeUnreserved (rfc3986.ts) is a strict RFC 3986 unreserved whitelist, UTF-8-first, uppercase hex, so a value macro cannot inject &/=/://. Tests confirm (a&b=ca%26b%3Dc, cafécaf%C3%A9).
  • No re-expansion of macros inside a substituted value — single-pass String.replace with global regex; the has-{GDPR}-inside test confirms the inner token encodes as data.
  • Textual query parsing edge cases trace clean — first-= split preserves = in values, repeated keys handled independently, key-position macros pass through (matching runs on values only), empty/no-query returns verbatim.
  • Changeset present, minor — correct for additive exports (translateUniversalMacros, MacroMapping, TranslateResult) plus a non-breaking false-positive fix. No manual package.json version edit — package.json isn't in the diff; the lock's 9.0.0 is the GA release merged from main.
  • No witness-violation — the helper translates a caller-supplied mapping over a caller-supplied URL; it does not fabricate or normalize wire responses. native is raw by design, value is encoded.

Follow-ups (non-blocking — file as issues)

  • Consent-macro silent-drop deserves a louder signal. An unmapped {GDPR_CONSENT}/{US_PRIVACY} drops the whole param but still returns a usable url; a caller ignoring unmapped_macros/dropped_params ships a consent-degraded pixel. Thoroughly documented and structurally surfaced, which is the right shape — but a dedicated dropped_consent_macros field would keep the compliance case from being lumped in with benign drops.

Minor nits (non-blocking)

  1. suspect_native_values is mapping-scoped, not URL-scoped. It iterates the entire mapping, so a suspect value entry whose macro never appears in the URL is still reported — asymmetric with unmapped_macros/dropped_params, which are URL-scoped. Defensible as proactive config-lint; worth one doc line so the asymmetry doesn't surprise.
  2. NATIVE_TOKEN_SHAPE [.+] branch over-matches. A legitimate value like [1,2,3] or [redacted] flags as suspect. Advisory-only (output unchanged), low impact, but a foreseeable false positive.
  3. Empty query strips the ?. https://px.example/i?https://px.example/i — a silent normalization the docstring doesn't mention. Harmless for pixels.

Approving on the strength of the enumerated scheme-smuggle proof plus the strict value-arm encoding.

@BaiyuScope3 BaiyuScope3 changed the title feat(substitution): translateUniversalMacros helper + tracker-URL parser fix feat(substitution): translateUniversalMacros helper + parser fix (helper 1/2: code) Jun 22, 2026
- add dropped_consent_macros: surface dropped consent/privacy macros
  ({GDPR_CONSENT}, {US_PRIVACY}, …) separately from benign drops
- tighten NATIVE_TOKEN_SHAPE bracket arm to [UPPER_SNAKE] so ordinary
  bracketed values ([1,2,3], [redacted]) are no longer flagged suspect
- document that suspect_native_values is mapping-scoped (config-lint), and
  that a bare trailing '?' is dropped

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

@aao-ipr-bot aao-ipr-bot Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Clean. The named-entity-branch semicolon requirement is the right shape — it restores legitimate multi-param tracker URLs without re-opening the scheme-smuggling defense, because the dangerous-char axis is closed under HTML5's frozen legacy no-semicolon set.

Things I checked

  • The load-bearing security claim holds. RESIDUAL_ENTITY_RE at html-parser.ts:291 now demands a trailing ; on the NAMED branch. security-reviewer confirmed against the WHATWG reference set: the ~106 legacy no-semicolon named refs decode exclusively to Latin-1 (© ® ° µ Æ ß …) plus the five ASCII basics — none yield :, /, or \. The scheme-breakout chars are reachable only via the ;-terminated forms (&colon;, &sol;, &bsol;), all already in NAMED_ENTITIES. The NUMERIC branch correctly keeps the optional ;, so javascript&#58alert(0) still classifies as javascript: (locked in at substitution-hardening.test.js:405).
  • Single-pass, no re-expansion. value.replace(UNIVERSAL_MACRO, fn) does not re-scan substituted output; the has-{GDPR}-inside test confirms inner braces percent-encode to %7B…%7D rather than re-expanding (translate.ts:349).
  • Fragment-before-query split ordered correctly (# first, then ? on the remainder) — translate.ts:288-305.
  • Witness, not translator. Diagnostics arrays carry macro token names ({GDPR}), never mapped values — a secret mapped into a value entry never echoes into unmapped_macros/suspect_native_values. No fabrication, no silent normalization.
  • Consent-macro drop surfaced, not swallowed. Unmapped {GDPR_CONSENT}/{US_PRIVACY}/{GPP_*} drops land in dropped_consent_macros distinct from benign drops (translate.ts:338-341). Right call — a silently-dropped consent signal is a compliance hazard, and the API forces it to the caller.
  • Changeset present, correctly typed. .changeset/universal-macro-substitution.mdminor, matching three net-new public exports (translateUniversalMacros, MacroMapping, TranslateResult) wired through substitution/index.ts:99-100index.ts:77,85-86 (value + type both present).
  • package-lock 9.0.0-beta.319.0.0 is a lockfile sync to main's already-released package.json, not a hand-edit. package.json is not in the diff. undici 6.25.06.27.0 stays inside the existing ^6.25.0 range.

Follow-ups (non-blocking — file as issues)

  • Decoder/residual-check asymmetry is the entire security rationale. decodeHtmlEntities decodes named entities without a semicolon (html-parser.ts:299), while the residual check now flags them only with one. Safe today — the gap decodes only to Latin-1 — but it is load-bearing: if NAMED_ENTITIES ever gains a non-Latin-1 punctuation mapping, this branch needs a re-review. Worth a comment at the table tying the two together so a future edit trips the right alarm.

Minor nits (non-blocking)

  1. Redundant suspectSeen dedup. translate.ts:279-285Object.entries(mapping) yields unique keys, so the Set guard can never fire. Harmless; droppable.
  2. ?-strip on leading-empty-param inputs. translate.ts:361-362 — a malformed ?&x={FOO} where the only real param drops yields outputParts = [''] → the ? is stripped. Consistent with the documented bare-? behavior; cosmetic, only on malformed input.

LGTM. Follow-ups noted below.

@BaiyuScope3 BaiyuScope3 requested a review from bokelley June 23, 2026 14:52
bokelley pushed a commit to adcontextprotocol/adcp that referenced this pull request Jun 25, 2026
…cs) (#5661)

* docs(creative): document translateUniversalMacros SDK helper

Adds an implementation-guide example under docs/creative/universal-macros showing
sales agents how to translate universal macros via the @adcp/sdk helper
(value vs native arms, dropped/unmapped reporting, suspect_native_values).

Documents the helper added in adcontextprotocol/adcp-client#2263.

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>

* chore: drop changeset (docs-only, non-protocol per repo policy)

* docs(creative): mention dropped_consent_macros and tighten native-token shape note

---------

Co-authored-by: Claude Opus 4.8 <noreply@anthropic.com>
bokelley added a commit to adcontextprotocol/adcp that referenced this pull request Jun 25, 2026
… vectors (#5650)

Pins the universal_macro_translation helper contract (value RFC-3986
unreserved-encoded, native raw, unmapped-macro params dropped, minted
params untouched, single-pass no re-expansion) as language-neutral test
vectors. Both @adcp/sdk and the Python adcp SDK validate against this
file so the two encoders cannot drift — the conformance storyboard only
exercises alphanumeric IDs, so cross-language drift is otherwise invisible.

Refs adcontextprotocol/adcp-client#2263, adcontextprotocol/adcp-client-python#956

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants