
fix: uncited-case dedupe collapse + appendix-title path collision (closes #202) #206

Merged: williamzujkowski merged 1 commit into main from fix/annotator-transformer-correctness on Jun 23, 2026

@williamzujkowski

Collaborator

Closes #202 (two confirmed correctness bugs from code review).

1. Uncited cases collapsed to one

deduplicateCases (packages/annotator/src/citation-utils.ts) keyed on normalizeCitation(citation). CourtListener results frequently lack a structured citation, and normalizeCitation('') === '', so the first uncited case was kept and all other distinct uncited opinions were dropped as duplicates.
Fix: a new dedupeKey(). Cases with a citation still collapse on the normalized citation, so real duplicates still merge; uncited cases fall back to a composite caseName|date|sourceUrl key so distinct opinions survive.
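
The keying rule described above can be sketched as follows. This is a minimal illustration, not the code from packages/annotator/src/citation-utils.ts: the `CaseRecord` shape, the stand-in `normalizeCitation` body, and the `cite:` / `uncited:` key prefixes are assumptions made for the example. Only the function names and the caseName|date|sourceUrl composite come from the description.

```typescript
// Hypothetical record shape; field names follow the composite key named above.
interface CaseRecord {
  caseName: string;
  citation?: string;
  date?: string;
  sourceUrl?: string;
}

// Stand-in normalizer (assumption). The real normalizeCitation is not shown in
// this PR text; the only property relied on is normalizeCitation('') === ''.
function normalizeCitation(citation: string): string {
  return citation.trim().toLowerCase().replace(/\s+/g, " ");
}

// Cited cases key on the normalized citation; uncited cases key on a composite
// of caseName, date and sourceUrl so they no longer all share the key ''.
// The prefixes keep the two key spaces apart (an illustrative choice).
function dedupeKey(c: CaseRecord): string {
  const normalized = normalizeCitation(c.citation ?? "");
  if (normalized !== "") {
    return `cite:${normalized}`;
  }
  return `uncited:${c.caseName}|${c.date ?? ""}|${c.sourceUrl ?? ""}`;
}

// Keeps the first case seen for each key, preserving input order.
function deduplicateCases(cases: CaseRecord[]): CaseRecord[] {
  const seen = new Map<string, CaseRecord>();
  for (const c of cases) {
    const key = dedupeKey(c);
    if (!seen.has(key)) {
      seen.set(key, c);
    }
  }
  return [...seen.values()];
}
```

With the old key, every uncited case mapped to `''`, so only the first one survived. Under this sketch two uncited opinions merge only if their name, date, and source URL all match.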

2. Appendix titles collided with the main title

findTitleNumber (packages/transformer/src/parser.ts) only matched /\/t(\d+)$/, so /us/usc/t18a fell through to "18" and Title 18 Appendix sections were written under title-18/, colliding with main Title 18 (the fetcher already preserves the suffix as 18a).
Fix: capture the optional appendix letter (/\/t(\d+[a-zA-Z]?)$/, plus the num fallback) so t18a → "18a" and appendix sections land under title-18a/. Frontmatter usc_title stays numeric (parseInt → 18) per the Zod schema, so no schema change is needed.
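
The regex change can be illustrated with a small sketch. The two patterns are the ones quoted above; the helper names `titleFromIdentifier`, `titleDir`, and `uscTitleNumber` are hypothetical, and the "num fallback" path of the real findTitleNumber is not modeled here.

```typescript
// Patterns quoted in the description: before and after the fix.
const OLD_TITLE_RE = /\/t(\d+)$/;
const NEW_TITLE_RE = /\/t(\d+[a-zA-Z]?)$/;

// Hypothetical helper: extract the title token (e.g. "18" or "18a") from an
// identifier such as "/us/usc/t18a". Returns null when nothing matches.
function titleFromIdentifier(identifier: string): string | null {
  const match = NEW_TITLE_RE.exec(identifier);
  return match ? match[1] : null;
}

// Hypothetical helper: output directory for a title token.
function titleDir(title: string): string {
  return `title-${title}`;
}

// Frontmatter stays numeric: parseInt stops at the first non-digit,
// so "18a" yields 18.
function uscTitleNumber(title: string): number {
  return parseInt(title, 10);
}
```

The old pattern does not match `/us/usc/t18a` at all, because the trailing `a` sits between the digits and the `$` anchor, which is why the lookup fell through to another path. The new pattern captures `18a`, so the appendix gets its own directory while `usc_title` still parses to 18.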

Tests

  • distinct uncited cases are both retained; same-citation cases still collapse.
  • /us/usc/t18a yields title "18a" and a path under title-18a/, distinct from title-18/.

Verification: annotator 82 pass, transformer 67 pass, pnpm build 8/8.

🤖 Generated with Claude Code

Two correctness bugs from code review (#202):

1. Uncited cases collapsed to one. deduplicateCases keyed on
   normalizeCitation(citation); empty citations all normalize to '',
   so distinct CourtListener opinions lacking a structured citation
   were dropped as duplicates of each other. Add dedupeKey(): cases
   with a citation still collapse on the normalized citation; uncited
   cases fall back to a composite caseName|date|sourceUrl key so
   distinct opinions are retained.

2. Appendix titles collided with the main title. findTitleNumber only
   matched /\/t(\d+)$/, so "/us/usc/t18a" fell through to "18" and
   Title 18 Appendix sections were written under title-18/, colliding
   with main Title 18. Capture the optional appendix letter
   (/\/t(\d+[a-zA-Z]?)$/ and the num fallback) so t18a → "18a" and
   sections land under title-18a/. Frontmatter usc_title stays numeric
   (parseInt → 18) per the Zod schema.

New tests cover both. annotator: 82 pass; transformer: 67 pass;
monorepo builds.

Closes #202

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

williamzujkowski requested a review from a team as a code owner on June 23, 2026 at 04:02
williamzujkowski merged commit c7193dd into main on Jun 23, 2026
3 checks passed

Development

Successfully merging this pull request may close these issues.

fix: two correctness bugs in annotator/transformer (uncited-case dedupe collapse; appendix-title path collision)
