fix(ci): decode+unzip OLRC archive before transforming in sync (closes #199) #204

Merged: williamzujkowski merged 1 commit into main from fix/sync-unzip-before-transform on Jun 23, 2026

@williamzujkowski

Collaborator

Closes #199.

Problem

OlrcFetcher.fetchXml() returns base64-encoded ZIP bytes (comment: "caller will extract XML from the ZIP"), but sync-law.yml's "Transform statutes" step fed that string straight into transformer.transformToFiles(), which expects raw USLM XML. Result: every title failed to parse → sections=0, nothing ever committed. The weekly sync was effectively inert (this was masked until #194 fixed the import resolution that prevented the step from running at all).
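The mismatch described above can be illustrated with a small check. This is only a sketch: `looksLikeBase64Zip` and `looksLikeXml` are hypothetical helper names that do not appear in the repository, and the sample bytes are a hand-built ZIP signature rather than a real OLRC archive.

```typescript
// Hypothetical helpers for illustration only; not part of the fetcher package.

// "PK\x03\x04" (the ZIP local file header signature) read as a little-endian uint32.
const ZIP_LOCAL_HEADER_MAGIC = 0x04034b50;

/** True if the string base64-decodes to bytes that start with a ZIP local file header. */
export function looksLikeBase64Zip(value: string): boolean {
  const bytes = Buffer.from(value, "base64");
  return bytes.length >= 4 && bytes.readUInt32LE(0) === ZIP_LOCAL_HEADER_MAGIC;
}

/** True if the string plausibly is XML text (starts with "<" after an optional BOM and whitespace). */
export function looksLikeXml(value: string): boolean {
  return value.replace(/^\uFEFF/, "").trimStart().startsWith("<");
}

// A stand-in for what a fetchXml()-style call hands back: base64 text, not XML.
const fakeZip = Buffer.from([0x50, 0x4b, 0x03, 0x04, 0, 0, 0, 0]).toString("base64");
console.log(looksLikeBase64Zip(fakeZip), looksLikeXml(fakeZip)); // true false
```

A base64 string never begins with `<`, so an XML parser handed this value has nothing to parse, which is consistent with the sections=0 symptom.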

Fix

  • New packages/fetcher/src/zip.ts defines extractXmlFromZip(zip: Buffer): string | null. Pure-Node ZIP local-file-header walker lifted from the proven logic in scripts/fetch-title.ts; returns stored (method 0) bytes directly and inflates deflate (method 8) via inflateRawSync. Exported from the package index.
  • sync-law.yml Transform step now: extractXmlFromZip(Buffer.from(xmlResult.value, 'base64')) → null-guards (counts a failed title) → transformToFiles(xml). Unchanged-skip, failure counting, and output writing all preserved. The existing per-title try/catch still covers any malformed-archive throw.
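For readers unfamiliar with the technique, a local-file-header walker of the kind described above might look roughly like the sketch below. It is not the code in packages/fetcher/src/zip.ts: the field offsets come from the ZIP format's local file header layout, while the error messages, the `.xml` suffix match, and the `decodeAndExtract` wrapper are assumptions made for illustration. It also assumes sizes are present in the local header (no data-descriptor entries).

```typescript
import { inflateRawSync } from "node:zlib";

const LOCAL_FILE_HEADER_SIG = 0x04034b50; // "PK\x03\x04"
const LOCAL_HEADER_SIZE = 30; // fixed-size part of a local file header

/** Walks local file headers and returns the first .xml entry as text, or null if none exists. */
export function extractXmlFromZip(zip: Buffer): string | null {
  let offset = 0;
  while (
    offset + LOCAL_HEADER_SIZE <= zip.length &&
    zip.readUInt32LE(offset) === LOCAL_FILE_HEADER_SIG
  ) {
    const method = zip.readUInt16LE(offset + 8); // 0 = stored, 8 = deflate
    const compressedSize = zip.readUInt32LE(offset + 18);
    const nameLength = zip.readUInt16LE(offset + 26);
    const extraLength = zip.readUInt16LE(offset + 28);

    const nameStart = offset + LOCAL_HEADER_SIZE;
    const dataStart = nameStart + nameLength + extraLength;
    const dataEnd = dataStart + compressedSize;
    if (dataEnd > zip.length) throw new Error("truncated ZIP entry");

    const entryName = zip.toString("utf8", nameStart, nameStart + nameLength);
    if (entryName.toLowerCase().endsWith(".xml")) {
      const data = zip.subarray(dataStart, dataEnd);
      if (method === 0) return data.toString("utf8"); // stored: bytes are the file
      if (method === 8) return inflateRawSync(data).toString("utf8"); // raw deflate stream
      throw new Error(`unsupported compression method ${method}`);
    }
    offset = dataEnd; // skip to the next local file header
  }
  return null; // reached the central directory or end of buffer without an .xml entry
}

/** Hypothetical wrapper mirroring the workflow step: base64 string in, XML or null out. */
export function decodeAndExtract(base64Zip: string): string | null {
  return extractXmlFromZip(Buffer.from(base64Zip, "base64"));
}
```

In the workflow, a null result would count the title as failed, and a thrown error on a malformed archive would land in the existing per-title try/catch.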

Tests

New packages/fetcher/src/__tests__/zip.test.ts builds ZIPs in memory and covers:

  • stored (method 0) extraction
  • deflate (method 8) round-trip via deflateRawSync/inflateRawSync
  • archive with no .xml entry → null
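Building a ZIP in memory for tests like these needs only a local file header followed by the entry name and data. The sketch below shows one way to do that. `buildZipEntry` and the file name `usc01.xml` are hypothetical, and the CRC-32 field and central directory are deliberately left out, so the result suits a header walker that does not verify checksums but would not open in a general-purpose unzip tool.

```typescript
import { deflateRawSync, inflateRawSync } from "node:zlib";

/** Builds a one-entry archive fragment: local file header + name + data (hypothetical test helper). */
export function buildZipEntry(name: string, content: string, method: 0 | 8): Buffer {
  const nameBytes = Buffer.from(name, "utf8");
  const raw = Buffer.from(content, "utf8");
  const data = method === 8 ? deflateRawSync(raw) : raw; // method 8 = raw deflate, 0 = stored

  const header = Buffer.alloc(30); // zero-filled, so flags, time, date, and CRC-32 stay 0
  header.writeUInt32LE(0x04034b50, 0); // local file header signature "PK\x03\x04"
  header.writeUInt16LE(20, 4); // version needed to extract (2.0)
  header.writeUInt16LE(method, 8); // compression method
  header.writeUInt32LE(data.length, 18); // compressed size
  header.writeUInt32LE(raw.length, 22); // uncompressed size
  header.writeUInt16LE(nameBytes.length, 26); // file name length
  header.writeUInt16LE(0, 28); // extra field length

  return Buffer.concat([header, nameBytes, data]);
}

// Deflate round-trip: the payload after the header and name inflates back to the input.
const xml = "<uslm><section/></uslm>";
const deflated = buildZipEntry("usc01.xml", xml, 8);
const payload = deflated.subarray(30 + "usc01.xml".length);
console.log(inflateRawSync(payload).toString("utf8") === xml); // true
```

Passing an entry name without the `.xml` suffix gives a fixture for the third case, where the extractor is expected to return null.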

Verification: pnpm --filter @civic-source/fetcher test → 126 passed; pnpm build → 8/8.

Follow-up

A CI smoke test that transforms a real fixture end-to-end (proposed in #194) would guard against both this and the earlier resolution bug. Tracked there.

🤖 Generated with Claude Code

OlrcFetcher.fetchXml() returns base64-encoded ZIP bytes (the archive
containing the USLM .xml), but the sync workflow's "Transform statutes"
step passed that value straight into transformer.transformToFiles(),
which expects raw XML. Every title failed to parse, so the weekly sync
transformed 0 sections and never committed anything.

Add a tested, pure-Node `extractXmlFromZip(zip: Buffer)` to the fetcher
package (lifted from the proven logic in scripts/fetch-title.ts; returns
stored entries as-is and inflates deflate entries via inflateRawSync) and
wire the workflow to decode base64 → unzip → transform, failing the title
cleanly if the archive has no .xml entry.

New unit tests cover stored (method 0), deflate (method 8 round-trip),
and no-.xml-entry cases. fetcher: 126 tests pass; full monorepo builds.

Closes #199

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@williamzujkowski williamzujkowski requested a review from a team as a code owner June 23, 2026 03:43
@williamzujkowski williamzujkowski merged commit e7712cd into main Jun 23, 2026
3 checks passed
williamzujkowski added a commit that referenced this pull request Jun 23, 2026
…214)

* chore: remove unused @civic-source/pipeline and observability packages

Both packages were built, tested, and documented but had zero importers:
nothing in the repo (workflows, scripts, apps, other packages) imported
@civic-source/pipeline or @civic-source/observability. The live
sync-law.yml inlines its own fetch→transform→annotate logic and never
called orchestrate(); pipeline didn't even depend on observability.

Per the repo's YAGNI principle, remove the speculative packages rather
than carry the maintenance/build surface. orchestrate() also still
carried the pre-#204 base64-ZIP bug and lacked the data-repo
git/commit/push half, so it was not a drop-in for the workflow anyway.

- Delete packages/pipeline and packages/observability (+ their tests).
- Drop their rows from README.md and ARCHITECTURE.md; relabel the
  ARCHITECTURE data-flow diagram's orchestration box "Pipeline" →
  "Workflow" (the GitHub Actions sync now fills that role).
- Regenerate pnpm-lock.yaml.

Build drops 8→6 packages; full build + tests green; frozen-lockfile
install clean.

Closes #208

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

* docs: de-stale README dev section after package removal

Removing pipeline/observability made "267 tests across 8 packages"
inaccurate; replace the brittle hardcoded count with a non-numeric note.
Also correct the toolchain line (Node 22.x/pnpm 9.x → Node 24.x/pnpm
11.x) to match the #183 migration, mirroring the earlier CLAUDE.md fix.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

---------

Co-authored-by: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Development

Successfully merging this pull request may close these issues.

fix(ci): sync pipeline is inert — fetchXml returns base64 ZIP, workflow never unzips before transform
