ADR-011: precomputed GAUL lookup enrichment (Stages 1 + 3)#25

Merged
Polichinel merged 9 commits into development from feature/adr-011-gaul-lookup-enrichment on Jun 18, 2026

@Polichinel

ADR-011: replace the runtime spatial mapper with a precomputed GAUL lookup (Stages 1 + 3)

Umbrella: #20 · Stage 1: #22 · Stage 3: #24

Important

Do not merge yet. This is ready for review, not for merge. The end-to-end production-run verification (needs the prod host: pipeline-core + Appwrite + the zarr) has not happened, and going global (Stage 4) is a separate change in views-models. Merge is the author's call once those are settled.

What this does

Replaces the 3,171-line runtime spatial mapper (mapping.py, 774 MB of LFS shapefiles, geopandas) with a precomputed GAUL lookup table + a merge-by-cell-id enricher, sourced from the views-datafactory's area-majority GAUL parquets (v1.3.0). The geography of each prediction cell is now attached by a table join, not live spatial computation.

  • Stage 1 — build the lookup + GaulLookupEnricher + tests (purely additive).
  • Stage 3 — the one behavioral change: UNFAOPostProcessorManager enriches via the lookup instead of the mapper.

mapping.py stays in the repo but is no longer used by the manager; it (and geopandas) are slated for removal after one verified production cycle.
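To make the "table join, not live spatial computation" idea concrete, here is a minimal sketch of a merge-by-cell-id enrichment using pandas. It is not the actual GaulLookupEnricher: the key column name `gid`, the `country_name` column, and the toy lookup rows are assumptions for illustration (only `country_iso_a3` is named in this PR), and the real lookup carries the full 9-column contract.

```python
import pandas as pd

# Hypothetical stand-in for the precomputed lookup; the real table is
# data/gaul_lookup.parquet and carries the full 9-column contract.
LOOKUP = pd.DataFrame(
    {
        "gid": [101, 102, 103],
        "country_iso_a3": ["SOM", "KEN", "ETH"],
        "country_name": ["Somalia", "Kenya", "Ethiopia"],  # assumed column
    }
)


def enrich(predictions: pd.DataFrame, lookup: pd.DataFrame = LOOKUP) -> pd.DataFrame:
    """Attach geography to each prediction row with a left join on the cell id."""
    # how="left" keeps every prediction row: a cell missing from the lookup
    # comes back with nulls in the metadata columns instead of being dropped.
    # validate="many_to_one" raises if the lookup contains duplicate cell ids.
    return predictions.merge(lookup, on="gid", how="left", validate="many_to_one")


# Cell 999 is deliberately absent from the lookup.
predictions = pd.DataFrame({"gid": [101, 103, 999], "pred": [0.2, 0.7, 0.1]})
enriched = enrich(predictions)
print(enriched)
```

The left join is the design choice that matters here: an unknown cell survives as a row with null metadata, which is what lets a downstream validation gate fail loudly rather than silently lose rows.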

Why it's safe

  • Schema is reproduced exactly — the same 9 columns, dtypes, and index. So views-faoapi needs zero changes (acceptance criterion).
  • Fail-loud preserved — unknown/incomplete cells merge to null, so the manager's _validate() gate still crashes rather than shipping a hole.
  • Behaviour fully characterised — a shadow diff ran the old mapper and the new enricher on all 13,110 africa_me cells (real production shapefiles):
| Result | cells | share |
| --- | --- | --- |
| Identical (all 9 columns) | 12,597 | 96.1% |
| Changed (all on borders) | 512 | 3.9% |
| Unexplained | 0 | 0% |

Every difference is the FAO-contracted area-majority rule arriving end-to-end. See reports/enrichment_diff/report.md + maps.
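As a rough illustration of how a shadow diff of this kind can be structured, the sketch below compares two per-cell outputs and assigns each cell a class. It is not scripts/diff_enrichment.py: plain dictionaries stand in for the two engines' outputs, the row shape (country code, admin name) and all values are toy data, and the class names are borrowed from the commit messages for illustration only.

```python
from collections import Counter

# Toy outputs of the two engines, keyed by cell id. Each value is a
# (country code, admin unit) pair; all values here are made up.
old_mapper = {
    1: ("AAA", "a1"),
    2: ("AAA", "a2"),
    3: ("AAA", "a3"),
    4: ("BBB", "b1"),
}
new_enricher = {
    1: ("AAA", "a1"),
    2: ("BBB", "b9"),
    3: ("AAA", "a4"),
    5: ("BBB", "b2"),
}


def classify(old_row, new_row):
    """Label one cell by comparing the old and new rows."""
    if old_row is None:
        return "recovered"  # present only in the new output
    if new_row is None:
        return "dropped"  # present only in the old output
    if old_row == new_row:
        return "identical"
    if old_row[0] != new_row[0]:
        return "country_reassignment"
    return "admin_reallocation"  # same country, different admin unit


# Classify every cell that appears in either output.
all_cells = sorted(old_mapper.keys() | new_enricher.keys())
labels = {
    gid: classify(old_mapper.get(gid), new_enricher.get(gid)) for gid in all_cells
}
counts = Counter(labels.values())
print(dict(counts))
```

Classifying over the union of cell ids ensures that cells present in only one output are counted rather than silently skipped.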

Two findings worth a reviewer's eye

  • C-35 (register, Tier 1): the current pipeline ships Somaliland cells to FAO with country_iso_a3 = "-99" (Natural Earth's no-ISO sentinel), which passes the null-only gate. The new lookup removes all -99 (Somaliland → SOM per GAUL). A data-quality fix.
  • Morocco / Western Sahara: 69 cells shift Morocco → Western Sahara (both valid codes; GAUL keeps Western Sahara separate). Politically sensitive; to be disclosed in the Stage 4 FAO release note.

Both align the data with GAUL — FAO's own boundary product — which is the contractual outcome.
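The C-35 finding comes down to the difference between a null and a placeholder string. The sketch below shows, under stated assumptions, why a check that only looks for nulls lets "-99" through. The function names, the `gid` column, and the toy frames are hypothetical; this is not the manager's _validate(), and the sentinel scan is only an illustration, not something this PR adds.

```python
import pandas as pd

SENTINELS = {"-99"}  # Natural Earth's no-ISO placeholder, per the C-35 finding
REQUIRED = ["country_iso_a3"]  # the real contract has 9 columns


def null_only_gate(df: pd.DataFrame, cols=REQUIRED) -> None:
    """Raise if any required column holds a null (a null-only style of check)."""
    missing = [c for c in cols if df[c].isna().any()]
    if missing:
        raise ValueError(f"null metadata in columns: {missing}")


def sentinel_rows(df: pd.DataFrame, col: str = "country_iso_a3") -> pd.DataFrame:
    """Return rows whose code is a known placeholder rather than a real value."""
    return df[df[col].isin(SENTINELS)]


# A placeholder string is not null, so the null-only gate does not raise.
shipped = pd.DataFrame({"gid": [1, 2], "country_iso_a3": ["-99", "KEN"]})
null_only_gate(shipped)
flagged = sentinel_rows(shipped)

# A genuine hole (null) is still caught by the gate.
with_hole = pd.DataFrame({"gid": [3], "country_iso_a3": [None]})
try:
    null_only_gate(with_hole)
    hole_caught = False
except ValueError:
    hole_caught = True

print(len(flagged), hole_caught)
```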

Verification

  • ruff clean; functional suite: 109 passed (the 43 failing test_falsification_* tests are pre-existing, by-design audit stubs, unmodified here).
  • Manager can't be instantiated without pipeline-core, so _append_metadata and _validate are covered by replica tests (tests/test_append_metadata.py, tests/test_validation.py).

Out of scope (separate, later)

  • Going global (REGION = "land_gaul" in views-models) — Stage 4 / views-models#127.
  • Deleting mapping.py + dropping geopandas — after one verified production cycle.

🤖 Generated with Claude Code

Polichinel and others added 9 commits June 18, 2026 10:51
…34/D-07..D-10, deploy-readiness stubs

Session artifacts from the FAO global-delivery investigation:
- cross_repo_integration_report.md: datafactory<->postprocessing<->Appwrite<->faoapi
- ADR-011 assessment §10: datafactory prerequisites met (v1.3.0 area-majority)
- register: 9 new concerns + 4 disagreements from expert review + cross-repo audit
- test_datafactory_deploy_readiness.py: deploy-gate falsification stubs

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Replaces the runtime spatial mapper's enrichment with a precomputed table:
- scripts/build_gaul_lookup.py: joins the datafactory's 7 area-majority GAUL
  parquets (v1.3.0), renames to the 9-column contract, computes coords from
  the gid, keeps only fully-complete cells. No geopandas.
- data/gaul_lookup.parquet: 64,742 land_gaul cells, 0 nulls, 0.9 MB.
- unfao/enrichment.py: GaulLookupEnricher, merge-by-gid, categorical strings,
  fail-loud (unknown cell -> null -> validation catches).
- tests/test_enrichment.py: 16 tests (integrity, contract, coords, fail-loud).

Not yet wired into the live pipeline (that is Stage 3). Core suite: 104 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
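For readers unfamiliar with "computes coords from the gid": the sketch below derives a cell centroid from a PRIO-GRID id, assuming the standard PRIO-GRID convention (0.5 degree cells, 720 columns by 360 rows, ids starting at 1 in the south-west corner and increasing row by row). The commit does not spell out the formula, so treat this as an illustration of the convention rather than the code in build_gaul_lookup.py; the function name is hypothetical.

```python
NCOLS = 720  # 360 degrees of longitude / 0.5 degree cells
NROWS = 360  # 180 degrees of latitude / 0.5 degree cells
CELL = 0.5  # cell size in degrees


def gid_to_centroid(gid: int) -> tuple[float, float]:
    """Return (longitude, latitude) of the cell centre for a PRIO-GRID id.

    Assumes ids run from 1 (south-west corner) to 259200, row by row.
    """
    if not 1 <= gid <= NCOLS * NROWS:
        raise ValueError(f"gid out of range: {gid}")
    # Zero-based row (counted from the south) and column (from the west).
    row, col = divmod(gid - 1, NCOLS)
    lon = -180.0 + (col + 0.5) * CELL
    lat = -90.0 + (row + 0.5) * CELL
    return lon, lat


print(gid_to_centroid(1))
print(gid_to_centroid(NCOLS * NROWS))
```

Because the coordinates are a pure function of the id, they can be computed at build time without loading any geometry.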
…w enricher)

africa_me (13,110 cells), all 9 columns, on exact production shapefiles:
- 96.1% identical (12,597); 512 changed; ZERO unexplained
- 259 country_reassignment + 247 admin_reallocation (all border cells),
  4 ocean cells dropped by land_gaul, 2 coastal recovered
- no ISO code-system noise (NE and GAUL agree on alpha-3)

scripts/diff_enrichment.py: runs both engines, classifies every difference.
scripts/plot_enrichment_diff.py: 5 maps (agree/disagree, by-class,
country before/after, Lesotho + Namibia/Botswana border zooms).
reports/enrichment_diff/: report.md, summary.json, diff_cells.csv, maps/.

Every change is the FAO-contracted area-majority arriving; none regress.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

…itories analysis

C-35 (Tier 1): the current mapper ships Somaliland cells to FAO with
country_iso_a3 = -99 (Natural Earth's no-ISO sentinel); it passes the
null-only validation gate. Measured: 64 africa_me cells. Resolved by the
ADR-011 lookup (0 occurrences of -99 in all 64,742 cells -> SOM/Somalia).

Added a disputed-territories section to the enrichment diff report with the
Morocco/Western Sahara (boundary-placement) and Somalia/Somaliland (-99 fix)
worked examples, confirmed at the shapefile level.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

W1: portable datafactory resolution ($VIEWS_DATAFACTORY or sibling repo) in
    build_gaul_lookup.py and diff_enrichment.py — no more machine-specific path.
W2: enricher logs ignored mapper-only kwargs at debug (not wholly silent).
S1/S2: new gaul_schema.py is the single source of truth for METADATA_COLS, the
    source->contract rename map, and the PRIO-GRID coordinate formula; enricher,
    build, diff, and plot all import it (no duplicated contract/constants).
S3: add docs/CICs/GaulLookupEnricher.md.
S4: reports/enrichment_diff/README.md documents the committed evidence artifacts.

Rebuilt lookup is byte-identical to the committed one; core suite 104 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
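The W1 item ("$VIEWS_DATAFACTORY or sibling repo") describes a common pattern for avoiding machine-specific paths. A minimal sketch of that pattern follows. The function name, the sibling directory name `views-datafactory`, and the error message are assumptions; the actual resolution logic in build_gaul_lookup.py may differ.

```python
import os
import tempfile
from pathlib import Path

ENV_VAR = "VIEWS_DATAFACTORY"
SIBLING_NAME = "views-datafactory"  # assumed name of the sibling checkout


def resolve_datafactory(repo_root, env=None) -> Path:
    """Locate the datafactory: the env var wins, else a sibling directory."""
    env = os.environ if env is None else env
    override = env.get(ENV_VAR)
    if override:
        candidate = Path(override).expanduser()
    else:
        candidate = Path(repo_root).parent / SIBLING_NAME
    if not candidate.is_dir():
        raise FileNotFoundError(
            f"datafactory not found at {candidate}; set ${ENV_VAR} to override"
        )
    return candidate


# Exercise the three paths against a throwaway directory layout.
with tempfile.TemporaryDirectory() as tmp:
    base = Path(tmp)
    repo = base / "postprocessing-repo"
    sibling = base / SIBLING_NAME
    elsewhere = base / "custom-checkout"
    for directory in (repo, sibling, elsewhere):
        directory.mkdir()

    via_sibling = resolve_datafactory(repo, env={}) == sibling
    via_env = resolve_datafactory(repo, env={ENV_VAR: str(elsewhere)}) == elsewhere
    try:
        resolve_datafactory(repo, env={ENV_VAR: str(base / "missing")})
        missing_raises = False
    except FileNotFoundError:
        missing_raises = True

print(via_sibling, via_env, missing_raises)
```

Passing `env` explicitly keeps the function testable without mutating the process environment.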
… ambiguous var

- diff_enrichment.py: move package imports to top (E402), drop unused CODE_COLS
- test_datafactory_deploy_readiness.py: l -> ln (E741)

ruff: all checks passed. Functional suite: 104 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

The one behavioral change in the plan: UNFAOPostProcessorManager now enriches
via the precomputed GAUL lookup instead of the runtime spatial mapper.

- unfao.py: import + __init__ (self._mapper -> self._enricher) + the one
  _append_metadata call; drop mapper-only batch_size kwarg; delete two dead
  commented AppwriteConfig blocks; description string no longer names the mapper.
- UNFAOPostProcessorManager.md: enrichment source is now the lookup; mapper kept
  inert, slated for removal after one verified production cycle.
- tests/test_append_metadata.py: replicates the _append_metadata join sequence
  with the enricher (manager can't be instantiated without pipeline-core).

mapping.py stays in the repo, no longer used by the manager. Revert this commit
to restore the mapper path. ruff clean; functional suite 109 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

D1: manager now imports gaul_schema.METADATA_COLS instead of hardcoding the
    9-column contract twice (filter_cols in _append_metadata, and
    _necessary_metadata_cols in _validate) — removes drift risk now that there
    is a single source of truth.
F1: UNFAOPostProcessorManager CIC test-alignment note updated to reference the
    replica tests (test_validation.py, test_append_metadata.py).

ruff clean; functional suite 109 passed.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

Pre-merge full-PR review caught the manager README still describing
PriogridCountryMapper / _mapper as the enrichment mechanism — false after the
Stage 3 swap. Updated the architecture diagram, data-flow text, attribute table
(_mapper -> _enricher), troubleshooting, and doc links; the runtime mapper link
is kept but labelled legacy/retained.

Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

@Polichinel Polichinel merged commit 40e19ce into development Jun 18, 2026
2 of 3 checks passed
@Polichinel Polichinel deleted the feature/adr-011-gaul-lookup-enrichment branch June 18, 2026 13:25