ADR-011: precomputed GAUL lookup enrichment (Stages 1 + 3) #25
Merged
…34/D-07..D-10, deploy-readiness stubs
Session artifacts from the FAO global-delivery investigation:
- cross_repo_integration_report.md: datafactory<->postprocessing<->Appwrite<->faoapi
- ADR-011 assessment §10: datafactory prerequisites met (v1.3.0 area-majority)
- register: 9 new concerns + 4 disagreements from expert review + cross-repo audit
- test_datafactory_deploy_readiness.py: deploy-gate falsification stubs
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Replaces the runtime spatial mapper's enrichment with a precomputed table:
- scripts/build_gaul_lookup.py: joins the datafactory's 7 area-majority GAUL parquets (v1.3.0), renames to the 9-column contract, computes coords from the gid, keeps only fully-complete cells. No geopandas.
- data/gaul_lookup.parquet: 64,742 land_gaul cells, 0 nulls, 0.9 MB.
- unfao/enrichment.py: GaulLookupEnricher, merge-by-gid, categorical strings, fail-loud (unknown cell -> null -> validation catches).
- tests/test_enrichment.py: 16 tests (integrity, contract, coords, fail-loud).
Not yet wired into the live pipeline (that is Stage 3). Core suite: 104 passed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…w enricher)
africa_me (13,110 cells), all 9 columns, on exact production shapefiles:
- 96.1% identical (12,597); 512 changed; ZERO unexplained
- 259 country_reassignment + 247 admin_reallocation (all border cells), 4 ocean cells dropped by land_gaul, 2 coastal recovered
- no ISO code-system noise (NE and GAUL agree on alpha-3)
scripts/diff_enrichment.py: runs both engines, classifies every difference.
scripts/plot_enrichment_diff.py: 5 maps (agree/disagree, by-class, country before/after, Lesotho + Namibia/Botswana border zooms).
reports/enrichment_diff/: report.md, summary.json, diff_cells.csv, maps/.
Every change is the FAO-contracted area-majority arriving; none regress.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
…itories analysis
C-35 (Tier 1): the current mapper ships Somaliland cells to FAO with country_iso_a3 = -99 (Natural Earth's no-ISO sentinel); it passes the null-only validation gate. Measured: 64 africa_me cells. Resolved by the ADR-011 lookup (0 occurrences of -99 in all 64,742 cells -> SOM/Somalia).
Added a disputed-territories section to the enrichment diff report with the Morocco/Western Sahara (boundary-placement) and Somalia/Somaliland (-99 fix) worked examples, confirmed at the shapefile level.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
W1: portable datafactory resolution ($VIEWS_DATAFACTORY or sibling repo) in
build_gaul_lookup.py and diff_enrichment.py — no more machine-specific path.
W2: enricher logs ignored mapper-only kwargs at debug (not wholly silent).
S1/S2: new gaul_schema.py is the single source of truth for METADATA_COLS, the
source->contract rename map, and the PRIO-GRID coordinate formula; enricher,
build, diff, and plot all import it (no duplicated contract/constants).
S3: add docs/CICs/GaulLookupEnricher.md.
S4: reports/enrichment_diff/README.md documents the committed evidence artifacts.
Rebuilt lookup is byte-identical to the committed one; core suite 104 passed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
… ambiguous var
- diff_enrichment.py: move package imports to top (E402), drop unused CODE_COLS
- test_datafactory_deploy_readiness.py: l -> ln (E741)
ruff: all checks passed. Functional suite: 104 passed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
The one behavioral change in the plan: UNFAOPostProcessorManager now enriches via the precomputed GAUL lookup instead of the runtime spatial mapper.
- unfao.py: import + __init__ (self._mapper -> self._enricher) + the one _append_metadata call; drop mapper-only batch_size kwarg; delete two dead commented AppwriteConfig blocks; description string no longer names the mapper.
- UNFAOPostProcessorManager.md: enrichment source is now the lookup; mapper kept inert, slated for removal after one verified production cycle.
- tests/test_append_metadata.py: replicates the _append_metadata join sequence with the enricher (manager can't be instantiated without pipeline-core).
mapping.py stays in the repo, no longer used by the manager. Revert this commit to restore the mapper path.
ruff clean; functional suite 109 passed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
D1: manager now imports gaul_schema.METADATA_COLS instead of hardcoding the
9-column contract twice (filter_cols in _append_metadata, and
_necessary_metadata_cols in _validate) — removes drift risk now that there
is a single source of truth.
F1: UNFAOPostProcessorManager CIC test-alignment note updated to reference the
replica tests (test_validation.py, test_append_metadata.py).
ruff clean; functional suite 109 passed.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Pre-merge full-PR review caught the manager README still describing PriogridCountryMapper / _mapper as the enrichment mechanism, which is false after the Stage 3 swap. Updated the architecture diagram, data-flow text, attribute table (_mapper -> _enricher), troubleshooting, and doc links; the runtime mapper link is kept but labelled legacy/retained.
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
ADR-011: replace the runtime spatial mapper with a precomputed GAUL lookup (Stages 1 + 3)
Umbrella: #20 · Stage 1: #22 · Stage 3: #24
Important
Do not merge yet. This is ready for review, not for merge. The end-to-end production-run verification (needs the prod host: pipeline-core + Appwrite + the zarr) has not happened, and going global (Stage 4) is a separate change in views-models. Merge is the author's call once those are settled.
What this does
Replaces the 3,171-line runtime spatial mapper (mapping.py, 774 MB of LFS shapefiles, geopandas) with a precomputed GAUL lookup table + a merge-by-cell-id enricher, sourced from the views-datafactory's area-majority GAUL parquets (v1.3.0). The geography of each prediction cell is now attached by a table join, not live spatial computation.
- GaulLookupEnricher + tests (purely additive).
- UNFAOPostProcessorManager enriches via the lookup instead of the mapper.
mapping.py stays in the repo but is no longer used by the manager; it (and geopandas) are slated for removal after one verified production cycle.
Why it's safe
- _validate() gate still crashes rather than shipping a hole.
Every difference is the FAO-contracted area-majority rule arriving end-to-end. See reports/enrichment_diff/report.md + maps.
Two findings worth a reviewer's eye
- country_iso_a3 = "-99" (Natural Earth's no-ISO sentinel), which passes the null-only gate. The new lookup removes all -99 (Somaliland → SOM per GAUL). A data-quality fix.
Both align the data with GAUL, FAO's own boundary product, which is the contractual outcome.
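The "-99" finding comes down to the difference between a null-only check and a check that also knows about sentinel strings. The sketch below is illustrative only and is not the repository's _validate(): the function name find_bad_cells, the gid column name, and the sentinel set are assumptions made for the example; only country_iso_a3 and the "-99" value come from the text above.

```python
import pandas as pd

# Assumption: the only sentinel named in this PR is the string "-99".
SENTINELS = ["-99"]


def find_bad_cells(df, cols, sentinels=SENTINELS):
    """Return the rows where any checked column is null or holds a sentinel."""
    subset = df[list(cols)]
    is_null = subset.isna()
    # Compare as plain strings so an integer -99 is caught as well as "-99".
    is_sentinel = subset.astype(str).isin(list(sentinels))
    return df[(is_null | is_sentinel).any(axis=1)]


# Hypothetical three-cell frame: one good code, one sentinel, one null.
cells = pd.DataFrame(
    {
        "gid": [1, 2, 3],
        "country_iso_a3": ["SOM", "-99", None],
    }
)

null_only_hits = int(cells["country_iso_a3"].isna().sum())  # sees only gid 3
bad = find_bad_cells(cells, ["country_iso_a3"])  # sees gid 2 and gid 3
```

A null-only gate flags one of the three rows here, while the sentinel-aware check flags two. With the lookup, the report is that no "-99" values remain, so such a check would serve as a regression guard, not a fix.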
Verification
- ruff clean; functional suite 109 passed (the 43 red test_falsification_* are pre-existing by-design audit stubs, unmodified here).
- _append_metadata and _validate are covered by replica tests (tests/test_append_metadata.py, tests/test_validation.py).
Out of scope (separate, later)
- REGION = "land_gaul" in views-models): Stage 4 / views-models#127.
- mapping.py + dropping geopandas: after one verified production cycle.
🤖 Generated with Claude Code
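For reviewers who want a concrete picture of the table join described under "What this does", here is a minimal sketch of merge-by-cell-id enrichment followed by a fail-loud completeness check. It is not the GaulLookupEnricher code: the function names enrich_by_gid and require_complete, the gid and pred column names, and the toy data are all assumptions made for illustration.

```python
import pandas as pd


def enrich_by_gid(predictions, lookup, key="gid"):
    """Left-join lookup columns onto predictions by cell id.

    validate="many_to_one" makes pandas raise MergeError if the lookup
    contains duplicate cell ids, so one cell cannot fan out into two rows.
    """
    return predictions.merge(lookup, on=key, how="left", validate="many_to_one")


def require_complete(df, cols):
    """Raise if any metadata column is null, i.e. a cell missed the lookup."""
    missing = df[list(cols)].isna().any(axis=1)
    if missing.any():
        raise ValueError(f"{int(missing.sum())} cell(s) have no lookup metadata")
    return df


# Hypothetical two-cell lookup and three prediction rows.
lookup = pd.DataFrame({"gid": [10, 11], "country_iso_a3": ["SOM", "KEN"]})
preds = pd.DataFrame({"gid": [10, 11, 10], "pred": [0.1, 0.2, 0.3]})

enriched = require_complete(enrich_by_gid(preds, lookup), ["country_iso_a3"])
```

A left join keeps every prediction row, so a cell absent from the lookup surfaces as a null and is rejected by the check instead of being silently dropped, which mirrors the "unknown cell -> null -> validation catches" behaviour described in the commits.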