Stage 2 — Shadow diff: runtime mapper baseline vs lookup enricher on africa_me_legacy
Part of the FAO global delivery plan — umbrella: #20. Gated on Stage 0 (#21, provides the baseline) and Stage 1 (#22, provides the enricher). This is the verification gate before any behavioral change — Stage 3 (#LINK_F) does not start until this report shows zero unexplained differences.
Why this exists
The engine swap changes not just how the 9 metadata columns are computed but, for a minority of cells, what values they hold. That is intentional: the lookup implements the FAO-contracted area-majority rule from a single consistent boundary source. But every changed value must be explained before it ships, not discovered by FAO. The africa_me_legacy baseline from Stage 0 is real production output on real input, so diffing against it fully characterizes the swap at current coverage before the global rollout multiplies the stakes.
Spec
A diff harness script (scripts/diff_enrichment.py or similar, committed):
- Load the Stage-0 baseline enriched output (historical dataframe; 13,110 unique gids).
- Run the Stage-1 enricher on the same gid list.
- Compare all 9 metadata columns per gid. Output: per-column match counts, and a per-gid record for every difference.
- Classify every difference into one of the expected classes:

  | Class | Cause | Expected scale (from the cross-repo investigation) |
  |---|---|---|
  | ISO source switch | country_iso_a3: mapper uses Natural Earth, lookup uses GAUL (disputed/differently-drawn borders) | Small; concentrated at known disputed territories (e.g. Western Sahara, Abyei, Hala'ib) |
  | Border-cell algorithm | Mapper = NE-country-first then GAUL-within-country (sequential); lookup = single GAUL L2 area-majority | Subset of the ~711 cells (5.4%) where the platform's algorithms historically disagreed |
  | Coastal completion | Cells where one engine assigns and the other doesn't (incl. how the baseline handled the 5 ocean cells; see Stage 0 notes) | Up to ~149 coastal-class cells |
  | UNEXPLAINED | Anything not in the above | Must be zero |
- Write reports/enrichment_diff_report.md: counts per class, methodology, and the full per-cell difference list as a CSV artifact alongside it.
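A minimal sketch of the compare-and-classify core of such a harness, assuming pandas, a gid key column, and that the caller passes the 9 metadata column names (only country_iso_a3 is named in this issue; any other column names are placeholders). The three gid sets given to classify (disputed_gids, border_gids, coastal_gids) are hypothetical inputs standing in for the disputed-territory list, the ~711 historically-disagreeing cells, and the ~149 coastal-class cells. The rule order is an assumption, not a settled precedence.

```python
import pandas as pd

# Class labels mirror the table in this issue.
ISO_SWITCH = "ISO source switch"
BORDER = "Border-cell algorithm"
COASTAL = "Coastal completion"
UNEXPLAINED = "UNEXPLAINED"


def diff_enrichment(baseline: pd.DataFrame, lookup: pd.DataFrame,
                    metadata_cols: list, key: str = "gid"):
    """Return (per-column match counts, one row per differing (gid, column))."""
    merged = baseline[[key, *metadata_cols]].merge(
        lookup[[key, *metadata_cols]], on=key, how="outer",
        suffixes=("_baseline", "_lookup"),
        validate="one_to_one",  # fails loudly on duplicate gids
    )
    matches, diffs = {}, []
    for col in metadata_cols:
        old = merged[f"{col}_baseline"]
        new = merged[f"{col}_lookup"]
        # Equal values match; so do two nulls (neither engine assigned).
        same = old.eq(new).fillna(False).astype(bool) | (old.isna() & new.isna())
        matches[col] = int(same.sum())
        d = merged.loc[~same, [key]].copy()
        d["column"] = col
        d["baseline"] = old[~same]
        d["lookup"] = new[~same]
        diffs.append(d)
    return pd.Series(matches, name="matches"), pd.concat(diffs, ignore_index=True)


def classify(diffs: pd.DataFrame, disputed_gids: set, border_gids: set,
             coastal_gids: set, key: str = "gid") -> pd.DataFrame:
    """Label each difference with exactly one pre-declared class."""
    def rule(row: dict) -> str:
        one_sided = pd.isna(row["baseline"]) != pd.isna(row["lookup"])
        if one_sided and row[key] in coastal_gids:
            return COASTAL
        if row["column"] == "country_iso_a3" and row[key] in disputed_gids:
            return ISO_SWITCH
        if row[key] in border_gids:
            return BORDER
        return UNEXPLAINED  # anything else is a bug until proven otherwise

    out = diffs.copy()
    out["diff_class"] = [rule(r) for r in out.to_dict("records")]
    return out
```

The outer merge means a gid missing from either side surfaces as differences instead of being silently dropped. Rules are checked in a fixed order, so a cell that qualifies for two classes gets the first match; if that precedence affects the release-note counts, it should be fixed before the run, not after seeing the results.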
Reading the result
- Zero unexplained → Stage 3 is unblocked. The explained classes are not regressions; they are the contractually-correct behavior arriving (area-majority per FAO Release Note 02). Their counts feed the FAO release note (Stage 4, #LINK_G).
- Any unexplained difference → stop. Diagnose in this issue. Plausible causes: dtype drift in the lookup build, a rename-map error, a stale source parquet. Do not rationalize after the fact — if it does not fit a pre-declared class, it is a bug until proven otherwise.
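The gate itself, and two of the suspected causes listed above, can be checked mechanically. A sketch assuming pandas and a classified diff table with a diff_class column holding the class names from the table above; check_gate and quick_diagnostics are hypothetical names. quick_diagnostics only flags columns missing from either side (a likely symptom of a rename-map error) and columns whose dtype differs (dtype drift). It cannot detect a stale source parquet, which needs provenance checks outside this sketch.

```python
import sys

import pandas as pd


def check_gate(labeled: pd.DataFrame, class_col: str = "diff_class") -> int:
    """Exit code for CI: 0 if no UNEXPLAINED differences, 1 otherwise."""
    n_bad = int((labeled[class_col] == "UNEXPLAINED").sum())
    if n_bad == 0:
        print("0 unexplained differences: Stage 3 gate passes")
        return 0
    print(f"STOP: {n_bad} unexplained differences; diagnose before Stage 3",
          file=sys.stderr)
    return 1


def quick_diagnostics(baseline: pd.DataFrame, lookup: pd.DataFrame,
                      cols: list) -> dict:
    """Cheap checks for two of the suspected causes (hypothetical helper)."""
    # A column absent on one side often points at a rename-map error.
    missing = {
        "baseline": [c for c in cols if c not in baseline.columns],
        "lookup": [c for c in cols if c not in lookup.columns],
    }
    # Same column, different dtype (e.g. int codes vs string codes) = dtype drift.
    drift = {
        c: (str(baseline[c].dtype), str(lookup[c].dtype))
        for c in cols
        if c in baseline.columns and c in lookup.columns
        and str(baseline[c].dtype) != str(lookup[c].dtype)
    }
    return {"missing": missing, "dtype_drift": drift}
```

In a CI job this would run as sys.exit(check_gate(labeled)), so an unexplained difference fails the build rather than being read past.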
Definition of done
- reports/enrichment_diff_report.md committed with per-class counts + CSV artifact of all differing cells
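A sketch of producing both artifacts, assuming a classified diff table (one row per differing gid/column pair, with a diff_class column) and a pandas Series of per-column match counts. write_report and the CSV filename are hypothetical choices. The methodology section is left for a human to write, since it has to describe how the class rules were pre-declared. Listing all four classes, including zero counts, keeps the UNEXPLAINED row visible in the report even when it passes.

```python
from pathlib import Path

import pandas as pd

CLASSES = ["ISO source switch", "Border-cell algorithm",
           "Coastal completion", "UNEXPLAINED"]


def write_report(labeled: pd.DataFrame, matches: pd.Series,
                 out_dir: str = "reports"):
    """Write the markdown summary plus the full per-cell CSV."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path = out / "enrichment_diff_cells.csv"  # hypothetical artifact name
    labeled.to_csv(csv_path, index=False)

    # Every class appears, with 0 where nothing landed in it.
    counts = labeled["diff_class"].value_counts().reindex(CLASSES, fill_value=0)
    lines = ["# Enrichment diff report", "",
             "## Differences per class", "",
             "| Class | Differing (gid, column) pairs |", "|---|---|"]
    lines += [f"| {cls} | {int(n)} |" for cls, n in counts.items()]
    lines += ["", "## Matches per column", "",
              "| Column | Matching gids |", "|---|---|"]
    lines += [f"| {col} | {int(n)} |" for col, n in matches.items()]
    lines += ["", f"Full per-cell difference list: `{csv_path.name}`", ""]

    md_path = out / "enrichment_diff_report.md"
    md_path.write_text("\n".join(lines), encoding="utf-8")
    return md_path, csv_path
```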