[UMBRELLA] FAO global delivery: lookup-based enrichment (ADR-011) + global coverage #20

# [UMBRELLA] FAO global delivery: lookup-based enrichment (ADR-011) + global coverage

## Why this exists

The UN FAO postprocessing pipeline must deliver **global historical conflict data** to the FAO Appwrite prediction store, followed shortly by two or three additional Appwrite-based prediction stores. Today the pipeline covers only `africa_me_legacy` (13,110 PRIO-GRID cells) and enriches predictions with geographic metadata via a 3,171-line runtime spatial mapper (`mapping.py`) that loads 774 MB of Git-LFS shapefiles at import time.

A full cross-repo investigation (2026-06-12, see `docs/cross_repo_integration_report.md` in this repo) established that scaling the runtime mapper to global coverage is the **higher-risk** path. Its behavior at 64,818 cells has never been observed, it cannot be tested outside the production machine (the shapefiles are LFS stubs everywhere else), and its failure modes (unknown runtime, unknown memory, unknown Natural-Earth coverage gaps) would surface only at runtime, hours into a delivery run. By contrast, the precomputed lookup table proposed by **ADR-011** can be verified completely in advance: views-datafactory has shipped area-majority GAUL assignments (its issue #115 → PR #127, v1.2.28/29), and all 7 source parquets were regenerated on 2026-06-11 with 259,200 rows each. We verified directly that **64,736 of the 64,818 land cells** have complete metadata; exactly **82 cells** (sub-Antarctic islands not covered by FAO GAUL 2024) are unassigned.
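The coverage claim above is the kind of check that can be scripted ahead of any delivery run. The following is a minimal sketch on toy data, not the check that was actually run: the `gid` key, the two-column `META_COLUMNS` subset, and the `coverage_report` helper are all assumptions made for illustration. On the real lookup the same logic would be expected to report 64,818 land cells, 64,736 complete, and 82 unassigned.

```python
import pandas as pd

# Hypothetical subset of the metadata columns, kept short for the toy data.
META_COLUMNS = ["country_iso_a3", "admin1_gaul1_code"]


def coverage_report(lookup, land_gids):
    """Count land cells with complete metadata and list the ones without."""
    # Restrict to land cells only (the key name "gid" is an assumption).
    land = lookup[lookup["gid"].isin(list(land_gids))]
    # A cell is complete only if every metadata column is populated.
    complete = land[META_COLUMNS].notna().all(axis=1)
    return {
        "land_cells": len(land),
        "complete": int(complete.sum()),
        "unassigned_gids": sorted(land.loc[~complete, "gid"].tolist()),
    }


# Toy lookup: four cells, three of them land, one land cell without metadata.
lookup = pd.DataFrame({
    "gid": [101, 102, 103, 104],
    "country_iso_a3": ["KEN", "KEN", None, None],
    "admin1_gaul1_code": [51.0, 52.0, None, None],
})
report = coverage_report(lookup, land_gids={101, 102, 103})
```

Listing the unassigned ids, rather than only counting them, makes it possible to confirm that the gap is exactly the expected set of cells and not a different set of the same size.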

This umbrella tracks the agreed plan ("Plan C"): test the unchanged pipeline on africa_me first, use that run's output as the verification baseline, swap the enrichment engine, then go global.

## The plan (Plan C)

| Stage | Issue | What happens | Behavioral change? | Definition of done |
|---|---|---|---|---|
| 0. Baseline | #21 (this repo) | Run the pipeline **unchanged** on africa_me_legacy on the production machine; archive output as ground truth | None | Green run; baseline parquet + schema archived |
| 1. Build | #22 (this repo) | Lookup parquet from datafactory's 7 area-majority parquets + merge-by-gid enricher + tests | None (purely additive) | pytest green incl. coverage tests |
| (parallel) | views-platform/views-datafactory#159 | New bundled curated region `land_gaul` (land ∩ GAUL coverage, 64,736 cells) | None (purely additive) | Test pins count; version released |
| 2. Diff | #23 (this repo) | Shadow comparison: Stage-0 baseline vs enricher output, same 13,110 cells, all 9 columns | None | Diff report, zero unexplained differences |
| 3. Swap | #24 (this repo) | `_append_metadata` uses the enricher (one method body) | **The only behavioral change in the plan** | africa_me green; schema identical to baseline |
| 4. Go global | views-platform/views-models#127 | `REGION = "land_gaul"` (one config line); dry run; deliver; FAO release note | Coverage change | faoapi serves global historical; FAO notified |
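To make Stages 1 and 2 concrete, here is a rough sketch of what a merge-by-gid enricher and a shadow comparison could look like in pandas. It is an illustration under assumptions, not the implementation tracked in the child issues: the function names `enrich` and `shadow_diff`, the `gid` key, and the one-row-per-cell shape of the frames passed to the diff are all hypothetical.

```python
import pandas as pd


def enrich(predictions, lookup, key="gid"):
    """Attach precomputed metadata to predictions with a left join."""
    # validate="many_to_one" raises if the lookup has duplicate keys,
    # so a bad lookup cannot silently multiply prediction rows.
    return predictions.merge(lookup, on=key, how="left", validate="many_to_one")


def shadow_diff(baseline, candidate, columns, key="gid"):
    """List every (cell, column) where the two metadata frames disagree."""
    merged = baseline[[key, *columns]].merge(
        candidate[[key, *columns]],
        on=key, how="outer", suffixes=("_base", "_new"),
    )
    rows = []
    for col in columns:
        old, new = merged[f"{col}_base"], merged[f"{col}_new"]
        # Two missing values count as equal; a cell absent on one side
        # appears as missing-vs-present and is reported as a difference.
        differs = ~((old == new) | (old.isna() & new.isna()))
        for gid, o, n in zip(merged.loc[differs, key], old[differs], new[differs]):
            rows.append({"gid": gid, "column": col, "baseline": o, "candidate": n})
    return pd.DataFrame(rows, columns=["gid", "column", "baseline", "candidate"])


# Toy data: two cells, three prediction rows.
lookup = pd.DataFrame({"gid": [1, 2], "country_iso_a3": ["KEN", "ETH"]})
predictions = pd.DataFrame({"gid": [1, 1, 2], "pred": [0.1, 0.2, 0.3]})
enriched = enrich(predictions, lookup)

# Pretend the old mapper assigned cell 2 to a different country.
baseline = pd.DataFrame({"gid": [1, 2], "country_iso_a3": ["KEN", "SOM"]})
diff = shadow_diff(baseline, lookup, columns=["country_iso_a3"])
```

In this toy run the diff has a single row for cell 2. In the real Stage 2 report, each such row would need an explanation (for example, an area-majority attribution change) before the swap is allowed to proceed.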

**Child issues:**
- A (views-datafactory): `land_gaul` region — views-platform/views-datafactory#159
- C (Stage 0 baseline): #21
- D (Stage 1 build): #22
- E (Stage 2 diff): #23
- F (Stage 3 swap): #24
- G (views-models, Stage 4 go global): views-platform/views-models#127

## Dependency graph

```
views-datafactory A (region, additive) ──→ pip install --upgrade datafactory_query on prod machine ──→ G (config flip, LAST)
C (baseline run) ──┐
                   ├──→ E (shadow diff) ──→ F (swap) ──→ G
D (lookup build) ──┘
```

A, C, and D are fully independent and can run in parallel starting immediately. The only ordering constraints: the diff (E) needs both the baseline (C) and the enricher (D); the swap (F) is gated on a clean diff; the flip (G) goes last and requires both the datafactory release installed on the production machine and the swap verified.
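The ordering constraints can be checked mechanically. The sketch below encodes the graph with the standard-library `graphlib` module and groups the child issues into waves that may run in parallel. The `DEPENDS_ON` mapping and the `execution_waves` helper are illustrative names, and the package upgrade on the production machine is folded into the A → G edge rather than modeled as its own node.

```python
from graphlib import TopologicalSorter

# Each key depends on every issue in its value set.
DEPENDS_ON = {
    "A": set(),        # datafactory region
    "C": set(),        # baseline run
    "D": set(),        # lookup build
    "E": {"C", "D"},   # shadow diff needs baseline and enricher
    "F": {"E"},        # swap is gated on a clean diff
    "G": {"A", "F"},   # config flip goes last
}


def execution_waves(graph):
    """Group nodes into successive waves of mutually independent work."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises CycleError if the graph has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


waves = execution_waves(DEPENDS_ON)
# waves == [["A", "C", "D"], ["E"], ["F"], ["G"]]
```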

## Splash zone

| Repo | Change | Risk profile |
|---|---|---|
| views-datafactory | New bundled region (json + regions.py entry + script + test). **Purely additive.** | Low — `land` and `africa_me_legacy` untouched; cannot break existing consumers |
| views-postprocessing | Lookup + enricher + tests (additive), then **one method body** in `unfao.py:_append_metadata` | Contained — one commit, instantly revertable, gated by the Stage-2 diff |
| views-models | **One config line** (`config_queryset.py:20`) | Trivial code-wise; it IS the go-global act, so it goes last |
| **views-pipeline-core** | **FROZEN — zero changes.** The dataloader passes the region string through untouched (`dataloaders.py:1182`). Its known issues (postprocessing register C-26 fillna, C-27 swallowed exceptions, C-28 zarr timeout, C-29 cache side-channel) are real and deliberately deferred: pipeline-core has the widest blast radius in the platform (every model depends on it), and none of these is on the delivery critical path. | — |
| **views-faoapi** | **FROZEN — zero changes. This is the acceptance criterion**, not an accident: the lookup must reproduce the 9 metadata columns exactly (they are hard-validated at `handlers.py:1146-1163`), making the entire engine swap invisible downstream. | — |
| Appwrite | No schema/bucket/collection changes; new files through the existing path only | — |

## The schema contract (do not touch)

The 9 metadata columns are independently hard-coded at three enforcement points — postprocessor selection (`unfao.py:141-153`), postprocessor validation (`unfao.py:189-197`), and faoapi dataset validation (`handlers.py:1146-1156`):

```
pg_xcoord, pg_ycoord, country_iso_a3,
admin1_gaul1_code, admin1_gaul1_name, admin1_gaul0_code, admin1_gaul0_name,
admin2_gaul2_code, admin2_gaul2_name
```

Any rename anywhere breaks views-faoapi with HTTP 500 on the first request. No renaming is in scope for this delivery (the separate FAO-contract-names question, register C-24/D-06, is explicitly deferred).
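A test-side guard for this contract might look like the sketch below. It does not reproduce the code at any of the three enforcement points; `METADATA_COLUMNS`, `missing_metadata_columns`, and `assert_schema_contract` are hypothetical names, and only the nine column names come from the list above.

```python
# The nine contract columns, in the order listed in this issue.
METADATA_COLUMNS = (
    "pg_xcoord", "pg_ycoord", "country_iso_a3",
    "admin1_gaul1_code", "admin1_gaul1_name",
    "admin1_gaul0_code", "admin1_gaul0_name",
    "admin2_gaul2_code", "admin2_gaul2_name",
)


def missing_metadata_columns(columns):
    """Return the contract columns absent from `columns`, in contract order."""
    present = set(columns)
    return [name for name in METADATA_COLUMNS if name not in present]


def assert_schema_contract(columns):
    """Fail loudly before delivery rather than with an HTTP 500 afterwards."""
    missing = missing_metadata_columns(columns)
    if missing:
        raise ValueError(f"metadata columns missing: {missing}")


# A renamed column is reported as missing under its contract name.
renamed = [c if c != "country_iso_a3" else "iso3" for c in METADATA_COLUMNS]
problems = missing_metadata_columns(renamed)
```

A check of this shape fits naturally into the Stage 1 tests and the Stage 3 definition of done ("schema identical to baseline"), applied to the column list of the enricher output.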

## Risk register cross-references (this repo, `reports/technical_risk_register.md`)

- **C-30** (Tier 1): the 82 GAUL-uncovered land cells — resolved upstream via the datafactory region (D-10)
- **C-31** (Tier 2): runtime mapper unverified at global scale — the reason for the lookup-first adjudication (D-08)
- **C-32** (Tier 2): memory at ~28M rows — categorical dtypes in the enricher + mandatory dry run before delivery
- **C-33** (Tier 2): hardcoded store identity — deferred to the multi-store phase (D-09)
- **C-34** (Tier 2): coverage has no contract — addressed by the coverage tests in Stage 1
- **D-08**: global via mapper vs lookup — adjudicated lookup-first
- **D-09**: multi-store now vs after — adjudicated after; only exception: delete dead config blocks during the swap
- **D-10**: the 82 cells — resolved: upstream curated region, postprocessor keeps zero spatial knowledge

## Explicitly out of scope (with their "when")

1. **views-pipeline-core fixes** (C-26 fillna data fabrication, C-27 swallowed constructor exceptions, C-28 zarr timeout, C-29 disk side-channel): after delivery; C-26 first — it is Tier 1.
2. **Multi-store `DeliveryProfile`** (C-33): the calm week after FAO global ships. One manager class, N store configs; never copy-paste the manager.
3. **Deleting `mapping.py`**: after one verified production cycle on the lookup.
4. **Column renaming to FAO contract names** (C-24): separate coordinated 2-repo change + FAO sign-off, never bundled with this delivery.

## Definition of green for the whole umbrella

1. faoapi `/data/historical/latest` serves global (64,736-cell) historical data with the baseline column list.
2. Every stage's DoD met and checked off in its child issue.
3. FAO has received one combined release note: global coverage, area-majority attribution changes, the 82 excluded island cells, and the historical-global vs forecast-regional coverage asymmetry.

