Stage 0 — Baseline as-usual run on africa_me_legacy (production machine) #21

@Polichinel

Part of the FAO global delivery plan — umbrella: #20. No code changes in this issue.

Why this exists

Two reasons, and the second is the one that makes this more than "just a test":

  1. Prove the infrastructure works as it usually does. This run exercises everything the development environment cannot: real Git-LFS shapefiles, the datafactory zarr fetch over HTTP, Appwrite credentials, and the full _read → _transform → _validate → _save pipeline at the current africa_me_legacy coverage (13,110 cells). Green here means any later failure is caused by our changes, not by drifted infrastructure.
  2. Produce the ground-truth baseline for the engine swap. The enriched output of this run — produced by the current runtime mapper on real input — is the exact artifact the new lookup enricher will be diffed against in Stage 2 (#LINK_E). A real production output is the best possible verification data; this run creates it for free.

What to run

Run the pipeline unchanged on the production machine, with region africa_me_legacy, exactly as in a normal delivery run. There is one decision to make:

  • Recommended: skip the final Appwrite upload. Treating it as a normal upload is also acceptable if that is operationally simpler, because the bucket has no retention either way (see register C-25 narrative). Record in this issue which choice was made.

What to archive (the baseline artifact)

Create a baseline/ directory (local to the production machine or committed as a release artifact — record where) containing:

  1. The enriched historical output parquet (post-_transform, post-_validate)
  2. The enriched forecast output parquet
  3. A baseline_schema.md recording, for both dataframes: exact column list, dtypes per column, index structure (MultiIndex names and dtypes), row counts
  4. Operational numbers: wall-clock runtime of the full run, peak memory if observable, and per-stage timings if logs allow
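
The operational numbers in item 4 could be captured with a small wrapper like the one below. This is a sketch only: `timed`, `run_with_timings` and `peak_memory_raw` are hypothetical helper names, and the way the pipeline stages are exposed as callables is an assumption, not something the existing code is known to provide.

```python
import time
from contextlib import contextmanager

try:
    import resource  # Unix only; not available on Windows
except ImportError:
    resource = None


@contextmanager
def timed(stage, record):
    """Add the wall-clock seconds spent inside the block to record[stage]."""
    start = time.perf_counter()
    try:
        yield
    finally:
        record[stage] = record.get(stage, 0.0) + (time.perf_counter() - start)


def run_with_timings(stages):
    """Run (name, callable) pairs in order; return per-stage and total seconds.

    `stages` is a hypothetical list such as [("_read", read_fn), ...].
    """
    record = {}
    with timed("total", record):
        for name, fn in stages:
            with timed(name, record):
                fn()
    return record


def peak_memory_raw():
    """Peak resident set size of this process, or None if unavailable.

    The unit of ru_maxrss is platform dependent (kilobytes on Linux,
    bytes on macOS), so record the platform next to the number.
    """
    if resource is None:
        return None
    return resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
```

If the stages cannot be wrapped without a code change, the same numbers would have to come from log timestamps instead, since this issue is meant to involve no code changes.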

Item 3 is the acceptance reference for Stage 3 (#LINK_F: "schema identical to baseline") and Stage 4 (faoapi column-list verification). Item 4 is the comparison point for the global dry run (register C-32).
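
One possible way to produce baseline_schema.md is sketched below. The markdown layout, the function names `describe_frame` and `render_schema`, and the file names in the usage note are all assumptions for illustration; the issue does not prescribe a format. `describe_frame` only relies on standard pandas DataFrame attributes (`dtypes`, `index.names`, `index.get_level_values`).

```python
def describe_frame(df):
    """Collect schema facts from a pandas DataFrame (accessed duck-typed)."""
    index_levels = [
        (str(name), str(df.index.get_level_values(i).dtype))
        for i, name in enumerate(df.index.names)
    ]
    columns = [(str(col), str(dtype)) for col, dtype in df.dtypes.items()]
    return {"index": index_levels, "columns": columns, "rows": len(df)}


def render_schema(title, schema):
    """Render one dataframe's schema as a markdown section (hypothetical layout)."""
    lines = [
        "## " + title,
        "",
        "Rows: {}".format(schema["rows"]),
        "",
        "Index levels (in order):",
    ]
    lines += ["- {}: {}".format(name, dtype) for name, dtype in schema["index"]]
    lines += ["", "Columns (in order):"]
    lines += ["- {}: {}".format(name, dtype) for name, dtype in schema["columns"]]
    return "\n".join(lines) + "\n"
```

With pandas installed, usage might look like `render_schema("historical", describe_frame(pd.read_parquet("baseline/historical.parquet")))`, repeated for the forecast parquet and concatenated into one file. Keeping column and index order in the output matters, because a later "schema identical to baseline" check is only meaningful if order is part of what is compared.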

Definition of done

Notes

  • If this run is NOT green, stop the plan and diagnose here first — that is this issue doing its job. Likely suspects are environment drift (credentials, LFS state, zarr reachability), not code: the same code delivered successfully before.
  • The 5 africa_me ocean cells (gids 62356, 94776, 99027, 107733, 107742) have historically been part of the 13,110-cell input. If validation passes today, the current mapper must be assigning them some value, presumably as a side effect of Natural Earth polygon detail. Note in the baseline what values they carry, since the lookup-based future will exclude them via the land_gaul region story (views-platform/views-datafactory#159).
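
Of the environment-drift suspects named in the first note, LFS state is one that can be checked cheaply before the run. The sketch below flags shapefiles that are still Git LFS pointer stubs rather than real content. It relies on the documented pointer format, whose first line is `version https://git-lfs.github.com/spec/v1`; the function names and the `*.shp` pattern are assumptions, and it does not cover credentials or zarr reachability.

```python
from pathlib import Path

# First bytes of every Git LFS pointer file, per the LFS pointer spec.
LFS_POINTER_PREFIX = b"version https://git-lfs.github.com/spec/v1"


def is_lfs_pointer(path):
    """True if the file is an unresolved LFS pointer stub, not real content."""
    with open(path, "rb") as fh:
        return fh.read(len(LFS_POINTER_PREFIX)) == LFS_POINTER_PREFIX


def unresolved_lfs_files(root, pattern="*.shp"):
    """List files under root matching pattern that are still LFS pointers."""
    return sorted(
        p for p in Path(root).rglob(pattern)
        if p.is_file() and is_lfs_pointer(p)
    )
```

An empty result from `unresolved_lfs_files` on the repository root would suggest the shapefiles were pulled; a non-empty one points at `git lfs pull` not having run on the production machine.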
