Integration test: mapper-to-manager boundary contract (filter_cols) #4

@Polichinel

Context

The UNFAOPostProcessorManager._append_metadata() method (unfao.py:140-163) calls PriogridCountryMapper.enrich_dataframe_with_pg_info(), selects the columns listed in filter_cols (the two index columns plus exactly 9 metadata columns), re-indexes, and joins the result back onto the original dataset. This is the single most important boundary in the codebase: the mapper produces enrichment data, and the manager consumes it.

Currently, the mapper and manager are tested in isolation:

  • test_mapping.py tests the mapper's spatial logic with synthetic shapefiles
  • test_validation.py tests the validation logic with a replicated function

No test verifies the boundary between them. If the mapper changes a column name at any step of the 3-step derivation (shapefile ISO_A3 → dict key iso_a3 → prefixed country_iso_a3), the mapper tests still pass, the validation tests still pass, and production breaks.
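A minimal sketch of why this contract is fragile. The helper names and the lowercase-then-prefix rule are assumptions inferred from the derivation described above, not the mapper's actual code:

```python
from __future__ import annotations

# One of the names the manager selects via filter_cols.
EXPECTED = ["country_iso_a3"]


def derive_column_name(shapefile_field: str, prefix: str = "country") -> str:
    """Hypothetical 3-step derivation: shapefile field -> dict key -> prefixed column."""
    dict_key = shapefile_field.lower()   # ISO_A3 -> iso_a3
    return f"{prefix}_{dict_key}"        # iso_a3 -> country_iso_a3


def missing_columns(produced: list[str], expected: list[str]) -> list[str]:
    """Return the expected names that the producer no longer emits."""
    return [name for name in expected if name not in produced]


# Today the producer and the consumer agree.
produced_today = [derive_column_name("ISO_A3")]
# A harmless-looking prefix change on the mapper side breaks the consumer.
produced_after_rename = [derive_column_name("ISO_A3", prefix="ctry")]

print(missing_columns(produced_today, EXPECTED))
print(missing_columns(produced_after_rename, EXPECTED))
```

Neither side's isolated tests would notice the second case, because each side only checks names it defines itself.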

This is the #1 priority test identified by the falsification campaign (Claim 3.4, FALSIFIED) and the prioritization review.

Why now

ADR-011 plans to replace the runtime mapper with a precomputed Parquet lookup table. When that happens, this integration test becomes the acceptance test for the new enricher — it verifies that the replacement produces the same output the manager expects. Writing it BEFORE the replacement ensures we have a safety net during the transition.

Requirements

Test: test_integration_mapper_to_manager_filter_cols

Setup:

  • Use the existing mapper fixture from conftest.py (synthetic shapefiles)
  • Create a DataFrame shaped like the input _append_metadata expects: a multi-index of (month_id, priogrid_gid) with at least 3 rows covering cells in different countries

Act:

  • Call mapper.enrich_dataframe_with_pg_info() with the same parameters _append_metadata uses:
    • pg_id_col=<entity_id_column_name>
    • time_id_col=<time_id_column_name>
    • only_metadata=True
    • batch_size=1000
  • Select the exact filter_cols list from the enriched result (copy the list from unfao.py:146-157):
    filter_cols = [
        "month_id", "priogrid_gid",
        "pg_xcoord", "pg_ycoord", "country_iso_a3",
        "admin1_gaul1_code", "admin1_gaul1_name",
        "admin1_gaul0_code", "admin1_gaul0_name",
        "admin2_gaul2_code", "admin2_gaul2_name",
    ]
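The setup and act steps could be sketched roughly as follows. StubMapper is a hypothetical stand-in for the conftest.py mapper fixture, its metadata values are invented, and the assumption that the method takes the DataFrame as its first positional argument is not confirmed by this issue:

```python
import pandas as pd

# Copied from the list in this issue (unfao.py:146-157).
FILTER_COLS = [
    "month_id", "priogrid_gid",
    "pg_xcoord", "pg_ycoord", "country_iso_a3",
    "admin1_gaul1_code", "admin1_gaul1_name",
    "admin1_gaul0_code", "admin1_gaul0_name",
    "admin2_gaul2_code", "admin2_gaul2_name",
]


class StubMapper:
    """Hypothetical stand-in for the real mapper fixture."""

    # Invented metadata for three grid cells in two synthetic countries.
    _ROWS = [
        (1, 0.25, 0.25, "AAA", 10, "A-North", 1, "Aland", 100, "A-N-1"),
        (2, 0.75, 0.25, "AAA", 11, "A-South", 1, "Aland", 110, "A-S-1"),
        (3, 1.25, 0.25, "BBB", 20, "B-North", 2, "Bland", 200, "B-N-1"),
    ]

    def enrich_dataframe_with_pg_info(self, df, pg_id_col, time_id_col,
                                      only_metadata=True, batch_size=1000):
        # only_metadata and batch_size are accepted but unused in this stub.
        meta = pd.DataFrame(self._ROWS, columns=[pg_id_col, *FILTER_COLS[2:]])
        flat = df.reset_index()[[time_id_col, pg_id_col]]
        return flat.merge(meta, on=pg_id_col, how="left")


# Setup: multi-indexed input with three rows in two countries.
index = pd.MultiIndex.from_tuples(
    [(500, 1), (500, 2), (501, 3)], names=["month_id", "priogrid_gid"]
)
df = pd.DataFrame({"pred": [0.1, 0.2, 0.3]}, index=index)

# Act: enrich, select filter_cols, re-index, join back.
enriched = StubMapper().enrich_dataframe_with_pg_info(
    df, pg_id_col="priogrid_gid", time_id_col="month_id",
    only_metadata=True, batch_size=1000,
)
selected = enriched[FILTER_COLS].set_index(["month_id", "priogrid_gid"])
joined = df.join(selected)
```

In the real test the stub would be replaced by the session-scoped mapper fixture; the selection and join lines are the part that exercises the boundary.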

Assert:

  1. All 9 metadata columns (excluding the index columns) are present in the enriched DataFrame
  2. None of the 9 columns contain null values for GIDs that exist in the synthetic shapefiles
  3. country_iso_a3 contains valid ISO codes from the synthetic data ("AAA" or "BBB")
  4. admin1_gaul1_code is an integer (not float, not string)
  5. pg_xcoord and pg_ycoord are floats
  6. The enriched DataFrame has the same row count as the input
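One possible shape for these six assertions, written as a reusable helper. The helper name and the hand-built sample frame are illustrative assumptions; the real test would pass in the mapper's actual output:

```python
import pandas as pd
from pandas.api import types as ptypes

METADATA_COLS = [
    "pg_xcoord", "pg_ycoord", "country_iso_a3",
    "admin1_gaul1_code", "admin1_gaul1_name",
    "admin1_gaul0_code", "admin1_gaul0_name",
    "admin2_gaul2_code", "admin2_gaul2_name",
]


def check_enriched(enriched, n_input_rows, valid_iso=("AAA", "BBB")):
    """Hypothetical helper applying assertions 1-6 to an enriched frame."""
    missing = [c for c in METADATA_COLS if c not in enriched.columns]
    assert not missing, f"missing columns: {missing}"                   # 1
    assert not enriched[METADATA_COLS].isna().any().any()               # 2
    assert set(enriched["country_iso_a3"]) <= set(valid_iso)            # 3
    assert ptypes.is_integer_dtype(enriched["admin1_gaul1_code"])       # 4
    assert ptypes.is_float_dtype(enriched["pg_xcoord"])                 # 5
    assert ptypes.is_float_dtype(enriched["pg_ycoord"])                 # 5
    assert len(enriched) == n_input_rows                                # 6


# Invented sample standing in for the mapper's output.
sample = pd.DataFrame(
    [
        (0.25, 0.25, "AAA", 10, "A-North", 1, "Aland", 100, "A-N-1"),
        (0.75, 0.25, "AAA", 11, "A-South", 1, "Aland", 110, "A-S-1"),
        (1.25, 0.25, "BBB", 20, "B-North", 2, "Bland", 200, "B-N-1"),
    ],
    columns=METADATA_COLS,
)
check_enriched(sample, n_input_rows=3)
```

The integer-dtype check does double duty: in pandas, a join that introduces NaN into an int64 column upcasts it to float64, so assertion 4 would also flag a silently failed lookup.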

Test: test_integration_enrichment_then_validation

Setup:

  • Same as above — enrich a DataFrame through the mapper

Act:

  • After enrichment, run the validation logic (the same checks _validate() performs) against the enriched DataFrame

Assert:

  1. All 9 required metadata columns are present
  2. No null values in any required column
  3. Passing checks 1 and 2 implies the mapper's output would pass _validate() in production
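A sketch of what a replicated validation step might look like. The function below is an assumption based only on the two checks listed here (columns present, no nulls); it is not the actual body of _validate():

```python
import pandas as pd

REQUIRED_COLS = [
    "pg_xcoord", "pg_ycoord", "country_iso_a3",
    "admin1_gaul1_code", "admin1_gaul1_name",
    "admin1_gaul0_code", "admin1_gaul0_name",
    "admin2_gaul2_code", "admin2_gaul2_name",
]


def validate_metadata(df, required=REQUIRED_COLS):
    """Hypothetical replica: raise if a required column is absent or has nulls."""
    missing = [c for c in required if c not in df.columns]
    if missing:
        raise ValueError(f"missing required columns: {missing}")
    null_counts = df[required].isna().sum()
    with_nulls = null_counts[null_counts > 0]
    if not with_nulls.empty:
        raise ValueError(f"null values in required columns: {with_nulls.to_dict()}")


# Placeholder frames; only presence and nullness matter for this check.
good = pd.DataFrame({c: [1, 2] for c in REQUIRED_COLS})
with_null = good.assign(country_iso_a3=["AAA", None])
without_column = good.drop(columns=["pg_xcoord"])

validate_metadata(good)  # passes silently
```

Feeding the mapper's enriched output into such a function, rather than a hand-built frame, is what turns the existing isolated validation test into a boundary test.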

Acceptance criteria

  • Both tests pass with the existing synthetic shapefile fixtures
  • Both tests are in a new file: tests/test_integration.py
  • The filter_cols list is copied verbatim from unfao.py rather than written out independently, or imported from a shared constant if one is created
  • Tests use the existing mapper session fixture, not a new mapper instance
  • Tests run in <1 second (no shapefile loading per test — session fixture handles it)
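If the shared-constant route is taken, it might look something like the sketch below. The constant names and the idea of a common module are assumptions; no such module is described in this issue:

```python
# Hypothetical shared constants that both unfao.py and tests/test_integration.py
# could import, so the column list exists in exactly one place.
INDEX_COLS = ("month_id", "priogrid_gid")

METADATA_COLS = (
    "pg_xcoord", "pg_ycoord", "country_iso_a3",
    "admin1_gaul1_code", "admin1_gaul1_name",
    "admin1_gaul0_code", "admin1_gaul0_name",
    "admin2_gaul2_code", "admin2_gaul2_name",
)

# Same order as the filter_cols list quoted in this issue.
FILTER_COLS = [*INDEX_COLS, *METADATA_COLS]
```

Splitting index and metadata columns also makes the "9 metadata columns" count in the assertions checkable against the constant instead of a magic number.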

References

  • Falsification campaign Claim 3.4 (FALSIFIED): docs/falsification_campaign.md
  • Test stub: tests/test_falsification_campaign_3_4.py
  • Risk register: C-17 (implicit column naming contract), Cluster D (mapper-manager boundary)
  • CIC: docs/CICs/PriogridCountryMapper.md §5, docs/CICs/UNFAOPostProcessorManager.md §4
  • ADR-011: precomputed lookup table (this test becomes the acceptance test for the transition)
