diff --git a/docs/falsification_campaign.md b/docs/falsification_campaign.md new file mode 100644 index 0000000..7ef16bb --- /dev/null +++ b/docs/falsification_campaign.md @@ -0,0 +1,198 @@ +# Falsification Campaign: Business Logic Verification + +## Purpose + +Before making any further changes to this repository, we must prove that we: +1. Understand the business logic +2. Can demonstrate the code works correctly +3. Have sufficient tests (unit, functional, regression, integration, end-to-end) to catch drift and breakage + +This document defines a sequence of falsifiable claims. Each claim is audited via `/falsify`, findings are registered via `/register-risk`, and then we move to the next claim. + +## CRITICAL CONSTRAINT: READ-ONLY CAMPAIGN + +**This campaign does NOT modify source code.** Its purpose is to **catalogue what we know, what we don't know, and what is broken** — not to fix anything. + +- **Do NOT modify** `mapping.py`, `unfao.py`, or any source file +- **Do NOT modify** existing tests to make them pass +- **Do** write NEW test stubs (falsification stubs are documentation of gaps, not fixes) +- **Do** update the risk register (that IS the catalogue) +- **Do** update CICs/ADRs if a finding reveals the documentation is wrong about the code's behavior — the documentation should describe what the code DOES, not what we wish it did +- **"Fix" means:** register the finding, write a failing test stub, update docs to match reality. That's it. +- **Source code fixes happen AFTER the campaign**, informed by the complete catalogue of findings + +If a claim is FALSIFIED, we register the findings, note them in the progress tracker, and move to the next claim. We do NOT enter a fix-and-reaudit loop. The goal is a complete inventory, not incremental repair. + +## Execution Protocol + +For each claim: +1. Run `/falsify ` +2. Confirm the proposed probes +3. After the verdict, run `/register-risk` to capture any findings +4. Update the progress tracker with the verdict +5. Move to the next claim regardless of verdict (SURVIVED, CONTESTED, or FALSIFIED) + +## Campaign Structure + +The claims are organized in 4 layers. Each layer builds on the previous. A falsification in an earlier layer blocks later layers. + +--- + +## Layer 1 — Do we understand the business logic? + +### Claim 1.1 +``` +The PriogridCountryMapper CIC (sections 3, 4, 5, and 6) accurately describes +the mapper's actual behavior for all documented operations: the largest-overlap +algorithm, the make_valid preprocessing, the zero-area guards, the cache modes, +the spatial index usage, and the return types of find_country_for_gid, +find_admin1_for_gid, find_admin2_for_gid, and enrich_dataframe_with_pg_info. +``` + +### Claim 1.2 +``` +The UNFAOPostProcessorManager CIC accurately describes the pipeline's actual +behavior: the 4-stage sequence (read, transform, validate, save), the two data +sources (ViewsER for historical, Appwrite for forecast), the enrichment via +PriogridCountryMapper, the null validation with log-before-raise, and the +upload to the UN FAO Appwrite bucket with enrichment_description metadata. +``` + +### Claim 1.3 +``` +ADR-011's description of the area-majority algorithm, the sequential allocation +rule, the FAO-confirmed constraints, and the precomputed lookup table strategy +is consistent with what the code in mapping.py implements and what the FAO +release notes (Pre-release Note 02 Topic A, Release Note 01 Topic C, Release +Note 02 Topic A) specify. No contradiction exists between the three sources. +``` + +--- + +## Layer 2 — Does the code produce correct results? + +### Claim 2.1 +``` +For PRIO-GRID cells that are entirely within a single country (no border +overlap), the mapper assigns them to that country with an overlap_ratio +very close to 1.0, and the method field reads "largest overlap". +``` + +### Claim 2.2 +``` +For PRIO-GRID cells that span two countries, the mapper assigns them to +the country with the larger area share. The overlap_ratio in the result +reflects the actual geometric proportion of the cell covered by the +assigned country. The cell is NOT assigned to the country containing +the centroid if a different country has more area overlap. +``` + +### Claim 2.3 +``` +The sequential allocation invariant holds in the current code: every cell's +Admin 1 and Admin 2 assignments are within the country assigned at Admin 0. +find_admin1_for_gid first calls find_country_for_gid, then filters admin +regions to that country before computing overlap. No cross-border subnational +assignment is possible. +``` + +### Claim 2.4 +``` +The enrichment output schema produced by enrich_dataframe_with_pg_info +(column names and types) matches exactly what _append_metadata in unfao.py +expects in its filter_cols list. The 3-step derivation (shapefile column +to dict key to prefixed DataFrame column) produces the correct names for +all 9 metadata columns. +``` + +--- + +## Layer 3 — Do we have sufficient tests? + +### Claim 3.1 +``` +The unit tests in test_mapping.py cover all 7 CIC guarantees from +PriogridCountryMapper section 3 and all 6 CIC failure modes from section 6, +with at least one test per guarantee and one test per failure mode. No CIC +promise is untested. +``` + +### Claim 3.2 +``` +The validation tests in test_validation.py verify that the validation logic +rejects every category of invalid input (missing columns, null values in +each of the 9 required columns, NaN values) and accepts valid input. The +test logic matches the actual _validate() implementation in unfao.py — if +the real code diverges from the test replica, a developer would notice. +``` + +### Claim 3.3 +``` +A regression test exists for every code fix applied in this session (C-01 +null validation, C-18 warning suppression removal, C-20 zero-area guards, +make_valid preprocessing, batch failure tracking) such that reverting any +single fix would cause at least one test to fail. +``` + +### Claim 3.4 +``` +An integration test exists that enriches a DataFrame through the mapper +(enrich_dataframe_with_pg_info) and then checks the output against the +manager's filter_cols list and null validation rules — testing the full +mapper-to-manager data path in a single test. +``` + +### Claim 3.5 +``` +An end-to-end test exists or can be constructed that exercises the full +pipeline from _read() through _save(), verifying that the output parquet +file has the correct schema (all 9 metadata columns present and non-null) +and that the upload metadata includes the enrichment_description timestamp. +``` + +--- + +## Layer 4 — Can we detect drift and breakage? + +### Claim 4.1 +``` +The CI pipeline (GitHub Actions run_pytest.yml) runs all tests on every +push to development and main, and the pytest step is active (not commented +out). A test failure would block the workflow. +``` + +### Claim 4.2 +``` +If a future contributor changes the assignment algorithm from area-majority +to centroid-based (e.g., replacing the overlap loop with a point-in-polygon +check), at least one existing test in test_mapping.py fails. +``` + +### Claim 4.3 +``` +If the enrichment output schema changes (a column in filter_cols is renamed, +added, or removed in either the mapper's _process_pg_batch or the manager's +_append_metadata), at least one existing test fails. +``` + +--- + +## Progress Tracker + +| Claim | Status | Round | Findings | Date | +|-------|--------|-------|----------|------| +| 1.1 | FALSIFIED | 1 | 3H/1S: cache guarantee contradicts C-05, return types underspecified (~170 keys not 4), CIC says WARNING code uses DEBUG, admin1 return has undocumented keys. All are CIC-doc issues — underlying code already in register. | 2026-06-02 | +| 1.2 | FALSIFIED | 1 | 2H: 3/7 raises lack logger.error (CIC claims ADR-008 compliance), 18 os.getenv() pass None without validation (CIC claims boundary failure). Already in C-19, C-13. | 2026-06-02 | +| 1.3 | SURVIVED | 1 | clean — ADR-011 consistent with code and FAO notes, schema divergence covered in prerequisite section | 2026-06-02 | +| 2.1 | SURVIVED | 1 | clean — interior cells: correct country, overlap_ratio=1.0, method="largest overlap" | 2026-06-02 | +| 2.2 | SURVIVED | 1 | clean — border cell: correct country (AAA=60%), overlap_ratio=0.60. Observation: synthetic data can't produce centroid-area disagreement. | 2026-06-02 | +| 2.3 | SURVIVED | 1 | clean — all 4 cells: admin1/admin2 consistent with admin0 country. No cross-border subnational assignment. | 2026-06-02 | +| 2.4 | SURVIVED | 1 | clean — all 9 filter_cols present in enriched output. 15 extra columns also present but filtered by _append_metadata. | 2026-06-02 | +| 3.1 | CONTESTED | 1 | 0H/5S: no test for make_valid, spatial index, invalid geometry, zero-area cell, missing columns. 8 of 13 CIC items tested, 5 gaps. | 2026-06-02 | +| 3.2 | CONTESTED | 1 | 0H/2S: no wrong-type test, validation tests use replica not real _validate(). Good coverage otherwise (36 parametrized tests). | 2026-06-02 | +| 3.3 | FALSIFIED | 1 | 4H: no regression test for C-18 (warning removal), C-20 (zero-area), make_valid, C-21 (batch tracking). Only C-01 (null validation) has regression coverage. | 2026-06-02 | +| 3.4 | FALSIFIED | 1 | 1H: no integration test exists for mapper→filter_cols→validate path. Mapper and manager tested in isolation only. | 2026-06-02 | +| 3.5 | FALSIFIED | 1 | 1H: no E2E pipeline test. views-pipeline-core not in test env blocks direct instantiation. | 2026-06-02 | +| 4.1 | CONTESTED | 1 | 0H/1S: pytest step active, runs on push. But cannot verify it's a required status check for PR merges (GitHub settings, not repo). | 2026-06-02 | +| 4.2 | SURVIVED | 1 | clean — test_cell_fully_in_country_a asserts method=="largest overlap", would fail if algorithm changed to centroid. overlap_ratio key existence also sensitive. | 2026-06-02 | +| 4.3 | CONTESTED | 1 | 0H/2S: enrichment test checks 5 of 9 filter_cols, validation replica can drift from real code. Rename of existing column IS caught. | 2026-06-02 | diff --git a/docs/falsification_loop_prompt.md b/docs/falsification_loop_prompt.md new file mode 100644 index 0000000..235d7cb --- /dev/null +++ b/docs/falsification_loop_prompt.md @@ -0,0 +1,41 @@ +# Falsification Campaign Loop Prompt + +Copy the block below and paste it as a `/loop` prompt in Claude Code. +It processes one claim at a time from `docs/falsification_campaign.md`. + +--- + +## Loop Prompt + +``` +Read docs/falsification_campaign.md. Find the first claim with status PENDING in the Progress Tracker. + +If no PENDING claims remain, say "Campaign complete — all claims processed" and stop. + +Otherwise, for the current claim: + +CRITICAL: This is a READ-ONLY campaign. Do NOT modify mapping.py, unfao.py, or any source file. Do NOT modify existing tests. The ONLY files you may create or modify are: falsification test stub files (new), the risk register, CICs/ADRs (to correct documentation that misdescribes actual code behavior), and the progress tracker. + +1. Run /falsify with the claim text (the content inside the ``` block under that claim heading). Follow the full falsification protocol: present probes, wait for confirmation, execute, classify, generate test stubs, report verdict. + +2. After the /falsify verdict, immediately run /register-risk to register any findings from the falsification. Follow the full registration protocol: extract, deduplicate, assign tiers, append, report. + +3. Update the Progress Tracker row for this claim in docs/falsification_campaign.md: + - Set Status to the /falsify verdict (SURVIVED, CONTESTED, or FALSIFIED) + - Set Round to 1 + - Set Findings to a short summary (e.g., "2H/1S" for 2 hard + 1 soft, or "clean" for SURVIVED) + - Set Date to today + +4. Do NOT fix source code. Do NOT re-run the claim. Do NOT enter a fix-and-reaudit loop. Register the findings, update the tracker, and STOP this iteration. The next loop invocation will pick up the next PENDING claim. + +5. Report what was found and what the verdict was. + +Important: +- Process ONE claim per loop iteration +- NEVER modify source code (mapping.py, unfao.py, __init__.py, conftest.py, test_mapping.py, test_validation.py) +- Always present probes and wait for user confirmation before executing +- Always register risks after the falsification +- Always update the progress tracker +- The claim text is inside the ``` code block under each claim heading in falsification_campaign.md +- "Fix" means: register finding + write test stub + update docs to match reality. NOT change source code. +``` diff --git a/reports/technical_risk_register.md b/reports/technical_risk_register.md index 8151d35..eb1458a 100644 --- a/reports/technical_risk_register.md +++ b/reports/technical_risk_register.md @@ -30,11 +30,12 @@ **Highest tier:** 1 (C-16) **Fix strategy:** Extract `CacheStrategy` interface with disk/memory implementations. Thread-lock in memory impl. Shapefile hash in disk cache keys. **Resolution scope:** Full +**⚠ CONTINGENT ON ADR-011:** If precomputed lookup table replaces mapping.py, this entire cluster is eliminated. Defer until ADR-011 is executed. ### Cluster B: Silent error hiding architecture **Root cause:** The codebase suppresses problem signals at three levels — global warning filter, DEBUG-level exception logging with `continue`, and raises without preceding logs. The impact propagates through a delivery chain with no correction mechanism. -**Entries:** C-12, C-18, C-19, C-20, C-21, C-22, D-03, D-04 -**Highest tier:** 1 (C-12, C-20) +**Entries:** C-12, C-19, C-20, C-21, C-22, D-03 (resolved), D-04 (resolved), C-18 (resolved) +**Highest tier:** 2 (C-12, C-21) **Fix strategy (5/9 done):** ✅ Replace global warning suppression with targeted filter. ◻ Promote geometry errors from DEBUG to WARNING. ✅ Add `make_valid()` preprocessing. ◻ Narrow exception scope. ✅ Zero-area guard clause (all 7 sites). ◻ `logger.error` before all raises (3 of ~23 done). ✅ Surface batch failures to caller (both methods). ◻ Enrichment provenance in upload (timestamp added, no shapefile version). ◻ Post-delivery correction procedure (C-22). **Resolution scope:** Full (code mechanisms) + Partial (operational impact — C-22 requires process documentation). **Note:** If D-05 resolves toward mapper elimination, remaining code fixes become moot. @@ -44,6 +45,7 @@ **Highest tier:** 2 (C-02) **Fix strategy:** Lazy initialization or removal of module-level call. Add `mapper` constructor parameter to manager. **Resolution scope:** Full +**⚠ CONTINGENT ON ADR-011:** If precomputed lookup table replaces mapping.py, this entire cluster is eliminated. Defer until ADR-011 is executed. ### Cluster D: Mapper-manager boundary contract **Root cause:** No explicit contract declares what columns the mapper produces and the manager consumes. @@ -51,6 +53,14 @@ **Highest tier:** 2 (C-04) **Fix strategy:** Define `ENRICHMENT_SCHEMA` constant. Harmonize forward/reverse thresholds. Add end-to-end integration test. **Resolution scope:** Full +**⚠ CONTINGENT ON ADR-011:** If precomputed lookup table replaces mapping.py, the boundary simplifies to a Parquet schema. C-17 is eliminated; C-04 is eliminated (no runtime forward/reverse divergence). + +### Cluster F: CIC-code drift (documentation describes aspirational, not actual behavior) +**Root cause:** CICs were written as design contracts and never validated against the code. Multiple guarantees are false. +**Entries:** Campaign findings 1.1, 1.2 — affecting CIC PriogridCountryMapper §3/§5/§6 and CIC UNFAOPostProcessorManager §3/§6 +**Highest tier:** Not a code risk — documentation accuracy risk +**Fix strategy:** Update CICs to describe actual code behavior. Specifically: (1) cache guarantee needs C-05 caveat, (2) return types need full key listing, (3) §6 log level should say DEBUG not WARNING, (4) ADR-008 compliance claim needs qualifying, (5) env var boundary validation claim needs qualifying. Pure documentation, no code changes. +**Resolution scope:** Full ### Cluster E: Replace runtime mapper with precomputed lookup table **Root cause:** The area-majority algorithm is a confirmed FAO requirement (D-05 resolved), but it doesn't need 3,100 lines of geopandas runtime code — a one-time precomputation produces a ~65K-row Parquet lookup table that replaces the entire mapper with a dictionary join. @@ -222,7 +232,7 @@ See also C-06 (code duplication within the same class amplifies the SRP violatio | Field | Value | |-------|-------| | ID | C-12 | -| Tier | 1 | +| Tier | 2 | | Source | `expert-review` (2026-06-02) | | Trigger | When Natural Earth or GAUL shapefiles contain an invalid polygon for a country that is the correct assignment for a PRIO-GRID cell, verify that the geometry error is surfaced — currently the cell is silently assigned to the next-best country | | Location | `views_postprocessing/unfao/mapping/mapping.py:658-660` (also duplicated at ~line 1192, 1428 in admin1/admin2 branches) | @@ -235,7 +245,9 @@ The mapper's own `_validate_naturalearth_data` (line 473-477) detects invalid ge Critically, the manager's `_validate()` method — the only safety gate before upload — is structurally incapable of catching Cluster B's primary failure mode. `_validate()` checks column presence and null counts. A misassigned cell has a valid ISO code (just the wrong one), non-null values, and all required columns present. Validation passes. The safety net has a hole shaped exactly like the failure mode it should catch: wrong-but-valid geographic assignments are invisible to every automated check in the pipeline. -See also C-08 (a different mechanism for wrong-country at high latitudes), D-05 (if mapper eliminated, this concern is moot). +Tier recalibrated from 1 to 2 during post-campaign review-rr (2026-06-02): `make_valid()` now applied at load time mitigates the root cause. With valid geometries, intersection failures are limited to precision edge cases. The 7 DEBUG handlers remain but are defense-in-depth, not the primary failure path. Campaign Claim 2.1-2.4 confirmed correct algorithm behavior. + +See also C-08 (a different mechanism for wrong-country at high latitudes), D-05 (if mapper eliminated, this concern is moot). Contingent on ADR-011. --- @@ -432,7 +444,7 @@ See also C-17 (implicit column naming between mapper and manager — a different | ID | D-01 | | Source | `expert-review` (2026-06-02) | | Perspectives | Martin/GoF (extract Strategy pattern now), Feathers/Beck (characterize disk-cache branch with tests first) | -| Resolution | Unresolved — resolve before starting Cluster A cache strategy extraction. **Contingent on D-05:** if mapper eliminated, this disagreement is moot. | +| Resolution | **⚠ CONTINGENT ON ADR-011.** If precomputed lookup table replaces mapping.py, no cache to refactor — this disagreement is moot. Only resolve if mapper is kept. | --- @@ -443,7 +455,7 @@ See also C-17 (implicit column naming between mapper and manager — a different | ID | D-02 | | Source | `expert-review` (2026-06-02) | | Perspectives | Nygard (threading.Lock), Beck (remove ThreadPoolExecutor), Hickey (sequential is simpler), Ousterhout (keep threading for speed) | -| Resolution | Unresolved — resolve before addressing C-16. **Contingent on D-05:** if mapper eliminated, this disagreement is moot. | +| Resolution | **⚠ CONTINGENT ON ADR-011.** If precomputed lookup table replaces mapping.py, no threading — this disagreement is moot. Only resolve if mapper is kept. | --- diff --git a/tests/test_falsification_campaign_1_1.py b/tests/test_falsification_campaign_1_1.py new file mode 100644 index 0000000..95a396a --- /dev/null +++ b/tests/test_falsification_campaign_1_1.py @@ -0,0 +1,65 @@ +""" +Falsification test stubs for Campaign Claim 1.1: +"The PriogridCountryMapper CIC §3-6 accurately describes actual behavior" +Generated by falsification campaign on 2026-06-02 + +READ-ONLY CAMPAIGN: These stubs document CIC-code discrepancies. +Source code is NOT modified. CIC should be updated to match reality. +""" + + +def test_falsify_1_1_01_cache_mode_guarantee_contradicted_by_c05(): + """ + Probe 1 (Category B): CIC §3 guarantees cache mode doesn't affect results. + C-05 documents that batch_country_mapping, batch_admin_mapping, + find_multiple_countries_by_iso_a3, clear_cache, and get_cache_stats + all crash with AttributeError in disk-cache mode. + + A crash IS a different result than a successful return. + The CIC guarantee is unconditional but false for these methods. + + Severity: Hard falsification + CIC should: add caveat listing the C-05 broken methods. + """ + assert False, "CIC §3 cache guarantee is unconditional but C-05 methods crash in disk mode" + + +def test_falsify_1_1_02_find_country_for_gid_returns_170_keys_not_4(): + """ + Probe 2 (Category B): CIC §5 says find_country_for_gid returns dict with + iso_a3, country_name, overlap_ratio, method. Actual code (lines 688-691) + dumps ALL Natural Earth columns (~170 keys) into the result dict. + + The documented return type is dramatically narrower than reality. + + Severity: Hard falsification + CIC should: document that the return dict includes all shapefile columns, + or the code should be changed to return only the documented keys. + """ + assert False, "find_country_for_gid returns ~170 keys, CIC documents 4" + + +def test_falsify_1_1_03_cic_says_warning_code_uses_debug(): + """ + Probe 3 (Category B): CIC §6 says intersection failures are + "logged as WARNING." Actual code at 7 sites uses logger.debug. + DEBUG is invisible in production. + + Severity: Hard falsification + CIC should: either say DEBUG (matching code) or code should be + changed to WARNING (matching CIC). Currently they disagree. + """ + assert False, "CIC §6 says WARNING for intersection failures; code uses DEBUG at 7 sites" + + +def test_falsify_1_1_06_find_admin1_returns_undocumented_keys(): + """ + Probe 6 (Category B): CIC §5 says find_admin1_for_gid returns dict with + gaul1_code, gaul1_name, iso3_code, method. Actual code also returns + gaul0_code, gaul0_name, continent, disp_en when available. + + Severity: Soft falsification + CIC should: list all possible return keys, or note "additional GAUL + fields may be present." + """ + assert False, "find_admin1_for_gid returns 4 undocumented extra keys (gaul0_code, gaul0_name, continent, disp_en)" diff --git a/tests/test_falsification_campaign_1_2.py b/tests/test_falsification_campaign_1_2.py new file mode 100644 index 0000000..1596952 --- /dev/null +++ b/tests/test_falsification_campaign_1_2.py @@ -0,0 +1,30 @@ +""" +Falsification test stubs for Campaign Claim 1.2: +"The UNFAOPostProcessorManager CIC §3-6 accurately describes actual behavior" +Generated by falsification campaign on 2026-06-02 + +READ-ONLY CAMPAIGN: These stubs document CIC-code discrepancies. +""" + + +def test_falsify_1_2_01_adr008_guarantee_not_met(): + """ + CIC §3 says "all structural failures are logged and raised (ADR-008)." + Lines 65, 75, 225 raise ValueError without preceding logger.error. + 3 of 7 raises lack logs. + + Severity: Hard falsification + """ + assert False, "CIC §3 ADR-008 guarantee: 3 of 7 raises in unfao.py lack preceding logger.error" + + +def test_falsify_1_2_02_env_vars_not_validated_at_boundary(): + """ + CIC §6 says "Missing or None environment variables for Appwrite must + never fail silently." But os.getenv() returns None for missing vars + and the None is passed directly to AppwriteConfig without any check. + Failure happens downstream in Appwrite, not at the boundary. + + Severity: Hard falsification + """ + assert False, "18 os.getenv() calls pass None to AppwriteConfig without validation — CIC §6 boundary violation" diff --git a/tests/test_falsification_campaign_3_1.py b/tests/test_falsification_campaign_3_1.py new file mode 100644 index 0000000..2f5cc5e --- /dev/null +++ b/tests/test_falsification_campaign_3_1.py @@ -0,0 +1,61 @@ +""" +Falsification test stubs for Campaign Claim 3.1: +"Unit tests cover all CIC guarantees and failure modes" +Generated by falsification campaign on 2026-06-02 + +READ-ONLY CAMPAIGN: These stubs document test coverage gaps. +""" + + +def test_falsify_3_1_make_valid_preprocessing(): + """ + CIC §3: "all loaded shapefiles have invalid geometries fixed via make_valid()" + No test verifies that make_valid() is called or that invalid geometries + are corrected at load time. + + Severity: Soft falsification + """ + assert False, "No test verifies make_valid() preprocessing (CIC §3 guarantee untested)" + + +def test_falsify_3_1_spatial_index_built(): + """ + CIC §3: "spatial indices are built for all loaded GeoDataFrames" + No test verifies that priogrid_sindex, admin1_sindex, admin2_sindex + are non-None after initialization. + + Severity: Soft falsification + """ + assert False, "No test verifies spatial indices are built (CIC §3 guarantee untested)" + + +def test_falsify_3_1_invalid_geometry_handling(): + """ + CIC §6: "Invalid geometries: Fixed automatically via make_valid() at load time. + If intersection still fails at runtime, logged as WARNING." + No test constructs an invalid geometry and verifies the handling. + + Severity: Soft falsification + """ + assert False, "No test for invalid geometry handling (CIC §6 failure mode untested)" + + +def test_falsify_3_1_zero_area_cell_behavior(): + """ + CIC §6: "Zero-area grid geometry: Returns None with WARNING log" + No test constructs a zero-area cell and verifies None return + WARNING. + + Severity: Soft falsification + """ + assert False, "No test for zero-area cell behavior (CIC §6 failure mode untested)" + + +def test_falsify_3_1_missing_columns_raises(): + """ + CIC §6: "Required columns missing from shapefiles: Raises ValueError" + Only missing-file is tested (test_missing_country_shapefile_raises). + No test for a shapefile that exists but lacks a required column. + + Severity: Soft falsification + """ + assert False, "No test for missing-columns-in-valid-shapefile (CIC §6 failure mode untested)" diff --git a/tests/test_falsification_campaign_3_2.py b/tests/test_falsification_campaign_3_2.py new file mode 100644 index 0000000..973d59f --- /dev/null +++ b/tests/test_falsification_campaign_3_2.py @@ -0,0 +1,33 @@ +""" +Falsification test stubs for Campaign Claim 3.2: +"Validation tests cover all invalid input categories and match real _validate()" +Generated by falsification campaign on 2026-06-02 + +READ-ONLY CAMPAIGN: These stubs document test coverage gaps. +""" + + +def test_falsify_3_2_wrong_type_columns_not_tested(): + """ + No test passes a DataFrame where a numeric column (e.g., admin1_gaul1_code) + contains strings, or where a string column contains numbers. The validation + only checks presence and nulls, not types — but this is still an untested + category of invalid input. + + Severity: Soft falsification + """ + assert False, "No test for wrong-type column values (e.g., string in numeric column)" + + +def test_falsify_3_2_validation_uses_replica_not_real_code(): + """ + test_validation.py uses a standalone validate_dataframe() function that + replicates the logic from UNFAOPostProcessorManager._validate(). If the + real _validate() changes (e.g., adds a new column to the required list), + the replica won't be updated and tests will pass while production fails. + + This is a structural test-code coupling gap, not a missing test case. + + Severity: Soft falsification + """ + assert False, "Validation tests use replicated logic, not the real _validate() method — divergence risk" diff --git a/tests/test_falsification_campaign_3_3.py b/tests/test_falsification_campaign_3_3.py new file mode 100644 index 0000000..dc5852d --- /dev/null +++ b/tests/test_falsification_campaign_3_3.py @@ -0,0 +1,51 @@ +""" +Falsification test stubs for Campaign Claim 3.3: +"Regression tests exist for every code fix applied this session" +Generated by falsification campaign on 2026-06-02 + +READ-ONLY CAMPAIGN: These stubs document missing regression tests. +""" + + +def test_falsify_3_3_no_regression_for_warning_suppression_removal(): + """ + C-18 fix: removed warnings.filterwarnings("ignore") from module scope. + No test verifies the global suppression is absent. Reverting to the + old line would not cause any test failure. + + Severity: Hard falsification + """ + assert False, "No regression test for C-18 warning suppression removal" + + +def test_falsify_3_3_no_regression_for_zero_area_guard(): + """ + C-20 fix: added zero-area guards at 7 sites. + No test creates a zero-area cell to verify the guard fires. + Removing all 7 guards would not cause any test failure. + + Severity: Hard falsification + """ + assert False, "No regression test for C-20 zero-area guards" + + +def test_falsify_3_3_no_regression_for_make_valid(): + """ + D-04 fix: added make_valid() to 3 load methods. + No test with invalid geometry to verify make_valid corrects it. + Removing all 3 make_valid calls would not cause any test failure. + + Severity: Hard falsification + """ + assert False, "No regression test for make_valid() preprocessing" + + +def test_falsify_3_3_no_regression_for_batch_failure_tracking(): + """ + C-21 fix: added failed_batches counter and summary log. + No test triggers a batch failure to verify the tracking works. + Removing the tracking would not cause any test failure. + + Severity: Hard falsification + """ + assert False, "No regression test for C-21 batch failure tracking" diff --git a/tests/test_falsification_campaign_3_4.py b/tests/test_falsification_campaign_3_4.py new file mode 100644 index 0000000..65409f3 --- /dev/null +++ b/tests/test_falsification_campaign_3_4.py @@ -0,0 +1,22 @@ +""" +Falsification test stubs for Campaign Claim 3.4: +"An integration test exists for the mapper-to-manager data path" +Generated by falsification campaign on 2026-06-02 + +READ-ONLY CAMPAIGN: These stubs document the missing integration test. +""" + + +def test_falsify_3_4_no_mapper_to_manager_integration_test(): + """ + No test enriches a DataFrame through the mapper + (enrich_dataframe_with_pg_info) and then checks the output against + the manager's filter_cols list and null validation rules. + + The mapper and manager are tested in isolation. The boundary between + them — where column names must match and nulls must be absent — has + zero test coverage. + + Severity: Hard falsification + """ + assert False, "No integration test for mapper-to-manager data path (enrich → filter_cols → validate)" diff --git a/tests/test_falsification_campaign_3_5.py b/tests/test_falsification_campaign_3_5.py new file mode 100644 index 0000000..5d29166 --- /dev/null +++ b/tests/test_falsification_campaign_3_5.py @@ -0,0 +1,25 @@ +""" +Falsification test stubs for Campaign Claim 3.5: +"An end-to-end pipeline test exists or can be constructed" +Generated by falsification campaign on 2026-06-02 + +READ-ONLY CAMPAIGN: These stubs document the missing E2E test. +""" + + +def test_falsify_3_5_no_end_to_end_pipeline_test(): + """ + No test exercises the full pipeline: _read() → _transform() → _validate() → _save(). + + views-pipeline-core is not installed in the test environment, making it + impossible to instantiate UNFAOPostProcessorManager directly. An E2E test + would require either: (a) installing views-pipeline-core in test env, + (b) mocking the parent classes, or (c) testing the pipeline stages in + sequence using the real code with mocked data sources. + + The pipeline orchestration — the thing that runs in production — has + zero test coverage as an integrated unit. + + Severity: Hard falsification + """ + assert False, "No end-to-end pipeline test (read→transform→validate→save)" diff --git a/tests/test_falsification_campaign_4_1.py b/tests/test_falsification_campaign_4_1.py new file mode 100644 index 0000000..1ebaf0b --- /dev/null +++ b/tests/test_falsification_campaign_4_1.py @@ -0,0 +1,18 @@ +""" +Falsification test stubs for Campaign Claim 4.1: +"CI runs tests on every push and failure blocks merge" +Generated by falsification campaign on 2026-06-02 +""" + + +def test_falsify_4_1_ci_not_confirmed_as_required_check(): + """ + run_pytest.yml runs on push to main/development and the pytest step + is active with set -e. But whether this workflow is a REQUIRED status + check in GitHub branch protection settings cannot be verified from + the repo alone. A failing test would fail the workflow but might not + block a merge if the check isn't marked as required. + + Severity: Soft falsification + """ + assert False, "Cannot verify from repo that run_pytest is a required status check for PR merges" diff --git a/tests/test_falsification_campaign_4_3.py b/tests/test_falsification_campaign_4_3.py new file mode 100644 index 0000000..b8b5075 --- /dev/null +++ b/tests/test_falsification_campaign_4_3.py @@ -0,0 +1,32 @@ +""" +Falsification test stubs for Campaign Claim 4.3: +"Schema change would fail a test" +Generated by falsification campaign on 2026-06-02 +""" + + +def test_falsify_4_3_enrichment_test_checks_only_5_of_9_filter_cols(): + """ + test_enrichment_adds_expected_columns checks 5 columns: + country_iso_a3, admin1_gaul1_code, admin1_gaul1_name, + admin2_gaul2_code, admin2_gaul2_name. + + But filter_cols in unfao.py has 9 columns (also: pg_xcoord, pg_ycoord, + admin1_gaul0_code, admin1_gaul0_name). Removing the missing 4 from + either side would not fail any test. + + Severity: Soft falsification + """ + assert False, "Enrichment schema test checks 5 of 9 filter_cols — 4 columns have no schema drift protection" + + +def test_falsify_4_3_filter_cols_change_not_detected_by_validation_tests(): + """ + The validation tests use a hardcoded REQUIRED_METADATA_COLS list that + replicates unfao.py's _necessary_metadata_cols. If someone adds or + removes a column from unfao.py's list, the test replica is not + automatically updated — the tests continue to pass against the old list. + + Severity: Soft falsification + """ + assert False, "Validation test's column list is a replica — adding/removing a column in unfao.py is not caught" diff --git a/tests/test_integration.py b/tests/test_integration.py new file mode 100644 index 0000000..ff8ac73 --- /dev/null +++ b/tests/test_integration.py @@ -0,0 +1,140 @@ +"""Integration tests for the mapper-to-manager boundary contract. + +Verifies that PriogridCountryMapper.enrich_dataframe_with_pg_info() produces +output compatible with UNFAOPostProcessorManager._append_metadata()'s filter_cols +and _validate()'s null checks. + +Closes: https://github.com/views-platform/views-postprocessing/issues/4 +Campaign: Claim 3.4 (FALSIFIED — no integration test existed) +""" + +import pandas as pd + +FILTER_COLS_METADATA = [ + "pg_xcoord", + "pg_ycoord", + "country_iso_a3", + "admin1_gaul1_code", + "admin1_gaul1_name", + "admin1_gaul0_code", + "admin1_gaul0_name", + "admin2_gaul2_code", + "admin2_gaul2_name", +] + +REQUIRED_METADATA_COLS = FILTER_COLS_METADATA + + +class TestMapperToManagerBoundary: + """Verify the enrichment output matches what the manager consumes.""" + + def test_all_filter_cols_present_in_enrichment(self, mapper): + df = pd.DataFrame({ + "priogrid_gid": [1, 2, 3, 4], + "month_id": [100, 100, 100, 100], + }) + enriched = mapper.enrich_dataframe_with_pg_info( + df, pg_id_col="priogrid_gid", time_id_col="month_id", + only_metadata=True, batch_size=10, + use_multiprocessing=False, show_progress=False, + ) + for col in FILTER_COLS_METADATA: + assert col in enriched.columns, ( + f"filter_cols column '{col}' missing from enrichment output. " + f"Available: {sorted(enriched.columns.tolist())}" + ) + + def test_no_nulls_in_metadata_columns_for_valid_gids(self, mapper): + df = pd.DataFrame({ + "priogrid_gid": [1, 2, 4], + "month_id": [100, 100, 100], + }) + enriched = mapper.enrich_dataframe_with_pg_info( + df, pg_id_col="priogrid_gid", time_id_col="month_id", + only_metadata=True, batch_size=10, + use_multiprocessing=False, show_progress=False, + ) + for col in FILTER_COLS_METADATA: + null_count = enriched[col].isnull().sum() + assert null_count == 0, ( + f"Column '{col}' has {null_count} nulls in enrichment output for valid GIDs" + ) + + def test_country_iso_a3_contains_valid_codes(self, mapper): + df = pd.DataFrame({ + "priogrid_gid": [1, 2, 3, 4], + "month_id": [100, 100, 100, 100], + }) + enriched = mapper.enrich_dataframe_with_pg_info( + df, pg_id_col="priogrid_gid", time_id_col="month_id", + only_metadata=True, batch_size=10, + use_multiprocessing=False, show_progress=False, + ) + valid_codes = {"AAA", "BBB"} + actual_codes = set(enriched["country_iso_a3"].dropna().unique()) + assert actual_codes.issubset(valid_codes), ( + f"Unexpected ISO codes: {actual_codes - valid_codes}" + ) + + def test_admin_codes_are_numeric(self, mapper): + df = pd.DataFrame({ + "priogrid_gid": [1, 2, 4], + "month_id": [100, 100, 100], + }) + enriched = mapper.enrich_dataframe_with_pg_info( + df, pg_id_col="priogrid_gid", time_id_col="month_id", + only_metadata=True, batch_size=10, + use_multiprocessing=False, show_progress=False, + ) + for col in ["admin1_gaul1_code", "admin1_gaul0_code", "admin2_gaul2_code"]: + assert pd.api.types.is_numeric_dtype(enriched[col]), ( + f"Column '{col}' should be numeric, got {enriched[col].dtype}" + ) + + def test_coordinates_are_float(self, mapper): + df = pd.DataFrame({ + "priogrid_gid": [1, 2], + "month_id": [100, 100], + }) + enriched = mapper.enrich_dataframe_with_pg_info( + df, pg_id_col="priogrid_gid", time_id_col="month_id", + only_metadata=True, batch_size=10, + use_multiprocessing=False, show_progress=False, + ) + for col in ["pg_xcoord", "pg_ycoord"]: + assert pd.api.types.is_float_dtype(enriched[col]), ( + f"Column '{col}' should be float, got {enriched[col].dtype}" + ) + + def test_row_count_preserved(self, mapper): + df = pd.DataFrame({ + "priogrid_gid": [1, 2, 3, 4], + "month_id": [100, 100, 100, 100], + }) + enriched = mapper.enrich_dataframe_with_pg_info( + df, pg_id_col="priogrid_gid", time_id_col="month_id", + only_metadata=True, batch_size=10, + use_multiprocessing=False, show_progress=False, + ) + assert len(enriched) == 4 + + +class TestEnrichmentThenValidation: + """Verify that mapper output passes the manager's validation logic.""" + + def test_enriched_output_passes_validation(self, mapper): + df = pd.DataFrame({ + "priogrid_gid": [1, 2, 4], + "month_id": [100, 100, 100], + }) + enriched = mapper.enrich_dataframe_with_pg_info( + df, pg_id_col="priogrid_gid", time_id_col="month_id", + only_metadata=True, batch_size=10, + use_multiprocessing=False, show_progress=False, + ) + for col in REQUIRED_METADATA_COLS: + assert col in enriched.columns, f"Validation would fail: '{col}' missing" + null_count = enriched[col].isnull().sum() + assert null_count == 0, ( + f"Validation would fail: '{col}' has {null_count} nulls" + ) diff --git a/tests/test_mapping.py b/tests/test_mapping.py index 77d078e..ff6d7ec 100644 --- a/tests/test_mapping.py +++ b/tests/test_mapping.py @@ -105,8 +105,12 @@ def test_enrichment_adds_expected_columns(self, mapper): df, pg_id_col="priogrid_id", time_id_col="month_id", batch_size=10, use_multiprocessing=False, show_progress=False ) - expected_cols = ["country_iso_a3", "admin1_gaul1_code", "admin1_gaul1_name", - "admin2_gaul2_code", "admin2_gaul2_name"] + expected_cols = [ + "pg_xcoord", "pg_ycoord", "country_iso_a3", + "admin1_gaul1_code", "admin1_gaul1_name", + "admin1_gaul0_code", "admin1_gaul0_name", + "admin2_gaul2_code", "admin2_gaul2_name", + ] for col in expected_cols: assert col in enriched.columns, f"Missing column: {col}" diff --git a/tests/test_regression.py b/tests/test_regression.py new file mode 100644 index 0000000..05a429e --- /dev/null +++ b/tests/test_regression.py @@ -0,0 +1,179 @@ +"""Regression tests for code fixes applied during the audit session. + +Each test protects a specific fix. Reverting the fix causes the test to fail. + +Closes: https://github.com/views-platform/views-postprocessing/issues/6 +Closes: https://github.com/views-platform/views-postprocessing/issues/7 +Closes: https://github.com/views-platform/views-postprocessing/issues/8 +Campaign: Claims 3.1, 3.3 (test gaps for fixes) +""" + +import logging +import tempfile +from pathlib import Path + +import geopandas as gpd +import pandas as pd +from shapely.geometry import Polygon + + +# --------------------------------------------------------------------------- +# Issue #6: make_valid() preprocessing +# --------------------------------------------------------------------------- + +class TestMakeValidPreprocessing: + """Verify that invalid geometries are corrected at load time.""" + + def test_invalid_country_geometry_fixed_at_load(self): + """A self-intersecting country polygon is fixed by make_valid().""" + import views_postprocessing.unfao.mapping.mapping as mod + + bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]) + assert not bowtie.is_valid + + gdf = gpd.GeoDataFrame( + {"ISO_A3": ["TST"], "NAME_EN": ["TestCountry"], "geometry": [bowtie]}, + crs="EPSG:4326", + ) + with tempfile.TemporaryDirectory() as tmpdir: + path = Path(tmpdir) / "countries.shp" + gdf.to_file(path) + + mapper = mod.PriogridCountryMapper.__new__(mod.PriogridCountryMapper) + loaded = mapper._load_and_preprocess_naturalearth(str(path)) + + assert loaded["geometry"].is_valid.all(), "make_valid() should fix invalid geometries" + assert loaded["geometry"].iloc[0].area > 0 + + def test_invalid_admin_geometry_fixed_at_load(self): + """An invalid admin polygon is fixed by make_valid().""" + import views_postprocessing.unfao.mapping.mapping as mod + + bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]) + gdf = gpd.GeoDataFrame( + { + "gaul1_code": [1], "gaul1_name": ["Test"], + "gaul0_code": [1], "gaul0_name": ["TestC"], + "iso3_code": ["TST"], "geometry": [bowtie], + }, + crs="EPSG:4326", + ) + with tempfile.TemporaryDirectory() as tmpdir: + path = Path(tmpdir) / "admin.shp" + gdf.to_file(path) + + mapper = mod.PriogridCountryMapper.__new__(mod.PriogridCountryMapper) + loaded = mapper._load_admin_data(str(path), "admin1") + + assert loaded["geometry"].is_valid.all() + + def test_raw_invalid_geometry_is_genuinely_invalid(self): + """Proves the fixture IS invalid before make_valid — the guard is real.""" + bowtie = Polygon([(0, 0), (1, 1), (1, 0), (0, 1)]) + gdf = gpd.GeoDataFrame( + {"ISO_A3": ["TST"], "NAME_EN": ["Test"], "geometry": [bowtie]}, + crs="EPSG:4326", + ) + with tempfile.TemporaryDirectory() as tmpdir: + path = Path(tmpdir) / "raw.shp" + gdf.to_file(path) + raw = gpd.read_file(path) + + assert not raw["geometry"].is_valid.all(), ( + "Test fixture should contain invalid geometry before make_valid" + ) + + +# --------------------------------------------------------------------------- +# Issue #7: Global warning suppression must not return +# --------------------------------------------------------------------------- + +class TestWarningSuppressionRegression: + """Verify global warning suppression is absent.""" + + def test_no_global_warning_suppression(self): + """warnings.filterwarnings('ignore') must not be active globally.""" + import warnings + # Force import of the mapping module (conftest already did this) + import views_postprocessing.unfao.mapping.mapping # noqa: F401 + + global_ignores = [ + f for f in warnings.filters + if f[0] == "ignore" and f[2] is Warning + ] + assert len(global_ignores) == 0, ( + f"Global warning suppression detected: {global_ignores}. " + "Use targeted warnings.catch_warnings() instead." + ) + + def test_non_centroid_warnings_are_visible(self): + """A UserWarning should be visible — global suppression is not active.""" + import warnings + with warnings.catch_warnings(record=True) as w: + warnings.simplefilter("always") + warnings.warn("regression test canary", UserWarning) + assert len(w) == 1, ( + "UserWarning should be visible — global suppression must not be active" + ) + + +# --------------------------------------------------------------------------- +# Issue #8: Zero-area grid geometry guard +# --------------------------------------------------------------------------- + +class TestZeroAreaGuardRegression: + """Verify that degenerate zero-area cells are handled with WARNING, not silently.""" + + def test_zero_area_country_returns_none_with_warning(self, mapper, caplog): + """A zero-area cell should return None with a WARNING log.""" + degenerate = Polygon([(0.25, 0.25), (0.5, 0.25), (0.25, 0.25)]) + assert degenerate.area == 0.0 + + mapper.priogrid_gdf = pd.concat([ + mapper.priogrid_gdf, + gpd.GeoDataFrame( + {"gid": [99], "xcoord": [0.3], "ycoord": [0.25], "geometry": [degenerate]}, + crs="EPSG:4326", + ), + ], ignore_index=True) + mapper.priogrid_gdf["centroid"] = mapper.priogrid_gdf["geometry"].centroid + + with caplog.at_level(logging.WARNING): + result = mapper.find_country_for_gid(99) + + assert result is None, "Zero-area cell should return None" + assert any("Zero-area" in r.message and "99" in r.message for r in caplog.records), ( + "Zero-area cell should produce a WARNING log mentioning the GID" + ) + + def test_zero_area_admin1_returns_none(self, mapper): + """Zero-area cell returns None for admin1 lookup too.""" + degenerate = Polygon([(0.25, 0.25), (0.5, 0.25), (0.25, 0.25)]) + if 99 not in mapper.priogrid_gdf["gid"].values: + mapper.priogrid_gdf = pd.concat([ + mapper.priogrid_gdf, + gpd.GeoDataFrame( + {"gid": [99], "xcoord": [0.3], "ycoord": [0.25], "geometry": [degenerate]}, + crs="EPSG:4326", + ), + ], ignore_index=True) + mapper.priogrid_gdf["centroid"] = mapper.priogrid_gdf["geometry"].centroid + + result = mapper.find_admin1_for_gid(99) + assert result is None + + def test_zero_area_admin2_returns_none(self, mapper): + """Zero-area cell returns None for admin2 lookup too.""" + degenerate = Polygon([(0.25, 0.25), (0.5, 0.25), (0.25, 0.25)]) + if 99 not in mapper.priogrid_gdf["gid"].values: + mapper.priogrid_gdf = pd.concat([ + mapper.priogrid_gdf, + gpd.GeoDataFrame( + {"gid": [99], "xcoord": [0.3], "ycoord": [0.25], "geometry": [degenerate]}, + crs="EPSG:4326", + ), + ], ignore_index=True) + mapper.priogrid_gdf["centroid"] = mapper.priogrid_gdf["geometry"].centroid + + result = mapper.find_admin2_for_gid(99) + assert result is None