
Evolution #16

Merged
profsergiocosta merged 13 commits into dev from evolution
Jun 15, 2026

Conversation

@profsergiocosta

Contributor

No description provided.

profsergiocosta and others added 13 commits June 14, 2026 11:37
Introduces `Derivation`, a Pydantic front-end over the existing
`SpatialDerivation` / `Variable` machinery that validates operator names
and operator-specific requirements at construction time (fail-fast).

Field names follow STAC conventions where natural (`valid_from`,
`valid_until`, `bbox`) — naming only, no STAC implementation.

- `disscube/operators/base.py`: `OPERATOR_REGISTRY` as single source of
  truth for valid operator names and their constraints.
- `disscube/derivation.py`: `Derivation` model with `to_variable()`,
  `to_spatial_derivation()`, and `spec_hash()` that delegates to
  `SpatialDerivation.spec_hash()`.
- `disscube/client/cube_client.py`: thin `derive_declarative()` wrapper.
- `tests/test_derivation.py`: 13 tests covering validation, roun
  hash consistency, and bbox exclusion from spec_hash.
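
The fail-fast idea described above can be sketched as follows. This is only an illustration: it uses a standard-library dataclass rather than Pydantic so that it runs without extra dependencies, and `DerivationSketch`, `class_value`, `requires_class_value`, and the registry contents are hypothetical names, not the actual `disscube` API.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical registry: operator name -> constraints. The real
# OPERATOR_REGISTRY in disscube/operators/base.py may be shaped differently.
OPERATOR_REGISTRY = {
    "mean": {"requires_class_value": False},
    "percentage": {"requires_class_value": True},
}


@dataclass(frozen=True)
class DerivationSketch:
    """Illustrative stand-in for a declarative derivation spec."""

    source: str
    grid: str
    operator: str
    class_value: Optional[int] = None  # hypothetical operator-specific field
    valid_from: Optional[str] = None
    valid_until: Optional[str] = None

    def __post_init__(self) -> None:
        # Fail fast: reject unknown operators at construction time.
        if self.operator not in OPERATOR_REGISTRY:
            known = ", ".join(sorted(OPERATOR_REGISTRY))
            raise ValueError(
                f"unknown operator {self.operator!r}; expected one of: {known}"
            )
        # Enforce operator-specific requirements declared in the registry.
        needs_class = OPERATOR_REGISTRY[self.operator]["requires_class_value"]
        if needs_class and self.class_value is None:
            raise ValueError(f"operator {self.operator!r} requires class_value")
```

With this shape, a misspelled operator name raises `ValueError` when the object is built, before any pipeline stage runs.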
…-registration, fix architectural issues identified in audit. All derivation scripts pass. Next: commit the changes.
- README.md — complete rewrite
- docs/index.md — updated navigation
- docs/architecture/overview.md — complete rewrite (entity model, pipeline, spec_hash, operator plugin system)
- docs/architecture/operators.md — new file (operator system, table, compute() contract, how to add new operators)
- docs/architecture/pipeline.md — complete rewrite (all 4 stages with current code)
- docs/architecture/catalog.md — complete rewrite (schema, hashes, temporal series, evolution)
- docs/architecture/tiling.md — complete rewrite (tile concept, tile detection, parallel processing)
- docs/guides/grids.md — complete rewrite (snap, hierarchy, SpatialRelation, manual grid)
- docs/guides/bdc.md — rewritten (removed stale BDC_SM/BDC_MD/BDC_LG references, updated with current tile workflow)
- mkdocs.yml — added Operator System: architecture/operators.md to nav
- Untrack catalog.db and examples/drivers/catalog.db (runtime artifacts;
  .gitignore rule already existed — files were previously force-tracked)
- pyproject.toml: declare dissmodel>=0.6.0,<0.7.0; add fiona as [bdc]
  optional extra; move s3fs/python-multipart to optional extras; remove
  rasterstats and typer which have zero references in library code
- cube_client.py: defer `from dissmodel...RasterBackend` import to inside
  to_lucc_data() so CubeClient is usable without dissmodel installed;
  fix load() to always return 3D (time,y,x) for temporal variables even
  when only one slice exists (was silently returning 2D, causing
  to_lucc_data to treat single-year temporal vars as static); replace
  print() warning with log.warning(); add CONTRACT notes documenting the
  three open temporal-contract decisions
- writer.py: fix valid_from ISO-date silent drop — int("2020-01-01") was
  swallowing the year and producing times=[], now extracts year component
  from any "YYYY" or "YYYY-MM-DD" string
- bdc_importer.py, grids.py: replace all print() calls with
  logging.getLogger(__name__) so output is observable in structured logs
- tests/test_temporal.py: six new regression tests covering the two defects
  fixed above (ISO date extraction, load() shape consistency) plus temporal
  sort order and the period-filter logging path

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
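
The `valid_from` fix in writer.py can be illustrated with a small helper. This is a sketch of the behaviour the commit message describes (accept "YYYY" or "YYYY-MM-DD" and return the year), not the code in writer.py itself; `extract_year` is a hypothetical name.

```python
import re
from typing import Optional

# Matches a bare year ("2020") or an ISO date ("2020-01-01").
_YEAR_OR_ISO_DATE = re.compile(r"^(\d{4})(?:-\d{2}-\d{2})?$")


def extract_year(valid_from: Optional[str]) -> Optional[int]:
    """Return the year component, or None if the value is missing or malformed."""
    if valid_from is None:
        return None
    match = _YEAR_OR_ISO_DATE.match(str(valid_from).strip())
    return int(match.group(1)) if match else None
```

Calling `int("2020-01-01")` directly raises `ValueError`; if that exception is caught and ignored, the time value is lost, which is the silent drop the commit describes.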
test_bdc_master_grids.py:
- Fix setUp to use tempfile.mkdtemp() instead of relative paths, eliminating
  the fragile in-tree db creation that caused intermittent failures
- Update test_importer_creates_3_grids (split into two): now asserts BR/5km
  and BR/1km grids (what register_simulation_grids() actually creates) and
  6 BDC tile sources (2 tiles × 3 levels), matching current code
- Keep test_derive_uses_tile_id as-is (it passes and covers real behavior)
- Split test_load_with_tile_id: keep the passing half (explicit tile_id
  filtering works) as test_load_with_explicit_tile_id_filters_correctly;
  mark the unimplemented disambiguation half as @unittest.expectedFailure
  with a clear docstring documenting the planned feature
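
The two test patterns mentioned above (an isolated temporary directory in `setUp`, and `@unittest.expectedFailure` for a planned feature) look roughly like this. The class and test names are hypothetical; only `tempfile.mkdtemp()` and `unittest.expectedFailure` come from the commit message.

```python
import shutil
import tempfile
import unittest
from pathlib import Path


class CatalogIsolationSketch(unittest.TestCase):
    def setUp(self) -> None:
        # A fresh directory outside the source tree for every test,
        # removed again even if the test fails.
        self.workdir = Path(tempfile.mkdtemp())
        self.addCleanup(shutil.rmtree, self.workdir, ignore_errors=True)
        self.db_path = self.workdir / "catalog.db"

    def test_db_lives_in_temp_dir(self) -> None:
        self.db_path.write_bytes(b"")
        self.assertTrue(self.db_path.exists())

    @unittest.expectedFailure
    def test_planned_tile_disambiguation(self) -> None:
        """Planned feature: fails until disambiguation is implemented."""
        self.fail("tile disambiguation is not implemented yet")
```

A test marked this way is reported as an expected failure rather than an error, and it turns into an "unexpected success" once the feature lands, which signals that the decorator should be removed.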

docs/architecture/tiling.md:
- Fix mermaid diagram: BDC/100m (non-existent) → BR/5km (actual grid)
- Add clarifying note distinguishing BDC tile sources from simulation grids
- Add limitation callout on load() multi-tile ambiguity

README.md:
- Add "Limitações conhecidas" ("Known limitations") section documenting six intentional scope
  boundaries: in-memory single-tile design, rasterized vector aggregation,
  load() tile disambiguation, SpatialRelation not used in pipeline,
  purity_threshold reserved, no STAC integration

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Single acre_5km.toml with no references from any Python code, test,
example, or doc. Directory name collides with the sibling org repo,
creating contributor confusion about where grid configs live.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Item 9 — roundtrip integration tests (tests/test_roundtrip.py):
Exercises the full Normalizer → GridAligner → Aggregator → VariableWriter
chain via CubeClient.derive() and reloads via CubeClient.load(). Verifies
pixel values, coverage_purity and dominance_purity coordinates survive Zarr
serialisation, content_hash is recorded, and the second derive() hits cache.

Item 10 — GitHub Actions CI (.github/workflows/ci.yml):
Single job, Python 3.12, pip install -e ".[dev,bdc]", pytest -q.
Triggers on push to main/evolution and all pull requests.

Item 11 — SpatialRelation excluded from spec_hash (correctness fix):
No pipeline stage reads SpatialRelation during computation. Including
relations in the hash made the cache key sensitive to documentation-level
metadata, meaning the same source + grid + operator + time window could
map to different hashes depending on which relations were registered —
breaking the reproducibility guarantee. Relations are now explicitly
excluded. The existing test that asserted the old (incorrect) behaviour
is updated; new tests in test_spec_hash.py guard the corrected contract.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
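
The reasoning in item 11 can be made concrete with a minimal sketch of a cache key that ignores relations. The function name, parameter names, and the choice of SHA-256 over canonical JSON are assumptions for illustration; the actual `spec_hash()` implementation may differ.

```python
import hashlib
import json
from typing import Optional, Sequence


def spec_hash_sketch(
    source: str,
    grid: str,
    operator: str,
    valid_from: Optional[str],
    valid_until: Optional[str],
    relations: Optional[Sequence[str]] = None,
) -> str:
    """Hash only the inputs that affect computation.

    `relations` is accepted but deliberately left out of the payload,
    because no pipeline stage reads it during computation.
    """
    payload = {
        "source": source,
        "grid": grid,
        "operator": operator,
        "valid_from": valid_from,
        "valid_until": valid_until,
    }
    # sort_keys plus fixed separators give a stable byte representation.
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under this contract, registering or removing a relation leaves the hash unchanged, while changing the operator or the time window produces a different key.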
scripts/ mixed three distinct kinds of content: narrative tutorial,
operational setup, and export utility. The folder has been dissolved:

- scripts/05_ilha_maranhao_mapbiomas.py
    → examples/case_studies/ilha_maranhao/01_derive.py
  Moved alongside the other case studies (brmangue, lucc_acre).
  Added a __main__ guard; example code reorganized into main().

- scripts/import_bdc_tiles.py
    → examples/setup/03_import_bdc_tiles.py
  Already referenced as setup step 3 in examples/README.md;
  it now lives where the README said it would.
  Added a __main__ guard.

- scripts/zarr_to_tif.py
    → tools/zarr_to_tif.py
  Export utility, not an example of library usage.

- scripts/list_grids.py, scripts/list_sources.py
  Deleted: trivial content (< 10 lines), hard-coded paths
  with no use outside one specific project directory.

examples/README.md updated: the reference to scripts/import_bdc_tiles.py
is corrected to examples/setup/03_import_bdc_tiles.py; new entry
for the Ilha do Maranhão case study; mention of tools/zarr_to_tif.py.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
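
For readers unfamiliar with the `__main__` guard mentioned in this commit, the general pattern is shown below. The body of `main()` is a placeholder, not the content of the moved scripts.

```python
def main() -> int:
    # Placeholder for the example's logic; returns a process-style exit code.
    print("running example")
    return 0


if __name__ == "__main__":
    # Runs only when the file is executed as a script, not when imported.
    main()
```

The guard lets the module be imported, for instance by a test or a documentation tool, without running the example as a side effect.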
- examples/case_studies/brmangue/ + examples/case_studies/ilha_maranhao/
    → examples/case_studies/maranhao/
  Both studies use the same grid (ilha_maranhao/100m). Consolidated
  into a single folder with continuous numbering:
    01_mapbiomas_temporal.py  (was ilha_maranhao/01_derive.py)
    02_brmangue_derive.py     (was brmangue/01_derive.py)
    03_brmangue_simulate.py   (was brmangue/02_simulate.py)

- examples/setup/03_import_bdc_tiles.py → tools/import_bdc_tiles.py
  One-time infrastructure operation, not an example narrative.
  examples/setup/ keeps only the two tutorial bootstrap scripts.

- examples/README.md updated: new structure, utilities table.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
catalog.md / overview.md (spec_hash does not include SpatialRelation):
  Relations were explicitly excluded from the hash in item 11 (no
  pipeline stage uses them during computation). Both pages
  still listed relations as a hash input. Fixed: removed from the
  example JSON and from the lists; added an explanation of why.

operators.md (incorrect table and footnote):
  majority, minority and percentage have needs_fine_alignment=True and
  _resampling=Resampling.nearest in the code; the doc said 'mode'.
  The footnote claimed that std/percentage use pre-resampled data,
  the opposite of what the code does. Table restructured into two groups
  (direct resampling vs. fine alignment) with the correct behaviour.

bdc.md (three bugs):
  1. Parameters sm_shp/md_shp/lg_shp do not exist: they are sm_path/md_path/lg_path.
  2. The tile filter used the prefix "BR/5km_"; tiles are registered as
     BDC_SM_<tile> (e.g. BDC_SM_009002).
  3. The "mosaico automático" (automatic mosaic) comment contradicts the
     known limitation of load() without tile_id. Replaced with an explicit warning.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
… vars

rioxarray.write_crs() sets grid_mapping="spatial_ref" only on data
variables already present in the Dataset at call time. Calling it before
the loop meant zero variables received the attribute, so QGIS and other
CF-compliant readers could not auto-detect the projection.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…arr warnings

- load() now skips DerivedVariable entries whose Zarr file no longer
  exists on disk (catalog accumulates orphans when files are deleted
  between runs). Skipped entries are logged at DEBUG level.
- CubeClient.purge_stale() removes all such orphan entries from the
  catalog; CatalogStore protocol and both implementations get
  delete_derived().
- to_zarr(..., consolidated=False) and open_zarr(..., consolidated=False)
  suppress the ZarrUserWarning about consolidated metadata not being
  part of the Zarr v3 spec.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
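
The orphan handling described in this commit can be sketched with a plain dictionary standing in for the catalog. The real catalog, `CatalogStore.delete_derived()`, and `CubeClient.purge_stale()` are not shown in this page, so the function names and the dict-based catalog below are assumptions for illustration only.

```python
import logging
import os
from typing import Dict, List

log = logging.getLogger(__name__)


def existing_entries(catalog: Dict[str, str]) -> Dict[str, str]:
    """Return entries whose path still exists; log skipped ones at DEBUG."""
    kept = {}
    for spec_hash, path in catalog.items():
        if os.path.exists(path):
            kept[spec_hash] = path
        else:
            log.debug("skipping orphan entry %s (missing %s)", spec_hash, path)
    return kept


def purge_stale_sketch(catalog: Dict[str, str]) -> List[str]:
    """Delete entries whose path is gone and return the removed keys."""
    stale = [key for key, path in catalog.items() if not os.path.exists(path)]
    for key in stale:
        del catalog[key]
    return stale
```

Keeping the read path tolerant (skip and log) and the cleanup explicit (a separate purge call) means loading never mutates the catalog as a side effect.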
@profsergiocosta profsergiocosta merged commit d62edb4 into dev Jun 15, 2026
2 checks passed