Evolution #16
Merged
Introduces `Derivation`, a Pydantic front-end over the existing `SpatialDerivation` / `Variable` machinery that validates operator names and operator-specific requirements at construction time (fail-fast). Field names follow STAC conventions where natural (`valid_from`, `valid_until`, `bbox`); this is naming only, not a STAC implementation.
- `disscube/operators/base.py`: `OPERATOR_REGISTRY` as the single source of truth for valid operator names and their constraints.
- `disscube/derivation.py`: `Derivation` model with `to_variable()`, `to_spatial_derivation()`, and `spec_hash()`, which delegates to `SpatialDerivation.spec_hash()`.
- `disscube/client/cube_client.py`: thin `derive_declarative()` wrapper.
- `tests/test_derivation.py`: 13 tests covering validation, round-trips, hash consistency, and bbox exclusion from spec_hash.
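A minimal sketch of the fail-fast pattern described above, using a stdlib `dataclass` with `__post_init__` in place of Pydantic so it runs without dependencies. The registry contents, the `requires_classes` constraint and the field set are assumptions for illustration; the real `spec_hash()` delegates to `SpatialDerivation.spec_hash()` rather than hashing locally as done here.

```python
import hashlib
import json
from dataclasses import dataclass
from typing import Optional, Tuple

# Hypothetical registry contents; the real OPERATOR_REGISTRY in
# disscube/operators/base.py defines its own operators and constraints.
OPERATOR_REGISTRY = {
    "mean": {"requires_classes": False},
    "majority": {"requires_classes": False},
    "percentage": {"requires_classes": True},  # assumed constraint
}


@dataclass(frozen=True)
class Derivation:
    source: str
    grid: str
    operator: str
    valid_from: Optional[str] = None
    valid_until: Optional[str] = None
    classes: Tuple[int, ...] = ()
    bbox: Optional[Tuple[float, float, float, float]] = None

    def __post_init__(self) -> None:
        # Fail fast: an unknown operator is rejected when the object is built,
        # not later when the pipeline runs.
        constraints = OPERATOR_REGISTRY.get(self.operator)
        if constraints is None:
            raise ValueError(
                f"unknown operator {self.operator!r}; "
                f"valid: {sorted(OPERATOR_REGISTRY)}"
            )
        # Operator-specific requirement looked up in the same registry.
        if constraints["requires_classes"] and not self.classes:
            raise ValueError(f"operator {self.operator!r} requires 'classes'")

    def spec_hash(self) -> str:
        # bbox is not part of the payload, so two derivations that differ
        # only in bbox share a hash (the property the tests check).
        payload = {
            "source": self.source,
            "grid": self.grid,
            "operator": self.operator,
            "valid_from": self.valid_from,
            "valid_until": self.valid_until,
            "classes": list(self.classes),
        }
        blob = json.dumps(payload, sort_keys=True).encode("utf-8")
        return hashlib.sha256(blob).hexdigest()
```

Putting the constraints in one registry means the validator and the operator implementations cannot drift apart on what counts as a valid name.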
…-registration, fix architectural issues identified in the audit. All derivation scripts pass. Next: commit the changes.
- README.md: complete rewrite
- docs/index.md: updated navigation
- docs/architecture/overview.md: complete rewrite (entity model, pipeline, spec_hash, operator plugin system)
- docs/architecture/operators.md: new file (operator system, table, compute() contract, how to add new operators)
- docs/architecture/pipeline.md: complete rewrite (all 4 stages with current code)
- docs/architecture/catalog.md: complete rewrite (schema, hashes, temporal series, evolution)
- docs/architecture/tiling.md: complete rewrite (tile concept, tile detection, parallel processing)
- docs/guides/grids.md: complete rewrite (snap, hierarchy, SpatialRelation, manual grid)
- docs/guides/bdc.md: rewritten (removed stale BDC_SM/BDC_MD/BDC_LG references, updated with current tile workflow)
- mkdocs.yml: added "Operator System: architecture/operators.md" to nav
- Untrack catalog.db and examples/drivers/catalog.db (runtime artifacts;
.gitignore rule already existed — files were previously force-tracked)
- pyproject.toml: declare dissmodel>=0.6.0,<0.7.0; add fiona as [bdc]
optional extra; move s3fs/python-multipart to optional extras; remove
rasterstats and typer which have zero references in library code
- cube_client.py: defer `from dissmodel...RasterBackend` import to inside
to_lucc_data() so CubeClient is usable without dissmodel installed;
fix load() to always return 3D (time,y,x) for temporal variables even
when only one slice exists (was silently returning 2D, causing
to_lucc_data to treat single-year temporal vars as static); replace
print() warning with log.warning(); add CONTRACT notes documenting the
three open temporal-contract decisions
- writer.py: fix valid_from ISO-date silent drop — int("2020-01-01") was
swallowing the year and producing times=[], now extracts year component
from any "YYYY" or "YYYY-MM-DD" string
- bdc_importer.py, grids.py: replace all print() calls with
logging.getLogger(__name__) so output is observable in structured logs
- tests/test_temporal.py: six new regression tests covering the two defects
fixed above (ISO date extraction, load() shape consistency) plus temporal
sort order and the period-filter logging path
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
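In Python, `int("2020-01-01")` raises `ValueError`, which is how an ISO date can silently vanish from the time axis if that error is caught and ignored. A minimal sketch of the year extraction described in the writer.py fix, assuming `valid_from` arrives as an `int`, a `"YYYY"` string or a `"YYYY-MM-DD"` string; the helper name `extract_year` is hypothetical, not the function in writer.py.

```python
import re
from typing import Union

# Accepts "YYYY" or "YYYY-MM-DD"; captures the four-digit year.
_YEAR_RE = re.compile(r"^(\d{4})(?:-\d{2}-\d{2})?$")


def extract_year(valid_from: Union[int, str]) -> int:
    """Return the year component of an int, 'YYYY' or 'YYYY-MM-DD' value."""
    if isinstance(valid_from, int):
        return valid_from
    match = _YEAR_RE.match(valid_from.strip())
    if match is None:
        # Fail loudly instead of dropping the value from the time axis.
        raise ValueError(f"unrecognised valid_from: {valid_from!r}")
    return int(match.group(1))


times = [extract_year(v) for v in ["2019", "2020-01-01"]]  # [2019, 2020]
```

Raising on unrecognised input, rather than skipping it, keeps a malformed date from quietly producing `times=[]` again.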
test_bdc_master_grids.py:
- Fix setUp to use tempfile.mkdtemp() instead of relative paths, eliminating the fragile in-tree db creation that caused intermittent failures
- Update test_importer_creates_3_grids (split into two): now asserts BR/5km and BR/1km grids (what register_simulation_grids() actually creates) and 6 BDC tile sources (2 tiles × 3 levels), matching current code
- Keep test_derive_uses_tile_id as-is (it passes and covers real behavior)
- Split test_load_with_tile_id: keep the passing half (explicit tile_id filtering works) as test_load_with_explicit_tile_id_filters_correctly; mark the unimplemented disambiguation half as @unittest.expectedFailure with a clear docstring documenting the planned feature
docs/architecture/tiling.md:
- Fix mermaid diagram: BDC/100m (non-existent) → BR/5km (actual grid)
- Add clarifying note distinguishing BDC tile sources from simulation grids
- Add limitation callout on load() multi-tile ambiguity
README.md:
- Add "Limitações conhecidas" (Known limitations) section documenting six intentional scope boundaries: in-memory single-tile design, rasterized vector aggregation, load() tile disambiguation, SpatialRelation not used in pipeline, purity_threshold reserved, no STAC integration
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
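A minimal sketch of the two test patterns above using only the standard library: a per-test directory from `tempfile.mkdtemp()` cleaned up in `tearDown`, and a placeholder marked `@unittest.expectedFailure`. The class, test names and the `catalog.db` file touched here are illustrative, not the actual contents of test_bdc_master_grids.py.

```python
import os
import shutil
import tempfile
import unittest


class CatalogIsolationTest(unittest.TestCase):
    def setUp(self) -> None:
        # Fresh directory per test: nothing is written into the source tree,
        # and tests cannot see each other's leftover databases.
        self.tmpdir = tempfile.mkdtemp()
        self.db_path = os.path.join(self.tmpdir, "catalog.db")

    def tearDown(self) -> None:
        shutil.rmtree(self.tmpdir, ignore_errors=True)

    def test_db_is_created_in_tmpdir(self) -> None:
        # Stand-in for code that creates the catalog database.
        with open(self.db_path, "w"):
            pass
        self.assertTrue(os.path.exists(self.db_path))
        self.assertTrue(self.db_path.startswith(self.tmpdir))

    @unittest.expectedFailure
    def test_load_disambiguates_tiles_without_tile_id(self) -> None:
        """Planned feature: load() picks a tile when tile_id is omitted.

        Marked as an expected failure until the behaviour exists, so the
        suite stays green while the gap remains documented.
        """
        self.fail("multi-tile disambiguation not implemented yet")
```

If the planned feature is later implemented without removing the decorator, unittest reports an "unexpected success" and the run is no longer considered successful, which flags the stale marker.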
Single acre_5km.toml with no references from any Python code, test, example, or doc. Directory name collides with the sibling org repo, creating contributor confusion about where grid configs live.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Item 9: round-trip integration tests (tests/test_roundtrip.py):
Exercises the full Normalizer → GridAligner → Aggregator → VariableWriter chain via CubeClient.derive() and reloads via CubeClient.load(). Verifies that pixel values and the coverage_purity and dominance_purity coordinates survive Zarr serialisation, that content_hash is recorded, and that the second derive() hits the cache.
Item 10: GitHub Actions CI (.github/workflows/ci.yml):
Single job, Python 3.12, pip install -e ".[dev,bdc]", pytest -q. Triggers on pushes to main/evolution and on all pull requests.
Item 11: SpatialRelation excluded from spec_hash (correctness fix):
No pipeline stage reads SpatialRelation during computation. Including relations in the hash made the cache key sensitive to documentation-level metadata: the same source + grid + operator + time window could map to different hashes depending on which relations were registered, breaking the reproducibility guarantee. Relations are now explicitly excluded. The existing test that asserted the old (incorrect) behaviour is updated; new tests in test_spec_hash.py guard the corrected contract.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
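The corrected contract can be sketched as a cache keyed only by inputs that change the computed values. This is a minimal illustration assuming a JSON-canonicalised SHA-256 key; the field names and the `DerivationCache` class are hypothetical, not disscube's implementation. The `relations` argument is accepted but never enters the key, so registering extra relations cannot cause a cache miss.

```python
import hashlib
import json
from typing import Dict, List


def spec_hash(source: str, grid: str, operator: str,
              valid_from: str, valid_until: str) -> str:
    # Only computation-relevant inputs; SpatialRelation is not a parameter.
    payload = {
        "source": source,
        "grid": grid,
        "operator": operator,
        "valid_from": valid_from,
        "valid_until": valid_until,
    }
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DerivationCache:
    """Toy cache: maps spec_hash -> result and counts real computations."""

    def __init__(self) -> None:
        self._results: Dict[str, str] = {}
        self.computations = 0

    def derive(self, spec: Dict[str, str], relations: List[str]) -> str:
        # relations are documentation-level metadata and are ignored here.
        key = spec_hash(**spec)
        if key not in self._results:
            self.computations += 1
            self._results[key] = f"result-{key[:12]}"
        return self._results[key]
```

Sorting keys and fixing separators makes the serialised payload byte-identical for equal specs, which is what lets the hash act as a reproducible cache key.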
scripts/ mixed three distinct kinds of content: a narrative tutorial,
operational setup, and an export utility. The folder was dissolved:
- scripts/05_ilha_maranhao_mapbiomas.py
  → examples/case_studies/ilha_maranhao/01_derive.py
  Moved alongside the other case studies (brmangue, lucc_acre).
  Added a __main__ guard; example code reorganized into main().
- scripts/import_bdc_tiles.py
  → examples/setup/03_import_bdc_tiles.py
  Already referenced as step 3 of the setup in examples/README.md;
  it now lives where the README said it did.
  Added a __main__ guard.
- scripts/zarr_to_tif.py
  → tools/zarr_to_tif.py
  An export utility, not an example of library usage.
- scripts/list_grids.py, scripts/list_sources.py
  Deleted: trivial content (< 10 lines) and hard-coded paths
  that are useless outside one specific project directory.
examples/README.md updated: the reference to scripts/import_bdc_tiles.py
now points to examples/setup/03_import_bdc_tiles.py; new entry
for the Ilha do Maranhão case study; mention of tools/zarr_to_tif.py.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
- examples/case_studies/brmangue/ + examples/case_studies/ilha_maranhao/
  → examples/case_studies/maranhao/
  Both studies use the same grid (ilha_maranhao/100m), so they are
  consolidated into a single folder with continuous numbering:
    01_mapbiomas_temporal.py (was ilha_maranhao/01_derive.py)
    02_brmangue_derive.py (was brmangue/01_derive.py)
    03_brmangue_simulate.py (was brmangue/02_simulate.py)
- examples/setup/03_import_bdc_tiles.py → tools/import_bdc_tiles.py
  A one-time infrastructure operation, not an example narrative.
  examples/setup/ now holds only the two tutorial bootstrap scripts.
- examples/README.md updated: new structure, table of utilities.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
catalog.md / overview.md: spec_hash does not include SpatialRelation.
  Relations were explicitly excluded from the hash in item 11 (no
  pipeline stage uses them during computation), but both pages still
  listed relations as a hash input. Fixed: removed from the example
  JSON and from the lists; added an explanation of why.
operators.md: incorrect table and footnote.
  majority, minority and percentage have needs_fine_alignment=True and
  _resampling=Resampling.nearest in the code; the docs said 'mode'.
  The footnote claimed that std/percentage use pre-resampled data,
  the opposite of what the code does. The table is now split into two
  groups (direct resampling vs. fine alignment) with the correct behavior.
bdc.md: three bugs.
  1. The parameters sm_shp/md_shp/lg_shp do not exist: they are sm_path/md_path/lg_path.
  2. The tile filter used the prefix "BR/5km_"; tiles are registered as
     BDC_SM_<tile> (e.g. BDC_SM_009002).
  3. The comment "automatic mosaic" contradicted the known limitation of
     load() without tile_id. Replaced with an explicit warning.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
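A minimal sketch of the corrected tile filter, assuming the caller already holds a list of registered source names. The function name is hypothetical, and the MD/LG level codes are an assumption extrapolated from the sm_path/md_path/lg_path parameters; only the `BDC_SM_<tile>` pattern is confirmed above. Filtering on the old "BR/5km_" prefix matches nothing because no tile source carries that name.

```python
from typing import Iterable, List


def bdc_tile_ids(source_names: Iterable[str], level: str = "SM") -> List[str]:
    """Return tile ids of BDC sources registered as BDC_<level>_<tile>.

    Only BDC_SM_<tile> is confirmed by the docs; BDC_MD_/BDC_LG_ are
    assumed to follow the same pattern.
    """
    prefix = f"BDC_{level}_"
    return sorted(
        name[len(prefix):] for name in source_names if name.startswith(prefix)
    )


names = ["BDC_SM_009002", "BDC_SM_010002", "BDC_MD_009002", "mapbiomas"]
print(bdc_tile_ids(names))  # ['009002', '010002']
```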
… vars
rioxarray.write_crs() sets grid_mapping="spatial_ref" only on data variables already present in the Dataset at call time. Calling it before the loop meant zero variables received the attribute, so QGIS and other CF-compliant readers could not auto-detect the projection.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
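The ordering bug can be reproduced with a toy model that only mimics the behaviour described above; `ToyDataset` and the variable names are stand-ins written for illustration, not xarray or rioxarray. With the real libraries, the fix corresponds to calling `ds.rio.write_crs(...)` after all data variables have been added.

```python
from typing import Dict, List


class ToyDataset:
    """Stand-in for a Dataset: data variable name -> attribute dict."""

    def __init__(self) -> None:
        self.data_vars: Dict[str, Dict[str, str]] = {}

    def add_var(self, name: str) -> None:
        self.data_vars[name] = {}

    def write_crs(self) -> None:
        # Like the behaviour described above: only variables that exist
        # at call time get the grid_mapping attribute.
        for attrs in self.data_vars.values():
            attrs["grid_mapping"] = "spatial_ref"


def build(names: List[str], crs_first: bool) -> ToyDataset:
    ds = ToyDataset()
    if crs_first:
        ds.write_crs()  # buggy order: no variables yet, nothing is tagged
    for name in names:
        ds.add_var(name)
    if not crs_first:
        ds.write_crs()  # fixed order: every variable is tagged
    return ds
```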
…arr warnings
- load() now skips DerivedVariable entries whose Zarr file no longer exists on disk (the catalog accumulates orphans when files are deleted between runs). Skipped entries are logged at DEBUG level.
- CubeClient.purge_stale() removes all such orphan entries from the catalog; the CatalogStore protocol and both implementations gain delete_derived().
- to_zarr(..., consolidated=False) and open_zarr(..., consolidated=False) suppress the ZarrUserWarning about consolidated metadata not being part of the Zarr v3 spec.
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
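A minimal sketch of the orphan handling, assuming each catalog entry records the path of its Zarr store. `delete_derived()` is named in the commit; `list_derived()`, `InMemoryCatalog`, `loadable()` and the free-function `purge_stale()` below are hypothetical stand-ins for the real `CatalogStore` implementations and `CubeClient` methods.

```python
import logging
import os
from dataclasses import dataclass
from typing import Dict, List, Protocol

log = logging.getLogger(__name__)


@dataclass
class DerivedVariable:
    spec_hash: str
    zarr_path: str


class CatalogStore(Protocol):
    def list_derived(self) -> List[DerivedVariable]: ...  # assumed method
    def delete_derived(self, spec_hash: str) -> None: ...


class InMemoryCatalog:
    def __init__(self) -> None:
        self._rows: Dict[str, DerivedVariable] = {}

    def add(self, dv: DerivedVariable) -> None:
        self._rows[dv.spec_hash] = dv

    def list_derived(self) -> List[DerivedVariable]:
        return list(self._rows.values())  # copy: safe to delete while iterating

    def delete_derived(self, spec_hash: str) -> None:
        self._rows.pop(spec_hash, None)


def loadable(catalog: CatalogStore) -> List[DerivedVariable]:
    """Like load(): skip entries whose Zarr store is gone, logging at DEBUG."""
    live = []
    for dv in catalog.list_derived():
        if not os.path.exists(dv.zarr_path):
            log.debug("skipping orphan %s (%s missing)", dv.spec_hash, dv.zarr_path)
            continue
        live.append(dv)
    return live


def purge_stale(catalog: CatalogStore) -> int:
    """Delete orphan entries from the catalog; return how many were removed."""
    removed = 0
    for dv in catalog.list_derived():
        if not os.path.exists(dv.zarr_path):
            catalog.delete_derived(dv.spec_hash)
            removed += 1
    return removed
```

Keeping the skip in the read path and the deletion in a separate, explicit call means an accidental `load()` never mutates the catalog.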