1 change: 1 addition & 0 deletions docs/index.md
@@ -16,6 +16,7 @@ Explorer <overview/explorer>
Use VS Code <overview/ui-vscode>
Use GitHub Codespaces <overview/ui-codespaces>
Using QGIS <overview/qgis-plugin>
Reading COGs with async-geotiff <overview/async-geotiff>
Changelog <overview/changelog>
```

164 changes: 164 additions & 0 deletions docs/overview/async-geotiff.md
@@ -0,0 +1,164 @@
# Reading Planetary Computer COGs with async-geotiff

[async-geotiff](https://github.com/developmentseed/async-geotiff) is a Python Cloud Optimized GeoTIFF reader with no GDAL dependency. The core is Rust, image decoding runs in a thread pool, buffers are zero-copy, and every API is fully type-hinted. Use it when you want async I/O for pixel-level analysis without putting GDAL on the system.
Suggested change
[async-geotiff](https://github.com/developmentseed/async-geotiff) is a Python Cloud Optimized GeoTIFF reader with no GDAL dependency. The core is Rust, image decoding runs in a thread pool, buffers are zero-copy, and every API is fully type-hinted. Use it when you want async I/O for pixel-level analysis without putting GDAL on the system.
[async-geotiff](https://github.com/developmentseed/async-geotiff) is a Python [Cloud Optimized GeoTIFF](https://cogeo.org/) reader with no GDAL dependency. The core is Rust, image decoding runs in a thread pool, buffers are zero-copy, and every API is fully type-hinted. Use it when you want async I/O for pixel-level analysis without putting GDAL on the system.


A companion notebook walks through every step end-to-end. [Open in Planetary Computer Hub](https://pccompute.westeurope.cloudapp.azure.com/compute/hub/user-redirect/git-pull?repo=https://github.com/microsoft/PlanetaryComputerExamples&urlpath=lab/tree/PlanetaryComputerExamples/quickstarts/async-geotiff.ipynb&branch=main)

## Install async-geotiff

```bash
uv add async-geotiff obstore planetary-computer pystac-client lonboard matplotlib
```

`async-geotiff` is the user-facing library. `async-tiff` is the lower-level Rust core. Use it directly only if you're building library infrastructure on top.
Suggested change
`async-geotiff` is the user-facing library. `async-tiff` is the lower-level Rust core. Use it directly only if you're building library infrastructure on top.
`async-geotiff` is the high-level library for reading GeoTIFF and COG files. `async-tiff` is the lower-level Rust core for generically reading TIFF files. Most users shouldn't need to touch it.


## Find a Sentinel-2 scene on the Planetary Computer

```python
import pystac_client
import planetary_computer

catalog = pystac_client.Client.open(
    "https://planetarycomputer.microsoft.com/api/stac/v1",
    modifier=planetary_computer.sign_inplace,
)
item = next(catalog.search(
    collections=["sentinel-2-l2a"],
    bbox=[-122.7, 45.5, -122.6, 45.6],
    datetime="2024-07-01/2024-08-01",
    query={"eo:cloud_cover": {"lt": 10}},
    max_items=1,
).items())

asset = item.assets["visual"]
```

`planetary_computer.sign_inplace` signs every asset href as the search returns.

## Build an authenticated obstore store

async-geotiff reads bytes through an [obstore](https://developmentseed.org/obstore/) store. `PlanetaryComputerCredentialProvider` handles SAS token acquisition and refresh. Give it a signed asset and it figures out the account and container and mounts the store to that single blob, so the COG is opened with an empty path below:

```python
from obstore.auth.planetary_computer import PlanetaryComputerCredentialProvider
from obstore.store import AzureStore

provider = PlanetaryComputerCredentialProvider.from_asset(asset)
store = AzureStore(credential_provider=provider)
```

Set your Planetary Computer subscription key via the `PC_SDK_SUBSCRIPTION_KEY` environment variable, or pass `subscription_key=` to the provider.
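As a rough illustration of what "figures out the account and container" involves, the sketch below splits an Azure Blob Storage URL into its parts using only the standard library. It is not how obstore implements `from_asset`; the helper name `split_blob_url` and the example href are hypothetical.

```python
from urllib.parse import urlsplit


def split_blob_url(href: str) -> tuple[str, str, str]:
    """Split an Azure Blob Storage URL into (account, container, blob path)."""
    parts = urlsplit(href)
    # The host looks like "<account>.blob.core.windows.net".
    account = parts.netloc.split(".", 1)[0]
    # The path looks like "/<container>/<blob path>"; the query string
    # (where a SAS token would live) is deliberately ignored here.
    container, _, blob_path = parts.path.lstrip("/").partition("/")
    return account, container, blob_path


# Hypothetical href, shaped like a signed asset URL.
href = (
    "https://exampleaccount.blob.core.windows.net/"
    "example-container/tiles/scene/TCI.tif?sv=2024-01-01&sig=REDACTED"
)
account, container, blob_path = split_blob_url(href)
```

Because the store in the section above is mounted at the single blob, the blob path is already accounted for, which is why the COG is opened with an empty path.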

## Open the COG and inspect metadata

```python
from async_geotiff import GeoTIFF

geotiff = await GeoTIFF.open("", store=store)

print(geotiff.transform) # affine transform
print(geotiff.crs) # PyProj CRS
print(geotiff.nodata)
print(geotiff.overviews) # finest → coarsest
```

The header read is a single range request. This is the same pattern used by [obstore.](./obstore.md)
Suggested change
The header read is a single range request. This is the same pattern used by [obstore.](./obstore.md)
The header read usually fits in one or two range requests, facilitated by [obstore](./obstore.md).
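To make the header read concrete, here is a minimal sketch of the first eight bytes of a classic (non-BigTIFF) TIFF file and the kind of HTTP `Range` header a reader could send to fetch the start of the file. The 16 KiB prefetch size and the helper name `parse_tiff_header` are assumptions for illustration, not values taken from async-geotiff.

```python
import struct


def parse_tiff_header(header: bytes) -> tuple[str, int, int]:
    """Return (struct byte-order char, magic number, first IFD offset)."""
    # Bytes 0-1: "II" for little-endian, "MM" for big-endian.
    order = {b"II": "<", b"MM": ">"}[header[:2]]
    # Bytes 2-3: magic number 42; bytes 4-7: offset of the first IFD.
    magic, ifd_offset = struct.unpack(order + "HI", header[2:8])
    return order, magic, ifd_offset


# A hand-built little-endian header whose first IFD starts at byte 8.
fake_header = b"II" + struct.pack("<HI", 42, 8)

# Hypothetical request header asking for the first 16 KiB (inclusive range).
range_header = {"Range": "bytes=0-16383"}

byte_order, magic, ifd_offset = parse_tiff_header(fake_header)
```

If the IFDs and tile offsets extend past the first requested range, a second request is needed, which is consistent with the "one or two range requests" wording in the suggestion.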


## Pick an overview

`geotiff.overviews` is ordered finest-to-coarsest. Index `0` is the full-resolution image. A coarser overview is the right choice for previews or zoomed-out work:
No, the geotiff itself is the full-resolution image. Index 0 of the overviews is the finest resolution after the full-resolution data.


```python
full_res = geotiff.overviews[0]
coarse = geotiff.overviews[-1]
```
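Following the review comment, a toy model of the pyramid may help: the GeoTIFF itself is the full-resolution level, and the overviews list starts at the first reduced level. The sketch below uses plain Python objects rather than async-geotiff; the `Level` class, the 10 m base resolution, and the factor-of-two overviews are all assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Level:
    name: str
    resolution_m: float  # ground size of one pixel, in metres


# Toy pyramid: the full-resolution image plus three overviews.
full_resolution = Level("full", 10.0)
overviews = [
    Level("overview 0", 20.0),  # finest overview, already reduced
    Level("overview 1", 40.0),
    Level("overview 2", 80.0),  # coarsest overview
]


def pick_level(target_resolution_m: float) -> Level:
    """Pick the coarsest level that is still at least as fine as the target."""
    candidates = [full_resolution, *overviews]  # finest → coarsest
    chosen = candidates[0]  # fall back to full resolution
    for level in candidates:
        if level.resolution_m <= target_resolution_m:
            chosen = level
    return chosen
```

For example, `pick_level(50.0)` selects the 40 m overview, while `pick_level(10.0)` stays on the full-resolution image.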

## Read a window

A *window* names a rectangle of pixels in image coordinates. Reading one fetches only the COG tiles that intersect the rectangle:

```python
from async_geotiff import Window

window = Window(col_off=2048, row_off=2048, width=512, height=512)
array = await full_res.read(window=window)
# Review comment (nit): I'd call this raster_array, not array, to reflect the
# class name and to distinguish it from a bare numpy array.
```
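The tile arithmetic behind "only the COG tiles that intersect the rectangle" can be sketched in a few lines. This is a standalone illustration, not async-geotiff code; the helper name `tiles_for_window` and the 512-pixel tile size are assumptions, since the actual tile size depends on how the file was written.

```python
def tiles_for_window(
    col_off: int, row_off: int, width: int, height: int, tile_size: int = 512
) -> list[tuple[int, int]]:
    """Return the (tile_row, tile_col) indices a pixel window intersects."""
    first_col = col_off // tile_size
    last_col = (col_off + width - 1) // tile_size
    first_row = row_off // tile_size
    last_row = (row_off + height - 1) // tile_size
    return [
        (row, col)
        for row in range(first_row, last_row + 1)
        for col in range(first_col, last_col + 1)
    ]


# A window aligned to the tile grid touches exactly one tile.
aligned = tiles_for_window(2048, 2048, 512, 512)

# The same-sized window, shifted off the grid, straddles four tiles.
shifted = tiles_for_window(2300, 2300, 512, 512)
```

Under these assumptions, aligning windows to the tile grid keeps the number of tiles fetched per read as small as possible.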

The returned `Array` has:

- `array.data`: 3D NumPy array, band-first (`(bands, rows, cols)`).
- `array.mask`: boolean mask, `True` where nodata.
- `array.transform`: affine transform for the windowed region.
- `array.as_masked()`: convert to `numpy.ma.MaskedArray`.
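The windowed transform in the list above is the parent transform shifted by the window's pixel offset. A minimal sketch of that arithmetic, using a plain six-number tuple `(a, b, c, d, e, f)` where `x = a*col + b*row + c` and `y = d*col + e*row + f`; the helper name and the origin coordinates are hypothetical.

```python
Affine6 = tuple[float, float, float, float, float, float]


def window_transform(transform: Affine6, col_off: int, row_off: int) -> Affine6:
    """Shift an affine transform so pixel (0, 0) is the window's top-left."""
    a, b, c, d, e, f = transform
    return (
        a, b, a * col_off + b * row_off + c,
        d, e, d * col_off + e * row_off + f,
    )


# Hypothetical north-up raster: 10 m pixels, origin at (500000, 5100000).
parent = (10.0, 0.0, 500000.0, 0.0, -10.0, 5100000.0)
windowed = window_transform(parent, col_off=2048, row_off=2048)
```

Here the window's origin moves 20480 m east and 20480 m south of the parent origin, while the pixel size is unchanged.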

The `visual` asset is 3-band RGB, so transpose to band-last before previewing:

```python
import numpy as np
import matplotlib.pyplot as plt

plt.imshow(np.transpose(array.data, (1, 2, 0)))
# Review comment: in case it's clearer, this is shipped as reshape_as_image so
# that users don't have to remember the band ordering; (1, 2, 0) is easy to
# forget or get wrong, IMO.
```

```{image} images/async-geotiff-window-matplotlib.png
:height: 360
:name: async-geotiff window preview
:class: no-scaled-link
```

## Visualize the scene with Lonboard

For an interactive map view of the same Sentinel-2 item, stream its COG tiles through the Planetary Computer tiler into a [Lonboard](https://developmentseed.org/lonboard/) `BitmapTileLayer`:
As a note, `BitmapTileLayer` uses titiler on the backend to send formatted PNG tiles; it doesn't read the COG data directly. That's fine if you'd like to point out how this integrates with the Planetary Computer titiler server!

In case you want an example of how to read the COG data directly through async-geotiff, you'd want to use `RasterLayer.from_geotiff`, which natively integrates with async-geotiff.


```python
import json
import urllib.request

from lonboard import BitmapTileLayer, Map

tilejson = json.load(urllib.request.urlopen(
    "https://planetarycomputer.microsoft.com/api/data/v1/item/tilejson.json"
    f"?collection={item.collection_id}&item={item.id}&assets=visual"
))
lon, lat, _ = tilejson["center"]
layer = BitmapTileLayer(
    data=tilejson["tiles"][0],
    min_zoom=int(tilejson["minzoom"]),
    max_zoom=int(tilejson["maxzoom"]),
    tile_size=256,
)
Map(layer, view_state={"longitude": lon, "latitude": lat, "zoom": 11})
```
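The TileJSON URL can also be assembled with `urllib.parse.urlencode`, which percent-encodes any characters in the identifiers that are not URL-safe. This is only a sketch of the query-string construction and makes no network request; the helper name `tilejson_url` and the item ID are placeholders.

```python
from urllib.parse import urlencode

TILEJSON_ENDPOINT = (
    "https://planetarycomputer.microsoft.com/api/data/v1/item/tilejson.json"
)


def tilejson_url(collection_id: str, item_id: str, assets: str = "visual") -> str:
    """Build the TileJSON URL for one item, encoding the query parameters."""
    query = urlencode(
        {"collection": collection_id, "item": item_id, "assets": assets}
    )
    return f"{TILEJSON_ENDPOINT}?{query}"


# Placeholder item ID; in practice this would come from item.id.
url = tilejson_url("sentinel-2-l2a", "EXAMPLE_ITEM_ID")
```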

```{image} images/async-geotiff-scene-lonboard.png
:height: 500
:name: async-geotiff Lonboard scene
:class: no-scaled-link
```

## Read in parallel

Each `read()` is independent. Fire many at once with `asyncio.gather`.

async-geotiff issues range requests in parallel and decodes them on the Rust thread pool:

```python
import asyncio

windows = [
    Window(c, r, 256, 256)
    for c in range(0, 2048, 256) for r in range(0, 2048, 256)
]
arrays = await asyncio.gather(
    *[full_res.read(window=w) for w in windows]
)
```
Comment on lines +141 to +155
Similar to #527 (comment), this isn't an ideal example to suggest to people, because we're instructing them to make entirely independent requests for each window of the file.

But this is something that read handles automatically, and it'll be faster because it can minimize the total number of requests that need to be made.

So here, a single

window = Window(0, 0, 2048, 2048)
full_res.read(window=window)

would be a lot better than making many independent window reads
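A back-of-the-envelope count shows why the single window is preferable. The sketch below assumes 512-pixel tiles and assumes that independent reads share nothing with each other (no caching or request merging across calls); neither assumption comes from async-geotiff, and `tiles_for_window` is a hypothetical helper.

```python
from collections import Counter

TILE = 512  # assumed internal tile size


def tiles_for_window(col_off, row_off, width, height, tile_size=TILE):
    """Return the (tile_row, tile_col) indices a pixel window intersects."""
    rows = range(row_off // tile_size, (row_off + height - 1) // tile_size + 1)
    cols = range(col_off // tile_size, (col_off + width - 1) // tile_size + 1)
    return [(r, c) for r in rows for c in cols]


# The 64 independent 256x256 windows from the example under review.
windows = [
    (c, r, 256, 256)
    for c in range(0, 2048, 256) for r in range(0, 2048, 256)
]

# Count how often each tile is needed when every window is read on its own.
independent = Counter(t for w in windows for t in tiles_for_window(*w))
independent_fetches = sum(independent.values())

# One window covering the same 2048x2048 area needs each tile once.
merged_tiles = tiles_for_window(0, 0, 2048, 2048)
```

Under these assumptions the 64 small reads ask for 64 tile fetches, with each of the 16 tiles requested four times, while the single large read needs those 16 tiles once each.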


This is the same speedup pattern the [obstore tutorial](./obstore.md) demonstrates at the raw-bytes level, one layer up the stack.
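For cases where the awaitables really are independent, the mechanics of `asyncio.gather` can be seen in isolation with a simulated read. This sketch uses only the standard library; `fake_read` and its delay are stand-ins and do not model async-geotiff's behaviour.

```python
import asyncio


async def fake_read(window_id: int, delay: float = 0.01) -> int:
    """Stand-in for an I/O-bound read: wait, then return a value."""
    await asyncio.sleep(delay)
    return window_id * window_id


async def main() -> list[int]:
    # gather schedules all coroutines concurrently and returns their
    # results in the order the awaitables were passed in.
    return await asyncio.gather(*(fake_read(i) for i in range(8)))


results = asyncio.run(main())
```

The waits overlap rather than run back to back, and the results come back in submission order regardless of which coroutine finishes first.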

## When to use something else

- For resampling, reprojection, or warping, hand the array to [rasterio](https://rasterio.readthedocs.io/).
Suggested change
- For resampling, reprojection, or warping, hand the array to [rasterio](https://rasterio.readthedocs.io/).
- For resampling, reprojection, or warping, use [rasterio](https://rasterio.readthedocs.io/), either alone or in combination with async-geotiff.

- For interactive visualization, see [Lonboard](https://developmentseed.org/lonboard/).
- For the raw-bytes layer beneath async-geotiff, see [obstore](https://developmentseed.org/obstore/).
- For library authors building on the Rust core, drop to [async-tiff](https://github.com/developmentseed/async-tiff).
Python library authors who want to build on top of GeoTIFF should still be using async-geotiff, not async-tiff. It's really only people who want generic TIFF support, and don't want to specialize their code to support only GeoTIFF, who should be using async-tiff.

1 change: 1 addition & 0 deletions etl/config/external_docs_config.yml
@@ -28,3 +28,4 @@
- file_url: quickstarts/reading-tabular-data.ipynb
- file_url: quickstarts/reading-zarr-data.ipynb
- file_url: quickstarts/storage.ipynb
- file_url: quickstarts/async-geotiff.ipynb