diff --git a/docs/notebooks/Gallery.ipynb b/docs/notebooks/Gallery.ipynb
index c5e74e0b..60d2a46e 100644
--- a/docs/notebooks/Gallery.ipynb
+++ b/docs/notebooks/Gallery.ipynb
@@ -133,7 +133,7 @@
"| Transform-on-load | How to transform and adjust data at load-time | [Using Built-In Operations](./data/UsingTheInBuiltOperations.ipynb) | 18 Aug 2025 |\n",
"| Applying data transforms | -- | [Applying Data Transforms](./data/Transforms.ipynb) | 18 Aug 2025 |\n",
"| Geospatial subsetting | -- | [Region Cutting](./data/RegionCutting.ipynb) | 18 Aug 2025 |\n",
- "| Remote Zarr Loading | Demontsrates loading data in a remote zarr archive | [Remote Zarr Loading](./data/Zarr_ |2 June 2026 |\n"
+ "| Remote Zarr Loading | Demonstrates loading data from a remote Zarr archive | [Remote Zarr Loading](./data/Remote_Zarr_Loading.ipynb) | 2 June 2026 |\n"
]
},
{
diff --git a/docs/notebooks/data/Remote_Zarr_Loading.ipynb b/docs/notebooks/data/Remote_Zarr_Loading.ipynb
index 361391c9..6d20de89 100644
--- a/docs/notebooks/data/Remote_Zarr_Loading.ipynb
+++ b/docs/notebooks/data/Remote_Zarr_Loading.ipynb
@@ -1,843 +1,903 @@
{
- "cells": [
- {
- "cell_type": "markdown",
- "id": "98164f08-4b2f-4600-b7d5-b4cc4284aac6",
- "metadata": {},
- "source": [
- "# Loading a Remote Zarr file - ERA5 ARCO example\n",
- "\n",
- "[Zarr](https://zarr.readthedocs.io/) is an increasingly popular format for storing gridded environmental science data, especially when working on cloud environments. The nature of a zarr, with consolidated metadat which enables interaction with a dataset as a unified whole, data divided into separate files called *chunks* to enable parallel read and write, which suits a massively parallel distributed file system like the object stores used by cloud providers, makes it efficient and intuive for many machine-learning platforms and workflows. \n",
- "\n",
- "PyEarthTools provides a data accessor class for zarr files. As PyEarthTools uses xarray for data I/O, which through fsspec supports loading data directly from remote sources, it can load data from such sources.\n"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "f644ae8b-32dd-49c8-9c14-0a32c0840ac6",
- "metadata": {},
- "source": [
- "### Set up imports for demo"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 1,
- "id": "c3f48f0f-6e4d-40d3-a88a-625f0d5a5b7d",
- "metadata": {},
- "outputs": [],
- "source": [
- "import xarray"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 2,
- "id": "4e140303-a282-41e8-8424-9547f9159c26",
- "metadata": {},
- "outputs": [],
- "source": [
- "import matplotlib.pyplot\n",
- "import cartopy"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 3,
- "id": "4e4d3dc7-e81a-40c3-b604-db49bf43847e",
- "metadata": {},
- "outputs": [],
- "source": [
- "import pyearthtools.data"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 4,
- "id": "ab4e3a1d-770b-4cd1-a57d-c678d1af20d6",
- "metadata": {},
- "outputs": [],
- "source": [
- "import pyearthtools.pipeline"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "51c6e404-a86f-4448-b9d2-37bb2b60f6cb",
- "metadata": {},
- "source": [
- "## Set up the accessor\n",
- "\n",
- "Fore this demo we will be using the copy of the [ECMWF ERA5 Reanalsyis dataset](https://cds.climate.copernicus.eu/datasets/reanalysis-era5-pressure-levels?tab=overview) hosted by [Google Research](https://github.com/google-research/arco-era5) on the Google Cloud Storage system."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 5,
- "id": "5ded0ad5-7561-4755-82de-deee4f4a5417",
- "metadata": {},
- "outputs": [],
- "source": [
- "remote_url = 'gs://gcp-public-data-arco-era5/ar/full_37-1h-0p25deg-chunk-1.zarr-v3'"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 6,
- "id": "2190d29c-899e-4f0d-8184-7db042c07d4a",
- "metadata": {},
- "outputs": [],
- "source": [
- "zarr_opts = {'chunks': None, 'storage_options': {'token': 'anon'}}"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "15ccfd12-75e7-404e-a623-065a8fc389f9",
- "metadata": {},
- "source": [
- "Usually we are only interested in a subset of the data, so we can select a particular time, pressure levels and set of variables that we want to us"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 7,
- "id": "1760e80f-c43c-4c97-9242-69c930fed623",
- "metadata": {},
- "outputs": [],
- "source": [
- "select_variables = ['temperature', 'specific_humidity']\n",
- "select_pressure_levels = [200, 500, 700, 850, 1000]\n",
- "select_time = ('2024-01-01','2025-01-01')\n",
- "select_region = {'latitude': (-40,-5),\n",
- " 'longitude': (110,155)\n",
- " }"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 8,
- "id": "394c417d-7a85-4a8f-917c-5c32f6398437",
- "metadata": {},
- "outputs": [],
- "source": [
- "select_dict = {\n",
- " 'level': select_pressure_levels, \n",
- " 'time': slice(*select_time),\n",
- "}"
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 9,
- "id": "bac39d42-cc5a-44ca-b0a1-908f61df54a4",
- "metadata": {},
- "outputs": [],
- "source": [
- "era5_accessor = pyearthtools.data.archive.ZarrTimeIndex(\n",
- " remote_url,\n",
- " variables=select_variables,\n",
- " open_kwargs=zarr_opts,\n",
- " remote=True,\n",
- " transforms=pyearthtools.data.transforms.TransformCollection([\n",
- " pyearthtools.data.transform.coordinates.Select(select_dict),\n",
- " pyearthtools.data.transform.region.Bounding(select_region['latitude'][0],select_region['latitude'][1], select_region['longitude'][0], select_region['longitude'][1])\n",
- " ]),\n",
- " )"
- ]
- },
- {
- "cell_type": "markdown",
- "id": "74a39b7a-0a19-4db5-8363-5055c735076f",
- "metadata": {},
- "source": [
- "## Using the data of interest\n",
- "We can now look at the xarray dataset through the time index function. If we do this, the xarray lazy loading paradigm means only the metadata will have been downloaded form the remote location, until we try to do some thing which requires the data. For example after exmaing the metadata, we will then plot some of the data which will take longer as the data will be transfered from the remote location. This will be quite efficient as only the chunks pertaining to the data selected (variables, time period, spatial region) will be transferred."
- ]
- },
- {
- "cell_type": "code",
- "execution_count": 10,
- "id": "cdfe8f43-d089-4724-aae0-d1dec734c5d2",
- "metadata": {},
- "outputs": [
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": [
+ ""
+ ]
+ },
{
- "data": {
- "text/html": [
- "<xarray.Dataset> Size: 1MB\n",
- "Dimensions: (time: 1, level: 5, latitude: 141, longitude: 181)\n",
- "Coordinates:\n",
- " * time (time) datetime64[ns] 8B 2024-01-23T17:00:00\n",
- " * level (level) int64 40B 200 500 700 850 1000\n",
- " * latitude (latitude) float32 564B -5.0 -5.25 -5.5 ... -39.75 -40.0\n",
- " * longitude (longitude) float32 724B 110.0 110.2 ... 154.8 155.0\n",
- "Data variables:\n",
- " specific_humidity (time, level, latitude, longitude) float32 510kB ...\n",
- " temperature (time, level, latitude, longitude) float32 510kB ...\n",
- "Attributes:\n",
- " valid_time_start: 1940-01-01\n",
- " last_updated: 2026-06-01 03:27:53.232444+00:00\n",
- " valid_time_stop: 2025-12-31\n",
- " valid_time_stop_era5t: 2026-05-26
+ "<xarray.Dataset> Size: 1MB\n",
+ "Dimensions: (time: 1, level: 5, latitude: 141, longitude: 181)\n",
+ "Coordinates:\n",
+ " * time (time) datetime64[ns] 8B 2024-01-23T17:00:00\n",
+ " * level (level) int64 40B 200 500 700 850 1000\n",
+ " * latitude (latitude) float32 564B -5.0 -5.25 -5.5 ... -39.75 -40.0\n",
+ " * longitude (longitude) float32 724B 110.0 110.2 ... 154.8 155.0\n",
+ "Data variables:\n",
+ " specific_humidity (time, level, latitude, longitude) float32 510kB ...\n",
+ " temperature (time, level, latitude, longitude) float32 510kB ...\n",
+ "Attributes:\n",
+ " valid_time_start: 1940-01-01\n",
+ " last_updated: 2026-06-01 03:27:53.232444+00:00\n",
+ " valid_time_stop: 2025-12-31\n",
+ " valid_time_stop_era5t: 2026-05-26