MET-AL · storage notes

Pyramids for planet-scale data
COG · OME-Zarr · GeoZarr

Three cloud-native formats solve the same problem — how do you look at one small piece of a dataset far too big to download? All three answer with the same two tricks: tiling (fetch only the area you see) and overviews / pyramids (fetch only the resolution you need). This page explains each, how they relate, and how they shape the MET-AL "global-capable" store.


The problem, in one breath

A single global weather field can be gigabytes; an archive is terabytes. A browser can't download that to draw one region at one zoom level. The fix is to never send the whole thing — store the data so a client can ask for exactly the bytes it needs over plain HTTP range requests. Two ideas make that possible, and all three formats below are variations on them:
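As a rough illustration of "exactly the bytes it needs", the sketch below builds an HTTP `Range` header for one block of a remote object. The URL, offset and length are hypothetical; a real client would take them from the store's own index or metadata. The request is built but never sent.

```python
import urllib.request


def range_header(offset: int, length: int) -> str:
    """Return an HTTP Range header value for `length` bytes starting at `offset`.

    HTTP byte ranges are inclusive on both ends, hence the -1.
    """
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    return f"bytes={offset}-{offset + length - 1}"


def build_range_request(url: str, offset: int, length: int) -> urllib.request.Request:
    """Build (but do not send) a GET request for one byte range."""
    return urllib.request.Request(url, headers={"Range": range_header(offset, length)})


# Hypothetical object URL and block location, for illustration only.
req = build_range_request("https://example.com/store/field.bin", 4096, 1024)
print(req.get_header("Range"))  # prints: bytes=4096-5119
```

A server that honours the header replies with status 206 (Partial Content) and only those 1024 bytes.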

Tiling: the array is cut into small blocks, so a viewport fetches only the blocks it overlaps, not the whole grid.

Overviews: pre-computed coarser copies (½, ¼, ⅛ …). Zoomed out, you read a small coarse level instead of millions of full-res tiles.
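Both ideas reduce to a little integer arithmetic. The sketch below is a minimal illustration, not any format's actual reader: `tiles_overlapping` and `overview_shapes` are hypothetical helper names, and the grid sizes are made up.

```python
def tiles_overlapping(x0: int, y0: int, x1: int, y1: int, tile: int) -> list[tuple[int, int]]:
    """Return (row, col) indices of the tiles touched by a pixel window.

    The window covers pixels x0 <= x < x1 and y0 <= y < y1 (half-open),
    with non-negative coordinates and square tiles of side `tile`.
    """
    if x1 <= x0 or y1 <= y0:
        return []
    col_start, col_stop = x0 // tile, (x1 - 1) // tile + 1
    row_start, row_stop = y0 // tile, (y1 - 1) // tile + 1
    return [(r, c) for r in range(row_start, row_stop) for c in range(col_start, col_stop)]


def overview_shapes(width: int, height: int, levels: int) -> list[tuple[int, int]]:
    """Shapes of the full-res grid plus `levels` overviews, halving each time (rounding up)."""
    shapes = [(width, height)]
    for _ in range(levels):
        width, height = (width + 1) // 2, (height + 1) // 2
        shapes.append((width, height))
    return shapes


# A 600 x 500 pixel window on a grid of 256-pixel tiles touches 3 x 3 tiles.
print(len(tiles_overlapping(300, 100, 900, 600, 256)))  # prints: 9
print(overview_shapes(4096, 2048, 3))
```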

Why it matters here: the MET-AL store must serve interactive maps and GPU compute from Cloudflare R2 with no server-side decoder. Getting tiling + overviews right is the difference between "one quick range read" and "download the planet." The rest of this page is the vocabulary behind that design.

See it: a pyramid keeps fetch cost flat

Drag the zoom. The tile count quadruples at every finer level, but the number of tiles your screen actually fetches stays roughly constant, because you read the level that matches the zoom, and only the tiles under the viewport. That invariant is the whole point of overviews.

[Interactive demo: a zoom-level slider and a strategy switch, with four readouts: resolution level used for this view; tiles that exist at this level (whole world); tiles fetched for the current viewport; cost versus fetching this view from the finest level.]

The blue box is the viewport (what you see on screen). Filled cells are the tiles fetched; faint cells exist in the store but aren't requested. Switch to finest level only to see what happens with no overviews: a zoomed-out view has to pull every full-res tile.
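The invariant can be checked with a toy model. This is a sketch under stated assumptions, not the demo's actual code: level `k` has `2**k` tiles per side, the viewport spans `1 / 2**zoom` of the world per side and is aligned to tile edges, and the screen is `screen_tiles` tiles wide. All names are hypothetical.

```python
def tiles_per_side(level: int) -> int:
    """Tiles along one side of the world at a pyramid level (0 = coarsest)."""
    return 2 ** level


def fetch_cost(zoom: int, finest: int, screen_tiles: int = 4,
               use_pyramid: bool = True) -> tuple[int, int, int]:
    """Return (level used, tiles in the world at that level, tiles fetched).

    `screen_tiles` must be a power of two. With `use_pyramid=False` every
    view is served from the finest level, as if no overviews existed.
    """
    matching = zoom + screen_tiles.bit_length() - 1  # zoom + log2(screen_tiles)
    level = min(matching, finest) if use_pyramid else finest
    side = max(1, tiles_per_side(level) // 2 ** zoom)  # tiles under the viewport, per side
    return level, tiles_per_side(level) ** 2, side * side


for zoom in (0, 3, 6):
    print(zoom, fetch_cost(zoom, finest=8), fetch_cost(zoom, finest=8, use_pyramid=False))
```

In this model the pyramid reader fetches 16 tiles at every zoom until it reaches the finest level, while the finest-only reader fetches 65536 tiles for the fully zoomed-out view.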

Cloud-Optimized GeoTIFF (COG) · mature · OGC standard

A COG is a perfectly ordinary GeoTIFF laid out so a client can range-read it. Three ingredients: the file opens with its metadata (so one read reveals every tile's byte offset), the pixels are internally tiled (256²/512² blocks, not scanline strips), and it carries overviews — embedded lower-resolution copies at ½, ¼, ⅛… A viewer reads the header, figures out which tiles (at which overview) cover the current map view, and fetches just those byte ranges. No tiling server, no reformatting — the file on S3/R2 is the tile service.

One file, front-loaded metadata, coarse overviews before full-res tiles — the layout that makes range reads cheap.
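The "read the header, then fetch byte ranges" step can be sketched as a table lookup. TIFF stores a per-tile offset and byte count (the TileOffsets and TileByteCounts tags) for each image directory. The `TiffLevel` class and the numbers below are hypothetical; a real reader would parse them from the file header.

```python
import math
from dataclasses import dataclass


@dataclass
class TiffLevel:
    """Subset of one TIFF image directory (the full-res image or one overview)."""
    image_width: int
    image_length: int
    tile_width: int
    tile_length: int
    tile_offsets: list[int]      # TIFF tag TileOffsets
    tile_byte_counts: list[int]  # TIFF tag TileByteCounts


def tile_byte_range(level: TiffLevel, row: int, col: int) -> tuple[int, int]:
    """Return the inclusive (first_byte, last_byte) of the tile at (row, col).

    Tiles are indexed left to right, top to bottom. This assumes
    pixel-interleaved storage, so there is one tile per grid position.
    """
    tiles_across = math.ceil(level.image_width / level.tile_width)
    tiles_down = math.ceil(level.image_length / level.tile_length)
    if not (0 <= row < tiles_down and 0 <= col < tiles_across):
        raise IndexError("tile outside the image")
    index = row * tiles_across + col
    start = level.tile_offsets[index]
    return start, start + level.tile_byte_counts[index] - 1


# Made-up overview: 1024 x 512 pixels in two 512 x 512 tiles.
overview = TiffLevel(1024, 512, 512, 512, [8192, 20000], [11808, 9000])
print(tile_byte_range(overview, 0, 1))  # prints: (20000, 28999)
```

The returned pair maps directly onto an HTTP `Range: bytes=20000-28999` request.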

Best at: 2-D single- or few-band rasters (imagery, a single analysis field, DEMs). The default for satellite & basemap tiles.
Weak at: many dimensions. A COG is fundamentally width × height × bands; time, vertical level, model, ensemble member don't have a natural home — you end up with a directory of thousands of COGs.
In MET-AL: the data-store already ships a Web-Mercator COG pyramid of the URMA truth field, served by TiTiler — overviews in action for the single-field map.

OME-Zarr / NGFF · mature · community standard

When microscopy hit the same wall (single images running to terabytes, in five dimensions: time, channel, z, y, x), the Open Microscopy Environment built a "next-generation file format" (NGFF) on top of Zarr. Its key idea is multiscales: a Zarr group holds several arrays, one per resolution level (0 = full res; 1, 2, … progressively coarser), and a small JSON metadata block describes them: the named axes and the ordered datasets, each with a coordinateTransformations scale. It's COG's overview idea, but Zarr-native and n-dimensional: viewers like vizarr and napari stream just the level and chunks in view.

A Zarr group of resolution-level arrays plus multiscales metadata that names the axes and the per-level scale. Each level is itself chunked, so a viewer streams only the chunks in view.
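To make the metadata concrete, here is a hand-written block shaped like the OME-NGFF 0.4 `multiscales` layout, plus a hypothetical `pick_level` helper that chooses a dataset for a target pixel size. The scale values are invented for illustration, and the selection rule is one reasonable choice, not something the specification prescribes.

```python
def _dataset(path: str, yx: float) -> dict:
    """One pyramid level: only y and x get coarser; t, c, z keep scale 1."""
    scale = [1.0, 1.0, 1.0, yx, yx]
    return {"path": path, "coordinateTransformations": [{"type": "scale", "scale": scale}]}


multiscale = {
    "version": "0.4",
    "axes": [
        {"name": "t", "type": "time"},
        {"name": "c", "type": "channel"},
        {"name": "z", "type": "space", "unit": "micrometer"},
        {"name": "y", "type": "space", "unit": "micrometer"},
        {"name": "x", "type": "space", "unit": "micrometer"},
    ],
    # Ordered from finest (path "0") to coarsest.
    "datasets": [_dataset("0", 0.5), _dataset("1", 1.0), _dataset("2", 2.0)],
}


def scale_along(dataset: dict, axis_index: int) -> float:
    """Return the dataset's scale factor along one axis."""
    for transform in dataset["coordinateTransformations"]:
        if transform["type"] == "scale":
            return transform["scale"][axis_index]
    raise ValueError("dataset has no scale transform")


def pick_level(ms: dict, axis_name: str, target: float) -> str:
    """Path of the coarsest level whose pixel size is <= target, else the finest."""
    axis_index = [axis["name"] for axis in ms["axes"]].index(axis_name)
    chosen = ms["datasets"][0]["path"]  # finest level as the fallback
    for dataset in ms["datasets"]:
        if scale_along(dataset, axis_index) <= target:
            chosen = dataset["path"]
    return chosen


print(pick_level(multiscale, "x", 1.5))  # prints: 1
```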

Best at: huge n-dimensional arrays where you already live in the Zarr/array world. It proved the "multiscale group in Zarr" pattern that GeoZarr now borrows.
Not geospatial: its spatial axes carry physical units (typically micrometers), not a map projection, so there is no CRS and no earth model. But the mechanism (a group of levels + scale metadata) is exactly what geospatial needed.

GeoZarr · emerging · OGC draft (2026)

GeoZarr is the effort to make Zarr a first-class geospatial raster format — informally, "COG for the n-dimensional Zarr world." It's being standardized by an OGC Standards Working Group as a set of small, composable Zarr Conventions rather than one monolith: a multiscales convention (the pyramid idea, straight from OME-Zarr's lineage) for progressive visualisation, plus spatial conventions that tie array indices to real-world coordinates (CF grid_mapping/CRS, dimension names). So a single GeoZarr store can carry time, vertical level, and multiple resolution levels with a proper projection — the things a stack of COGs can't.
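What "tying array indices to real-world coordinates" means can be sketched for the simplest case, a regular lat/lon grid. The attribute names `grid_mapping` and `grid_mapping_name` come from the CF conventions; the grid origin, cell size and helper functions are assumptions for illustration, and the exact GeoZarr encoding may differ from this.

```python
import math

# CF-style attributes written as plain dicts. The data variable names a CRS
# variable through "grid_mapping"; that variable describes the projection.
crs_var_attrs = {"grid_mapping_name": "latitude_longitude"}
field_attrs = {"grid_mapping": "crs"}

# Hypothetical grid: outer edges of cell (0, 0) and the cell size, in degrees.
WEST_EDGE, NORTH_EDGE = -180.0, 90.0
STEP = 0.25


def lonlat_to_index(lon: float, lat: float) -> tuple[int, int]:
    """Return the (row, col) of the cell containing a lon/lat point."""
    col = math.floor((lon - WEST_EDGE) / STEP)
    row = math.floor((NORTH_EDGE - lat) / STEP)
    return row, col


def index_to_lonlat(row: int, col: int) -> tuple[float, float]:
    """Return the (lon, lat) of a cell's center."""
    return WEST_EDGE + (col + 0.5) * STEP, NORTH_EDGE - (row + 0.5) * STEP


print(lonlat_to_index(-100.0, 40.0))  # prints: (200, 320)
print(index_to_lonlat(200, 320))      # prints: (-99.875, 39.875)
```

A projected grid (such as a Lambert one) needs a real projection library for the same mapping; only the regular lat/lon case reduces to arithmetic like this.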

Status (mid-2026): an OGC Standards Working Group is active, a V1 release candidate is on the 2026 roadmap, and it already has implementations across GDAL, rioxarray, TiTiler, OpenLayers and the Copernicus EOPF data model. It is stabilizing, not frozen — worth following its conventions without hard-coding against a draft.

Best at: exactly our case — big, multi-dimensional, projected geoscience grids that need both interactive maps (pyramids) and array compute (native Zarr chunks + sharding).
The bridge: Zarr v3 sharding plays COG's "internal tiling" role — many small chunks packed into few cloud objects — so GeoZarr gets range-readable tiles without the millions-of-tiny-files problem.

Side by side

	COG	OME-Zarr (NGFF)	GeoZarr
container	one GeoTIFF file	a Zarr store (group of arrays)	a Zarr store (group of arrays)
home domain	geospatial imagery	bio-imaging / microscopy	geospatial / earth-observation
dimensions	x, y, bands (2-D)	up to t, c, z, y, x (n-D)	n-D incl. time, level, + CRS
"tiling"	internal TIFF tiles	Zarr chunks (+ v3 shards)	Zarr chunks + v3 shards
"overviews"	embedded overviews	`multiscales` group	`multiscales` convention
CRS / geo	yes (GeoTIFF tags)	no (physical units)	yes (CF `grid_mapping`)
range reads	HTTP byte ranges	per-chunk object reads	per-chunk / per-shard reads
maturity	OGC standard	community standard	emerging (2026)
tools	TiTiler, OpenLayers, GDAL	vizarr, napari, zarrita	zarrita, TiTiler, OpenLayers, GDAL

The one-sentence relationship: COG put a tiled pyramid inside a single 2-D geo file; OME-Zarr generalised the pyramid to n-D arrays in a Zarr store; GeoZarr brings that n-D Zarr pyramid back to geospatial with a real CRS — so it's the natural target for a multi-dimensional, multi-resolution earth-data store.

What this means for the MET-AL store

The MET-AL "global-capable" design is a GeoZarr-style multiscale pyramid: a coarse global level (a regular lat/lon grid, for global models like AIGFS/GFS) and a fine regional level (a Lambert CONUS grid, for regional models), each dataset stored at its honest native resolution. A reader picks the level that matches the zoom — coarse when you're looking at the hemisphere, fine when you're zoomed into a region — exactly the pyramid invariant from the demo above.
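A reader's level choice might look like the sketch below. It is an illustration under loud assumptions, not the MET-AL implementation: the resolutions and bounding boxes are invented, and the regional Lambert footprint is approximated by a lon/lat box.

```python
from dataclasses import dataclass

BBox = tuple[float, float, float, float]  # (west, south, east, north) in degrees


@dataclass(frozen=True)
class Level:
    name: str
    resolution_km: float  # hypothetical native grid spacing
    bbox: BBox            # hypothetical coverage, simplified to a lon/lat box


def contains(outer: BBox, inner: BBox) -> bool:
    """True if `inner` lies entirely within `outer`."""
    return (outer[0] <= inner[0] and outer[1] <= inner[1]
            and inner[2] <= outer[2] and inner[3] <= outer[3])


def pick_level(levels: list[Level], view: BBox, view_km_per_pixel: float) -> Level:
    """Coarsest covering level that is fine enough for the screen, else the finest covering one."""
    covering = [lvl for lvl in levels if contains(lvl.bbox, view)]
    if not covering:
        raise LookupError("no level covers this view")
    adequate = [lvl for lvl in covering if lvl.resolution_km <= view_km_per_pixel]
    if adequate:
        return max(adequate, key=lambda lvl: lvl.resolution_km)
    return min(covering, key=lambda lvl: lvl.resolution_km)


LEVELS = [
    Level("global", 25.0, (-180.0, -90.0, 180.0, 90.0)),
    Level("conus", 3.0, (-130.0, 20.0, -60.0, 55.0)),
]

print(pick_level(LEVELS, (-180.0, -90.0, 0.0, 90.0), 40.0).name)  # prints: global
print(pick_level(LEVELS, (-105.0, 35.0, -95.0, 42.0), 5.0).name)  # prints: conus
```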

We already use overviews: the URMA truth ships as a COG pyramid (TiTiler). The global-capable store extends the same idea to the whole n-D forecast archive in Zarr.
Sharding, not tiny files: Zarr v3 sharding gives us COG-style "many tiles, few objects" on R2 — the fix for the current pairs store's 55k-object sprawl.
Regional-only models stay cheap: with model as a dimension and Zarr omitting all-fill chunks, a model that only covers a sub-region costs storage proportional to its footprint — see the interactive demo in the MET-in-the-browser explainer.
Follow, don't marry: GeoZarr is still stabilizing, so we adopt its conventions (multiscales + CF CRS) without depending on any single draft revision.
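The "storage proportional to footprint" point in the list above can be estimated by counting chunks. The sketch assumes, as stated there, that chunks containing only the fill value are never written; the grid size, chunk size and footprint are hypothetical.

```python
import math


def total_chunks(shape: tuple[int, int], chunk: int) -> int:
    """Chunks needed to cover the whole 2-D grid with square chunks."""
    return math.ceil(shape[0] / chunk) * math.ceil(shape[1] / chunk)


def chunks_written(footprint: tuple[int, int, int, int], chunk: int) -> int:
    """Chunks touched by a model's data window.

    `footprint` is (row0, row1, col0, col1), half-open, in grid cells.
    Chunks outside it hold only fill values and are assumed not to be stored.
    """
    row0, row1, col0, col1 = footprint
    rows = (row1 - 1) // chunk - row0 // chunk + 1
    cols = (col1 - 1) // chunk - col0 // chunk + 1
    return rows * cols


# Hypothetical shared grid and one regional model's footprint on it.
grid, chunk = (2048, 4096), 256
regional = (512, 1024, 1024, 2048)
written, total = chunks_written(regional, chunk), total_chunks(grid, chunk)
print(written, total, f"{written / total:.2%}")  # prints: 8 128 6.25%
```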