What is MET-AL?

MET-AL (Model Evaluation Tools: Analysis Lab) is an NCAR research lab investigating how users could explore, visualize, and interpret output from MET (Model Evaluation Tools) using a modern, browser-first stack and, increasingly, how much of the MET verification pipeline itself can run in the browser.

The lab is live at https://met-al-lab.pages.dev/ with eleven prototype apps and no install required.

The one-paragraph story

The lab started as four parallel idea themes (client-side compute, modern interaction, novel plotting, cross-cutting modernization) captured on an idea board. These were prototyped as seven isolated experiments (each on its own branch and worktree, following a plan → build → review cycle), then consolidated into a single gallery with a shared verification-math library. A third round shifted from viewing MET output to recomputing it, and produced four things: a header-driven .stat parser validated on the full real archive (6,329 files, 88,456 records, 0 errors); a pipeline explorer that recomputes grid_stat live in the browser and matches MET’s own .stat bit-identically on contingency counts; a real-data app streaming de-identified Zarr/Parquet from Cloudflare R2; and a WebGPU implementation of the Fractions Skill Score (FSS) that runs about 7× faster than the CPU at 2048² grids.

What has been established

The verification math is cheap. MET’s categorical (CTC/CTS), continuous (SL1L2/CNT), and vector (VL1L2/VCNT) statistics reduce to sums and ratios; a real 8k-cell case computes in 0.46 ms warm (4.28 ms cold) in JavaScript, far under the roughly 16.7 ms budget of one 60 fps frame.
Parity with MET is achievable and testable. Browser-computed contingency counts are bit-identical to MET’s .stat; real-valued stats match to within ±5e-6, which is the half-unit rounding error expected because MET rounds its output to 5 decimal places.
The grid operators are the GPU’s job. Neighborhood methods like FSS have no closed form over partial sums; four WebGPU kernels (naive, separable, prefix-scan, multi-block scan) compute it with ~1e-8 parity and large speedups at scale.
Format decode stays offline. GRIB2/prepBUFR decoding is the one pipeline stage that should not be attempted client-side. Instead, data is pre-converted once to Zarr v3 + Parquet (the sibling metplus-data-store project), then range-read lazily by the browser.
Aggregation must be ratio-of-sums. Summing raw counts/partial sums then deriving the metric is correct; averaging per-group metrics is wrong, and the apps demonstrate the difference on screen.
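
The ratio-of-sums point can be illustrated with a minimal TypeScript sketch. It is not the lab's shared library: the `Ctc` interface, the function names, and the sample counts are hypothetical and chosen only for illustration. The field names mirror MET's CTC columns (FY_OY, FY_ON, FN_OY, FN_ON), and the Critical Success Index (CSI) is used as the example metric.

```typescript
// Hypothetical shape for one group's 2x2 contingency counts.
// Field names mirror MET's CTC columns; this is not the lab's actual type.
interface Ctc {
  fyOy: number; // forecast yes, observed yes (hits)
  fyOn: number; // forecast yes, observed no (false alarms)
  fnOy: number; // forecast no, observed yes (misses)
  fnOn: number; // forecast no, observed no (correct negatives)
}

// Critical Success Index: hits / (hits + false alarms + misses).
function csi(c: Ctc): number {
  const denom = c.fyOy + c.fyOn + c.fnOy;
  return denom === 0 ? NaN : c.fyOy / denom;
}

// Ratio-of-sums: add the raw counts across groups, then derive the metric once.
function aggregateCsi(groups: Ctc[]): number {
  const total = groups.reduce<Ctc>(
    (acc, g) => ({
      fyOy: acc.fyOy + g.fyOy,
      fyOn: acc.fyOn + g.fyOn,
      fnOy: acc.fnOy + g.fnOy,
      fnOn: acc.fnOn + g.fnOn,
    }),
    { fyOy: 0, fyOn: 0, fnOy: 0, fnOn: 0 },
  );
  return csi(total);
}

// Mean-of-ratios: average the per-group metric. Shown only for contrast;
// it weights a 20-cell group the same as a 1000-cell group.
function meanOfCsi(groups: Ctc[]): number {
  return groups.reduce((sum, g) => sum + csi(g), 0) / groups.length;
}

// Made-up counts: one large, skillful group and one small, poor group.
const groups: Ctc[] = [
  { fyOy: 90, fyOn: 5, fnOy: 5, fnOn: 900 }, // per-group CSI = 0.9
  { fyOy: 1, fyOn: 4, fnOy: 5, fnOn: 10 },   // per-group CSI = 0.1
];

console.log(aggregateCsi(groups)); // 91 / 110, about 0.827
console.log(meanOfCsi(groups));    // 0.5
```

With these illustrative counts the two approaches disagree substantially (about 0.827 versus 0.5), because averaging per-group metrics ignores how many cells each group contributed.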

Where to go next

You want to…	Go to
Run the apps	Getting started
Understand the experiment process	How the lab works
See each app in depth	Gallery overview
Understand the shared math & parser	Shared libraries
Learn how real data flows in	Cloud-native pipeline
See what’s planned	Roadmap