What is MET-AL?
MET-AL (Model Evaluation Tools — Analysis Lab) is an NCAR research lab investigating how users could explore, visualize, and interpret output from MET (Model Evaluation Tools) using a modern, browser-first stack and, increasingly, how much of the MET verification pipeline itself can run in the browser.
The lab is live at https://met-al-lab.pages.dev/ — eleven prototype apps, no install.
The one-paragraph story
The lab started as four parallel idea themes (client-side compute, modern interaction, novel
plotting, cross-cutting modernization) captured on an idea board, was
prototyped as seven isolated experiments (each its own branch + worktree, plan → build →
review), then consolidated into a single gallery with a shared verification-math library.
A third round shifted from viewing MET output to recomputing it: a header-driven .stat
parser validated on the full real archive (6,329 files, 88,456 records, 0 errors), a pipeline
explorer that recomputes grid_stat live in the browser and matches MET’s own .stat
bit-identically on contingency counts, a real-data app streaming de-identified Zarr/Parquet
from Cloudflare R2, and a WebGPU implementation of the Fractions Skill Score that beats the CPU
by ~7× at 2048² grids.
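For readers unfamiliar with the Fractions Skill Score mentioned above, here is a minimal CPU sketch of the standard definition, FSS = 1 − Σ(Pf − Po)² / (ΣPf² + ΣPo²), where Pf and Po are the fractions of event cells inside a square neighborhood. It uses a summed-area table, the CPU analogue of a prefix-scan approach. This is not the lab's code: the function names, the row-major layout, and the zero-padding treatment of grid edges are all assumptions made for illustration, and MET's own edge handling may differ.

```typescript
// Illustrative FSS sketch (hypothetical names, not the lab's implementation).
// Assumptions: row-major grids, odd square window, cells outside the grid
// count as "no event" (zero padding).

function summedAreaTable(field: Float32Array, nx: number, ny: number, threshold: number): Float64Array {
  // One extra zero row and column so window sums need no special cases.
  const sat = new Float64Array((nx + 1) * (ny + 1));
  for (let y = 0; y < ny; y++) {
    let rowSum = 0;
    for (let x = 0; x < nx; x++) {
      rowSum += field[y * nx + x] >= threshold ? 1 : 0; // binary event field
      sat[(y + 1) * (nx + 1) + (x + 1)] = sat[y * (nx + 1) + (x + 1)] + rowSum;
    }
  }
  return sat;
}

function windowFraction(sat: Float64Array, nx: number, ny: number, x: number, y: number, half: number): number {
  // Clamp the window to the grid; bounds are [x0, x1) and [y0, y1).
  const x0 = Math.max(0, x - half);
  const x1 = Math.min(nx, x + half + 1);
  const y0 = Math.max(0, y - half);
  const y1 = Math.min(ny, y + half + 1);
  const w = nx + 1;
  const count = sat[y1 * w + x1] - sat[y0 * w + x1] - sat[y1 * w + x0] + sat[y0 * w + x0];
  const width = 2 * half + 1;
  return count / (width * width); // zero padding: divide by the full window area
}

function fss(fcst: Float32Array, obs: Float32Array, nx: number, ny: number, threshold: number, width: number): number {
  if (width % 2 !== 1) throw new Error("window width must be odd");
  const half = (width - 1) / 2;
  const satF = summedAreaTable(fcst, nx, ny, threshold);
  const satO = summedAreaTable(obs, nx, ny, threshold);
  let num = 0; // sum of squared fraction differences
  let den = 0; // worst-case reference: sum of squared fractions
  for (let y = 0; y < ny; y++) {
    for (let x = 0; x < nx; x++) {
      const pf = windowFraction(satF, nx, ny, x, y, half);
      const po = windowFraction(satO, nx, ny, x, y, half);
      num += (pf - po) ** 2;
      den += pf * pf + po * po;
    }
  }
  return den === 0 ? NaN : 1 - num / den; // undefined when neither field has events
}
```

The summed-area table makes each window sum four lookups regardless of window size, so the cost is linear in the number of grid cells. The same prefix-sum structure is what makes the computation a natural fit for a GPU scan kernel.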
What has been established
- The verification math is cheap. MET’s categorical (CTC/CTS), continuous (SL1L2/CNT), and vector (VL1L2/VCNT) statistics reduce to sums and ratios; a real 8k-cell case computes in 0.46 ms warm (4.28 ms cold) in JavaScript, far under the roughly 16.7 ms budget of one 60 fps frame.
- Parity with MET is achievable and testable. Browser-computed contingency counts are bit-identical to MET’s .stat; real-valued stats match to ±5e-6 (MET rounds its output to 5 decimal places).
- The grid operators are the GPU’s job. Neighborhood methods like FSS have no closed form over partial sums; four WebGPU kernels (naive, separable, prefix-scan, multi-block scan) compute FSS with ~1e-8 parity and large speedups at scale (about 7× over the CPU at 2048² grids).
- Format decode stays offline. GRIB2/prepBUFR decoding is the one pipeline stage that should not be attempted client-side: data is pre-converted once to Zarr v3 + Parquet (the sibling metplus-data-store project), then range-read lazily by the browser.
- Aggregation must be ratio-of-sums. Summing raw counts/partial sums then deriving the metric is correct; averaging per-group metrics is wrong, and the apps demonstrate the difference on screen.
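The ratio-of-sums point can be illustrated with a small sketch. It pools contingency counts across two groups and compares the Critical Success Index (CSI = hits / (hits + false alarms + misses)) derived from the pooled counts against the mean of the per-group CSI values. The `Ctc` interface, the function names, and the numbers are hypothetical and chosen only to make the gap visible; the field names loosely mirror MET's CTC columns (FY_OY, FY_ON, FN_OY, FN_ON).

```typescript
// Illustrative sketch (hypothetical types and data, not the lab's library).
interface Ctc {
  fyOy: number; // forecast yes, observed yes (hits)
  fyOn: number; // forecast yes, observed no (false alarms)
  fnOy: number; // forecast no, observed yes (misses)
  fnOn: number; // forecast no, observed no (correct negatives)
}

// Critical Success Index from a single set of counts.
function csi(c: Ctc): number {
  const denom = c.fyOy + c.fyOn + c.fnOy;
  return denom === 0 ? NaN : c.fyOy / denom;
}

// Correct aggregation step: add the raw counts first.
function sumCounts(groups: Ctc[]): Ctc {
  return groups.reduce(
    (acc, g) => ({
      fyOy: acc.fyOy + g.fyOy,
      fyOn: acc.fyOn + g.fyOn,
      fnOy: acc.fnOy + g.fnOy,
      fnOn: acc.fnOn + g.fnOn,
    }),
    { fyOy: 0, fyOn: 0, fnOy: 0, fnOn: 0 },
  );
}

// Ratio of sums: derive the metric once, from the pooled counts.
function pooledCsi(groups: Ctc[]): number {
  return csi(sumCounts(groups));
}

// Mean of ratios: shown only for contrast; it ignores how many events each group has.
function meanOfGroupCsi(groups: Ctc[]): number {
  return groups.reduce((sum, g) => sum + csi(g), 0) / groups.length;
}

const groups: Ctc[] = [
  { fyOy: 90, fyOn: 10, fnOy: 0, fnOn: 0 }, // event-rich group, CSI = 0.9
  { fyOy: 1, fyOn: 0, fnOy: 9, fnOn: 990 }, // event-poor group, CSI = 0.1
];

const pooled = pooledCsi(groups); // 91 / 110, about 0.827
const averaged = meanOfGroupCsi(groups); // (0.9 + 0.1) / 2 = 0.5
console.log(`pooled CSI: ${pooled.toFixed(3)}, mean of group CSIs: ${averaged.toFixed(3)}`);
```

The averaged value gives the ten-event group the same weight as the hundred-event group, which is why it drifts so far from the pooled score. The same reasoning applies to the partial-sum line types (SL1L2, VL1L2): accumulate the sums, then derive the statistic.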
Where to go next
| You want to… | Go to |
|---|---|
| Run the apps | Getting started |
| Understand the experiment process | How the lab works |
| See each app in depth | Gallery overview |
| Understand the shared math & parser | Shared libraries |
| Learn how real data flows in | Cloud-native pipeline |
| See what’s planned | Roadmap |
