MET in the client — a feasibility matrix

Which MET-ecosystem functionality can run efficiently in the browser, given the optimal input format · MET-AL · 2026-07-01

Every verdict below is backed by a measurement on real METplus archive data (the 62 GiB tarball: 4 models × 24 cycles × 21 leads of grid_stat + point_stat output) or by an already-shipped, browser-verified MET-AL app. Nothing is graded on vibes: proven means checked against MET's own .stat output, measured means benchmarked with built-in correctness checks, and offline by design is a positive architecture claim, not a failure.

The one rule: Three destinies for a MET workload

Verification math → runs anywhere

CTC/CTS, SL1L2/CNT, VL1L2/VCNT, PCT/PSTD, aggregation, bootstrap CIs. Sums and ratios over small numbers of aggregables. Sub-millisecond to low-millisecond in plain JS — cheaper than a single network round-trip.
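
As a minimal sketch of why this class of work is so cheap, the plain JavaScript below counts a 2×2 contingency table, aggregates cases by summing counts, and derives a few CTS-style scores as ratios. The function names are hypothetical and are not the MET-AL library's API; the count fields follow MET's CTC column names (FY_OY, FY_ON, FN_OY, FN_ON), and the `>=` event definition is an assumption.

```javascript
// Count the 2x2 contingency table for one threshold.
// Assumption: an "event" is value >= threshold; NaN marks a missing pair.
function ctcFromPairs(fcst, obs, threshold) {
  const ctc = { fy_oy: 0, fy_on: 0, fn_oy: 0, fn_on: 0 };
  for (let i = 0; i < fcst.length; i++) {
    if (Number.isNaN(fcst[i]) || Number.isNaN(obs[i])) continue; // skip missing pairs
    const f = fcst[i] >= threshold;
    const o = obs[i] >= threshold;
    if (f && o) ctc.fy_oy++;      // hit
    else if (f) ctc.fy_on++;      // false alarm
    else if (o) ctc.fn_oy++;      // miss
    else ctc.fn_on++;             // correct negative
  }
  return ctc;
}

// Aggregation is a plain sum of counts across cases.
function sumCtc(tables) {
  const total = { fy_oy: 0, fy_on: 0, fn_oy: 0, fn_on: 0 };
  for (const t of tables) {
    total.fy_oy += t.fy_oy;
    total.fy_on += t.fy_on;
    total.fn_oy += t.fn_oy;
    total.fn_on += t.fn_on;
  }
  return total;
}

// Scores are ratios of the (aggregated) counts; an empty denominator yields NaN.
function ctsFromCtc({ fy_oy, fy_on, fn_oy, fn_on }) {
  const ratio = (a, b) => (b === 0 ? NaN : a / b);
  return {
    total: fy_oy + fy_on + fn_oy + fn_on,
    pody: ratio(fy_oy, fy_oy + fn_oy),          // probability of detection
    far: ratio(fy_on, fy_oy + fy_on),           // false alarm ratio
    csi: ratio(fy_oy, fy_oy + fy_on + fn_oy),   // critical success index
    fbias: ratio(fy_oy + fy_on, fy_oy + fn_oy), // frequency bias
  };
}

// Usage: two small cases, aggregated and then scored.
const caseA = ctcFromPairs([0, 2, 5, 7], [1, 3, 0, 8], 2);
const caseB = ctcFromPairs([1, 1, 3], [4, 0, 3], 2);
console.log(ctsFromCtc(sumCtc([caseA, caseB])));
// { total: 7, pody: 0.75, far: 0.25, csi: 0.6, fbias: 1 }
```

Counts are summed first and ratios derived afterwards; averaging per-case scores instead would generally give a different answer.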

Grid operators → CPU fine, GPU shines

Neighborhood methods (FSS), object identification (MODE), regridding, pairs-on-the-fly. O(cells) work: tens of ms per megacell on CPU, ~7× faster on WebGPU, rendered straight to screen with no readback.
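
The O(cells) claim for FSS rests on the integral-image (summed-area table) trick: after one pass to build the table, any square window sum costs four lookups. The sketch below is a CPU-only illustration with hypothetical function names. It assumes `>=` thresholding, square windows of half-width `r`, and edge windows clipped to the domain and normalized by their in-domain cell count, which may not match MET's neighborhood handling.

```javascript
// Summed-area table with a zero top row and left column,
// so window sums need no special cases at the lower edges.
function integralImage(binary, nx, ny) {
  const w = nx + 1;
  const sat = new Float64Array(w * (ny + 1));
  for (let y = 0; y < ny; y++) {
    let rowSum = 0;
    for (let x = 0; x < nx; x++) {
      rowSum += binary[y * nx + x];
      sat[(y + 1) * w + (x + 1)] = sat[y * w + (x + 1)] + rowSum;
    }
  }
  return sat;
}

// Event fraction in the (2r+1)^2 window centred on (x, y), clipped to the domain.
function windowFraction(sat, nx, ny, x, y, r) {
  const x0 = Math.max(0, x - r), x1 = Math.min(nx - 1, x + r);
  const y0 = Math.max(0, y - r), y1 = Math.min(ny - 1, y + r);
  const w = nx + 1;
  const sum =
    sat[(y1 + 1) * w + (x1 + 1)] - sat[y0 * w + (x1 + 1)] -
    sat[(y1 + 1) * w + x0] + sat[y0 * w + x0];
  return sum / ((x1 - x0 + 1) * (y1 - y0 + 1));
}

// FSS = 1 - sum((Pf - Po)^2) / (sum(Pf^2) + sum(Po^2)); NaN if neither field has events.
// Missing values are not handled here: NaN simply counts as a non-event.
function fss(fcst, obs, nx, ny, threshold, r) {
  const toBinary = (field) => Uint8Array.from(field, (v) => (v >= threshold ? 1 : 0));
  const satF = integralImage(toBinary(fcst), nx, ny);
  const satO = integralImage(toBinary(obs), nx, ny);
  let num = 0, den = 0;
  for (let y = 0; y < ny; y++) {
    for (let x = 0; x < nx; x++) {
      const pf = windowFraction(satF, nx, ny, x, y, r);
      const po = windowFraction(satO, nx, ny, x, y, r);
      num += (pf - po) * (pf - po);
      den += pf * pf + po * po;
    }
  }
  return den === 0 ? NaN : 1 - num / den;
}

// Usage on a 2x2 grid with one displaced event.
const fcstField = [1, 0, 0, 0];
const obsField = [0, 0, 0, 1];
console.log(fss(fcstField, obsField, 2, 2, 0.5, 0)); // 0: no overlap at grid scale
console.log(fss(fcstField, obsField, 2, 2, 0.5, 1)); // 1: identical fractions in every window
```

The cost per cell is independent of `r`, which is why a radius sweep stays interactive.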

Format decode → offline, once

GRIB2 / prepBUFR / NetCDF decoding is ~90% of pipeline cost, I/O-bound, and its output expands. Pre-convert to Zarr v3 + Parquet; the browser range-reads exactly what a view needs (1–2 GETs per map frame).
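
To illustrate why a pre-converted store needs so few requests, the sketch below maps a selection onto chunk keys and builds the corresponding GETs. It assumes the Zarr v3 default chunk key encoding (`c` followed by `/`-separated chunk indices) and a `[time, y, x]` array; the store URL, array name, shapes, and function names are all hypothetical, and decompressing the returned bytes is out of scope here.

```javascript
// Zarr v3 default chunk key encoding: "c/<i0>/<i1>/...".
function chunkKey(chunkIndices) {
  return ["c", ...chunkIndices].join("/");
}

// List the chunk keys intersecting a selection.
// selection: per-dimension [start, stop) in array coordinates.
// chunkShape: per-dimension chunk size.
function chunksForSelection(selection, chunkShape) {
  let combos = [[]];
  selection.forEach(([start, stop], dim) => {
    const first = Math.floor(start / chunkShape[dim]);
    const last = Math.floor((stop - 1) / chunkShape[dim]);
    const next = [];
    for (const prefix of combos) {
      for (let c = first; c <= last; c++) next.push([...prefix, c]);
    }
    combos = next;
  });
  return combos.map(chunkKey);
}

// Fetch a whole object, or only a byte range of it (standard HTTP Range header).
async function fetchBytes(url, range) {
  const headers = range
    ? { Range: `bytes=${range.offset}-${range.offset + range.length - 1}` }
    : {};
  const res = await fetch(url, { headers });
  if (!res.ok) throw new Error(`GET ${url} failed: ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}

// Usage: with whole-frame chunks, one time step of one field is exactly one key.
const base = "https://example.invalid/store.zarr/apcp"; // hypothetical store and array
const keys = chunksForSelection([[5, 6], [0, 600], [0, 700]], [1, 600, 700]);
console.log(keys.map((k) => `${base}/${k}`)); // [".../apcp/c/5/0/0"]
```

With smaller tiles the same selection would fan out into many requests, which is the trade-off whole-frame chunking avoids for map views.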

proven vs MET: validated against MET's own .stat output  ·  measured: benchmarked with built-in correctness checks  ·  partial / untested: plausible, not yet demonstrated  ·  offline by design: deliberately kept out of the client

The matrix: MET ecosystem × client-side feasibility

Functionality | Client? | Evidence (measured) | Why · limitations · what the client buys
MET core — grid & point verification
grid_stat · categorical + continuous: threshold → CTC/CTS · SL1L2 → CNT · masks | proven vs MET | real 8k-cell case: 0.46 ms warm / 4.28 ms cold; SL1L2 reduction ~70 M cells/s; counts BIT-IDENTICAL to MET .stat, reals ±5e-6 (MET's 5-dp rounding); 88/88 oracle checks; live in app 09 | This is the heart of MET, and it is decisively client-sized. A full recompute costs less than one 60 fps frame, so thresholds/masks/regions become sliders, not batch jobs. Limitation: interpolation-method parity (e.g. budget vs nearest) must match MET's config to reproduce matched pairs exactly, hence verifying on a common grid. Benefit: verification-from-a-URL; zero install; what-if in real time.
grid_stat · neighborhood (FSS/NBRCNT): the operator with no closed form over sums | proven vs CPU ref | integral-image CPU O(cells); 4 WebGPU kernels (naive/separable/prefix-scan/multi-block) parity Δ~1e-8 on real Metal; ~7.3× CPU at 2048² (4.2 M cells); GPU→screen with no readback; app 11 | The stress test for "grid ops on the GPU": passed. No NBRCNT lines exist in this archive, so parity is proven CPU↔GPU (exact n, FSS to 1e-8), not vs a MET oracle; stated honestly. Limitations discovered and documented: NaN-sentinel folding under fast-math, the silent 128 MiB binding cap. Benefit: radius/threshold sweeps become interactive; the same integral image drives both score and on-screen field.
point_stat · matching + scoring: interpolate to obs, form pairs, score | measured | bilinear match + SL1L2 + CNT at ~13 M pts/s; 1 M points = 79 ms; pair errors ≤1e-13 on an analytic field; RMSE≈σ sanity passes (tools/bench/bench-point-match.mjs) | Given pre-decoded obs (Parquet), the inner loop is trivially fast. The archive's GDAS obs are 11.7 M rows = 45 MB Parquet; one synoptic time over a region is 10³–10⁵ points, which at the measured ~13 M pts/s is roughly 0.1 to 8 ms. Limitations: prepBUFR decode and MET's obs QC/level-dedup logic stay offline/unported; land/sea & topography masks need the static fields shipped once. Benefit: station-level drill-down with live re-scoring.
Regridding · regrid.to_grid: put fcst and truth on one grid | measured | bilinear weights (417k pts) built once in 4.5 ms; 1.22 ms/frame apply (342 M pts/s); budget ×3 box-average 3.3 ms/frame; plane reproduced to 1.4e-13 (bench-regrid.mjs) | Fast enough to regrid interactively, which the v2 pipeline currently does offline in numpy; exploratory "compare any two models on any grid" needs no conversion pass. Limitations: conservative regridding of precip at large ratios needs proper area weights (budget here is ×3 box-average); Lambert/rotated-pole projections add a coordinate transform (cheap, but must match MET's). Benefit: kills the biggest constraint on which model pairs the client can score.
pcp_combine: accumulation-bucket arithmetic | measured (same op) | frame-minus-frame at 416k cells runs live in app 10's Difference mode (with running bias/RMSE/MAE); array add/subtract ≪ regrid cost | Sum/difference of accumulation fields is the cheapest grid op here. Limitation: bucket semantics (resets, missing cycles) are bookkeeping that must be encoded in store metadata. Benefit: build any accumulation window on the fly instead of pre-materializing APCP_01/03/06/24 variants.
MODE · object-based verification: smooth → threshold → objects → attributes → match | faithful core BUILT | lib/met-mode.mjs: disc convolution + attributes + MET fuzzy-interest engine (default weights) + cluster merging; 42/42 selftest (hand-worked geometry); full pipeline ~12 ms on the real case, live in the MODE Lab app (card 13); scale probe 73.5 ms at 1024² | The core loop is comfortably interactive on CPU alone, and it is FSS-shaped work, so the GPU path is proven adjacent. Limitations, honestly: this is now a faithful CORE (true disc smoothing, MET's attribute set, the MODEConfig_default weights and threshold, configurable interest maps, cluster merging), but the archive has no MODE output to use as an oracle, so verification is hand-worked geometry + invariants, not parity; curvature/percentile-thresholds/secondary merges remain unported. Benefit: convolution radius and threshold become sliders on the object map. MODE's parameters are notoriously fiddly, and interactive feedback is exactly what its users lack.
MTD · time-domain objects: MODE in 3-D (x, y, t) | measured | 3-D 26-connected space-time labeling + track attributes: 24×512² in 101 ms (~107 MB working set); 24×1024² in 379 ms (~428 MB); moving-blob verification exact (bench-mtd.mjs) | The spike is done: compute is fine (~65 M cells/s), and memory is the real constraint exactly as predicted. A 24-frame 1024² stack holds ~428 MB of arrays, comfortable on desktop, tight on mobile. Streaming frames (label t against t−1 only) would cut that if it ever matters.
ensemble_stat: RHIST · CRPS · spread-skill · ensemble probs | math ready · no data | 20 members × 100k cases: rank hist 5.1 ms · spread-skill 15.2 ms · CRPS 112 ms (bench-ensemble.mjs); lib math covered by 129/129 selftest | Compute is a non-issue; the blocker is that this archive is deterministic-only (no ECNT/ORANK/RHIST lines), so there is nothing real to verify against. App 06 stays synthetic with a banner saying so. The day ensemble output exists, the client is already fast enough.
Probabilistic (PCT/PSTD/PRC): reliability · Brier decomposition · ROC | proven (math) | parsed PCT bins reproduce the paired PSTD line's Brier decomposition EXACTLY (browser-verified, app 08); N_THRESH=edges lesson encoded in the parser | The math and parsing are proven end-to-end; the archive just contains no probabilistic lines to display. ROC rendering has since shipped alongside the Taylor diagram (card 12); PRC rendering is a small addition. Benefit: reliability diagrams that recompute as you re-bin.
wavelet_stat: intensity-scale decomposition | untested | no measurement; a Haar transform is O(cells) with tiny constants | Nothing about it looks client-hostile: it is less work per cell than FSS. Parked only because no one has asked for it yet; would need its own oracle case.
Statistics over archives — the analysis layer
stat_analysis · filter/aggregate .stat archives: the batch tool behind most MET Q&A | proven vs MET | FULL real archive (6,329 files / 59.8 MB / 88,456 records): parse 602 ms (99 MB/s), ratio-of-sums aggregate 14.5 ms → 1,005 groups, series 0.14 ms; cold→plot 1.4 s (bench-archive-aggregate.mjs); parser 0 errors; 32,389/32,390 + 7,038/7,038 derived-line oracle checks | The whole archive's statistics are a client-side object. After a one-time parse (or an 8.8 MB Parquet load), every re-slice (new model, threshold, grouping) costs ~15 ms. The 5-dp cancellation caveat applies when re-deriving from MET-rounded sums (one documented case in 32k). Benefit: METviewer-class questions with zero infrastructure, plus CIs the batch tool doesn't give you (next row).
Bootstrap confidence intervals: the METcalcpy capability gap | measured | SHIPPED in lib/met-stats.mjs (bootstrapCI, percentile + BCa; selftest 140/140) and live in apps 02/04/12: 0.4 ms per CI (B=1000); bands recomputed per interaction; scorecard significance = paired, event-equalized bootstrap | The blueprint asked whether an n-weighted proxy over MET's per-record _BCL/_BCU was acceptable. Wrong question: the real thing is free. Resampling cycles and re-deriving via ratio-of-sums is sub-millisecond, so every plotted point can carry a CI, recomputed on every interaction. Limitations: percentile method under-covers slightly at small n (93.3% vs 95% at n=48; BCa would tighten it); within-case pair-level bootstrap needs the pair grids (available via the v2 store). Benefit: first-class uncertainty (an idea-board theme) for ~30 lines of code.
Event equalization: only compare where all models verify | implemented | the scorecard (card 12) compares models on their COMMON init cycles only: a set intersection over the funnel's cycle keys, inside every paired-bootstrap cell | Plain set intersection over (cycle, lead, var, mask) keys: minutes of work on top of the existing grouping, or one SQL INTERSECT in DuckDB. Listed separately because METcalcpy treats it as a feature; here it falls out of the data model.
The ecosystem — data, viewers, orchestration
METdataio: load .stat into a database | replaced, serverless | entire 1.53 M-row .stat archive = 8.8 MB Parquet; DuckDB-WASM queries it live from R2 in app 10 ("80 rows across 4 models"); .stat/.txt/MODE parser 97/97 + 0 errors on 6,329 real files (app 08) | The database's job was random access and aggregation; Parquet + DuckDB-WASM do both without the server. Limitation: writes/curation stay offline (that's the conversion step); multi-user concurrency is N browsers reading one immutable object, which is a feature. Benefit: the entire "install MySQL, define schemas, run the loader" on-ramp disappears.
METviewer: web UI: select → aggregate → plot | demonstrated | app 10 (SQL → RMSE-vs-lead by model, live from R2) + app 02 (Metric/Dimension/Filter/Facet explorer) + 15 ms re-aggregation at full-archive scale | Architecture proven and the signature views now exist: the scorecard (with honest paired-bootstrap significance), Taylor diagram, and ROC shipped as card 12. Limitation: METviewer's long tail of plot types and its saved-XML workflows would need deliberate porting. Benefit: selection loop goes from form-submit-wait to direct manipulation at 60 fps.
METcalcpy: aggregation, derivation, statistics in Python | mostly covered | ratio-of-sums aggregation (proven), VL1L2→VCNT 7,038/7,038, bootstrap CIs (shipped, 140/140 lib checks), event equalization (shipped in the scorecard); remaining: the long tail of specialized derivations | The shared lib is a growing JS METcalcpy with a stricter honesty contract (NaN-safety, wrong-way demonstrators). Remaining gap after this round: the long tail of specialized derivations (add per demand, each with an oracle).
METplotpy: the plot catalog | broad coverage | performance diagram, reliability+sharpness+Brier, rank histogram, spread-skill, CRPS, threshold scrubber, object views, spatial fields, all shipped & browser-verified across apps 03–07 | Interactive versions of the core catalog exist; the Taylor diagram and ROC, previously the named absentees, shipped with the scorecard as card 12. Benefit over static PNGs: linked brushing, hover-to-counts, and provenance on every plot.
METexpress: simplified predefined-query viewer | same class | no separate demo; it is a curated subset of the METviewer capability demonstrated above | If METviewer's loop runs serverless, METexpress's narrower loop does too. The interesting port is its curated question set, not its plumbing.
METplus wrappers: config-driven batch orchestration | different role | app 09 reconstructs the pipeline narratively (10 stages, each badged browser/pre-encode/offline) rather than executing configs | Orchestration is the one piece that shouldn't move client-side: it exists to babysit filesystems, schedulers, and 62 GiB of I/O. The browser-side analog is session state (URL-hash configs, shareable analyses), which the lab already has. Verdict: reimagine, don't port.
Kept offline — on purpose
GRIB2 / NetCDF decode: + prepBUFR for point obs | offline by design | measured: decode ≈ 90% of the ~73 s/cycle conversion; GRIB2→Zarr EXPANDS on disk (1,226→1,692 MB; GRIB2's packing beats zstd+bitround); full archive ≈ 25–30 min once | A WASM ecCodes port would work and still be the wrong answer: you'd ship megabytes of decoder to spend seconds of CPU producing data you then can't range-read. Decode once, store analysis-ready (Zarr v3 whole-frame chunks + Parquet), and a map frame costs 1–2 GETs (~2.5–8 MB compressed). The client's superpower is lazy access, not brute decoding.
Bulk pair materialization: MET's _pairs.nc intermediates | eliminated | v2 store: pairs computed on the fly (fcst − truth on a common grid); 55,331 objects / 810 MB of stored pairs → a few oracle objects; live pairs for any selection ≈ 4 GETs / 11–17 MB; CTC from the oracle BIT-IDENTICAL across all 5 precip thresholds (app 10, production-verified) | Client-side compute doesn't just relocate this stage; it deletes it. Any model/var/lead/region pairing becomes computable, not just the ones a batch config anticipated. A small stored oracle keeps the parity proof alive.
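
The bootstrap and event-equalization rows describe one recipe: keep only the cycles every model shares, resample those cycles with replacement, and re-derive the score from aggregated sums each time. The sketch below illustrates that recipe for RMSE from SL1L2-style partial sums with a percentile interval. It is not the shipped `lib/met-stats.mjs` code: every function name here is hypothetical, the seeded generator is a stand-in chosen for reproducibility, and BCa is not shown.

```javascript
// Small seeded PRNG (mulberry32) so resampling is reproducible.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Event equalization: keys present in every list (plain set intersection).
function commonKeys(...keyLists) {
  return keyLists.reduce((acc, keys) => {
    const present = new Set(keys);
    return acc.filter((k) => present.has(k));
  });
}

// Ratio-of-sums RMSE: weight each case's partial sums by its pair count, then derive.
function rmseFromCases(cases) {
  let n = 0, ff = 0, fo = 0, oo = 0;
  for (const c of cases) {
    n += c.total;
    ff += c.total * c.ffbar;
    fo += c.total * c.fobar;
    oo += c.total * c.oobar;
  }
  if (n === 0) return NaN;
  return Math.sqrt(Math.max(0, (ff - 2 * fo + oo) / n)); // clamp tiny negative round-off
}

// Percentile bootstrap over cases (e.g. one case per init cycle).
function percentileBootstrap(cases, statFn, { B = 1000, alpha = 0.05, seed = 1 } = {}) {
  const rand = mulberry32(seed);
  const stats = new Float64Array(B);
  const resample = new Array(cases.length);
  for (let b = 0; b < B; b++) {
    for (let i = 0; i < cases.length; i++) {
      resample[i] = cases[Math.floor(rand() * cases.length)];
    }
    stats[b] = statFn(resample);
  }
  stats.sort(); // typed-array sort is numeric
  const q = (p) => stats[Math.min(B - 1, Math.floor(p * B))];
  return { estimate: statFn(cases), lo: q(alpha / 2), hi: q(1 - alpha / 2) };
}

// Usage with made-up partial sums for four cycles.
const cycles = [
  { total: 10, ffbar: 5, fobar: 4, oobar: 4 },
  { total: 10, ffbar: 8, fobar: 4, oobar: 4 },
  { total: 10, ffbar: 6, fobar: 4, oobar: 4 },
  { total: 10, ffbar: 7, fobar: 4, oobar: 4 },
];
console.log(percentileBootstrap(cycles, rmseFromCases));
console.log(commonKeys(["c1", "c2", "c3"], ["c2", "c3", "c4"])); // ["c2", "c3"]
```

Because each replicate only re-sums a handful of numbers per cycle, the cost scales with the number of cycles, not the number of grid cells.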

Numbers: The benchmark ledger

Apple M2 Pro · Node v26 (V8, as the engine proxy) unless marked browser; browser numbers from the shipped apps on Apple Metal. New measurements are reproducible via tools/bench/ (each script carries its own correctness checks).

Workload | Scale | Result | Source
grid_stat full recompute (threshold→CTC→scores→sums→CNT) | real case, 8k cells | 0.46 ms warm · 4.28 ms cold | explainer (browser+node)
SL1L2 reduction throughput | up to 2048² synthetic | ~70 M cells/s | explainer
FSS on WebGPU vs CPU integral | 2048² / 4.2 M cells | ~7.3× faster · Δ ~1e-8 (browser, Metal) | app 11
.stat archive → plot-ready series (parse+index+aggregate) | 6,329 files · 88,456 records | 1.4 s cold · 14.5 ms per re-slice | bench-archive-aggregate
Bootstrap CI (percentile, B=1000, ratio-of-sums re-derive) | 24 cycles/case set | 0.4 ms per CI · 4.7 ms per 20-lead series | bench-bootstrap
MODE-lite object pipeline (smooth→label→attrs→match) | real 83×97 · 1024² | 1.0 ms · 73.5 ms (≈14 M cells/s) | bench-mode-objects
Bilinear regrid apply (global 0.25° → 2.5 km-class grid) | 417k target points | 1.22 ms/frame (342 M pts/s) | bench-regrid
point_stat inner loop (match+SL1L2+CNT) | 1 M obs points | 79 ms (~13 M pts/s) | bench-point-match
Ensemble diagnostics (RHIST · spread-skill · CRPS) | 20 members × 100k cases | 5.1 · 15.2 · 111.8 ms | bench-ensemble
Full MODE pipeline (disc conv → objects → interest → clusters) | real case, 83×97 × 2 fields | ~12 ms (browser, live sliders) | lib/met-mode + card 13
MTD 3-D space-time labeling + track attributes | 24×512² · 24×1024² | 101 ms · 379 ms (~65 M cells/s) | bench-mtd
Archive bundle → app-ready DataSource (funnel load) | 44,228 cases · 0.8 MB gz | 40 ms load · 38 ms 3-model series w/ B=1000 CIs | lib/met-data-source
DuckDB-WASM SQL over the full .stat Parquet | 1.53 M rows · 8.8 MB | interactive, streamed from R2 (browser) | app 10
Offline conversion (the part kept out of the client) | 1 cycle, all models | ~73 s (GRIB2 decode ≈ 90%) | data-store bench
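
The ledger's pattern of timing a kernel together with a built-in correctness check can be sketched as follows. This is an illustration of the pattern only, not one of the `tools/bench/` scripts: the function names, the synthetic field, and the tolerance are assumptions, and the timings it prints will vary by machine.

```javascript
// One-pass SL1L2 partial sums over matched pairs; NaN marks a missing pair.
function sl1l2(fcst, obs) {
  let n = 0, f = 0, o = 0, fo = 0, ff = 0, oo = 0;
  for (let i = 0; i < fcst.length; i++) {
    const a = fcst[i], b = obs[i];
    if (Number.isNaN(a) || Number.isNaN(b)) continue;
    n++;
    f += a; o += b; fo += a * b; ff += a * a; oo += b * b;
  }
  return { total: n, fbar: f / n, obar: o / n, fobar: fo / n, ffbar: ff / n, oobar: oo / n };
}

// Time the reduction (cold = first call, warm = best of `reps`) and verify the result
// against a field whose answer is known analytically.
function benchSl1l2(nCells, reps = 5) {
  // Analytic field: fcst = obs + 1 everywhere, so ME = 1 and MSE = 1.
  const obs = new Float64Array(nCells);
  const fcst = new Float64Array(nCells);
  for (let i = 0; i < nCells; i++) {
    obs[i] = i % 100;
    fcst[i] = obs[i] + 1;
  }

  const t0 = performance.now();
  let sums = sl1l2(fcst, obs);
  const coldMs = performance.now() - t0;

  let warmMs = Infinity;
  for (let r = 0; r < reps; r++) {
    const t = performance.now();
    sums = sl1l2(fcst, obs);
    warmMs = Math.min(warmMs, performance.now() - t);
  }

  const me = sums.fbar - sums.obar;
  const mse = sums.ffbar - 2 * sums.fobar + sums.oobar;
  const ok = sums.total === nCells && Math.abs(me - 1) < 1e-9 && Math.abs(mse - 1) < 1e-9;
  if (!ok) throw new Error("correctness check failed; timings are meaningless");

  return { coldMs, warmMs, cellsPerSec: nCells / (warmMs / 1000), ok };
}

console.log(benchSl1l2(1 << 20));
```

Failing loudly when the check does not pass keeps a fast but wrong kernel from ever producing a ledger entry.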

Premise: "Provided the optimal data input format"

Honesty notes. (1) New benchmarks ran in Node/V8; prior rounds showed Node≈Chrome on these kernels, and the flagship paths (apps 09/10/11) are verified in the browser on the production origin. (2) The MODE row measures a simplified core, not MODE parity — no MODE output exists in this archive to verify against. (3) The archive is deterministic-only and single-region; ensemble/probabilistic verdicts are math-plus-benchmarks, not end-to-end demonstrations. (4) The regional model appears here under its de-identified name (WRF-REG); all public artifacts follow the lab's de-identification policy. (5) "5-dp cancellation": statistics re-derived from MET's rounded partial sums can differ near zero error — one documented case in 32,390.
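
Note (5) can be made concrete with a small, made-up example. The sketch below rounds SL1L2-style partial sums to five decimals and re-derives RMSE from them; the values, the rounding helper, and the function names are illustrative assumptions, not MET's output path.

```javascript
// Approximate 5-decimal rounding, standing in for the rounded values in a .stat file.
const round5 = (x) => Math.round(x * 1e5) / 1e5;

// Mean products over matched pairs.
function partialSums(fcst, obs) {
  let ff = 0, fo = 0, oo = 0;
  for (let i = 0; i < fcst.length; i++) {
    ff += fcst[i] * fcst[i];
    fo += fcst[i] * obs[i];
    oo += obs[i] * obs[i];
  }
  const n = fcst.length;
  return { ffbar: ff / n, fobar: fo / n, oobar: oo / n };
}

// RMSE via the expansion MSE = FFBAR - 2*FOBAR + OOBAR (clamped at zero).
function rmseFromSums({ ffbar, fobar, oobar }) {
  return Math.sqrt(Math.max(0, ffbar - 2 * fobar + oobar));
}

// Temperature-like values (kelvin) with a constant forecast error of 0.01.
const obsK = [280.123, 281.456, 279.789, 282.314];
const fcstK = obsK.map((v) => v + 0.01);

const exactSums = partialSums(fcstK, obsK);
const roundedSums = {
  ffbar: round5(exactSums.ffbar),
  fobar: round5(exactSums.fobar),
  oobar: round5(exactSums.oobar),
};

const rmseExact = rmseFromSums(exactSums);     // close to 0.01
const rmseRounded = rmseFromSums(roundedSums); // may differ in the leading digits
console.log({ rmseExact, rmseRounded });
```

Each rounded sum can be off by up to 5e-6, so the re-derived MSE can be off by up to 2e-5 in absolute terms (5e-6 + 2 × 5e-6 + 5e-6). That is negligible for typical errors but not when the true MSE is itself around 1e-4, as it is here.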

Bottom line: What should move to the client, and what shouldn't