The one rule: Three destinies for a MET workload
Verification math → runs anywhere
CTC/CTS, SL1L2/CNT, VL1L2/VCNT, PCT/PSTD, aggregation, bootstrap CIs. Sums and ratios over small numbers of aggregables. Sub-millisecond to low-millisecond in plain JS — cheaper than a single network round-trip.
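Here is a concrete case of "sums and ratios". MET's SL1L2 line stores a record's pair count (TOTAL) and the means FBAR, OBAR, FOBAR, FFBAR and OOBAR. Aggregating records works in three steps:

1. Re-weight each mean by TOTAL.
2. Sum across records.
3. Divide by the pooled count. This is ratio-of-sums, not an average of per-record scores.

Continuous scores then follow algebraically, for example MSE = FFBAR − 2·FOBAR + OOBAR. The sketch below is minimal, and its function names are hypothetical rather than the lab's lib/met-stats.mjs API. The clamp at zero is an assumed guard against the cancellation that MET's 5-decimal rounding can cause.

```javascript
// Aggregate SL1L2 records by ratio-of-sums: re-weight each record's means by
// its TOTAL, sum, then divide by the pooled count (never average the ratios).
function aggregateSL1L2(records) {
  let n = 0, f = 0, o = 0, fo = 0, ff = 0, oo = 0;
  for (const r of records) {
    n += r.TOTAL;
    f += r.TOTAL * r.FBAR;   // sum of forecasts
    o += r.TOTAL * r.OBAR;   // sum of observations
    fo += r.TOTAL * r.FOBAR; // sum of forecast * observation
    ff += r.TOTAL * r.FFBAR; // sum of forecast squared
    oo += r.TOTAL * r.OOBAR; // sum of observation squared
  }
  return { TOTAL: n, FBAR: f / n, OBAR: o / n, FOBAR: fo / n, FFBAR: ff / n, OOBAR: oo / n };
}

// Derive a few CNT-style scores algebraically from (aggregated) SL1L2 means.
function cntFromSL1L2(s) {
  const me = s.FBAR - s.OBAR;                  // mean error (bias)
  const mse = s.FFBAR - 2 * s.FOBAR + s.OOBAR; // mean of (f - o)^2
  // Clamp: rounded inputs can push a tiny true MSE slightly negative.
  return { ME: me, MSE: mse, RMSE: Math.sqrt(Math.max(mse, 0)) };
}
```

Because the aggregated means carry the same information as the pooled pairs, the result matches scoring all pairs at once, up to floating-point error.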
Grid operators → CPU fine, GPU shines
Neighborhood methods (FSS), object identification (MODE), regridding, pairs-on-the-fly. O(cells) work: tens of ms per megacell on CPU, ~7× faster on WebGPU, rendered straight to screen with no readback.
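The integral-image trick behind the CPU FSS path fits in a few dozen lines. One summed-area table per binary field turns every neighborhood count into four lookups, so the cost stays O(cells) regardless of radius. This sketch makes three assumptions:

- a square (2r+1)×(2r+1) neighborhood;
- a `>=` threshold;
- windows clipped at the grid edge, with the fraction taken over the cells actually inside.

MET's own edge and valid-data handling may differ, and the function names are illustrative.

```javascript
// Summed-area table with one row/column of zero padding:
// S[(y + 1) * (w + 1) + (x + 1)] = sum of bin over rows 0..y and columns 0..x.
function integralImage(bin, w, h) {
  const S = new Float64Array((w + 1) * (h + 1));
  for (let y = 0; y < h; y++) {
    let rowSum = 0;
    for (let x = 0; x < w; x++) {
      rowSum += bin[y * w + x];
      S[(y + 1) * (w + 1) + (x + 1)] = S[y * (w + 1) + (x + 1)] + rowSum;
    }
  }
  return S;
}

// Fraction of cells >= thresh in the (2r+1)^2 box around each cell (box clipped at edges).
function fractions(field, w, h, thresh, r) {
  const bin = new Uint8Array(w * h);
  for (let i = 0; i < w * h; i++) bin[i] = field[i] >= thresh ? 1 : 0;
  const S = integralImage(bin, w, h);
  const W = w + 1;
  const out = new Float64Array(w * h);
  for (let y = 0; y < h; y++) {
    const y0 = Math.max(0, y - r), y1 = Math.min(h, y + r + 1);
    for (let x = 0; x < w; x++) {
      const x0 = Math.max(0, x - r), x1 = Math.min(w, x + r + 1);
      // Four lookups give the count inside rows y0..y1-1, columns x0..x1-1.
      const sum = S[y1 * W + x1] - S[y0 * W + x1] - S[y1 * W + x0] + S[y0 * W + x0];
      out[y * w + x] = sum / ((y1 - y0) * (x1 - x0));
    }
  }
  return out;
}

// FSS = 1 - FBS / FBS_worst, with FBS = mean (Pf - Po)^2 and FBS_worst = mean (Pf^2 + Po^2).
function fss(fcst, obs, w, h, thresh, r) {
  const pf = fractions(fcst, w, h, thresh, r);
  const po = fractions(obs, w, h, thresh, r);
  let fbs = 0, worst = 0;
  for (let i = 0; i < w * h; i++) {
    fbs += (pf[i] - po[i]) ** 2;
    worst += pf[i] ** 2 + po[i] ** 2;
  }
  return worst === 0 ? NaN : 1 - fbs / worst; // undefined when neither field has events
}
```

The same table also yields the on-screen fraction field, which is why one integral image can drive both the score and the rendering.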
Format decode → offline, once
GRIB2 / prepBUFR / NetCDF decoding is ~90% of pipeline cost and I/O-bound, and its decoded output is larger on disk than the GRIB2 it came from. Pre-convert to Zarr v3 + Parquet; the browser range-reads exactly what a view needs (1–2 GETs per map frame).
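One way a map frame becomes 1–2 GETs can be sketched for a Zarr v3 array stored with the sharding codec. The sketch assumes:

- the shard index sits at the end of the shard (the codec's default);
- the index is encoded with the little-endian bytes codec, optionally followed by a 4-byte crc32c checksum (a common configuration, not verified here);
- the index lists one (offset, nbytes) pair of uint64 values per inner chunk, in C order.

The first GET fetches the index with a suffix range, and the second fetches only the frame's inner chunk. `shardIndexRange` and `innerChunkRange` are hypothetical helper names.

```javascript
// Suffix Range header for a shard index stored at the end of the shard:
// 16 bytes per inner chunk, plus 4 bytes if a crc32c checksum follows.
function shardIndexRange(nInnerChunks, withChecksum = true) {
  return `bytes=-${16 * nInnerChunks + (withChecksum ? 4 : 0)}`;
}

// Given the raw index bytes, return the Range header for inner chunk i,
// or null if the index marks that chunk as empty (offset = nbytes = 2^64 - 1).
function innerChunkRange(indexBytes, i) {
  const view = new DataView(indexBytes.buffer, indexBytes.byteOffset, indexBytes.byteLength);
  const offset = view.getBigUint64(16 * i, true);    // little-endian uint64
  const nbytes = view.getBigUint64(16 * i + 8, true);
  const EMPTY = 0xffffffffffffffffn;
  if (offset === EMPTY && nbytes === EMPTY) return null;
  return `bytes=${offset}-${offset + nbytes - 1n}`; // HTTP ranges are inclusive
}
```

A client would pass these strings as the `Range` header of a `fetch` call. Once a shard's index is cached, each further frame in that shard costs a single GET.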
The matrix: MET ecosystem × client-side feasibility
| Functionality | Client? | Evidence (measured) | Why · limitations · what the client buys |
|---|---|---|---|
| MET core — grid & point verification | |||
| grid_stat · categorical + continuous: threshold → CTC/CTS · SL1L2 → CNT · masks | proven vs MET | real 8k-cell case: 0.46 ms warm / 4.28 ms cold; SL1L2 reduction ~70 M cells/s; counts BIT-IDENTICAL to MET .stat, reals ±5e-6 (MET's 5-dp rounding); 88/88 oracle checks; live in app 09 | This is the heart of MET, and it is decisively client-sized. A full recompute costs less than one 60 fps frame, so thresholds/masks/regions become sliders, not batch jobs. Limitation: interpolation-method parity (e.g. budget vs nearest) must match MET's config to reproduce matched pairs exactly — hence verifying on a common grid. Benefit: verification-from-a-URL; zero install; what-if in real time. |
| grid_stat · neighborhood (FSS/NBRCNT): the operator with no closed form over sums | proven vs CPU ref | integral-image CPU O(cells); 4 WebGPU kernels (naive/separable/prefix-scan/multi-block) parity Δ~1e-8 on real Metal; ~7.3× faster than CPU at 2048² (4.2 M cells); GPU→screen with no readback; app 11 | The stress test for "grid ops on the GPU" — passed. No NBRCNT lines exist in this archive, so parity is proven CPU↔GPU (exact n, FSS to 1e-8), not vs a MET oracle — stated honestly. Limitations discovered and documented: NaN-sentinel folding under fast-math, the silent 128 MiB binding cap. Benefit: radius/threshold sweeps become interactive; the same integral image drives both score and on-screen field. |
| point_stat · matching + scoring: interpolate to obs, form pairs, score | measured | bilinear match + SL1L2 + CNT at ~13 M pts/s; 1 M points = 79 ms; pair errors ≤1e-13 on an analytic field; RMSE≈σ sanity passes (tools/bench/bench-point-match.mjs) | Given pre-decoded obs (Parquet), the inner loop is trivially fast. The archive's GDAS obs are 11.7 M rows = 45 MB Parquet; one synoptic time over a region is 10³–10⁵ points, well under 10 ms at the measured ~13 M pts/s (sub-millisecond at the low end). Limitations: prepBUFR decode and MET's obs QC/level-dedup logic stay offline/unported; land/sea & topography masks need the static fields shipped once. Benefit: station-level drill-down with live re-scoring. |
| Regridding · regrid.to_grid: put fcst and truth on one grid | measured | bilinear weights (417k pts) built once in 4.5 ms; 1.22 ms/frame apply (342 M pts/s); budget ×3 box-average 3.3 ms/frame; plane reproduced to 1.4e-13 (bench-regrid.mjs) | Fast enough to regrid interactively, which the v2 pipeline currently does offline in numpy — exploratory "compare any two models on any grid" needs no conversion pass. Limitations: conservative regridding of precip at large ratios needs proper area weights (budget here is ×3 box-average); Lambert/rotated-pole projections add a coordinate transform (cheap, but must match MET's). Benefit: kills the biggest constraint on which model pairs the client can score. |
| pcp_combine: accumulation-bucket arithmetic | measured (same op) | frame-minus-frame at 416k cells runs live in app 10's Difference mode (with running bias/RMSE/MAE); array add/subtract ≪ regrid cost | Sum/difference of accumulation fields is the cheapest grid op here. Limitation: bucket semantics (resets, missing cycles) are bookkeeping that must be encoded in store metadata. Benefit: build any accumulation window on the fly instead of pre-materializing APCP_01/03/06/24 variants. |
| MODE · object-based verification: smooth → threshold → objects → attributes → match | faithful core BUILT | lib/met-mode.mjs: disc convolution + attributes + MET fuzzy-interest engine (default weights) + cluster merging; 42/42 selftest (hand-worked geometry); full pipeline ~12 ms on the real case, live in the MODE Lab app (card 13); scale probe 73.5 ms at 1024² | The core loop is comfortably interactive on CPU alone — and it is FSS-shaped work, so the GPU path already proven for FSS carries over. Limitations, honestly: this is now a faithful CORE (true disc smoothing, MET's attribute set, the MODEConfig_default weights and threshold, configurable interest maps, cluster merging) — but the archive has no MODE output to use as an oracle, so verification is hand-worked geometry + invariants, not parity; curvature/percentile-thresholds/secondary merges remain unported. Benefit: convolution radius and threshold become sliders on the object map — MODE's parameters are notoriously fiddly, and interactive feedback is exactly what its users lack. |
| MTD · time-domain objects: MODE in 3-D (x, y, t) | measured | 3-D 26-connected space-time labeling + track attributes: 24×512² in 101 ms (~107 MB working set); 24×1024² in 379 ms (~428 MB); moving-blob verification exact (bench-mtd.mjs) | The spike is done: compute is fine (~65 M cells/s), and memory is the real constraint exactly as predicted — a 24-frame 1024² stack holds ~428 MB of arrays, comfortable on desktop, tight on mobile. Streaming frames (label t against t−1 only) would cut that if it ever matters. |
| ensemble_stat: RHIST · CRPS · spread-skill · ensemble probs | math ready · no data | 20 members × 100k cases: rank hist 5.1 ms · spread-skill 15.2 ms · CRPS 112 ms (bench-ensemble.mjs); lib math covered by 129/129 selftest | Compute is a non-issue; the blocker is that this archive is deterministic-only (no ECNT/ORANK/RHIST lines), so there is nothing real to verify against — app 06 stays synthetic with a banner saying so. The day ensemble output exists, the client is already fast enough. |
| Probabilistic (PCT/PSTD/PRC): reliability · Brier decomposition · ROC | proven (math) | parsed PCT bins reproduce the paired PSTD line's Brier decomposition EXACTLY (browser-verified, app 08); the parser encodes the lesson that N_THRESH counts bin edges (N_THRESH thresholds bound N_THRESH−1 bins) | The math and parsing are proven end-to-end; the archive just contains no probabilistic lines to display. ROC/PRC rendering is a small addition (queued with the Taylor diagram). Benefit: reliability diagrams that recompute as you re-bin. |
| wavelet_stat: intensity-scale decomposition | untested | no measurement; a Haar transform is O(cells) with tiny constants | Nothing about it looks client-hostile — it is less work per cell than FSS. Parked only because no one has asked for it yet; would need its own oracle case. |
| Statistics over archives — the analysis layer | |||
| stat_analysis · filter/aggregate .stat archives: the batch tool behind most MET Q&A | proven vs MET | FULL real archive (6,329 files / 59.8 MB / 88,456 records): parse 602 ms (99 MB/s), ratio-of-sums aggregate 14.5 ms → 1,005 groups, series 0.14 ms; cold→plot 1.4 s (bench-archive-aggregate.mjs); parser 0 errors; 32,389/32,390 + 7,038/7,038 derived-line oracle checks | The whole archive's statistics are a client-side object. After a one-time parse (or an 8.8 MB Parquet load), every re-slice — new model, threshold, grouping — costs ~15 ms. The 5-dp cancellation caveat applies when re-deriving from MET-rounded sums (one documented case in 32k). Benefit: METviewer-class questions with zero infrastructure, plus CIs the batch tool doesn't give you (next row). |
| Bootstrap confidence intervals: the METcalcpy capability gap | measured | SHIPPED in lib/met-stats.mjs (bootstrapCI, percentile + BCa; selftest 140/140) and live in apps 02/04/12: 0.4 ms per CI (B=1000); bands recomputed per interaction; scorecard significance = paired, event-equalized bootstrap | The blueprint asked whether an n-weighted proxy over MET's per-record _BCL/_BCU was acceptable. Wrong question — the real thing is free. Resampling cycles and re-deriving via ratio-of-sums is sub-millisecond, so every plotted point can carry a CI, recomputed on every interaction. Limitations: the percentile method under-covers slightly at small n (93.3% vs 95% at n=48; BCa would tighten it); within-case pair-level bootstrap needs the pair grids (available via the v2 store). Benefit: first-class uncertainty — an idea-board theme — for ~30 lines of code. |
| Event equalization: only compare where all models verify | implemented | the scorecard (card 12) compares models on their COMMON init cycles only — a set intersection over the funnel's cycle keys, inside every paired-bootstrap cell | Plain set intersection over (cycle, lead, var, mask) keys — minutes of work on top of the existing grouping, or one SQL INTERSECT in DuckDB. Listed separately because METcalcpy treats it as a feature; here it falls out of the data model. |
| The ecosystem — data, viewers, orchestration | |||
| METdataio: load .stat into a database | replaced, serverless | entire 1.53 M-row .stat archive = 8.8 MB Parquet; DuckDB-WASM queries it live from R2 in app 10 ("80 rows across 4 models"); .stat/.txt/MODE parser 97/97 + 0 errors on 6,329 real files (app 08) | The database's job was random access and aggregation; Parquet + DuckDB-WASM do both without the server. Limitation: writes/curation stay offline (that's the conversion step); multi-user concurrency is N browsers reading one immutable object — which is a feature. Benefit: the entire "install MySQL, define schemas, run the loader" on-ramp disappears. |
| METviewer: web UI for select → aggregate → plot | demonstrated | app 10 (SQL → RMSE-vs-lead by model, live from R2) + app 02 (Metric/Dimension/Filter/Facet explorer) + 15 ms re-aggregation at full-archive scale | Architecture proven and the signature views now exist: the scorecard (with honest paired-bootstrap significance), Taylor diagram, and ROC shipped as card 12. Limitation: METviewer's long tail of plot types and its saved-XML workflows would need deliberate porting. Benefit: selection loop goes from form-submit-wait to direct manipulation at 60 fps. |
| METcalcpy: aggregation, derivation, statistics in Python | mostly covered | ratio-of-sums aggregation (proven), VL1L2→VCNT 7,038/7,038, bootstrap CIs (shipped, 140/140 lib checks), event equalization (shipped in the scorecard); remaining: the long tail of specialized derivations | The shared lib is a growing JS METcalcpy with a stricter honesty contract (NaN-safety, wrong-way demonstrators). Remaining gap after this round: the long tail of specialized derivations (added on demand, each with an oracle). |
| METplotpy: the plot catalog | broad coverage | performance diagram, reliability+sharpness+Brier, rank histogram, spread-skill, CRPS, threshold scrubber, object views, spatial fields — all shipped & browser-verified across apps 03–07 | Interactive versions of the core catalog exist across apps 03–07; the Taylor diagram and ROC, absent there, shipped with the scorecard in card 12. Benefit over static PNGs: linked brushing, hover-to-counts, and provenance on every plot. |
| METexpress: simplified predefined-query viewer | same class | no separate demo; it is a curated subset of the METviewer capability demonstrated above | If METviewer's loop runs serverless, METexpress's narrower loop does too. The interesting port is its curated question set, not its plumbing. |
| METplus wrappers: config-driven batch orchestration | different role | app 09 reconstructs the pipeline narratively (10 stages, each badged browser/pre-encode/offline) rather than executing configs | Orchestration is the one piece that shouldn't move client-side: it exists to babysit filesystems, schedulers, and 62 GiB of I/O. The browser-side analog is session state (URL-hash configs, shareable analyses), which the lab already has. Verdict: reimagine, don't port. |
| Kept offline — on purpose | |||
| GRIB2 / NetCDF decode, plus prepBUFR for point obs | offline by design | measured: decode ≈ 90% of the ~73 s/cycle conversion; GRIB2→Zarr EXPANDS on disk (1,226→1,692 MB — GRIB2's packing beats zstd+bitround); full archive ≈ 25–30 min once | A WASM ecCodes port would work and still be the wrong answer: you'd ship megabytes of decoder to spend seconds of CPU producing data you then can't range-read. Decode once, store analysis-ready (Zarr v3 whole-frame chunks + Parquet), and a map frame costs 1–2 GETs (~2.5–8 MB compressed). The client's superpower is lazy access, not brute decoding. |
| Bulk pair materialization: MET's _pairs.nc intermediates | eliminated | v2 store: pairs computed on the fly (fcst − truth on a common grid); 55,331 objects / 810 MB of stored pairs → a few oracle objects; live pairs for any selection ≈ 4 GETs / 11–17 MB; CTC from the oracle BIT-IDENTICAL across all 5 precip thresholds (app 10, production-verified) | Client-side compute doesn't just relocate this stage — it deletes it. Any model/var/lead/region pairing becomes computable, not just the ones a batch config anticipated. A small stored oracle keeps the parity proof alive. |
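The regridding row separates a one-time 4.5 ms weight build from a 1.22 ms/frame apply. Here is a minimal sketch of that split for bilinear interpolation. It assumes a regular lat/lon source grid stored row-major and periodic in longitude (nx·dlon = 360, as for a global 0.25° grid). Lambert or rotated-pole targets would need a coordinate transform first, as the row notes. The grid-descriptor fields and function names are hypothetical.

```javascript
// src: { lon0, lat0, dlon, dlat, nx, ny } describing field[j * nx + i]
// (dlat may be negative for north-to-south storage). pts: array of [lat, lon].
function bilinearWeights(src, pts) {
  const n = pts.length;
  const idx = new Uint32Array(4 * n);
  const wts = new Float64Array(4 * n);
  for (let k = 0; k < n; k++) {
    const [lat, lon] = pts[k];
    const fx = ((((lon - src.lon0) % 360) + 360) % 360) / src.dlon;               // wrapped column coord
    const fy = Math.min(Math.max((lat - src.lat0) / src.dlat, 0), src.ny - 1); // clamped row coord
    const i0 = Math.floor(fx) % src.nx, i1 = (i0 + 1) % src.nx;                // wrap across the seam
    const j0 = Math.min(Math.floor(fy), src.ny - 2), j1 = j0 + 1;
    const tx = fx - Math.floor(fx), ty = fy - j0;
    const b = 4 * k;
    idx[b] = j0 * src.nx + i0;     wts[b] = (1 - tx) * (1 - ty);
    idx[b + 1] = j0 * src.nx + i1; wts[b + 1] = tx * (1 - ty);
    idx[b + 2] = j1 * src.nx + i0; wts[b + 2] = (1 - tx) * ty;
    idx[b + 3] = j1 * src.nx + i1; wts[b + 3] = tx * ty;
  }
  return { idx, wts };
}

// Per-frame apply: a gather plus four multiply-adds per target point.
function applyWeights(field, { idx, wts }) {
  const n = wts.length / 4;
  const out = new Float64Array(n);
  for (let k = 0; k < n; k++) {
    const b = 4 * k;
    out[k] = wts[b] * field[idx[b]] + wts[b + 1] * field[idx[b + 1]] +
             wts[b + 2] * field[idx[b + 2]] + wts[b + 3] * field[idx[b + 3]];
  }
  return out;
}
```

Bilinear weights reproduce any field that is linear in the grid coordinates. That is why a plane is the natural correctness check, as in the row's "plane reproduced" measurement.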
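The MTD row's space-time labeling step can be sketched as a breadth-first flood fill over a binary (t, y, x) stack. A cell's 26 neighbors are the cells that differ from it by at most one step in each of t, y and x. This is an illustrative implementation, not code from the lab's bench-mtd.mjs. Track attributes such as volumes, durations and centroid paths would be accumulated per label afterward. The mask, label and queue arrays all scale with the full stack, which is the memory pressure the row describes.

```javascript
// mask[(t * ny + y) * nx + x] is 1 inside an object, 0 outside.
// Returns per-cell labels (0 = background, 1..count = objects) and the object count.
function label3D(mask, nt, ny, nx) {
  const n = nt * ny * nx;
  const labels = new Int32Array(n);
  const queue = new Int32Array(n); // each cell is enqueued at most once
  let count = 0;
  for (let start = 0; start < n; start++) {
    if (!mask[start] || labels[start]) continue;
    count += 1;
    let head = 0, tail = 0;
    labels[start] = count;
    queue[tail++] = start;
    while (head < tail) {
      const p = queue[head++];
      const x = p % nx, y = Math.floor(p / nx) % ny, t = Math.floor(p / (nx * ny));
      // Visit all 26 neighbors (the center cell is already labeled and is skipped).
      for (let dt = -1; dt <= 1; dt++) {
        const tt = t + dt;
        if (tt < 0 || tt >= nt) continue;
        for (let dy = -1; dy <= 1; dy++) {
          const yy = y + dy;
          if (yy < 0 || yy >= ny) continue;
          for (let dx = -1; dx <= 1; dx++) {
            const xx = x + dx;
            if (xx < 0 || xx >= nx) continue;
            const q = (tt * ny + yy) * nx + xx;
            if (mask[q] && !labels[q]) { labels[q] = count; queue[tail++] = q; }
          }
        }
      }
    }
  }
  return { labels, count };
}
```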
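Two of the ensemble_stat row's diagnostics are short loops. This sketch uses the empirical (ensemble-CDF) form of CRPS:

CRPS = (1/m)·Σ|x_i − y| − (1/(2m²))·ΣΣ|x_i − x_j|

It evaluates the pairwise term after sorting, so that term costs O(m log m) instead of O(m²). A rank histogram counts members strictly below the observation. Ties get no special treatment. No claim is made that this matches a specific MET output column bit-for-bit, and the function names are hypothetical.

```javascript
// Empirical CRPS of one ensemble forecast against one observation.
// For sorted members x_(0..m-1): sum_i sum_j |x_i - x_j| = 2 * sum_k (2k - m + 1) * x_(k).
function crpsEnsemble(members, obs) {
  const m = members.length;
  const x = Float64Array.from(members).sort(); // typed-array sort is numeric
  let absErr = 0, weighted = 0;
  for (let k = 0; k < m; k++) {
    absErr += Math.abs(x[k] - obs);
    weighted += (2 * k - m + 1) * x[k];
  }
  return absErr / m - weighted / (m * m);
}

// Rank histogram over cases [{ members, obs }] with m members each:
// rank = number of members strictly below obs (0..m), tallied into m + 1 bins.
function rankHistogram(cases, m) {
  const hist = new Uint32Array(m + 1);
  for (const { members, obs } of cases) {
    let rank = 0;
    for (const v of members) if (v < obs) rank++;
    hist[rank]++;
  }
  return hist;
}
```

With a single member, the spread term vanishes and CRPS reduces to the absolute error, which makes a convenient sanity check.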
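Given a PCT line's per-bin counts, the Brier decomposition behind the probabilistic row is a single pass. The sketch summarizes each bin by one representative forecast probability `p` plus event and non-event counts `oy` and `on`. Which representative value matches MET (for example, the bin midpoint) is an assumption to check against a PSTD oracle, not something stated here. When every forecast in a bin equals `p`, BRIER = RELIABILITY − RESOLUTION + UNCERTAINTY holds exactly. That identity is the kind of invariant a parser test can assert.

```javascript
// bins: [{ p, oy, on }], with p = the bin's representative forecast probability.
function brierDecomposition(bins) {
  let N = 0, events = 0;
  for (const b of bins) { N += b.oy + b.on; events += b.oy; }
  const obar = events / N; // climatological base rate
  let rel = 0, res = 0, bs = 0;
  for (const b of bins) {
    const nk = b.oy + b.on;
    if (nk === 0) continue;                           // empty bins contribute nothing
    const ok = b.oy / nk;                             // observed frequency in the bin
    rel += nk * (b.p - ok) ** 2;                      // reliability (calibration) term
    res += nk * (ok - obar) ** 2;                     // resolution term
    bs += b.oy * (b.p - 1) ** 2 + b.on * b.p ** 2;    // Brier score treating p as each forecast
  }
  return { BRIER: bs / N, RELIABILITY: rel / N, RESOLUTION: res / N, UNCERTAINTY: obar * (1 - obar) };
}
```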
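The scorecard's "paired, event-equalized bootstrap" combines the bootstrap and event-equalization rows:

1. Intersect the cycle keys both models verified.
2. Resample those common cycles with replacement, using the same draw for both models.
3. Re-derive each model's RMSE by ratio-of-sums.

Below is a minimal percentile-interval sketch of those steps. Several pieces are illustrative choices, not the lab's lib/met-stats.mjs implementation:

- the SL1L2-style record shape;
- the cycle-key format;
- the mulberry32 generator;
- the nearest-rank quantile rule.

```javascript
// Seeded PRNG (the widely used mulberry32 mixer) so intervals are reproducible.
function mulberry32(seed) {
  let a = seed | 0;
  return function () {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), a | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // uniform in [0, 1)
  };
}

// Ratio-of-sums RMSE over SL1L2-style records { TOTAL, FFBAR, FOBAR, OOBAR }.
function rmse(records) {
  let n = 0, ff = 0, fo = 0, oo = 0;
  for (const r of records) {
    n += r.TOTAL; ff += r.TOTAL * r.FFBAR; fo += r.TOTAL * r.FOBAR; oo += r.TOTAL * r.OOBAR;
  }
  return Math.sqrt(Math.max((ff - 2 * fo + oo) / n, 0));
}

// byCycleA / byCycleB: Map from a cycle key (hypothetical format, e.g.
// "2024010100|f024") to that cycle's record for model A or model B.
function pairedBootstrapDiff(byCycleA, byCycleB, B = 1000, alpha = 0.05, seed = 1) {
  // Event equalization: keep only the cycles both models verified.
  const common = [...byCycleA.keys()].filter((k) => byCycleB.has(k)).sort();
  const a = common.map((k) => byCycleA.get(k));
  const b = common.map((k) => byCycleB.get(k));
  const n = common.length;
  const rand = mulberry32(seed);
  const diffs = new Float64Array(B);
  for (let rep = 0; rep < B; rep++) {
    const ra = new Array(n), rb = new Array(n);
    for (let i = 0; i < n; i++) {
      const k = Math.floor(rand() * n); // one draw used for both models: paired
      ra[i] = a[k];
      rb[i] = b[k];
    }
    diffs[rep] = rmse(ra) - rmse(rb);
  }
  diffs.sort(); // numeric sort (typed array)
  // Nearest-rank percentile interval for RMSE(A) - RMSE(B).
  const lo = diffs[Math.floor((alpha / 2) * (B - 1))];
  const hi = diffs[Math.ceil((1 - alpha / 2) * (B - 1))];
  return { nCommon: n, estimate: rmse(a) - rmse(b), lo, hi };
}
```

An interval that excludes zero is the usual signal for a significant scorecard cell. BCa would adjust the two quantile positions and leave the resampling loop unchanged.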
Numbers: The benchmark ledger
Apple M2 Pro · Node v26 (V8 as the engine proxy) unless marked "browser"; browser numbers come from the shipped apps on Apple Metal. New measurements are reproducible via tools/bench/ (each script carries its own correctness checks).
| Workload | Scale | Result | Source |
|---|---|---|---|
| grid_stat full recompute (threshold→CTC→scores→sums→CNT) | real case, 8k cells | 0.46 ms warm · 4.28 ms cold | explainer (browser+node) |
| SL1L2 reduction throughput | up to 2048² synthetic | ~70 M cells/s | explainer |
| FSS on WebGPU vs CPU integral | 2048² / 4.2 M cells | ~7.3× faster · Δ ~1e-8 (browser, Metal) | app 11 |
| .stat archive → plot-ready series (parse+index+aggregate) | 6,329 files · 88,456 records | 1.4 s cold · 14.5 ms per re-slice | bench-archive-aggregate |
| Bootstrap CI (percentile, B=1000, ratio-of-sums re-derive) | 24 cycles/case set | 0.4 ms per CI · 4.7 ms per 20-lead series | bench-bootstrap |
| MODE-lite object pipeline (smooth→label→attrs→match) | real 83×97 · 1024² | 1.0 ms · 73.5 ms (≈14 M cells/s) | bench-mode-objects |
| Bilinear regrid apply (global 0.25° → 2.5 km-class grid) | 417k target points | 1.22 ms/frame (342 M pts/s) | bench-regrid |
| point_stat inner loop (match+SL1L2+CNT) | 1 M obs points | 79 ms (~13 M pts/s) | bench-point-match |
| Ensemble diagnostics (RHIST · spread-skill · CRPS) | 20 members × 100k cases | 5.1 · 15.2 · 111.8 ms | bench-ensemble |
| Full MODE pipeline (disc conv → objects → interest → clusters) | real case, 83×97 × 2 fields | ~12 ms (browser, live sliders) | lib/met-mode + card 13 |
| MTD 3-D space-time labeling + track attributes | 24×512² · 24×1024² | 101 ms · 379 ms (~65 M cells/s) | bench-mtd |
| Archive bundle → app-ready DataSource (funnel load) | 44,228 cases · 0.8 MB gz | 40 ms load · 38 ms 3-model series w/ B=1000 CIs | lib/met-data-source |
| DuckDB-WASM SQL over the full .stat Parquet | 1.53 M rows · 8.8 MB | interactive, streamed from R2 (browser) | app 10 |
| Offline conversion (the part kept out of the client) | 1 cycle, all models | ~73 s (GRIB2 decode ≈ 90%) | data-store bench |
Premise: "Provided the optimal data input format"
- Gridded fields: Zarr v3, one common verification grid, whole-frame inner chunks (the dominant read is a full 2-D frame → 1–2 range GETs), per-variable precision (bitround where tolerable, lossless for categorical-critical fields like APCP), byte-shuffle + zstd, 1e30 fill (GPU-safe, never NaN).
- Statistics: Parquet. The full archive's 1.53 M stat rows are 8.8 MB — small enough to query whole, structured enough for DuckDB to push filters down.
- Point obs: Parquet (11.7 M GDAS obs = 45 MB), sliced by time/region before matching.
- Parity oracles: a few KB of MET's own .stat values stored beside the data, so every client recompute can prove itself against the reference — the lab's core habit.
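Pairs-on-the-fly under the 1e30 fill convention can be sketched as one pass over two frames already on the common grid:

1. Skip any cell where either value is the fill (or NaN).
2. Bin the remaining cells into MET's four CTC counts.
3. Derive a few CTS scores.

The missing-value test is a magnitude comparison rather than an isNaN call. A finite sentinel can be detected with an ordinary comparison, whereas NaN checks are exactly what fast-math compilers are allowed to fold away. The `>=` threshold, the 1e29 cutoff and the function names are assumptions for illustration.

```javascript
const FILL = 1e30;      // assumed store fill value (never NaN)
const MISSING_AT = 1e29; // anything this large (or NaN) counts as missing

function isMissing(v) {
  return !(Math.abs(v) < MISSING_AT); // NaN fails the comparison, so it is missing too
}

// Contingency-table counts for a ">= thresh" event over valid (fcst, obs) pairs.
function ctcFromFrames(fcst, obs, thresh) {
  const c = { FY_OY: 0, FY_ON: 0, FN_OY: 0, FN_ON: 0 };
  for (let i = 0; i < fcst.length; i++) {
    const f = fcst[i], o = obs[i];
    if (isMissing(f) || isMissing(o)) continue; // pair formed only where both are valid
    const fe = f >= thresh, oe = o >= thresh;
    if (fe && oe) c.FY_OY++;
    else if (fe) c.FY_ON++;
    else if (oe) c.FN_OY++;
    else c.FN_ON++;
  }
  return c;
}

// A few CTS scores derived from the counts.
function ctsFromCTC(c) {
  const total = c.FY_OY + c.FY_ON + c.FN_OY + c.FN_ON;
  const hitsRandom = ((c.FY_OY + c.FY_ON) * (c.FY_OY + c.FN_OY)) / total;
  return {
    PODY: c.FY_OY / (c.FY_OY + c.FN_OY),
    FAR: c.FY_ON / (c.FY_OY + c.FY_ON),
    CSI: c.FY_OY / (c.FY_OY + c.FY_ON + c.FN_OY),
    FBIAS: (c.FY_OY + c.FY_ON) / (c.FY_OY + c.FN_OY),
    GSS: (c.FY_OY - hitsRandom) / (c.FY_OY + c.FY_ON + c.FN_OY - hitsRandom),
  };
}
```

Because the counts are integers, comparing them against a stored oracle CTC line is an exact test. That is the property behind the "BIT-IDENTICAL" parity claims above.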
Bottom line: What should move to the client, and what shouldn't
- Move (proven): all stat math, archive aggregation, the analysis/viewer layer (METdataio/METviewer/METcalcpy roles), neighborhood + pairs grid ops, bootstrap CIs.
- Move next (measured, needs product work): point_stat drill-down, interactive regridding, MODE-lite → faithful MODE, ROC/Taylor/scorecard views.
- Don't move: format decode, bulk conversion, batch orchestration. They run once, offline, and make everything above possible — that division of labor is the architecture.