Lessons & gotchas

A lab’s most valuable output is often what it learned. These are the non-obvious ones, kept so nobody pays for them twice.

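  • Ratio-of-sums, never mean-of-ratios. Aggregate raw counts / partial sums first, then derive the metric. Averaging per-group metrics weights every group equally regardless of sample size, and for nonlinear metrics such as RMSE it is wrong even when the groups are the same size; the error is large enough to show on screen.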
  • MET rounds its .stat output to 5 decimal places. Recomputed real-valued stats can only be expected to match to ~±5e-6; near-perfect forecasts can produce catastrophic cancellation when deriving RMSE from rounded partial sums (clamp tiny negative MSE to 0).
  • N_THRESH counts threshold edges, not bins (MET writes pct.nrows()+1). The first parser cut got this wrong and 76/76 self-tests passed anyway — because the fixtures encoded the same wrong assumption. For format parsers, verify against an independent source of truth (the MET source code), not fixtures you wrote yourself.
  • Pair SL1L2↔CNT lines on the full common header minus LINE_TYPE/ALPHA — SL1L2 has ALPHA=NA while its CNT twin has ALPHA=0.05; a reduced key silently collides across regions/levels.
  • FCST_LEAD is an HHMMSS string (e.g. 060000), zero-padded. Match leads as strings or convert once in a shared place; an int comparison silently pairs the wrong lead.
  • Never use NaN as a missing-data sentinel in WGSL. Apple Metal fast-math folds the x != x NaN self-test to a constant, so masked cells leak into every window. Use an out-of-band ordered sentinel (1e30) and test x < 1e29.
  • WebGPU’s default maxStorageBufferBindingSize is 128 MiB. Oversized bindings don’t error visibly — they silently bind zeros. Request the adapter’s limits at device creation.
  • Backticks inside a WGSL comment inside a JS template literal terminate the string. A one-character comment edit can become a bewildering SyntaxError.
  • Playwright’s bundled headless Chromium has no WebGPU. Use the real Chrome channel with --enable-unsafe-webgpu --use-angle=metal to get the actual Metal adapter — which also means GPU code can be benchmarked honestly in automation.
  • Validate archives with gzip -t, not tar -tzf. libarchive lists headers leniently and exits 0 on a truncated stream; only the gzip CRC check validates the whole file.
  • GRIB2 decode dominates conversion cost (~90% of ~73 s/cycle) and GRIB2→Zarr expands on disk (GRIB2’s native packing beats zstd+bitround). Decode once, offline.
  • Avoid the tiny-object anti-pattern on object stores. 55k objects for 810 MB of pairs cubes is the wrong shape for R2; consolidate before uploading.
  • Grid conventions bite twice: the v2 common grid stores longitude 0–360 with row 0 at the south edge; every consumer must convert to −180..180 and flip vertically or the map renders blank/upside-down.
  • ES modules don’t load from file:// (CORS origin: null). Either serve over HTTP or inline the whole module graph — which is exactly what tools/inline.mjs does, via an import map of data: URLs (relative specifiers would resolve against the data: base and break; bare import-map keys dodge that).
  • Cloudflare Pages reuses the previous deployment manifest when “0 files uploaded”. Deleting or excluding a file does not remove it from the live site; change at least one file’s content to force a real manifest.
  • Pin --branch main when deploying from a non-main worktree, or the deploy lands as a preview URL instead of production.
  • Serve the repo root, not dist/ — apps import ../../lib/met-stats.mjs, so the whole tree must share one origin.
  • Never let parallel agents drive one browser. Concurrent Chromium automation deadlocks, so verification is centralized and sequential.
  • Beware circular fixtures (see N_THRESH above): the reviewer that catches spec bugs is the one reading the other implementation.
  • De-identify before anything public. Model names, masks, coordinates, and valid times were scrubbed everywhere public; a private bucket + gated Worker keeps the real data real.
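
The ratio-of-sums and rounded-partial-sums lessons above can be illustrated with a small sketch. It assumes each SL1L2 line has already been parsed into a hypothetical row object `{ total, fbar, obar, fobar, ffbar, oobar }` mirroring the TOTAL, FBAR, OBAR, FOBAR, FFBAR and OOBAR columns; the function names are illustrative and not taken from the lab's code.

```javascript
// Hypothetical row shape: { total, fbar, obar, fobar, ffbar, oobar }.
const SL1L2_MEANS = ["fbar", "obar", "fobar", "ffbar", "oobar"];

// Pool several SL1L2 rows into one. Each *BAR column is a mean over
// TOTAL pairs, so it is weighted back to a sum before pooling.
function aggregateSl1l2(rows) {
  const sums = { total: 0 };
  for (const k of SL1L2_MEANS) sums[k] = 0;
  for (const r of rows) {
    sums.total += r.total;
    for (const k of SL1L2_MEANS) sums[k] += r[k] * r.total;
  }
  if (sums.total === 0) throw new Error("no pairs to aggregate");
  const out = { total: sums.total };
  for (const k of SL1L2_MEANS) out[k] = sums[k] / sums.total;
  return out;
}

// RMSE derived from the pooled partial sums.
// MSE = mean(f^2) - 2 * mean(f * o) + mean(o^2).
function rmseFromSl1l2(agg) {
  const mse = agg.ffbar - 2 * agg.fobar + agg.oobar;
  // Rounded inputs can push a near-zero MSE slightly negative; clamp so
  // Math.sqrt does not return NaN. This sketch clamps every negative
  // value; a stricter version might flag large negatives as bad input.
  return Math.sqrt(Math.max(0, mse));
}
```

With one group of a single pair whose error is 3 and a second group of eight perfect pairs, the pooled RMSE is 1, while averaging the two per-group RMSEs (3 and 0) gives 1.5.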
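
For the SL1L2↔CNT pairing lesson above, one possible way to build the key is to use every header column except LINE_TYPE and ALPHA. This is a minimal sketch: the row shape `{ header: { COLUMN: value, ... } }` and both function names are hypothetical.

```javascript
// Columns that legitimately differ between an SL1L2 line and its CNT twin.
const EXCLUDED_FROM_KEY = new Set(["LINE_TYPE", "ALPHA"]);

// Build a key from every remaining header column, sorted by name so the
// key does not depend on the order in which columns were parsed.
function pairingKey(header) {
  return Object.keys(header)
    .filter((name) => !EXCLUDED_FROM_KEY.has(name))
    .sort()
    .map((name) => `${name}=${header[name]}`)
    .join("|");
}

// Group rows by key and return only complete { SL1L2, CNT } pairs.
// A second row of the same type under one key means the key is too
// weak (or the input is duplicated), so fail loudly instead of
// silently overwriting.
function pairSl1l2WithCnt(rows) {
  const byKey = new Map();
  for (const row of rows) {
    const type = row.header.LINE_TYPE;
    if (type !== "SL1L2" && type !== "CNT") continue;
    const key = pairingKey(row.header);
    const slot = byKey.get(key) ?? {};
    if (slot[type]) throw new Error(`duplicate ${type} line for key ${key}`);
    slot[type] = row;
    byKey.set(key, slot);
  }
  return [...byKey.values()].filter((slot) => slot.SL1L2 && slot.CNT);
}
```

Throwing on a duplicate is a design choice of this sketch: it turns a silent collision across regions or levels into a visible failure.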
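
The FCST_LEAD lesson above suggests converting leads once in a shared place. A minimal sketch of such a helper follows; the name `leadToSeconds` is hypothetical, and the handling of leads beyond 99 hours (extra leading hour digits, as in 1200000) is an assumption about the input rather than something stated in the list.

```javascript
// Convert an HHMMSS lead string to seconds. The last four digits are
// minutes and seconds; everything before them is hours (assumption:
// leads of 100 hours or more simply carry extra hour digits).
function leadToSeconds(lead) {
  if (typeof lead !== "string" || !/^\d{6,}$/.test(lead)) {
    throw new Error(`expected an HHMMSS string, got ${JSON.stringify(lead)}`);
  }
  const hours = Number(lead.slice(0, -4));
  const minutes = Number(lead.slice(-4, -2));
  const seconds = Number(lead.slice(-2));
  return hours * 3600 + minutes * 60 + seconds;
}
```

Rejecting anything that is not a zero-padded string keeps an already-converted integer such as 60000 from slipping through and being compared against hour counts.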
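
For the maxStorageBufferBindingSize lesson above, requesting the adapter's limits at device creation might look like the sketch below. `requestAdapter`, `requestDevice` and `requiredLimits` are standard WebGPU API; the helper names and the choice of which limits to request are assumptions of this example.

```javascript
// Limits this sketch asks for; the selection is an assumption.
const WANTED_LIMITS = ["maxStorageBufferBindingSize", "maxBufferSize"];

// Copy the adapter's supported values for the wanted limits so they can
// be passed as requiredLimits instead of accepting the defaults.
function buildRequiredLimits(adapterLimits, names = WANTED_LIMITS) {
  const required = {};
  for (const name of names) {
    const value = adapterLimits[name];
    if (typeof value === "number") required[name] = value;
  }
  return required;
}

// Request a device whose limits match what the adapter supports.
async function requestDeviceWithAdapterLimits(gpu = globalThis.navigator?.gpu) {
  if (!gpu) throw new Error("WebGPU is not available");
  const adapter = await gpu.requestAdapter();
  if (!adapter) throw new Error("no WebGPU adapter found");
  return adapter.requestDevice({
    requiredLimits: buildRequiredLimits(adapter.limits),
  });
}

// Guard to call before creating a storage binding, so an oversized
// buffer fails loudly in JS rather than silently on the GPU.
function assertBindingFits(limits, byteLength) {
  if (byteLength > limits.maxStorageBufferBindingSize) {
    throw new Error(
      `binding of ${byteLength} bytes exceeds maxStorageBufferBindingSize ` +
        `(${limits.maxStorageBufferBindingSize})`
    );
  }
}
```

A caller would typically run `assertBindingFits(device.limits, buffer.size)` just before building the bind group.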
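
The grid-convention lesson above (longitude 0–360, row 0 at the south edge) can be sketched as a single reorientation pass. The example assumes a regular, row-major grid whose first column sits at longitude 0 and whose column count is even; the function names are hypothetical.

```javascript
// Wrap a longitude from 0..360 into -180..180 (180 maps to -180).
function wrapLongitude(lon) {
  return ((((lon + 180) % 360) + 360) % 360) - 180;
}

// Reorder a row-major grid so that row 0 is the north edge and column 0
// is longitude -180. Assumes the source has row 0 at the south edge,
// column 0 at longitude 0, and an even number of columns.
function toMapOrientation(data, nx, ny) {
  if (data.length !== nx * ny) throw new Error("data length does not match nx * ny");
  if (nx % 2 !== 0) throw new Error("this sketch needs an even column count");
  const out = new Float32Array(data.length);
  const half = nx / 2;
  for (let row = 0; row < ny; row++) {
    const srcRow = ny - 1 - row; // vertical flip: south-first to north-first
    for (let col = 0; col < nx; col++) {
      const srcCol = (col + half) % nx; // roll by half the grid width
      out[row * nx + col] = data[srcRow * nx + srcCol];
    }
  }
  return out;
}
```

Doing the flip and the roll in one shared helper keeps every consumer on the same convention instead of each one re-deriving it.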