The real dataset
The lab’s real data is a METplus verification archive (metplus_data.tar.gz, ~62 GiB):
both the inputs (GRIB2 forecasts, URMA analyses, GDAS/prepBUFR observations) and the
outputs (.stat files, _pairs.nc matched-pair grids) of grid_stat + point_stat
runs.
| Dimension | Values |
|---|---|
| Models | AIGFS, GFS, CREDIT, plus a regional WRF (de-identified as WRF-REG) — 4 |
| Truth/obs | URMA (gridded), GDAS/ADPSFC/SFCSHP/ADPUPA (point) |
| Cycles | 24 (2026-06-01 00Z → 06-06 18Z, 6-hourly) |
| Leads | 21 (0–120 h) |
| Variables | 18 FCST_VAR@LEV (APCP_06@A06, TMP@Z2, wind @Z10, PRMSL, upper-air @P500, …) |
| Verification region | one mask (de-identified as REGION-01) |
| .stat line types | SL1L2 + CNT, CTC + CTS (mostly precip), VL1L2 + VCNT |
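The cycle and lead counts in the table can be reproduced with a few lines of arithmetic. This is a minimal sketch, not code from the lab's apps: the function names are hypothetical, and the 6-hour lead spacing is an assumption inferred from 21 leads spanning 0–120 h.

```javascript
// Enumerate init cycles between two UTC instants (inclusive), stepHours apart.
function enumerateCycles(startIso, endIso, stepHours = 6) {
  const stepMs = stepHours * 3600 * 1000;
  const end = Date.parse(endIso);
  const cycles = [];
  for (let t = Date.parse(startIso); t <= end; t += stepMs) {
    // Keep "YYYY-MM-DDTHH" and append Z, e.g. "2026-06-01T00Z".
    cycles.push(new Date(t).toISOString().slice(0, 13) + "Z");
  }
  return cycles;
}

// Enumerate lead hours from 0 to maxHours (inclusive).
// ASSUMPTION: leads are evenly spaced at stepHours.
function enumerateLeads(maxHours, stepHours = 6) {
  const leads = [];
  for (let h = 0; h <= maxHours; h += stepHours) leads.push(h);
  return leads;
}

const cycles = enumerateCycles("2026-06-01T00:00:00Z", "2026-06-06T18:00:00Z");
const leads = enumerateLeads(120);

console.log(cycles.length, leads.length); // 24 cycles, 21 leads
```

Six days at four cycles per day gives the 24 cycles, and 120 / 6 + 1 gives the 21 leads.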
Two structural facts drive app design:
- Deterministic-only. No ensemble (ECNT/ORANK/RHIST) or probabilistic (PCT/PSTD/PRC) lines exist — ensemble/reliability views stay synthetic, with explicit banners.
- Single region. The region facet is a no-op; apps repurpose that axis as MODEL.
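One way an app could derive both decisions from the parsed records rather than hard-coding them is sketched below. The record shape (`lineType`, `vxMask`) and the function name are hypothetical, not taken from lib/met-stat-parse.mjs; only the line-type names and MET's VX_MASK column name are real.

```javascript
const ENSEMBLE_TYPES = ["ECNT", "ORANK", "RHIST"];
const PROBABILISTIC_TYPES = ["PCT", "PSTD", "PRC"];

// Inspect parsed .stat records and decide which views can use real data
// and which axis the facet control should expose.
// ASSUMPTION: each record carries `lineType` and `vxMask` string fields.
function describeArchive(records) {
  const lineTypes = new Set(records.map((r) => r.lineType));
  const masks = new Set(records.map((r) => r.vxMask));
  const hasAny = (types) => types.some((t) => lineTypes.has(t));
  return {
    // "synthetic" views should render with an explicit banner.
    ensemble: hasAny(ENSEMBLE_TYPES) ? "real" : "synthetic",
    probabilistic: hasAny(PROBABILISTIC_TYPES) ? "real" : "synthetic",
    // With a single mask, a region facet has nothing to choose between.
    facetAxis: masks.size > 1 ? "VX_MASK" : "MODEL",
  };
}

const sample = [
  { lineType: "SL1L2", vxMask: "REGION-01" },
  { lineType: "CNT", vxMask: "REGION-01" },
  { lineType: "CTC", vxMask: "REGION-01" },
];
const archiveInfo = describeArchive(sample);
// archiveInfo: { ensemble: "synthetic", probabilistic: "synthetic", facetAxis: "MODEL" }
```

Deriving the flags from the data means a later archive that does contain ensemble lines would switch the views to real data without a code change.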
Validation
The full archive parses with lib/met-stat-parse.mjs: 6,329 files, 88,456 records,
0 errors. Independent-oracle cross-checks against MET’s own derived lines agree on
32,389 of 32,390 SL1L2→CNT cases and 7,038 of 7,038 VL1L2→VCNT cases. The one “miss” is
a documented 5-dp-rounding/catastrophic-cancellation artifact, not a parse or math error.
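To see how rounding plus cancellation can produce such a miss, the sketch below derives CNT-style statistics from SL1L2 partial sums. It is an illustration, not the oracle used above: the helper names and the matched pairs are made up, and the formulas are the standard identities ME = FBAR − OBAR and MSE = FFBAR − 2·FOBAR + OOBAR.

```javascript
// Round to 5 decimal places, mimicking fixed-precision text output (assumption).
const round5 = (x) => Math.round(x * 1e5) / 1e5;
const mean = (xs) => xs.reduce((a, b) => a + b, 0) / xs.length;

// Build SL1L2-style partial sums from matched forecast/observation pairs.
function sl1l2FromPairs(f, o) {
  return {
    fbar: mean(f),
    obar: mean(o),
    fobar: mean(f.map((x, i) => x * o[i])),
    ffbar: mean(f.map((x) => x * x)),
    oobar: mean(o.map((x) => x * x)),
  };
}

// Derive CNT-style statistics from the partial sums alone.
function cntFromSl1l2({ fbar, obar, fobar, ffbar, oobar }) {
  const me = fbar - obar;
  // Difference of large, nearly equal terms: sensitive to rounding of the inputs.
  const mse = ffbar - 2 * fobar + oobar;
  return { me, mse, rmse: Math.sqrt(Math.max(mse, 0)) };
}

// Made-up pairs: 2 m temperature in kelvin with millikelvin-scale errors.
const fcst = [280.001, 280.002];
const obs = [280.0, 280.0];

const exactSums = sl1l2FromPairs(fcst, obs);
const roundedSums = Object.fromEntries(
  Object.entries(exactSums).map(([k, v]) => [k, round5(v)]),
);

const fromExact = cntFromSl1l2(exactSums);     // RMSE near 0.00158
const fromRounded = cntFromSl1l2(roundedSums); // RMSE collapses toward 0
console.log(fromExact.rmse, fromRounded.rmse);
```

ME survives the rounding because it is a difference of two small-precision means, while MSE subtracts values near 78,400 to recover a result near 0.0000025, which is below the 5-dp resolution of the inputs. A tolerance-based comparison would flag that case even though both the parse and the arithmetic are correct.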
De-identification
Everything public is scrubbed: the regional model name and mask are replaced (WRF-REG,
REGION-01), coordinates and valid times are dropped from baked cases, and the raw data
lives only in a private R2 bucket behind a gated Worker. De-identification is enforced at
conversion time, not display time.
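Enforcing the scrub at conversion time means the baked output never contains the private values, so no display path can leak them. A minimal sketch of that idea follows; the raw names, field names, and function name are all placeholders invented for illustration, not the lab's actual converter.

```javascript
// HYPOTHETICAL raw identifiers; the real private names are not in this document.
const MODEL_ALIASES = new Map([["PRIVATE-MODEL-NAME", "WRF-REG"]]);
const MASK_ALIASES = new Map([["PRIVATE-MASK-NAME", "REGION-01"]]);
const PUBLIC_MODELS = new Set(["AIGFS", "GFS", "CREDIT"]);
// HYPOTHETICAL field names for coordinates and valid time.
const DROPPED_FIELDS = ["lat", "lon", "validTime"];

// Return a scrubbed copy of a record, or throw if a value is neither
// public nor mapped. Failing closed keeps unknown names out of baked files.
function deidentify(record) {
  const out = { ...record };
  for (const field of DROPPED_FIELDS) delete out[field];

  if (MODEL_ALIASES.has(out.model)) {
    out.model = MODEL_ALIASES.get(out.model);
  } else if (!PUBLIC_MODELS.has(out.model)) {
    throw new Error("unmapped model name; refusing to bake record");
  }

  if (!MASK_ALIASES.has(out.vxMask)) {
    throw new Error("unmapped mask name; refusing to bake record");
  }
  out.vxMask = MASK_ALIASES.get(out.vxMask);
  return out;
}

const baked = deidentify({
  model: "PRIVATE-MODEL-NAME",
  vxMask: "PRIVATE-MASK-NAME",
  lat: 1.0,
  lon: 2.0,
  validTime: "placeholder",
  rmse: 1.25,
});
// baked: { model: "WRF-REG", vxMask: "REGION-01", rmse: 1.25 }
```

The error messages deliberately omit the offending value so that build logs do not become a leak path either.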
Archive-handling lessons
- Validate the tarball with gzip -t (whole-stream CRC), not tar -tzf (lenient).
- The tar layout puts 40 GB of output/ before input/, and gzip isn’t seekable, so extracting a single late member streams ~95 GB. Extract once, keep the extraction.
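The whole-stream check can also be done from Node when a pipeline needs it programmatically. The sketch below is an assumption-laden equivalent of gzip -t, not a replacement for it: `gzipIsIntact` is a hypothetical name, and the check relies on Node's zlib raising an error when the stream is truncated or its trailer CRC does not match.

```javascript
import { Writable } from "node:stream";
import { pipeline } from "node:stream/promises";
import { createGunzip } from "node:zlib";

// Decompress a gzip byte source into a discard sink.
// Resolves true only if the stream inflates cleanly to its end;
// truncation or a trailer mismatch surfaces as a zlib error.
async function gzipIsIntact(source) {
  const sink = new Writable({
    write(_chunk, _encoding, done) {
      done(); // discard decompressed bytes; only the integrity result matters
    },
  });
  try {
    await pipeline(source, createGunzip(), sink);
    return true;
  } catch {
    return false;
  }
}

// Usage with a file on disk (createReadStream comes from node:fs):
//   const ok = await gzipIsIntact(createReadStream("metplus_data.tar.gz"));
```

Like gzip -t, this has to read the entire archive once, since gzip offers no index to jump to. It says nothing about whether the tar structure inside is sound, only that the compressed stream is complete and uncorrupted.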
