# PLAN — Theme 02 · Modern Interaction with Stat Data

**Experiment:** `experiment/stat-interaction`
**Idea grounded in:** *ideas.html* → Theme 02, cards "Direct-manipulation control of stat views" and
"Cross-filtered linked views".

---

## Goal

A **direct-manipulation explorer** for MET-style verification statistics. The user fluidly steers
what they see — pick metric(s), filter by model / region / lead time / threshold, and group/facet —
and **every view recomputes instantly, client-side, fully offline**. The centerpiece is
**cross-filtered linked views**: brushing or selecting in one panel filters all the others.

The control vocabulary is the product. We make four primitives explicit and discoverable:

| Primitive | Meaning in this tool | Example |
|-----------|----------------------|---------|
| **Metric** | the *y* value being plotted (a MET stat column) | `GSS`, `CSI`, `FAR`, `ME`, `RMSE` |
| **Dimension** | a categorical axis of the data you can split/color/facet on | `model`, `region`, `threshold` |
| **Filter** | a constraint that hides rows everywhere (from controls *or* a brush) | `region ∈ {CONUS, WEST}` |
| **Facet** | a dimension broken out into small multiples | facet by `region` |

Success = a *tight interaction loop*: any change (click a legend chip, drag a brush on the line
plot, brush rows in the table, toggle a filter) re-filters and re-renders all panels within one
60 Hz frame (about 16 ms) on the sample dataset, with no flicker and no network access.
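
One way to read the four primitives is as fields of a single selection object. The sketch below is illustrative only: the field names (`metrics`, `groupBy`, `facetBy`, `filters`) and the `matches` helper are assumptions, not the final store shape.

```javascript
// Hypothetical selection shape: one field per primitive.
const selection = {
  metrics: ["GSS"], // Metric: the y value(s) to plot
  groupBy: "model", // Dimension used for color/grouping
  facetBy: "region", // Facet: dimension broken out into small multiples
  filters: { region: new Set(["CONUS", "WEST"]) }, // Filter: allowed values per dimension
};

// A row passes when every active filter accepts its value for that dimension.
function matches(row, filters) {
  return Object.entries(filters).every(([dim, allowed]) => allowed.has(row[dim]));
}

// Toy rows, made up for illustration.
const rows = [
  { model: "GFS", region: "CONUS", lead: 24 },
  { model: "GFS", region: "EAST", lead: 24 },
  { model: "HRRR", region: "WEST", lead: 48 },
];
const shown = rows.filter((r) => matches(r, selection.filters));
console.log(`${shown.length} of ${rows.length} rows shown`); // 2 of 3 rows shown
```

Keeping filters as per-dimension sets means a control toggle and a brush can write to the same structure, which is what lets a brush "hide rows everywhere".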

---

## MVP scope (in)

1. **Synthetic MET-style dataset**, generated by a committed, dependency-free Node script and
   saved as a static data file. The page reads it from an inline `<script>` global first, so
   `file://` works even if `fetch` is blocked, and uses a relative `fetch` of the JSON only as a
   fallback (see Sample-data plan).
2. **Control bar** exposing the vocabulary explicitly:
   - **Metric picker** (multi-select chips) — switches the *y* axis / adds compared metrics.
   - **Color/Group-by dimension** selector (`model` default).
   - **Facet-by dimension** selector (`none` / `region` / `threshold` / `model`).
   - **Filter pills** for `model`, `region`, `lead`, `threshold` — toggleable; reflect brush state.
   - **Reset** + a live "**N of M matched pairs / rows shown**" readout.
3. **View A — Main line plot:** metric vs. lead time, one line per group (color = group-by dim),
   SVG, with hover tooltip and an **X-brush** (drag to constrain lead-time range → filters all views).
4. **View B — Brushable summary table:** one row per (group × facet) aggregate with the selected
   metric(s); sortable; **row selection brushes** the dataset (selected rows become the active
   filter). Includes a tiny inline sparkline per row.
5. **View C — Faceted small multiples:** the main plot repeated per facet value, sharing scales,
   so regional/threshold structure is visible at a glance.
6. **Linked brushing engine:** a single in-memory store holds raw rows + the current selection;
   all views subscribe and re-render from derived, memoized aggregates. Brushing in *any* view
   updates the shared filter and refreshes the others.
7. **Confidence band toggle** (nod to MET reality): the synthetic data carries
   `*_BCL`/`*_BCU` bootstrap bounds; the line plot can shade them. Off by default to keep dense
   plots readable.
8. **Heavy-metal dark UI** featuring `assets/met-al_logo.png`: chrome/steel-on-near-black, brushed
   metal gradients, restrained accent glow — scientific content stays high-contrast and uncluttered.
9. **Keyboard + a11y basics:** focusable controls, `Esc` clears brush, color-vision-safe categorical
   palette, ARIA live readout for the match count.
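
The X-brush in View A has to turn a pixel drag into a lead-time filter. A minimal sketch, assuming a linear x scale and a lead grid that starts at a multiple of the step; `brushToLeadRange` and the `scale` fields are hypothetical names:

```javascript
// Hypothetical helper: convert an X-brush pixel extent into a [lo, hi] lead range.
// Assumes a linear scale mapping [leadMin, leadMax] hours onto [x0, x1] pixels.
function brushToLeadRange(pxA, pxB, { x0, x1, leadMin, leadMax, step }) {
  const toLead = (px) => leadMin + ((px - x0) / (x1 - x0)) * (leadMax - leadMin);
  const clamp = (v) => Math.min(leadMax, Math.max(leadMin, v));
  const lo = clamp(toLead(Math.min(pxA, pxB))); // drag direction does not matter
  const hi = clamp(toLead(Math.max(pxA, pxB)));
  // Snap inward to the lead grid so only fully covered lead times are selected.
  return [Math.ceil(lo / step) * step, Math.floor(hi / step) * step];
}

const scale = { x0: 40, x1: 640, leadMin: 0, leadMax: 120, step: 6 };
const [lo, hi] = brushToLeadRange(500, 135, scale); // dragged right-to-left
console.log(lo, hi); // 24 90
```

If the drag is narrower than one step, `lo` can exceed `hi`; a caller would treat that as an empty brush and clear the filter.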

## Out of scope (explicitly, for the MVP)

- Real `.stat` ASCII file parsing/upload (data is synthetic JSON; parser noted as open question).
- Server, build step, npm/CDN — none. Pure `file://`.
- WebGL/WebGPU rendering (dataset is small; SVG is plenty and degrades best). Canvas reserved as a
  later option if cardinality grows.
- Map/spatial views, object-based (MODE) views, ensemble diagnostics — those are Theme 03.
- Persisting/sharing analysis state via URL (a stretch goal, listed below).

---

## File layout (all within this worktree)

```
experiment-stat-interaction/
├── index.html              # shell: logo header, control bar, 3 view regions; loads main.js as a module
├── PLAN.md                 # this file
├── README.md               # (existing, lab-level)
├── assets/
│   └── met-al_logo.png     # (exists) featured in the header
├── styles/
│   └── app.css             # heavy-metal dark theme, layout grid, control + view styling
├── data/
│   ├── gen_data.mjs        # generator for the dataset (run once; output committed)
│   ├── stat_sample.js      # committed synthetic dataset; assigns window.__STAT_DATA__
│   └── DATA_NOTES.md       # documents the schema + that the data is SYNTHETIC
└── src/
    ├── main.js             # bootstrap: load data (inline global, fetch fallback), build store, mount views
    ├── store.js            # state store: raw rows, filters/brush, derived+memoized aggregates, pub/sub
    ├── model.js            # domain helpers: metric registry, dimensions, aggregation, formatting
    ├── controls.js         # control bar: metric/dimension/facet pickers, filter pills, readout
    ├── view_line.js        # View A: metric-vs-lead SVG line plot + X-brush + CI bands
    ├── view_table.js       # View B: brushable, sortable summary table with sparklines
    ├── view_facets.js      # View C: faceted small multiples (reuses line-plot drawing)
    └── svg.js              # tiny SVG/scale/axis helpers (no deps)
```

> ES modules loaded by **relative path** via `<script type="module" src="src/main.js">`.

---

## Sample-data plan (synthetic, committed)

**Generator:** `data/gen_data.mjs`, a deterministic (seeded PRNG) script that emits the sample
dataset. It is run once by the implementer and the **output is committed** (as
`data/stat_sample.js`; see Size target below), so the app never needs to run the generator. The
data is documented as **SYNTHETIC** in `data/DATA_NOTES.md` and in the UI footer.
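
To make "deterministic (seeded PRNG)" concrete, here is a minimal sketch using mulberry32, a small public-domain 32-bit generator, plus a Box-Muller transform for Gaussian noise. The choice of mulberry32 and the seed value are assumptions; the plan does not say which PRNG `gen_data.mjs` uses.

```javascript
// mulberry32: a small seeded PRNG returning floats in [0, 1).
function mulberry32(seed) {
  let a = seed >>> 0;
  return function rand() {
    a = (a + 0x6d2b79f5) | 0;
    let t = Math.imul(a ^ (a >>> 15), 1 | a);
    t = (t + Math.imul(t ^ (t >>> 7), 61 | t)) ^ t;
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296;
  };
}

// Box-Muller transform: turns two uniform draws into one standard-normal draw.
function gaussian(rand) {
  const u = 1 - rand(); // shift to (0, 1] so Math.log never sees 0
  const v = rand();
  return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
}

// Same seed, same sequence: regenerating the file yields identical noise.
const genA = mulberry32(42);
const genB = mulberry32(42);
const seqA = [gaussian(genA), gaussian(genA), gaussian(genA)];
const seqB = [gaussian(genB), gaussian(genB), gaussian(genB)];
console.log(seqA.every((v, i) => v === seqB[i])); // true
```

Avoiding `Math.random()` is the point: it cannot be seeded, so a regenerated dataset would differ on every run and produce noisy diffs.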

**Shape of the data (MET-faithful where it matters):**

- **Models / forecast sources:** 4 — `GFS`, `ECMWF`, `HRRR`, `NAM` (named like real systems but
  values are fabricated).
- **Lead times (`FCST_LEAD`):** `0, 6, 12, … 120` h (a realistic forecast-hour sweep).
- **Regions / masks (`VX_MASK`):** `CONUS`, `EAST`, `WEST`, `GREAT_PLAINS`, `GULF`.
- **Thresholds (`FCST_THRESH`):** categorical precip-style `>=1.0mm`, `>=5.0mm`, `>=10.0mm`,
  `>=25.0mm` (used by categorical metrics).
- **Line types & metrics** (the verification content):
  - **CTS** (categorical): `CSI`, `GSS`, `FAR`, `POD`, `FBIAS` — vary with threshold.
  - **CNT** (continuous): `ME` (bias), `RMSE`, `MAE`, `PR_CORR` — independent of threshold.
  Each metric value carries bootstrap bounds `*_BCL` / `*_BCU` for the CI toggle.
- **Realism baked into the synthetic signal:**
  - Skill **decays with lead time** (GSS/CSI down, RMSE up) with mild per-model offsets so models
    are distinguishable and rank-cross at some leads (makes brushing interesting).
  - Higher thresholds → rarer events → lower categorical skill + noisier CIs.
  - Regional spread (e.g. WEST harder for precip) so faceting reveals structure.
  - Small Gaussian noise per case so lines aren't sterile; CIs scale with noise/sample size.
- **Row schema** (one row per fully-crossed **case**, carrying the **raw aggregables**
  MET sums before deriving a metric — this is what enables correct ratio-of-sums
  aggregation):
  ```json
  // CTS case — 2x2 contingency counts c = [fy_oy, fy_on, fn_oy, fn_on]
  { "model": "HRRR", "line_type": "CTS", "region": "WEST", "lead": 24,
    "thresh": ">=5.0", "n": 1840, "c": [612, 140, 188, 900],
    "ci": { "GSS": [0.281, 0.343], "POD": [...], "CSI": [...], "FAR": [...], "FBIAS": [...] } }

  // CNT case — SL1L2 partial sums s = [FBAR, OBAR, FFBAR, OOBAR, FOBAR] (+ sae for MAE)
  { "model": "HRRR", "line_type": "CNT", "region": "WEST", "lead": 24, "thresh": "NA",
    "n": 5100, "s": [1.61, 1.55, 5.08, 5.05, 4.35], "sae": 3700.4,
    "ci": { "RMSE": [...], "ME": [...], "MAE": [...], "PR_CORR": [...] } }
  ```
  The per-case derived metric value is **re-derived from the raw aggregables** (not
  stored), so the five CTS / four CNT metrics stay mutually consistent and aggregation is
  correct-by-construction. See `data/DATA_NOTES.md`.
- **Size target:** 2,100 case rows: 4 models × 21 leads × 25 combinations per model and lead
  (CTS: 5 regions × 4 thresholds = 20; CNT: 5 regions = 5). That is small enough to load
  instantly and recompute live, yet large enough to feel real. Committed as a **single ~460 KB
  file** (`data/stat_sample.js`); the JSON mirror is git-ignored so the dataset is not committed
  twice.
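
Re-deriving the metrics from the raw aggregables in the row schema above could look like the sketch below. The formulas are the standard definitions of these scores; the function names are hypothetical, and reading `sae` as the sum of absolute errors over the case's `n` pairs is an assumption. Zero denominators are not handled.

```javascript
// Derive the five CTS metrics from 2x2 counts c = [fy_oy, fy_on, fn_oy, fn_on].
function deriveCts([fyOy, fyOn, fnOy, fnOn]) {
  const n = fyOy + fyOn + fnOy + fnOn;
  const hitsRandom = ((fyOy + fyOn) * (fyOy + fnOy)) / n; // hits expected by chance
  return {
    POD: fyOy / (fyOy + fnOy),
    FAR: fyOn / (fyOy + fyOn),
    CSI: fyOy / (fyOy + fyOn + fnOy),
    FBIAS: (fyOy + fyOn) / (fyOy + fnOy),
    GSS: (fyOy - hitsRandom) / (fyOy + fyOn + fnOy - hitsRandom),
  };
}

// Derive the four CNT metrics from s = [FBAR, OBAR, FFBAR, OOBAR, FOBAR].
// Assumption: `sae` is the sum of absolute errors, so MAE = sae / n.
function deriveCnt([fbar, obar, ffbar, oobar, fobar], sae, n) {
  const varF = ffbar - fbar * fbar;
  const varO = oobar - obar * obar;
  return {
    ME: fbar - obar,
    RMSE: Math.sqrt(ffbar - 2 * fobar + oobar),
    MAE: sae / n,
    PR_CORR: (fobar - fbar * obar) / Math.sqrt(varF * varO),
  };
}

// Inputs taken from the two example rows in the schema.
const cts = deriveCts([612, 140, 188, 900]);
const cnt = deriveCnt([1.61, 1.55, 5.08, 5.05, 4.35], 3700.4, 5100);
console.log(cts.POD); // 0.765
console.log(cnt.ME.toFixed(2)); // 0.06
```

Because every metric comes from the same counts or sums, relationships such as `FBIAS = POD / (1 - FAR)` hold automatically, which is the consistency the schema is designed to guarantee.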

**Offline robustness:** `index.html` loads `data/stat_sample.js` (a classic `<script>` that assigns
`window.__STAT_DATA__`) — the **single committed data file**. `main.js` prefers that inline global
(works on `file://` double-click *and* `http://` with zero `fetch`); it only `fetch`es
`data/stat_sample.json` as a fallback for a maintainer who has regenerated the (git-ignored) JSON
mirror. This guarantees double-click-to-open works everywhere with zero server, with no duplicate
committed mirror.
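
The load order described above can be sketched as follows. `loadData` and its `env` parameter are hypothetical (the parameter exists only so the example can run with stubs); `window.__STAT_DATA__` and the file path come from the plan.

```javascript
// Hypothetical loader for main.js: inline global first, network only as a fallback.
// `env` defaults to the real global object; the usage below passes stubs instead.
async function loadData(env = globalThis) {
  if (env.__STAT_DATA__ != null) return env.__STAT_DATA__; // set by data/stat_sample.js
  const res = await env.fetch("data/stat_sample.json"); // git-ignored mirror, may be absent
  if (!res.ok) throw new Error(`Could not load data/stat_sample.json (HTTP ${res.status})`);
  return res.json();
}

// Stubs standing in for the browser environment.
const inlineEnv = { __STAT_DATA__: [{ model: "GFS" }] };
const fetchEnv = {
  fetch: async () => ({ ok: true, status: 200, json: async () => [{ model: "NAM" }] }),
};
loadData(inlineEnv).then((rows) => console.log(rows[0].model)); // GFS
loadData(fetchEnv).then((rows) => console.log(rows[0].model)); // NAM
```

On the inline path `fetch` is never touched, so the page does not depend on it being available under `file://`.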

---

## How to run

1. Open `index.html` in any modern browser via `file://` (double-click, or
   `open index.html` on macOS). **No build, no server, no install, fully offline.**
2. (Maintainer only, optional) regenerate data: `node data/gen_data.mjs`, then commit the
   regenerated `data/stat_sample.js`. Not required to run the app.

---

## Interaction model (the "tight loop")

```
controls / brushes  ──▶  store.setFilter()/setBrush()
                              │  (single source of truth: rows + selection)
                              ▼
                    derive(): filter → group → aggregate (memoized by selection hash)
                              │
              ┌───────────────┼───────────────┐
              ▼               ▼               ▼
         view_line       view_table      view_facets
       (X-brush out)   (row-select out)  (reads only)
```

- One immutable raw array; selection lives in the store, never in the views.
- Each mutation publishes once; views diff against last render to avoid full rebuilds.
- Aggregation memoized on a hash of `{filters, groupBy, facetBy, metrics}` so repeated states are free.
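
A minimal sketch of that loop, assuming a hypothetical `createStore` factory and a `setFilter(dim, values)` signature (the plan names `setFilter` but not its arguments). The derive step here only filters and groups; metric aggregation is left out to keep the example short.

```javascript
function createStore(rows) {
  let selection = { filters: {}, groupBy: "model", facetBy: null, metrics: ["GSS"] };
  const subscribers = new Set();
  const cache = new Map();

  // Canonical key: sorted filter dimensions and values, so equal selections collide.
  const keyOf = ({ filters, groupBy, facetBy, metrics }) =>
    JSON.stringify([
      Object.keys(filters).sort().map((d) => [d, [...filters[d]].sort()]),
      groupBy,
      facetBy,
      [...metrics].sort(),
    ]);

  // filter -> group, memoized per selection key.
  function derive() {
    const key = keyOf(selection);
    if (!cache.has(key)) {
      const kept = rows.filter((r) =>
        Object.entries(selection.filters).every(([d, allowed]) => allowed.includes(r[d])));
      const groups = new Map();
      for (const r of kept) {
        const g = r[selection.groupBy];
        groups.set(g, [...(groups.get(g) ?? []), r]);
      }
      cache.set(key, { kept, groups });
    }
    return cache.get(key);
  }

  // Replace one dimension's allowed values (null clears it), then publish once.
  function setFilter(dim, values) {
    const filters = { ...selection.filters };
    if (values == null) delete filters[dim];
    else filters[dim] = values;
    selection = { ...selection, filters };
    const derived = derive();
    subscribers.forEach((fn) => fn(derived));
  }

  function subscribe(fn) {
    subscribers.add(fn);
    return () => subscribers.delete(fn); // call to unsubscribe
  }

  return { derive, setFilter, subscribe };
}

const store = createStore([
  { model: "GFS", region: "WEST", lead: 24 },
  { model: "HRRR", region: "WEST", lead: 24 },
  { model: "GFS", region: "EAST", lead: 24 },
]);
store.subscribe(({ kept }) => console.log(`${kept.length} rows shown`));
store.setFilter("region", ["WEST"]); // logs "2 rows shown"
store.setFilter("region", null); // logs "3 rows shown"
```

The cache in this sketch grows without bound; for the small sample dataset that is likely acceptable, but a real store might cap it or keep only the most recent entries.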

---

## MET fidelity / open questions

**What we mirror from real MET output**
- Dimensions are the genuine universal axes of MET `.stat` output: `MODEL`, `FCST_LEAD`, `VX_MASK`,
  `FCST_THRESH`, `FCST_VAR`, line type. (Answers the ideas.html open question "which dimensions are
  the universal axes of MET output?")
- Metric names/semantics match MET line types: **CTS** (`CSI`,`GSS`,`FAR`,`POD`,`FBIAS`) and **CNT**
  (`ME`,`RMSE`,`MAE`,`PR_CORR`), plus the `_BCL`/`_BCU` bootstrap CI convention.
- Categorical metrics depend on threshold; continuous metrics don't — and the UI reflects that
  (threshold filter greys out / is ignored for CNT metrics).

**Deliberate simplifications (flagged honestly)**
- Values are **fabricated**, not computed from matched pairs; this is an interaction prototype, not a
  verification engine (that's Theme 01).
- We use a per-case tidy form instead of MET's wide fixed-column `.stat` layout — easier
  for generic grouping. Mapping to/from true `.stat` columns is deferred. (But the carried
  CTC counts / SL1L2 sums **are** the genuine MET aggregables.)
- **Aggregation across cases is ratio-of-sums (fixed).** Each case carries the raw
  aggregables (2×2 CTC counts for CTS; SL1L2 partial sums plus `sae` for CNT). `src/model.js`
  **sums** them across the grouped cases (counts add directly; SL1L2 means are re-weighted by
  `n`) and **then derives** the metric, as MET does. The earlier mean of per-case derived
  statistics (mean-of-ratios) has been removed. Verified numerically: aggregated
  `POD == Σfy_oy / Σ(fy_oy+fn_oy)` to machine precision, and the result differs from the old
  mean-of-derived value (e.g. ECMWF GSS: 0.420 ratio-of-sums vs 0.371 mean-of-derived).
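
The difference between the two aggregation orders is easy to see on toy numbers. The sketch below uses made-up cases (not values from the dataset) and hypothetical helper names; it assumes `sae` is a plain sum, so it adds directly.

```javascript
// CTS: counts add directly across cases.
function sumCounts(cases) {
  return cases.reduce((acc, { c }) => acc.map((v, i) => v + c[i]), [0, 0, 0, 0]);
}

// CNT: SL1L2 entries are means, so weight each case by n before recombining.
// Assumption: `sae` is a sum of absolute errors and therefore adds directly.
function combineSl1l2(cases) {
  const n = cases.reduce((t, k) => t + k.n, 0);
  const s = cases[0].s.map((_, i) => cases.reduce((t, k) => t + k.n * k.s[i], 0) / n);
  const sae = cases.reduce((t, k) => t + k.sae, 0);
  return { n, s, sae };
}

const pod = ([fyOy, , fnOy]) => fyOy / (fyOy + fnOy);

// Toy cases: one event-rich (POD 0.9), one event-poor (POD 0.1).
const ctsCases = [{ c: [90, 10, 10, 890] }, { c: [1, 5, 9, 985] }];
const ratioOfSums = pod(sumCounts(ctsCases)); // 91 / 110
const meanOfRatios = ctsCases.reduce((t, k) => t + pod(k.c), 0) / ctsCases.length;
console.log(ratioOfSums.toFixed(3), meanOfRatios.toFixed(3)); // 0.827 0.500

const cntCases = [
  { n: 100, s: [1, 1, 2, 2, 1.5], sae: 10 },
  { n: 300, s: [2, 1, 5, 2, 2.5], sae: 30 },
];
console.log(combineSl1l2(cntCases).s[0]); // 1.75
```

Mean-of-ratios gives the event-poor case the same weight as the event-rich one, which is why the two numbers diverge; summing first weights each case by how many events it actually contains.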

**Open questions to resolve in REVIEW**
1. **Aggregation correctness:** ✅ **RESOLVED — yes.** The table/line/facets now aggregate the raw
   partial-sum / contingency-count columns (SL1L2, CTC) and *then* derive the metric (ratio-of-sums),
   not the mean of per-case derived metrics. CI bands on aggregated cells use an n-weighted mean of
   the stored per-case bounds (true bootstrap recombination on brushed subsets is the remaining
   open sub-question — see #4).
2. **Real `.stat` ingestion:** parse MET fixed-width/space-delimited `.stat`, or expect METcalcpy/
   pandas-style tidy CSV? Which is the better integration seam (ties to Theme 04 interoperability)?
3. **Scale:** SVG is fine for thousands of rows; where's the cliff, and is Canvas/WebGL needed for a
   full campaign (hundreds of thousands of rows)? (Theme 02 GPU-rendering card.)
4. **CI semantics in brushed subsets:** when a brush selects a sub-range, are displayed CIs still
   meaningful, or must they be recomputed? For the prototype we use only the stored per-case CIs
   (n-weighted when cells are aggregated); nothing is re-bootstrapped.
5. **Control vocabulary validation:** does metric / dimension / filter / facet actually cover MET's
   breadth without overwhelming users? The MVP is the test of the ideas.html open question.
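
The n-weighted mean of stored per-case bounds mentioned in item 1 can be sketched as below. `weightedCi` is a hypothetical name and the inputs are toy values; as the open questions note, this is a display approximation and not a bootstrap recombination.

```javascript
// Combine per-case [lower, upper] bounds for one metric, weighting each case by n.
function weightedCi(cases, metric) {
  let total = 0;
  let lower = 0;
  let upper = 0;
  for (const { n, ci } of cases) {
    const [lo, hi] = ci[metric];
    total += n;
    lower += n * lo;
    upper += n * hi;
  }
  return total > 0 ? [lower / total, upper / total] : null; // null when nothing is selected
}

// Toy cases, made up for illustration.
const band = weightedCi(
  [
    { n: 100, ci: { GSS: [0.2, 0.4] } },
    { n: 300, ci: { GSS: [0.4, 0.6] } },
  ],
  "GSS",
);
console.log(band.map((v) => v.toFixed(2)).join(", ")); // 0.35, 0.55
```

Note that the averaged band keeps roughly the per-case width, whereas a true CI on the pooled sample would usually be narrower; that gap is the substance of open question 4.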

**Stretch goals (post-MVP, not committed)**
- Serialize analysis state to the URL hash for shareable selections (ties to Theme 01 shareability).
- Difference mode (model A − model B) on the line plot.
- Drag-to-reorder facets; threshold as an animated scrub axis (Theme 03 overlap).