11 · WebGPU FSS
Round 3 · GPU compute · data: (synthetic grids only for the throughput sweep) · reimagines MET core (the FSS operator)
The Fractions Skill Score is MET’s one common operator with no closed form over partial sums — it needs the actual neighborhood fractions over the grid. That makes it the honest test case for GPU compute. This app implements it four ways in WGSL and races them:
| Kernel | Complexity | Character |
|---|---|---|
| Naive per-cell window | O(cells · r²) | Wins by parallelism at small r, blows up with radius |
| Separable 2-pass sliding window | O(cells) | Flat in r, but has a serial per-line dependency |
| Prefix-scan integral image (SAT) | O(cells) | Flat ~5 ms, fastest at large r |
| Multi-block prefix scan | O(cells) | Lifts the 2048-cell line cap with a 3-phase block scan; lines up to 524,288 cells |
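The integral-image row can be sketched on the CPU as follows. This sketch assumes a few things the source does not state:

- Both fields are already thresholded to 0/1 and stored row-major.
- The window is square with half-width r.
- Windows are clipped at the grid edge, so each fraction is hits divided by in-grid cells. The source does not say how edges are handled.

The names buildSAT, fractionsSAT, and fss are hypothetical. Each neighborhood sum costs four table lookups, which is why the cost stays flat in r. fss uses the standard definition FSS = 1 − Σ(Pf−Po)² / (ΣPf² + ΣPo²).

```typescript
// Hypothetical CPU reference for the integral-image (SAT) FSS kernel.
// Assumptions: binary 0/1 fields, row-major w×h, square window of half-width r,
// windows clipped at the grid edge (fraction = hits / in-grid cells).

/** Summed-area table with a zero border: sat[(y+1)*(w+1)+(x+1)] = sum of f over cols 0..x, rows 0..y. */
function buildSAT(f: Uint8Array, w: number, h: number): Float64Array {
  const W = w + 1;
  const sat = new Float64Array(W * (h + 1));
  for (let y = 0; y < h; y++) {
    let rowSum = 0; // running sum of row y up to column x
    for (let x = 0; x < w; x++) {
      rowSum += f[y * w + x];
      sat[(y + 1) * W + (x + 1)] = sat[y * W + (x + 1)] + rowSum;
    }
  }
  return sat;
}

/** Neighborhood fractions in O(cells): four SAT lookups per cell, independent of r. */
function fractionsSAT(f: Uint8Array, w: number, h: number, r: number): Float64Array {
  const sat = buildSAT(f, w, h);
  const W = w + 1;
  const out = new Float64Array(w * h);
  for (let y = 0; y < h; y++) {
    const y0 = Math.max(0, y - r), y1 = Math.min(h - 1, y + r);
    for (let x = 0; x < w; x++) {
      const x0 = Math.max(0, x - r), x1 = Math.min(w - 1, x + r);
      // Inclusion-exclusion over the four corners of the clipped window.
      const hits =
        sat[(y1 + 1) * W + (x1 + 1)] - sat[y0 * W + (x1 + 1)] -
        sat[(y1 + 1) * W + x0] + sat[y0 * W + x0];
      out[y * w + x] = hits / ((x1 - x0 + 1) * (y1 - y0 + 1));
    }
  }
  return out;
}

/** FSS = 1 - MSE(Pf, Po) / MSE_ref; the 1/N factors cancel, so plain sums suffice. */
function fss(pf: Float64Array, po: Float64Array): number {
  let num = 0, ref = 0;
  for (let i = 0; i < pf.length; i++) {
    const d = pf[i] - po[i];
    num += d * d;
    ref += pf[i] * pf[i] + po[i] * po[i];
  }
  return ref === 0 ? NaN : 1 - num / ref; // undefined when both fields are empty
}
```

For example, a 3×3 field with only the center set gives a corner fraction of 1/4 and a center fraction of 1/9 at r = 1. Two identical fields score exactly 1.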
The app also renders GPU-to-screen. The compute pass writes turbo-colormapped Pf, Po, and |Pf−Po| into a storage texture, and the render pass samples that texture. The on-screen field is therefore colorized from the same integral image the score uses, with no readback (~0.2 ms).
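The three phases of the multi-block prefix scan can be simulated on the CPU as shown below. This is a sketch, not the app's WGSL. The name blockedInclusiveScan and the blockSize parameter are hypothetical. On the GPU, each phase would presumably be its own dispatch, with one workgroup per block.

As an illustration, suppose a block covers 2048 cells and phase 2 scans up to 256 block totals in one workgroup. The longest line would then be 2048 × 256 = 524,288 cells, which matches the table's limit. The source does not confirm that this is how the limit arises.

```typescript
// Hypothetical CPU simulation of a three-phase multi-block inclusive prefix scan.
// On a GPU each block would map to one workgroup; plain loops stand in for that here.
function blockedInclusiveScan(input: Float64Array, blockSize: number): Float64Array {
  const n = input.length;
  const numBlocks = Math.ceil(n / blockSize);
  const out = new Float64Array(n);
  const blockTotals = new Float64Array(numBlocks);

  // Phase 1: scan each block independently and record its total.
  for (let b = 0; b < numBlocks; b++) {
    let acc = 0;
    const end = Math.min(n, (b + 1) * blockSize);
    for (let i = b * blockSize; i < end; i++) {
      acc += input[i];
      out[i] = acc;
    }
    blockTotals[b] = acc;
  }

  // Phase 2: exclusive scan of the block totals (small enough for a single workgroup).
  const blockOffsets = new Float64Array(numBlocks);
  for (let b = 1; b < numBlocks; b++) {
    blockOffsets[b] = blockOffsets[b - 1] + blockTotals[b - 1];
  }

  // Phase 3: add each block's offset to every element of that block (block 0 has offset 0).
  for (let b = 1; b < numBlocks; b++) {
    const end = Math.min(n, (b + 1) * blockSize);
    for (let i = b * blockSize; i < end; i++) out[i] += blockOffsets[b];
  }
  return out;
}
```

With blockSize = 3 and input 1..10, phase 1 produces per-block sums, and phase 3 stitches them into the full prefix sums 1, 3, 6, …, 55. The result matches a serial scan.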
Verification & benchmarks
- Layered parity: a Node self-test shows that the naive, separable, and integral-image CPU implementations agree (148/148 checks, max Δ exactly 0, including grids with lines longer than 2048 cells). In the browser, the GPU matches the CPU on real Apple Metal hardware (n exact, FSS Δ ~1e-8 to 1e-9).
- ~7.3× faster than CPU at 2048² (4.2 M cells); the r-sweep at 1024² shows naive climbing O(r²) (5.4 ms at r=1 → 30.7 ms at r=16) while the O(cells) kernels hold flat.
- §6 mirrors the radius slider with a live FSS readout so the score, the map, and the GPU render stay in lock-step.
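The CPU side of the parity check can be sketched as below. It compares a naive window sum against the separable two-pass sliding window. The sketch assumes binary 0/1 fields, row-major layout, half-width r, and windows clipped at the edges. The names countsNaive, slideLine, countsSeparable, and maxAbsDiff are hypothetical. The sketch works on integer hit counts, so an exact comparison is meaningful.

```typescript
// Hypothetical CPU references for the naive and separable neighborhood counts.
// Assumptions: binary 0/1 field, row-major w×h, half-width r, windows clipped at edges.

// Naive: re-sum the whole (2r+1)² window for every cell, so cost grows as O(cells · r²).
function countsNaive(f: Uint8Array, w: number, h: number, r: number): Int32Array {
  const out = new Int32Array(w * h);
  for (let y = 0; y < h; y++) {
    for (let x = 0; x < w; x++) {
      let s = 0;
      for (let yy = Math.max(0, y - r); yy <= Math.min(h - 1, y + r); yy++) {
        for (let xx = Math.max(0, x - r); xx <= Math.min(w - 1, x + r); xx++) {
          s += f[yy * w + xx];
        }
      }
      out[y * w + x] = s;
    }
  }
  return out;
}

// Slide a (2r+1)-wide window along one line: add the entering cell, drop the leaving one.
// O(n) per line regardless of r, but each step depends on the previous one (serial chain).
function slideLine(get: (i: number) => number, set: (i: number, v: number) => void, n: number, r: number): void {
  let s = 0;
  for (let i = 0; i <= Math.min(n - 1, r); i++) s += get(i); // window for i = 0
  for (let i = 0; i < n; i++) {
    set(i, s);
    if (i + r + 1 < n) s += get(i + r + 1); // cell entering on the far side
    if (i - r >= 0) s -= get(i - r);        // cell leaving on the near side
  }
}

// Separable: pass 1 along each row, pass 2 down each column of the pass-1 result.
function countsSeparable(f: Uint8Array, w: number, h: number, r: number): Int32Array {
  const rows = new Int32Array(w * h);
  for (let y = 0; y < h; y++) {
    slideLine(i => f[y * w + i], (i, v) => { rows[y * w + i] = v; }, w, r);
  }
  const out = new Int32Array(w * h);
  for (let x = 0; x < w; x++) {
    slideLine(i => rows[i * w + x], (i, v) => { out[i * w + x] = v; }, h, r);
  }
  return out;
}

// Largest absolute element-wise difference between two equal-length arrays.
function maxAbsDiff(a: ArrayLike<number>, b: ArrayLike<number>): number {
  let m = 0;
  for (let i = 0; i < a.length; i++) m = Math.max(m, Math.abs(a[i] - b[i]));
  return m;
}
```

Keeping the counts as integers until the final division by window area is what makes a max Δ of exactly 0 the right bar here, rather than a floating-point tolerance. That holds even when r exceeds the grid size.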
Hard-won GPU lessons (details in Lessons)
- NaN is unusable as a missing-data sentinel under Metal fast-math. Use a 1e30+ sentinel checked with an ordered comparison instead.
- WebGPU’s default 128 MiB storage-binding limit (maxStorageBufferBindingSize) fails silently: oversized bindings read zeros. Request the adapter's limits explicitly when creating the device.
- Automated runs need real Chrome launched with WebGPU flags, because the bundled headless shell has no GPU.
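The first two lessons can be sketched as small host-side helpers. The names MISSING, packWithSentinel, isMissing, LimitsLike, and requiredLimitsFor are all hypothetical.

- **Missing data.** The sentinel is the float32-rounded value of 1e30. That way, a value read back from a Float32Array compares equal to the threshold, and the check stays an ordered comparison rather than a NaN test.
- **Binding limit.** The helper copies the adapter's own maxStorageBufferBindingSize and maxBufferSize into an object suitable for the requiredLimits field of adapter.requestDevice. If the adapter cannot hold the buffer, it fails loudly rather than letting a binding silently read zeros.

```typescript
// (1) Missing data: a large finite sentinel tested with an ordered comparison, not NaN.
// Rounded to float32 so the value stored in a Float32Array equals this constant exactly.
const MISSING = Math.fround(1e30); // assumed sentinel; anything >= MISSING means "missing"

/** Pack values into a Float32Array, writing the sentinel for missing (null) entries. */
function packWithSentinel(values: (number | null)[]): Float32Array {
  const out = new Float32Array(values.length);
  values.forEach((v, i) => { out[i] = v === null ? MISSING : v; });
  return out;
}

/** Ordered test, the same shape of check a shader would use instead of a NaN test. */
function isMissing(v: number): boolean {
  return v >= MISSING;
}

// (2) Storage-binding limit: ask for what the adapter supports, not the 128 MiB default.
interface LimitsLike {
  maxStorageBufferBindingSize: number;
  maxBufferSize: number;
}

/** Build a requiredLimits object for adapter.requestDevice; throw if the adapter is too small. */
function requiredLimitsFor(adapterLimits: LimitsLike, bytesNeeded: number): LimitsLike {
  if (bytesNeeded > adapterLimits.maxStorageBufferBindingSize ||
      bytesNeeded > adapterLimits.maxBufferSize) {
    throw new Error(`adapter cannot bind ${bytesNeeded} bytes in one storage buffer`);
  }
  return {
    maxStorageBufferBindingSize: adapterLimits.maxStorageBufferBindingSize,
    maxBufferSize: adapterLimits.maxBufferSize,
  };
}
```

In a browser, the result would be passed along the lines of `adapter.requestDevice({ requiredLimits: requiredLimitsFor(adapter.limits, bytes) })`. Without that request, the device keeps the default limits.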
There is intentionally no single-file build, since the app needs a WebGPU-capable browser.
