MET-AL — Model Evaluation Tools Analysis Lab

Idea Tracker

Model Evaluation Tools — Analysis Lab  ·  Explore, Visualize, Interpret

A living board of directions for modernizing how users explore, visualize, and interpret MET results. Entries are intentionally kept at the idea level — the problem to solve and the open questions — not a committed architecture. Candidate technologies are noted only as primitives worth weighing later.

Status reflects where the idea is in our thinking, not implementation progress. Nothing here has been investigated yet — this is the capture phase.
Idea · To explore · Exploring · Parked
Theme 01

Client-side wrapper to MET tooling

Push verification computation and data access into the browser itself — minimizing servers, installs, and round-trips. How much of the documented MET workflow can run entirely client-side against modern model output?

Idea

In-browser execution of documented MET approaches

Re-express MET's verification methods so they run directly in the client against analysis-ready data, faithful to the published methodology.

Why: removes the install/server barrier; lets anyone reproduce a verification from a URL.

Open Qs: which line types / methods are tractable client-side? how do we guarantee parity with reference MET output?

Rust crates · WASM · methodology parity
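
As a rough illustration of what "re-expressing a verification method" could involve, the sketch below tallies a 2x2 contingency table from matched forecast/observation pairs and derives three standard categorical scores. It is written in Python for readability rather than in a browser-targeted language. The event definition (`>= threshold`), the function names, and the sample pairs are assumptions made for this example, not MET's implementation; parity with reference MET output would still need to be checked.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, Optional, Tuple


@dataclass(frozen=True)
class Contingency:
    hits: int
    misses: int
    false_alarms: int
    correct_negatives: int


def tally(pairs: Iterable[Tuple[float, float]], threshold: float) -> Contingency:
    """Count 2x2 outcomes for the event 'value >= threshold' (assumed definition)."""
    hits = misses = false_alarms = correct_negatives = 0
    for fcst, obs in pairs:
        f_event = fcst >= threshold
        o_event = obs >= threshold
        if f_event and o_event:
            hits += 1
        elif o_event:
            misses += 1
        elif f_event:
            false_alarms += 1
        else:
            correct_negatives += 1
    return Contingency(hits, misses, false_alarms, correct_negatives)


def _ratio(num: int, den: int) -> Optional[float]:
    # Undefined scores are reported as None instead of raising.
    return num / den if den else None


def scores(t: Contingency) -> Dict[str, Optional[float]]:
    """Probability of detection, false alarm ratio, critical success index."""
    return {
        "POD": _ratio(t.hits, t.hits + t.misses),
        "FAR": _ratio(t.false_alarms, t.hits + t.false_alarms),
        "CSI": _ratio(t.hits, t.hits + t.misses + t.false_alarms),
    }


# Made-up (forecast, observation) pairs, purely for illustration.
pairs = [(5.2, 6.0), (0.4, 2.5), (3.1, 0.0), (0.0, 0.2), (7.5, 1.1), (2.0, 2.0)]
table = tally(pairs, threshold=1.0)
print(table, scores(table))
```

Because the tally is a single pass over the pairs, changing the threshold only requires re-running `tally`, which is the kind of recomputation an interactive client would repeat often.
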
Idea

Analysis-ready gridded model output

Read chunked, cloud-native model fields directly in the browser instead of staging large files through a server.

Why: lazy, range-request access to only the chunks a view needs; scales to large grids without a download step.

Open Qs: how do we map MET's expected inputs onto chunked stores? can regridding and masking run client-side?

Zarr v3 · chunked / lazy · range requests
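
To make "only the chunks a view needs" concrete, the sketch below maps an index window onto the chunk keys that would have to be fetched. It assumes a regular chunk grid, no sharding, and Zarr v3's default chunk key encoding (a `c` prefix followed by `/`-separated chunk indices). The function name and the example grid layout are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def chunks_for_window(
    window: Sequence[Tuple[int, int]],
    chunk_shape: Sequence[int],
) -> List[str]:
    """Return chunk keys overlapping a window of half-open (start, stop) index ranges.

    Assumes a regular chunk grid and the default 'c/i/j/...' key encoding.
    """
    per_dim = []
    for (start, stop), size in zip(window, chunk_shape):
        if stop <= start:
            return []  # empty window touches no chunks
        first = start // size
        last = (stop - 1) // size
        per_dim.append(range(first, last + 1))
    return ["c/" + "/".join(str(i) for i in idx) for idx in product(*per_dim)]


# Hypothetical (time, y, x) array chunked as one time step per 256x256 tile.
keys = chunks_for_window(
    window=[(3, 4), (200, 300), (0, 256)],
    chunk_shape=[1, 256, 256],
)
print(keys)  # ['c/3/0/0', 'c/3/1/0']
```

A client would then issue one request per key (or one range request per key inside a larger object), so panning the view changes the request list rather than triggering a full download.
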
Idea

Columnar point & station data

Treat point observations and matched pairs as columnar tables queried in place, rather than parsing ASCII stat files.

Why: fast filtering/aggregation over millions of matched pairs; a natural fit for interactive exploration.

Open Qs: what schema fits MET matched pairs? how do we interoperate with existing .stat outputs?

Parquet · DuckDB-WASM · SQL on stats
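
The "SQL on stats" idea can be sketched with any SQL engine. The example below uses Python's built-in `sqlite3` as a stand-in, not DuckDB-WASM or Parquet, simply to show the shape of a grouped aggregation over matched pairs. The table name, column names, and rows are all hypothetical; a real schema is one of the open questions above.

```python
import sqlite3

# In-memory database standing in for a columnar store of matched pairs.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE matched_pairs ("
    "model TEXT, lead_hours INTEGER, station TEXT, fcst REAL, obs REAL)"
)

# Made-up rows, purely for illustration.
rows = [
    ("modelA", 24, "ST01", 2.0, 1.0),
    ("modelA", 24, "ST02", 0.0, 1.0),
    ("modelA", 48, "ST01", 4.0, 1.0),
    ("modelB", 24, "ST01", 1.5, 1.0),
]
con.executemany("INSERT INTO matched_pairs VALUES (?, ?, ?, ?, ?)", rows)

# Mean error and mean absolute error per model and lead time.
query = """
SELECT model,
       lead_hours,
       COUNT(*)             AS n,
       AVG(fcst - obs)      AS mean_error,
       AVG(ABS(fcst - obs)) AS mae
FROM matched_pairs
GROUP BY model, lead_hours
ORDER BY model, lead_hours
"""
summary = con.execute(query).fetchall()
for row in summary:
    print(row)
```

The point of the sketch is the query shape: filters and groupings chosen in the UI become `WHERE` and `GROUP BY` clauses, so no ASCII parsing sits between the user and the aggregate.
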
Idea

GPU compute for verification math

Offload heavy per-cell or per-pair statistics to the GPU for near-instant recomputation as the user changes thresholds or regions.

Why: interactive "what-if" on metrics that are otherwise a batch job.

Open Qs: which computations are GPU-amenable? numerical reproducibility vs. reference results?

WGSL · WebGPU · interactive recompute
Idea

Zero-install, shareable analyses

An entire exploration (data refs + selections + view) captured in a link or a small file, reopened by anyone with a browser.

Why: collaboration and reproducibility without environment setup.

Open Qs: what's the minimal portable "analysis state" to serialize?

portability · reproducibility
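
One possible way to capture an exploration "in a link" is to serialize the analysis state to compact JSON, compress it, and encode it with URL-safe Base64 so it can travel in a URL fragment. The sketch below shows that round trip. The state fields (`v`, `data`, `selection`, `view`) and the example URL are hypothetical placeholders, since the minimal portable state is still an open question.

```python
import base64
import json
import zlib


def encode_state(state: dict) -> str:
    """Serialize state to a URL-safe token (compact JSON -> zlib -> Base64)."""
    raw = json.dumps(state, sort_keys=True, separators=(",", ":")).encode("utf-8")
    token = base64.urlsafe_b64encode(zlib.compress(raw)).decode("ascii")
    return token.rstrip("=")  # padding is restored on decode


def decode_state(token: str) -> dict:
    """Inverse of encode_state."""
    padded = token + "=" * (-len(token) % 4)
    raw = zlib.decompress(base64.urlsafe_b64decode(padded))
    return json.loads(raw.decode("utf-8"))


# Hypothetical analysis state: data references, selections, and view settings.
state = {
    "v": 1,
    "data": ["https://example.org/placeholder/fields.zarr"],
    "selection": {"metric": "CSI", "threshold": 1.0, "lead_hours": [24, 48]},
    "view": {"kind": "line", "x": "lead_hours"},
}
token = encode_state(state)
restored = decode_state(token)
print(len(token), restored == state)
```

Carrying a version field in the state is a deliberate choice in this sketch: it gives older links a way to be migrated if the state format changes.
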
Theme 02

Modern interaction with stat data

Let users directly steer what they see — selecting metrics, regions, thresholds, and groupings fluidly — instead of pre-baking plots in a batch step.

Idea

Direct-manipulation control of stat views

Users pick line types, metrics, and filters through the UI and the plot responds immediately, including recomputation where needed.

Why: shrinks the loop between "I wonder if…" and an answer.

Open Qs: what's the right vocabulary of controls that covers MET's breadth without overwhelming?

direct manipulation · live recompute
Idea

Data rendered straight to the GPU surface

Feed stat data directly into a shader pipeline so that plots of very large series stay smooth.

Why: keeps dense, high-cardinality stat plots responsive.

Open Qs: where does GPU rendering pay off vs. add complexity?

WGSL · large-series rendering
Idea

Natural-language querying of results

Ask for a comparison in plain language and have it resolved into a concrete selection over the stat data.

Why: lowers the expertise needed to drive verification analysis.

Open Qs: how to keep it auditable and grounded in the actual data?

assisted query · auditability
Idea

Cross-filtered linked views

Brushing one panel (time, region, threshold, lead time) filters all the others.

Why: verification questions are inherently multi-dimensional; linked views make the slices obvious.

Open Qs: which dimensions are the universal axes of MET output?

linked brushing · faceting
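
The core rule behind cross-filtering can be sketched briefly: each panel shows the rows that pass every brush except its own, so a panel's brush narrows the other panels without collapsing itself. The row fields, dimension names, and values below are hypothetical.

```python
from typing import Any, Dict, List, Optional, Sequence, Set

Row = Dict[str, Any]


def passes(row: Row, selections: Dict[str, Set[Any]], skip: Optional[str] = None) -> bool:
    """True if the row satisfies every active selection except the skipped dimension."""
    return all(
        row[dim] in allowed
        for dim, allowed in selections.items()
        if dim != skip
    )


def linked_views(
    rows: Sequence[Row],
    selections: Dict[str, Set[Any]],
    dimensions: Sequence[str],
) -> Dict[str, List[Row]]:
    """Rows visible in each panel: filtered by all brushes but the panel's own."""
    return {
        dim: [r for r in rows if passes(r, selections, skip=dim)]
        for dim in dimensions
    }


# Made-up stat rows, purely for illustration.
rows = [
    {"lead": 24, "region": "west", "rmse": 1.2},
    {"lead": 24, "region": "east", "rmse": 1.5},
    {"lead": 48, "region": "west", "rmse": 2.0},
]
selections = {"region": {"west"}}  # the user brushed "west" in the region panel
views = linked_views(rows, selections, dimensions=["lead", "region"])
print(len(views["lead"]), len(views["region"]))  # 2 3
```

This list-based version rescans every row per panel; it is meant to show the filtering rule, not an approach that would hold up at scale.
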
Theme 03

Novel plotting & verification visuals

Beyond static line plots — new ways to render, interact with, and reason about verification, including spatial and object-based methods.

Idea

3D / volumetric statistics

Render verification across a third axis (height, lead time, threshold) as an explorable volume rather than a stack of 2D plots.

Why: some structure only shows up across the full cube.

Open Qs: when does 3D aid insight vs. obscure it?

3D stats · volumetric
Idea

Object-based verification, visually

Interactive exploration of object-based verification: matched and unmatched objects, their attributes, and how they pair across forecast and observation.

Why: object methods are powerful but hard to inspect in tables.

Open Qs: best visual language for object matching & attribute diffs?

object verification · MODE-style
Idea

Map-native spatial verification

Explore where forecasts succeed or fail on a pannable, zoomable map, with stats tied to geography and masks.

Why: spatial error is fundamentally geographic; show it that way.

Open Qs: how do we handle projections, regridding, and mask overlays client-side?

spatial · maps · masking
Idea

First-class uncertainty

Make confidence intervals and bootstrap spread a built-in, always-visible part of every comparison rather than an afterthought.

Why: differences without uncertainty invite over-interpretation.

Open Qs: how to show uncertainty without cluttering dense plots?

confidence intervals · bootstrap
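
As one example of the uncertainty a comparison could carry by default, the sketch below computes a percentile bootstrap confidence interval for the mean of paired differences. It is a minimal version under stated assumptions: cases are treated as independent, the plain percentile method is used, and the function name, defaults, and sample values are invented for illustration. It is not a description of how MET computes its intervals.

```python
import random
from statistics import fmean
from typing import Callable, Sequence, Tuple


def bootstrap_ci(
    values: Sequence[float],
    stat: Callable[[Sequence[float]], float] = fmean,
    n_resamples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile bootstrap interval for stat(values), resampling with replacement."""
    if not values:
        raise ValueError("need at least one value")
    rng = random.Random(seed)  # fixed seed so a shared analysis reproduces
    n = len(values)
    estimates = sorted(
        stat(rng.choices(values, k=n)) for _ in range(n_resamples)
    )
    cut = int(n_resamples * alpha / 2)
    return estimates[cut], estimates[n_resamples - 1 - cut]


# Made-up per-case differences in absolute error (model A minus model B).
diffs = [-0.4, 0.1, -0.3, -0.6, 0.2, -0.5, -0.2, -0.1, -0.7, 0.0]
lo, hi = bootstrap_ci(diffs)
print(f"mean diff = {fmean(diffs):.3f}, 95% CI = [{lo:.3f}, {hi:.3f}]")
```

If the interval spans zero, the plotted difference between the two models should not be read as a real improvement, which is the over-interpretation this idea aims to prevent.
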
Idea

Ensemble verification views

Purpose-built visuals for rank histograms, spread-skill, reliability, and other ensemble diagnostics with interactive drill-down.

Why: ensemble diagnostics have rich structure that static plots underuse.

Open Qs: which ensemble diagnostics benefit most from interaction?

ensemble · reliability · spread-skill
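
A rank histogram is simple enough to sketch directly: for each case, count how many ensemble members fall below the observation and increment that rank's bin. The version below simplifies tie handling (an observation equal to a member is not counted as above it), and the sample ensembles are made up.

```python
from typing import List, Sequence


def rank_histogram(
    ensembles: Sequence[Sequence[float]],
    observations: Sequence[float],
) -> List[int]:
    """Counts of the observation's rank among members (n_members + 1 bins).

    Simplification: ties are not randomized, which a careful implementation
    would likely want to handle.
    """
    if not ensembles:
        return []
    n_members = len(ensembles[0])
    counts = [0] * (n_members + 1)
    for members, obs in zip(ensembles, observations):
        if len(members) != n_members:
            raise ValueError("all cases must have the same ensemble size")
        rank = sum(1 for m in members if m < obs)
        counts[rank] += 1
    return counts


# Four made-up cases with three members each.
ensembles = [[1, 2, 3], [2, 4, 6], [0, 1, 2], [5, 6, 7]]
observations = [2.5, 1.0, 3.0, 6.5]
hist = rank_histogram(ensembles, observations)
print(hist)  # [1, 0, 2, 1]
```

Since each case contributes to exactly one bin, an interactive view could drill down from a bin to the cases behind it by keeping the per-case rank alongside the counts.
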
Idea

Animated & scrubbable plot interaction

Scrub through time, lead time, or threshold and watch verification evolve, with smooth transitions that reveal trends.

Why: motion exposes temporal and threshold structure that tables hide.

Open Qs: which axes are worth animating, and where does it mislead?

animation · scrubbing
Theme 04

Other modernization directions

Cross-cutting ideas that could raise the floor for the whole experience — provenance, collaboration, accessibility, and interpretation.

Idea

Assisted interpretation & narration

Generate plain-language summaries of what a verification result is showing, grounded in the underlying numbers.

Why: turns metrics into decisions for non-specialist stakeholders.

Open Qs: how to keep narration honest and traceable to data?

interpretation · summaries
Idea

Provenance & reproducibility by default

Every plot carries the data sources, selections, and method versions needed to regenerate it exactly.

Why: verification results are evidence; they should be reproducible.

Open Qs: what minimal metadata makes a result fully reproducible?

provenance · reproducibility
Idea

Notebook-style reproducible sessions

A narrative document interleaving explanation, live controls, and verification views that re-runs end to end.

Why: bridges exploration and the report that comes out of it.

Open Qs: how live vs. static should a shared session be?

notebooks · narrative
Idea

Collaborative annotation

Comment on and mark up specific plots/regions so a team can discuss results in context.

Why: verification is rarely a solo activity.

Open Qs: should annotations be tied to data selections rather than pixels?

collaboration · annotation
Idea

Accessible & responsive by design

Color-vision-safe palettes, keyboard navigation, screen-reader-friendly summaries, and layouts that work beyond the desktop.

Why: broadens who can use verification output and how.

Open Qs: how to make dense scientific plots genuinely accessible?

accessibility · responsive
Idea

Interoperability with the MET ecosystem

Play well with existing MET / METplus outputs and companion tooling so this augments rather than replaces established workflows.

Why: adoption depends on fitting the world users already have.

Open Qs: which existing formats/outputs are the integration seams?

interoperability · ecosystem fit
Idea

Guided exploration / diagnostic storytelling

Opinionated paths that walk a user from "is my forecast good?" through the diagnostics that answer it.

Why: lowers the expertise floor and standardizes good practice.

Open Qs: which diagnostic journeys are worth encoding first?

guided UX · onboarding
Idea

Scale to large result sets

Stay responsive when the stat data is large — streaming, summarization, and level-of-detail rather than "load everything."

Why: real verification campaigns produce a lot of output.

Open Qs: where are the performance cliffs in a client-first design?

scale · streaming · LOD
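
One common level-of-detail technique for long series is min/max bucketing: split the series into as many buckets as the plot has horizontal pixels and keep each bucket's extremes, so spikes survive the reduction. The sketch below is one such approach under simple assumptions (evenly spaced samples, a hypothetical function name), not a committed design.

```python
from typing import List, Sequence, Tuple


def minmax_buckets(values: Sequence[float], n_buckets: int) -> List[Tuple[float, float]]:
    """Reduce a series to at most n_buckets (min, max) pairs, preserving extremes."""
    n = len(values)
    if n == 0 or n_buckets <= 0:
        return []
    n_buckets = min(n_buckets, n)  # guarantees every bucket is non-empty
    out = []
    for b in range(n_buckets):
        start = b * n // n_buckets
        stop = (b + 1) * n // n_buckets
        chunk = values[start:stop]
        out.append((min(chunk), max(chunk)))
    return out


# Made-up series reduced to four buckets.
series = [0, 3, 1, 9, 2, 2, 8, 4]
lod = minmax_buckets(series, n_buckets=4)
print(lod)  # [(0, 3), (1, 9), (2, 2), (4, 8)]
```

Drawing a vertical segment from each bucket's minimum to its maximum gives the same visual envelope as plotting every point, at a cost that depends on screen width rather than on the size of the result set.
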