
Methodology.

How every number on this site is computed. Linked from every tool. If you find a mistake — in the math, in a source, or in a number we publish — tell us; we’ll log the correction in public.

Data sources

Every chart on the site carries a “source: X · freshness: Y” footer. Here is the consolidated list. We don’t take revshare from platforms and we don’t resell data.

Source	What we use it for	Refresh	Role
NWS CLI (Climatological Report Daily)	Authoritative settlement value for daily high / low. Every weather market on Kalshi settles on this exact document.	2–6× / day per WFO	A1
METAR RMK 6h groups (`1snnn` / `2snnn` / `4snnnsnnn`)	Closed 6-hour max/min at 0.1°C precision (the `4snnnsnnn` group carries the 24-hour max/min). Catches sub-hourly peaks that hourly METARs miss.	synoptic hours ± 52 min	A2
FAA aviationweather.gov (METAR + T-groups)	Routine and 5-minute special observations, T-group precise temp/dewpoint.	1–5 min	B
NWS api.weather.gov /stations/…/observations	Backfill for the 5-min specials that the FAA batch omits.	5–10 min	B
NWS gridded forecast (api.weather.gov)	Hourly forecast for ≤48h horizon. Bias-corrected against settled values (1,278 city-days, May–Nov 2025).	1–3 h	B
Kalshi public API	Market metadata, orderbook, settled outcomes.	15–60 s (Pro: realtime)	C
Polymarket gamma-api, Manifold REST	Cross-platform market activity. Week 2.	60 s	C

Role legend. A1 = authoritative settlement source; A2 = derived authoritative (used when A1 hasn’t published); B = observational input; C = market data, never used to model outcomes.

CF6 pecking order

When more than one source has a number for the same observation window, we pick from this list in order and stop at the first one that has produced a value:

NWS CLI — the authoritative climate summary. What Kalshi grades against.
METAR RMK 6h groups, but only when the peak’s timestamp falls inside a closed 6h synoptic window (00 / 06 / 12 / 18 Z).
Hourly METAR with QC filter applied (next paragraph).
5-min FAA specials, with the Mode-3 trend-aware sanity filter.

Hourly readings are accepted only when at least one other reading within ±10 minutes is within 1.5°F — the same QC NWS applies before a value reaches CF6. Isolated spikes from radiational flashes or sensor noise get dropped. Once CLI publishes, a later observation overrides it only if it’s more than 0.5°F more extreme (post-CLI cooling / warming).
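A minimal Python sketch of the pecking order, the ±10-minute hourly QC check, and the post-CLI override described above. It assumes each upstream source has already been reduced to one candidate value (None when it hasn’t produced one), and that the RMK closed-window check and the Mode-3 filter ran upstream. `Reading`, `passes_hourly_qc`, `pick_value`, and `apply_post_cli_override` are hypothetical names, not our production code.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional, Sequence


@dataclass(frozen=True)
class Reading:
    """One temperature observation (hypothetical shape)."""
    time: datetime
    temp_f: float


def passes_hourly_qc(
    reading: Reading,
    others: Sequence[Reading],
    window: timedelta = timedelta(minutes=10),
    tolerance_f: float = 1.5,
) -> bool:
    """Accept an hourly reading only if another reading within ±window agrees within tolerance_f."""
    return any(
        other is not reading
        and abs(other.time - reading.time) <= window
        and abs(other.temp_f - reading.temp_f) <= tolerance_f
        for other in others
    )


def pick_value(
    cli: Optional[float],
    rmk_6h: Optional[float],
    hourly_qc: Optional[float],
    specials_5min: Optional[float],
) -> Optional[float]:
    """Walk the sources in priority order and stop at the first one that produced a value."""
    for value in (cli, rmk_6h, hourly_qc, specials_5min):
        if value is not None:
            return value
    return None


def apply_post_cli_override(
    cli_value: float, later_obs_f: float, kind: str, threshold_f: float = 0.5
) -> float:
    """After CLI publishes, a later observation wins only if it is more than threshold_f more extreme."""
    if kind == "high":
        return later_obs_f if later_obs_f > cli_value + threshold_f else cli_value
    if kind == "low":
        return later_obs_f if later_obs_f < cli_value - threshold_f else cli_value
    raise ValueError(f"kind must be 'high' or 'low', got {kind!r}")
```

Keeping the override separate from the cascade mirrors the text: CLI still wins the pecking order, and the override is a narrow exception applied afterwards.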

LST, not local clock. The climate day used for max/min is Local Standard Time, not the wall-clock day. At 00:30 CDT on Apr 28, LST is 23:30 CST on Apr 27. Treating those final hours as Apr 28 cost us markets we should have won. We compute LST from a fixed Jan-15 reference offset to skip DST entirely.
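A minimal sketch of the fixed-offset LST conversion using Python’s standard `zoneinfo`. It assumes timestamps are timezone-aware and that the station’s IANA zone is on standard time on January 15 (true for the US zones we cover, not for southern-hemisphere zones). `lst_offset`, `to_lst`, and `climate_day` are hypothetical helper names.

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def lst_offset(tz_name: str, year: int) -> timedelta:
    """Standard-time UTC offset, read from a fixed Jan 15 reference so DST never applies."""
    reference = datetime(year, 1, 15, 12, 0, tzinfo=ZoneInfo(tz_name))
    offset = reference.utcoffset()
    if offset is None:
        raise ValueError(f"no UTC offset for {tz_name}")
    return offset


def to_lst(ts: datetime, tz_name: str) -> datetime:
    """Convert an aware timestamp to Local Standard Time (fixed offset, no DST)."""
    if ts.tzinfo is None:
        raise ValueError("timestamp must be timezone-aware")
    return ts.astimezone(timezone(lst_offset(tz_name, ts.year)))


def climate_day(ts: datetime, tz_name: str) -> date:
    """The climate day a reading counts toward is its LST calendar date."""
    return to_lst(ts, tz_name).date()
```

With the example from the text, `climate_day(datetime(2026, 4, 28, 0, 30, tzinfo=ZoneInfo("America/Chicago")), "America/Chicago")` returns `date(2026, 4, 27)`: 00:30 CDT is 05:30 UTC, which is 23:30 CST on Apr 27.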

Bracket-edge margin

Kalshi weather brackets are 1°F-wide ranges anchored on a half-degree (e.g. B85.5 means the high settles between 85 and 86 inclusive). Settlement uses rounded CF6 numbers, but the underlying temperature is continuous.

The bracket-edge margin is the two-sided rounding distance from the precise projection to each flip line. For a high of 85.42°F against a B85.5 bracket, the rounded high is 85; the lower flip line (below it, the high rounds down to 84) sits at 84.5, and the upper flip line (past it, the high rounds up to 86) sits at 85.5:

Precise projection vs. flip lines · KATL B85.5 · 2026-05-17: projection 85.42°F; lower flip 84.5°F (margin 0.92°F); upper flip 85.5°F (margin 0.08°F).

We display the narrower of the two margins on the grade card. In the example above, the projection sits only 0.08°F below the upper flip line, so a 0.08°F rise at 4 PM tips the bracket. That’s a thin margin, and it shows up in the grade.
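The worked example reduces to a few lines of arithmetic. This sketch follows that example, placing the flip lines 0.5°F either side of the rounded projection. It assumes half-up rounding (Python’s built-in `round` rounds halves to even, so it is avoided here); `flip_lines` and `bracket_edge_margin` are hypothetical names.

```python
import math


def flip_lines(projection_f: float) -> tuple[float, float]:
    """Lower and upper flip lines around the rounded projection (half-up rounding assumed)."""
    rounded = math.floor(projection_f + 0.5)
    return rounded - 0.5, rounded + 0.5


def bracket_edge_margin(projection_f: float) -> tuple[float, float, float]:
    """Return (margin to lower flip, margin to upper flip, narrower margin shown on the grade card)."""
    lower, upper = flip_lines(projection_f)
    margin_lower = projection_f - lower
    margin_upper = upper - projection_f
    return margin_lower, margin_upper, min(margin_lower, margin_upper)


lower_m, upper_m, shown = bracket_edge_margin(85.42)
print(f"lower {lower_m:.2f}°F, upper {upper_m:.2f}°F, shown {shown:.2f}°F")
# prints: lower 0.92°F, upper 0.08°F, shown 0.08°F
```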

Quality grade

Every bracket we observe is assessed for a letter grade. The grade is a function of four inputs, all computed at observation time and never backfilled:

Bracket-edge margin — narrower of the two flip-line distances. Below 0.4°F caps the grade at B regardless of the other inputs.
Source confidence — has CLI published? are all four RMK 6h windows closed and consistent? is the projection still leaning on hourly METAR?
Liquidity — last-24h volume on Kalshi for that bracket. We don’t boost the grade for high volume, but we cap it when there isn’t enough volume to trade the bracket without re-pricing it.
City tier — see below. Tier-2 cities get a half-grade haircut on narrow brackets.

The output is one of A+, A, B, or no grade. A+ is reserved for markets where the projection is locked (post-RMK-4-group), the margin is at least 1.0°F, the city is Tier 1, and 24h volume on the bracket is at least 2,500.
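Only two of the grading rules are published as hard thresholds: the 0.4°F margin cap and the A+ gate. A minimal sketch of just those two checks, assuming some upstream step (not described here) has already proposed a base grade. The Tier-2 haircut and the liquidity cap are left out because their exact mapping isn’t published; demoting a failed A+ to A is our assumption, and all names are hypothetical.

```python
GRADE_ORDER = ["no grade", "B", "A", "A+"]  # worst to best


def cap_grade(grade: str, ceiling: str) -> str:
    """Return grade, or ceiling if grade ranks above it."""
    return grade if GRADE_ORDER.index(grade) <= GRADE_ORDER.index(ceiling) else ceiling


def a_plus_eligible(locked: bool, margin_f: float, tier: int, volume_24h: int) -> bool:
    """A+ gate: locked projection, margin >= 1.0°F, Tier 1 city, >= 2,500 24h volume on the bracket."""
    return locked and margin_f >= 1.0 and tier == 1 and volume_24h >= 2500


def apply_published_caps(
    base_grade: str, *, locked: bool, margin_f: float, tier: int, volume_24h: int
) -> str:
    grade = base_grade
    if grade == "A+" and not a_plus_eligible(locked, margin_f, tier, volume_24h):
        grade = "A"  # assumption: a failed A+ falls back to A
    if margin_f < 0.4:
        grade = cap_grade(grade, "B")  # narrow margin caps at B regardless of other inputs
    return grade
```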

No grade is not a recommendation. Most markets we look at don’t get a grade. That’s a signal that the data we want isn’t available yet, not a comment on the market itself. We never grade a market we couldn’t observe with our own pipeline.

City tiers

City tier is a tracking-quality classification. It says nothing about how interesting a market is — only how confident we are in the projection for that station. Tiers are re-derived monthly from a rolling 30-day backtest with production-realistic filters (window-fully-today + RMK sanity).

Tier 1. 100% within ±1°F across the last 30 days of resolved settlements. Narrow brackets are OK. 17 cities at the time of writing (KATL, KAUS, KDAL, KDCA, KDEN, KHOU, KLAS, KLAX, KMIA, KMSP, KNYC, KOKC, KPHL, KPHX, KSAT, KSEA, KSFO).
Tier 2. One ±2–5°F miss in 30 days. Half-grade haircut on narrow brackets. KMDW and KBOS at the time of writing.
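The two published tier rules can be sketched as a check over the last 30 days of settlement errors (projection minus settled value, in °F). The `classify_tier` name is hypothetical, and so is the “unclassified” bucket for error patterns the published rules don’t cover.

```python
from typing import Sequence


def classify_tier(errors_f: Sequence[float]) -> str:
    """Classify a station from its last 30 days of settlement errors (°F)."""
    if not errors_f:
        return "unclassified"  # hypothetical: no resolved settlements to judge
    misses = [abs(e) for e in errors_f if abs(e) > 1.0]
    if not misses:
        return "Tier 1"  # 100% within ±1°F
    if len(misses) == 1 and 2.0 <= misses[0] <= 5.0:
        return "Tier 2"  # exactly one ±2–5°F miss
    return "unclassified"  # hypothetical: pattern not covered by the published rules
```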

The dataset behind the v3 backtest was 391 city-days. Overall accuracy: 99% within ±1°F, 94% flawless, mean bias −0.04°F. The full city-tier history is on the scoreboard.

Calibration

Every published forecast lands in validation_cf6 with a forecast_id, the predicted probability, and the resolving market. Outcomes attach automatically when Kalshi settles. Nothing is retroactively edited.

The reliability curve on /calibration buckets forecasts at deciles and plots realized frequency against predicted probability. Perfect calibration is the diagonal; the 95% confidence band is binomial.
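A minimal sketch of the reliability curve: ten equal-width probability bins, the realized frequency in each, and a 95% binomial interval. We read “deciles” as equal-width bins and use the Wilson score interval as the binomial band; the production band may use a different binomial interval. `reliability_curve` and `wilson_interval` are hypothetical names.

```python
import math
from collections import defaultdict
from typing import Sequence


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n == 0:
        raise ValueError("empty bin")
    p_hat = successes / n
    denom = 1 + z * z / n
    center = (p_hat + z * z / (2 * n)) / denom
    half = z * math.sqrt(p_hat * (1 - p_hat) / n + z * z / (4 * n * n)) / denom
    lo = 0.0 if successes == 0 else max(0.0, center - half)
    hi = 1.0 if successes == n else min(1.0, center + half)
    return lo, hi


def reliability_curve(probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10):
    """Rows of (mean predicted p, realized frequency, ci_low, ci_high, count), one per non-empty bin."""
    bins = defaultdict(list)
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    rows = []
    for idx in sorted(bins):
        pairs = bins[idx]
        n = len(pairs)
        hits = sum(y for _, y in pairs)
        mean_p = sum(p for p, _ in pairs) / n
        lo, hi = wilson_interval(hits, n)
        rows.append((mean_p, hits / n, lo, hi, n))
    return rows
```

Plotting `mean_p` against the realized frequency gives the curve; a bin whose interval excludes the diagonal is the one worth investigating.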

NWS bias adjustment

NWS gridded forecasts are passed through a per-city bias table (nws_bias.json) built from 1,278 city-days of resolved settlements May–Nov 2025. The bias is model − settled, so negative means NWS under-predicted the true high. We do not recompute the bias retroactively when settlements come in — the bias table used at forecast time is preserved with the forecast.
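Applying the bias is a single subtraction: since bias = model − settled, the corrected forecast is model − bias. This sketch assumes nws_bias.json maps station IDs to one bias value in °F and that a station missing from the table gets no correction; both the schema and that fallback are assumptions, and the function names are hypothetical. Returning the bias alongside the corrected value is one way to store the table entry in effect at forecast time.

```python
import json
from typing import Mapping


def load_bias_table(path: str) -> dict[str, float]:
    """Load a per-city bias table; assumed schema: {"<station id>": <bias in °F>, ...}."""
    with open(path, encoding="utf-8") as f:
        return {station: float(bias) for station, bias in json.load(f).items()}


def bias_corrected_high(
    raw_forecast_f: float, station: str, bias_table: Mapping[str, float]
) -> tuple[float, float]:
    """Return (corrected forecast, bias used). A negative bias means NWS ran cold, so correction warms it."""
    bias = bias_table.get(station, 0.0)  # assumption: no entry means no correction
    return raw_forecast_f - bias, bias
```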

Brier score

For a forecast p and outcome y ∈ {0, 1} the Brier score is (p − y)²; the page-level Brier score is the mean. Lower is better. 0 is perfect. A constant forecast of 0.5 produces a Brier score of 0.25 — the baseline you need to beat. Our 90-day rolling Brier across all categories was 0.082 as of the last build.

We also publish log loss (−[y log p + (1−y) log (1−p)]) and mean calibration error (the average vertical distance from the realized curve to the diagonal). All three are visible on /calibration.
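A minimal sketch of the three scores. Probabilities are clipped away from 0 and 1 before taking logs, so a confident miss yields a large but finite log loss. Mean calibration error is computed here as the unweighted average, over non-empty equal-width bins, of |realized frequency − mean predicted probability|. The clipping constant, the binning, and the unweighted average are assumptions, not published choices.

```python
import math
from typing import Sequence


def brier_score(probs: Sequence[float], outcomes: Sequence[int]) -> float:
    """Mean of (p - y)^2; 0 is perfect, a constant 0.5 forecast scores 0.25."""
    if not probs:
        raise ValueError("no forecasts")
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """Mean of -[y log p + (1 - y) log(1 - p)], with p clipped to [eps, 1 - eps]."""
    if not probs:
        raise ValueError("no forecasts")
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(probs)


def mean_calibration_error(probs: Sequence[float], outcomes: Sequence[int], n_bins: int = 10) -> float:
    """Unweighted mean vertical gap between realized frequency and mean predicted p per bin."""
    bins: dict[int, list[tuple[float, int]]] = {}
    for p, y in zip(probs, outcomes):
        bins.setdefault(min(int(p * n_bins), n_bins - 1), []).append((p, y))
    if not bins:
        raise ValueError("no forecasts")
    gaps = [
        abs(sum(y for _, y in b) / len(b) - sum(p for p, _ in b) / len(b))
        for b in bins.values()
    ]
    return sum(gaps) / len(gaps)
```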

Whale categories

The Whale Watcher (Pro) classifies aggressive trades using these category buckets. We publish the rules here so they’re inspectable.

Position taker. A single fill that consumes ≥ 5% of the prior 24-hour volume on a single bracket.
Ladder. Five or more fills on the same bracket in < 60 seconds, total size ≥ 1,000 contracts.
Edge runner. Fills that lift the ask through ≥ 2 cents of an empty book.
Sweeper. A fill on both sides of a binary that crosses through the midpoint.

These are descriptions, not recommendations. We label them and let you decide whether to look closer.
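The two rules that depend only on fill history, Position taker and Ladder, can be sketched directly; Edge runner and Sweeper need order-book state and are left out. The `Fill` shape and the function names are hypothetical, and sizes are assumed to be in contracts.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Sequence


@dataclass(frozen=True)
class Fill:
    """One executed fill (hypothetical shape)."""
    time: datetime
    bracket: str
    size: int  # contracts (assumed unit)


def is_position_taker(fill: Fill, prior_24h_volume: int) -> bool:
    """A single fill consuming >= 5% of the prior 24-hour volume on its bracket."""
    return prior_24h_volume > 0 and fill.size >= 0.05 * prior_24h_volume


def has_ladder(
    fills: Sequence[Fill],
    bracket: str,
    min_fills: int = 5,
    window: timedelta = timedelta(seconds=60),
    min_total: int = 1000,
) -> bool:
    """Five or more fills on one bracket spanning < 60 s with total size >= 1,000."""
    seq = sorted((f for f in fills if f.bracket == bracket), key=lambda f: f.time)
    start = 0
    for end in range(len(seq)):
        # Drop fills from the left until the first-to-last span is under `window`.
        while seq[end].time - seq[start].time >= window:
            start += 1
        run = seq[start : end + 1]
        if len(run) >= min_fills and sum(f.size for f in run) >= min_total:
            return True
    return False
```

Checking only the widest window ending at each fill is enough: if any smaller run qualifies, the widest run containing it has at least as many fills and at least as much size.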

Update cadence

Surface	Free tier	Pro
/hot feed	15-minute delay	Realtime (sub-60 s)
Quality grade feed	Tomorrow’s grades 24h delayed	Live as observations land
Whale Watcher	Not available	10-second cadence
Calibration scoreboard	Live, full resolution	Live, with CSV / API export
Daily AI brief	Preview — 1 paragraph	Full text · ~7 AM local

Corrections log

When we ship a fix that affects published numbers we log it here. Older entries are in the repo history.

2026-05-04. Cache-first observation reader; new nws_observations table backs the cascade so a flaky NWS API doesn’t drop us to FAA-only.
2026-05-01. Trend-aware Mode-3 filter: 5-minute readings outside the T-group envelope by > 3°F are now accepted only when two or more neighbouring readings within 1°F support the trend.
2026-04-21. v3 backtest re-derived city tiers; KMDW and KBOS dropped to Tier 2. Apr 21 KDCA RMK-cap regression patched (the 20Z peak fell after the last closed 18Z window).
2026-04-18. Dallas observations routed to KDFW because Kalshi settles at DFW, not Love Field. The mathclaw map and the settlement station now agree.
2026-04-09. Three-tier source priority introduced after the KMDW B71.5 loss — a 2:23 PM 71°F sub-hourly peak CF6 caught but hourly METAR missed.

Found something wrong? Email team@tinycorp.ai with the page URL and the value you think is off. We respond to every methodology bug.