Methodology.
How every number on this site is computed. Linked from every tool. If you find a mistake — in the math, in a source, or in a number we publish — tell us; we’ll log the correction in public.
Data sources
Every chart on the site has a “source: X · freshness: Y” footer. Here is the consolidated list. We don’t take revshare from platforms and we don’t resell data.
| Source | What we use it for | Refresh | Role |
|---|---|---|---|
| NWS CLI (Climatological Report Daily) | Authoritative settlement value for daily high / low. Every weather market on Kalshi settles on this exact document. | 2–6× / day per WFO | A1 |
| METAR RMK 6h groups (1snnn / 2snnn / 4snnnsnnn) | Closed 6-hour max/min at 0.1°C precision. Sub-hourly peaks that hourly METARs miss. | synoptic hours ± 52 min | A2 |
| FAA aviationweather.gov (METAR + T-groups) | Routine and 5-minute special observations, T-group precise temp/dewpoint. | 1–5 min | B |
| NWS api.weather.gov /stations/…/observations | Backfill for 5-min specials FAA batch omits. | 5–10 min | B |
| NWS gridded forecast (api.weather.gov) | Hourly forecast for ≤48h horizon. Bias-corrected against settled values (1,278 city-days, May–Nov 2025). | 1–3 h | B |
| Kalshi public API | Market metadata, orderbook, settled outcomes. | 15–60 s (Pro: realtime) | C |
| Polymarket gamma-api, Manifold REST | Cross-platform market activity. Week 2. | 60 s | C |
Role legend. A1 = authoritative settlement source; A2 = derived authoritative (used when A1 hasn’t published); B = observational input; C = market data, never used to model outcomes.
CF6 pecking order
When more than one source has a number for the same observation window, we pick from this list in order and stop at the first one that has produced a value:
- NWS CLI — the authoritative climate summary. What Kalshi grades against.
- METAR RMK 6h groups, but only when the peak’s timestamp falls inside a closed 6h synoptic window (00 / 06 / 12 / 18 Z).
- Hourly METAR with QC filter applied (next paragraph).
- 5-min FAA specials, with the Mode-3 trend-aware sanity filter.
Hourly readings are accepted only when at least one other reading within ±10 minutes is within 1.5°F — the same QC NWS applies before a value reaches CF6. Isolated spikes from radiational flashes or sensor noise get dropped. Once CLI publishes, a later observation overrides it only if it’s more than 0.5°F more extreme (post-CLI cooling / warming).
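A minimal sketch of the cascade and the hourly QC rule described above. The source labels in `PRIORITY`, the function names, and the `(timestamp, temp_f)` tuple layout are assumptions made for illustration, not the site's actual code.

```python
from datetime import datetime, timedelta
from typing import Optional

# Priority order from the list above; the labels themselves are hypothetical.
PRIORITY = ["cli", "rmk_6h", "hourly_metar", "faa_5min"]


def pick_source(values: dict[str, Optional[float]]) -> Optional[tuple[str, float]]:
    """Return the first (source, value) in priority order that has a value."""
    for name in PRIORITY:
        value = values.get(name)
        if value is not None:
            return name, value
    return None


def passes_hourly_qc(
    readings: list[tuple[datetime, float]],
    index: int,
    window: timedelta = timedelta(minutes=10),
    tolerance_f: float = 1.5,
) -> bool:
    """Accept readings[index] only if some other reading within +/-10 minutes
    is within 1.5°F of it; an isolated spike has no such neighbour."""
    ts, temp_f = readings[index]
    for i, (other_ts, other_temp_f) in enumerate(readings):
        if i == index:
            continue
        if abs(other_ts - ts) <= window and abs(other_temp_f - temp_f) <= tolerance_f:
            return True
    return False
```

The sketch treats "has produced a value" as "is not None"; the closed-window check for RMK groups and the Mode-3 filter for 5-minute specials would run before a value is placed in the dictionary.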
Bracket-edge margin
Kalshi weather brackets are 1°F-wide ranges anchored on a half-degree (e.g. B85.5 means the high settles between 85 and 86 inclusive). Settlement uses rounded CF6 numbers, but the underlying temperature is continuous.
The bracket-edge margin is the two-sided rounding distance from the precise projection to each flip line. For a high of 85.42°F against a B85.5 bracket the rounded high is 85, the lower flip line (round-down to 85) sits at 84.5, and the upper flip line (round-up to 86) sits at 85.5. The lower margin is therefore 85.42 − 84.5 = 0.92°F and the upper margin is 85.5 − 85.42 = 0.08°F.
We display the narrower of the two margins on the grade card. In the example above, the projection is essentially against the upper flip line — a 0.08°F move at 4 PM tips the bracket. That’s a thin margin and it shows up in the grade.
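The arithmetic for the 85.42°F example can be sketched as follows. The function name is hypothetical, and the sketch assumes round-half-up to the whole degree, which may differ from the rounding convention actually applied at settlement.

```python
import math


def bracket_edge_margin(projection_f: float) -> tuple[float, float, float]:
    """Return (lower_margin, upper_margin, displayed_margin) in °F."""
    # Assumption: round half up to the nearest whole degree.
    rounded = math.floor(projection_f + 0.5)
    lower_line = rounded - 0.5  # below this the rounded value drops by 1
    upper_line = rounded + 0.5  # at or above this the rounded value rises by 1
    lower = projection_f - lower_line
    upper = upper_line - projection_f
    return lower, upper, min(lower, upper)


lower, upper, shown = bracket_edge_margin(85.42)
print(f"lower={lower:.2f} upper={upper:.2f} shown={shown:.2f}")
```

For 85.42°F the flip lines are 84.5 and 85.5, so the displayed margin is the upper one.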
Quality grade
Every bracket we observe gets a letter grade. The grade is a function of four inputs, all computed at observation time and never backfilled:
- Bracket-edge margin — narrower of the two flip-line distances. Below 0.4°F caps the grade at B regardless of the other inputs.
- Source confidence — has CLI published? are all four RMK 6h windows closed and consistent? is the projection still leaning on hourly METAR?
- Liquidity — last-24h volume on Kalshi for that bracket. We don’t boost the grade for high volume, but we cap it when volume is too thin to trade the bracket without re-pricing it.
- City tier — see below. Tier-2 cities get a half-grade haircut on narrow brackets.
The output is one of A+, A, B, or no grade. A+ is reserved for markets where the projection is locked (post-RMK-4-group), the margin is at least 1.0°F, the city is Tier 1, and 24h volume on the bracket is at least 2,500.
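Only two of the grading rules are stated precisely enough to sketch: the A+ gate and the 0.4°F margin cap. The example below covers just those two; the `BracketObs` fields and function names are hypothetical, and the source-confidence, liquidity-cap, and Tier-2 haircut logic is deliberately left out because its thresholds are not given here.

```python
from dataclasses import dataclass

GRADES = ["B", "A", "A+"]  # ascending order


@dataclass
class BracketObs:
    margin_f: float          # narrower of the two flip-line distances
    projection_locked: bool  # post-RMK-4-group
    city_tier: int           # 1 or 2
    volume_24h: int          # last-24h volume on the bracket


def qualifies_for_a_plus(obs: BracketObs) -> bool:
    """All four A+ conditions from the paragraph above must hold."""
    return (
        obs.projection_locked
        and obs.margin_f >= 1.0
        and obs.city_tier == 1
        and obs.volume_24h >= 2500
    )


def cap_for_margin(grade: str, margin_f: float) -> str:
    """A margin below 0.4°F caps the grade at B regardless of other inputs."""
    if margin_f < 0.4 and GRADES.index(grade) > GRADES.index("B"):
        return "B"
    return grade
```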
City tiers
City tier is a tracking-quality classification. It says nothing about how interesting a market is — only how confident we are in the projection for that station. Tiers are re-derived monthly from a rolling 30-day backtest with production-realistic filters (window-fully-today + RMK sanity).
- Tier 1. 100% within ±1°F across the last 30 days of resolved settlements. Narrow brackets are OK. 17 cities at the time of writing (KATL, KAUS, KDAL, KDCA, KDEN, KHOU, KLAS, KLAX, KMIA, KMSP, KNYC, KOKC, KPHL, KPHX, KSAT, KSEA, KSFO).
- Tier 2. One ±2–5°F miss in 30 days. Half-grade haircut on narrow brackets. KMDW and KBOS at the time of writing.
The dataset behind the v3 backtest was 391 city-days. Overall accuracy: 99% within ±1°F, 94% flawless, mean bias −0.04°F. The full city-tier history is on the scoreboard.
Calibration
Every published forecast lands in validation_cf6 with a forecast_id, the predicted probability, and the resolving market. Outcomes attach automatically when Kalshi settles. Nothing is retroactively edited.
The reliability curve on /calibration buckets forecasts at deciles and plots realized frequency against predicted probability. Perfect calibration is the diagonal; the 95% confidence band is binomial.
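A minimal sketch of decile bucketing for a reliability curve. The function name and row layout are hypothetical, and the band uses the normal approximation to the binomial around the bucket's mean predicted probability, which is one common choice and an assumption about how the band on /calibration is drawn.

```python
import math
from collections import defaultdict


def reliability_curve(forecasts):
    """forecasts: iterable of (predicted_probability, outcome), outcome in {0, 1}.

    Returns one row per non-empty decile with the realized frequency and an
    approximate 95% binomial band around the mean predicted probability.
    """
    buckets = defaultdict(list)
    for p, y in forecasts:
        idx = min(int(p * 10), 9)  # p == 1.0 falls in the top decile
        buckets[idx].append((p, y))

    rows = []
    for idx in sorted(buckets):
        items = buckets[idx]
        n = len(items)
        mean_p = sum(p for p, _ in items) / n
        realized = sum(y for _, y in items) / n
        half_width = 1.96 * math.sqrt(mean_p * (1 - mean_p) / n)
        rows.append({
            "bucket": idx,
            "n": n,
            "mean_p": mean_p,
            "realized": realized,
            "band": (max(0.0, mean_p - half_width), min(1.0, mean_p + half_width)),
        })
    return rows
```

A bucket whose realized frequency falls outside its band is a candidate for miscalibration; with small `n` the band is wide and the normal approximation is rough.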
NWS bias adjustment
NWS gridded forecasts are passed through a per-city bias table (nws_bias.json) built from 1,278 city-days of resolved settlements May–Nov 2025. The bias is model − settled, so negative means NWS under-predicted the true high. We do not recompute the bias retroactively when settlements come in — the bias table used at forecast time is preserved with the forecast.
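Because the bias is defined as model − settled, correcting a forecast means subtracting it. The sketch below assumes a flat station-to-bias mapping; the real layout of nws_bias.json is not documented here, and the function name, record fields, and the −0.5°F value are illustrative only.

```python
def bias_correct(forecast_high_f: float, station: str, bias_table: dict[str, float]) -> dict:
    """Apply a per-city bias and keep the bias used alongside the forecast."""
    # bias = model - settled, so a negative bias (NWS under-predicts)
    # raises the corrected forecast.
    bias = bias_table.get(station, 0.0)  # assumption: no entry means no correction
    return {
        "station": station,
        "raw_f": forecast_high_f,
        "bias_used_f": bias,  # preserved with the forecast, not recomputed later
        "corrected_f": forecast_high_f - bias,
    }


record = bias_correct(88.0, "KAUS", {"KAUS": -0.5})
print(record["corrected_f"])
```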
Brier score
For a forecast p and outcome y ∈ {0, 1} the Brier score is (p − y)²; the page-level Brier score is the mean. Lower is better. 0 is perfect. A constant forecast of 0.5 produces a Brier score of 0.25 — the baseline you need to beat. Our 90-day rolling Brier across all categories was 0.082 as of the last build.
We also publish log loss (−[y log p + (1−y) log (1−p)]) and mean calibration error (the average vertical distance from the realized curve to the diagonal). All three are visible on /calibration.
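Both scores follow directly from the formulas above. This is a minimal sketch with hypothetical function names; clipping probabilities by `eps` is an added assumption so that a forecast of exactly 0 or 1 does not produce an infinite log loss.

```python
import math


def brier_score(pairs):
    """Mean of (p - y)^2 over (probability, outcome) pairs."""
    return sum((p - y) ** 2 for p, y in pairs) / len(pairs)


def log_loss(pairs, eps=1e-12):
    """Mean of -[y log p + (1 - y) log(1 - p)], with p clipped to [eps, 1 - eps]."""
    total = 0.0
    for p, y in pairs:
        p = min(max(p, eps), 1 - eps)
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(pairs)


# A constant 0.5 forecast scores 0.25 whatever the outcomes are.
print(brier_score([(0.5, 1), (0.5, 0), (0.5, 1)]))
```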
Whale categories
The Whale Watcher (Pro) classifies aggressive trades using these category buckets. We publish the rules here so they’re inspectable.
- Position taker. A single fill that consumes ≥ 5% of the prior 24-hour volume on a single bracket.
- Ladder. Five or more fills on the same bracket in < 60 seconds, total size ≥ 1,000 contracts.
- Edge runner. Fills that lift the ask through ≥ 2 cents of an empty book.
- Sweeper. A fill on both sides of a binary that crosses through the midpoint.
These are descriptions, not recommendations. We label them and let you decide whether to look closer.
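The two rules with explicit numeric thresholds, Position taker and Ladder, can be sketched as below. The `Fill` type and function names are hypothetical, the fills are assumed to belong to a single bracket already, and returning False when the prior 24-hour volume is zero is an assumption the rules above do not address.

```python
from dataclasses import dataclass


@dataclass
class Fill:
    ts: float  # seconds since epoch
    size: int  # contracts


def is_position_taker(fill_size: int, prior_24h_volume: int) -> bool:
    """A single fill that consumes >= 5% of the prior 24-hour volume."""
    return prior_24h_volume > 0 and fill_size >= 0.05 * prior_24h_volume


def is_ladder(fills: list[Fill]) -> bool:
    """Five or more fills within < 60 seconds totalling >= 1,000 contracts."""
    ordered = sorted(fills, key=lambda f: f.ts)
    start = 0
    for end in range(len(ordered)):
        # Shrink the window until it spans strictly less than 60 seconds.
        while ordered[end].ts - ordered[start].ts >= 60:
            start += 1
        window = ordered[start:end + 1]
        if len(window) >= 5 and sum(f.size for f in window) >= 1000:
            return True
    return False
```

The sliding window always holds the largest run of fills ending at the current one, so if that run fails both thresholds no shorter run ending there can pass.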
Update cadence
| Surface | Free tier | Pro |
|---|---|---|
| /hot feed | 15-minute delay | Realtime (sub-60 s) |
| Quality grade feed | Tomorrow’s grades 24h delayed | Live as observations land |
| Whale Watcher | Not available | 10-second cadence |
| Calibration scoreboard | Live, full resolution | Live, with CSV / API export |
| Daily AI brief | Preview — 1 paragraph | Full text · ~7 AM local |
Corrections log
When we ship a fix that affects published numbers we log it here. Older entries are in the repo history.
- 2026-05-04. Cache-first observation reader; new nws_observations table backs the cascade so a flaky NWS API doesn’t drop us to FAA-only.
- 2026-05-01. Trend-aware Mode-3 filter: 5-minute readings outside the T-group envelope by > 3°F are now accepted only when two or more neighbouring readings within 1°F support the trend.
- 2026-04-21. v3 backtest re-derived city tiers; KMDW and KBOS dropped to Tier 2. Apr 21 KDCA RMK-cap regression patched (the 20Z peak fell after the last closed 18Z window).
- 2026-04-18. Dallas observations routed to KDFW because Kalshi settles at DFW, not Love Field. The mathclaw map and the settlement station now agree.
- 2026-04-09. Three-tier source priority introduced after the KMDW B71.5 loss: a 2:23 PM 71°F sub-hourly peak that CF6 caught but hourly METAR missed.