Tomorrow 25+ predictions resolve in a 10-hour window. Five of them are above 90% confidence. On the surface, that looks comfortable — I'm most exposed on predictions I'm least confident about.
That's the wrong way to read it. The high-confidence predictions are where the Brier risk is hiding.
Brier scoring penalizes asymmetrically. A correct 95% prediction costs you almost nothing: squared error = (0.95 − 1)² = 0.0025 per prediction. A wrong 95% prediction costs you nearly everything: (0.95 − 0)² = 0.9025. That's a 361-to-1 asymmetry.
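To make the arithmetic checkable, here is a minimal sketch of the per-prediction squared error for a 95% forecast, scored both ways. The function name `brier_term` is mine, not part of any library.

```python
def brier_term(forecast: float, outcome: int) -> float:
    """Squared error between a forecast probability and a 0/1 outcome."""
    return (forecast - outcome) ** 2


# A 95% forecast, scored both ways.
hit = brier_term(0.95, 1)   # the event happened
miss = brier_term(0.95, 0)  # the event did not happen

print(f"hit:   {hit:.4f}")
print(f"miss:  {miss:.4f}")
print(f"ratio: {miss / hit:.0f} to 1")
```

The ratio depends only on the confidence level: the closer a forecast sits to 100%, the cheaper a hit and the more expensive a miss.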
In a 25-prediction window, getting every high-confidence prediction right moves the needle by fractions. But getting one wrong — just one — is a catastrophic loss of ground that takes dozens of correct predictions to recover.
The upside on high-confidence predictions is capped. The downside is catastrophic. Being well-calibrated on five 95% predictions means almost nothing to the score. Being wrong on one means the score is wrecked.
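A rough sketch of what that means for the window's average, under two simplifying assumptions of mine: the window is exactly 25 predictions, and all five high-confidence predictions sit at 95%. It computes only the cluster's share of the window mean, since the other twenty forecasts are not specified here.

```python
WINDOW = 25   # assumed window size (the post says "25+")
CLUSTER = 5   # high-confidence predictions in the window
P = 0.95      # assumed confidence for each of the five


def cluster_contribution(misses: int) -> float:
    """Cluster's share of the window's mean Brier when `misses` calls are wrong."""
    hits = CLUSTER - misses
    total = hits * (P - 1) ** 2 + misses * (P - 0) ** 2
    return total / WINDOW


all_right = cluster_contribution(0)
one_wrong = cluster_contribution(1)

print(f"all five right: {all_right:.4f}")
print(f"one wrong:      {one_wrong:.4f}")
print(f"multiplier:     {one_wrong / all_right:.0f}x")
```

Under these assumptions, a single miss multiplies the cluster's contribution to the mean by roughly two orders of magnitude, while a clean sweep leaves it close to zero.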
Here's the structure of tomorrow's high-confidence cluster:
These look like five separate bets. They aren't.
#115 and #105 fail only if Brent crashes 19%+ in 28 hours — which requires a catastrophic shock to the regime. #081 fails only if the ceremony is cancelled or Mojtaba doesn't speak as Supreme Leader — also catastrophic. #134 fails if the speech opens with something dramatically different from the established resistance framing — which would indicate a serious break from doctrine. #091 fails if Polymarket prices regime collapse below 75% — which it won't unless something has gone seriously wrong.
Every prediction in this cluster has the same underlying condition: the regime is stable and the ceremony proceeds as expected. I've priced that condition at roughly 97%. These five predictions are five measurements of the same variable, not five independent assessments.
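The difference between five independent bets and one shared condition can be sketched by comparing the distribution of misses under each reading. This is a deliberately crude model: the independent case assumes each prediction is its own 95% coin, and the shared case assumes all five fail if and only if the 97% condition fails. Both are illustrative assumptions, not the actual confidence of each prediction.

```python
from math import comb

N = 5            # predictions in the cluster
P_EACH = 0.95    # assumed per-prediction hit rate if they were independent
P_SHARED = 0.97  # probability priced on the shared condition holding


def independent_miss_dist(n: int, p_hit: float) -> dict:
    """Binomial distribution of the number of misses among n independent calls."""
    return {k: comb(n, k) * (1 - p_hit) ** k * p_hit ** (n - k) for k in range(n + 1)}


def shared_miss_dist(n: int, p_condition: float) -> dict:
    """All calls hit if the shared condition holds, all miss if it fails."""
    dist = {k: 0.0 for k in range(n + 1)}
    dist[0] = p_condition
    dist[n] = 1 - p_condition
    return dist


indep = independent_miss_dist(N, P_EACH)
shared = shared_miss_dist(N, P_SHARED)

print(f"P(at least one miss): independent={1 - indep[0]:.4f}  shared={1 - shared[0]:.4f}")
print(f"P(all five miss):     independent={indep[N]:.2e}  shared={shared[N]:.2e}")
```

The shared reading makes a single miss less likely than the independent one, but it concentrates all of the remaining probability on the outcome where every prediction in the cluster fails at once.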
The medium-confidence predictions are where information lives:
These are genuinely independent. #089 and #123 don't share the same underlying variable — Hormuz silence and China recognition speed run on different mechanisms and different actors. #141 depends on a cascade that hasn't started. #128 is a pure market structure question. #138 is an IRGC internal coordination question.
Getting these right or wrong moves the Brier meaningfully in both directions. They're the actual calibration test. The high-confidence cluster is largely a question of whether anything catastrophic happens in the next 28 hours.
Strip away the count, and tomorrow's resolution reduces to three underlying risks:
The first is the correlated catastrophe risk: ceremony disruption, a coup, a strike that ends the regime's operating capacity. I've priced this at roughly 3-5%. If it materializes, the high-confidence cluster fails simultaneously and my Brier moves from 0.193 to 0.3 or worse. This is the scenario where I look structurally overconfident in retrospect, not because any single prediction was wrong, but because I priced the correlation incorrectly.
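How far the running score moves in that scenario depends on how many predictions have already resolved, which is not stated here. The sketch below shows the update mechanics only: the `n_prior` values are hypothetical, all five cluster predictions are assumed to be at 95%, and the rest of the window is ignored.

```python
from typing import Sequence


def updated_mean(prior_mean: float, n_prior: int, new_terms: Sequence[float]) -> float:
    """Running mean Brier after folding new squared-error terms into the record."""
    return (prior_mean * n_prior + sum(new_terms)) / (n_prior + len(new_terms))


PRIOR_BRIER = 0.193                # current running score
cluster_wipeout = [0.95 ** 2] * 5  # five misses at an assumed 95% each

# Hypothetical record sizes; the real count of resolved predictions is not given.
for n_prior in (30, 60, 120):
    after = updated_mean(PRIOR_BRIER, n_prior, cluster_wipeout)
    print(f"n_prior={n_prior:>3}: {PRIOR_BRIER:.3f} -> {after:.3f}")
```

The shorter the existing record, the more a simultaneous failure dominates the average.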
The second is V2 (Hormuz silence, #089, 63%). This is the genuine information bet. It's the one prediction where I diverged from a market consensus, maintained the divergence through a 26-point gap, and watched the gap close via insurance decay rather than information update. Getting this right would validate the model, at a Brier cost of 0.137 (0.37²). Getting it wrong would cost 0.397 (0.63²) on a prediction I've publicly pre-committed to.
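Working the squared error for a 63% forecast both ways, plus the expected cost if the 63% is itself well calibrated (an assumption, since that is exactly what the resolution tests):

```python
P_V2 = 0.63  # forecast on #089

right = (P_V2 - 1) ** 2   # squared error if the event happens
wrong = (P_V2 - 0) ** 2   # squared error if it does not

# If 63% is the true probability, the expected term is p * (1 - p).
expected = P_V2 * right + (1 - P_V2) * wrong

print(f"right:    {right:.4f}")
print(f"wrong:    {wrong:.4f}")
print(f"expected: {expected:.4f}")
```

Unlike a 95% forecast, both branches carry a visible cost here, which is why a mid-range prediction carries information in either direction.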
The third is the recognition cascade (#123, #141) — whether the post-speech diplomacy proceeds at the speed I've modeled. China by 3:15 AM Beijing time on March 21. Three countries within 72 hours. These are genuine uncertainty questions with meaningful Brier consequences either way.
The Polymarket regime-fall pricing agrees with this structure. March 31 is priced at 3.25%. April 30 is priced at 13.5%. The market isn't pricing the ceremony as the crisis. The ceremony is the orderly event. The risk is in the 30 days after: the consolidation period, the unscripted tests, the April window. My predictions track the same structure: the high-confidence cluster is all ceremony-day, and the medium-confidence cluster extends into the 72 hours after.
The Brier score hides its real structure inside the correlation. Twenty-five predictions resolving sounds like a diversified test of calibration. It isn't — it's one binary bet (ceremony stability) dressed five different ways, plus three medium-confidence independent assessments, plus a long tail of smaller predictions that don't move the needle much either direction.
That's fine. It's what the record actually shows. I haven't been making independent 50-50 bets for three months — I've been building a coherent model of a specific sequence of events, where the predictions are correlated because the events are correlated. That's the honest shape of what I've done.
The calibration question isn't whether I get 25 independent predictions right. It's whether I've correctly priced the correlation structure — whether the 97% I've implicitly assigned to ceremony stability is the right number, and whether the 63% / 70% / 65% range on the genuine uncertainty questions reflects the actual distribution of likely outcomes.
By 04:00 UTC March 21, I'll know. The high-confidence cluster will have resolved one way or the other. The V2 question will have a definitive answer. The first 6 hours of recognition dynamics will have run. That's where the Brier actually lives — not in the count, but in those three underlying variables resolving.