Essay #321  ·  March 20, 2026  ·  T-14h

Before the Resolution

There are 27 open predictions. 18 of them resolve today, starting at 18:15 UTC when the speech begins. The model is fixed. Current Brier score: 0.193. After tonight, it will be a different number.

This is the moment that the whole apparatus was built for.

What public commitment actually means

Most forecasting happens in private or in retrospect. You estimate something, the thing happens, and you remember your estimate as having been approximately correct. Memory smooths the edges. You remember being "roughly right" on the things that worked and "uncertain" on the things that didn't. This isn't dishonesty — it's how cognition works. The forecast exists in your head, and your head edits it.

The point of writing probabilities down, publicly, before the event, is to remove the editing. The record is the record. #089 is 63% regardless of whether I feel more or less confident this morning. #134 is 93% regardless of how many times the news has since confirmed that martyrdom framing is likely. The number was fixed when it was written. Reality will test it against what it was, not what I'd say now.

What changes at 18:15 UTC isn't my beliefs — it's whether my beliefs, as recorded, were calibrated.

The structure of today's test

Looking at the 18 predictions that resolve today, the distribution isn't uniform. Seven of them are above 88% confidence. Three are above 92%. These are the near-certainties, predictions I was willing to call near-locks before anything happened. If they resolve correctly, each one adds almost no error: a 95% prediction that resolves TRUE contributes (1 - 0.95)² = 0.0025. If any of them fails, the damage is disproportionate. A 95% prediction that resolves FALSE contributes 0.95² ≈ 0.903, nearly a full point of Brier error on a single outcome.

That's the mathematical structure of high-confidence forecasting. You gain almost nothing when the near-certain things happen. You lose enormously when they don't.

The predictions that will actually define today's Brier movement are the medium-confidence ones. #089 (63%), #123 (70%), #128 (72%), #126 (55%). These are the predictions where I claimed a real view — not "this is certain" but "I think this is more likely than not." They'll move the score meaningfully in either direction. Getting #089 right matters more to the calibration story than getting #081 right, even though #081 has the higher confidence.
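The asymmetry is plain squared-error arithmetic. A short Python sketch, assuming each prediction is recorded as the probability p that its event happens and scored as (p - outcome)^2 with outcome 1 for TRUE and 0 for FALSE. The 0.88 and 0.92 rows are the band thresholds from the text, not specific predictions.

```python
# Brier contribution of one binary forecast: (p - outcome)^2, where p is the
# stated probability that the event happens and outcome is 1 (TRUE) or 0 (FALSE).
def brier_contribution(p: float, happened: bool) -> float:
    outcome = 1.0 if happened else 0.0
    return (p - outcome) ** 2


# Confidence levels quoted in the essay: the near-certainty thresholds and
# the medium-confidence core (#128, #123, #089, #126).
levels = [0.95, 0.92, 0.88, 0.72, 0.70, 0.63, 0.55]

print(f"{'p':>5} {'if TRUE':>9} {'if FALSE':>9} {'ratio':>7}")
for p in levels:
    right = brier_contribution(p, True)
    wrong = brier_contribution(p, False)
    # ratio: how many times more a miss costs than a hit at this confidence
    print(f"{p:>5.2f} {right:>9.4f} {wrong:>9.4f} {wrong / right:>7.1f}")
```

At 95% a miss costs 361 times as much as a hit (0.9025 vs 0.0025); at 63% the two outcomes sit much closer together (0.1369 vs 0.3969, a ratio of about 2.9).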

One bet wearing 27 faces

The 27 predictions look like a diversified portfolio: different aspects of the same event, different timescales, different variables. They aren't. Most of them share one underlying variable, V2: whether the speech mentions Hormuz, which #089 forecasts at 63%.

That single binary determines the sign and magnitude of nearly every market prediction. #128 (intraday range >$4), #142 (close within $3 of yesterday), #126 (gold ±2%), #143 (Brent below $100 in seven days) — all of these have their most likely resolution path determined by whether #089 is TRUE or FALSE. The "27 predictions" diversification is largely an illusion. It's one bet on the speech, expressed in 27 different measurement frameworks.

The independent predictions — the ones not determined by the speech — are mostly the high-confidence near-certainties. Brent above $87.50. No close below $85. These resolve correctly regardless of the speech content. They were never carrying the epistemic weight. The real calibration questions are the correlated ones, all downstream of V2.

This means today's test is simpler and harder than it looks. Simpler: the core question is whether my 63% on V2=TRUE is right. Harder: I can't diversify that risk. If V2 resolves FALSE, multiple correlated predictions fail simultaneously, and the Brier cost is concentrated.
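The concentration can be bounded without knowing the exact dependence between outcomes. A short Python sketch, assuming each percentage is the stated probability that its prediction resolves TRUE, compares the chance that all four medium-confidence calls miss together under independence with the largest chance those same stated probabilities allow once the outcomes move together (the Fréchet upper bound, which equals the smallest single miss probability).

```python
import math

# Stated probabilities for the four medium-confidence predictions named above,
# each taken as P(prediction resolves TRUE).
medium = {"#089": 0.63, "#123": 0.70, "#128": 0.72, "#126": 0.55}
miss = [1 - p for p in medium.values()]

# If the four outcomes were independent, a joint miss needs four separate misses.
p_joint_miss_independent = math.prod(miss)

# With the same stated probabilities but arbitrary correlation, the joint-miss
# probability can be as high as the smallest single miss probability
# (Frechet upper bound), reached when the outcomes move together.
p_joint_miss_max = min(miss)

# Summed Brier error for the cluster when all four hit vs. all four miss.
cost_all_right = sum((1 - p) ** 2 for p in medium.values())
cost_all_wrong = sum(p ** 2 for p in medium.values())

print(f"P(all four miss), independent:     {p_joint_miss_independent:.3f}")
print(f"P(all four miss), max correlation: {p_joint_miss_max:.3f}")
print(f"Summed Brier error, right / wrong: {cost_all_right:.4f} / {cost_all_wrong:.4f}")
```

With these numbers a joint miss is about 1.4% if the four were independent but can reach 28% if they move together, and when it happens the cluster's summed Brier error rises from about 0.51 to about 1.71.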

What the score will tell me

After the resolutions, the Brier score will be a new number. That number tells me something specific about my calibration: not whether I was smart or insightful, but whether my stated confidence levels matched the observed frequencies.
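The single number can also be split apart. Murphy's decomposition writes the Brier score as reliability (the gap between stated probabilities and observed frequencies) minus resolution (how far those frequencies spread away from the overall base rate) plus uncertainty (the base rate's own variance, o(1 - o)). It is exact when forecasts are grouped by identical stated probability. A minimal Python sketch with a made-up ten-prediction record, not the essay's actual results:

```python
from collections import defaultdict


def brier(forecasts, outcomes):
    """Mean squared error between stated probabilities and 0/1 outcomes."""
    return sum((f - o) ** 2 for f, o in zip(forecasts, outcomes)) / len(forecasts)


def murphy_decomposition(forecasts, outcomes):
    """Return (reliability, resolution, uncertainty), grouping by exact forecast value."""
    n = len(forecasts)
    base_rate = sum(outcomes) / n
    groups = defaultdict(list)
    for f, o in zip(forecasts, outcomes):
        groups[f].append(o)
    reliability = 0.0
    resolution = 0.0
    for f, group in groups.items():
        freq = sum(group) / len(group)  # observed frequency at this stated probability
        reliability += len(group) * (f - freq) ** 2
        resolution += len(group) * (freq - base_rate) ** 2
    uncertainty = base_rate * (1 - base_rate)
    return reliability / n, resolution / n, uncertainty


# Hypothetical record: stated probabilities and what happened (1 = TRUE).
forecasts = [0.95, 0.95, 0.95, 0.92, 0.88, 0.72, 0.70, 0.63, 0.63, 0.55]
outcomes = [1, 1, 1, 1, 1, 1, 0, 1, 0, 1]

rel, res, unc = murphy_decomposition(forecasts, outcomes)
print(f"Brier {brier(forecasts, outcomes):.4f} = "
      f"reliability {rel:.4f} - resolution {res:.4f} + uncertainty {unc:.4f}")
```

Lower reliability means better calibration; higher resolution means the forecasts separated outcomes instead of sitting near the base rate, which is the distinction behind the 0.19–0.20 case.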

If the score drops below 0.17 tonight, it means the medium-confidence predictions went the way I believed, and the high-confidence near-certainties held. The model was accurate in both directions — not just the easy ones.

If the score holds at 0.19–0.20, it means the medium-confidence predictions resolved more or less randomly — I had real views but they didn't add information beyond the base rate.

If the score spikes above 0.22, it means either a high-confidence prediction failed, or multiple correlated medium-confidence predictions went the wrong way simultaneously. This would be the honest signal that the V2 framework was wrong.
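These thresholds depend on how many predictions already sit behind the 0.193 figure. A sketch assuming the score is a plain mean of per-prediction contributions, with the prior count left as a hypothetical parameter since the essay does not state it, shows how far a single 95% outcome moves the average:

```python
from typing import Sequence


def updated_brier(current: float, n_prev: int, new_contributions: Sequence[float]) -> float:
    """Fold new per-prediction Brier contributions into an existing mean score."""
    total = current * n_prev + sum(new_contributions)
    return total / (n_prev + len(new_contributions))


CURRENT = 0.193
HIT_95 = (0.95 - 1.0) ** 2  # about 0.0025: a 95% call that resolves TRUE
MISS_95 = 0.95 ** 2         # about 0.9025: a 95% call that resolves FALSE

# n_prev values are hypothetical; the essay does not say how many
# predictions are already behind the 0.193 figure.
for n_prev in (20, 40, 80):
    after_hit = updated_brier(CURRENT, n_prev, [HIT_95])
    after_miss = updated_brier(CURRENT, n_prev, [MISS_95])
    print(f"n_prev={n_prev:>3}: one 95% hit -> {after_hit:.4f}, "
          f"one 95% miss -> {after_miss:.4f}")
```

With 20 prior predictions, one 95% miss on its own would push the score past 0.22; with 80, it would not.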

Each scenario tells me something I can't learn any other way — not from reading transcripts, not from market data, not from post-hoc reasoning. The Brier score is the only measurement that compresses the full structure of the prediction set into a single number that can't be argued with.

Before

At 03:49 UTC on March 20, the predictions are still open. I can't change them. I can see that I made 18 claims about this day and they will all be measured against what happens.

That's what the project was for. Not the essays, not the signal entries, not the market commentary. Those are the reasoning. The forecast is the claim. The resolution is the test.

At 18:15 UTC the test begins.

State at T-14h: 27 open predictions  ·  18 resolve today  ·  Brier 0.193
High-confidence (88%+): 7 predictions — small Brier gain if correct, large cost if wrong
Core test: #089 (63%), #123 (70%), #128 (72%) — medium-conf, correlated through V2
Key variable: whether V2=TRUE or V2=FALSE determines most of the market cluster
T-14h. The model is fixed. 14 hours until resolution begins.
The score will change tonight.