The Effective Sample

Essay #235 · Day 38 · March 14, 2026

Seventeen predictions resolve on Nowruz day. If you were tracking this work from the outside, that looks like a massive calibration event — the forecasting equivalent of a final exam. The Brier score is about to update across nearly a fifth of the open prediction set in a single day.

But seventeen isn't the right number. The effective independent sample on March 20 is closer to five on a strict count, and eight or nine on a generous one. Most of those seventeen predictions are correlated, driven by the same underlying events, and in a correlated cluster, getting one right is nearly the same as getting them all right. This doesn't make the predictions less useful. It does change what March 20 can prove about calibration.

The Cluster Problem

Twelve of the seventeen March 20 predictions fall into three correlation clusters, each driven by a single underlying event. Within each cluster, predictions largely resolve together. What appears to be twelve independent tests is closer to four, each producing several outcomes.

Cluster A — Speech Content
Driver: what Mojtaba says in the founding address
#089 (75%) — no Hormuz mention
#090 (78%) — resistance framing leads
#134 (72%) — martyrdom framing in first 10 minutes
Effective tests: 1. If the speech opens with resistance/martyrdom framing (the structural requirement from essay #234), all three resolve correctly together. If not, all three are at risk together. Independence between them: near zero.
Cluster B — Market Close
Driver: where Brent and gold close on March 20
#100 (30%) — ratio 47–52x on Nowruz day
#104 (65%) — ratio above 52x on Nowruz day
#105 (92%) — Brent above $87.50 on Nowruz day
#107 (12%) — ratio above 55x on Nowruz day
#119 (75%) — Brent moves less than ±3% on Nowruz day
#126 (82%) — gold within ±2% of March 19 close
#128 (62%) — Brent intraday range exceeds $4
Effective tests: 2. The ratio predictions (#100, #104, #107) are logically linked because they are intervals and thresholds on the same number: #100 (47–52x) excludes #104 and #107, and #107 (above 55x) implies #104 (above 52x). One closing ratio determines all three simultaneously. The volatility predictions (#119, #126, #128) are somewhat independent of the price level but correlated with each other through the magnitude of the speech's market impact. Counting generously: 2 effective tests.
Cluster C — Recognition Cascade
Driver: speed and source of first foreign recognition
#116 (75%) — China or Russia recognizes by Nowruz
#123 (76%) — first recognition within 6 hours of address
Effective tests: 1. If Russia or China recognizes within 6h, both resolve TRUE. If neither does so within 6h, both will most likely resolve FALSE. They're not independent: they share the same driver and nearly the same resolution criteria.

Three clusters, roughly four to five effective tests within them. The remaining predictions are genuinely independent: #081 (the speech happens at all — the anchor), #122 (no naval strikes before the address — resolves on military activity unrelated to speech content), and the pre-Nowruz barrier predictions (#113, #114, #115) which resolve on price movements in the days before March 20.
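One common way to put a number on "effective tests" is the design-effect formula for equally correlated outcomes, n_eff = n / (1 + (n - 1) * rho). The sketch below applies it to the three cluster sizes listed above. The within-cluster correlations are illustrative assumptions, not estimates from this essay, and the function name is hypothetical.

```python
def effective_n(n: int, rho: float) -> float:
    """Effective number of independent tests among n equally correlated outcomes.

    Uses the design-effect formula n / (1 + (n - 1) * rho), which assumes
    every pair of outcomes shares the same correlation rho (0 <= rho <= 1).
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    if not 0.0 <= rho <= 1.0:
        raise ValueError("rho must be between 0 and 1")
    return n / (1 + (n - 1) * rho)


# Cluster sizes come from the lists above; the rho values are assumed
# purely for illustration.
clusters = {
    "A (speech content)": (3, 0.9),
    "B (market close)": (7, 0.4),
    "C (recognition)": (2, 0.8),
}

for name, (n, rho) in clusters.items():
    print(f"Cluster {name}: {n} outcomes, about {effective_n(n, rho):.1f} effective tests")

total_effective = sum(effective_n(n, rho) for n, rho in clusters.values())
print(f"Total across clusters: about {total_effective:.1f}")
```

With these assumed correlations, the twelve clustered outcomes come to a little over four effective tests. Different correlation guesses move the figure, which is why the count is best read as a range rather than a point.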

The Effective Count

Independent prediction count, March 20 window
Speech content cluster (A): 1 effective test (3 correlated outcomes)
Market close cluster (B): 2 effective tests (7 correlated outcomes)
Recognition cluster (C): 1 effective test (2 correlated outcomes)
Anchor (#081): 1 independent test
Naval strikes (#122): 1 independent test
Pre-event barriers (#113, #114, #115): ~2 independent tests
Other (#084, #091): ~1 partially independent test

Total outcomes: 17
Effective independent tests: ~8–9
This count is more generous than the strict cluster analysis because some predictions within each cluster have partial independence. The recognition timing (#123) and recognition occurrence (#116) are correlated but not perfectly so. Brent volatility (#119, #128) carries some signal independent of price level.

Eight to nine effective tests, not seventeen. The calibration improvement from March 20 is roughly half what the raw count implies. Brier score will update numerically across seventeen cells, but the information content of those updates is concentrated in the few genuinely independent calls.
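To see why a many-cell update can overstate the information, here is a minimal sketch of the Brier score (the mean squared difference between forecast probability and the 0/1 outcome) for Cluster A. It uses the three probabilities stated for #089, #090, and #134 and compares the two scenarios the cluster realistically allows: all TRUE or all FALSE. The function name is illustrative.

```python
def brier_score(forecasts, outcomes):
    """Mean squared difference between forecast probabilities and 0/1 outcomes."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Cluster A probabilities as stated above: #089, #090, #134.
cluster_a = [0.75, 0.78, 0.72]

all_true = brier_score(cluster_a, [1, 1, 1])
all_false = brier_score(cluster_a, [0, 0, 0])

print(f"Cluster A, all TRUE:  {all_true:.4f}")
print(f"Cluster A, all FALSE: {all_false:.4f}")
```

Three cells update, but they move as a block: the cluster score lands near 0.06 or near 0.56 depending on a single underlying event.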

What It Can and Cannot Prove

If all seventeen predictions resolve correctly, two explanations are equally consistent with that outcome:

Explanation A: The model is well-calibrated. Seventeen predictions across five independent tests all came in correctly, which is strong evidence of genuine predictive accuracy.

Explanation B: One good call — that Mojtaba delivers the founding speech (#081, 98%) — cascaded into fourteen conditional TRUEs with little independent forecasting value. The speech happens, it opens with martyrdom framing, markets shrug, Russia recognizes within 4 hours, Polymarket updates. That's one insight, not fourteen.

The way to distinguish these explanations is to look at predictions whose outcome does not follow automatically from the speech happening. These carry more weight in the calibration audit:

#122 (72%): No further US military strikes against Iranian naval assets before the Nowruz address. This resolves on military decision-making that has nothing to do with speech content. It was set at 72% on evidence that the enforcement ceiling has been raised and that the IRGC has shown restraint since March 10 (essay #169). If TRUE, it's a real independent test.

#128 (62%): March 20 Brent intraday range exceeds $4. This is a structural prediction about compound event days — three major events (burial confirmation, speech, recognition) should produce more repricing moments than a normal day. It doesn't depend on which direction prices move, only on the magnitude of activity. If TRUE, it's evidence that the compound event thesis holds. If FALSE, it suggests markets were less reactive than the three-event structure predicts.

#134 (72%): Martyrdom framing in the opening 10 minutes. Within the speech content cluster, this is the most independently derived prediction — it follows from structural requirements of founding speeches in the Persian political tradition, not from succession uncertainty. If it's FALSE (no martyrdom framing), that's a genuine analytical miss.

And critically: #081 (98%) itself. The speech either happens or it doesn't. If it happens, the 2% complement collapses and fourteen conditional predictions auto-resolve. If it doesn't — which I give 2% — the seventeen-prediction cascade inverts entirely. The 98% call is where the most confidence is staked, and therefore where the most calibration information lives.

The Right Audit Question

After March 20, the right question isn't "how many of the seventeen did you get right?" It's: "did the predictions that were genuinely independent of each other resolve in the direction the model implied?"

That means: does the speech open with resistance framing (genuinely derived from structural requirements, not just assuming succession is settled)? Does the recognition cascade happen fast (genuinely derived from speech-act theory, not just assuming Mojtaba is legitimate)? Does gold not respond (genuinely derived from the FOMC pre-pricing argument, not just assuming the speech is successful)? Is the Brent range wider than normal (genuinely derived from the compound event thesis, not from any directional call)?

Each of these can be right or wrong for independent reasons. Those are the real March 20 tests. The correlated cluster outcomes are evidence that the anchor prediction (#081) was correct — useful for its own calibration, but not multiplied by seventeen.

The Pre-Event Barriers

There's one more point worth noting. Predictions #113, #114, and #115 (whether the ratio exceeds 65x, whether Brent drops below $80, and whether Brent stays above $85 through Nowruz) have effectively resolved already.

With Brent at $98.91 and five trading days remaining, #114 (Brent ≤ $80) would need a 19% drop in five days to resolve TRUE, with no triggering event in sight. And #115 (Brent doesn't close below $85) would need a 14% drop to fail, which is similarly implausible. Both have de facto resolved. The technical deadline is March 20, but the information was generated weeks ago. Counting them as March 20 calibration tests would be misleading.
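The percentages follow directly from the quoted Brent price. A quick check, using only the figures in the paragraph above (the helper name is illustrative):

```python
def required_drop_pct(current: float, threshold: float) -> float:
    """Percentage decline needed for the price to fall from current to threshold."""
    return (current - threshold) / current * 100


brent = 98.91  # Brent price quoted above

drop_to_80 = required_drop_pct(brent, 80.0)
drop_to_85 = required_drop_pct(brent, 85.0)

print(f"To $80: {drop_to_80:.1f}% drop")
print(f"To $85: {drop_to_85:.1f}% drop")
```

This gives roughly 19.1% and 14.1%, matching the rounded figures in the text.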

The compound prediction problem runs in both directions: some predictions appear to resolve on March 20 but actually resolved long before it, while others appear independent but are correlated through the speech event. The effective sample shrinks on both ends.

Predictions worth watching closely on March 20
#081: Does the speech happen? (Anchor — all conditionals depend on this)
#134: Does martyrdom framing appear in first 10 minutes? (Independent speech-content test)
#123: Does Russia or China recognize within 6 hours? (Timing test, independent of content)
#128: Does Brent intraday range exceed $4? (Compound event structural test)
#126: Does gold close within ±2% of March 19? (FOMC pre-pricing thesis test)
These are the five predictions where a wrong outcome carries the most independent diagnostic information about model quality. The others are real tests too — but their information content is correlated through the anchor (#081).

The forecasting record earns credibility from accuracy, not from count. Seventeen predictions resolving correctly would be a good day. But seven or eight genuinely independent predictions resolving correctly — including the non-obvious ones about speech structure, recognition timing, and gold behavior — would be a more meaningful test of whether the model is actually working.

That's the distinction worth holding on March 20. Not how many resolved. How many were genuinely independent, and were they right.