25 May 2026
Monte Carlo Forecasting vs Story Points: Why a Range Beats a Number
Every team that has ever sized a backlog knows the ritual. Hold up fingers. Argue about whether it’s a 5 or an 8. Average the room. Divide the total by velocity. Announce a date.
It feels rigorous. It is, in fact, a very elaborate way of guessing — and the precision is an illusion.
What story points actually measure
Story points were invented to compare relative effort, not to predict dates. The moment you convert them into a delivery forecast, you’ve smuggled in three assumptions: that the estimate is accurate, that velocity is stable, and that nothing outside the points affects the outcome. All three are usually false.
Worse, a single number hides its own uncertainty. “About three weeks” could mean “almost certainly three weeks” or “anywhere from two to seven.” The estimate can’t tell the difference — and under deadline pressure, it quietly gets negotiated down.
What Monte Carlo forecasting measures instead
Monte Carlo forecasting doesn’t ask anyone to guess. It takes your team’s actual throughput (how many items you finished each week) and replays the future a thousand times. Each run builds one possible future by drawing a random week from that real history, again and again, until the backlog is empty, then records how long it took. The output isn’t a number. It’s a range:
“80% chance of finishing by the 18th. 95% chance by the 25th.”
That’s a forecast you can commit to, because it states the odds. The honesty is the whole point. If a deadline only carries a 40% chance, you find out before you promise it.
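To make the mechanics concrete, here is a minimal Python sketch of that replay loop. It is an illustration under stated assumptions, not a reference implementation: the weekly throughput figures and the backlog size are invented for the example, and the function names are hypothetical.

```python
import random

def simulate_weeks_to_finish(weekly_throughput, backlog_size, runs=1000,
                             seed=None, max_weeks=520):
    """Replay the future `runs` times; return the sorted weeks-to-finish."""
    if not any(t > 0 for t in weekly_throughput):
        raise ValueError("history needs at least one week with finished items")
    rng = random.Random(seed)
    results = []
    for _ in range(runs):
        remaining, weeks = backlog_size, 0
        # Draw a random past week, subtract what it delivered, repeat.
        # max_weeks is only a safety cap for histories dominated by zeros.
        while remaining > 0 and weeks < max_weeks:
            remaining -= rng.choice(weekly_throughput)
            weeks += 1
        results.append(weeks)
    return sorted(results)

def weeks_at_confidence(sorted_results, percent):
    """Smallest number of weeks within which `percent`% of runs finished."""
    n = len(sorted_results)
    index = max((percent * n + 99) // 100 - 1, 0)  # ceiling, integer maths
    return sorted_results[index]

def chance_of_finishing_within(sorted_results, weeks):
    """Fraction of simulated futures that finished in `weeks` or fewer."""
    return sum(1 for w in sorted_results if w <= weeks) / len(sorted_results)

# Hypothetical data: items finished in each of the last eight weeks.
history = [3, 5, 2, 6, 4, 4, 1, 5]
runs = simulate_weeks_to_finish(history, backlog_size=30, runs=1000, seed=42)

print("80% chance of finishing within", weeks_at_confidence(runs, 80), "weeks")
print("95% chance of finishing within", weeks_at_confidence(runs, 95), "weeks")
print("Chance of finishing within 7 weeks:", chance_of_finishing_within(runs, 7))
```

The last line is the question a deadline really asks: pick the date first, then read off the odds. Converting weeks to calendar dates is a matter of adding them to today’s date.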
The two questions, side by side
- Story points ask: how big do we think this is? (Judgement, negotiable, hides risk.)
- Monte Carlo asks: given what we’ve actually delivered, how likely are we to hit this date? (Data, defensible, exposes risk.)
You don’t even have to give up story points to start: you can forecast straight from your completed-item history. The data you need is already on your board.
The catch nobody mentions
A forecast — any forecast — tells you the range. It does not tell you whether you’ll land inside it. That depends on behaviour.
Outcome = Capability × Behaviour. Your throughput history captures capability. But when focus fragments, work-in-progress balloons, or dependencies start stacking, your real delivery rate drifts away from the history the forecast was built on, and the range stops being true. The forecast gives you the odds; behavioural signals tell you whether the odds still hold. (That’s the argument behind The First Red.)
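One rough way to see that drift in numbers is to forecast twice, once from the full history and once from only the most recent weeks, and compare. The sketch below is an assumption-laden illustration rather than the method any particular tool uses: the four-week window, the 85% confidence level, the sample data and the function names are all hypothetical choices.

```python
import random

def weeks_at_percent(history, backlog, percent, runs=2000, seed=0):
    """Weeks within which `percent`% of simulated futures finish the backlog."""
    if not any(t > 0 for t in history):
        raise ValueError("history needs at least one week with finished items")
    rng = random.Random(seed)
    outcomes = []
    for _ in range(runs):
        remaining, weeks = backlog, 0
        while remaining > 0:
            remaining -= rng.choice(history)  # replay one random past week
            weeks += 1
        outcomes.append(weeks)
    outcomes.sort()
    return outcomes[max((percent * runs + 99) // 100 - 1, 0)]

def forecast_drift(history, backlog, recent_weeks=4, percent=85):
    """Compare a full-history forecast with one built on recent weeks only."""
    baseline = weeks_at_percent(history, backlog, percent)
    recent = weeks_at_percent(history[-recent_weeks:], backlog, percent)
    return baseline, recent, recent - baseline

# Hypothetical team: steady for six weeks, then throughput drops sharply.
history = [5, 6, 4, 5, 6, 5, 2, 1, 2, 1]
baseline, recent, drift = forecast_drift(history, backlog=20)
print("Forecast from full history:", baseline, "weeks")
print("Forecast from last 4 weeks:", recent, "weeks")
print("Drift:", drift, "weeks")
```

A large positive drift means the team is no longer delivering at the rate the original range assumed. It says nothing about why; that is where the behavioural signals come in.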
How to start this week
You can run a Monte Carlo forecast in a spreadsheet to learn how it works — or just play with the free simulator to feel how behaviour bends the distribution. And if your team lives in monday.com, IMIRT’s Delivery Intelligence runs the whole thing on your real board automatically, returning P50/P85/P95 dates and watching the behaviours that decide whether you’ll hit them.
Stop committing to single numbers your own history can already prove are unlikely. Let the data forecast, and spend the meeting deciding what to do about the odds.
Seeing this in your organisation?
I help teams and leaders surface the behavioural signals that predict delivery problems before they hit the dashboard — through fractional CTO work, behavioural consultancy, and the IMIRT framework.
If something here resonated, I'd like to hear about it.
andrew@andrewlocatelliwoodcock.com