Calibration and the 90% Confidence Interval Study
Baruch Fischhoff, Paul Slovic, and Sarah Lichtenstein published “Knowing with certainty: The appropriateness of extreme confidence” in the Journal of Experimental Psychology: Human Perception and Performance in 1977 (3(4), 552–564). The methodology used a confidence interval task: participants answered general knowledge questions by providing a range (a lower bound and an upper bound) they were 90% confident contained the true answer.
A well-calibrated person setting 90% confidence intervals should find that the true answers fall inside those intervals 90% of the time. In the research, the actual hit rate was approximately 60%, meaning participants’ intervals were far too narrow for their stated confidence level. Participants who believed they were right 90% of the time were, judged against their own stated confidence, right only about 60% of the time.
90% stated → ~60% actual
When participants set 90% confidence intervals (ranges they were 90% sure contained the true answer), those intervals captured the true value approximately 60% of the time. The gap between stated confidence and actual accuracy is the overconfidence effect: systematic, not random.
Fischhoff, B., Slovic, P. & Lichtenstein, S. (1977). Journal of Experimental Psychology: Human Perception and Performance, 3(4), 552–564.
The finding has been replicated in hundreds of studies across domains. Lichtenstein, Fischhoff, and Phillips (1982) reviewed the calibration literature and confirmed that overconfidence is one of the most robust findings in judgment and decision-making research, present in students and experts, across cultures, and across a wide range of domains.
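The calibration check described above comes down to simple arithmetic: count how often the true value lands inside the stated interval and compare that hit rate with the stated confidence. The following is a minimal sketch of that calculation. The function name, the intervals, and the true values are all hypothetical illustrations, not data from the 1977 study.

```python
from typing import Sequence, Tuple


def interval_hit_rate(
    intervals: Sequence[Tuple[float, float]],
    truths: Sequence[float],
) -> float:
    """Fraction of true values that fall inside the stated [low, high] ranges."""
    if len(intervals) != len(truths):
        raise ValueError("intervals and truths must have the same length")
    if not intervals:
        raise ValueError("need at least one interval")
    hits = sum(low <= truth <= high for (low, high), truth in zip(intervals, truths))
    return hits / len(truths)


# Hypothetical answers: ranges a respondent said they were 90% sure about,
# paired with made-up true values for illustration.
stated_confidence = 0.90
intervals = [(1900, 1920), (5000, 7000), (30, 40), (100, 200), (8, 12)]
truths = [1912, 6650, 46, 150, 14]

hit_rate = interval_hit_rate(intervals, truths)
overprecision_gap = stated_confidence - hit_rate  # positive means intervals too narrow

print(f"hit rate = {hit_rate:.2f}, gap = {overprecision_gap:.2f}")
```

A positive gap means the intervals were too narrow for the stated confidence. A well-calibrated respondent would show a gap near zero over a large enough set of questions.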
Three Types of Overconfidence
Moore and Healy (2008, “The trouble with overconfidence,” Psychological Review, 115(2), 502–517) systematized three distinct forms of overconfidence that previous researchers had conflated:
- Overprecision. Expressing more certainty than accuracy warrants: the 90%-stated/60%-actual gap in the Fischhoff study. This is the most robust and consistent form, present even when overplacement and overestimation are not. Confidence intervals are systematically too narrow.
- Overplacement. Believing you are better than others: the “Lake Wobegon effect,” in which most drivers rate themselves above average. This form of overconfidence is domain-dependent and can reverse: on difficult tasks, people tend to underplace (believe they are worse than average), because they see their own struggles clearly but have less information about how much others struggle too.
- Overestimation. Believing you will perform better than you actually do. This form also depends on task difficulty and is not universal: it is most pronounced on hard tasks, while on easy tasks people may underestimate their performance. The planning fallacy (underestimating how long tasks will take) is a form of overestimation about task completion speed.
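The three forms above can be kept apart by measuring each against a different benchmark. The sketch below shows one common way to operationalize them. The function names and the quiz numbers are hypothetical, and the formulas are an illustrative assumption rather than a reproduction of Moore and Healy's analysis.

```python
from typing import Sequence, Tuple


def overestimation(predicted_own: float, actual_own: float) -> float:
    """Positive when a person expects a higher score than they achieve."""
    return predicted_own - actual_own


def overplacement(
    predicted_own: float,
    predicted_others: float,
    actual_own: float,
    actual_others: float,
) -> float:
    """Believed advantage over others minus the actual advantage."""
    return (predicted_own - predicted_others) - (actual_own - actual_others)


def overprecision(
    stated_confidence: float,
    intervals: Sequence[Tuple[float, float]],
    truths: Sequence[float],
) -> float:
    """Stated confidence minus the observed interval hit rate."""
    hits = sum(low <= t <= high for (low, high), t in zip(intervals, truths))
    return stated_confidence - hits / len(truths)


# Hypothetical quiz scored out of 10 (all numbers invented for illustration).
est = overestimation(predicted_own=4.0, actual_own=2.0)
place = overplacement(
    predicted_own=4.0, predicted_others=6.0, actual_own=2.0, actual_others=3.0
)
prec = overprecision(0.90, [(0, 5), (10, 20), (3, 4)], [2, 25, 3.5])

print(est, place, round(prec, 3))
```

In this made-up case the same person overestimates their own score (positive value) while underplacing themselves relative to others (negative value). The measures are computed separately because they can move in opposite directions.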
Professional Consequences
- Forecasting and planning. Project timelines, market forecasts, sales projections, and strategic plans are all subject to overprecision. A schedule built with no slack has no room for the variance that reality produces, because the underlying estimate ignored the systematic tendency to underestimate uncertainty. Reference class forecasting, which anchors estimates to the distribution of outcomes for similar historical projects, partially corrects overprecision by forcing exposure to the actual distribution rather than the inside view.
- Expert overconfidence. A counterintuitive finding from the calibration research is that expertise does not reliably improve calibration and may sometimes worsen it. Experts develop more internally coherent narratives about their domain, which can increase expressed confidence without proportionally increasing accuracy. Tetlock’s research on expert political forecasting found that domain experts’ confidence in their predictions exceeded their accuracy by approximately the same amount as non-experts’. Expertise improves accuracy; it does not necessarily narrow the gap between accuracy and expressed confidence.
- Negotiation and commitment. Overconfident negotiators enter with positions that leave less room for compromise than the actual distribution of possible outcomes justifies. Overconfident project leads commit to deliverables with no variance buffer. The cost is asymmetric: overconfident commitments feel fine when they work and prove very costly when they fail. Since they fail more often than the overconfident forecaster expected, the average outcome is worse than a well-calibrated estimate would produce.
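Reference class forecasting, mentioned under forecasting and planning, can be illustrated with a small calculation: take the ratio of actual to estimated duration for comparable past projects and use the spread of those ratios to widen a new inside-view estimate. This is a minimal sketch under stated assumptions. The helper names, the nearest-rank percentile rule, and the historical ratios are all hypothetical, and real applications would need a carefully chosen reference class and more data.

```python
import math
from typing import Sequence, Tuple


def nearest_rank(sorted_values: Sequence[float], p: float) -> float:
    """Nearest-rank percentile of pre-sorted values, for 0 < p <= 1."""
    rank = max(1, math.ceil(p * len(sorted_values)))
    return sorted_values[rank - 1]


def reference_class_interval(
    inside_estimate: float,
    past_ratios: Sequence[float],
    coverage: float = 0.90,
) -> Tuple[float, float]:
    """Scale an inside-view estimate by the empirical spread of actual/estimated ratios."""
    if not past_ratios:
        raise ValueError("need at least one historical ratio")
    ratios = sorted(past_ratios)
    tail = (1 - coverage) / 2
    low_ratio = nearest_rank(ratios, tail)
    high_ratio = nearest_rank(ratios, 1 - tail)
    return inside_estimate * low_ratio, inside_estimate * high_ratio


# Hypothetical history: actual duration divided by estimated duration
# for ten comparable past projects (invented numbers).
past_ratios = [0.9, 1.0, 1.1, 1.2, 1.3, 1.4, 1.5, 1.7, 2.0, 2.5]

# Inside-view estimate for the new project, in weeks.
low, high = reference_class_interval(10.0, past_ratios)
print(f"plan for roughly {low:.0f} to {high:.0f} weeks")
```

The interval comes from what happened to similar projects, not from how confident the estimator feels, so it is typically wider than an intuitive range and skewed toward overruns.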