The 2-4-6 Task
Peter Wason published “On the Failure to Eliminate Hypotheses in a Conceptual Task” in the Quarterly Journal of Experimental Psychology in 1960 (Vol. 12, pp. 129–140). The task is deceptively simple.
Participants were told that the sequence “2-4-6” conformed to a rule, and their task was to discover the rule by generating additional three-number sequences. The experimenter would tell them whether each proposed sequence conformed to the rule. When they were confident they had discovered the rule, they announced it.
The actual rule was simply “any three ascending numbers.” But participants consistently formed hypotheses like “even numbers ascending by two” or “even numbers in arithmetic progression,” then tested them by proposing sequences that fit: “6-8-10,” “14-16-18,” “100-102-104.” Each proposal confirmed the hypothesis, and participants grew increasingly confident in incorrect rules.
What almost no one did spontaneously was test a sequence that would falsify the hypothesis: “3-5-7” (which would have shown that odd numbers also fit the rule), or “1-5-20” (which would have shown that non-equal-interval sequences also fit). Falsifying tests are more informative than confirming tests, but people default to generating examples consistent with their current hypothesis. Klayman and Ha later named this the “positive test strategy”: testing by seeking cases the hypothesis predicts rather than cases that could refute it, which Wason interpreted as a bias toward confirmation over refutation.
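The asymmetry between the two kinds of test can be made concrete. The sketch below is a minimal illustration, not Wason's materials: the rule and hypothesis functions and the specific test triples are my own choices. It shows why triples chosen to fit the hypothesis cannot distinguish it from the true rule, while triples chosen to violate it can.

```python
def true_rule(a, b, c):
    """The experimenter's actual rule: any three strictly ascending numbers."""
    return a < b < c

def hypothesis(a, b, c):
    """A typical participant hypothesis: even numbers ascending by two."""
    return a % 2 == 0 and b == a + 2 and c == b + 2

# Positive tests: triples chosen to FIT the hypothesis.
positive_tests = [(6, 8, 10), (14, 16, 18), (100, 102, 104)]

# Falsifying tests: triples chosen to VIOLATE the hypothesis.
falsifying_tests = [(3, 5, 7), (1, 5, 20)]

for triple in positive_tests:
    # Both rules answer "yes": the result confirms the hypothesis
    # but carries no information that could distinguish the two.
    assert true_rule(*triple) and hypothesis(*triple)

for triple in falsifying_tests:
    # The experimenter still answers "yes" even though the hypothesis
    # predicts "no": only these tests reveal the hypothesis is too narrow.
    assert true_rule(*triple) and not hypothesis(*triple)
```

Every positive test yields the same answer under both rules, so no amount of them can expose the error; a single hypothesis-violating test does.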
The Selection Task
Wason introduced the selection task in a 1966 paper and gave it its canonical treatment in “Reasoning About a Rule” in the Quarterly Journal of Experimental Psychology in 1968 (Vol. 20, No. 3, pp. 273–281). The task uses a conditional rule, “If there is a vowel on one side of the card, then there is an even number on the other side,” and four cards showing A, K, 4, and 7.
To test whether the rule holds, which cards should you turn over? The correct answer is A (a vowel card, where you need to check if even is on the back) and 7 (an odd number card, where you need to check if a vowel is on the back, which would violate the rule). Most participants correctly choose A but also choose 4, the even number card, which cannot violate the rule regardless of what’s on the other side and provides no information about whether the rule holds.
The 4-card selection is the confirming choice: if the rule is “vowels go with even numbers,” then the even-number card feels like it should be checked. But checking it produces only a confirming or null result: it can tell you that the rule holds in one more case, but not that the rule is false. The 7 card, the disconfirming choice, is the one with actual logical power, and it is the card most frequently left unselected.
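The logic of the four cards can be checked exhaustively. The sketch below is my own encoding of the task, not from Wason's paper: a card is worth turning only if some possible hidden face could falsify “if vowel, then even.”

```python
VOWELS = set("AEIOU")

def is_vowel(x):
    return isinstance(x, str) and x in VOWELS

def is_even(x):
    return isinstance(x, int) and x % 2 == 0

def violates(visible, hidden):
    """The rule is falsified only by a vowel paired with an odd number."""
    letter, number = (visible, hidden) if isinstance(visible, str) else (hidden, visible)
    return is_vowel(letter) and not is_even(number)

def worth_turning(visible, possible_hidden):
    """A card is informative iff some hidden value could falsify the rule."""
    return any(violates(visible, h) for h in possible_hidden)

letters, numbers = ["A", "K"], [4, 7]
for card in ["A", "K", 4, 7]:
    hidden_options = numbers if isinstance(card, str) else letters
    print(card, worth_turning(card, hidden_options))
# Prints: A True, K False, 4 False, 7 True
```

The enumeration makes the 4 card's impotence explicit: whatever letter is on its back, `violates` returns False, so turning it can never refute the rule. Only A and 7 have a hidden value that could.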
Professional Manifestations
- Due diligence. Acquisition due diligence conducted by the team that proposed the acquisition is structurally biased: they have formed a hypothesis (this is a good acquisition) and are now gathering evidence. Without explicit processes that require them to generate and pursue falsifying hypotheses (reasons the acquisition will fail, evidence that the synergy assumptions are wrong), the process will disproportionately surface confirming evidence.
- Market research and strategy validation. Research questions that ask “what do customers value about our product?” are confirmation-seeking. Research questions that ask “what would make customers switch away from our product?” are falsification-seeking. The latter produces more strategically useful information, but organizations disproportionately commission the former, especially after a strategy has been committed to.
- Performance review and hiring. Interview questions that probe the hypothesis “this candidate is strong in X” confirm the hypothesis if the candidate answers well. Questions that probe alternative hypotheses, like “what would I see if this candidate struggles under pressure?”, generate evidence that can genuinely update the assessment. Structured interviews with adversarial questions alongside positive ones reduce but don’t eliminate the positive test bias.
Countermeasures
- Pre-mortem analysis: ask “imagine this decision has failed; what went wrong?” to force generation of falsifying scenarios.
- Adversarial collaboration: assign a team the explicit role of building the strongest case against the proposed decision.
- Red team/blue team structures: two teams independently analyzing the same question with opposite starting positions.
- Decision journaling: record assumptions before outcomes are known, which prevents hindsight bias from retroactively confirming that the decision was obviously right.