Expert Judgement Carefully Studied; Routinely Wrong!

Expert Political Judgment: How Good Is It? How Can We Know? (Chapter-by-Chapter Review)

Posted in Investing Expertise

April 18, 2006


In his 2005 book Expert Political Judgment: How Good Is It? How Can We Know?, Philip Tetlock describes the results of his long-term systematic measurement of the forecasting abilities of political experts. These results include insights into the critical success factors of forecasting. Making the very small leap that these insights also apply to experts in economics and financial markets, we offer here a chapter-by-chapter review of the insights in this book:

Chapter 1 – Quantifying the Unquantifiable

Chapter 1 describes the complexities of measuring and judging judgment, and offers a preview of ultimate findings.

Key points are that judging judgment:

  • Should involve tests of forecasting accuracy and tests of logic (internal consistency of beliefs) and flexibility (changes in beliefs in response to evidence).
  • Cannot be parochial (for example, by presuming that fundamental analysis is superior to technical analysis).
  • Must involve samples large enough so that the systematic transcends the idiosyncratic (they are statistically sound).
  • Should limit concessions to forecasters who seek partial credit, dispute reality checks or plead level of difficulty.

In short, one can measure the accuracy and quality of judgment, if only imperfectly.

Headline results of the study are:

  • There is little evidence that experts, as a group, outperform amateurs or algorithms.
  • However, some experts consistently outperform.
  • The best experts are open-minded (self-critical, point-counterpoint thinking, appreciative of complexity). The worst experts are closed-minded (doctrinaire, pro-simplicity).
  • The best experts more frequently change their minds when they get it wrong.
  • The worst experts are less willing to acknowledge errors and accept accountability.
  • The benefits of closed-mindedness (bold predictions) do not outweigh the costs (inaccuracy).
  • The danger of open-mindedness is susceptibility to confusion (the probabilities of alternatives sum to more than one).

In a sentence, the fundamental discriminator between the best and worst experts is degree of open-mindedness.

Chapter 2 – The Ego-deflating Challenge of Radical Skepticism

Chapter 2 introduces readers to the radical skeptics, who view history as a random walk and forecasting as futile. While the aggregate performance of forecasters may support their view, patterns of consistency among individual experts make them squirm.

Key points from this chapter are:

  • Radical skeptics generally favor a punctuated equilibrium view of events, with both the timing and direction of shifts in equilibrium unpredictable. Unpredictability is a consequence of the inherent indeterminacy of nature and/or psychological shortcomings of human beings as forecasters.
  • The average human forecaster barely beats a randomly guessing “chimp.” Humans tend to overestimate the probabilities of rare events.
  • The average human forecaster beats or ties some algorithms, but loses badly and consistently to algorithms built from more informative statistical analyses.
  • The average human expert forecaster performs about the same as the average human dilettante (still a sophisticated professional), but experts do outperform university undergraduate students. Experts tend to overpredict rare events to a greater degree than do dilettantes. The returns on building specific expertise apparently diminish rapidly.
  • In general, human forecasters assign too-high probabilities to change and too-low probabilities to the status quo, with higher gaps for experts than for dilettantes. But, when stakes are highest, people tend to depend most on the advice of designated experts.
  • There is a positive correlation between forecaster overconfidence and prominence in the media.
  • Individual forecasters who outperform on short-term forecasts also outperform on long-term forecasts, and on forecasts outside their areas of expertise.
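
To make the comparison with the “chimp” concrete, here is a minimal sketch of one common way to score probability forecasts, the Brier score (the mean squared gap between stated probabilities and 0/1 outcomes; lower is better). Whether this matches the exact scoring used in the book is an assumption. The sketch also simplifies to yes/no events and models the chimp as always saying 50%. All forecasts and outcomes below are made-up illustrative numbers, not data from the study.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes (lower is better)."""
    if len(forecasts) != len(outcomes):
        raise ValueError("forecasts and outcomes must have the same length")
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


# Hypothetical outcomes: 1 = change happened, 0 = status quo held. Change is rare here.
outcomes = [0, 0, 0, 1, 0, 0, 0, 0, 1, 0]

# Simplified "chimp": assigns 50% to every event.
chimp = [0.5] * len(outcomes)

# Hypothetical human forecaster who tends to assign too-high probabilities to change.
human = [0.6, 0.6, 0.4, 0.8, 0.6, 0.5, 0.6, 0.3, 0.9, 0.5]

# Crude algorithm: always predicts the observed base rate of change.
base_rate = [sum(outcomes) / len(outcomes)] * len(outcomes)

chimp_score = brier_score(chimp, outcomes)          # 0.25
human_score = brier_score(human, outcomes)          # approximately 0.224
base_rate_score = brier_score(base_rate, outcomes)  # approximately 0.16

print(chimp_score, human_score, base_rate_score)
```

With these invented numbers the human edges out the chimp but trails even a crude base-rate rule, which mirrors the ordering described above without claiming anything about the size of the gaps in the actual study.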

In summary, “results plunk human forecasters into an unflattering spot…distressingly closer to the chimp than to the formal statistical models.” Yet, there is evidence of true outperformers.

Chapter 3 – Knowing the Limits of One’s Knowledge

Chapter 3 demonstrates the usefulness of classifying experts on a spectrum from hedgehog (aggressive and closed-minded one-big-thing thinkers – centripetal) to fox (open-minded and self-critical point-counterpoint thinkers – centrifugal), borrowing the analogies from Isaiah Berlin’s essay “The Hedgehog and the Fox.”

Key points from this chapter are:

  • Moderation consistently outforecasts extremism, whether on scales of leftist-rightist, realist-idealist or optimist-pessimist. How, rather than what, people think is key to forecasting performance.
  • The more foxlike (hedgehoglike) the thinking processes of human forecasters, the better (worse) their forecasting accuracy. (But even the most foxlike still lose badly to formal statistical models.)
  • The forecasting edge of foxes holds up for short-term and long-term forecasts and for experts and dilettantes.
  • The long-term forecasting of hedgehog experts is notably weak, suggesting that: (1) their accuracy degrades the further they project their rigid world views; and, (2) the more they know, the less flexible they become in their thinking.
  • Extremism is a slight forecasting help for foxes, but a significant liability for hedgehogs. It may be that hedgehog extremists are especially susceptible to overconfidence.
  • The underperformance of hedgehogs derives from their tendency to overpredict change to a greater degree than foxes. Foxes are more likely than hedgehogs to predict modest change or status quo.
  • Foxes are more skeptical of simple laws and grand theories than are hedgehogs. They look for flaws in analogies with past situations.
  • When in an intellectual hole, hedgehogs are more likely to keep digging than are foxes.
  • Foxes are more sensitive than hedgehogs to the possibility that hindsight bias causes us to misjudge past performance.
  • Foxes expend more effort to consider and integrate conflicting ideas than do hedgehogs.
  • Foxes are more likely than hedgehogs to view life with detachment and irony.
  • The media tends to solicit and present the opinions of the more decisive and flamboyant hedgehogs over the more equivocal foxes.

In summary, the best human forecasters tend to be “…moderate foxes: eclectic thinkers who are tolerant of counterarguments, and prone to hedge their probabilistic bets…”

Chapter 4 – Honoring Reputational Bets

Chapter 4 looks at one aspect of the degree to which experts “think the right way.” With Bayesian updating as a benchmark in a fox-hedgehog framework, it examines whether experts modify their beliefs as much as they should when events prove their prior forecasts wrong. It also catalogs the belief system defenses (off on timing, close call, bad luck) that experts use to justify thought processes that produced bad forecasts.

Key points from this chapter are:

  • Experts generally assign probabilities to their forecasts as though they themselves are 100% right and those with different opinions are just plain wrong.
  • Experts do not routinely treat experiences as opportunities to refine the probabilities of competing scenarios. When their forecasts are wrong, hedgehogs (foxes) shift their views 19% (59%) of the amount prescribed by Bayes’ theorem. In some cases, hedgehogs become more confident in their original positions after being wrong. When their forecasts are right, hedgehogs (foxes) shift their views 80% (60%) of the amount prescribed by Bayes’ theorem.
  • Hedgehogs typically make bolder predictions than foxes, but they tend to be more wrong than right when they are boldest.
  • Experts employ seven types of belief system defenses as tools of self-attribution bias to protect their reputations: (1) the test of the forecast was logically flawed; (2) there was a low probability external shock; (3) it was a close call (“missed it by that much”); (4) the forecast was right, but at the wrong time; (5) the future is hopelessly unpredictable; (6) given the stakes, it was the right mistake (conservative in the right direction); and, (7) it was just bad luck. Hedgehogs show greater reliance on belief system defenses for cover from serious forecasting mistakes.
  • Experts exhibit hindsight bias. They recollect their own (others’) past forecasts as being more (less) accurate than they actually were. Hedgehogs show more pronounced hindsight bias than do foxes.
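
The Bayesian benchmark mentioned above can be sketched as follows. The prior and likelihoods are hypothetical, and the sketch assumes that “19% (59%) of the amount prescribed” means moving that fraction of the way from the prior to the Bayesian posterior on the probability scale, which may not match the book’s exact measure.

```python
def bayes_posterior(prior, p_evidence_if_true, p_evidence_if_false):
    """Posterior probability of a worldview after observing the evidence (Bayes' theorem)."""
    numerator = prior * p_evidence_if_true
    denominator = numerator + (1.0 - prior) * p_evidence_if_false
    return numerator / denominator


def partial_update(prior, posterior, fraction):
    """Move only `fraction` of the way from the prior to the Bayesian posterior."""
    return prior + fraction * (posterior - prior)


# Hypothetical setup: an expert is 80% confident in their worldview. The outcome that
# actually occurred had a 20% chance under that worldview and a 60% chance under the rival one.
prior = 0.80
posterior = bayes_posterior(prior, p_evidence_if_true=0.20, p_evidence_if_false=0.60)

# Fractions of the prescribed shift after a wrong forecast, as quoted in the review.
hedgehog_belief = partial_update(prior, posterior, 0.19)
fox_belief = partial_update(prior, posterior, 0.59)

print(round(posterior, 3), round(hedgehog_belief, 3), round(fox_belief, 3))
```

Under these assumptions a full Bayesian would drop from 80% to about 57% confidence, the fox to about 67%, and the hedgehog only to about 76%.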

In summary, experts suffer substantially from self-attribution bias and hindsight bias, and hedgehogs are more biased than foxes. When they are right (wrong), they are shrewd (close, still really right, unlucky).

Chapter 5 – Contemplating Counterfactuals

Chapter 5 examines the similarities and differences in the ways foxes and hedgehogs entertain alternative historical scenarios, or counterfactuals. These “what if” scenarios compensate for the lack of scientific controls in extracting lessons from historical data. Chapter 5 also looks at the degrees to which hedgehogs and foxes apply double standards to: (1) scenarios and new data that confirm their preconceptions; and, (2) scenarios and new data that refute their preconceptions.

Key points from this chapter are:

  • Hedgehogs are especially likely to dismiss alternative historical scenarios that challenge their preconceptions as whimsical. Considering such alternatives just delays closure.
  • All experts, hedgehogs more so, apply double standards to the credibility of new data: low (high) standards of authenticity, representativeness and motive for information that confirms (refutes) their preconceptions.
  • Foxes reluctantly acknowledge their double standards. Hedgehogs defiantly defend theirs, holding that challenges to established knowledge should undergo special scrutiny.
  • Foxes tend to make small concessions to new data that contradicts their preconceptions. Hedgehogs actually harden their prior positions with increased (over)confidence.

The following figure, redrawn and modified slightly from a figure in Chapter 5, summarizes key concepts from this and preceding chapters. Green arrows with plus signs (red arrows with minus signs) indicate that the originating concept tends to reinforce (suppress) the destination concept. The figure offers a psychological explanation of why foxes are on average better forecasters than hedgehogs.

In summary, experts keep two sets of books for new hypotheses and data, easily accepting that which confirms and stubbornly resisting that which refutes their preconceptions. Hedgehogs are clearly more extreme in this regard than foxes.

History is a capricious teacher, and we are resistive pupils.

Chapter 6 – The Hedgehogs Strike Back

Chapter 6 lays out the defense’s case for hedgehogs. It examines whether their advantages of resistance to distraction, decisiveness, tough negotiating style and willingness to stay the course in the face of difficulties outweigh their higher rates of forecasting errors. Are their errors somehow superficial? Are they home run hitters who also have a lot of strikeouts?

Key points from this chapter are:

  • Extreme optimists (pessimists) who overestimate the likelihood of change for the better (worse) drag down aggregate hedgehog performance. No logical correction factors eliminate the resulting forecasting deficit.
  • Hedgehogs are more likely than foxes to swing for the fences (assign extreme probabilities of 0% or 100% to future scenarios). However, no reasonable scheme of extra credit for hitting home runs (being correct with extreme probabilities) makes up for the hedgehogs’ strikeout rate.
  • Adjustments for degree of forecasting difficulty help both hedgehogs and foxes, so hedgehogs do not catch up.
  • Hedgehogs and foxes dispute whether actual events confirmed or refuted forecasts with comparable frequencies, so adjustments for such disputes do not help hedgehogs catch up.
  • Giving partial credit for wrong forecasts that the forecasters claim are near misses helps hedgehogs catch up to foxes, because hedgehogs have more misses. However, only implausibly large adjustments eliminate the accuracy gap.
  • Similarly, near misses give cover to hedgehogs for their greater resistance to changing course when their forecasts prove wrong. However, the asymmetrical use of near-miss defenses (not allowing rivals to invoke them) makes this cover appear thin.
  • While hedgehogs reasonably argue that the standard for accepting new data challenging established beliefs should be rigorous, they are less responsive than foxes to the quality of new data that challenges their beliefs.
  • Hedgehogs are more likely than foxes to see the distant past as deterministic but the recent past (during which they have been making forecasts) as highly contingent. They are more susceptible than foxes to hindsight bias (their own forecasting interactions with recent history). These biases suggest flaws in the application of lessons from history.
  • The diversity of hedgehog ideologies and lack of correlations for other group characteristics refute hedgehog objections that the study did not pose the right questions or include the right experts.
  • While hedgehogs can claim that they are in the game to move the market (for example, to “woo dumb-ass reporters who want glib sound bites”) rather than to forecast accurately, this motivation appears unrelated to both hedgehog-fox identification and forecasting accuracy.

In summary, after taking into account the complexities of judging judgment, foxes retain a significant lead over hedgehogs in forecasting accuracy.

Chapter 7 – Are We Open-minded Enough to Acknowledge the Limits of Open-mindedness?

Chapter 7 considers the downside of being open-minded, specifically by examining the benefits and costs of scenario exercises as a means of combating the biases and overconfidence most evident in hedgehogs. Can experts be too open-minded, too susceptible to wild goose chases? Are there any penalties from force-feeding experts more alternatives and more information than they would otherwise consider? Intuitively, such exercises should make hedgehogs more foxlike, thereby improving their forecasting prowess.

Key points from this chapter are:

  • Imagining a scenario increases its perceived likelihood, and this effect is: (1) greater for experts than dilettantes; (2) greater when the scenario envisions change rather than maintenance of the status quo; and, (3) greater for foxes than hedgehogs.
  • This effect can produce a summed probability of the different ways that a scenario could unfold that exceeds the overall probability of the scenario itself. It can therefore produce a summed probability of alternative scenarios that significantly exceeds 100%. The more detail supplied for a scenario, the worse the effect.
  • When confronted by their logical errors, hedgehogs tend to revert to their pre-scenario forecasts of probabilities for alternative scenarios. Foxes tend to shift their assessments toward the relative probabilities of scenarios developed during the scenario exercise.
  • Limited evidence suggests that scenario exercises degrade forecasting accuracy, especially that of foxes.
  • Scenario exercises focusing on actual past conditions can mitigate hindsight bias for both hedgehogs and foxes, shifting recollected forecasts part way back to actual forecasts. However, scenario exercises imagining alternative histories lead again to summed probabilities significantly greater than 100% over the alternatives considered, with foxes excelling at this illogic.
  • Balancing thinking between the rigid theory-driven and the flexible event-driven requires that experts engage in continuous conscious monitoring of their decision processes.
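
The logical error described above, where the probabilities of alternative scenarios sum to more than 100%, is easy to check mechanically. The following is a minimal sketch with hypothetical scenario names and numbers. It assumes the scenarios are mutually exclusive and exhaustive, and it uses simple rescaling as one possible repair that keeps the relative probabilities from the exercise; it is not a method taken from the book.

```python
def excess_probability(scenario_probabilities):
    """Amount by which the summed scenario probabilities exceed 1 (0 means coherent)."""
    return sum(scenario_probabilities.values()) - 1.0


def normalize(scenario_probabilities):
    """Rescale probabilities to sum to 1 while preserving their relative sizes."""
    total = sum(scenario_probabilities.values())
    return {name: p / total for name, p in scenario_probabilities.items()}


# Hypothetical judgments after a detailed scenario exercise: imagining each kind of
# change has inflated its perceived likelihood, while the status quo estimate is unchanged.
after_exercise = {
    "status quo": 0.60,
    "change for the better": 0.45,
    "change for the worse": 0.35,
}

excess = excess_probability(after_exercise)  # approximately 0.40, i.e. a 140% total
coherent = normalize(after_exercise)

print(round(excess, 2))
for name, p in coherent.items():
    print(name, round(p, 3))
```

Note that rescaling restores coherence but does not undo the inflation itself: the status quo falls from 60% to about 43% here, which illustrates how shifting toward the relative probabilities from a scenario exercise could hurt accuracy when the status quo is the better bet.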

In summary, too much open-mindedness fosters credulousness and confusion, and foxlike experts imagining change are most susceptible to this confusion.

Note the implication that experts who weave intricate scenarios in support of their views are likely to be the most convincing to others, whether their forecasts are accurate or not.

Chapter 8 – Exploring the Limits on Objectivity and Accountability

Chapter 8 explores the practical value of, and likely objections to, setting up a public service system grounded in the approach described in the book. Could consumers of intellectual offerings benefit from help in judging the value of such offerings? If so, could such a system provide that help?

Key points from this chapter are:

  • Public consumers of forecasts generally do not think it worthwhile, or do not have the resources, to track and test the quality of the many forecasts offered. They end up relying on very simple indicators of value, such as institutional affiliation or fame or forecast scenario detail, that may have negative correlations with forecast accuracy.
  • A clearinghouse of forecast measurements, systematically applying the logic and learnings outlined above, would offer consumers an independent check on the credibility of individual forecasters based on their past performances.

In summary, incremental improvement in judging judgment is possible and would have value to consumers of forecasts.

Methodological Appendix



