The Causal Inference Revolution and Statistical Reasoning

Judea Pearl and Dana Mackenzie’s The Book of Why is a gem of a book that introduces Pearl’s graphical calculus for causal discovery to a general audience. As is typical, it has occasioned a pointed debate about whether the causal graphs approach is the best way to represent the science of causality, as opposed to, say, a statistical or econometric approach. Previous instances of this debate may be found on Pearl’s blog; the present one plays out on Andrew Gelman’s blog, where he reviews The Book of Why.

Gelman reserves his criticism for Pearl and Mackenzie’s claims that Pearl’s methods have provided solutions to problems that previously had none:

Ummm, I’m pretty sure that scientists could do all these without the help of Pearl!

It comports with Kevin Gray’s reaction from over the summer:

some of what he claims is new or even radical thinking may cause some statisticians to scratch their heads since it’s what they’ve done for years, though perhaps under a different name or no name at all.

Such reactions comport with my own too. The case study of smoking and cancer is fascinating because it provides an excellent introduction to the problem of causal inference when experiments are impossible. But this claim…

Millions of lives were lost or shortened because scientists did not have an adequate language or methodology for answering causal questions (p. 19).

No, millions of lives were lost because tobacco companies had a financial interest in marketing a product that kills people, and, when confronted with evidence of the link between smoking and lung cancer, worked to sway public opinion and cultivate scientific uncertainty. Applied science is not an engineering problem; it is a social process.

What The Book of Why does extraordinarily well (even more so than Pearl’s earlier book, Causality) is introduce the power of the causal graph as a tool of causal reasoning. Over the past thirty years, Pearl and his collaborators have developed a complete language for causal reasoning that allows us to compute, given a set of causal relationships encoded in a path diagram, whether a causal effect can be identified. It is a nonparametric identification engine. This language makes some things very obvious that are otherwise very hard for students to see, and when I teach students about causal identification, I rely heavily on the causal graphs approach.
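For concreteness, and in my own gloss rather than the book’s words: the typical output of this engine is an adjustment formula. If a set of observed variables Z blocks every back-door path from X to Y in the assumed graph, then the causal effect is identified by the back-door adjustment

P(y | do(x)) = Σ_z P(y | x, z) P(z),

with no functional-form assumptions required; and if no such set (or other device, such as the front-door criterion) exists, the engine tells you the effect is simply not identified from observational data, no matter how large the sample. Some examples of where the graphs earn their keep in the classroom: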

  1. X ← U → Y is much more straightforward than, say, Cov(X, U) ≠ 0 for capturing the intuition behind conditional independence in a regression framework (a simulation sketch after this list illustrates the point).
  2. The “front-door criterion” is really easy to explain using causal graphs and hard to explain otherwise (see, relatedly, Marc Bellemare’s recent post, and the numerical check after this list).
  3. The conceptual distinction between a collider and a confounder is very clear; it is far superior to discussions of control variables (the same sketch below covers the collider case).
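
To make points 1 and 3 concrete, here is a minimal simulation sketch of my own (not from the book), using made-up linear structural equations and plain least squares: adjusting for a confounder removes bias, while adjusting for a collider manufactures an association that is not there.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

def ols_slope(x, y, control=None):
    """Coefficient on x from an OLS regression of y on x (plus an optional control)."""
    cols = [np.ones(n), x] + ([control] if control is not None else [])
    X = np.column_stack(cols)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Confounding: X <- U -> Y, plus a true direct effect X -> Y equal to 1.0
u = rng.normal(size=n)
x = u + rng.normal(size=n)
y = 1.0 * x + 2.0 * u + rng.normal(size=n)
print(ols_slope(x, y))     # ~2.0: biased upward by the open back-door path
print(ols_slope(x, y, u))  # ~1.0: adjusting for the confounder recovers the effect

# Collider: X -> C <- Y, with no effect of X on Y at all
x = rng.normal(size=n)
y = rng.normal(size=n)          # independent of x by construction
c = x + y + rng.normal(size=n)  # the collider
print(ols_slope(x, y))     # ~0.0: correctly finds no association
print(ols_slope(x, y, c))  # ~-0.5: "controlling" for the collider invents one
```

The graph, not the covariance matrix, is what tells you which of those regressions is the causal one.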
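
And for point 2, a numerical check of the front-door logic, again under an invented linear model in which U is an unobserved confounder of X and Y, and X affects Y only through the mediator M. The naive regression of Y on X is badly biased, while chaining the X→M coefficient with the M→Y coefficient (adjusting for X) recovers the true effect, even though U is never observed.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200_000

def ols_slope(x, y, control=None):
    """Coefficient on x from an OLS regression of y on x (plus an optional control)."""
    cols = [np.ones(n), x] + ([control] if control is not None else [])
    X = np.column_stack(cols)
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

# Front-door graph: U -> X, U -> Y (U unobserved), and X -> M -> Y
u = rng.normal(size=n)            # never used in the estimation below
x = u + rng.normal(size=n)
m = 0.7 * x + rng.normal(size=n)  # mediator, affected only by X
y = 0.5 * m + 2.0 * u + rng.normal(size=n)

true_effect = 0.7 * 0.5                            # = 0.35
naive = ols_slope(x, y)                            # ~1.35: confounded by U
front_door = ols_slope(x, m) * ols_slope(m, y, x)  # ~0.35: X->M times M->Y given X
print(true_effect, naive, front_door)
```

In the linear case this is just two regressions multiplied together; Pearl’s contribution is showing that the same two-step logic is licensed nonparametrically whenever the graph has this front-door structure.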

But as to whether this is just a useful language of representation or a genuine revolution, I’m not so sure. The core innovation is a graphical algorithm for computing the nonparametric identifiability of causal relations between variables, given assumptions encoded in the relationships among the other variables in the system. But we still require knowledge of those other relationships, and so the circularity is apparent. As Gelman writes,

If you think you’re working with a purely qualitative model, it turns out that, no, you’re actually making lots of data-based quantitative decisions about which effects and interactions you decide are real and which ones you decide are not there.