Regression, Representativeness, and External Validity

An interesting new paper on multiple regression and causality by Peter Aronow and Cyrus Samii is making the rounds. The title conveys it all: Does Regression Produce Representative Estimates of Causal Effects?

The paper calls attention to what seems to be a poorly understood feature of multiple regression: that, generally speaking, multiple regression coefficients do not have a straightforward treatment effect interpretation, even under the best of circumstances. Dedicated readers will remember that I wondered about this very issue in my post Regression Estimates the Conditional-Variance-Weighted ZZZZzzzzzzz… from early 2012. From my perspective, the benefit of the Aronow and Samii paper is to focus tightly on what’s at stake. Let’s say we find ourselves in one of those fortuitous circumstances where, by conditioning on X, we can estimate the causal effect of D on Y using the model Y = α + βD + γX + ε. When we do that, what do we estimate? The answer is a weighted average of unit-level causal effects (see eq. 6, p. 11).

Now, Morgan and Winship (2007, p. 138) make the same point

Under [some very particular]…conditions, the OLS estimate is unbiased and consistent for this particular weighted average, which is usually not a causal parameter of interest.

as do Angrist and Krueger (1999) and others, as the authors note. But neither Morgan and Winship nor Angrist and Krueger take the extra step of showing how to estimate the weights and then characterizing what Aronow and Samii term the “effective sample.”
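To make the weighting concrete, here is a minimal simulation sketch. It assumes, as I read the paper's result, that the weight on each unit can be estimated by the squared residual from a regression of D on X. The data-generating process, the function name, and all parameter values are made up for illustration; with a single binary covariate and effects that are constant within strata, the OLS coefficient on D lines up with the weighted average of unit-level effects rather than with the simple average.

```python
import numpy as np


def ols_vs_weighted_average(n=10_000, noise_sd=1.0, seed=0):
    """Compare the OLS coefficient on D with two averages of unit-level effects.

    Hypothetical setup: one binary covariate X, a binary treatment D whose
    probability depends on X, and unit-level effects that differ by stratum.
    """
    rng = np.random.default_rng(seed)
    x = rng.integers(0, 2, size=n).astype(float)   # two strata
    p = np.where(x == 1, 0.5, 0.1)                 # P(D = 1 | X), assumed values
    d = rng.binomial(1, p).astype(float)
    tau = np.where(x == 1, 2.0, 0.0)               # unit-level causal effects
    y = 1.0 + tau * d + 0.5 * x + rng.normal(0.0, noise_sd, size=n)

    # OLS of Y on (1, D, X); keep the coefficient on D.
    design = np.column_stack([np.ones(n), d, x])
    beta = np.linalg.lstsq(design, y, rcond=None)[0][1]

    # Estimated weights: squared residuals from regressing D on (1, X).
    covariates = np.column_stack([np.ones(n), x])
    d_hat = covariates @ np.linalg.lstsq(covariates, d, rcond=None)[0]
    w = (d - d_hat) ** 2

    weighted_avg = np.sum(w * tau) / np.sum(w)     # what regression targets
    simple_avg = tau.mean()                        # sample average treatment effect
    return beta, weighted_avg, simple_avg


beta, weighted_avg, simple_avg = ols_vs_weighted_average()
print(f"OLS beta: {beta:.3f}")
print(f"weighted average of effects: {weighted_avg:.3f}")
print(f"simple average of effects: {simple_avg:.3f}")
```

In this setup the stratum with X = 1 has more treatment variance, so its units get more weight and pull the regression coefficient toward their larger effect. Setting `noise_sd=0.0` removes sampling noise in Y and makes the match between the coefficient and the weighted average exact up to floating-point error.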

This is all very useful, and anyone interested in how we use regression to make causal claims should read it carefully. However, there are some points on which I think the discussion is lacking. Specifically, the notions of external validity, generalizability, and representativeness are all poorly developed in the paper. Yet the paper is written to show that multiple regression’s failure to estimate average treatment effects has implications for its use to achieve these goals. That leaves some work to be done.

External Validity

While it is true that many people criticize experiments for their supposed lack of external validity, regression wouldn’t be a “solution” to this problem even if it did consistently estimate the ATE. This point follows from the very definition of external validity:

External validity concerns the extent to which the (internally valid) results of a study can be held to be true for other cases, for example to different people, places or times. In other words, it is about whether findings can be validly generalized. If the same research study was conducted in those other cases, would it get the same results?

Nothing about regression—an estimation technique—makes it immune to critiques of external validity. A regression on data sampled from a population does not answer the question of whether the findings generalize to the rest of the population. The same goes for a regression using data that comprises an entire population, but with respect to the super-population. Appealing to the properties of multiple regression estimates of treatment effects is actually a distraction from this much more basic point. (Footnote 8 hints at this concern, but does not develop it.)

In other words, when Bardhan critiques experiments and quasi-experiments on external validity concerns, the problem isn’t that he misunderstands what multiple regression does, but that he misunderstands what external validity means.

There is a different concern about experiments and external validity (and this might be what Bardhan actually has in mind, although I’m not sure). It goes like this: experiments face external validity concerns because they are somehow artificial. They do not produce the type of variation that we actually observe in the world. For quasi-experiments, the critique is that the local average treatment effect is not the same as the population average treatment effect. But there is no clear argument linking that critique of experiments and quasi-experiments to a preference for multiple regression, especially if the goal is to estimate causal effects. One might make the argument that collecting observational data allows you to “use all of it,” but for the reason just given, that is not a defense of regression on external validity grounds.

Representativeness

The concept of representativeness also looms large. Let’s clarify what this means. In the standard understanding, it is a sample that may be representative of a population: a representative sample is one in which “each member of the population has an equal probability of being selected” (p. 6). But for Aronow and Samii, “Our primary interest in this paper is to study the representativeness of linear regression estimators” (p. 7).

Representativeness is not a property of estimators, so far as I know. But I can think of two possible definitions.

  1. a representative estimator is one that recovers an unbiased and consistent estimate of the sample average treatment effect. If that is the definition, then the answer to the question posed in the paper’s title is a straightforward “No, except in the case of randomized binary treatments,” and the paper does a good job of explaining why not.
  2. a representative estimator is one that “uses all of the data in the sample” in some way. But the devil is in the details:
    1. if “use all the data” means “accords equal weight to every observation in calculating treatment effects” then the answer is no, because (as we know) multiple regression estimates weighted averages of treatment effects.
    2. if “use all the data” means that “each observation is given an equal chance of contributing to the estimates of causal effects, to the extent that it can” then multiple regression is perfectly representative.

The general point here is probably just one of interpretation, but it gets at the heart of what the paper is showing. Take an analogy: a panel dataset in which some panels have no variation in X. That is not “unrepresentative”; it is just a fact that those panels cannot help us to estimate the effect of X. The result in this paper is similar.

When some observations have no weight, this means that the covariates completely explain their treatment condition. Such units do not contribute to the estimate that we obtain from the study.
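A toy example may help here. The sketch below uses made-up data with three strata, in one of which every unit is treated, and again assumes the weights are estimated as squared residuals from a regression of D on the covariates (here, one dummy per stratum). The variable names and values are hypothetical.

```python
import numpy as np

# Hypothetical data: three strata of four units each.
# In stratum 2 every unit is treated, so the covariates fully explain D there.
stratum = np.repeat([0, 1, 2], 4)
d = np.array([0, 1, 0, 1,   0, 0, 0, 1,   1, 1, 1, 1], dtype=float)

# Saturated covariates: one dummy variable per stratum.
x = (stratum[:, None] == np.arange(3)).astype(float)

# Squared residuals from regressing D on the stratum dummies.
d_hat = x @ np.linalg.lstsq(x, d, rcond=None)[0]
w = (d - d_hat) ** 2

# Share of the total weight held by each stratum.
share = np.array([w[stratum == s].sum() for s in range(3)]) / w.sum()
print(np.round(share, 3))
```

Each stratum makes up a third of the nominal sample, but the weight shares come out at roughly 0.57, 0.43, and 0. The all-treated stratum drops out for the same reason a panel with no variation in X does: it carries no information about the effect of D once we condition on the covariates.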

These points don’t overturn the paper’s main findings, but hopefully they help to clarify the argument’s implications. Needless to say, they also shouldn’t distract us from the importance of the broader discussion of how established regression-based methods can and cannot be given a treatment effect interpretation.

P.S.

This sort of discussion is fun (for me), and hopefully useful (for others). But 1000 words of commentary, such as this blog post, seem naturally suited to the type of “friendly comments” that I wish we saw more frequently in political science journals.