Thanks to the PolMeth listserv, I came across a new paper by Luke Keele and Randy Stevenson that criticizes the causal interpretation of control variables in multiple regression analyses. It’s a really simple argument: using directed acyclic graphs (DAGs) to interpret the causal structures that underlie multiple regression analyses, they show that there are many situations in which control variables X can help to identify the causal effect of D on Y, but the causal effects of X are not actually identified.
Here are two implications of their argument that they did not explore, but which are worth considering: one for applied work in general, and another for International Relations in particular.
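To make that argument concrete, here is a minimal simulation sketch. It is not taken from Keele and Stevenson's paper; the DAG, the variable names, and every coefficient are assumptions chosen for illustration. X confounds the effect of D on Y, and an unobserved U affects both X and Y. Conditioning on X is enough to recover the effect of D, but the coefficient on X matches neither X's direct effect nor its total effect.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

rng = np.random.default_rng(0)
n = 200_000

# Assumed DAG (illustrative only): U -> X, U -> Y, X -> D, X -> Y, D -> Y
u = rng.normal(size=n)                 # unobserved common cause of X and Y
x = u + rng.normal(size=n)             # control variable
d = 0.8 * x + rng.normal(size=n)       # variable of interest, confounded by X
y = 1.0 * d + 0.5 * x + 1.0 * u + rng.normal(size=n)

# Regress Y on D and X, as in a standard "D plus controls" specification.
_, beta_d, beta_x = ols(y, d, x)

print(f"coefficient on D: {beta_d:.2f} (true effect of D: 1.00)")
print(f"coefficient on X: {beta_x:.2f} (direct effect: 0.50, total effect: 1.30)")
```

Under these assumed parameters the coefficient on D should land close to 1.0, while the coefficient on X should also land near 1.0, because it absorbs the influence of the unobserved U. X does its job as a control, yet its own coefficient has no causal interpretation.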
- The fact that conditioning on alternative causal pathways to achieve identification of D usually does not allow for causal interpretations of pathways in X is one more argument against horserace regressions: testing competing theories by putting two or more independent variables D1 and D2 in a regression and then comparing coefficients and t-statistics. (These are different from garbage can regressions, in which a bunch of Xs are dumped into the regression garbage can in order to achieve identification for D.) The complications of attempting to identify D by conditioning on X only expand when attempting to identify both D1 and D2 by conditioning on X or a set of Xs. It could be the case, for example, that D2 is identified only by conditioning on D1 and X1, but that D1 remains unidentified without conditioning on X2, and that conditioning on X2 prevents identification of D2: the problem of “conditioning on a collider”. (Someone else can draw that DAG for me.)
- An implication of the previous point has particular relevance for the so-called “Paradigm Wars” in International Relations, what David Lake and others often refer to as the third “Great Debate” in International Relations. There was a time in which quantitative research sought to adjudicate among paradigms by testing realism, liberalism, and constructivism “against one another” in multiple regression-type analyses. Lately such efforts have become rarer, at least in my read of the literature. Keele and Stevenson’s paper is a reminder that this is a really good thing. To the extent that paradigm wars continue to simmer, they do so at the epistemological or theoretical level, and that is appropriate given the limits of regression-style quantitative research for identifying the effects of multiple causal variables at once.
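The collider problem named in the first point can be illustrated with a small simulation. This is only a sketch of the general mechanism, not the full D1/D2/X1/X2 DAG described there; the structure, names, and coefficients are all assumptions. Here X is caused by two unobserved variables, one of which also causes D and the other Y, so X sits at the meeting point of the path D <- U1 -> X <- U2 -> Y.

```python
import numpy as np

def ols(y, *regressors):
    """Return OLS coefficients (intercept first) of y on the given regressors."""
    design = np.column_stack([np.ones_like(y), *regressors])
    coefs, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coefs

rng = np.random.default_rng(1)
n = 200_000

# Assumed DAG (illustrative only): U1 -> D, U1 -> X, U2 -> X, U2 -> Y, D -> Y
u1 = rng.normal(size=n)                # unobserved cause of D and X
u2 = rng.normal(size=n)                # unobserved cause of X and Y
d = u1 + rng.normal(size=n)            # variable of interest
x = u1 + u2 + rng.normal(size=n)       # collider; has no causal effect on Y
y = 1.0 * d + u2 + rng.normal(size=n)  # true effect of D on Y is 1.0

_, d_without_x = ols(y, d)             # leave the collider out
_, d_with_x, x_coef = ols(y, d, x)     # "control" for the collider

print(f"D coefficient without X: {d_without_x:.2f}")
print(f"D coefficient with X:    {d_with_x:.2f}")
print(f"X coefficient:           {x_coef:.2f}")
```

With these assumed parameters, leaving X out should give a D coefficient near the true value of 1.0, while adding X should pull it down to roughly 0.8 and assign X a nonzero coefficient even though X does not affect Y. In a horserace regression, a control that one variable needs can play exactly this role for the other.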
Marc December 9, 2014
Tom, this is a useful point, but in some sense, it is an obvious one (to me at least). We do a lot of work to identify the causal relationship flowing from D to Y in most modern applications; it should stand to reason that the Xs are our dogs that didn’t bark. That is, if we don’t say anything about them in textbooks or when teaching modern methods, it’s because they are not identified, and they are only included as controls.
That is exactly why I really dislike the “determinants” approach to writing papers (see here: http://marcfbellemare.com/wordpress/10484): in all those papers, the ONLY things we have are Xs, and no variable is identified as D, with the end result that nothing has any claim to being causal.