Identification is Neither Necessary nor Sufficient for Policy Relevance

Via Marc Bellemare, whose blog I generally enjoy, I have come across an interview with the editors of the World Bank Economic Review. It contains this doozy of a warning to anyone doing research on development policy, broadly considered:

“Our main two criteria in selecting papers for publication are rigorous identification and policy relevance. The two go together as we cannot have credible policy recommendations without strong causal inference.”

This statement strikes me as utterly wrong, and beyond that, dangerous. I don’t know if perhaps the editors have a very peculiar understanding of “policy relevance,” or if perhaps they have particular meanings for “credible” or “go together,” but even if so, it’s my duty to disagree. I know, someone is wrong on the internet. Bear with me.

Let me start by clarifying what I take “policy relevance” to mean. I imagine a policy planner (or a bunch of NGO activists, or donors, or a politician, or any kind of decisionmaker) faced with a question of “should I do X in order to accomplish Y?” Call that the policymaker’s question. A research finding is policy relevant if it changes her beliefs about the answer to that question, relative to whatever prior beliefs she held. That seems to me a natural way to think about what policy relevant research looks like. It doesn’t have to give you 100% confidence that some policy choice will or won’t have some expected outcome, but it should change the policymaker’s beliefs about the (distribution of possible) outcomes from various interventions.
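To put the same idea in rough Bayesian notation (my sketch, not anything from the interview): let $\theta$ stand for the outcome of doing X, and let $F$ be the research finding. Then $F$ is policy relevant whenever

$$p(\theta \mid F) \neq p(\theta),$$

that is, whenever the policymaker’s posterior beliefs about the outcome differ from her prior.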

Identification, as I wrote previously, refers to the claim that X causes Y, and that we can be sure there’s not some other reason why X and Y covary. That’s obviously helpful for the policymaker’s question. But it does not mechanically produce policy relevance, even by my very lax standard.

Why? Consider first the possibility that the editors mean that causal identification is sufficient for policy relevance. Imagine a study that uses instrumental variables to identify the relationship between political fractionalization and local budgetary outcomes. Sounds like a good policy question, right? Imagine that there is some historical reason—say, some colonial policy—why we find more political fractionalization today in some localities than in others. Assume that the colonial policy is in fact a good (i.e. valid and relevant) instrument. Now, armed with that assumption (and a bunch of other ones, which I won’t focus on here; let’s just assume they’re all justifiable) we can causally identify the effect of political fractionalization on local budgetary outcomes.

I defy anyone to tell me what the policy relevant conclusion from that study is that can only be reached because the relationship between fractionalization and budgets is identified using IV. The reason you can’t is that IV only identifies something like the Local Average Treatment Effect (LATE). (If you don’t know what that is, you’re just like 95% of applied researchers using IV to “estimate causal effects.”)** The LATE in this example tells you nothing about what a policymaker should do about political fractionalization in pursuit of good budgetary outcomes today. That is because the LATE is only defined relative to the localities whose fractionalization was plausibly shaped by colonial policy. With a host of further assumptions, you might conclude that some other policy intervention that shapes political fractionalization might matter for budgetary outcomes today. But there’s nothing about causal identification from colonial history itself that produces policy relevance in this case for most policymaker questions.
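A minimal simulation sketch makes the point concrete (entirely hypothetical numbers and variable names, not from any real study): with a binary instrument and heterogeneous effects, the IV (Wald) estimate recovers the average effect among compliers only, which can sit far from the population average effect a policymaker might care about.

```python
# Sketch: with heterogeneous effects, IV recovers the average effect
# among compliers (the LATE), not the population average effect.
# All numbers and names here are hypothetical, for illustration only.
import numpy as np

rng = np.random.default_rng(0)
n = 200_000

z = rng.integers(0, 2, n)  # "colonial policy" instrument (binary)

# Compliance types: always-takers are fractionalized regardless of z,
# never-takers never are, compliers are fractionalized iff z = 1.
group = rng.choice(["always", "never", "complier"], size=n, p=[0.2, 0.5, 0.3])
d = np.where(group == "always", 1, np.where(group == "never", 0, z))

# Heterogeneous treatment effects: compliers' effect differs sharply
# from everyone else's.
effect = np.where(group == "complier", 2.0, 5.0)
y = 1.0 + effect * d + rng.normal(size=n)  # "budgetary outcome"

# Wald / IV estimate: cov(y, z) / cov(d, z)
iv = np.cov(y, z)[0, 1] / np.cov(d, z)[0, 1]

print(f"IV (Wald) estimate:        {iv:.2f}")             # ~2.0 (the LATE)
print(f"Population average effect: {effect.mean():.2f}")  # ~4.1, a different quantity
```

The instrument only moves fractionalization for the compliers, so the IV number is about them; a policymaker contemplating some other lever, one that moves other localities, learns little from it without further assumptions.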

The confusion stems from the sloppy language that applied researchers use when they talk about “the causal effect of X on Y.” There is no single causal effect out there waiting to be estimated, and even if there were, for the vast majority of policymaker questions, the LATE isn’t it. I am not the first to have pointed out that the LATE that IV estimates is usually not the quantity that policymakers want, and Heckman and Urzua are worth quoting on this: “The problem that plagues the IV approach is that the questions it answers are usually defined as probability limits of estimators and not by well-formulated economic problems.” I don’t think that this point is broadly understood. “Causal identification” can be done without a well-formulated policy problem. In fact, I think that in the vast majority of cases, that’s exactly how it’s done.

“But wait,” you say, “we at least know that the relationship between political fractionalization and budgetary outcomes is not spurious.” That you do, and that’s a very important thing to know. But that itself is not the policymaker’s problem, and just knowing that this relationship isn’t spurious isn’t enough to solve the policymaker’s problem unless you make many more assumptions too. This isn’t a problem unique to IV, of course, but IV is so commonly invoked as the way to “estimate the causal effect of X on Y” that it’s good to note that even that doesn’t tell us much about policy relevance.

Now consider whether causal identification is necessary for policy relevance. Let’s take a different example. Grant, for the sake of argument, the implicit assumption that only quantitative research can be policy relevant. That is, assume that nothing from a qualitative study of decisionmaking, nothing from a historical study of how a previous policy was reached or implemented, and no insights from actual interviews with real people should ever shape how a policymaker evaluates the likely outcomes of policy choices. (Which of course is nonsense too, but a different problem.) In other words, for now, assume that only quantitative research “counts” as being policy relevant.

Imagine that you’re studying economic opportunity and violence. You have data on counties within a state, or provinces within a country, something like that. Someone shows you four simple correlations:

In some state, violence against women at the county level is positively correlated with income at the county level, but unrelated to inequality. On the other hand, violence against one’s neighbors is positively correlated with inequality, but unrelated to income. That’s all you know. There are no covariates included in a quantitative model, there’s no attempt at causal identification. For all you know, these relationships could be spurious. We think that violence is the product of economic conditions, but the causality might run the other way: a county’s level of violence might itself shape its average income.
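To make the example concrete, here is a toy sketch (simulated data and made-up variable names, purely illustrative) of what those four raw correlations amount to: no covariates, no identification strategy, just a correlation matrix.

```python
# Toy sketch: four raw county-level correlations, simulated so that the
# pattern described above holds. No covariates, no identification.
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 500
df = pd.DataFrame({
    "income":     rng.normal(50, 10, n),       # county average income
    "inequality": rng.normal(0.40, 0.05, n),   # county Gini-like index
})
# Outcomes built to match the stated pattern (illustration only):
df["violence_women"]     = 0.5 * df["income"]      + rng.normal(0, 5, n)
df["violence_neighbors"] = 40.0 * df["inequality"] + rng.normal(0, 2, n)

# The four correlations the policymaker is shown:
print(df.corr().loc[["violence_women", "violence_neighbors"],
                    ["income", "inequality"]].round(2))
```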

Still, I would find it absolutely inconceivable that any policymaker would attach zero meaning to these results. It strikes me that these four correlations convey useful information that policymakers ought to know about. They’re consistent with a couple of models of how economic conditions affect behavior, and they explain variation, and those facts alone should shape policymakers’ beliefs about the answers to several policy questions. Of course, one should not conclude something like “without a doubt, if we decrease inequality, that will decrease violence against neighbors.” But that’s an unattainably high bar for a result to be policy relevant, and anyway, I don’t see how most research designs that could “identify the causal effect of inequality on violence” would clear that bar either, for reasons mentioned above.

Some readers will think that I’m perhaps being a bit devious. In the first example, I proclaimed that causal identification does not entail policy relevance, but my standard for what counts as relevant seems to have been really high: something like “the results of this research and no further assumptions.” In the second example, I proclaimed that causal identification is not needed for policy relevance, but my standard for what counts as relevant seems to have been really low: even if further assumptions are needed, it can still be relevant. What’s going on here?

The answer has to do with the asymmetry between necessary and sufficient conditions. To negate a claim of sufficiency, I need only exhibit one case where causal identification holds but policy relevance does not; to negate a claim of necessity, I need only exhibit one case where policy relevance holds without causal identification. In other words, I’m sure that there are some policy relevant questions for which an IV model is useful. I’m also sure that there are some policy relevant questions for which unidentified correlations are not at all helpful. But none of this stuff can be decided in the abstract, as if there were one causal effect to be identified, and a predetermined kind of policy for which it is or is not relevant. Blanket statements like “we cannot have credible policy recommendations without strong causal inference” are not even approximately true. I leave it to the reader to speculate about the consequences of believing such things.

Look, the point here isn’t that causal identification is useless for policymaking. I am just as frustrated as the WBER editors are with huge panel models that regress a bunch of endogenous variables on one another. The Credibility Revolution has by and large been a good thing. But claiming that causal identification is needed for credible policy relevance is just wrong. It misunderstands what most statistical models that promise causal identification actually do, and it runs the risk of redefining policy relevance in a way that excludes information about the world that policymakers ought to know about. It’s bad research to conflate causal identification with policy relevance, and while I’m no development expert, I’d bet it’s bad policy too.

** In an early draft of this paper, a reviewer told me to remove any discussion of the specific effects that IV estimates, because “everyone knows how IV works” and “the discussion of the LATE is too confusing.” I don’t think that those two statements are consistent with one another.