Is Quantitative Description without Causal Reasoning Possible?

This week saw the launch of an exciting new journal entitled the Journal of Quantitative Description: Digital Media. Although the bit after the colon delimits the topical scope of this particular journal, it is the bit before the colon that is most exciting and that has elicited wide commentary. JQD:DM promises to publish

quantitative descriptive social science. It does not publish research that makes causal claims

This is a big statement, because many if not all mainstream social science journals are increasingly consumed by a focus on causal inference using quantitative methods. To be fair, this has probably been true for a long time now. But the revolution in statistical methods for causal inference in the past forty years has given quantitative social scientists a very sophisticated toolkit for understanding the relationship between statistical procedures and causal claims, such that progress in the latter is now catching up with progress in the former.*

I do not think that anyone seriously holds the position that only causal inference is important. Description has always been essential to the scientific and social scientific enterprise: What is the population of Israel? What is the behavior of the cardinal eating from my bird feeder? And so forth. Yet the task of quantitative description raises an interesting question about the role of causal reasoning in making theoretically relevant descriptive statements.

I will make two assumptions as a starting point:

  1. quantitative description is always theoretical
  2. theoretically interesting tasks of quantitative description involve relating one variable to another variable.**

These assumptions are not assumptions about quantitative methods themselves—one could always simply produce descriptive statistical correlations between, say, refrigerators per capita and infant mortality across Indonesian provinces—but rather about the types of quantitative descriptions that are held to advance social scientific knowledge. Assumption 1 tells us that we rely on theory to tell us what is potentially informative about a quantitative description, and Assumption 2 tells us that we should focus on what problems arise when we describe relations among variables.***

Under these maintained assumptions, I think that it follows that all quantitative description is done either in the shadow of causal reasoning, or with implicit restrictions on the system of causal relations that the quantitative description partially captures.

Let’s start with a classic example of what seems to be a good quantitative description: creating an index that measures a latent psychological construct. Bill Liddle, Saiful Mujani, and I did this for my 2018 book Piety and Public Opinion, creating what I called a “piety index” designed to capture individual piety across a sample of Indonesian survey respondents. We made this index from multiple variables, and used theory to restrict “what went into” this index, so Assumptions 1 and 2 hold. Isn’t this just descriptive? It is: but note that the grandfather of latent trait analysis, Spearman (1904), proceeded from a model in which the latent construct caused the observable indicators associated with it. This causal claim feels rather innocuous, but it is causal; and any attempt to relate an index of the form that I created with any sorts of other outcomes or correlates must confront some sort of causal model to be interpretable.
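Spearman's measurement model can be sketched in a few lines: a latent trait causes each observed indicator, and an index built by averaging the indicators tracks the trait better than any single indicator does. This is a hypothetical simulation, not our actual piety index; the loadings, noise levels, and number of indicators are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000

# Spearman-style measurement model: a latent trait (e.g., piety)
# *causes* each of five observed indicators, plus idiosyncratic noise.
latent = rng.normal(size=n)
indicators = np.column_stack(
    [0.7 * latent + rng.normal(scale=0.7, size=n) for _ in range(5)]
)

# A simple index (the mean of the standardized indicators) tracks
# the latent trait much better than any single indicator does.
z = (indicators - indicators.mean(axis=0)) / indicators.std(axis=0)
index = z.mean(axis=1)

single = np.corrcoef(indicators[:, 0], latent)[0, 1]
combined = np.corrcoef(index, latent)[0, 1]
```

The causal arrow here runs from the latent construct to the indicators, which is exactly what makes averaging them a sensible measurement strategy.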

Turn to another example: the cross-national relationship between private gun ownership and state terror (a topic I first addressed thousands of mass shootings ago). There, I produced descriptive correlations between, well, state terror and private gun ownership, but deliberately asserted that

Of course, these are not estimates of the causal effect of gun ownership (or anything else) on state terror. These are conditional correlations, and there are plenty of reasons why we might believe that the causal relations here are more complicated than what this discussion has implied. 

The point I was trying to raise is that we learn things from these correlations even when we are sure that they are not causal. This, I think, is related to the model that JQD:DM seeks to follow.

But this is not a causation-free analysis! It is interesting only insofar as we can relate it to a causal question. We reason through the potential set of causal relations that could have produced that correlation to make sense of what it likely means. A long quote from a follow-up post makes the point (funnily enough, it anticipated JQD:DM):

If I were writing an article for a good social science journal, I’d probably stop right here and abandon the project. Thankfully, we have eliminated some of the numerology from quantitative social science in the past two decades, meaning that we cannot wave our magic interpretive wand over a regression table to reach our preferred conclusion. If you want to claim to have identified “the effect of” gun ownership on freedom from state terror, partial correlations will no longer suffice.

But we still learn policy-relevant things from these results even if they do not identify a causal relationship. The first point is to remember that the question of interest is not the average causal effect of gun ownership on state terror (which, for better or for worse, has become the question of interest for quantitative social science research). Instead, our policy question is more squishy: does such widespread gun ownership protect American citizens from tyranny? Here is what we have learned even without an estimate of a causal effect.

1. American citizens aren’t as protected from state terror as we might think.

2. Plenty of countries rate as highly as (or more highly than) the U.S. with lower levels of gun ownership.

3. Plenty of countries with lower levels of gun ownership experience far more state terror.

4. The partial correlation between gun ownership and state terror disappears when you take regime type and economic development into account.

All of these data are hard to square with the idea that the ubiquity of firearms in the U.S. is protecting Americans from state terror. We can construct a theoretical world in which gun ownership at the levels that we see in the United States today is protecting us from tyranny, but that theoretical world must have a lot of curious features to it to also produce the results from yesterday. 

Understanding what the conditional correlation could possibly have meant required that we imagine some sort of causal system from which the quantitative description—the correlation ρ_{Y,X} is statistically significant, but the partial correlation ρ_{Y,X|W} is not—could have emerged.
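To make the logic concrete, here is a minimal simulation of one such causal system (a hypothetical one, with invented coefficients, not the actual gun-ownership data): a confounder W causes both X and Y, and X has no direct effect on Y. The marginal correlation between X and Y is substantial, while the partial correlation given W is approximately zero.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Hypothetical confounded system: W (think regime type or economic
# development) causes both X (gun ownership) and Y (freedom from
# state terror); X has no direct effect on Y.
w = rng.normal(size=n)
x = 0.8 * w + rng.normal(size=n)
y = 0.8 * w + rng.normal(size=n)

# Marginal correlation rho_{Y,X} is substantial...
marginal = np.corrcoef(x, y)[0, 1]

# ...but the partial correlation rho_{Y,X|W} (the correlation of
# the residuals after regressing each variable on W) is near zero.
rx = x - np.polyval(np.polyfit(w, x, 1), w)
ry = y - np.polyval(np.polyfit(w, y, 1), w)
partial = np.corrcoef(rx, ry)[0, 1]
```

Observing this pattern in data does not prove that W is the confounder, but it is exactly the kind of descriptive fact that causal reasoning renders interpretable.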

There are other examples that I might provide, but I hypothesize that any quantitative description that is held to advance social scientific knowledge in the ways that the journal hopes will be either a multivariate measurement of things or a tantalizing correlation among things.

Is this bad or wrong? Does it undermine the purpose of JQD:DM? In both cases the answer is no. I reach a different conclusion: that JQD:DM and any journal like it will always confront lurking criticisms that causal reasoning is somehow being smuggled into the quantitative descriptions that they publish. This is a fine problem to have, but I suspect that even a journal explicitly devoted to quantitative description will struggle to police the boundary between descriptive and causal inference.

By way of conclusion, here is a speculative future for journals like JQD:DM. In many if not most cases, there is a lot to be learned from statistical correlations that cannot be given a strong causal interpretation. The standard in most quantitative social science is to target a causal parameter like an average treatment effect or a dose-response function****. The enterprise “fails” if the design does not allow for that target parameter to be identified, and as my example of the paper that I would abandon hints, researchers often will not even try if they know that it is unidentifiable.

Another approach would be to identify a target parameter, a quantitative descriptive fact that is partially informative about that parameter, and a mapping from the latter to the former using assumptions and logical bounds. JQD:DM and journals like it might foreground this sort of approach to highlight what we learn from quantitative descriptive exercises. A loosely related approach such as that outlined in Little and Pepinsky (2021) might fit nicely under this model as well.
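As an illustration of what such a mapping can look like, here is a Manski-style worst-case bound: for a bounded outcome, three purely descriptive facts (the mean outcome among the treated, the mean among the untreated, and the share treated) imply logical bounds on the average treatment effect without any causal identification assumptions. The numbers plugged in at the end are invented.

```python
# Worst-case (Manski-style) bounds on an average treatment effect
# for an outcome bounded in [y_min, y_max], computed from three
# descriptive facts: E[Y|D=1], E[Y|D=0], and P(D=1).
def ate_bounds(y1_mean, y0_mean, p_treated, y_min=0.0, y_max=1.0):
    p, q = p_treated, 1 - p_treated
    # E[Y(1)]: observed for the treated, worst case for everyone else.
    ey1_lo = p * y1_mean + q * y_min
    ey1_hi = p * y1_mean + q * y_max
    # E[Y(0)]: observed for the untreated, worst case for everyone else.
    ey0_lo = q * y0_mean + p * y_min
    ey0_hi = q * y0_mean + p * y_max
    return ey1_lo - ey0_hi, ey1_hi - ey0_lo

# Invented descriptive facts, for illustration only.
lo, hi = ate_bounds(y1_mean=0.6, y0_mean=0.4, p_treated=0.5)
```

With these inputs the bounds are [-0.4, 0.6]: wide (their width is always y_max minus y_min), yet they still rule out large effects in either direction, which is a genuinely descriptive contribution to a causal question.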


* To make explicit what I mean in this sentence: we have long had sophisticated statistical tools, but without the theory of causality required to attribute causal meaning to them.

** Examples of univariate quantitative description would be finding answers to the questions of “how many balls are in that urn?” or “what is the GDP of Venezuela?”

*** Importantly, observe that time is a variable. Describing how a single variable differs across time is a task of relating multiple variables to one another.

**** Or, analogously, a sufficient statistic or an identification region.