Trump Support and Vaccination Rates: Some Hypotheses and Some Data

The United States is not going to meet President Joe Biden’s target of 70% of the eligible population vaccinated by July 4th. This is not because of a lack of supply or capacity: in every state in the country, any eligible adult can get a vaccine.* Although issues of access surely explain why some Americans have not been vaccinated yet, they almost certainly do not explain the United States’ failure to meet President Biden’s target. The bigger issue is that many eligible Americans are choosing not to get vaccinated.

What explains this disappointing result? For a couple of weeks now, people have been noticing that there is a strong relationship between President Trump’s 2020 vote share and vaccination rates. Here is Seth Masket:

State-level results are still a pretty coarse measure, though: just compare 71.4% vaccinated in Tompkins County, NY (where I live) to 50.7% in Yates County, NY (not far from here). We can repeat the same analysis at the county level, across the American states, and here is what we find.

There is a strong negative correlation across nearly every state in the union between county-level Trump vote share in 2020 and vaccination rates, measured using data maintained by the CDC.
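
If you want to see this pattern yourself, it takes only a few lines of code. Below is a minimal sketch that computes the correlation state by state; the merged data file and the column names (trump_share_2020, pct_vaccinated_18plus) are hypothetical placeholders, not the actual replication files linked at the end of this post.

import pandas as pd

# Hypothetical merged file of county-level 2020 vote shares and CDC vaccination rates
counties = pd.read_csv("county_vax_and_votes.csv")

# Correlation between Trump vote share and the 18+ vaccination rate, within each state
state_corrs = (
    counties
    .groupby("state")[["trump_share_2020", "pct_vaccinated_18plus"]]
    .corr()                             # 2x2 correlation matrix per state
    .xs("trump_share_2020", level=1)    # keep one row per state
    ["pct_vaccinated_18plus"]
    .sort_values()
)
print(state_corrs)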

This would seem to be pretty clear evidence of a link between Trump support and vaccine hesitancy. But there are a lot of reasons why this correlation might exist that have nothing to do with Trump himself. Here are some alternative explanations:

  • Trump-supporting counties are rural, and rural counties have lower vaccination rates due to supply, capacity, or distance-of-travel issues.
  • Trump-supporting counties have small populations, and in counties with smaller populations the urgency of vaccination is lower than in counties with large populations.
  • Trump-supporting counties are also Republican counties, so we’re not picking up something particular to Trump, but rather an artifact of partisanship in 2020.
  • Trump-supporting counties are majority white, and whites have lower vaccination rates. Now, this idea gets at the racial dimensions of vaccine hesitancy, although it runs exactly counter to the expectation that vaccine hesitancy is higher–and, critically, vaccine access is lower–among Black and Hispanic populations.

There are plenty of other ideas that we could explore here too. To do so, we can use the tried-and-true method of multiple regression.

The analysis below shows the correlation between county-level vaccination rates (18+) and a range of predictors that can capture the ideas above:

  • Trump Swing: the county-level swing towards Trump between 2012 and 2020, to distinguish between a correlation due to Republican support and a correlation due to Trump support.
  • Black, Hispanic, and Native population shares from the 2019 American Community Survey: to pick up racial and ethnic dimensions of vaccine hesitancy.
  • County-level population (in log terms), also from the ACS.
  • Indicators for how urban or rural a county is.
  • State effects: to capture whatever differences in counties are associated with the state that the county is in.

Here is what happens when we enter these all in a regression.
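
For concreteness, here is a minimal sketch of what that specification might look like in code, reusing the hypothetical counties data frame and column names from the sketch above; it is an illustration, not the actual replication code.

import statsmodels.formula.api as smf

# Vaccination rates regressed on Trump support, the Trump swing, racial/ethnic
# population shares, log population, urban/rural indicators, and state fixed effects.
model = smf.ols(
    "pct_vaccinated_18plus ~ trump_share_2020 + trump_swing_12_20"
    " + black_share + hispanic_share + native_share"
    " + log_pop + C(urban_rural) + C(state)",   # C(state) = state fixed effects
    data=counties,
).fit(cov_type="HC1")                           # heteroskedasticity-robust SEs
print(model.summary())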

The findings are clear: vaccination rates are negatively correlated with county-level Trump support in 2020, but not with the county-level Trump swing, suggesting that whatever “Trump effect” there is reflects partisanship rather than Trump himself. Conditional on other variables, we also see lower vaccination rates in counties with larger Black and Hispanic population shares. There is no general relationship between vaccination rates and county population or urban/rural status.

Another way to slice this, though, would be to recognize that there are probably differences between rural counties in New York and rural counties in Wyoming, or Orange County, CA versus Orange County, FL, two large metropolitan counties. To capture this, I’ve created a new analysis that uses “state-by-urban/rural-county” fixed effects. Here is what we find:
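
The code change is small: continuing with the same hypothetical data frame, we combine state and the urban/rural indicator into a single category and let that enter as the fixed effects.

# One fixed effect per state-by-urban/rural cell
counties["state_urban"] = (
    counties["state"].astype(str) + "_" + counties["urban_rural"].astype(str)
)

model_fe = smf.ols(
    "pct_vaccinated_18plus ~ trump_share_2020 + trump_swing_12_20"
    " + black_share + hispanic_share + native_share + log_pop"
    " + C(state_urban)",                # state-by-urban/rural fixed effects
    data=counties,
).fit(cov_type="HC1")
print(model_fe.summary())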

Once we allow for different kinds of urban/rural dynamics in different states, we find more evidence for a pure Trump effect, as well as continued evidence for the partisan and racial/ethnic relationships I found above.

How can we make sense of these findings? When we look at the table of scatterplots at the beginning of this post, we do see that the relationship between Trump support and vaccination rates is different in different parts of the country. Why might this be? To investigate, we can estimate a multilevel/hierarchical regression model that allows for the county-level correlations to themselves depend on state factors: state-level population, racial/ethnic share, and so forth. My analysis shows no evidence that those factors explain differences in county-level patterns across states: in other words, knowing the state-level support for Trump doesn’t help us to explain anything about vaccination rates that we cannot figure out using county-level support for Trump, and knowing the state-level Black population share doesn’t give us any more explanatory power than county-level Black population share, etc.

However, we can also check to see if there are geographical differences by allowing county-level correlations to vary by census division, a geographic unit defined by the U.S. Census Bureau.
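
One way to specify such a model, sticking with the hypothetical names used above, is a mixed model with random intercepts and random slopes by census division; this is a sketch of the idea rather than the exact model behind the estimates plotted below.

import statsmodels.formula.api as smf

# "division" is an assumed column recording each county's Census division; the
# slopes on the key predictors are allowed to vary across the nine divisions.
mlm = smf.mixedlm(
    "pct_vaccinated_18plus ~ trump_share_2020 + trump_swing_12_20"
    " + black_share + hispanic_share + native_share + log_pop + C(urban_rural)",
    data=counties,
    groups="division",
    re_formula="~ trump_share_2020 + trump_swing_12_20 + black_share + hispanic_share",
).fit()
print(mlm.summary())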

Estimating such a model produces a mess of coefficients, interactions, and variance components that are hard to interpret. So to see how geography matters, I’ve plotted the estimates for four important variables across census divisions.
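
Here is a sketch of how one of these panels might be drawn: point estimates with 95% confidence intervals for a single variable across divisions. The estimates data frame, with columns division, coef, and se, is assumed to have been extracted from the fitted model above.

import matplotlib.pyplot as plt

fig, ax = plt.subplots(figsize=(6, 4))
ypos = list(range(len(estimates)))
ax.errorbar(
    estimates["coef"],              # point estimates on the x-axis
    ypos,                           # one row per census division
    xerr=1.96 * estimates["se"],    # 95% confidence intervals
    fmt="o",
)
ax.axvline(0, linestyle="--", linewidth=1)   # reference line at zero
ax.set_yticks(ypos)
ax.set_yticklabels(estimates["division"])
ax.set_xlabel("Correlation with vaccination rate (18+)")
ax.set_title("Black population share, by census division")
plt.tight_layout()
plt.show()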

There is a ton to learn from this figure, so let’s take our time with it. Each plot shows a “coefficient” by census division: read these, for instance, as “the correlation between Black population share and vaccination rates in the Pacific division, controlling for other factors.” The lines reflect 95% confidence intervals. We learn that

  • There is always a negative correlation between county-level partisanship and vaccination rates, although this correlation is stronger in some parts of the country (i.e. the West) than in others (i.e. New England).
  • The distinctly Trumpian relationship between Trump support and vaccination rates is confined primarily to the Middle Atlantic, the Midwest, the Middle South, and the Mountain regions. The Trump effect seems to be mostly a Rust Belt phenomenon.
  • There is also always a correlation between Black population share and vaccination rates, net of other factors like urban/rural differences, state effects, and so forth. And importantly, this is not just something that we find in the South: it is evident everywhere, and actually tends to be smaller in the South** than in other parts of the country.
  • There is no general pattern that we can see between Hispanic population share and vaccination rates, once we account for other factors in this more comprehensive model.

A fuller analysis of the political and social correlates of vaccination rates will have to wait for another time. But I have placed all of these data and replication commands online to allow anyone to recreate these analyses, update the data with new or more complete vaccination figures, and add new variables (partisanship of the governor! county-level measures of poverty!) that might further refine these preliminary findings.

NOTE

* The enormous privilege of it all. Right now there are private companies advertising three-week trips for Indonesians to travel to the U.S. to get their vaccine, given the slow rollout of vaccines there.

** However, these coefficients are most precisely estimated in the South.

Is Quantitative Description without Causal Reasoning Possible?

This week saw the launch of an exciting new journal entitled the Journal of Quantitative Description: Digital Media. Although the bit after the colon delimits the topical scope of this particular journal, it is the bit before the colon that is most exciting and that has elicited wide commentary. JQD:DM promises to publish

quantitative descriptive social science. It does not publish research that makes causal claims

This is a big statement, because many if not all mainstream social science journals are increasingly consumed by a focus on causal inference using quantitative methods. To be fair, this has probably been true for a long time now. But the revolution in statistical methods for causal inference in the past forty years has given quantitative social scientists a very sophisticated toolkit for understanding the relationship between statistical procedures and causal claims, such that progress in the latter is now catching up with progress in the former.*

I do not think that anyone seriously holds the position that only causal inference is important. Description has always been essential to the scientific and social scientific enterprise: what is the population of Israel? what is the behavior of the cardinal eating from my bird feeder? and so forth. Yet the task of quantitative description raises an interesting question about the role of causal reasoning in making theoretically relevant descriptive statements.

I will make two assumptions as a starting point:

  1. quantitative description is always theoretical
  2. theoretically interesting tasks of quantitative description involve relating one variable to another variable.**

These assumptions are not assumptions about quantitative methods themselves—one could always simply produce descriptive statistical correlations between, say, refrigerators per capita and infant mortality across Indonesian provinces—but rather about the types of quantitative descriptions that are held to advance social scientific knowledge. Assumption 1 tells us that we rely on theory to tell us what is potentially informative about a quantitative description, and Assumption 2 tells us that we should focus on what problems arise when we describe relations among variables.***

Under these maintained assumptions, I think that it follows that all quantitative description is done either in the shadow of causal reasoning, or with implicit restrictions on the system of causal relations that the quantitative description partially captures.

Let’s start with a classic example of what seems to be a good quantitative description: creating an index that measures a latent psychological construct. Bill Liddle, Saiful Mujani, and I did this for my 2018 book Piety and Public Opinion, creating what I called a “piety index” designed to capture individual piety across a sample of Indonesian survey respondents. We made this index from multiple variables, and used theory to restrict “what went into” this index, so Assumptions 1 and 2 hold. Isn’t this just descriptive? It is: but note that the grandfather of latent trait analysis, Spearman (1904), proceeded from a model in which the latent construct caused the observable indicators associated with it. This causal claim feels rather innocuous, but it is causal; and any attempt to relate an index of the form that I created to any sort of other outcome or correlate must confront some sort of causal model to be interpretable.
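
To make that structure concrete, here is a schematic sketch of a one-factor model in the spirit of Spearman (1904): a single latent trait is assumed to generate several observed survey items, and the estimated factor score serves as the index. The data file and item names below are hypothetical; this is not the actual construction of the piety index from the book.

import pandas as pd
from sklearn.decomposition import FactorAnalysis

survey = pd.read_csv("survey.csv")   # hypothetical survey data
items = ["prays_daily", "attends_services", "fasts", "reads_scripture"]

# One latent factor assumed to cause the observed items (the Spearman-style model)
fa = FactorAnalysis(n_components=1, random_state=0)
survey["piety_index"] = fa.fit_transform(survey[items])[:, 0]

print(fa.components_)                # loadings of each item on the latent trait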

Turn to another example: the cross-national relationship between private gun ownership and state terror (a topic I first addressed thousands of mass shootings ago). There, I produced descriptive correlations between, well, state terror and private gun ownership, but deliberately asserted that

Of course, these are not estimates of the causal effect of gun ownership (or anything else) on state terror. These are conditional correlations, and there are plenty of reasons why we might believe that the causal relations here are more complicated than what this discussion has implied. 

The point I was trying to raise is that we learn things from these correlations even when we are sure that they are not causal. This, I think, is related to the model that JQD:DM seeks to follow.

But this is not a causation-free analysis! It is interesting only insofar as we can relate it to a causal question. We reason through the potential set of causal relations that could have produced that correlation to make sense of what it likely means. A long quote from a follow-up post makes the point (funny enough, it anticipated JQD:DM):

If I were writing an article for a good social science journal, I’d probably stop right here and abandon the project. Thankfully, we have eliminated some of the numerology from quantitative social science in the past two decades, meaning that we cannot wave our magic interpretive wand over a regression table to reach our preferred conclusion. If you want to claim to have identified “the effect of” gun ownership on freedom from state terror, partial correlations will no longer suffice.

But we still learn policy-relevant things from these results even if they do not identify a causal relationship. The first point is to remember that the question of interest is not the average causal effect of gun ownership on state terror (which, for better or for worse, has become the question of interest for quantitative social science research). Instead, our policy question is more squishy: does such widespread gun ownership protect American citizens from tyranny? Here is what we have learned even without an estimate of a causal effect.

1. American citizens aren’t as protected from state terror as we might think.

2. Plenty of countries rate as highly as (or more highly than) the U.S. with lower levels of gun ownership.

3. Plenty of countries with lower levels of gun ownership experience far more state terror.

4. The partial correlation between gun ownership and state terror disappears when you take regime type and economic development into account.

All of these data are hard to square with the idea that the ubiquity of firearms in the U.S. is protecting Americans from state terror. We can construct a theoretical world in which gun ownership at the levels that we see in the United States today is protecting us from tyranny, but that theoretical world must have a lot of curious features to it to also produce the results from yesterday. 

Understanding what the conditional correlation could possibly have meant implied that we could imagine some sort of causal system from which the qualitative description—the correlation ρ(Y, X) is statistically significant, but the conditional correlation ρ(Y, X | W) is not—emerged.
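
For readers who want the mechanics, here is a sketch of that descriptive fact in code: the raw correlation between Y and X alongside the correlation that remains after partialling out a set of controls W. The variable names are placeholders rather than the actual data behind those posts.

import numpy as np
import pandas as pd

def partial_corr(df: pd.DataFrame, y: str, x: str, controls: list) -> float:
    """Correlation between y and x after regressing both on the controls."""
    Z = np.column_stack([np.ones(len(df))] + [df[c] for c in controls])
    resid_y = df[y] - Z @ np.linalg.lstsq(Z, df[y], rcond=None)[0]
    resid_x = df[x] - Z @ np.linalg.lstsq(Z, df[x], rcond=None)[0]
    return float(np.corrcoef(resid_y, resid_x)[0, 1])

# raw = df["state_terror"].corr(df["gun_ownership"])
# conditional = partial_corr(df, "state_terror", "gun_ownership",
#                            ["regime_type", "gdp_per_capita"])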

There are other examples that I might provide, but I hypothesize that any quantitative description that is held to advance social scientific knowledge in the ways that the journal hopes will be either multivariate measurements of things or tantalizing correlations among things.

Is this bad or wrong? Does it undermine the purpose of JQD:DM? In both cases the answer is no. I reach a different conclusion: that JQD:DM and any journal like it will always confront lurking criticisms that causal reasoning is somehow being smuggled into the quantitative descriptions that they publish. This is a fine problem to have, but I suspect that even a journal explicitly devoted to quantitative description will struggle to police the boundary between descriptive and causal inference.

By way of conclusion, here is a speculative future for journals like JQD:DM. In many if not most cases, there is a lot to be learned from statistical correlations that cannot be given a strong causal interpretation. The standard in most quantitative social science is to target a causal parameter like an average treatment effect or a dose-response function****. The enterprise “fails” if the design does not allow for that target parameter to be identified, and as my example of the paper that I would abandon hints, researchers often will not even try if they know that it is unidentifiable.

Another approach would be to identify a target parameter, a quantitative descriptive fact that is partially informative about that parameter, and a mapping between the two using assumptions and logical bounds. JQD:DM and journals like it might foreground this sort of approach to highlight what we learn from quantitative descriptive exercises. A loosely related approach such as that outlined in Little and Pepinsky (2021) might fit nicely under this model as well.
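
To illustrate the general logic with a deliberately simple, hypothetical example (not one taken from the post or from that paper): Manski-style worst-case bounds on a population mean when the outcome is observed for only part of the population. The descriptive facts (the observed mean and the share observed) do not identify the target parameter, but combined with logical bounds on the outcome they map into an identification region.

def worst_case_bounds(observed_mean, share_observed, y_min=0.0, y_max=1.0):
    """Bounds on E[Y] when Y is observed for only a fraction of the population."""
    lower = share_observed * observed_mean + (1 - share_observed) * y_min
    upper = share_observed * observed_mean + (1 - share_observed) * y_max
    return lower, upper

# A 0.6 mean among the 70% of units observed, with Y logically bounded in [0, 1],
# implies only that E[Y] lies somewhere in [0.42, 0.72].
print(worst_case_bounds(observed_mean=0.6, share_observed=0.7))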

NOTES

* To make explicit what I mean in this sentence: we have long had sophisticated statistical tools, but without the theory of causality required to attribute causal meaning to them.

** Examples of univariate quantitative description would be finding answers to the questions of “how many balls are in that urn?” or “what is the GDP of Venezuela?”

*** Importantly, observe that time is a variable. Describing how a single variable differs across time is a task relating multiple variables to one another.

**** Or, analogously, a sufficient statistic or an identification region.