Elite Endorsement Experiments about COVID-19: Two Cheers for Null Results

There is a lot of social science research about COVID-19, both in the United States and around the world. Much of it focuses on the role of elite messaging in affecting pandemic-related beliefs and behaviors. Most scholars believe—instinctively and intuitively—that elite messaging is a fundamentally important driver of behavior and public opinion about the pandemic. Speaking just personally, I believe this.

But at the same time, a wealth of new research uses some of our best analytical tools to test the effects of elite messaging on public opinion about the pandemic. Specifically, this research uses elite endorsement experiments to test whether different elites giving the same message, or the same elites giving different messages, changes public opinion on COVID-19 or has any effect on COVID-19-related behaviors. And the stylized finding from all of this research is that experimentally manipulated elite endorsements do not affect public opinion or behavior. These are null results: these papers generally find no evidence to reject the null hypothesis that elite endorsements have no effect on public opinion or behavior. Some of these papers also report evidence that this effect is, if anything, small (in other words, they reject the null hypothesis that the effect is large).
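To make the distinction between "no evidence of an effect" and "evidence that the effect is small" concrete, here is a minimal sketch of an equivalence test (two one-sided tests, or TOST) using a normal approximation. The function name, the estimate, the standard error, and the bound for what counts as a "large" effect are all hypothetical numbers chosen for illustration; they are not taken from any of the papers discussed here, and those papers may well use different procedures.

```python
from statistics import NormalDist


def tost_p_value(estimate, std_error, bound):
    """Two one-sided tests (TOST) with a normal approximation.

    Null hypothesis: the true effect is at least `bound` in absolute size.
    A small p-value is evidence that the effect lies inside (-bound, +bound).
    """
    if std_error <= 0 or bound <= 0:
        raise ValueError("std_error and bound must be positive")
    z = NormalDist()  # standard normal
    # Test 1: null is effect <= -bound; reject when the estimate sits well above -bound.
    p_lower = 1 - z.cdf((estimate + bound) / std_error)
    # Test 2: null is effect >= +bound; reject when the estimate sits well below +bound.
    p_upper = z.cdf((estimate - bound) / std_error)
    # Both one-sided nulls must be rejected, so report the larger p-value.
    return max(p_lower, p_upper)


def two_sided_p_value(estimate, std_error):
    """Ordinary test of the null hypothesis that the effect is exactly zero."""
    z = NormalDist()
    return 2 * (1 - z.cdf(abs(estimate) / std_error))


# Hypothetical result: a 1-point effect with a 2-point standard error,
# where anything beyond 10 points would count as "large".
print(two_sided_p_value(0.01, 0.02))   # cannot reject "no effect"
print(tost_p_value(0.01, 0.02, 0.10))  # can reject "the effect is large"
```

With these made-up numbers the first test fails to reject zero while the second rejects a large effect, which is the stronger kind of null result. Everything hinges on the bound, which the researcher has to defend on substantive grounds.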

To borrow a phrase from a colleague, elite endorsement experiments about COVID-19 are “a big nothingburger.”

I can speak from some experience. I have been involved in three separate research projects that test the effects of elite endorsements on some COVID-19 related thing: this article with Shana Kushner Gadarian and Sara Wallace Goodman on partisan endorsements and COVID-19-related policy attitudes, this article with Nick Kuipers and Saiful Mujani on encouraging Indonesians to pray from home, and another one which is not quite ready for primetime. Three projects, three nothingburger sliders.

And I am not alone. Here is a selection of articles and pre-prints that I pulled from Google Scholar:

COVID-19 has compelled officials to institute social distancing policies, shuttering much of the economy. At a time of low trust in government and high political polarization, Americans may only support such disruptive policies if recommended by politicians of their own party. A related concern is that some Americans may resist advice coming from “elite” sources such as government officials, public health experts, or the news media. We test these possibilities using novel data from an April 2020 online survey of 1,912 Pennsylvania residents. We uncover partisan differences in views on several coronavirus-related policies. Yet overall, respondents report strong support for social distancing policies and high levels of trust in medical experts. Moreover, a survey experiment finds no evidence of more negative reactions to or less support for social distancing policies when they are advocated by elites, broadly defined. Instead, respondents over 65 prove more likely to adopt expert-advocated positions.
from Bhanot and Hopkins

Public health communication play an important role in the fight against COVID-19. We used five well-established psychological levers to improve on the efficacy of two posters used by the French authorities (one on protective behaviors and one on proper handwashing). The five levers were: simplification (streamlining the posters), sunk costs (emphasizing the costs already paid to fight the pandemic), morality (emphasizing the duty to help others), self-protection (emphasizing the personal risks), and disgust (pointing out and illustrating that not following the protective behaviors or proper handwashing had consequences that should trigger disgust). We tested on a large (N = 3000) nationally representative French sample whether versions of the posters using these levers, compared to a control condition, were clearer, better recalled, and increased people’s intention to follow the posters’ recommendations. On the whole, there were no effects of the manipulations on any of the measures. The only consistent pattern was that the control protective behavior poster was better recalled than the alternatives (except for the simplified version), possibly because it contained one fewer message. The lack of effect on behavioral intentions might be attributed to the potential saturation in terms of health communication at the time of the experiment. Our results–mostly null and potentially negative–confirm the importance of testing interventions before using them in a public health campaign, even if they are grounded in successful past interventions.
from Hacquin et al

Our expectation was that respondents in the treatment groups would favour, or disfavour, the incumbent and assign blame to government for the pandemic compared with the control group. We observe no such results. Several reasons may be adduced for this null finding. One reason could be that public health is not viewed as a political issue. However, people do think health is an important policy area (>85% agree) and that government has some responsibility for health (>90% agree). Another reason could be that people view public health policies through partisan lenses, which means that health is largely endogenous, and yet we find little evidence of polarisation in our data. Alternatively, it could be that the global nature of the pandemic inoculated politicians from blame and yet a majority of people do think the government is to blame for the spread of the pandemic (~50% agree).
from Acharya et al

Media critics frequently complain about the tendency of reporters to cover political news using partisan conflict or partisan game frames, which describe policy disagreement as sites of partisan conflict where the parties can score “wins” or “losses.” Such frames, thought to decrease trust and increase partisan polarization, may be particularly dangerous when used in the coverage of public health crises such as the COVID-19 pandemic. We report a survey experiment where 2,455 respondents were assigned to read coverage of the pandemic that was framed in non-partisan terms, in terms of partisan conflict, or as a game where one party was winning and the other losing. Contrary to expectations, we find no effect of these frames across a broad range of opinions about and actions related to the pandemic, with the exception of a small negative effect of partisan game-framed coverage on the desire to consume news about the pandemic. These results suggest that partisan framing may not have negative effects during a public health crisis or, alternately, that such effects are difficult to detect in real-time using traditional survey experiments.
from Myers

Does emphasizing the pandemic as a partisan issue polarize factual beliefs, attitudes, and behavioral intentions concerning the SARS-CoV-2/COVID-19 pandemic? To answer this question, we conducted a preregistered survey experiment with a “questions as treatment” design in late March 2020 with 1587 U.S. respondents recruited via Prime Panel. Respondents were randomly assigned to answer several questions about then-president Donald J. Trump and the coronavirus (including receiving an information cue by evaluating one of Trump’s tweets) either at the beginning of the survey (treated condition) or at the end of the survey (control condition). Receiving these questions at the beginning of the survey had no direct effect on COVID-19 factual beliefs, attitudes, and behavioral intentions.
from Spälti et al

These are just the first such experiments that I found, but uniformly, the headline result is no effect of the elite endorsement treatment.

Now, this is not a proper or complete review of the literature, nor is this a meta-analysis. It could be that these papers are not representative of the full universe of results from COVID-19 endorsement experiments. But there are enough elite endorsement experiments with null results out there to ask, what is going on?

Two Cheers for Null Results

It is mostly good news that we are learning about so many of these nothingburger elite endorsement experiment results. It reflects a positive development in quantitative social science: the dissemination of null results. I do suspect that there are still a number of papers out there that will not be published because their results are null, and indeed some of the abstracts above are from pre-prints that have not yet, as far as I can tell, been accepted for publication in a peer-reviewed journal. But that so many null results are being published marks important progress in the social sciences’ ongoing battle against the so-called file-drawer problem, in which findings that do not pass an arbitrary statistical significance threshold are not accepted for publication, so that the observed distribution of published findings is not representative of what we have learned.

But the news is not all positive. I think it is now appropriate to ask what we are doing here with all of these elite endorsement experiments, and where we go from here.

Rarely the Question is Asked, Is Our Discipline Learning?

If we are serious about combating the file-drawer problem, the goal of publishing null results is not simply to establish that some experiments were run that did not produce the anticipated effects of our interventions, but rather to learn from the results of those experiments. The goal, in other words, is to reason from the results of those experiments to the broader class of substantive problems that we wish to understand through experimental manipulations.

So let us construct a world in which we have run 10 experiments that are designed to test whether something about elite endorsements affects COVID-19 health behaviors, and each of the 10 experiments returns a null result. What do we conclude? Let us consider two starkly different possibilities.

(1) Elite endorsements don’t matter
(2) Elite endorsements matter, but elite endorsement experiments do not show how they matter

There are other possible answers too, but let us consider these two as the two most extreme things that one might reasonably conclude from our 10 null-result elite endorsement experiments. When I look at all the null results, what should I conclude?

You can probably anticipate where I’m going. The results of the experiments themselves cannot differentiate between (1) and (2). But more pointedly, I am starting to believe that these elite endorsement experiments cannot do this specifically because they are not strong or pointed enough to be dispositive of anything. They certainly cannot falsify any theory (my own experiments included). I continue to believe (2), that elite endorsements matter, and if I am really and truly honest, my own account of how elite communication matters is almost entirely unchanged by the fact that my coauthors and I have found no evidence that there is any difference in beliefs or behaviors in response to manipulating the elites who endorse a particular pro-social health behavior.
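One way to see why a stack of nulls can leave beliefs unchanged is a back-of-the-envelope likelihood ratio. The sketch below compares how probable "all experiments return a null" is under (1) versus under (2). It rests on loud simplifications that are mine rather than anything established above: experiments are independent, each uses a 5% significance threshold, and under (2) each experiment has the same chance (its "power") of detecting the effect that elite messaging has in the real world. The function name and the power values are hypothetical.

```python
def likelihood_ratio_all_null(n_experiments, power, alpha=0.05):
    """Ratio P(all null | no effect) / P(all null | real effect).

    Under "no effect", each experiment returns a null with probability 1 - alpha.
    Under "real effect", each experiment returns a null with probability 1 - power.
    Assumes independent experiments with identical power (a simplification).
    """
    if not 0 < power < 1 or not 0 < alpha < 1:
        raise ValueError("power and alpha must be strictly between 0 and 1")
    return ((1 - alpha) / (1 - power)) ** n_experiments


# Hypothetical scenarios for 10 null results in a row.
for power in (0.10, 0.50, 0.80):
    ratio = likelihood_ratio_all_null(10, power)
    print(f"power={power:.2f}: nulls favor (1) over (2) by a factor of {ratio:.3g}")
```

If the experiments would detect a real effect 80% of the time, ten nulls favor (1) by a factor in the millions. If they would detect it only 10% of the time, the factor is below 2, and a reader who started out believing (2) has little reason to budge. On this reading, the suspicion that these designs are not strong enough to be dispositive is a suspicion that we are in the second scenario.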

So what is the research program of elite endorsement experiments on COVID-19? Imre Lakatos has an answer: a degenerate research program.

Let’s quote the Stanford Encyclopedia of Philosophy on research programs:

What is it for a research programme to be progressive? It must meet two conditions. Firstly it must be theoretically progressive. That is, each new theory in the sequence must have excess empirical content over its predecessor; it must predict novel and hitherto unexpected facts. Secondly it must be empirically progressive. Some of that novel content has to be corroborated, that is, some of the new “facts” that the theory predicts must turn out to be true. As Lakatos himself put the point, a research programme “is progressive if it is both theoretically and empirically progressive, and degenerating if it is not”. Thus a research programme is degenerating if the successive theories do not deliver novel predictions or if the novel predictions that they deliver turn out to be false.

This is actually still rather too optimistic a take, though. True, I don’t think this work has delivered novel predictions, but I also don’t think that any theory’s predictions have even been shown to be false! The null results from elite endorsement experiments are vacuous.

My harshest interpretation of elite endorsement experiments is that they are games. The game is, can I use this methodological tool (survey experiment) to find evidence that is consistent with an interesting conclusion? No one takes the methodology seriously enough to believe it is an honest test of any theory, but it is a methodology that produces unbiased estimates of causal effects, and we do favor those.

Lots of Games Out There

I reach an analogous conclusion thinking about the results of a prominent recent audit experiment, published in the high-profile Proceedings of the National Academy of Sciences of the United States of America, that found that recipients of an email inviting them to take a survey were less likely to do so if it came from a sender with a common Black name* than if it came from a sender with a common White name. This is framed as a test of the existence of what the authors term “paper cut discrimination,” a kind of widespread everyday racial discrimination that the authors say “is exhibited by all racial/ethnic subgroups—outside of Black people themselves—and is present in all geographic regions in the United States.”

I have been critical of this article because there is more variation among the selected “Black names”** than there is between White and Black names, as I show in the plot below:

I think that readers’ conclusions would be very different had the authors presented this result.

But forget that point. Is this test of everyday discrimination strong enough for me to learn from a null result? Not at all: under no circumstance would I look at this figure that I just produced, and conclude that paper cut discrimination doesn’t exist. Instead, it is neat that the authors are able to use this well-respected tool to uncover evidence of paper cut discrimination. They won the game.

Where is Progress?

Where to go from here? My hunch is that the discipline of political science needs a reckoning with elite endorsement experiments. Certainly we should start talking about just how many COVID-related null results there are out there.

But we should also take stock of whatever research program we think these experiments are contributing to, and try to design experiments that have greater substantive and theoretical stakes. An experiment that matters is one that changes what you think when you see any result, not one that confirms what you believe when you see positive results.

NOTES

* I really don’t like the “Black-sounding name” phrasing. The authors are very good about explaining that what they have done is to identify names that are statistically more common among Black versus White Americans. But in the end, the internal logic of the study relies on the belief that a survey respondent infers the racial identity of the sender from the name alone.

** “Black name”, again, is just really an awkward phrasing. Andre Jefferson is not a Black name in any objective sense. It is a name that, in the U.S. Census, is statistically more common among Black Americans than among White Americans. The same in reverse with Nicholas Austin.