Author: tompepinsky

  • Impeachment and the Deepening of America’s Regime Cleavage

    President Trump has now been impeached, and the floor debate prior to the vote tells us something about the depths of America’s emerging regime cleavage. As I wrote in Politico several weeks back, this term refers to

    division within the population marked by conflict about the foundations of the governing system itself—in the American case, our constitutional democracy. In societies facing a regime cleavage, a growing number of citizens and officials believe that norms, institutions and laws may be ignored, subverted or replaced.

    Congressional floor debates in such circumstances, as every Congress specialist tweeted yesterday, are not actually debates on the merits of the issue at hand. They are opportunities for Members of Congress to speak to their constituents. And what we heard in those debates, for those of you who had the stomach to listen to them (I did not), revealed that the GOP is now converging on a unified message that to even question the President’s actions is to threaten American democracy.

    Greg Sargent wrote a nice column yesterday that illustrates how the impeachment proceedings have pushed the GOP toward an ever more cynical position on the illegitimacy of any Constitutional opposition to the president. Focusing on Kevin McCarthy’s statements about the House Democrats “defying the vote of your own constituents,” he suggested that for the GOP,

    the only voters that matter are the ones who voted for Trump in 2016 — those who elected the Democratic House don’t exist — and as such, Trump cannot be legitimately impeached simply by virtue of having been elected.

    This idiotic argument undermines itself — by definition, impeachment can only be directed at a president who was previously elected — but it’s the actual position held by Trump and Republicans, who regularly call impeachment a “coup.”

    On this logic, Trump cannot be legitimately impeached, and he is absolutely within his rights to use the levers of government to corrupt the next election and to evade accountability at the hands of voters. And there is no legitimate mechanism to constrain him from continuing to do just that.

    I, like many who have been horrified by the debasement of the GOP and its utter dominance by President Trump, have long been cynical about the willingness of the GOP to rein in this administration’s corruption and abuse of power.

    But what we are witnessing is something worse, now: the message is that to remove the president is to attack democracy. Or, as it was evocatively put in a meme shared this morning on Facebook by a middle-school acquaintance who now lives in the great state of Florida:

    As ridiculous as that message is, the strategy that it represents is worse. This is how political cleavages become regime cleavages.

  • Multiple Comparisons: Outcomes versus Measures

    A recent post at Duck of Minerva by Peter Henne raised the question of multiple comparisons in international relations research. He poses the problem like this:

    Imagine we’re writing a paper on interstate conflict. We could measure conflict onset, duration or intensity. We could measure intensity by an ordinal scale, the number of deaths, or other more complicated measures. We could include all dyads, politically-relevant dyads, dyads since 1945, 1989, etc. Ideally these choices would be based on our theory, but our theories are often not specific enough to specify such choices.

    I find this striking, because even though the post is about “what the ‘multiple comparisons issue’ really is,” this is not how I customarily think of multiple comparisons. The above description captures a problem of uncertainty and/or researcher discretion in measurement of one variable. Yet most overviews of the multiple comparison problem describe the problem of uncertainty and/or researcher discretion in choosing what outcome to measure or measuring multiple outcomes.

    The difference is subtle but important. Let’s take an example of the democratic peace in international relations. Imagine a well-designed (if implausible) experiment that randomly assigns half the world’s countries to be democracies. We then look for the consequences. We might be interested in the effect of democracy on war, but be uncertain about how to measure it. A continuous measure of people killed in conflict? A binary measure that is 1 if the conflict reaches some intensity, 0 otherwise? An ordinal scale of none-some-lots of war? Maybe our measures of people killed are biased? Maybe we have many such measures, each biased in a different but unknown way? We might try to use all of them as dependent variables to see how inferences change. That is uncertainty in the measurement of one variable. If we report only the ones that produce statistically significant results, then we have researcher discretion in the same.

    The multiple comparison problem, however, often describes something different. This is when we look for all sorts of outcomes of the experiment. Not just war (however measured) but also economic growth. Or tax collection. Or social rights. Or environmentalism. Or religiosity. Or any number of things. Maybe we try all of them as dependent variables to see what the experiment affected. That is uncertainty in choosing what outcome to measure. If we only report the ones that produced statistically significant results, then we have researcher discretion in the same.

    The problem with the latter is that if you keep picking dependent variables, you will eventually find one that is correlated with the treatment just by chance. One might then infer that the treatment affected that outcome. We can address this problem by adjusting the target level of statistical significance required to draw inferences about the treatment, increasing the “standard” for statistical significance to correct for the fact that one has looked for many outcomes. That is one solution to the multiple comparisons problem. Another is pre-registration.*
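    The post does not name a specific adjustment, but the simplest version of the idea is a Bonferroni correction: with m outcomes tested at an overall level α, each individual test is held to the stricter threshold α/m. A minimal sketch (the function names are my own):

```python
def bonferroni_threshold(alpha, m):
    """Per-test p-value threshold after a Bonferroni correction."""
    return alpha / m

def survives_correction(p_values, alpha=0.05):
    """Flag which p-values remain significant once we account for
    having tested len(p_values) outcomes."""
    threshold = bonferroni_threshold(alpha, len(p_values))
    return [p < threshold for p in p_values]

# With 20 outcomes the per-test bar drops from 0.05 to 0.0025, so a
# nominally significant p = 0.01 no longer clears it.
flags = survives_correction([0.01, 0.001] + [0.5] * 18)
```

    Pre-registration attacks the same problem from the other direction, by fixing the list of outcomes before the data are seen.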

    Does it matter whether the dependent variables are multiple measures of the same thing versus multiple different dependent variables? My intuition suggests that the answer is yes, but I could not figure out how, and this masterful review of the multiple comparisons problem does not distinguish between the two except indirectly, in the case of the dependent variables being correlated with one another (in other words, when the tests are not independent). So to get a handle on this, I created a little simulation that compares two types of analyses. I created a binary treatment variable T, and then tested it against some randomly created dependent variables.

    1. In the first, there are twenty dependent variables that measure different things. I simulated these with twenty randomly generated dependent variables Y1,Y2,…,Y20.
    2. In the second, there are twenty different measures of a single dependent variable Y. One is Y itself. Another is a dichotomized version of Y. Still another is Y measured as an ordinal variable. Others are Y plus varying amounts of noise, so that some measures track Y more closely than others. I got pretty creative here. The idea is that this reflects uncertainty in the measurement of a single dependent variable.
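    The post does not include code, but the two setups can be sketched along the following lines. The sample size, noise scales, and ordinal cut points are my own illustrative choices, and the “creative” measures are reduced to a few representative transformations:

```python
import random

random.seed(42)
N = 200  # hypothetical sample size; the post does not report one

# Binary treatment T, assigned independently of every outcome.
T = [random.randint(0, 1) for _ in range(N)]

# Scenario 1, "Many DVs": twenty independently generated outcomes.
many_dvs = [[random.gauss(0, 1) for _ in range(N)] for _ in range(20)]

# Scenario 2, "One DV, Many Measures": one latent outcome Y and
# nineteen measures derived from it.
Y = [random.gauss(0, 1) for _ in range(N)]
measures = [Y[:]]                                         # Y itself
measures.append([1 if y > 0 else 0 for y in Y])           # dichotomized
measures.append([min(2, max(0, int(y) + 1)) for y in Y])  # crude ordinal
for s in [0.1 * i for i in range(1, 18)]:                 # 17 noisy copies
    measures.append([y + random.gauss(0, s) for y in Y])
```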

    Importantly, each of these dependent variables was generated independently of T. So we know that the experiment did not cause any outcome. But if we were to select a standard of statistical significance of 95%, we would find that 5 out of every 100 random dependent variables would be correlated with T entirely by chance. Or, in this case, 1 out of the 20 that I created for the first analysis, on average. And there would only be a 5% chance that the single dependent variable Y would be associated with the treatment variable in the second analysis. Any statistically significant correlation between the treatment variable and any dependent variable is therefore a false positive.
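    The arithmetic behind that expectation, under the assumption that the twenty tests are independent:

```python
alpha = 0.05  # conventional 95% significance standard
m = 20        # number of dependent variables tested

# Expected number of false positives among m independent null tests.
expected_false_positives = m * alpha            # = 1.0

# Probability of at least one false positive among the m tests.
p_at_least_one = 1 - (1 - alpha) ** m           # about 0.64
```

    Note that in the “One DV, Many Measures” setup the tests are correlated, so this independence calculation no longer applies.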

    I then simulated these analyses 500 times, with fresh random draws of the treatment variable and the DVs. For each simulation, I calculated how many false positives showed up in the analysis of the twenty dependent variables (which I’ll call the “Many DVs” scenario) and how many showed up in the analysis of the twenty measures of the single dependent variable (“One DV, Many Measures”).** The figure below compares the results.
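    A stripped-down version of such a simulation can be run with the standard library alone. I substitute simple difference-in-means z-tests for the logits and ordered logits mentioned in the notes, and use noisy continuous copies of Y as the “many measures,” so the numbers are illustrative rather than a replication:

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def var(xs):
    m = mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)

def z_stat(y, t):
    """Difference-in-means z statistic for a binary treatment t."""
    g1 = [yi for yi, ti in zip(y, t) if ti == 1]
    g0 = [yi for yi, ti in zip(y, t) if ti == 0]
    se = (var(g1) / len(g1) + var(g0) / len(g0)) ** 0.5
    return (mean(g1) - mean(g0)) / se

def count_false_positives(dvs, t):
    """Count DVs 'significant' at |z| > 1.96; every DV is null by
    construction, so every hit is a false positive."""
    return sum(abs(z_stat(y, t)) > 1.96 for y in dvs)

def simulate(sims=500, n=200, k=20, seed=1):
    random.seed(seed)
    many, one = [], []
    for _ in range(sims):
        t = [random.randint(0, 1) for _ in range(n)]
        # "Many DVs": k outcomes generated independently of each other.
        dvs = [[random.gauss(0, 1) for _ in range(n)] for _ in range(k)]
        many.append(count_false_positives(dvs, t))
        # "One DV, Many Measures": a single Y plus noisy copies of it.
        base = [random.gauss(0, 1) for _ in range(n)]
        meas = [base] + [[y + random.gauss(0, 0.1 * i) for y in base]
                         for i in range(1, k)]
        one.append(count_false_positives(meas, t))
    return many, one

many, one = simulate()
```

    In both scenarios the average false-positive count is near 20 × 0.05 = 1, but in the correlated-measures scenario the hits cluster: most runs produce none, and the runs that produce any tend to produce several.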

    The histograms show the distribution of false positives. The dotted lines are the median false positive rate across the 500 simulations. These results are quite revealing.

    1. In the “Many DVs” scenario, the modal outcome is that 1/20, or .05, of the results are significant at the 95% confidence level. This is just as the multiple comparisons literature predicts.
    2. In the “One DV, Many Measures” scenario, the results are very different. The most common outcome is that no results are significant. But when any result is significant, it is much more likely to be significant across many measures. That is the long right tail of that distribution.

    These differences help to clarify what the problem of multiple comparisons is, and also how corrections for it might or might not work. Multiple tests of the same dependent variable measured differently might not increase our confidence in the result if they all stem from the same chance correlation between the treatment and that one DV, because they are not independent tests. A multiple comparison adjustment will decrease the false positive rate in the “One DV, Many Measures” scenario, but this is a mechanical result, one that comes from simply increasing the threshold for statistical significance.

    By contrast, in the “Many DVs” scenario, adjustments for multiple comparisons respond directly to the fact that too many independent tests will eventually yield significant results simply by chance.

    NOTES

    * Although in principle, one might also pre-register a fishing expedition (which my colleagues and I once termed “hypothesis trolling” or “hypothesis squatting”).

    ** I used logits and ordered logits where appropriate in the “One DV, Many Measures” tests.