Author: tompepinsky

  • Inferring Whether the Polls Were Correct

    Let’s say we want to estimate a quantity \theta. We form an estimate of that quantity \hat{\theta}_{A} = 51, with a 95% confidence interval of (49,53). Let’s say we form another estimate \hat{\theta}_{B} = 49, with a confidence interval of (47,51). And then it is revealed to us that \theta = 50. Which estimate, \hat{\theta}_{A} or \hat{\theta}_{B}, is the correct one? Can we infer that \hat{\theta}_{A} is correct?

    It is easy to see that the answer to the first question is “we can’t tell, the data are equally consistent with both estimates.” The second question is more subtle, but the existence of \hat{\theta}_{B} suggests to us that we ought to be cautious about inferring that \hat{\theta}_{A} is somehow “correct.”

    This toy example reveals something fundamentally rotten in the election polling postmortem.

    Many polling pundits are arguing this week that the polls were “correct” in some sense because polling results produced estimates with confidence intervals (or credible intervals) that captured the final two-party vote share, either nationally or by state. Here is one example, but there are many others. The above example makes clear that such an inference should not be drawn. If we call the election poll aggregates \hat{\theta}_{A}, the results from Tuesday’s election (call them \theta) are equally consistent with the hypothesis that the aggregated estimates were perfectly unbiased and with the hypothesis that those estimates were biased upward by four points.

    The general point is this. The confidence interval around the two-party vote share estimate from polls reflects the standard error of the…estimate from the polls. It is not a confidence interval that captures the actual two-party vote share except under the hypothesis that the data generating process that produces the polls is the same as the data generating process that produces the vote. The same point holds for polling aggregates. We may not infer anything about the accuracy of the polls or the quality of the poll aggregates from the relationship between the election result and some confidence interval except by maintaining that hypothesis. If we instead maintain the alternative hypothesis that the polls were systematically subject to substantial error in modeling turnout and/or voter intentions, these results are also consistent with many such hypotheses about the size of that error.
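
    A rough numerical sketch of this point, under assumptions that are mine and not drawn from any actual poll: sampling error is normal, the standard error is 1 point (which roughly matches the toy interval of plus or minus 2), and the helper names `coverage_probability` and `likelihood_of_miss` are purely illustrative. It computes how often a 95% interval would still contain the truth when the estimator is biased, and compares how likely the toy miss of +1 is under two different biases.

```python
from statistics import NormalDist

Z_95 = 1.96  # critical value for a two-sided 95% interval


def coverage_probability(bias, se=1.0):
    """Chance that estimate +/- 1.96*se contains the truth when the
    estimator is off by `bias` on average (normal sampling error assumed)."""
    # (estimate - truth) ~ Normal(bias, se); the interval covers the truth
    # exactly when that difference lies within +/- 1.96*se.
    miss = NormalDist(mu=bias, sigma=se)
    return miss.cdf(Z_95 * se) - miss.cdf(-Z_95 * se)


def likelihood_of_miss(observed_miss, bias, se=1.0):
    """Density of observing `observed_miss` = estimate - truth under a given bias."""
    return NormalDist(mu=bias, sigma=se).pdf(observed_miss)


for bias in (0.0, 1.0, 2.0):
    print(f"bias={bias:+.0f}: interval covers truth with prob. "
          f"{coverage_probability(bias):.2f}")

# The toy estimate of 51 against a truth of 50 is a miss of +1. Under the
# assumed standard error, that miss is exactly as likely with no bias as it
# is with an upward bias of 2 points.
print(likelihood_of_miss(1.0, bias=0.0), likelihood_of_miss(1.0, bias=2.0))
```

    Under these assumptions the interval covers the truth about 95% of the time with no bias, about 83% of the time with a one-point bias, and still nearly half the time with a two-point bias. A single instance of coverage therefore says little about whether the estimator was biased.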

    This point has momentous implications for public opinion polling and for American democracy. If one makes inferences about the quality or correctness of polls from C.I. coverage, then one might conclude that there is no need to reevaluate the polls themselves. Estimates of uncertainty are necessary in public opinion polling, but they also make it hard to diagnose fundamental, systematic error. The more informative way to proceed is to identify errors, as Sam Wang has done (“The business about 65%, 91%, 93%, 99% probability is not the main point”), and going forward, to learn how to minimize that error.

    There is no way to avoid the secondary conclusion that this will be hard. As I wrote two days ago,

    Future aggregates for future elections by sites like 538 are going to use historical performance (i.e., prediction error today) to weight or “adjust” future polls. It is possible that some polls were more accurate than others because they had better models of turnout and voter intentions. It is also possible that all polls were just off (“correlated errors,” in the lingo), and some of these randomly happened to be less off than others. If the latter is true, then adjustments in the future will be worse than useless—they will be chasing noise.

    The future of election polling is not “whose polling aggregation method had the greatest uncertainty?” The future is “whose polls are the most accurate, and how do we know?” Anyone who suggests otherwise is either confused, or trying to sell you something.

  • Some of the Worst Things about the Election

    There are many potentially ominous consequences of Trump’s defeat of Clinton last night. Many opponents of President-elect Trump are particularly worried about the safety and inclusion of people of color, women, and religious minorities; the GOP’s legislative agenda; and the future of U.S. foreign policy. Here is a short list of three other contenders, from the perspective of political science.

    Dynamic Information Effects

    As Przeworski argues in “Minimalist Conception of Democracy: A Defense,” one of the key strengths of democratic elections is that they convey information. This is how strong I am. This is how strong you are. We learned this morning that the white nationalist, patriarchal vote bloc is large enough to decide an election. Heretofore, it was not clear how large that bloc was, and whether or not it could swing an election. Now it is clear that this is a winning strategy for national political office. Future candidates will be more likely to campaign in this way simply because they now know that it is a winning strategy. This is how strong I am. This is how strong you are.

    Back to the Drawing Board with Polling and Aggregates

    Until about 8:30 EST the smart money was not on following polls, but rather on following polling aggregates like 538, PredictWise, Votamatic, Princeton Election Consortium, and others. There will be postmortems about which one of these was best, and the instinct is to defend 538 because it gave Clinton only a 71% chance of winning, versus the 80–99% range from the others. But if you conclude anything other than that they were all fatally flawed, you have not drawn the right inference. The reason why they were all fatally flawed is that they all drew on the same information: polls, sometimes augmented by a “fundamentals” model (Votamatic), sometimes with prediction markets (PredictWise). It is a clear instance of Garbage In, Garbage Out.

    Here is what is more worrisome. Future aggregates for future elections by sites like 538 are going to use historical performance (i.e., prediction error today) to weight or “adjust” future polls. It is possible that some polls were more accurate than others because they had better models of turnout and voter intentions. It is also possible that all polls were just off (“correlated errors,” in the lingo), and some of these randomly happened to be less off than others. If the latter is true, then adjustments in the future will be worse than useless—they will be chasing noise. Forget polling aggregates then. The strategy now is to identify the good polls in a world in which (1) almost every one failed and (2) we don’t know why.
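
    The “chasing noise” worry can be sketched with a small simulation. Everything here is a hypothetical setup of my own, not a model of any real pollster or aggregator: each pollster's error is a shared error common to all polls, plus an optional persistent pollster-specific component (a stand-in for a better or worse turnout model), plus independent noise. The number of pollsters is set unrealistically high only so that the pattern is visible.

```python
import random


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate_errors(n_pollsters, persistent_sd, noise_sd, shared_errors, rng):
    """Return one list of pollster errors per election.

    error = shared error for that election
          + persistent pollster-specific component (same in every election)
          + independent noise
    """
    persistent = [rng.gauss(0.0, persistent_sd) for _ in range(n_pollsters)]
    return [
        [shared + p + rng.gauss(0.0, noise_sd) for p in persistent]
        for shared in shared_errors
    ]


rng = random.Random(2016)   # fixed seed so the sketch is repeatable
shared = (3.0, 3.0)         # assumed: every poll is off by 3 points both times

# World 1: all polls were just off, and differences between them are pure noise.
first, second = simulate_errors(2000, 0.0, 2.0, shared, rng)
corr_noise_only = pearson(first, second)

# World 2: some pollsters really are persistently better or worse.
first, second = simulate_errors(2000, 2.0, 2.0, shared, rng)
corr_persistent = pearson(first, second)

print(f"noise only:        past vs. future error correlation = {corr_noise_only:.2f}")
print(f"persistent quality: past vs. future error correlation = {corr_persistent:.2f}")
```

    In the noise-only world, a pollster's error in one election carries essentially no information about its error in the next, so weighting by past performance rewards luck. With only a few dozen real pollsters and one election to learn from, telling the two worlds apart is much harder than this large simulated sample suggests.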

    Ratchet Effects and the Devastating Failure of Ground Game

    “Ground Game,” the Clinton campaign’s mobilizational capacity, get-out-the-vote efforts, and other methods to help get voters to the polls, was supposed to be her singular advantage over Trump. It has obviously failed. Either the Democrats’ ground game was not as strong as observers believed, or it did not matter in the context of media saturation and the other advantages that Trump voters had (shorter lines, less voter suppression, more enthusiasm, whatever).

    What comes next will be efforts that counteract the kinds of advantages that ground game can bring to relatively disenfranchised voters even in the best of times. Decisions taken by state legislatures, the Congress, and a Supreme Court with a new justice nominated by a president whose party holds all branches of government will further stack the deck against voters in urban areas, voters from poorer backgrounds, and visible minorities. These could have a ratchet effect, leading to a sharp and discontinuous decrease in the ability of mobilization to bring people to the polls who already face higher costs for voting. Such effects could be visible for a generation or more. Voting may be habit-forming. So is hopelessness.