Author: tompepinsky

  • Multiple Imputation with Colliders

    I have found myself thinking a lot recently about multiple imputation in the presence of colliders. Proponents of MI commonly recommend that any variable available in the dataset should be included in the imputation stage, even if it will not be included in the analysis stage. In one important statement (PDF):

    For greater efficiency, add any other variables in the data set that would help predict the missing values (p. 57).

    I have often wondered whether this is always true, or whether there are particular cases in which adding variables atheoretically at the imputation stage can lead to bias in the analysis stage even if the analysis stage model is correct.

    Colliders are a candidate for such a variable. A collider is a variable that is jointly caused by both an independent variable and a dependent variable: X -> C <- Y. It is well-known that in a regression of Y on X in which the true causal effect of X is zero, conditioning on a collider can create the illusion that X actually causes Y. Take the simple case where X and Y are random variables that are unrelated to one another, but imagine that they jointly cause a third variable C, and we include that as a control variable. If we were to generate those data and run that regression in R, here is what we would get.

    x <- rnorm(100)
    y <- rnorm(100)
    c <- y + x + rnorm(100)
    summary(lm(y~x+c))$coefficients
    
    ##               Estimate Std. Error   t value     Pr(>|t|)
    ## (Intercept)  0.1009087 0.07196625  1.402167 1.640576e-01
    ## x           -0.3802116 0.09452905 -4.022166 1.141927e-04
    ## c            0.4776374 0.04893261  9.761127 4.397550e-16
    

    Conditioning on a collider in this simple example generates a highly significant but entirely spurious correlation between X and Y.
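
    As a rough check on the size of this distortion, here is a minimal sketch of the same setup with a much larger sample, so that the estimates sit close to their population values. It assumes the same unit-variance normal draws as above; the seed, the sample size, and the name `coll` (used so that base R's `c()` is not masked) are my own choices for illustration. Under these assumptions E[y | x, c] = (c - x)/2, so the coefficient on x should be near -0.5 when the collider is included and near 0 when it is not.

    ```r
    set.seed(1)                    # arbitrary seed, for reproducibility only
    n <- 1e5                       # far larger than the 100 draws used above
    x <- rnorm(n)
    y <- rnorm(n)                  # y is independent of x by construction
    coll <- y + x + rnorm(n)       # the collider (called c in the text)

    # Without the collider: coefficient on x should be near 0
    b.without <- coef(lm(y ~ x))["x"]

    # With the collider: x should be near -0.5 and coll near 0.5
    b.with <- coef(lm(y ~ x + coll))

    print(round(c(without = unname(b.without), b.with["x"], b.with["coll"]), 3))
    ```

    The only difference between the two regressions is whether the collider appears on the right-hand side.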

    So what happens if you were to include a collider when imputing missing data? My intuition is that doing so would create similar problems, but to test that intuition I made a little simulation.* Consider the best possible case: the analyst knows that the correct analysis model excludes the collider, but includes the collider in the imputation stage because it is correlated with the variables with missing data. First load the necessary packages and reset ggplot2's colors away from the default mouldy waffle setting.

    library(mice)
    library(reshape2)
    library(ggplot2)
    theme_set(theme_classic())
    set.seed(14850)
    

    The following function simulates a dataset with missing data in Y, where the probability of missingness depends on the value of Y. It then runs five analyses:

    1. with the full data (no missing values): Full
    2. using listwise deletion/complete case analysis: LD
    3. using MI using no auxiliary variables: MI
    4. using MI but with a proxy variable for Y: MI-full
    5. using MI but with the collider C: MI-collider

    simulation<-function(p){
    
      # set up the data 
      N <- 1000
      x <- rnorm(N)
      u.y <- rnorm(N)
      y <- u.y + rnorm(N)
      c <- y + x + rnorm(N)
    
      # full data frame, no missing values
      dat.full <- as.data.frame(cbind(y,x,u.y,c)) 
    
      # create missingness in Y
      dat.missing.collider <- dat.full                     
      dat.missing.collider$miss.y<-rbinom(N,1,pnorm(y+p))
      dat.missing.collider$y[dat.missing.collider$miss.y==1]<-NA
      dat.missing.collider$miss.y <-NULL  
    
      # df includes Uy but no collider, missing values for Y
      dat.missing.full<-dat.missing.collider
      dat.missing.full$c<-NULL
    
      # df includes collider but not Uy, missing values for Y
      dat.missing.collider$u.y <- NULL    
    
      # df omits Uy and c, missing values for Y
      dat.missing <- dat.missing.collider
      dat.missing$c <- NULL                 
    
      # if all data were observed
      res.full <- summary(lm(y ~ x, data=dat.full))$coefficients
      b.full <- res.full[2,1]
      p.full <- res.full[2,4]
    
      # listwise deletion
      res.missing.ld <- summary(lm(y ~ x, data=dat.missing))$coefficients
      b.missing.ld <- res.missing.ld[2,1]
      p.missing.ld <- res.missing.ld[2,4]
    
      # impute the data
      mi <- mice(dat.missing)                   
      mi.full <- mice(dat.missing.full)          
      mi.collider <- mice(dat.missing.collider) 
    
      # results without c or Uy
      # (estimate and p-value are selected by column name rather than position;
      # this assumes mice >= 3.0, where summary(pool()) returns a data frame,
      # and recent versions put the term label in the first column)
      res.missing.mi <- summary(pool(with(mi, lm(y ~ x))))
      b.missing.mi <- res.missing.mi$estimate[2]
      p.missing.mi <- res.missing.mi$p.value[2]
    
      # results with Uy
      res.missing.mi.full <- summary(pool(with(mi.full, lm(y ~ x))))
      b.missing.mi.full <- res.missing.mi.full$estimate[2]
      p.missing.mi.full <- res.missing.mi.full$p.value[2]
    
      # results with collider
      res.missing.mi.collider <- summary(pool(with(mi.collider, lm(y ~ x))))
      b.missing.mi.collider <- res.missing.mi.collider$estimate[2]
      p.missing.mi.collider <- res.missing.mi.collider$p.value[2]
    
      outputs <- cbind(b.full,
                       b.missing.ld,
                       b.missing.mi,
                       b.missing.mi.full,
                       b.missing.mi.collider,
                       p.full,
                       p.missing.ld,
                       p.missing.mi,
                       p.missing.mi.full,
                       p.missing.mi.collider)
      return(outputs)
    }
    

    Run this simulation 250 times and collect the output.

    sims <- data.frame(t(matrix(replicate(250, simulation(0)),nrow=10)))
    sims <- cbind(1:nrow(sims),sims)
    names(sims) <- c("Iter",rep(c("Full","LD","MI","MI-full","MI-collider"),2))
    

    First, we compare the estimates across all five estimators (remembering that the true effect of X on Y is zero).

    colMeans(sims[2:6])
    
    ##         Full           LD           MI      MI-full  MI-collider 
    ## 0.0006753861 0.0030361351 0.0016691195 0.0021527321 0.0029543371
    
    to.plot.b <- melt(sims[1:6], id.vars="Iter", variable.name="Estimator", value.name="Estimate")
    ggplot(to.plot.b, aes(x=Estimate, color=Estimator)) + geom_density()
    

    [Figure: density of the coefficient estimates on X, by estimator]

    This looks pretty good: the distribution of coefficient estimates is centered around zero for each of these models, and the mean results for MI with a collider aren't any worse than for MI without a collider. But what of Type-1 error rates, a particular concern when conditioning on a collider?

    colMeans(sims[7:11]<.05)
    
    ##        Full          LD          MI     MI-full MI-collider 
    ##       0.044       0.060       0.036       0.048       0.044
    

    To my surprise, Type-1 error rates don't seem to be a problem here so long as the collider does not appear in the analysis stage.
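
    To judge how much of the variation in these rejection rates could be simulation noise, here is a back-of-the-envelope sketch. It treats each of the 250 replications as an independent Bernoulli draw with a true rejection probability of 0.05, which is an assumption for illustration, and the two-standard-error band is a rough yardstick rather than a formal test.

    ```r
    n.sims <- 250
    alpha  <- 0.05

    # Binomial standard error of an estimated rejection rate
    mc.se <- sqrt(alpha * (1 - alpha) / n.sims)

    # Rough two-standard-error band around the nominal rate
    band <- alpha + c(-2, 2) * mc.se

    # Rejection rates copied from the output above
    rates <- c(Full = 0.044, LD = 0.060, MI = 0.036,
               `MI-full` = 0.048, `MI-collider` = 0.044)

    print(round(mc.se, 4))
    print(round(band, 3))
    print(rates >= band[1] & rates <= band[2])
    ```

    With only 250 replications the band is fairly wide, so small differences between estimators in this table should not be over-read.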

    Puzzled by this result and not believing my lying eyes, I looked online for further discussion, and came across this interesting paper (PDF) on auxiliary variables in MI. They diagnose a more subtle problem: sometimes adding a collider into the imputation stage can transform the data missingness from Missing at Random (MAR) to Missing Not at Random (MNAR). In the simulations above, data are MAR because they were confined to Y and we have X through which to model Y. But if Y causes a variable C that is correlated with the missingness mechanism and that missingness mechanism also depends on X, then conditioning on C will transform MAR data into MNAR data.

    To illustrate this problem, adjust the simulation function as follows:

    simulation_coll_missingness<-function(p){
    
      # set up the data 
      N <- 1000
      u.y <- rnorm(N)
      u.c <- rnorm(N)
      x <- rnorm(N)
      y <- u.y + rnorm(N)
      c <- y + u.c + rnorm(N)
    
      # full data frame, no missing values
      dat.full <- as.data.frame(cbind(y,x,u.y,c)) 
    
      # create missingness
      # p(Y missing) correlated with C and X
      dat.missing.collider <- dat.full                     
      dat.missing.collider$miss.y<-rbinom(N,1,pnorm(u.c+x)) 
      dat.missing.collider$y[dat.missing.collider$miss.y==1]<-NA
      dat.missing.collider$miss.y <-NULL  
    
      # df includes Uy but no collider, missing values for Y
      dat.missing.full<-dat.missing.collider
      dat.missing.full$c<-NULL
    
      # df includes collider but not Uy, missing values for Y
      dat.missing.collider$u.y <- NULL    
    
      # df omits Uy and c, missing values for Y
      dat.missing <- dat.missing.collider
      dat.missing$c <- NULL                 
    
      # if all data were observed
      res.full <- summary(lm(y ~ x, data=dat.full))$coefficients
      b.full <- res.full[2,1]
      p.full <- res.full[2,4]
    
      # listwise deletion
      res.missing.ld <- summary(lm(y ~ x, data=dat.missing))$coefficients
      b.missing.ld <- res.missing.ld[2,1]
      p.missing.ld <- res.missing.ld[2,4]
    
      # impute the data
      mi <- mice(dat.missing)                   
      mi.full <- mice(dat.missing.full)         
      mi.collider <- mice(dat.missing.collider) 
    
      # results without c or Uy
      # (estimate and p-value are selected by column name rather than position;
      # this assumes mice >= 3.0, where summary(pool()) returns a data frame,
      # and recent versions put the term label in the first column)
      res.missing.mi <- summary(pool(with(mi, lm(y ~ x))))
      b.missing.mi <- res.missing.mi$estimate[2]
      p.missing.mi <- res.missing.mi$p.value[2]
    
      # results with Uy
      res.missing.mi.full <- summary(pool(with(mi.full, lm(y ~ x))))
      b.missing.mi.full <- res.missing.mi.full$estimate[2]
      p.missing.mi.full <- res.missing.mi.full$p.value[2]
    
      # results with collider
      res.missing.mi.collider <- summary(pool(with(mi.collider, lm(y ~ x))))
      b.missing.mi.collider <- res.missing.mi.collider$estimate[2]
      p.missing.mi.collider <- res.missing.mi.collider$p.value[2]
    
      outputs <- cbind(b.full,
                       b.missing.ld,
                       b.missing.mi,
                       b.missing.mi.full,
                       b.missing.mi.collider,
                       p.full,
                       p.missing.ld,
                       p.missing.mi,
                       p.missing.mi.full,
                       p.missing.mi.collider)
      return(outputs)
    }
    

    The difference now is that the probability that Y is missing depends both on X and on a factor (u.c) that also determines C. Simulating these data 250 times again, here is what we get:

    sims <- data.frame(t(matrix(replicate(250, simulation_coll_missingness(0)),nrow=10)))
    sims <- cbind(1:nrow(sims),sims)
    names(sims) <- c("Iter",rep(c("Full","LD","MI","MI-full","MI-collider"),2))
    to.plot.b <- melt(sims[1:6], id.vars="Iter", variable.name="Estimator", value.name="Estimate")
    ggplot(to.plot.b, aes(x=Estimate, color=Estimator)) + geom_density()
    

    [Figure: density of the coefficient estimates on X, by estimator]

    Now we see that using a collider in the imputation stage generates bias in the analysis stage even when the analysis stage does not control for the collider.

    colMeans(sims[7:11]<.05)
    
    ##        Full          LD          MI     MI-full MI-collider 
    ##       0.028       0.040       0.048       0.052       0.644
    

    And Type-1 error rates are unacceptably high.
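
    One way to see why the collider causes trouble here, without running any imputation, is to ask whether the missingness indicator still predicts Y once we condition on the variables in the imputation model. The sketch below regenerates data with the same structure as the function above using base R only; the seed, the sample size, and the names `coll` and `miss` are my own illustrative choices. If missingness is ignorable given the conditioning set, the coefficient on `miss` should be near zero.

    ```r
    set.seed(2)                            # arbitrary seed
    n    <- 1e5
    u.c  <- rnorm(n)
    x    <- rnorm(n)
    y    <- rnorm(n) + rnorm(n)            # same structure as u.y + noise above
    coll <- y + u.c + rnorm(n)             # the variable called c in the simulation
    miss <- rbinom(n, 1, pnorm(u.c + x))   # 1 = Y would be set to NA

    # Conditioning on x alone: missingness should carry no information about y
    b.no.coll <- coef(lm(y ~ x + miss))["miss"]

    # Conditioning on x and the collider: missingness now predicts y
    b.coll <- coef(lm(y ~ x + coll + miss))["miss"]

    print(round(c(no.collider = unname(b.no.coll),
                  with.collider = unname(b.coll)), 3))
    ```

    In this setup the first coefficient should be close to zero and the second clearly negative, which is the sense in which adding C to the conditioning set turns ignorable missingness into non-ignorable missingness.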

    What is an example of an “imputation-stage collider”? Imagine we wish to use education (X) to predict partisanship (Y), but we have missing data on partisanship for people who feel excluded from the political system and who also have lower levels of education. And let's also imagine that members of party R are more likely to respond that they don't trust the government (C), as are people who feel excluded from the political system. Adding trust in government at the imputation stage will bias estimates of the effect of education on partisanship, but excluding it (given a series of other assumptions about what causes what that I'll leave unspecified here) would not.

    Note also that trust in government should be highly correlated both with education and with partisanship, so it looks like a good candidate for including in our imputation model if all we thought about was its predictive capacity.

    So what have we learned from this exercise? The main takeaway—as always—is that for MI to yield unbiased estimates of regression coefficients or causal parameters of interest, its assumptions have to be met. But more precisely, even having the correct model of the analysis stage does not absolve the analyst of considering the relationship between the imputation stage variables, the causal model, and the missingness mechanism. It turns out that in this simple example, imputing with an analysis-stage collider is innocuous (so long as it is excluded at the analysis stage). But imputation-stage colliders can wreck MI even if they are excluded from the analysis stage.**

    As I have argued elsewhere, MI cannot be a theory-free exercise. Just as there is no rule of thumb for comparing MI to its alternatives, there is no simple rule of thumb for auxiliary variables such as “include as many variables as you can in the imputation stage” or “in principle, including all of the remaining auxiliary variables in the imputation model is desirable” (PDF).

    NOTES

    *This post also serves as a test to see if I could make the Rmarkdown-to-Wordpress integration work, via knit2wp.
    **And in case you're wondering if including the imputation-stage collider as a control in the analysis stage will help, it won't.

  • The True Nature of Trump’s Presidential Power

    Just over two years ago, I described the problem of inferring how strong or weak a leader is from the political outcomes that we observe. What prompted that discussion was the hullabaloo surrounding the first couple weeks of the Trump administration—the Muslim ban, the reorganization of the White House, Bannon and Kushner, and so forth. It is easy to recall the panicked reactions by many in the commentariat.

    My point was that the worst interpretations of the Trump administration’s chaos are observationally equivalent with more innocuous ones. That is because, as I wrote,

    weak leaders often act like strong leaders, and strong leaders often act like they are indifferent. Weak leaders have every incentive to portray themselves as stronger than they are in order to get their way. They gamble on splashy policies. They escalate crises. This is just as true for democrats as for dictators…The consummate strong ruler is one who does not issue any command or instruction at all because she does not have to—her will is implemented already.

    In the ensuing two years, we have learned quite a bit about just how weak and ineffective the president is. The best commentary on Trump’s presidential weakness comes from Matt Glassman. See, for example, this magisterial thread, following Neustadt’s analysis of presidential power.

    Although what I wrote in 2017 is correct, and although I also endorse Glassman’s analysis, both are vulnerable to a radical critique that a conventional analysis of presidential power will miss. On that critique, President Trump’s power is not to convince, or to set the agenda, but to define for others what their interests are.

    This distinction between influence, coercion, and agenda-setting, on one hand, and domination and interest-making, on the other, follows the classic “faces of power” debate in political science as articulated by Bachrach and Baratz (PDF) and Lukes (PDF). Their analyses are rich and subtle, but the essence of the “third face” of power is that A exerts power over B by defining for B what B’s interests and desires actually are.

    This view of power as domination and interest-making has quotidian applications: I expose my children to certain types of music in order to ensure that they grow up liking that music, for example. In the political sphere, the third face of power encompasses the power to socialize others into believing in a particular order of things, to habituate individuals to an acceptable state of affairs, to shape desires in a way that may leave the subject unaware that this has ever occurred.

    This third face of power is challenging. It would be very hard to observe this kind of power in action, because it operates through structures, habits, practices, and impressions rather than through commands or rules. Moreover, you cannot query the subject about whether power has been exerted over her, because the essence of this view of power is that the subject is probably unaware that it exists.* It also has a tendency to remove agency from the subject, conceiving her as vulnerable to the whims of the powerful.

    Holding these caveats aside for now, what if it is power’s third face that defines the true nature of Trump’s presidential power? His is the power to dominate his co-partisans in the executive, legislative, and judicial branches. This rationalizes the policy ineffectiveness of the Trump White House, just so long as we redefine President Trump’s core objectives as enriching himself and surviving in office. On that assumption, President Trump has been effective. He has made his co-partisans believe that they wish to do the things that they do in defending his manifestly corrupt, morally bankrupt, and ineffective administration.

    There is abundant evidence that President Trump is a weak president when we conceptualize power as influence, coercion, or agenda-setting. Glassman’s analysis is spot-on in this regard. Even the Mueller report details instances of the President failing to do terrible things because he orders his subordinates to do them and they refuse.

    At the same time, President Trump has produced a Congress and an executive branch filled with conservative politicians who believe that this president may violate the Emoluments Clause, challenge the independence of the press and the Fed, apologize for literal Nazis, abandon free trade, and conspire with the Russian government to interfere with U.S. elections, all without any threat of being held to account because Congress does not have the authority to check the executive branch. The word that I have used throughout to describe this phenomenon is domination, understood either in the theoretical sense of Scott** or in the ethological sense of evolutionary psychology.

    Of course, the third face of power is the most contested face of power. Maybe President Trump’s co-partisans are just making a rational calculation about what serves their short-term interests. They don’t really believe these things; they just say them because doing so is politically expedient, or they demur and ignore his ogrelike and venal behavior because they do get some policy goodies here and there. If so, then what I have written above is incorrect, and the observational equivalence problem that I described in 2017 applies here too. But the most radical view of power would suggest that we must take seriously the possibility that President Trump’s power is the power to dominate. *ahem*

    NOTES

    * Why, for instance, do I think that this is the best song of the 1960s?
    ** Scott, in fact, is useful as a critique of the third face of power, because he sees resistance as ever-present in the face of domination, and he clarifies that we may not observe resistance for the same reason that we may not observe domination.