Modeling Possibly Nonlinear Confounders

In recent empirical work, my coauthors and I have found it useful to treat ordinal variables as factor variables rather than as continuous variables when entering them on the right-hand side of a regression. What I mean is a situation as follows: we are estimating the model

Y = a + βT + γU + ε

where the parameter of substantive interest is β (the partial correlation between T and Y), but we include variable U on the hypothesis that U confounds the relationship between T and Y. A motivating example would be a situation in which we want to know the partial correlation between party ID and tax policy preferences, but we suspect that income affects both party identification and tax policy preferences, so we need to condition on income in order to “deconfound” the partial correlation of interest.

When we measure income, we normally don’t have a continuous indicator of dollars/year or something like that, but rather a set of income categories: $20k/year or less, $20k/year-$40k/year, $40k/year-$60k/year, and so forth. This is an ordered variable (category 2 is more than category 1, category 3 is more than category 2…), but it is not a continuous variable. Lately, I have chosen to estimate 

Preferences = a + b*PartyID + c1*IncomeCat2 + c2*IncomeCat3 + c3*IncomeCat4 + c4*IncomeCat5

rather than 

Preferences = a + b*PartyID + c*Income

The former approach allows the relationship between income and tax preferences to be nonlinear, whereas the latter constrains that relationship to be linear. 
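To make the two specifications concrete, here is a minimal sketch in R. The data are simulated and the variable names (prefs, party_id, income_cat) are hypothetical stand-ins for the survey measures described above; the only point is how the two model formulas differ.

```r
# Hypothetical survey data; all names and values are illustrative assumptions.
set.seed(1)
n <- 500
income_cat <- sample(1:5, n, replace = TRUE)          # ordinal income bracket, 1..5
party_id   <- rbinom(n, 1, 0.2 + 0.1 * income_cat)    # 1 = Republican (illustrative)
prefs      <- party_id + rnorm(n)                     # tax policy preference score

# "Linear": income enters as a single regressor with one slope c
fit_linear <- lm(prefs ~ party_id + income_cat)

# "Factor": income enters as k - 1 = 4 dummies, with category 1 as the reference
fit_factor <- lm(prefs ~ party_id + factor(income_cat))

names(coef(fit_linear))
names(coef(fit_factor))
```

The factor specification spends three more parameters than the linear one, which is the price of not assuming that each step up the income scale shifts preferences by the same amount.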

But here’s the thing: in this example we are entirely uninterested in the coefficients c1, c2… we only care about our estimate of b. And this weekend, over beers and hamburgers, some wise friends asked me the following: what’s the point of treating U as a factor variable if we are indifferent to c? The question arose in the context of a discussion of Hainmueller et al’s analysis of interaction terms, which shows that standard advice assumes that interactions are linear (which is often not true, and can have pernicious consequences) and recommends exploring nonlinear relationships rather than imposing a linear functional form. One can do this very simply by “discretizing” continuous variables and treating them as factors, just as in the model above.

Although I have a good handle on why Hainmueller’s flexible nonlinear approach is useful for modeling interaction effects, I found myself at a loss to explain why the nonlinear approach would make sense in a non-interactive context in which U is nothing but a confounder. Why not just control for it? What’s the benefit of allowing that relationship to be nonlinear?

As is often the case, a little bit of simulation can help to develop intuitions. So this is what I have done, prompted by some idle ruminations while watching TV over breakfast. The full exposition is below, but the TL;DR result is that treating a control variable U as a factor variable allows a standard OLS setup to estimate b in contexts where both of the following are true: (1) the relationship between U and T is nonlinear, and (2) the relationship between U and Y is nonlinear. When only one (or neither) of those two conditions is true, treating U as continuous will work just fine, but there is little cost to modeling it as a factor variable. 

The Setup

Here is a simple simulation in R in which there is a binary causal variable of interest T that is a (possibly nonlinear) function of U, an ordinal confounder that has five values (1…5).  Y, in turn, is a function of T and a (possibly nonlinear) function of U.

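  # Illustrative settings so that the code runs as written. n matches the
  # sample size used in the simulations; the p and q values are placeholders,
  # not necessarily the ones behind the plots.
  n <- 1000                                          # must be divisible by 5
  p1 <- .1; p2 <- .2; p3 <- .3; p4 <- .4; p5 <- .5   # Pr(T = 1 | U = k)
  q1 <- .1; q2 <- .2; q3 <- .3; q4 <- .4; q5 <- .5   # governs how U shifts Y

  # U: ordinal confounder, n/5 observations at each of the values 1..5
  u <- sort(rep(1:5, n/5))

  # T: binary causal variable whose probability depends on U
  t <- rep(NA, n)
  t[u==1] <- rbinom(n/5, 1, p1)
  t[u==2] <- rbinom(n/5, 1, p2)
  t[u==3] <- rbinom(n/5, 1, p3)
  t[u==4] <- rbinom(n/5, 1, p4)
  t[u==5] <- rbinom(n/5, 1, p5)

  # Y: T (so the true beta is 1) plus a Binomial(4, q) draw that depends on U
  y <- rep(NA, n)
  y[u==1] <- t[u==1] + rbinom(n/5, 4, q1)
  y[u==2] <- t[u==2] + rbinom(n/5, 4, q2)
  y[u==3] <- t[u==3] + rbinom(n/5, 4, q3)
  y[u==4] <- t[u==4] + rbinom(n/5, 4, q4)
  y[u==5] <- t[u==5] + rbinom(n/5, 4, q5)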

In this setup, we capture the relationship between U and T with the parameters p1…p5. An example of a linear relationship between U and T is if p1 = .1, p2 = .2, p3 = .3, p4 = .4, and p5 = .5, in which the probability that T = 1 increases linearly across the values of U. 

Here, by contrast, are examples of nonlinear relationships between T and U:

  1. p1 = .1, p2 = .1, p3 = .2, p4 = .2, and p5 = .9
  2. p1 = .6, p2 = .5, p3 = .2, p4 = .3, and p5 = .9

In the former, the probability that T = 1 increases discontinuously at the highest value of U. In the latter, the probability that T = 1 is higher when U = 1 than when U = 2, 3, or 4, subsequently rising again when U = 5. 

We capture the relationship between U and Y with the parameters q1…q5 in an analogous way. Y is our outcome of interest, which is always a linear function of T, with β = 1, plus a (possibly nonlinear) function of U. Notice that if q1=q2=q3=q4=q5 then U has no direct effect on Y; otherwise U confounds our estimates of β. Notice as well that we have a statistical model that looks pretty much like the motivating example of tax preferences, partisanship, and income, in which Y is an ordinal dependent variable with six categories (0 through 5), much like a survey response on a Likert scale.
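For readers who want to experiment with different p and q values, the data generating process can be wrapped in a small function. This is a vectorized sketch of the setup described above, not the code used for the results in this post: simulate_data() is a hypothetical helper, and the q values below are illustrative assumptions.

```r
# Vectorized sketch of the data generating process.
# simulate_data() is a hypothetical helper introduced for illustration.
simulate_data <- function(n, p, q) {
  stopifnot(n %% 5 == 0, length(p) == 5, length(q) == 5)
  u <- rep(1:5, each = n / 5)      # n/5 observations at each value of U
  t <- rbinom(n, 1, p[u])          # Pr(T = 1 | U = k) = p[k]
  y <- t + rbinom(n, 4, q[u])      # beta = 1, plus a Binomial(4, q[k]) term
  data.frame(u = u, t = t, y = y)
}

set.seed(42)
p_nonlinear <- c(.1, .1, .2, .2, .9)   # first nonlinear example from the text
q_example   <- c(.1, .1, .2, .2, .9)   # illustrative values (an assumption)

d <- simulate_data(10000, p_nonlinear, q_example)
round(tapply(d$t, d$u, mean), 2)         # sample Pr(T = 1) by U; should be near p
round(tapply(d$y - d$t, d$u, mean), 2)   # U's contribution to Y; should be near 4 * q
```

Checking the group means against p and 4 * q is a quick way to confirm that a given parameter choice produces the shape of nonlinearity you intended.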

To imagine the nonlinear relationships that this model will capture, start with income (U). It could be that the probability that one is a Republican (T) rises linearly with income. But it could also be the case that the probability that one is a Republican is much higher at the highest income bracket than in any of the lower income brackets; or it could be that the probability that one is a Republican is actually a bit higher among those with low incomes than among those at the middle of the income distribution, rising again among those at the highest income brackets. 

The same is true with views about tax cuts. It could be that the probability that one supports tax cuts (Y) rises linearly with income (U), but it could also be that preferences for tax cuts are substantially higher among the wealthiest than they are among others, or that there is some other relationship in the data. We suspect, though, that those who identify as Republicans (T) are more likely to support tax cuts (Y).*


We are using ordinary least squares regression to estimate β.** Consider three different ways to estimate this relationship.

  1. Omit U entirely, which will create bias unless q1=q2...=q5 or p1=p2...=p5 (call this “No U”)
  2. Enter U as a single control variable (“Linear”)
  3. Enter U as a series of k-1 dummy variables, where k is the number of categories of U (“Factor”)
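On a single simulated dataset, the three estimators can be sketched as follows. The p values are the first nonlinear example from the text; the q values are an illustrative assumption, and the object names are mine.

```r
set.seed(123)
n <- 1000
p <- c(.1, .1, .2, .2, .9)   # U -> T: first nonlinear example in the text
q <- c(.1, .1, .2, .2, .9)   # U -> Y: illustrative values (an assumption)

# One draw from the data generating process (true beta = 1)
u <- rep(1:5, each = n / 5)
d <- data.frame(u = u, t = rbinom(n, 1, p[u]))
d$y <- d$t + rbinom(n, 4, q[u])

fit_no_u   <- lm(y ~ t, data = d)               # 1. "No U"
fit_linear <- lm(y ~ t + u, data = d)           # 2. "Linear"
fit_factor <- lm(y ~ t + factor(u), data = d)   # 3. "Factor": k - 1 = 4 dummies

# Estimated coefficient on T under each approach
b <- c(no_u   = coef(fit_no_u)[["t"]],
       linear = coef(fit_linear)[["t"]],
       factor = coef(fit_factor)[["t"]])
round(b, 2)
```

A single draw is noisy, so the comparison that matters is the distribution of these estimates across many replications.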

Below I have plotted the results from estimating this model 250 times on randomly generated U, T, and Y with a sample size of 1000. I allow for nonlinear relationships between U and T as well as U and Y, calculating the bias of each estimate as the difference between β and b and plotting the distribution of estimates of bias across replications. The footer of the graph shows you the specific values of the p and q terms.

Look first at the blue line, which shows the distribution of bias in b when using the Factor approach. This distribution is centered around 0, which is just what we’d expect if this method were unbiased. It is not surprising to see that the estimates that omit U entirely, shown by the black line, are highly biased, centered far from 0. But compare these to the red line: here, just accounting for the confounder U using the Linear approach still yields biased estimates of the parameter of interest.
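The comparison described above can be reproduced with a short Monte Carlo loop. This is a sketch under stated assumptions rather than the exact code behind the plot: the q values are illustrative, the seed is arbitrary, and I compute bias as the estimate minus the true β (the sign convention is my choice).

```r
set.seed(2021)
beta <- 1      # true effect of T on Y
n    <- 1000   # sample size per replication
reps <- 250    # number of replications
p <- c(.1, .1, .2, .2, .9)   # nonlinear U -> T (first example in the text)
q <- c(.1, .1, .2, .2, .9)   # nonlinear U -> Y (illustrative assumption)
u <- rep(1:5, each = n / 5)

# Draw T and Y once and return the coefficient on T from each approach
one_draw <- function() {
  t <- rbinom(n, 1, p[u])
  y <- beta * t + rbinom(n, 4, q[u])
  c(no_u   = coef(lm(y ~ t))[["t"]],
    linear = coef(lm(y ~ t + u))[["t"]],
    factor = coef(lm(y ~ t + factor(u)))[["t"]])
}

est  <- t(replicate(reps, one_draw()))   # reps x 3 matrix of estimates
bias <- est - beta                       # estimate minus truth

round(colMeans(bias), 2)       # average bias by approach
round(apply(bias, 2, sd), 2)   # spread of the estimates by approach
```

Calling plot(density(bias[, "factor"])) and adding the other two columns with lines() gives density curves in the spirit of the figure.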

What happens if we explore different forms of nonlinearity? The substantive conclusions remain the same.

In this case, the relationship between U and Y is non-monotonic, not just nonlinear as above, but the conclusion remains the same. And just how much nonlinearity is needed to produce biased estimates? Here is a relatively mild case of nonlinear relationships between U and T and U and Y:

When nonlinearity is relatively mild, the bias is relatively small, but even here it is clear that the Linear approach produces worse estimates than the Factor approach.

It turns out, however, that nonlinearity in both the U,Y and U,T relationships is essential for the Factor approach to be clearly superior to the Linear approach. Here is a case where the relationship between U and T is linear, but the relationship between U and Y is not:

We see that both Factor and Linear estimates appear to be unbiased, although the spread of the Factor estimates around 0 is smaller, suggesting that the Factor approach is more precise. And here is a case where the relationship between U and T is nonlinear, but the relationship between U and Y is linear:

In this case, both the Factor and Linear approaches appear to be unbiased, but the Factor approach is less precise than the Linear approach: at last, a point in favor of the Linear approach. And for completeness, here is the case where both U and T, and U and Y, are linearly related:

This last set of results tells us that if there are no nonlinearities at all, it is immaterial whether one models confounders as linear or not.

Summary Conclusions

The takeaway message from this simulation exercise is fairly simple. Modeling confounders as factor variables makes good sense if there is any possibility that the relationships between both the confounder and the causal variable and the confounder and the outcome variable are nonlinear. If both of these are linear, it doesn’t much matter which you choose. If one relationship is linear but the other is nonlinear, the Linear approach can sometimes yield more precise estimates than the Factor approach, but the Factor approach is still unbiased.
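The simulation results line up with a short population-level argument. What follows is my own sketch, not something taken from the simulations: g(U) is notation I am introducing for U's contribution to Y (in the setup above, g(k) = 4·q_k), and T̃ is the residual from the linear projection of T on a constant and U.

```latex
\begin{align*}
Y &= a + \beta T + g(U) + \varepsilon, \qquad E[\varepsilon \mid T, U] = 0 \\
\tilde{T} &= T - (\pi_0 + \pi_1 U)
  \quad \text{(residual from the linear projection of $T$ on $1$ and $U$)} \\
b_{\text{Linear}} &= \frac{\operatorname{Cov}(\tilde{T}, Y)}{\operatorname{Var}(\tilde{T})}
  = \beta + \frac{\operatorname{Cov}(\tilde{T}, g(U))}{\operatorname{Var}(\tilde{T})}
\end{align*}
```

If g is linear in U, the covariance term is zero because T̃ is by construction uncorrelated with U. If E[T | U] is linear in U, then E[T̃ | U] = 0 and the covariance term is again zero for any g. Only when both relationships are nonlinear can the second term be nonzero. Under the Factor approach T is residualized on the full set of dummies, so T̃ = T − E[T | U] and the term vanishes regardless of the shape of g.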

This is an argument for including confounders as factors as the default. There are costs to including long strings of dummied-out confounders in terms of degrees of freedom, but with sufficient sample size these costs are probably relatively minor. It would be interesting to explore whether one might specify the choice of linear versus factor specifications in terms of a bias-variance tradeoff.
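One way to start on that exploration is to decompose the mean squared error of each estimator into squared bias plus variance. The sketch below does this for the case where U and T are nonlinearly related but U and Y are linearly related. It assumes squared-error loss, the q values are illustrative, and the object names are mine.

```r
set.seed(99)
beta <- 1; n <- 1000; reps <- 250
p <- c(.1, .1, .2, .2, .9)   # nonlinear U -> T (first example in the text)
q <- c(.1, .2, .3, .4, .5)   # linear U -> Y (illustrative assumption)
u <- rep(1:5, each = n / 5)

# reps x 2 matrix of estimates of b from the Linear and Factor approaches
est <- t(replicate(reps, {
  t <- rbinom(n, 1, p[u])
  y <- beta * t + rbinom(n, 4, q[u])
  c(linear = coef(lm(y ~ t + u))[["t"]],
    factor = coef(lm(y ~ t + factor(u)))[["t"]])
}))

bias     <- colMeans(est) - beta                       # average error
variance <- colMeans(sweep(est, 2, colMeans(est))^2)   # spread around the mean
mse      <- colMeans((est - beta)^2)                   # equals bias^2 + variance

round(rbind(bias_sq = bias^2, variance = variance, mse = mse), 4)
```

Rerunning this across a grid of p and q values, and across sample sizes, would show where the extra variance of the Factor approach outweighs the bias it removes.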

Bigger Picture: Nonparametric Identifiability versus Estimation

Stepping back from the question of how best to model confounders, this exercise provides a useful example of the pedagogical limits of Directed Acyclic Graphs for applied causal inference research.

Here is the point: to my understanding, every single data generating process described above would be modeled using the same DAG, the one with arrows U -> T, U -> Y, and T -> Y.

The great benefit of DAGs is that they are extraordinary tools for ascertaining the nonparametric identifiability of causal relationships. Because our model holds that U -> T and U -> Y, we know that we must condition on U to identify the effect of T on Y. But moving from identification to estimation is not straightforward, and nothing about this DAG tells us about that relationship as an actual estimation problem.*** Thinking about nonparametric identifiability is crucially important for any causal quantity, but this case reinforces to me that in practice, there is still a devil in the statistical modeling details.


* One could also allow the relationship between partisanship and tax preferences to vary by income itself (Y = T + U + T×U), but that is the standard interaction effect case that Hainmueller et al have already studied.

** We could estimate β using a nonlinear ordered dependent variable approach, but that won’t make much difference.

*** It could be that there does exist a DAG that differentiates among those cases, with implications for how we estimate them. But I am not aware of it (and would be pleased to learn).

What Happens When Evergrande Collapses?

The markets are watching closely for what seems to be the inevitable collapse of Evergrande, a massive Chinese property developer that is almost certainly unable to service its debts. A default would have tremendous repercussions for the Chinese financial system–and with it, the Chinese economy and the global economy as well. All the immediate focus is on the financial implications of a massive default of a highly-connected property firm, both for China and for the rest of the world. But the political implications of Evergrande’s collapse are no less interesting.

So what happens to Chinese politics when Evergrande collapses? About five years ago, Jeremy Wallace and I asked a question like this in a paper called “Hard Landings and Political Change in Nondemocracies” (PDF). This paper was never published (and Jeremy and I tried valiantly to place it), probably because it is nonstandard in the way that it tries to answer the question using both qualitative and quantitative evidence, and because China is different.* But our argument rests on the premise that we can learn from the experiences of things like the collapse of Evergrande, even those that don’t happen in China, to develop some insights about what would happen in this specific case.

So what will follow politically from the collapse of Evergrande? One place to look is East and Southeast Asia, and to the political upheavals that followed from the serial collapse of the exchange rates of Thailand, Indonesia, Malaysia, and South Korea during the Asian Financial Crisis. These financial crises became political crises for the simple reason that politics shapes financial markets, and the financial turmoil affected economic actors with political power.

What we learned during 1997 was just how financially interconnected these economies were, just how many politically powerful individuals and groups were using preferential access to credit as a cash machine, and just how liquidity-constrained these actors were when their cash machines quit spitting out bahts, won, rupiah, and ringgit. What might have been just a property crisis in Bangkok or an exchange rate crisis in Jakarta turned out to be a financial meltdown that affected everyone. And the very same political actors who had happily supported the incumbent governments at the time started demanding that these governments do something–anything!–to get the cash machines back up and running.

And when it turned out that governments could not help them, even as they tried, these powerful actors began to look for new governments who could.

Now, the specifics of the Asian Financial Crisis differ from whatever crisis Evergrande’s default may create. Thailand, Indonesia, Malaysia, and South Korea suffered from having open capital accounts and fixed exchange rates, coupled with enormous amounts of unhedged foreign currency debt. The details differ across the four cases, but at the end of the day the problem was how to stabilize currencies and keep interest rates low at the same time.

China does not face this problem; and in fact, one reason why it doesn’t face this problem right now is because it looked to the south and east in 1997 and said “no thanks” to an open capital account. So the financial crisis management problem that China faces will be different; if Evergrande defaults, it will need to unjam the credit markets that will seize up in response. And it will need money to do that. And there will be some powerful economic actors who do not want to spend that money, as well as some who will be desperate to spend that money.

The Xi regime will, in turn, need to find a way to placate both sets of actors, or crush one. From an outsider’s perspective, it will not look politically contentious, because the regime will suppress any mention of dissent or discord. But behind the facade, there will be intense disagreement, and we will not be able to know just how precarious the situation is until it is too late. If the Xi regime cannot find a way to either placate those powerful actors who demand protection, or to crush them entirely, we will see these actors start to abandon the Xi regime altogether.**

One response might be that China’s institutions are different: they are so strong, the party so unified and hierarchical, and so pervasive within Chinese society that a political crisis like this is impossible. I couldn’t rule it out. But one lesson that studying financial crises hammers home is that time and again, people vastly overestimate how much durable political loyalty institutions create when the going gets tough. China’s political institutions created enormously powerful and wealthy people, but those people are not loyal to those institutions, they are loyal to their wealth and power.

This script was written long ago, and the actors have changed over the years. It is the story of how an authoritarian juggernaut turns into the “sick man” of its part of the world. It happens quickly, and in retrospect it seems inevitable. But it’s still a waiting game right now; we won’t know, and foreign observers like me have predicted at least twenty out of the last zero Chinese economic and political crises since Tiananmen. But it remains the case that after, say, just five more sleeps we may wake up to an entirely different kind of Chinese politics.


* Read the last three words in that sentence in a grumpy comparativist voice.

** My prediction, if it gets this far, is that in at least one public statement Xi will describe those who are demanding policy change as “rats.” You know the crisis is serious when the rat talk starts (see Indonesia and Mexico, among others).