Category: Research

  • Illustrating Contamination Bias, with Implications for Intersectional Description

    A new NBER working paper by Paul Goldsmith-Pinkham, Peter Hull, and Michal Kolesár shows that in experiments with more than one mutually exclusive treatment, regression with covariate adjustment does not identify treatment effects even under the assumption of conditional ignorability. Here is the abstract:

    We study the interpretation of regressions with multiple treatments and flexible controls. Such regressions are often used to analyze stratified randomized control trials with multiple intervention arms, to estimate value-added (for, e.g., teachers) with observational data, and to leverage the quasi-random assignment of decision-makers (e.g. bail judges). We show that these regressions generally fail to estimate convex averages of heterogeneous treatment effects, even when the treatments are conditionally randomly assigned and the controls are sufficiently flexible to avoid omitted variables bias. Instead, estimates of each treatment’s effects are generally contaminated by a non-convex average of the effects of other treatments. Thus, recent concerns about heterogeneity-induced bias in regressions leveraging potential outcome restrictions (e.g. parallel trends assumptions) also arise with “design-based” identification strategies. We discuss solutions to the contamination bias and propose a new class of efficient estimators of weighted average effects that avoid bias. In a re-analysis of the Project STAR trial, we find minimal bias because treatment effect heterogeneity is largely idiosyncratic. But sizeable contamination bias arises when effect heterogeneity becomes correlated with treatment propensity scores.

    This is surprising and striking. It is surprising because when there is exactly one treatment and one control, regression does identify treatment effects. It is striking because lots of experiments have more than one treatment condition, and scholars often use covariate-adjusted regression to estimate causal effects in these regressions. Importantly, these results are not only true for estimating treatment effects, but also for describing conditional group averages. The authors write,

    Even more broadly, contamination bias can arise in descriptive regressions which seek to estimate averages of certain conditional group contrasts without assuming a causal framework—as in studies of treatment or outcome disparities across multiple racial or ethnic groups, studies of regional variation in healthcare utilization or outcomes, or studies of industry wage gaps. Our analysis shows that in such regressions the coefficient on a given group or region averages the conditional contrasts across all other groups or regions, with non-convex weights

    These results have implications for how scholars use statistical models to summarize demographic and sociological data across groups. My goal in this post is to illustrate one such implication.

    Before getting to that, though, let’s review the issue at hand. The working paper explains all of the intuitions well and derives the analytical results clearly, but here is one of these cases where I find that I don’t really understand the phenomenon in question unless I work through it myself.

    Simulating Contamination Bias

    Here is the code to simulate a regression with two mutually exclusive treatment states and a control state, which we can represent as t(0), t(1), and t(2). Treatment probabilities vary across three different groups W = 1, 2, 3. For example, individuals in group W=1 receive the control t(0) with a probability of .1, they receive the first treatment t(1) with a probability of .5, and the second treatment t(2) with a probability of .5; but individuals in group W=2 are equally likely to get each of two treatment states and the control. And so forth.

    tosim <- function(effect1,effect2,effect3){
    t <- c(
    sample(c(0, 1, 2), size = n_w, replace=TRUE, prob = c(.1,.4,.5)),
    sample(c(0, 1, 2), size = n_w, replace=TRUE, prob = c(1/3,1/3,1/3)),
    sample(c(0, 1, 2), size = n_w, replace=TRUE, prob = c(.5,.4,.1))
    )

    modelmat <- data.frame(rep(NA,3*n_w),t,c(rep(1,n_w),rep(2,n_w),rep(3,n_w)))
    names(modelmat) <- c("y","t","w")

    modelmat$y[modelmat$w==1 & modelmat$t==0] <- 0 + rnorm(length(modelmat$y[modelmat$w==1 & modelmat$t==0]),0,3)
    modelmat$y[modelmat$w==1 & modelmat$t==1] <- 0 + rnorm(length(modelmat$y[modelmat$w==1 & modelmat$t==1]),0,3)
    modelmat$y[modelmat$w==1 & modelmat$t==2] <- effect1 + rnorm(length(modelmat$y[modelmat$w==1 & modelmat$t==2]),0,3)

    modelmat$y[modelmat$w==2 & modelmat$t==0] <- 0 + rnorm(length(modelmat$y[modelmat$w==2 & modelmat$t==0]),0,3)
    modelmat$y[modelmat$w==2 & modelmat$t==1] <- 0 + rnorm(length(modelmat$y[modelmat$w==2 & modelmat$t==1]),0,3)
    modelmat$y[modelmat$w==2 & modelmat$t==2] <- effect2 + rnorm(length(modelmat$y[modelmat$w==2 & modelmat$t==2]),0,3)

    modelmat$y[modelmat$w==3 & modelmat$t==0] <- 0 + rnorm(length(modelmat$y[modelmat$w==3 & modelmat$t==0]),0,3)
    modelmat$y[modelmat$w==3 & modelmat$t==1] <- 0 + rnorm(length(modelmat$y[modelmat$w==3 & modelmat$t==1]),0,3)
    modelmat$y[modelmat$w==3 & modelmat$t==2] <- effect3 + rnorm(length(modelmat$y[modelmat$w==3 & modelmat$t==2]),0,3)

    sim <- summary(lm(y~factor(t)+factor(w), data=modelmat))$coefficients

    return(c(sim[2,1],sim[2,4]))
    }

    We are interested in estimating the effect of t(1) relative to the control t(0). In this example, the average treatment effect is zero for all individuals in all three groups. However, the average treatment effect of the second treatment varies across groups, represented by the variables effect1, effect2, and effect3. We estimate the following regression:

    lm(y~factor(t)+factor(w), data=modelmat)

    … and then collect the results for the first treatment effect, and we hope that the coefficients on t1 estimate the true treatment effect for the first treatment, which we know to be zero. Are we right?

    The answer is no—not unless the average treatment effect for the second treatment is constant across values of W. To see this, I’ve run that simulation 250 times each for four possible combinations of effect1, effect2, and effect3. For each combination, I plot both the estimated treatment effect for the first treatment (which we know to be zero) and the associated p-value of that estimate. I also calculate the percentage of simulations in which we would falsely reject the null hypothesis of no treatment effect for the first treatment.

    We see that with any group-level heterogeneity in the effect of the second treatment, we become more likely to infer that the first treatment effect is positive, even though the true treatment effect is zero. The greater the heterogeneity in treatment effects, the worse our estimates. We only get a false positive rate that approximates the nominal expected false positive rate of 0.05 when treatment effects are identical across groups.

    As it turns out, it’s essential that the probability of treatment assignment varies across W. If I do the same simulations with equal assignment probabilities within W, our estimates of the effect of the first treatment are correct no matter how heterogenous the effects of the second treatment.

    With that in mind, let’s turn to a case of quantitative description.

    Quantitative Description with Intersectionality

    Imagine there were a country with three communal groups: Magenta, Cyan, and Indigo. In this country’s 1970 census, there were differences in urbanization rates among these three groups.

    • Magentas, who are 60% of the population, are 85% rural and 15% urban.
    • Indigos, who are 10% of the population, are 66% rural and 34% urban.
    • Cyans, who are 30% of the population, are 53% rural and 47% urban.

    We are interested in testing whether there are income differences between Indigos and Magentas, and collect a simple random sample of 1000 respondents. Because we think that there are differences in income between urban and rural people, and we know that the distribution of urban versus rural varies by communal group, we control for urban and rural status too.

    Let us stipulate the following:

    • urban people have higher incomes than rural (urban – rural = 1).
    • Indigos have lower income than Magentas (Indigo – Magenta = -1), and this is the same for urban and rural Indigos
    • Urban Cyans and rural Cyans have higher incomes than urban Magentas and rural Magentas, respectively

    We then run this regression:

    lm(income~factor(community)+factor(urban))

    Magentas are the base category, and we hope that the coefficient on Indigos represents the urban-adjusted average difference in income between Magentas and Indigos, which should be -1. Does it?

    Once again, no. The figure below compares the distribution of estimated differences in income between Indigos and Magenta from that regression, repeated 250 times for each combination of differences between urban/rural Magentas and Cyans.

    In the top two plots, where there are no income differences between Magentas and Cyans or where the urban Magenta/Cyan difference is the same as the rural Magenta/Cyans difference, all is fine. But in the remaining plots, the estimated Magenta/Indigo difference differs from the truth, with the direction and size depending on whether the urban Magenta/Cyan differences are bigger than the rural Magenta/Cyan differences.

    Goldsmith-Pinkham et al show that we can recover the correct estimates of the differences between Magentas and Indigos by interacting the communal indicators with the demeaned urban-rural indicator (reminiscent of the Lin adjustment).

    meanw <- modelmat$w-mean(modelmat$w)
    lm(y~factor(t)*meanw)

    This works no matter if there is urban/rural heterogeneity in the Magenta/Cyan difference. Below, I add the results from that approach in bold green to show how it works:

    The green distributions are centered on the true difference, as they should be.

    Intersectional Description

    Stepping back from this example, the results here have important implications for how we describe group-level differences in the presence of intersectional differences. By intersectional differences I mean phenomena like the difference between men and women in some outcome is different for black and white people; or the difference between urban and rural outcomes varies according to immigration status.

    Intersectionality is widely understood in the social sciences as a conceptual matter. And it is also fairly well-known that we can accommodate intersectionality in quantitative description by describing features as interacting multiplicatively rather than additively. It would seem obvious that ignoring intersectionality will mean that you will, in fact, not be able to uncover intersectional differences. But Goldsmith-Pinkham et al’s results are novel in a subtle and important way.

    Specifically, if there are more than two groups that you are comparing, then the presence of intersectional differences in groups A and B affect our estimates of differences between groups A and C even if there are no intersectional differences between A and C. When we have complex comparisons among mutually exclusive categories, like race (white, black, Asian, etc…) or class (working class, middle class, upper class, etc…) or nationality (German, British, American…) then any intersectionality in any one category will affect estimates of differences across every other category, even when there are no intersectional differences in those other categories.

    Here is a case where simple analytical tools that work well in simple cases may yield truly erroneous conclusions in more complex cases.

  • Economic Growth Under Authoritarianism and Democracy: The Indonesian Case

    Indonesia is one of most striking cases of economic transformation in modern history. At independence, Indonesia was a low-income country subject to centuries of exploitation and resource extraction under Dutch colonial rule. By the 1990s it was one of the Newly Industrialized Economies of East and Southeast Asia. Today, it is a middle-income country, classified by the World Bank as a Lower Middle Income country but not far from Upper Middle Income status. This economic transformation produced a measurable increase of material wellbeing for hundreds of millions of people. Prior to China by the 2010s, no country’s economy had grown so quickly for so long as did Indonesia’s.

    Indonesia is also a paradigmatic case of economic growth under authoritarian rule. The New Order regime arose in the wake of the brutal extermination of the Indonesian left, and was headed by a general-turned-president who tolerated only limited organized opposition and almost no anti-regime mobilization for over three decades. Soeharto, though, legitimated his rule as providing stability as the foundation for prosperity and economic transformation, eventually becoming known as Bapak Pembangunan or “Father of Development.”*

    Since the collapse of the New Order and the transition to democracy in 1999, following the economic crisis of the late 1990s, many Indonesians have lamented the loss of the feelings of optimism, dynamism, and economic progress under the New Order. Inequality has continued to rise, and the Indonesian economy, today, plainly does not serve all Indonesians equally. And one often encounters a general sense** that the New Order was a time when things were getting a lot better, a lot faster.

    (There is something lurking deeper here about how Indonesians recall the terms of authoritarian citizenship, although that is a post for another time.)

    One possible explanation for the belief that Indonesia’s economy was somehow more dynamic under authoritarianism is that the New Order actually did create higher economic growth than did its democratic successor. Indeed, this is a view that I have often shared in my teaching and in my research—I do not support authoritarianism, but I nevertheless insist that we must look the facts of economic development under Soeharto squarely in the face. But more than 20 years after the transition to democracy, is it actually true that New Order Indonesia grew faster than democratic Indonesia has since?

    As it turns out, no. Prompted by an email exchange among some very respected Indonesia-watching colleagues, I decided to do a proper test of the relationship between regimes and economic growth. To do so, I grabbed data from the World Bank on Gross Domestic Product per capita between 1960 and 2020, measured in constant 2015 US Dollars. This ensures that we are doing the proper comparisons over time, adjusting for inflation and purchasing power (although see the caveat at the very end of this post). I then calculated economic growth as the year on year percentage change, and plotted this series over time by separating out the pre-New Order period (1961-1965), the New Order period (1966-1998), and the post-New Order period (1999-2020). Here is what we see.

    The thick lines are yearly growth. The dashed lines are the average within the period in question. And it is clear: average growth from 1966-1998 was no better or worse than average growth from 1999-2020. Measured this way there is simply no evidence that New Order Indonesia grew faster than democratic Indonesia.

    We can test this in a regression framework too:

    There is no significant relationship between a dummy variable capturing “post-New Order” and economic growth in any of the models. This is true however I model this relationship: M2 above controls for the natural log of the previous year’s GDP in order to reflect the well-known convergence hypothesis of declining rates of growth at higher levels of income, and M3 also controls for the log of Indonesia’s population size.

    All in all, I just do not see any evidence that Indonesian economic growth was faster under authoritarianism than under democracy.

    This matters, and it doesn’t. It matters in the sense that to understand Indonesia’s political economy, we need to appreciate just how much Indonesia has grown after the end of the New Order. That is a lot of sustained economic growth.

    And it matters in a second way too. When you look at the graph above, you also can’t miss that Indonesia’s economic growth has been much more stable under democracy than it was under authoritarianism. That’s not nothing. Indonesia is not lurching from crisis to crisis, the way that New Order Indonesia had a petroleum crisis, then the early 1980s crisis, then the Asian Financial Crisis. People ought to be writing about the general macro-stability of Indonesia’s democratic economic performance, even given a lot of global economic turmoil.

    But these results also don’t matter in the sense that the political fact of nostalgia for a sense of stability and order does not depend on whether the economic facts are consistent with it. People will remember the New Order as a time of order; that, truly, was the point.

    And the other thing is inequality. That might really be the story of democratic Indonesia’s political economy, how such wealth and prosperity can emerge and persist in a country still beset by serious challenges. That said, it is important to recognize not only that there are so many extremely wealthy Indonesians, but also that there are so many nouveau wealthy Indonesians.

    There is an important caveat to all this, though. The data that I’ve used here are derived from the World Bank, which calculates GDP per capita in a lot of different ways but only has real GDP data in US dollar terms. What I would really like to have is the same series, but in the (inflation-adjusted) local currency unit: real GDP per capita measured in constant rupiah. I don’t have immediate access to that, and although I can grab the current data from Indonesian government websites they do not go back in time to the 1960s. So an open call: if you, reader, have access to that data series, I will make the same figure and estimate the same regressions, and report the results here.

    NOTES

    * I doubt that this was a spontaneous expression of endearment. Someone in the regime probably coined the phrase and then encouraged it to spread.

    ** Sometimes called “Indonesian SARS”, which stood for Sindrom Aku Rindu Soeharto [I Miss Soeharto Syndrome].