Category: Research

  • The Language You Speak Predicts Your Beliefs, Values, and Behaviors

    Several weeks back I tweeted the preliminary results from a new research project that uses World Values Survey data to test the linguistic relativity thesis.

    The idea that your language constrains how you conceive of the world around you is an old one, attributed originally to Wilhelm von Humboldt but more commonly to Edward Sapir and Benjamin Lee Whorf. It is also one that has enjoyed a recent resurgence in economics, sociology, and political science.

    In a new paper, I take a closer look at this literature. Here is the first paragraph:

    There is an emerging consensus that linguistic structure has a direct, causal effect on speakers’ economic and social beliefs (Chen, 2013; Davis and Abdurazokzoda, 2016; Feldmann, 2018; Ginsburgh and Weber, 2016; Hicks, Santacreu-Vasut and Shoham, 2015; Jakiela and Ozier, 2018; Liu et al., 2018; Mavisakalyan, 2015; Mavisakalyan, Tarverdi and Weber, 2018; Pérez and Tavits, 2017; van der Velde, Tyrowicz and Siwinska, 2015). This paper contributes to this dynamic new literature by uncovering the linguistic origins of nativist public opinion, focusing on how human languages structure the relationship between subjects, objects, and verbs (Dryer, 2013). Languages in which the object of a verb follows the verb encode a concept of distance and difference between subject and object directly into linguistic structure. By contrast, languages in which the object of a verb precedes the verb are more likely to place the subject and object next to one another and also highlight the receiver of the action over the action itself. Contrast the following two constructions of “I love you”:


    (German)   Ich  liebe dich
               SUBJ VERB  OBJ
               I    love  you
    (Japanese) Watashi wa anata o aishiteimasu
               SUBJ       OBJ     VERB
               I          you     love

    This distinction between “VO” languages like German and “OV” languages like Japanese has implications for how language speakers conceptualize social difference and distance. By grammatically requiring separation between subjects and objects, VO languages lead their speakers to conceptualize the social world in “us-them” terms. Consistent with this prediction, I document a highly statistically significant correlation between speakers of VO languages and nativist preferences (specifically, opposition to hiring immigrants) using over 200,000 respondents to the World Values Survey, covering over one hundred countries across three decades and controlling for a rich set of demographic features. These results contribute to the emerging literature on the linguistic origins of economic and social beliefs, and also suggest that the very languages that we speak affect our conceptualizations of identity and belonging.

    Be sure to download the paper to read more. The results may surprise you.

  • The Causal Inference Revolution and Statistical Reasoning

    Judea Pearl and Dana Mackenzie’s The Book of Why is a gem of a book that introduces Pearl’s graphical calculus for causal discovery to a general audience. It has, as is typical, occasioned a pointed debate about whether the causal graphs approach is the best way to represent the science of causality, as opposed to, say, a statistical or econometric approach. Previous instances may be found on Pearl’s blog. The present case can be found on Andrew Gelman’s blog, where he reviews The Book of Why.

    Gelman reserves his criticism for Pearl and Mackenzie’s claims that Pearl’s methods have solved problems that previously had no solutions:

    Ummm, I’m pretty sure that scientists could do all these without the help of Pearl!

    This comports with Kevin Gray’s reaction from over the summer:

    some of what he claims is new or even radical thinking may cause some statisticians to scratch their heads since it’s what they’ve done for years, though perhaps under a different name or no name at all.

    Such reactions comport with my own. The case study of smoking and cancer is fascinating because it provides an excellent introduction to the problem of causal inference when experiments are impossible. But this claim…

    Millions of lives were lost or shortened because scientists did not have an adequate language or methodology for answering causal questions (p. 19).

    No, millions of lives were lost because tobacco companies had a financial interest in marketing a product that kills people and, when confronted with evidence of the link between smoking and lung cancer, worked to sway public opinion and cultivate scientific uncertainty. Applied science is not an engineering problem; it is a social process.

    What The Book of Why does extraordinarily well (even more so than Pearl’s earlier book, Causality) is to introduce the power of the causal graph as a tool of causal reasoning. Over the past thirty years, Pearl and collaborators have developed a complete language for causal reasoning that allows us to compute, given a set of causal relationships encoded in a path diagram, whether a causal effect can be identified. It is a nonparametric identification engine. This language makes some things very obvious that are otherwise very hard for students to see, and when I teach students about causal identification, I rely heavily on the causal graphs approach.

    1. X ← U → Y is much more straightforward than, say, Cov(X,U)≠0 for capturing the intuition behind conditional independence in a regression framework.
    2. The “front-door criterion” is really easy to explain using causal graphs. Hard otherwise (see, relatedly, Marc Bellemare’s recent post).
    3. The conceptual distinction between a collider and a confounder is very clear; it is far superior to discussions of control variables.
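
    The first and third points can be illustrated with a small simulation. What follows is only a sketch under assumptions of my own choosing: linear relationships, standard normal noise, unit coefficients, and no true effect of X on Y in either graph. The helper names (slope, residuals, adjusted_slope) are hypothetical. With a confounder (X ← U → Y), conditioning on the third variable removes a spurious association; with a collider (X → C ← Y), conditioning creates one.

```python
import random


def slope(y, x):
    """OLS slope from regressing y on x (with an intercept)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx


def residuals(y, x):
    """Residuals from regressing y on x (with an intercept)."""
    b = slope(y, x)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    return [(yi - my) - b * (xi - mx) for xi, yi in zip(x, y)]


def adjusted_slope(y, x, z):
    """Coefficient on x when y is regressed on x and z,
    computed by partialling z out of both (Frisch-Waugh-Lovell)."""
    return slope(residuals(y, z), residuals(x, z))


rng = random.Random(0)  # fixed seed so the run is reproducible
n = 20_000


def noise():
    """A fresh vector of n independent standard normal draws."""
    return [rng.gauss(0, 1) for _ in range(n)]


# Confounder: X <- U -> Y, with no arrow from X to Y (assumed for illustration).
u = noise()
x1 = [a + e for a, e in zip(u, noise())]
y1 = [a + e for a, e in zip(u, noise())]
confounder_naive = slope(y1, x1)                 # about 0.5 in expectation
confounder_adjusted = adjusted_slope(y1, x1, u)  # about 0 in expectation

# Collider: X -> C <- Y, with X and Y independent (assumed for illustration).
x2, y2 = noise(), noise()
c = [a + b + e for a, b, e in zip(x2, y2, noise())]
collider_naive = slope(y2, x2)                   # about 0 in expectation
collider_adjusted = adjusted_slope(y2, x2, c)    # about -0.5 in expectation

print(f"confounder: naive={confounder_naive:.2f}, adjusted={confounder_adjusted:.2f}")
print(f"collider:   naive={collider_naive:.2f}, adjusted={collider_adjusted:.2f}")
```

    In both cases the third variable is "just a control" from the regression point of view. Only the direction of the arrows says whether including it helps or hurts.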

    But as to whether this is just a useful language of representation versus a revolution, I’m not so sure. The core innovation is a graphical algorithm to compute the nonparametric identifiability of causal relations between variables given assumptions encoded in relationships among other variables in a system. But we still require knowledge about those other relations. The problem of circularity is apparent. As Gelman writes,

    If you think you’re working with a purely qualitative model, it turns out that, no, you’re actually making lots of data-based quantitative decisions about which effects and interactions you decide are real and which ones you decide are not there.
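
    To make the dependence on assumed structure concrete, here is a minimal sketch of one graphical check of this kind: Pearl's back-door criterion, tested with the standard moralized ancestral graph method for d-separation. It is not the full identification machinery, and the two example graphs and all function names are hypothetical. A graph is encoded as a dict mapping each node to a list of its parents.

```python
from collections import deque


def ancestors(parents, nodes):
    """Return `nodes` together with all of their ancestors."""
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents.get(v, ()):
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen


def descendants(parents, node):
    """Return all strict descendants of `node`."""
    children = {}
    for child, ps in parents.items():
        for p in ps:
            children.setdefault(p, set()).add(child)
    seen, stack = set(), [node]
    while stack:
        v = stack.pop()
        for ch in children.get(v, ()):
            if ch not in seen:
                seen.add(ch)
                stack.append(ch)
    return seen


def d_separated(parents, x, y, z):
    """True if the set z d-separates x from y in the DAG."""
    keep = ancestors(parents, {x, y} | set(z))
    # Moralize the ancestral subgraph: link each node to its parents,
    # link parents that share a child, and drop edge directions.
    adj = {v: set() for v in keep}
    for child in keep:
        ps = list(parents.get(child, ()))
        for p in ps:
            adj[child].add(p)
            adj[p].add(child)
        for i, a in enumerate(ps):
            for b in ps[i + 1:]:
                adj[a].add(b)
                adj[b].add(a)
    # x and y are d-separated iff no path connects them once z is removed.
    blocked = set(z)
    seen, queue = {x}, deque([x])
    while queue:
        v = queue.popleft()
        if v == y:
            return False
        for w in adj[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                queue.append(w)
    return True


def satisfies_backdoor(parents, x, y, z):
    """True if z satisfies the back-door criterion relative to (x, y)."""
    z = set(z)
    if z & descendants(parents, x):
        return False
    # Delete every edge leaving x, then require that z block all remaining paths.
    pruned = {v: [p for p in ps if p != x] for v, ps in parents.items()}
    return d_separated(pruned, x, y, z)


# Hypothetical graph 1: U confounds X and Y (X <- U -> Y, X -> Y).
confounded = {"X": ["U"], "Y": ["U", "X"]}
# Hypothetical graph 2: C is a collider (X <- A -> C <- B -> Y, X -> Y).
m_bias = {"X": ["A"], "C": ["A", "B"], "Y": ["B", "X"]}

print(satisfies_backdoor(confounded, "X", "Y", set()))  # False: U must be adjusted for
print(satisfies_backdoor(confounded, "X", "Y", {"U"}))  # True
print(satisfies_backdoor(m_bias, "X", "Y", set()))      # True: no adjustment needed
print(satisfies_backdoor(m_bias, "X", "Y", {"C"}))      # False: adjusting for C opens a path
```

    The check itself is mechanical. Everything substantive sits in the dict of parents, which is exactly the knowledge about the other relations that the analyst has to supply.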