Category: Language

  • Language and Indonesian Regionalism

    On Thursday, Joe Errington gave a fascinating presentation entitled “In Search of Middle Indonesian” at the SEAP Brown Bag. Joe is a linguist and an anthropologist, yet his presentation hit on a number of basic and enduring political science themes about the construction of national identity in the post-colonial world.

    Alas, there is no recording, and no paper to circulate, but I did jot down some snippets that capture the message of the talk. The basic facts, which generate the fascinating puzzle, are these: On one hand, “everywhere you go people believe that [the language we know as Bahasa Indonesia is] self-evidently and legitimately the language of Indonesia and that everyone should speak it.” On the other hand, there are “no native-speaking models of Indonesian,” which means that “when linguists want to characterize ‘colloquial Indonesian’ they must describe the language of a place.” Yet in common practice “non-standard language use is not understood to be different from some standard” even though there actually is a standard variety of formal Indonesian that Indonesians learn in school.

    So the following linguistic repertoire is common for people who grow up outside of Jakarta or various provincial capitals.

    LOCAL ETHNIC LANGUAGE: e.g. Javanese, or Sasak, or Uab-Meto, learned at home and used in everyday life

    FORMAL LANGUAGE: “proper” Bahasa Indonesia, learned in school, seen on national TV

    But when people like this move to a city (Joe’s example was Kupang in West Timor) they find that people do not speak either the local language or formal Indonesian. They speak something else, a regional koine with no official status but which usually cannot be easily understood by speakers of formal Indonesian yang baik dan benar (“which is good and proper”) even though it is clearly (like Indonesian) some outgrowth of a Malay-based lingua franca. I gather that this is what Joe means by Middle Indonesian, and that the point is that there are many Middle Indonesians.

    MIDDLE INDONESIAN: Jakarta Malay, Kupang Malay, etc., not formally recognized by most speakers as languages or even dialects but rather as some sort of diminished or improper thing, like a slang

    This means that even when Indonesian census takers report that there are now people who speak Indonesian as their first language, what they really mean is that there are people who speak Jakarta Malay, Kupang Malay, etc. as their first language. Both JMP and I have some experience navigating this uneasy distinction between Jakarta Malay and the language that we learned in school. I often find myself hanging out with Jakartans and not really being able to follow what they are saying, at which point they consciously switch to standard Indonesian and apologize. Many times I have been told by Indonesians that my Indonesian is very good, and it took me a while to realize that they don’t mean that I am good at speaking Indonesian, but rather that I am speaking a good kind of Indonesian adequately!

    So why is this political? Because the standard story of Indonesian is that the entire idea of Bahasa Indonesia is political: Bahasa Indonesia means “Indonesia language” and can only be understood as something which emerged to reinforce the idea of Indonesia itself. Formal standard Indonesian is not the native language of anyone—Joe called it “un-native Indonesian”—even though more than 90% of Indonesians speak it fluently and it is “self evidently and legitimately the language of Indonesia.” It is not an “ethnic” or “regional” language, and notice that this means that “all ethnic languages are equivalent, and in an equivalent relationship to the standard language, because they are regional languages.” So Javanese = Uab-Meto, and both are subordinate to Indonesian. This only makes sense as part of a project to subordinate Javaneseness and Timoreseness to Indonesianness.

    But Joe argues that in the same way that Indonesian generates a sense of we-ness for Indonesians, Middle Indonesians are generating senses of we-ness for regional communities too. We now see billboards, for example, written in Kupang Malay, which means that this mode of speaking is believed to be meaningful for the people who read it. And this means that there is a group of people who are excluded: all the non-local Indonesians who cannot easily use Kupang Malay.

    Now here is where Joe’s presentation ends and my editorial comment begins. What remains to be seen is whether the “we-ness” that Jakarta Malay and Kupang Malay generate today has the same sorts of consequences for Indonesian regional identities that Indonesian had for the Indonesian national identity. We will see this especially in the cities, where a new generation is growing up speaking these Middle Indonesians, but not formal standard Indonesian, as their native language. It remains to be seen whether Kupang Malay and the other Middle Indonesians are merely interesting linguistic phenomena, or politically meaningful origins for regional identities.

    My guess is that the latter is unlikely, at least in my lifetime. I bet that over the next two or three decades, Jakarta Malay will become the high-status register and that the other Middle Indonesians will be used in parallel as low-status registers in the regions, as markers of place and perhaps class, rather like Geordie or Scouse, but not of political identity in any consistent way.

  • If It Rains Tomorrow, I Save

    The dork blogs are all abuzz about this working paper (PDF) by Keith Chen entitled “The Effects of Language on Economic Behavior.” Here’s the abstract:

    Languages differ widely in the ways they partition time. In this paper I test the hypothesis that languages which grammatically distinguish between present and future events (what linguists call strong-FTR languages) lead their speakers to take fewer future-oriented actions. First, I show how this prediction arises naturally when well-documented effects of language on cognition are merged with models of decision making over time. Then, I show that consistent with this hypothesis, speakers of strong-FTR languages save less, hold less retirement wealth, smoke more, are more likely to be obese, and suffer worse long-run health. This is true in every major region of the world and holds even when comparing only demographically similar individuals born and living in the same country. While not conclusive, the evidence does not seem to support the most obvious forms of common causation. Implications of these findings for theories of intertemporal choice are discussed.

    This paper has been called a first attempt at Whorfian economics, which hearkens back to the old Sapir-Whorf Hypothesis, which holds that, roughly speaking, language independently shapes human behavior. There are both strong and weak versions of this. The strong version would look at a language like Navajo that only grammatically encodes the colors red, white, and black to conclude that Navajo-speakers will have a more difficult time conceiving of green versus blue than English speakers (and that Russians will have an easier time distinguishing light blue [голубой] from dark blue [синий] than English speakers). The weak version would make much weaker claims, of course, but still hold that we should be able to observe differences in human behavior across otherwise identical people if they speak different languages.

    I happen to have gone to college to study Linguistics. In fact, I actually tried to major (*ahem*, concentrate) in Anthropological Linguistics before a very kind adviser told me that that was even stupider than just majoring in Linguistics. So the debate about whether language affects human behavior is near and dear. It also happens to be the single most common argument that I have with my father-in-law, so it comes up. I strongly believe that the Sapir-Whorf hypothesis as commonly understood by non-specialists is a huge mistake, and deeply flawed (it’s a post for another time about just why I have such strong reactions against it) but there is good evidence that in some of its weaker forms, in a probabilistic sense, it might hold for particular issues, e.g. the relative prevalence of perfect pitch among Mandarin speakers.

    So that’s why this paper fascinates me. The argument is simple. All languages have the ability to talk about future events, but some of them require the speaker to make particular grammatical gestures to do that. English is one. If you want to talk about the possibility of rain tomorrow, you have to say it will rain tomorrow. You cannot say *it rains tomorrow. In Indonesian, however, you don’t need to do that. You can say hujan besok (which translates literally to rain tomorrow). In German, you can say Morgen regnet es (or tomorrow rain it). All human languages can be divided into two groups: those that require grammatical encoding of future events (English) and those that do not (German, Indonesian). The former are called Strong-FTR languages, and the latter Weak-FTR languages.

    Chen’s hypothesis is that people will behave differently with regard to the future based on their native language, which I find a bold prediction but not that surprising given the literature. What he then proposes is that he should be able to observe such differences in their economic activities. To me, that’s an amazingly bold claim! Here is a specific proposition:

    languages with strong-FTR force their speakers to differentiate present and future events when speaking about them. It seems plausible that with finer distinctions in timing comes greater precision of beliefs…if more finely partitioning events in time leads to more precise beliefs, weak-FTR language speakers will be more willing to save than their strong-FTR counterparts. Intuitively, since discounting implies that the value of future rewards is a strictly-convex function of time, uncertainty about the timing of future payoffs makes saving more attractive.

    I mean, wow.
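
    To unpack the convexity step in that quote, here is a minimal sketch of how I read it. Under exponential discounting the present value of a future reward is a convex function of when it arrives, so by Jensen's inequality a payoff with fuzzy timing is worth more today than one with the same average timing that is known exactly. The discount factor and the timings below are made-up illustrative numbers, not anything taken from Chen's paper.

```python
# Illustrative numbers only: none of these values come from Chen's paper.
DISCOUNT = 0.95  # assumed per-period discount factor


def present_value(periods: float, reward: float = 1.0) -> float:
    """Value today of `reward` paid `periods` from now, discounted exponentially."""
    return reward * DISCOUNT ** periods


def expected_present_value(timings: list[float]) -> float:
    """Average present value when each timing in `timings` is equally likely."""
    return sum(present_value(t) for t in timings) / len(timings)


certain = expected_present_value([10])       # payoff arrives at exactly t = 10
uncertain = expected_present_value([5, 15])  # same mean timing, but fuzzier

print(f"known timing:     {certain:.4f}")
print(f"uncertain timing: {uncertain:.4f}")
```

    With these assumed numbers the uncertain payoff is worth roughly 0.62 today against roughly 0.60 for the certain one. That gap is the whole mechanism: whoever holds the vaguer beliefs about timing should, on this logic, value the future reward more.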

    The bulk of the paper goes into establishing that net of a lot of other systematic determinants, it looks like people who speak strong-FTR languages actually are less likely to report having saved in the past year than those who speak weak-FTR languages. I want to commend Chen: most of the critiques I’ve seen of his paper hold that he’s missing some omitted variable somehow, and he’s been very careful to rule out the most likely reasons why this would be a spurious correlation.

    Still, like many readers, I suspect, I have a tough time buying this. But we’re scientists here, so we go with the evidence rather than our intuitions or gut, which both tell me to run screaming from this finding. I want to comment on four things: two theoretical issues, an empirical question, and a methodological issue. (I also highly recommend that you read my former professor Julie Sedivy and her comments on it.)

    First theoretical point: I don’t know this for sure, but I believe that the distinction between strong-FTR and weak-FTR is not as grammatically encoded as we think. The example is in the title to this post: if it rains tomorrow, I save. Here I have constructed a grammatically correct English sentence in which I speak about a possible future state but in which I have never had to use the word will or anything like that. I never grammatically encoded the future, it’s understood from context. Here’s why that matters: Chen’s argument is that strong-FTR languages oblige speakers to divide time in particular ways. It’s easy for me to construct an example in which grammar doesn’t force me to do this. I do have to speak conditionally, but if the argument is that encoding in non-conditional contexts is what matters, then that must be made clear.

    Second theoretical point (UPDATE: See comments at the end of the post for a helpful correction; thank you, readers!): this is one of those papers in which I was entirely open to the possibility that the exact opposite of the author’s hypothesis was the hypothesis to be tested. Consider this statement: “if more finely partitioning events in time leads to more precise beliefs, weak-FTR language speakers will be more willing to save than their strong-FTR counterparts.” It rests on the idea that weak-FTR languages partition time more finely; Chen tells us that “it seems plausible” that this is true. Well, not to me. Why does that follow from the lack of grammatical encoding of the future tense? What if someone told you the exact opposite: speakers of strong-FTR languages partition time more finely (because, say, they have to talk about it)? I would be just as likely to believe that theory. That just makes me very, very nervous about the theoretical underpinnings of this completely contrary finding.

    Now onto empirics: the results here rely on a statistical method called conditional logistic regression, which, despite the fact that it appears in Stata and on a couple of grad syllabi, is not widely understood by political scientists. I had to read up a lot on this method to figure out exactly what was happening to generate these results. I think that the paper could benefit from a much, much richer discussion of how conditional logistic regression “matches an individual with others who are identical on every dimension listed above, but who speak a different language”. All of the inference rests on this point. I’m not saying that this is wrong, but rather that this strikes me as a rather imprecise way of describing exactly what’s happening. If I can be confused, others can be too.
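
    For anyone else who had to look it up, here is a rough sketch of the textbook conditional likelihood behind this method. It is not Chen's code, and the coefficient and toy data are hypothetical. People are grouped into strata that are identical on the matching variables; within each stratum, the likelihood asks how probable the observed pattern of savers is, given how many savers the stratum contains. The stratum's own intercept cancels out of that ratio, which is why the method can absorb so many fixed effects.

```python
from itertools import combinations
from math import exp


def stratum_likelihood(x, y, beta, intercept=0.0):
    """Conditional likelihood of one stratum's outcomes, given its number of 1s.

    x: one covariate value per person (here 1 = strong-FTR speaker, 0 = weak-FTR)
    y: observed binary outcomes (1 = saved, 0 = did not save)
    intercept: a stratum-specific fixed effect; it cancels out of the ratio.
    """
    k = sum(y)

    def score(members):
        return exp(sum(intercept + beta * x[i] for i in members))

    observed = [i for i, yi in enumerate(y) if yi == 1]
    # Denominator: every way of choosing which k people in the stratum saved.
    denom = sum(score(c) for c in combinations(range(len(y)), k))
    return score(observed) / denom


BETA = -0.8  # hypothetical coefficient on speaking a strong-FTR language

mixed = stratum_likelihood([1, 0], [0, 1], BETA)               # informative stratum
shifted = stratum_likelihood([1, 0], [0, 1], BETA, intercept=3.0)
all_saved = stratum_likelihood([1, 0], [1, 1], BETA)           # no outcome variation
same_language = stratum_likelihood([1, 1], [1, 0], BETA)       # no language variation

print(mixed, shifted, all_saved, same_language)
```

    In this sketch only the first stratum tells us anything about the coefficient. A stratum where everyone saved contributes a likelihood of 1, and a stratum where everyone speaks the same kind of language contributes 0.5 no matter what the coefficient is. So “matching” here really means that the estimate is driven by strata containing both kinds of speakers and both outcomes, which is the detail I would like the paper to spell out.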

    More on point, it’s interesting to me that the author adopts this methodology (which allows for a huge number—millions!—of potential fixed effects regardless of the sample size), but then enters a couple of variables (Trust, Employment, Beliefs about Saving) into the models as linear predictors instead of dummied-out and jointly interacted fixed effects like the others. As a reviewer, if I saw this I’d immediately ask what happens if these are included as fixed effects too…a more flexible modeling strategy that seems in the spirit of the overall analysis anyway. The fact that Chen does not do this strikes me as fishy. In general, this is the type of paper where I’d like to play with the analysis code myself to see what commands are being entered into the computer to produce these results.

    Finally, a methodological question. Andrew Gelman has written about what he calls “Type M” errors, the errors that arise when we try to estimate small effects. By any measure, I think it’s reasonable to assume that the effect of grammatical encoding of the future on savings behavior is a small one! Yet Chen’s baseline estimate is that “strong-FTR families sav[e] only 46% as often…as weak FTR families.” That’s a gigantic effect (although to be fair, maybe the baseline savings rate in the entire population is only 1%, we can’t tell from the paper). Gelman observes that classical null hypothesis testing in a model like this is particularly likely to give results that imply large effect sizes (such as this) when the actual effect size is small, as it probably ought to be if it exists at all. This analysis seems ripe for a Bayesian reanalysis.
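
    A small simulation makes the Type M worry concrete. This is only a sketch under assumed numbers: a true effect of 0.5, a standard error of 1.0, and normally distributed estimates, none of which come from Chen's data. It draws many noisy estimates of the same small effect, keeps the ones that clear the usual 5% significance threshold, and asks how much those survivors overstate the truth on average.

```python
import random
from statistics import mean

random.seed(0)

TRUE_EFFECT = 0.5   # assumed small true effect (arbitrary units)
STD_ERROR = 1.0     # assumed standard error of each study's estimate
N_STUDIES = 20_000  # number of simulated replications

estimates = [random.gauss(TRUE_EFFECT, STD_ERROR) for _ in range(N_STUDIES)]

# Keep only the replications that clear the two-sided 5% threshold.
significant = [e for e in estimates if abs(e) > 1.96 * STD_ERROR]

share_significant = len(significant) / N_STUDIES
exaggeration = mean(abs(e) for e in significant) / TRUE_EFFECT

print(f"share significant:  {share_significant:.3f}")
print(f"exaggeration ratio: {exaggeration:.1f}")
```

    Because every surviving estimate must exceed 1.96 in absolute value while the assumed truth is 0.5, the significant results overstate the effect by a factor of nearly four at the very least. When power is low, a statistically significant estimate is almost guaranteed to be too big.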

    In all, I’ve written 1600 words on this. If nothing else, that tells me that this is interesting food for thought.