If It Rains Tomorrow, I Save

The dork blogs are all abuzz about this working paper (PDF) by Keith Chen entitled “The Effects of Language on Economic Behavior.” Here’s the abstract:

Languages differ widely in the ways they partition time. In this paper I test the hypothesis that languages which grammatically distinguish between present and future events (what linguists call strong-FTR languages) lead their speakers to take fewer future-oriented actions. First, I show how this prediction arises naturally when well-documented effects of language on cognition are merged with models of decision making over time. Then, I show that consistent with this hypothesis, speakers of strong-FTR languages save less, hold less retirement wealth, smoke more, are more likely to be obese, and suffer worse longrun health. This is true in every major region of the world and holds even when comparing only demographically similar individuals born and living in the same country. While not conclusive, the evidence does not seem to support the most obvious forms of common causation. Implications of these findings for theories of intertemporal choice are discussed.

This paper has been called a first attempt at Whorfian economics, which hearkens back to the old Sapir-Whorf Hypothesis, which holds that, roughly speaking, language independently shapes human behavior. There are both strong and weak versions of this. The strong version would look at a language like Navajo that only grammatically encodes the colors red, white, and black to conclude that Navajo-speakers will have a more difficult time conceiving of green versus blue than English speakers (and that Russians will have an easier time distinguishing light blue [голубой] from dark blue [синий] than English speakers). The weak version would make much (of course) weaker claims, but still hold that we should be able to observe differences in human behavior across otherwise identical people if they speak different languages.

I happen to have gone to college to study Linguistics. In fact, I actually tried to major (*ahem*, concentrate) in Anthropological Linguistics before a very kind adviser told me that that was even stupider than just majoring in Linguistics. So the debate about whether language affects human behavior is near and dear. It also happens to be the single most common argument that I have with my father-in-law, so it comes up. I strongly believe that the Sapir-Whorf hypothesis as commonly understood by non-specialists is a huge mistake, and deeply flawed (it’s a post for another time about just why I have such strong reactions against it) but there is good evidence that in some of its weaker forms, in a probabilistic sense, it might hold for particular issues, e.g. the relative prevalence of perfect pitch among Mandarin speakers.

So that’s why this paper fascinates me. The argument is simple. All languages have the ability to talk about future events, but some of them require the speaker to make particular grammatical gestures to do that. English is one. If you want to talk about the possibility of rain tomorrow, you have to say it will rain tomorrow. You cannot say *it rains tomorrow. In Indonesian, however, you don’t need to do that. You can say hujan besok (which translates literally to rain tomorrow). In German, you can say Morgen regnet es (or tomorrow rains it). All human languages can be divided into two groups: those that require grammatical encoding of future events (English) and those that do not (German, Indonesian). The former are called Strong-FTR languages, and the latter Weak-FTR languages.

Chen’s hypothesis is that people will behave differently with regard to the future based on their native language, which I find a bold prediction but not that surprising given the literature. What he then proposes is that he should be able to observe such differences in their economic activities. To me, that’s an amazingly bold claim! Here is a specific proposition:

languages with strong-FTR force their speakers to differentiate present and future events when speaking about them. It seems plausible that with finer distinctions in timing comes greater precision of beliefs…if more finely partitioning events in time leads to more precise beliefs, weak-FTR language speakers will be more willing to save than their strong-FTR counterparts. Intuitively, since discounting implies that the value of future rewards is a strictly-convex function of time, uncertainty about the timing of future payoffs makes saving more attractive.

I mean, wow.
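To make the convexity argument in that passage concrete, here is a minimal Python sketch. The discount factor of 0.9, the reward of 100, and the arrival dates are all numbers I made up for illustration; none of them come from Chen's paper. It compares the present value of a reward under a precise belief about its timing with the same reward under a vaguer belief that has the same average date.

```python
# Illustrative numbers only: the discount factor, reward, and dates are
# assumptions, not values taken from Chen's paper.
DELTA = 0.9  # per-period discount factor


def discounted_value(reward, periods, delta=DELTA):
    """Present value of a reward received `periods` from now."""
    return reward * delta ** periods


def expected_discounted_value(reward, timing_beliefs, delta=DELTA):
    """Expected present value when the arrival date is uncertain.

    `timing_beliefs` maps each possible arrival date to its probability.
    """
    return sum(p * discounted_value(reward, t, delta)
               for t, p in timing_beliefs.items())


# Precise belief: the payoff arrives in exactly 10 periods.
precise = expected_discounted_value(100, {10: 1.0})

# Vague belief with the same mean date: 5 or 15 periods, equally likely.
vague = expected_discounted_value(100, {5: 0.5, 15: 0.5})

print(f"precise timing: {precise:.2f}")
print(f"vague timing:   {vague:.2f}")
```

Because the discount function is convex in time, the vague belief yields the higher expected present value (this is Jensen's inequality). That is the sense in which imprecise beliefs about timing make saving look more attractive.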

The bulk of the paper goes into establishing that net of a lot of other systematic determinants, it looks like people who speak strong-FTR languages actually are less likely to report having saved in the past year than those who speak weak-FTR languages. I want to commend Chen: most of the critiques I’ve seen of his paper hold that he’s missing some omitted variable somehow, and he’s been very careful to rule out the most likely reasons why this would be a spurious correlation.

Still, like many readers, I suspect, I have a tough time buying this. But we’re scientists here, so we go with the evidence rather than our intuitions or gut, which both tell me to run screaming from this finding. I want to comment on four things: two theoretical issues, an empirical question, and a methodological issue. (I also highly recommend that you read my former professor Julie Sedivy and her comments on it.)

First theoretical point: I don’t know this for sure, but I believe that the distinction between strong-FTR and weak-FTR is not as grammatically encoded as we think. The example is in the title to this post: if it rains tomorrow, I save. Here I have constructed a grammatically correct English sentence in which I speak about a possible future state but in which I have never had to use the word will or anything like that. I never grammatically encoded the future; it’s understood from context. Here’s why that matters: Chen’s argument is that strong-FTR languages oblige speakers to divide time in particular ways. It’s easy for me to construct an example in which grammar doesn’t force me to do this. I do have to speak conditionally, but if the argument is that encoding in non-conditional contexts is what matters, then that must be made clear.

Second theoretical point (UPDATE: See comments at the end of the post for a helpful correction; thank you, readers!): this is one of those papers in which I was entirely open to the possibility that the exact opposite of the author’s hypothesis was the hypothesis to be tested. Consider this statement: “if more finely partitioning events in time leads to more precise beliefs, weak-FTR language speakers will be more willing to save than their strong-FTR counterparts.” It rests on the idea that weak-FTR languages partition time more finely; Chen tells us that “it seems plausible” that this is true. Well, not to me. Why does that follow from the lack of grammatical encoding of the future tense? What if someone told you the exact opposite: speakers of strong-FTR languages partition time more finely (because, say, they have to talk about it)? I would be just as likely to believe that theory. That just makes me very, very nervous about the theoretical underpinnings of this completely contrary finding.

Now onto empirics: the results here rely on a statistical method called conditional logistic regression, which despite the fact that it appears in STATA and on a couple of grad syllabi, is not widely understood by political scientists. I had to read up a lot on this method to figure out exactly what was happening to generate these results. I think that the paper could benefit from a much, much richer discussion of how conditional logistic regression “matches an individual with others who are identical on every dimension listed above, but who speak a different language”. All of the inference rests on this point. I’m not saying that this is wrong, but rather that this strikes me as a rather imprecise way of describing exactly what’s happening. If I can be confused, others can be too.
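Here is how I understand the mechanics, as a minimal Python sketch of the textbook conditional likelihood rather than anything taken from Chen's code. The function name, the coefficient of -0.8, and the two-person strata are all hypothetical. Each stratum is a group of people who are identical on the conditioning variables; the stratum's own intercept cancels out of the likelihood, so it is never estimated.

```python
import math
from itertools import combinations


def stratum_log_likelihood(beta, x, y):
    """Conditional log-likelihood of one stratum (one matched group).

    x: covariate values (here 1 = strong-FTR speaker, 0 = weak-FTR speaker)
    y: binary outcomes (1 = saved this year, 0 = did not)
    The likelihood conditions on how many members of the stratum saved,
    which removes the stratum's fixed effect from the expression.
    """
    k = sum(y)
    observed = beta * sum(xi * yi for xi, yi in zip(x, y))
    # Sum over every way of assigning k savers among the stratum's members.
    denominator = sum(
        math.exp(beta * sum(x[i] for i in savers))
        for savers in combinations(range(len(x)), k)
    )
    return observed - math.log(denominator)


BETA = -0.8  # hypothetical coefficient on strong-FTR, for illustration only

# Members differ in language and in outcome: this stratum is informative.
informative = stratum_log_likelihood(BETA, x=[1, 0], y=[0, 1])

# Both members saved: the contribution is zero whatever beta is.
same_outcome = stratum_log_likelihood(BETA, x=[1, 0], y=[1, 1])

# Both members speak a strong-FTR language: the contribution is a constant
# that does not depend on beta.
same_language = stratum_log_likelihood(BETA, x=[1, 1], y=[0, 1])

print(informative, same_outcome, same_language)
```

In this sketch, a stratum tells us something about the coefficient only when its members differ in both language type and outcome. That is one way to read the “matching” language, and it suggests how a model can condition on an enormous number of groups without estimating a parameter for each one.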

More on point, it’s interesting to me that the author adopts this methodology (which allows for a huge number—millions!—of potential fixed effects regardless of the sample size), but then enters a couple of variables (Trust, Employment, Beliefs about Saving) into the models as linear predictors instead of dummied-out and jointly interacted fixed effects like the others. As a reviewer, if I saw this I’d immediately ask what happens if these are included as fixed effects too…a more flexible modeling strategy that seems in the spirit of the overall analysis anyway. The fact that Chen does not do this strikes me as fishy. In general, this is the type of paper where I’d like to play with the analysis code myself to see what commands are being entered into the computer to produce these results.

Finally, a methodological question. Andrew Gelman has written about what he calls “Type M” errors, the errors that arise when we try to estimate small effects. By any measure, I think it’s reasonable to assume that the effect of grammatical encoding of the future on savings behavior is a small one! Yet Chen’s baseline estimate is that “strong-FTR families sav[e] only 46% as often…as weak FTR families.” That’s a gigantic effect (although to be fair, maybe the baseline savings rate in the entire population is only 1%, we can’t tell from the paper). Gelman observes that the classical null hypothesis testing in a model like this is particularly likely to give results that imply large effect sizes (such as this) when the actual effect size is small, as it probably ought to be if it exists at all. This analysis seems ripe for a Bayesian reanalysis.
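A quick simulation shows the Type M logic. This is a minimal Python sketch under assumptions I chose for illustration: a true effect of 0.5, a standard error of 1.0, and 100,000 hypothetical studies. None of these numbers describe Chen's data. The point is only to show what happens to the average reported effect once we look solely at estimates that pass a conventional significance test.

```python
import random
import statistics

random.seed(2012)  # fixed seed so the sketch is reproducible

TRUE_EFFECT = 0.5     # assumed small true effect (arbitrary units)
STANDARD_ERROR = 1.0  # assumed noise in each study's estimate
N_STUDIES = 100_000

# Each hypothetical study reports one noisy estimate of the same true effect.
estimates = [random.gauss(TRUE_EFFECT, STANDARD_ERROR)
             for _ in range(N_STUDIES)]

# Keep only the estimates that clear the usual 5% two-sided threshold.
significant = [e for e in estimates if abs(e) > 1.96 * STANDARD_ERROR]

share_significant = len(significant) / N_STUDIES
# Average magnitude of the significant estimates relative to the truth.
exaggeration = statistics.mean(abs(e) for e in significant) / TRUE_EFFECT

print(f"share significant: {share_significant:.3f}")
print(f"exaggeration ratio: {exaggeration:.1f}")
```

Every estimate that survives the filter has a magnitude of at least 1.96 standard errors, so with a true effect this small relative to the noise, the significant estimates overstate it several times over. Whether Chen's setting looks anything like this depends on the real ratio of effect size to standard error, which this sketch does not know.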

In all, I’ve written 1600 words on this. If nothing else, that tells me that this is interesting food for thought.

Comments

7 responses to “If It Rains Tomorrow, I Save”

February 24, 2012

Reading Week | Matt Glassman

[…] Tom Pepinksy reviews a new paper on language and economic behavior. I’m not sure there’s more than a handful of people […]
February 24, 2012

Matt

On your second theoretical point, I think you might have the proposition backwards here: Chen is saying that strong FTR languages partition time more finely, not weak FTR languages, and that the partitioning in itself opens up for a possible future to be devalued. Example:

“Put another way, I ask whether a habit of speech which distinguishes present from future, can lead to a habit of mind that devalues future rewards.”

So the hypothesis which you find more probably is the hypothesis being tested, no?
February 24, 2012

Matt

That should read “more probable,” which is what I get for typing on an iPad…
February 24, 2012

Tom

Thanks for reading. Yes, I think you’re right. “languages with strong-FTR force their speakers to differentiate present and future events when speaking about them. It seems plausible that with finer distinctions in timing comes greater precision of beliefs…if more finely partitioning events in time leads to more precise beliefs”

So, strong-FTR -> finely partitioning time -> more precise beliefs -> devaluation of future -> less saving; weak-FTR speakers will save more

Still, the fact that I was confused kinda proves my point! Let me just propose the opposite: weak-FTR -> finely partitioning time because the speaker has to work with context to understand time rather than having grammar encode it. If you told me that hypothesis I’d be willing to accept it.
February 27, 2012

The effect of language on savings | Blog Pra falar de coisas

[…] that saving is the same as to postpone present consumption). A good critical summary can be found here (by Tom Pepinsky). And here’s the […]
February 27, 2012

Manoel Galdino

I made a quick comment in my blog about the conditional logistic regression. I didn’t have the time to read the paper in detail, so I may have misunderstood something. However, I found it quite surprising that he had more than one million categories in one model (this is too many parameters).

best,
Manoel

ps.: here is the link to my blog post.

The effect of language on savings
February 27, 2012

Tom

Thanks for reading, Manoel. Your reaction is precisely the same as mine: it seems supremely odd to condition on so many parameters, more than there are observations. I’m still wrapping my head around the fact that that’s even possible.