One of the hard things about studying ethnicity is that socially embedded meanings and ideas about ethnic identity are difficult to uncover. Qualitative and contextual research is essential, but it can stand in the way of other important goals such as generalization and inter-group comparison. Political scientists who study ethnicity in the comparative context have struggled to characterize exactly how understandings of ethnicity differ across contexts. Some of the most important work on identity relies on situations where “objective cultural differences… [between groups] are identical,” which obviously makes it impossible to study those differences. Other work comparing ethnic groups is forced to assume that groups can be treated identically aside from their population share and distribution. This assumption is, for example, implicit in the creation of ethnic fractionalization indices (PDF). There are alternatives to fractionalization that rely on some metric of cultural similarity or distance, but such metrics are coarse, usually unidimensional, and imposed by the researcher. It is always possible to embed questions about ethnicity or identity in surveys, but for those questions to be useful, the survey creator must already know what to ask, which dimensions matter, and how important they are relative to one another.
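To make the point about fractionalization concrete, here is a minimal sketch assuming the common Herfindahl-style definition (one minus the sum of squared population shares). The function name, the two countries, and their group shares are hypothetical and chosen only for illustration.

```python
def fractionalization(shares):
    """Return 1 minus the sum of squared group population shares."""
    shares = list(shares)
    total = sum(shares)
    if total <= 0:
        raise ValueError("shares must sum to a positive number")
    # Normalize so the shares sum to one before squaring.
    normalized = [s / total for s in shares]
    return 1.0 - sum(p * p for p in normalized)


# Hypothetical population shares for two made-up countries.
# The groups differ, but the distribution of shares is the same.
country_a = {"group_1": 0.6, "group_2": 0.3, "group_3": 0.1}
country_b = {"group_x": 0.1, "group_y": 0.6, "group_z": 0.3}

frac_a = fractionalization(country_a.values())
frac_b = fractionalization(country_b.values())
print(round(frac_a, 3), round(frac_b, 3))
```

Both countries get the same score because the index only sees the shares. Nothing about which groups they are, or what those identities mean, enters the calculation.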
In a new working paper I propose a different way to identify the “content” of ethnic identity and how it differs across groups. In survey data collected in 2017 in peninsular Malaysia and in three provinces (and one city) in Sumatra, enumerators asked each respondent to say two things that came to mind when they thought of the Malay ethnic group. Most respondents were able and willing to answer this question. Sometimes the answers are sorta funny (one Javanese respondent in Sumatra said “talk too much” [= suka bicara panjang], one Malay respondent in Malaysia said “easily colonized” [= bangsa yang mudah di jajah]). But we don’t want to pick out the silly responses; we want to pick out general patterns.
To do this, I used a text-analytic procedure called structural topic modeling to uncover, from among the nearly 2000 responses, coherent “ideas” (or “topics”) about what Malayness means. I then used features of the respondents (age, gender, their own ethnic group, and, most importantly for my purposes, whether they are Indonesian or Malaysian) to predict the likelihood that any respondent would invoke each of these ideas. (The method comes from Roberts et al. 2014.) The figure below visualizes the results.
The result is a test of the hypothesis that, say, Malaysian respondents are more likely to talk about Islam (or religion in general) when describing Malays than are Indonesian respondents. That is what the above figure shows. It also shows that Malaysians are more likely to use words like “lazy” [= malas] and to invoke royalty (or more generally governance) [= raja].
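For readers who want a feel for the logic of this comparison, here is a deliberately crude sketch. It is not structural topic modeling and not the analysis in the paper: it only compares the share of responses in each country that mention a given word. The responses are made-up placeholders, not survey data, and the function name is hypothetical.

```python
from collections import defaultdict

# Made-up placeholder responses (NOT survey data): (country, response text).
responses = [
    ("Malaysia", "Islam"),
    ("Malaysia", "raja"),
    ("Malaysia", "malas"),
    ("Malaysia", "agama Islam"),
    ("Indonesia", "suka bicara panjang"),
    ("Indonesia", "Islam"),
    ("Indonesia", "adat"),
    ("Indonesia", "bahasa"),
]


def keyword_share_by_group(rows, keyword):
    """Share of responses in each group that contain the keyword as a word."""
    counts = defaultdict(lambda: [0, 0])  # group -> [mentions, total]
    for group, text in rows:
        counts[group][1] += 1
        if keyword.lower() in text.lower().split():
            counts[group][0] += 1
    return {group: m / n for group, (m, n) in counts.items()}


shares = keyword_share_by_group(responses, "islam")
print(shares)
```

A topic model improves on this by grouping related words into topics rather than matching single keywords, and the structural version estimates how respondent features shift the prevalence of each topic, with uncertainty attached.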
There are many other ways that one might slice these data; this is just an illustration. For a longer and more detailed introduction to this method and its uses, including exploring differences across groups within one country, see here.