Category: Indonesia

  • The Indonesian Linguistic Diversity Challenge

    This weekend some colleagues and I are hosting a delegation visiting from Badan Pusat Statistik (BPS), the Indonesian central statistical agency. We are hoping to find a way for Cornell to warehouse some of the vast wealth of statistical data that BPS has been collecting for the past fifty years. As part of BPS’s presentation, we learned about something called the Sekolah Tinggi Ilmu Statistik, which you’d translate as something like the Institute for Statistical Sciences.

    The Institute for Statistical Sciences

    Looking at the name of the institute, I was struck by something interesting: each word has a different etymology. Check it out:

    • Sekolah was borrowed into Indonesian from the Portuguese escola, for “school”
    • Tinggi is from Malay, and means “high”
    • Ilmu was borrowed from the Arabic ‘ilm (عِلْمٌ), or “knowledge”
    • Statistik was borrowed from the Dutch Statistiek, which of course means “statistics”

    Four words, four different origins. It got me thinking: how many different word origins could you fit into one natural Indonesian phrase or sentence?

    Which leads me to the Indonesian Linguistic Diversity Challenge. The goal is to produce a phrase or sentence that contains the greatest number of word origins in the most compact way possible. I can take Sekolah Tinggi Ilmu Statistik and lengthen it by one: Perpustakaan Sekolah Tinggi Ilmu Statistik, or “the library at the Institute of Statistical Sciences,” by adding Perpustakaan, formed from the Sanskrit root pustaka, or “book.” But that’s kinda cheating.

    The most compact example I can think of by myself is lumpia pisang coklat keju:

    • lumpia was borrowed from the Hokkien lun pia (潤餅), and refers to a spring roll
    • pisang is just the Malay word for “banana”
    • coklat is “chocolate,” from Dutch
    • keju is from Portuguese queijo, or “cheese”

    Food is a particularly good area for inspiration (many words from Hokkien), as are household items (Dutch and Portuguese); science, culture, and literature (Arabic and Sanskrit); and technology (English).

    The ultimate winner of the challenge would be a phrase or sentence that combines the nine most common sources of Indonesian vocabulary (Arabic, Dutch, English, Hokkien, Javanese, Malay, Persian, Portuguese, and Sanskrit) in just nine words. Frankly, that seems almost impossible, given that words of English, Javanese, and Persian origin are actually pretty rare.

    Much more likely but no less impressive would be one that included the “Big Six”—Arabic, Dutch, Hokkien, Malay, Portuguese, and Sanskrit—in six words. I leave it to you, readers. The winner gets 1 internet.
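
    For anyone who wants to score an entry, here is a minimal sketch. The `ORIGINS` table only contains the words and etymologies mentioned in this post, and the function name `score` is made up for illustration; a real checker would need a proper etymological dictionary.

```python
# Etymologies as given in this post; anything else would need to be added by hand.
ORIGINS = {
    "sekolah": "Portuguese",
    "tinggi": "Malay",
    "ilmu": "Arabic",
    "statistik": "Dutch",
    "perpustakaan": "Sanskrit",
    "lumpia": "Hokkien",
    "pisang": "Malay",
    "coklat": "Dutch",
    "keju": "Portuguese",
}


def score(phrase):
    """Return (number of distinct origins, number of words) for a phrase.

    Raises KeyError if a word is missing from ORIGINS.
    """
    words = phrase.lower().split()
    origins = {ORIGINS[word] for word in words}
    return len(origins), len(words)


for entry in ("Sekolah Tinggi Ilmu Statistik", "lumpia pisang coklat keju"):
    distinct, length = score(entry)
    print(f"{entry}: {distinct} origins in {length} words")
```

    A perfectly compact entry is one where the two numbers are equal, so the “Big Six” target is a score of (6, 6).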

  • Fractionalization and Competition Reprise

    About nine months ago I posted about a peculiar finding in Indonesian local politics: political fragmentation appeared to be high in the most ethnically homogeneous districts. It makes sense that ethnic heterogeneity would produce a fragmented party system, but not that ethnic homogeneity would do the same. I found this puzzling.

    Sometimes, though, a puzzling result is just a mistake. I think that is what happened here. Together with my co-authors, I have been recreating political fractionalization and ethnic fractionalization scores from the original raw data: ethnicity data from the 2000 Census and party seat shares in the district legislatures from the General Election Commission. (We had been using indices created by someone else, probably for a different purpose.) In the process of doing this, I found a load of errors in our original data, many of them quite significant.

    Here is that same scatterplot using the corrected data.

    Ethnolinguistic and Political Fractionalization Indices

    The indices of political and ethnic fractionalization are the standard Herfindahl-style indices: if $p_i$ is the proportion of ethnic group (or political party) $i$ in a district, then a district’s total fractionalization score is

    $$\text{FRACTIONALIZATION} = 1 - \sum_i p_i^2$$
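
    As a quick illustration, the index can be computed directly from group counts or seat counts. This is a minimal sketch, not the code we used; the function name and the example seat counts are hypothetical.

```python
def fractionalization(counts):
    """Herfindahl-style index: 1 minus the sum of squared group shares.

    `counts` holds the size of each group (people per ethnic group,
    or seats per party) in one district.
    """
    total = sum(counts)
    if total <= 0:
        raise ValueError("counts must sum to a positive number")
    return 1.0 - sum((c / total) ** 2 for c in counts)


# Hypothetical district legislature: seats held by four parties.
seats = [20, 15, 10, 5]
print(round(fractionalization(seats), 3))  # shares 0.4, 0.3, 0.2, 0.1 -> 0.7
```

    A district with a single group scores 0, and the score approaches 1 as the district splits into many small groups.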

    Very straightforward stuff. In addition to a positive correlation between ethnic and political heterogeneity, this figure also shows a classic example of heteroskedasticity: the variance in political fractionalization is higher in more ethnically homogeneous districts than in more ethnically heterogeneous ones (a Breusch-Pagan test strongly rejects the null of homoskedasticity; some further digging indicates that the political fractionalization index is not normally distributed). We also see that the green dashed line (the linear fit) and the red solid line (the lowess fit) are nearly identical, which suggests that there isn’t any significant non-linearity in the bivariate relationship.
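
    For readers unfamiliar with the Breusch-Pagan test, here is a rough sketch of the idea for a single regressor, using only the standard library. It implements the studentized (Koenker) variant, which is an assumption on my part about which flavor to show, and the data are synthetic, built so that the spread shrinks as x grows. None of this is our actual data or code.

```python
import math


def ols(x, y):
    """Intercept and slope of a one-regressor least-squares fit."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def r_squared(x, y):
    """R-squared of the least-squares fit of y on x."""
    a, b = ols(x, y)
    my = sum(y) / len(y)
    ss_res = sum((yi - a - b * xi) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Studentized Breusch-Pagan test with one regressor.

    Regress the squared residuals on x; LM = n * R^2 is approximately
    chi-squared with 1 degree of freedom under homoskedasticity.
    """
    a, b = ols(x, y)
    squared_resid = [(yi - a - b * xi) ** 2 for xi, yi in zip(x, y)]
    lm = max(len(x) * r_squared(x, squared_resid), 0.0)
    p_value = math.erfc(math.sqrt(lm / 2.0))  # chi-squared(1) upper tail
    return lm, p_value


# Synthetic data: noise of alternating sign whose size shrinks as x grows.
n = 200
x = [i / (n - 1) for i in range(n)]
y = [0.5 + 0.3 * xi + 0.1 * (1 - 0.8 * xi) * (1 if i % 2 == 0 else -1)
     for i, xi in enumerate(x)]

lm, p_value = breusch_pagan(x, y)
print(f"LM = {lm:.1f}, p = {p_value:.3g}")
```

    A large LM statistic (small p-value) means the squared residuals vary systematically with x, which is the pattern in the scatterplot.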

    Anyway, the peculiar relationship that we thought we had uncovered between ethnic fractionalization and political fragmentation turns out to be an artifact of some bad data, so the question of what to do about it goes away. Happily, with the better data our earlier results appear even stronger: in Indonesian local politics, more political fragmentation -> lower budget surpluses.