The Indonesian Linguistic Diversity Challenge

This weekend some colleagues and I are hosting a delegation visiting from Badan Pusat Statistik (BPS), the Indonesian central statistical agency. We are hoping to find a way for Cornell to warehouse some of the vast wealth of statistical data that BPS has been collecting for the past fifty years. As part of BPS’s presentation, we learned about something called the Sekolah Tinggi Ilmu Statistik, which you’d translate as something like the Institute for Statistical Sciences.

Looking at the name of the institute, I was struck by something interesting: each word has a different etymology. Check it out:

  • Sekolah was borrowed into Indonesian from the Portuguese escola, for “school”
  • Tinggi is from Malay, and means “high”
  • Ilmu was borrowed from the Arabic ‘ilm (عِلْمٌ), or “knowledge”
  • Statistik was borrowed from the Dutch statistiek, which of course means “statistics”

Four words, four different origins. It got me thinking: how many different word origins could you fit into one natural Indonesian phrase or sentence?

Which leads me to the Indonesian Linguistic Diversity Challenge. The goal is to produce a phrase or sentence that packs the greatest number of word origins into the fewest words. I can take Sekolah Tinggi Ilmu Statistik and lengthen it by one: Perpustakaan Sekolah Tinggi Ilmu Statistik, or “the library at the Institute of Statistical Sciences,” by adding Perpustakaan, formed from the Sanskrit root pustaka, or “book.” But that’s kinda cheating.

The most compact example I can think of by myself is lumpia pisang coklat keju:

  • lumpia is borrowed from the Hokkien lun pia (潤餅), and refers to a spring roll
  • pisang is just the Malay word for “banana”
  • coklat is “chocolate,” from Dutch
  • keju is from Portuguese queijo, or “cheese”

Food is a particularly good area for inspiration (many words from Hokkien), as are household items (Dutch and Portuguese); science, culture, and literature (Arabic and Sanskrit); and technology (English).

The ultimate winner of the challenge would be a phrase or sentence that combines the nine most common sources of Indonesian vocabulary—Arabic, Dutch, English, Hokkien, Javanese, Malay, Portuguese, Persian, and Sanskrit—in just nine words. Frankly, that seems almost impossible, given that English, Javanese, and Persian are actually pretty rare.

Much more likely but no less impressive would be one that included the “Big Six”—Arabic, Dutch, Hokkien, Malay, Portuguese, and Sanskrit—in six words. I leave it to you, readers. The winner gets 1 internet.
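
If you want to keep score at home, here is a rough Python sketch of how I'd tally an entry. Everything in it is my own invention for illustration: the ORIGINS table only covers the words discussed in this post (with the etymologies as I gave them above), and the function names score and missing_big_six are made up. Entries using other words would need their origins added by hand.

```python
# Toy lookup table: only the words discussed in this post, with the
# etymologies given above. Not a real etymological dictionary.
ORIGINS = {
    "sekolah": "Portuguese",
    "tinggi": "Malay",
    "ilmu": "Arabic",
    "statistik": "Dutch",
    "perpustakaan": "Sanskrit",
    "lumpia": "Hokkien",
    "pisang": "Malay",
    "coklat": "Dutch",
    "keju": "Portuguese",
}

BIG_SIX = {"Arabic", "Dutch", "Hokkien", "Malay", "Portuguese", "Sanskrit"}


def score(phrase):
    """Return (distinct origins, word count) for a phrase.

    A perfect entry has the two numbers equal: one origin per word.
    """
    words = phrase.lower().split()
    unknown = [w for w in words if w not in ORIGINS]
    if unknown:
        raise KeyError(f"no origin recorded for: {', '.join(unknown)}")
    origins = {ORIGINS[w] for w in words}
    return len(origins), len(words)


def missing_big_six(phrase):
    """Return the Big Six sources that a phrase does not yet cover."""
    origins = {ORIGINS[w] for w in phrase.lower().split() if w in ORIGINS}
    return BIG_SIX - origins


if __name__ == "__main__":
    for entry in (
        "Sekolah Tinggi Ilmu Statistik",
        "Perpustakaan Sekolah Tinggi Ilmu Statistik",
        "lumpia pisang coklat keju",
    ):
        n_origins, n_words = score(entry)
        print(f"{entry}: {n_origins} origins in {n_words} words")
```

By this count the library phrase is only missing Hokkien from the Big Six, which is exactly why it feels so close and yet so far.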