Category: Research

  • The Journal of East Asian Studies Will Transition to Full Open Access by 2025

    As editor of the Journal of East Asian Studies, I am very pleased to announce an exciting change to JEAS’s publication model. In collaboration with Cambridge University Press and with the full support of the East Asia Institute, starting in 2025 JEAS will transition to a fully Open Access publication model. This means that starting next year, every new article published by JEAS will be freely available to any reader, anywhere in the world, at no cost. Even more importantly, none of the costs of this transition to full Open Access will be borne directly by our authors.

    This is a major milestone for JEAS, and for scholarly publishing on East and Southeast Asian politics. But it is the culmination of a process that has been in motion for at least a decade, as academic institutions and funding organizations have come to recognize that the traditional model of scholarly publication erected substantial barriers to scholarly information and exchange. For JEAS’s audience—readers and prospective authors alike—that publication model was particularly harmful, because it made it difficult for readers who aren’t employed at wealthy institutions to access our journal. For authors who wanted to publish their articles under an Open Access model, the costs of doing so—through “Article Processing Charges”—were often prohibitive.

    Even with these constraints, JEAS has already made great strides towards Open Access publication. In 2022, 61% of our research articles were published on an Open Access basis. This next step will remove all remaining barriers to access, making all research in JEAS available to everyone, regardless of position, institutional affiliation, or ability to pay.

    The new model that CUP and EAI have introduced works because CUP has secured agreements with thousands of institutions around the world to support Open Access publication for their own researchers. These agreements, in turn, allow CUP to subsidize Open Access publication for any author who is not covered by one, which is especially important for scholars from East and Southeast Asia. This means that beginning in 2025 (Volume 25), any author whose manuscript is accepted for publication in JEAS will either be covered by CUP’s existing agreements, or will be automatically granted a waiver for all Article Processing Charges by CUP.

    This is an exciting time for JEAS. Thanks to the hard work of my predecessors Byung-Kook Kim and Stephan Haggard, and to the dedication of our excellent Editorial Board and our committed peer reviewers, I am fortunate to step into my position as editor knowing JEAS has an outstanding reputation for publishing cutting-edge research on politics, political economy, and international relations in East and Southeast Asia. I am committed to continuing to build JEAS’s international reputation as a premier venue for scholars of Asian politics, one that is open to submissions from any methodological tradition or theoretical perspective but with high standards for evidence, analysis, and argument.

    But my other commitment is for JEAS to amplify the voices of underrepresented scholars—not just from Asia but from around the world—doing research on East and Southeast Asian politics and society. Our transition to full Open Access is an important step towards this goal. Stay tuned for more, and check out the excellent research that is published and forthcoming at JEAS.

  • ChatGPT-4 Knows Malaysia’s Ethnic Order

    I am writing a book about Malaysia’s ethnic order*, using the Malay world as a way to think more generally about social categories like ethnicity and how to conceptualize them. This book project combines all of my interests: maritime Southeast Asia, political economy, colonial history (Portuguese, Dutch, English, and Japanese), language, analytic philosophy, social ontology, diaspora studies, criticizing things for not really making sense, trying to subvert paradigms, making custom maps, and statistics. The last of these is the subject of this post.

    One of the tasks of my book is to demonstrate that my understanding of Malaysia’s ethnic order is not simply a projection of my own biases onto Malaysian history, society, and culture. To do this, I’ve been using survey data collected from Malaysians (and Indonesians… although that’s not really relevant for today’s post) to try to characterize the country’s ethnic order. An ethnic order, for me, is

    the set of beliefs, practices, and ancillary institutions that organize and sustain ethnic categories within a given political community

    Such an abstract concept is hard to characterize, and hard to survey people about without supplying exactly the answers that I hope they will provide (thus invalidating the exercise). So what I have done is to survey Malaysians and ask them questions about what comes to mind when they think of various ethnic groups in the country. They provided open-ended responses, which I analyzed with a Structural Topic Model to estimate the relative salience of different types of responses for different ethnic groups.
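
    To make that step concrete, here is a minimal sketch of this kind of analysis. It is not my actual pipeline: the real analysis used a Structural Topic Model (whose covariate-dependent topic prevalence plain LDA lacks), while the sketch below substitutes scikit-learn’s LDA and then compares average topic proportions across the group each response was about. The file name, column names, and number of topics are all hypothetical.

    ```python
    # Rough stand-in for the analysis described above: fit a plain LDA topic
    # model to open-ended responses, then compare average topic proportions
    # across the ethnic group each question asked about. File name, column
    # names, and the number of topics are hypothetical.
    import pandas as pd
    from sklearn.feature_extraction.text import CountVectorizer
    from sklearn.decomposition import LatentDirichletAllocation

    # One open-ended response per row, plus the group the question asked about.
    df = pd.read_csv("responses.csv")  # columns: "response", "group_asked_about"

    # Bag-of-words representation of the responses.
    vectorizer = CountVectorizer(max_features=2000)
    X = vectorizer.fit_transform(df["response"])

    # Fit LDA (no covariates, unlike a true Structural Topic Model).
    lda = LatentDirichletAllocation(n_components=10, random_state=0)
    theta = lda.fit_transform(X)  # document-topic proportions

    # Approximate "relative salience": mean topic share by target group.
    topic_cols = [f"topic_{k}" for k in range(theta.shape[1])]
    salience = (
        pd.DataFrame(theta, columns=topic_cols)
        .assign(group=df["group_asked_about"].values)
        .groupby("group")
        .mean()
    )
    print(salience.round(3))
    ```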

    The result is an analysis showing, for example, that when responses focus on things like language and religion, they are more likely to be about Malays than about Chinese. When they focus on economic matters, they are more likely to be about Chinese than about Indians. And so forth. When the book is done, it will explain why such results are consistent with my particular approach to understanding Malayness in Malaysia.**

    But with ChatGPT-4 and other large language models (LLMs) completely blowing up everything, it occurred to me this morning that one might wonder what the computer says about my data. I was inspired to do this by reading this paper from Petter Törnberg, which used ChatGPT-4 to predict the partisan affiliation of the authors of tweets. I’m reporting here what I found from some initial playing around, and it is remarkable, scary, and open to a lot of interpretation (more on this at the end).

    Here is what I did: I logged into ChatGPT-4 and opened up my file containing the open-ended survey responses. I then typed the following at the prompt:

    I asked a Malaysian survey respondent the following question:

    Now I am going to ask you about several different ethnic groups. I would like you to tell me two (2) things that come to mind when you think about these groups. There is no right or wrong answer; you may think of particular words or phrases, or perhaps nothing at all.

    I asked this about ethnic Malays, ethnic Chinese, and ethnic Indians in Malaysia. Imagine that the respondent gave the response “kurang rajin Islam.” Which of the three ethnic groups above do you think this response would be associated with?

    The words kurang rajin Islam are the responses that one survey respondent actually gave. It is a combination of two responses, because I asked the respondent this question twice (one response was kurang rajin and the other was Islam). After hitting Enter, ChatGPT-4 spat out the following:

    Based on the response “kurang rajin Islam,” it is likely associated with the ethnic Malay group in Malaysia.

    “Kurang rajin” translates to “less diligent” in English, and “Islam” refers to the religion of Islam. The phrase suggests a perception or association related to the Malay community and their commitment or dedication to their religious practices. It’s important to note that this response reflects the perspective of the individual respondent and should not be taken as a generalization for the entire Malay ethnic group.

    This is actually bananas. The computer correctly identified which group that response was associated with. And it kept doing it. I asked

    Imagine that the respondent gave the response “pemegang ekonomi.” Which of the three ethnic groups above do you think this response would be associated with?

    And the computer said

    The response “pemegang ekonomi” translates to “economic holders” or “economic owners” in English. Given this response, it is likely associated with the ethnic Chinese group in Malaysia.

    The phrase “pemegang ekonomi” suggests a perception or association related to the economic prowess or prominence of the ethnic Chinese community in Malaysia. It reflects the recognition of their significant contributions to the economic sector in the country. However, it is important to note that this response represents the viewpoint of the individual respondent and should not be taken as a generalization for the entire ethnic Chinese group.

    It gets worse, but I’ll spare you. And although it doesn’t always work, these initial forays suggest that LLMs can be remarkably accurate at predicting which group a short survey response refers to, at least in this specific case. You can think of a lot of other ways that one might study social categories using these AI tools.
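
    I did all of this by hand in the ChatGPT interface, but the same exercise could be automated. Here is a hypothetical sketch using the OpenAI Python API; the model name, file name, column name, and prompt framing are my assumptions for illustration, not what I actually ran. With the group each question actually asked about in hand, you could then score the model’s guesses against that ground truth.

    ```python
    # Hypothetical sketch of running the classification exercise through the
    # OpenAI API rather than by hand. Model name, file name, and prompt
    # framing are assumptions, not what I actually did.
    import pandas as pd
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    PROMPT = (
        "I asked a Malaysian survey respondent what comes to mind when they "
        "think about ethnic Malays, ethnic Chinese, and ethnic Indians in "
        "Malaysia. The respondent answered: \"{response}\". Which of the "
        "three groups do you think this response is about? Answer with one "
        "word: Malay, Chinese, or Indian."
    )

    df = pd.read_csv("responses.csv")  # hypothetical: column "response"

    guesses = []
    for text in df["response"]:
        completion = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": PROMPT.format(response=text)}],
            temperature=0,
        )
        guesses.append(completion.choices[0].message.content.strip())

    df["llm_guess"] = guesses
    print(df.head())
    ```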

    I don’t know exactly what to do with these results, but I do know that they’re getting at something significant. For one, the AI developers are mostly trying to rein in LLMs to prevent them from saying racist, sexist, and otherwise biased things. But here is a case where we have the reverse: actual statements (and rather explicitly biased in both cases) by real humans that can be classified according to the best guess of who is being referred to. How can this be stopped?

    But I think that there’s another level at which to think about this result. That is, knowing that ChatGPT-4 is not thinking, but rather just running a super-duper prediction model of which words are most likely to be sensible responses to other words, what could one learn from this? Is this evidence of something, and if so, of what? What people think? What people have written down? What Malaysians have written down? What researchers or journalists have written down?

    There are even more levels. What about the coarseness of my own prompt (I wrote Malaysia, when I really mean peninsular Malaysia)? Should I give feedback on these results, thus contributing to reinforcement learning? If I write up these results, would I then be further contributing to stereotypes by producing a text that will feed into the same LLMs to produce even more such results? What if my own ethical goals are not to reinforce, but rather to undermine the present social order?

    I don’t have any great answer to these questions. I am a pessimist about what LLMs will do to human society who nevertheless marvels at what they can do sometimes.

    NOTES

    * You can think of an ethnic order as like a “racial order.”

    ** But not Indonesia. That’s important. And also not really even Sabah/Sarawak: this is about peninsular Malaysia. That’s also important.