Author: tompepinsky

  • AI Generated Maps of Southeast Asia Are Here

    Last summer, I tried to use generative AI models to create interesting maps of Southeast Asia. My results were disappointing on the whole, and led me to the skeptical conclusion that commercially available generative AI models were not great at fact-based spatial reasoning (even if they are very good at other things).

    But that was 2025. Now it’s 2026.

    So less than a year later, I’m back to considering whether generative AI can create maps of interesting things in Southeast Asia. Inspired by my own continuing obsession with the region’s linguistic complexity (so many language families, such interesting variation both across space and by altitude), I return to the case of drawing a map of the world’s major language families in Southeast Asia, something which probably exists somewhere but is not available online, and certainly not in the format in which I need it.

    In August 2025, just ten months ago, the best I could obtain was this figure, which is hilarious and wrong in dozens of different ways. It cost US$20 and was generated by manually prompting o4-mini-high for about half an hour.

    Generative AI has changed a lot in the intervening year. The advances are pretty staggering. I didn’t know what “agentic AI” meant in June 2025; in June 2026 I work with Claude Code in my terminal, and read endless commentary on how to herd my agents. But most critically, the most advanced and expensive AI tools that were available to me in June 2025 have been surpassed by several new generations of models.

    In June 2026, for the same US$20, I am able to work with Claude Fable 5, which Anthropic bills as its most powerful and advanced consumer-facing product:

    Fable 5’s capabilities exceed those of any model we’ve ever made generally available. It is state-of-the-art on nearly all tested benchmarks of AI capability, showing exceptional performance in software engineering, knowledge work, vision, scientific research, and many other areas. The longer and more complex the task, the larger Fable 5’s lead over our other models.

    In true Silicon Valley fashion, Anthropic also teases us a little bit by warning us that Fable 5 might be a little bit dangerous!

    Releasing a model this capable comes with risks. Without safeguards, Fable 5’s capabilities in areas like cybersecurity could be misused to cause serious damage.

    I don’t think my light usage will endanger national security or the survival of the human species, but I guess it’s a waiting game right now.

    So now for the big reveal. Here is what I can now make using Claude’s most advanced AI model ever:

    This is good. It is, in fact, really good. The colors group linguistic areas by major language family, every label is legible and spelled correctly, and the shaded areas correspond to the locations in which these languages are spoken with a pretty high degree of precision. This is impressive. I can, and will, use this when I teach Southeast Asian politics, and that was the purpose of this exercise in the first place.

    With that said, the details and the process of generating this map may be of some interest. First, the details:

    • Fees: To make this map, I had to pay US$20 for my Claude Fable 5 subscription. Free versions of Claude, like other AI models, are still wholly unable to create something like this.
    • Time: It took about 24 hours to make this map. This is partly explained by limits on Claude messages; I could have made this in less time had I paid even more for the most premium subscription.

    The costs associated with creating such a map—in terms of money and time—will decline over time as ever more powerful AI models replace current ones. In a year, I bet I will be able to do this for free.*

    The process is perhaps more interesting. Put very directly, the first attempts at this map contained major errors. Among the list of things that I had to manually explain through repeated prompting:

    • Rhade, Jarai, and several others are large minority languages in Vietnam’s Central Highlands.
    • The northern provinces of Vietnam are home to speakers of Kra-Dai languages, like the Tày and Nùng.
    • Assamese does not extend into Myanmar/Burma (an error in earlier versions).
    • Within China, Austroasiatic languages are not spoken in most of coastal Guangxi (an error in earlier draft versions), but they are widely spoken in some parts of Yunnan.
    • Mon languages in Myanmar/Burma are located primarily in Mon State and northern regions of Tanintharyi Region, not in the south (as in earlier draft versions).
    • There are speakers of Dravidian languages in Bhutan and Assam.
    • Tsat is an Austronesian language on Hainan.

    Moreover, I had to manually supply information on groups and their geographic locations:

    • Maps of the Hmong-Mien and Austroasiatic language families.
    • Lists and geographic locations for ethnic groups and minority language communities in Bangladesh, Cambodia, China, India, Indonesia, Laos, Myanmar, and Vietnam. These came from Wikipedia and Ethnologue.

    Other tweaks included explaining that Papuan and Australian are not language families, but rather geographic designations for non-Austronesian languages in the eastern and southern reaches of the map; replacing “Tai-Kadai” with “Kra-Dai” in the label; and removing extraneous labels and text with errors.

    All of that tells me, and should tell you, that the latest generative AI models are incredibly powerful. But they are also still meaningfully limited. They need guidance for factual accuracy and fidelity in representation. Because maps are models, not territories, an exercise such as this is bound to be imprecise, implying a tradeoff in clarity versus precision. If the goal is to show roughly where major language families are in Southeast Asia, advanced AI models can do it. If the goal is to illustrate the quirks and the details, you must supervise. It remains way too easy for the results produced by AI to be authoritatively wrong.

    It remains an open question whether I could have printed out a blank map, colored it in with help from Wikipedia, and scanned it to produce roughly the same result more quickly and cheaply. But make no mistake: Claude’s map is very accurate, and it is also beautiful.

    NOTES

    * I will set myself a reminder for 12 months from now, and will report back.

  • Agentic AI and Social Science Research Practice

    I’ve been exploring AI tools of various forms for more than a year now, mostly from a critical perspective of identifying things that they cannot do (such as draw maps), but also so that I can understand how they work. Just recently, inspired by Andrew Hall and Scott Cunningham and Andrew Little, I’ve started exploring Claude Code. And frankly, Claude Code seems like a game-changer for social scientists. 

    I am now convinced that generative AI does represent a seismic shift for the practice of social science research, but perhaps not in the way many initially expected. For years, AI evangelists and corporate boosters have proclaimed that generative AI displays reasoning and intelligence, but after all of this time, it is still not useable for serious research that involves generating novel prose.* But Claude Code really shines in a completely different domain: developing and executing code.

    The Citation Hallucination Problem

    Ask an LLM to provide citations supporting a theoretical argument in political science, and it will produce a mix of real and fabricated sources. The “hallucinated” citations often look entirely plausible, complete with realistic author names, journal titles, and publication years. But they do not exist. Or, if they do exist, they are scrambled up and inapposite for the claim being referenced. 

    This problem is well-understood, but it’s important to state it clearly. The problem is that when LLMs generate text about research, they are producing correct-sounding text based on their training data, not retrieving information from a database or matching a search string to a document. The result is prose that can sound compelling but which is unreliable.

    For social scientists accustomed to standard citation practices and evidence-based arguments, this creates an obvious problem. We can’t simply ask an LLM “What does the literature say about voter turnout?” and trust the response without extensive verification. I know for sure that people are using LLMs right now to generate literature reviews and reference lists. Trust me, I have played around quite a bit with these applications, using the latest and most powerful AI tools with enterprise subscriptions to see how they fare when confronting real research tasks of the form that I encounter on a daily basis. The hallucination problem is real.**

    Creating and Executing Code

    But on the other hand… Upload a CSV file containing survey data to Claude Code and ask it to run a regression analysis and plot the quantities of interest, and it works. Not only does it work, it writes and executes code better and faster than what I can do on my own, producing reproducible and well-documented scripts to clean and analyze data.

    I believe that the difference boils down to the distinction between task completion and information retrieval. When Claude Code writes and executes an analysis script, it is not trying to predict what sounds plausible based on its training data. Instead, it is applying logical rules and programming syntax to manipulate data according to what it understands from your instructions. Programming rules and syntax are easy to learn because they are consistent across its training data. The code it generates follows basic statistical principles, and it can be verified by executing the script. Claude Code also is remarkably good at just automating away things like getting a Stata dataset into R, reading a codebook and cleaning data, and generating a report on what it has done. 

    Ask Claude Code to examine the relationship between education levels and political participation in your dataset, for example, and it will:

    • Load and examine your data structure
    • Clean the data as needed
    • Run regression models with the correct syntax, following whatever model you tell it to follow (and if you don’t tell it, it will guess, usually correctly)
    • Create appropriate visualizations
    • Save publication-ready tables and figures
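
    To make those steps concrete, here is a minimal sketch of the kind of script such a request might produce. It is my own illustration, not output from Claude Code: the column names (education_years, participation) are hypothetical, the data are synthetic, and it uses only the Python standard library with a closed-form bivariate OLS rather than a statistics package.

```python
import csv
import io

# Synthetic toy data, invented for illustration only (not from any real survey).
# Respondent 4 has a missing education value and should be dropped in cleaning.
RAW_CSV = """respondent_id,education_years,participation
1,8,1
2,10,2
3,12,3
4,,2
5,16,5
6,14,4
"""


def load_rows(text):
    """Read CSV text into a list of dicts, one per respondent."""
    return list(csv.DictReader(io.StringIO(text)))


def clean(rows, columns):
    """Keep rows where every listed column is non-empty; convert values to float."""
    cleaned = []
    for row in rows:
        if all(row[c].strip() for c in columns):
            cleaned.append({c: float(row[c]) for c in columns})
    return cleaned


def ols(xs, ys):
    """Closed-form bivariate OLS; returns (intercept, slope)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


rows = clean(load_rows(RAW_CSV), ["education_years", "participation"])
xs = [r["education_years"] for r in rows]
ys = [r["participation"] for r in rows]
intercept, slope = ols(xs, ys)
print(f"n = {len(rows)}, intercept = {intercept:.3f}, slope = {slope:.3f}")
```

    The point of the sketch is that every step is checkable: you can see which rows were dropped and rerun the script to confirm the estimates. What the slope means substantively is a separate question that the script does not answer.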

    The distinction between knowledge retrieval and code execution is the distinction between LLMs and “agentic AI”—AI systems that can perform tasks and execute workflows rather than simply answering questions. I predict that this will lead to a major shift in how social scientists approach empirical research.

    Consider a typical research workflow in the quantitative social sciences: you have a dataset or some batch of data, a research question, and an analysis plan. In 2022, implementing this analysis required some familiarity with statistical software, either the scripting language or a GUI. You had to know how to read your data into the statistical package and how to clean it for use. You also had to know how to represent your analysis plans in that code. But Claude Code makes this all unnecessary. You don’t have to know R at all, or which package runs the ordered logit and how to specify that model; you just need R installed on a computer that is connected to the internet.

    Want to test for heterogeneous treatment effects across demographic subgroups? Claude Code can write that code. Need to visualize a trend over time? Easy—you don’t need to remember how your software handles dates, Claude Code will figure it out. Require multiple robustness checks with different model specifications? Claude can generate dozens of variations in minutes, and organize the results.
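
    As one illustration of the first of these tasks, the simplest check for heterogeneous treatment effects is a difference in means computed separately within each subgroup. The sketch below assumes a binary treatment and made-up subgroup labels ("urban", "rural"); the records are synthetic, and a real analysis would add standard errors and a formal test of the difference between subgroups.

```python
from collections import defaultdict
from statistics import mean

# Synthetic records, invented for illustration: (subgroup, treated, outcome).
RECORDS = [
    ("urban", 1, 6.0), ("urban", 1, 8.0), ("urban", 0, 4.0), ("urban", 0, 5.0),
    ("rural", 1, 5.0), ("rural", 1, 5.0), ("rural", 0, 4.0), ("rural", 0, 5.0),
]


def subgroup_effects(records):
    """Mean outcome of treated minus mean outcome of control, per subgroup.

    Assumes every subgroup has at least one treated and one control unit;
    otherwise statistics.mean raises StatisticsError.
    """
    outcomes = defaultdict(lambda: {0: [], 1: []})
    for group, treated, outcome in records:
        outcomes[group][treated].append(outcome)
    return {
        group: mean(arms[1]) - mean(arms[0])
        for group, arms in outcomes.items()
    }


effects = subgroup_effects(RECORDS)
for group, effect in sorted(effects.items()):
    print(f"{group}: estimated effect = {effect:+.2f}")
```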

    Interpretation is the Frontier

    The frontier between information retrieval and code execution is interpretation. In my experience, Claude Code can infer your intentions and execute your tasks, but if allowed or invited to interpret the results, it goes astray. In several test cases, Claude Code seemed to be guessing what I wanted to hear, interpreting basic regression results incorrectly but provocatively. I conjecture that this is because it is hard for an LLM to learn the rules about how to interpret statistical output, even output which it generated itself, because of the poverty of the stimulus.

    This is dangerous, because Claude Code and other agentic AI tools will interpret your findings, sometimes even if you do not ask them to. And although this is just a hunch, I think that these tools’ interpretations might be a function of how you pose your prompts. These tools are just ripe for p-hacking of the worst form: the computer might generate the result that it “thinks” you want to hear, and if you’ve turned over the execution to the computer, you as the author might not even know what it has done and what it has “chosen” not to do.
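
    One possible safeguard, offered here as a sketch and not as an established practice, is to enumerate every model specification before anything is run and then compare that list against what the agent reports back. The control names and the "reported" list below are hypothetical placeholders.

```python
from itertools import combinations

# Hypothetical optional control variables; the names are placeholders.
OPTIONAL_CONTROLS = ["age", "income", "region"]


def planned_specifications(controls):
    """List every subset of the optional controls, from none to all of them."""
    specs = []
    for size in range(len(controls) + 1):
        specs.extend(combinations(controls, size))
    return specs


def unreported(planned, reported):
    """Return the planned specifications that never appear in the reported results."""
    reported_set = set(reported)
    return [spec for spec in planned if spec not in reported_set]


planned = planned_specifications(OPTIONAL_CONTROLS)

# Suppose (hypothetically) the agent's report only mentions these three models.
reported = [(), ("age",), ("age", "income")]

missing = unreported(planned, reported)
print(f"{len(planned)} specifications planned, {len(missing)} never reported:")
for spec in missing:
    print("  controls:", ", ".join(spec))
```

    A list like this does not prevent selective reporting, but it at least makes visible what the tool has “chosen” not to show you.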

    So What?

    What does this all imply for social science practice? The state of the art is evolving extremely quickly, and I hadn’t even heard of Claude Code until last month. But I suspect that I will use Claude Code (or some other tool) extensively to accelerate data cleaning, visualization, and related tasks. I am very skeptical of agentic AI’s knowledge claims, reasoning, and ability to interpret what it produces. 

    The shorthand is, use agentic AI for tasks that involve following rules. Do not use agentic AI for tasks that generate answers, arguments, or interpretations.

    My greater worries are about how the field responds. Yes, we will be faster at doing computer and data work, but there are basically no guardrails for how else it can be used. Let me be clear: with a dataset and a codebook and nothing else, I can have a research article draft in 10 minutes, which is probably executed correctly as a statistical matter, but likely interpreted incorrectly as a substantive matter.

    People will respond to this dramatic decrease in the cost of producing research articles.  At the Journal of East Asian Studies, I have already seen at least a half-dozen papers written by generative AI. There will be many more. The ones I have seen have basic problems of interpretation that I can identify, and there is a characteristic “tell” for what a Claude Code paper looks like.***  

    The big picture takeaway, though, is that as the costs of doing computer work continue to decline, the relative value of being able to read, interpret, and understand goes up. For social scientists, the relative value of knowledge and expertise is higher than it was five years ago, and the value of being able to code has plummeted.

    NOTES

    * It can try, and it can make things that look like high-quality novel prose. It will fool people, especially those eager to be fooled. That is the nub of the issue.

    ** An example: Reid, Anthony. 2019. “A Plural World of Knowledge: Is Southeast Asian Studies Losing its Autonomy?” Kyoto Review of Southeast Asia 25.

    *** I won’t write it down here, because then it will be fixed.