Category: Teaching

  • When AI Does Statistics For Us, Will We Know Any Better?

    There has been a lot of commentary about new and more powerful AI algorithms like ChatGPT, and how they will change the way we write college essays, job applications, and so forth. I don’t think that enough people are paying attention to the fact that these AI algorithms can also write code.

    I am not a computer programmer or a coding expert. But I have noticed that ChatGPT can answer questions about statistical programming pretty effectively. I can ask it how to work with ggplot2, for example, and it helps me to avoid silly errors. I had the thought recently that it might also be able to help me write code in a programming language that I don’t actually know, like Python.

    Here is an example that I can imagine happening already. I have asked ChatGPT to write code for me. All I know is that I want to do logistic regression in Python using NumPy. I don’t know how to actually do this, but I typed the following into the prompt: Write code in Python to perform a logistic regression with a dataset named “data” and a dependent variable called “outcome” using numpy

    And here is what I got:

    import numpy as np
    from numPy import logistic

    # Split data into train and test sets
    X_train, X_test, y_train, y_test = train_test_split(data, outcome, test_size=0.2, random_state=0)

    # Fit the logistic regression model
    log_reg = logistic.LogisticRegression(C=1e5)
    log_reg.fit(X_train, y_train)

    # Make predictions on the test set
    y_pred = log_reg.predict(X_test)

    # Compute accuracy
    accuracy = log_reg.score(X_test, y_test)
    print('Accuracy: {}'.format(accuracy))

    A couple of remarks are in order. First, this code will work. [EDIT: No, it won’t. It needs more to actually run: NumPy has no logistic module, and train_test_split and LogisticRegression belong to scikit-learn, which the code never imports. So long as you can figure those out, though,] if you need to do logits in Python, for whatever reason, you can just do them now. Second, this code is commented: it explains what each step is doing. (Automatic commenting is a very interesting computer science problem, one that AI is already being applied to.)
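    For comparison, here is a minimal sketch of a version that should run as written, using only NumPy as the prompt asked. It is one possible implementation, not what ChatGPT produced. The synthetic data, the helper names sigmoid and fit_logit, and the learning-rate settings are all assumptions made for illustration, since the post supplies no dataset.

```python
import numpy as np

def sigmoid(z):
    """Logistic function: maps real numbers into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-z))

def fit_logit(X, y, lr=0.5, n_iter=2000):
    """Fit a logit by gradient descent on the mean log-loss.
    X must already contain a column of ones for the intercept."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        grad = X.T @ (sigmoid(X @ beta) - y) / len(y)
        beta -= lr * grad
    return beta

# Synthetic stand-ins for "data" and "outcome" (hypothetical values)
rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
true_beta = np.array([-0.5, 2.0, -1.0])
outcome = (rng.random(n) < sigmoid(X @ true_beta)).astype(float)

# Split data into train and test sets (80/20) by shuffling row indices
idx = rng.permutation(n)
train, test = idx[:800], idx[800:]

# Fit the logistic regression model on the training rows
beta_hat = fit_logit(X[train], outcome[train])

# Make predictions on the test set, classifying at a 0.5 threshold
y_pred = (sigmoid(X[test] @ beta_hat) >= 0.5).astype(float)

# Compute accuracy
accuracy = np.mean(y_pred == outcome[test])
print("Accuracy: {:.3f}".format(accuracy))
```

    Note that even this sketch makes choices nobody asked for: the 80/20 split, the 0.5 classification threshold, and accuracy as the summary of interest.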

    Third and most importantly, though, this is not the only possible response to the prompt. It is a response that smuggles in a lot of implicit decisions, and even assumptions about the data that you have and the use that you imagine for them. For example, not every logit model is fit with predictive accuracy as the objective; often the goal is to estimate and interpret coefficients. This code, though, presumes that prediction is your objective: it holds out a test set and reports only accuracy.
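    To make the contrast concrete, here is a hedged sketch of what a different answer to the same prompt might look like if the objective were inference rather than prediction. It fits the model to all of the data by Newton-Raphson and reports coefficients, standard errors, and odds ratios instead of accuracy. The function name fit_logit_newton and the synthetic data are hypothetical, chosen only for illustration.

```python
import numpy as np

def fit_logit_newton(X, y, n_iter=25, tol=1e-10):
    """Maximum-likelihood logit via Newton-Raphson.
    Returns coefficient estimates and their standard errors."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ beta)))
        info = X.T @ (X * (p * (1 - p))[:, None])    # information matrix
        step = np.linalg.solve(info, X.T @ (y - p))  # Newton step from the score
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    # Standard errors from the inverse information at the final estimate
    p = 1.0 / (1.0 + np.exp(-(X @ beta)))
    info = X.T @ (X * (p * (1 - p))[:, None])
    se = np.sqrt(np.diag(np.linalg.inv(info)))
    return beta, se

# Synthetic data (assumption): intercept -0.5, slope 1.0
rng = np.random.default_rng(1)
n = 2000
X = np.column_stack([np.ones(n), rng.normal(size=n)])
p_true = 1.0 / (1.0 + np.exp(-(X @ np.array([-0.5, 1.0]))))
y = (rng.random(n) < p_true).astype(float)

# No train/test split: every observation informs the estimates
beta_hat, se = fit_logit_newton(X, y)
odds_ratios = np.exp(beta_hat)

for name, b, s, o in zip(["intercept", "x"], beta_hat, se, odds_ratios):
    print("{:>9}: coef={:.3f}  se={:.3f}  odds ratio={:.3f}".format(name, b, s, o))
```

    Same model, same prompt, but a different set of implicit decisions about what the model is for.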

    I came to this question as part of a conversation with some college friends about the future of the humanities, in response to the New Yorker essay that everyone is talking about. The idea is that people want marketable skills from college. But as this crowd of friends includes both professors like me and computer scientists with decades of professional coding experience, there was a deeper conversation about what sorts of marketable skills will still be marketable over a timespan of more than the next five years or so. What happens to coding-focused majors when computers can do lots of the coding themselves?*

    I’ll conclude with a reflection. On my one serious visit to Silicon Valley, I spent the day mostly drinking free club sodas and flavored kombuchas at a FAAMG headquarters and just talking to people. That was a special weekend for a lot of reasons. But what I remember most from those conversations were the hints that “the singularity” was coming: for them, that was the coding invention that put coders out of business. They used this to explain why their children were getting violin lessons and tutoring in French, which I thought was precious at the time because it reflected a level of privilege and possibility that seemed entirely out of reach for anyone who wasn’t in that part of our new tech ecosystem. Maybe they were right, though, and maybe all of us will need to wrestle with these implications.

    NOTE

    * I would be remiss if I didn’t acknowledge that in some ways, this is just the latest “get off my lawn” complaint about how technology is replacing understanding by automating what used to be done manually. I probably would know more about statistics if I had to use punchcards and code up an optimizer rather than just typing logit y x into Stata.

    I will happily concede this. But fast computers did put most people whose careers depended on punchcards out of business, so the analogy holds.

  • Teaching Innovations in COVID Times: Intro to Stats, Flipped

    COVID has all of us doing things that we’re not used to doing—like not leaving the house for two weeks in a row, or holding meetings in your daughter’s bedroom. In this way, it’s encouraging all of us to innovate. In my case, this means a new PhD-level course, Introduction to Probability and Statistics, and a new way of teaching through a flipped classroom model.

    I’ve never taught stats before, and I’ve never taught using an asynchronous flipped classroom model, so this will be new all around. But I’ve profited from lots of discussion of how to teach statistics, how to make flipped classroom experiences work, and from thinking about my own experience taking Introduction to Statistics with Tasos Kalandrakis back in fall 2001, another unusual time.* I’ve especially learned from Gary King’s GOV 2001, which has been teaching some of the same material to the same sorts of students using a similar flipped classroom format for some time now.** It is entirely possible that a flipped classroom model is the best way to teach introductory probability and statistics.

    For those curious, here is the syllabus (PDF).

    I also feel it necessary to acknowledge all of the materials that were instrumental in helping me prepare.

    Acknowledging these online notes, part of me says “just go take these courses instead” but maybe my own remix will be fun too. It does include slides such as this, for example.

    But more seriously, what makes my course special, I think, are three things:

    • Assuming only basic mathematical background: nothing beyond basic algebra. This is serious: anyone who can do junior high algebra can complete this course.
    • Teaching R and Stata at the same time: no pen and paper homework, all problem sets done exclusively via the computer. (I also assume no computer science, scripting, or programming background.)
    • An emphasis on developing intuitions via simulation, as a complement to analytical results (which cannot involve anything more advanced than basic algebra, of course).
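    As a rough illustration of what intuition via simulation can look like, here is a small sketch that compares a simulated standard error of a sample mean with the analytical result, which needs nothing beyond basic algebra. It is a hypothetical example, not taken from the course materials, and it is written in Python rather than the R and Stata the course uses. The die-rolling setup, the sample size, and the number of replications are all assumptions.

```python
import math
import random
import statistics

random.seed(2020)

n = 30       # rolls of a fair six-sided die per sample
reps = 5000  # number of simulated samples

# Simulation: draw many samples and record each sample's mean
sample_means = [
    sum(random.randint(1, 6) for _ in range(n)) / n
    for _ in range(reps)
]
simulated_se = statistics.stdev(sample_means)

# Analytical result: a fair die has variance 35/12,
# so the standard error of the mean is sqrt((35/12) / n)
analytic_se = math.sqrt((35 / 12) / n)

print("Simulated SE:  {:.4f}".format(simulated_se))
print("Analytical SE: {:.4f}".format(analytic_se))
```

    The two numbers should land close together, which is the point: the formula describes what happens when you actually repeat the sampling.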

    The objective here is fundamentals for everyone. We sacrifice some more advanced concepts and results in favor of intuitions and understanding the basics at what I like to call the “no-bullshit” level. If this works out how I hope, then students with no background or particular inclination towards probability or statistics will be able to understand how this stuff works, consume the quantitative social science research that they encounter, and be ready for those more advanced courses out there.***

    NOTES

    * As is typical, I did not appreciate at the time just how good that class was, and how hard it must have been to teach.
    ** You can also watch all of his lectures here.
    *** They may also appreciate some dad jokes.