When you converse with the latest chatbots, it’s easy to feel like they get you. Their deft responses often give an undeniable impression that they’re aware not only of what you say, but of what you think — what your words imply about your mental state.
Theory of Mind
Among psychologists, there’s a term for that: theory of mind. This hallmark of social intelligence allows us to infer the inner reality of another person’s mind based on their speech and behavior, as well as our own knowledge of human nature. It’s the intuitive logic that tells you Ding Liren felt elated, not melancholic, after winning the World Chess Championship this month. It’s also an essential ingredient for moral judgment and self-consciousness.
In February, Stanford psychologist Michal Kosinski made the stunning claim that theory of mind had emerged spontaneously in recent generations of large language models like ChatGPT, neural networks that have been trained on enormous amounts of text until they can generate convincingly human sentences.
“If it were true,” says Tomer Ullman, a cognitive scientist at Harvard, “it would be a watershed moment.” But in the months since, Ullman and other AI researchers say they’ve confounded those same language models with questions a child could answer, revealing how quickly their understanding crumbles.
AI and Theory of Mind
Kosinski subjected various language models to a set of psychological tests designed to gauge a person’s ability to attribute false beliefs to other people. The Sally-Anne scenario, first used in 1985 to measure theory of mind in autistic children, is a classic example: One girl, Sally, hides a marble in a basket, and leaves the room; another girl, Anne, then moves the marble to a box. Where will Sally look for the marble?
Almost anyone past early childhood recognizes that Sally’s model of reality is now amiss: she expects to find the marble where she left it, not where we omniscient observers know it to be.
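For readers who think in code, the logic of the Sally-Anne test can be sketched in a few lines of Python. This is only an illustration of the bookkeeping involved, under the simplifying assumption that a character's belief about an object updates only when she is in the room to see it move. The names `Agent`, `Room` and `sally_anne` are invented for this sketch and do not come from Kosinski's or anyone else's study.

```python
from dataclasses import dataclass, field


@dataclass
class Agent:
    name: str
    present: bool = True
    # Maps each object to the place this agent believes it is.
    beliefs: dict = field(default_factory=dict)


class Room:
    def __init__(self, agents):
        self.agents = {agent.name: agent for agent in agents}
        self.locations = {}  # Maps each object to its true location.

    def move(self, obj, place):
        """Move an object; only agents who are present see it happen."""
        self.locations[obj] = place
        for agent in self.agents.values():
            if agent.present:
                agent.beliefs[obj] = place

    def leave(self, name):
        self.agents[name].present = False

    def enter(self, name):
        # Walking back in does not update beliefs (assumption: the
        # object is hidden from view inside its container).
        self.agents[name].present = True

    def where_will_look(self, name, obj):
        """An agent searches where she believes the object is."""
        return self.agents[name].beliefs.get(obj)


def sally_anne():
    room = Room([Agent("Sally"), Agent("Anne")])
    room.move("marble", "basket")  # Sally hides the marble.
    room.leave("Sally")            # Sally leaves the room.
    room.move("marble", "box")     # Anne moves it while Sally is away.
    room.enter("Sally")
    believed = room.where_will_look("Sally", "marble")
    actual = room.locations["marble"]
    return believed, actual
```

Calling `sally_anne()` returns Sally's belief alongside the truth, and the two differ. The gap between them is the false belief that the test asks a subject to track.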
Machines, on the other hand, have historically performed poorly on these tasks. But Kosinski found that, when confronted with 40 unique Sally-Anne scenarios, GPT-3.5 (which powers ChatGPT) accurately predicted false beliefs 9 times out of 10, on par with a 7-year-old child. GPT-4, released in March, did even better.
That seemed like compelling evidence that language models have attained theory of mind, an exciting prospect as they become increasingly entwined in our lives. “The ability to impute the mental state of others would greatly improve AI’s ability to interact and communicate with humans (and each other),” Kosinski writes.
Why AI Language Models Are Easily Tricked
Since his announcement, however, similar trials have yielded less dramatic results. Ullman presented language models with the same suite of tasks, this time adding slight adjustments, or “perturbations.” Such tweaks shouldn’t faze an entity with genuine theory of mind, yet they left even the strongest AI models disoriented.
Imagine someone, let’s say Claire, looking at a bag. She can’t see into it, and although it’s full of popcorn, the label, which she can see, says “chocolate.” Not that the label makes a difference — Claire can’t read. It could be a sack of pinecones for all she knows. Nevertheless, GPT-3.5 declared that she “is delighted to have found this bag. She loves eating chocolate.”
Maarten Sap, a computer scientist at Carnegie Mellon University, quizzed language models on more than 1,300 questions regarding story characters’ mental states. Even GPT-4, thrown off by chaotic but comprehensible details, achieved just 60 percent accuracy.
“They’re really easily tricked into using all the context,” Sap says, “and not discriminating which parts are relevant.”
In his view, bigger is not necessarily better. Scaling up a language model’s training data can produce remarkable behavior, but he doubts that doing so will endow the model with theory of mind; the nature of the data is crucial. This challenge may require a shift away from the standard web-scraping approach, “where everything is just neural soup,” to a regimen of deliberately crafted text with a heavy dose of dialogue and interaction between characters.
Are Humans Born Mind Readers?
Questions regarding theory of mind in machines reflect broader uncertainty about theory of mind in general. Psychologists disagree on the extent to which children gain this ability through growing familiarity with language (as words like “know” and “believe” clue them in to the mental states of other people) versus non-linguistic experience and innate, evolved mechanisms.
Language models are obviously more limited. “They don’t have representations of the world, they don’t have embodiment,” Sap notes. “These models are kind of just taking whatever we give them and using spurious correlations to generate an output.” If they manage to acquire theory of mind, it must be through exposure to language alone.
They have done precisely that, in Kosinski’s estimation, but he poses a second possibility: The models simply leverage linguistic patterns, so subtle that we don’t consciously register them, to appear as if they understand. And if that allows them to pass theory-of-mind benchmarks (setting aside the fact that some experiments suggest they actually fall quite short, at least for now), who’s to say we don’t operate the same way, without utilizing true theory of mind?
In that case, we would be mere biological language processors, lacking meaningful intimacy with the inner worlds of our fellow humans. But Ullman sees a way out of this dilemma: When we reason about what’s going on inside someone’s brain, we factor in not just linguistic input, but also our deeply ingrained knowledge of how those brains work.
A team of cognitive scientists from the University of California, San Diego, made a similar point in an October report on a false-belief experiment. The language model GPT-3 (then state-of-the-art) lagged well behind live participants, they write, “despite being exposed to more language than a human would in a lifetime.” In other words, theory of mind probably springs from multiple sources.
What Is AI Truly Capable Of?
Zooming out further, theory of mind is just one front in a vigorous debate over AI capabilities. Last year, one survey revealed a near-perfect divide among researchers on whether language models could ever understand language “in some non-trivial sense”: of roughly 500 researchers, 51 percent believed they could, and 49 percent believed they could not.
Suppose the skeptics are correct, and the uncanny sensation of standing face-to-screen with ChatGPT is only naive anthropomorphism. If that’s so, it may seem incredible that well-informed experts could fall for algorithmic sleight of hand. Then again, it doesn’t take much sophistication to fool creatures with a propensity for finding faces in toast.
Consider ELIZA, an early chatbot created in the 1960s by MIT computer scientist Joseph Weizenbaum. Designed to simulate Rogerian therapy, it did little more than repeat the patient’s words, with a few thought-provoking prompts. That program looks like a dimwitted parrot beside the polished replies of today’s language models, yet many people were convinced it truly understood them.
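To get a feel for how little machinery that kind of echoing requires, here is a minimal Python sketch in the spirit of ELIZA. It is not Weizenbaum's original script: the patterns in `RULES`, the pronoun swaps in `REFLECTIONS` and the fallback line are all invented for illustration, and the real program's rules were more elaborate.

```python
import re

# Hypothetical pronoun swaps, so the patient's words can be turned back on them.
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "I", "your": "my",
}

# Hypothetical pattern/template pairs; the first matching rule wins.
RULES = [
    (re.compile(r"\bi feel (.*)", re.IGNORECASE), "Why do you feel {0}?"),
    (re.compile(r"\bi am (.*)", re.IGNORECASE), "How long have you been {0}?"),
    (re.compile(r"\bmy (.*)", re.IGNORECASE), "Tell me more about your {0}."),
]


def reflect(fragment):
    """Swap first- and second-person words in a lowercase copy of the text."""
    words = fragment.lower().split()
    return " ".join(REFLECTIONS.get(word, word) for word in words)


def respond(utterance):
    """Echo part of the utterance inside a canned prompt, or fall back."""
    text = utterance.strip().rstrip(".!?")
    for pattern, template in RULES:
        match = pattern.search(text)
        if match:
            return template.format(reflect(match.group(1)))
    return "Please go on."
```

Given “I feel ignored by my boss.”, `respond` answers “Why do you feel ignored by your boss?” Nothing in the code represents a boss, a feeling or a patient; it only rearranges the words it was handed.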
As Ullman put it, “to things that look like agents, we tend to attribute theory of mind.” Nothing he’s seen so far persuades him that current generations of GPT are the real thing. But as the AI community continues to probe the opaque workings of ever more powerful models, he remains optimistic. “I subscribe to the basic idea that the mind is somewhat like a computer,” he says, “and that if we don’t die in the climate wars, we will eventually be able to replicate that.”