Despite their apparent intelligence, conversational artificial intelligences often lack logic. The debate rages on: do they reason, or do they recite snatches of text memorized from the Internet?
Note
This post was originally published in French as part of my science column in Les Echos. I have updated it with new references.
Conversational AIs, or large language models, are sometimes seen as the gateway to artificial general intelligence. ChatGPT, for example, can answer questions from the International Mathematical Olympiad. And yet, on other, seemingly much simpler questions, ChatGPT makes surprising mistakes. What is it about the intelligence of conversational AI that lets it solve some problems and not others?
Thomas McCoy and co-authors conjecture that this comes from the autoregressive objective on which these AIs are built: technically, they are trained to complete texts found on the Internet. If an AI is very good at computing (9/5)x + 32 but not (7/5)x + 31, it is because the first formula is the conversion from degrees Celsius to Fahrenheit, a very frequent conversion on the Internet, while the second does not correspond to any particular formula. Conversational AIs would thus mainly be good at reproducing what they have already seen. Indeed, numerous studies have shown that they have a marked tendency to reproduce snippets of known text. So, if an AI can solve problems from the International Mathematical Olympiad, is it simply because it has memorized the answer?
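To make the point concrete, here is a minimal sketch (my own illustration, not taken from McCoy et al.; the function names are invented). For an ordinary program, the two formulas are exactly as easy to compute, so any gap in an LLM's accuracy between them cannot come from computational difficulty, only from how often each formula appears in the training text.

```python
# Illustrative sketch: both linear formulas cost a program the same effort.
def common_formula(x: float) -> float:
    """Celsius-to-Fahrenheit conversion, ubiquitous on the Internet."""
    return (9 / 5) * x + 32

def rare_formula(x: float) -> float:
    """A linear formula of the same shape that matches no familiar conversion."""
    return (7 / 5) * x + 31

for x in [0, 20, 37, 100]:
    print(f"x={x:>3}  (9/5)x+32={common_formula(x):6.1f}  (7/5)x+31={rare_formula(x):6.1f}")
```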
Something new?
In terms of intelligence, inventing a new mathematical proof requires mastering abstractions and being able to chain together complicated logical reasoning between an imposed starting point and an imposed conclusion. This seems much harder than memorizing a proof. It is one of the traditional oppositions in machine learning, the line of research that gave rise to today's AIs: memorizing is one thing, knowing how to generalize is another. For example, if I memorize all the additions of two numbers smaller than ten, I cannot extrapolate beyond them. To go further, I need to master the logic of addition… or memorize more.
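To make the contrast concrete, here is a minimal sketch (purely illustrative, with names of my own invention): a memorizer that has stored every sum of two numbers below ten, and a generalizer that applies the schoolbook digit-by-digit algorithm and therefore extrapolates to numbers of any size.

```python
# "Memorizer": a lookup table of every sum of two numbers below ten.
memorized_sums = {(a, b): a + b for a in range(10) for b in range(10)}

def add_by_memory(a, b):
    # Returns None as soon as the question falls outside what was memorized.
    return memorized_sums.get((a, b))

# "Generalizer": the schoolbook digit-by-digit algorithm, which encodes the
# logic of addition and so works for numbers never seen before.
def add_by_rule(a, b):
    digits_a = [int(d) for d in str(a)][::-1]
    digits_b = [int(d) for d in str(b)][::-1]
    result, carry = [], 0
    for i in range(max(len(digits_a), len(digits_b))):
        s = carry + (digits_a[i] if i < len(digits_a) else 0) \
                  + (digits_b[i] if i < len(digits_b) else 0)
        result.append(s % 10)
        carry = s // 10
    if carry:
        result.append(carry)
    return int("".join(str(d) for d in result[::-1]))

print(add_by_memory(3, 4))    # 7: inside the memorized table
print(add_by_memory(12, 45))  # None: memorization alone does not extrapolate
print(add_by_rule(12, 45))    # 57: the rule generalizes
```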
Conversational AIs, as it happens, have an enormous capacity for memorization, and they have been trained on almost the entire Internet. Given a question, they can often dip into their memory to find an answer. So, are they intelligent, or do they just have a great memory? Scientists are still debating how much of their ability rests on memory. Some argue that their storage capacity is ultimately limited by the size of the Internet. Others ask to what extent the impressive successes being highlighted are not tasks already solved on the Internet, questioning the models' ability to do anything new.
But could memorization itself be an aspect of intelligence? In 1987, Lenat and Feigenbaum conjectured that, for a cognitive agent, accumulating knowledge makes it possible to solve new tasks with less learning. Perhaps the intelligence of conversational AI lies in knowing how to pick out the right bits of information and combine them.
Related academic work:
Embers of autoregression show how large language models are shaped by the problem they are trained to solve, R. Thomas McCoy, Shunyu Yao, Dan Friedman, Mathew D. Hardy, and Thomas L. Griffiths, PNAS 2024 (ArXiv)
Princeton researchers show that the properties of large language models (LLMs) are governed by the data they are trained on, including their arithmetic abilities.
GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models, Iman Mirzadeh, Keivan Alizadeh, Hooman Shahrokhi, Oncel Tuzel, Samy Bengio, and Mehrdad Farajtabar, ArXiv 2024
Apple researchers show that LLMs solve mathematical challenges via probabilistic pattern matching on previously seen examples, rather than logical reasoning.