People thinking without speaking, part two
A response to critiques of my argument that "scale is not all we need" with LLMs
There are few things more satisfying in life than having one’s ideas taken seriously, at least for me. Two weeks ago, I published an essay about the limits of scaling large-language models that appears to have struck a chord – traffic to this Substack spiked and the essay spurred debate across a variety of platforms. I am grateful to everyone who’s engaged with the ideas I’ve advanced, and so now I want to provide some follow-up commentary in response to the thoughtful pushback I’ve received. (I welcome pushback!)
I’ll start by clarifying what I am arguing, proceed to a brief error correction, and then address the two main objections that have been raised in response to my claim, at least as I see them.
What I am arguing
Because language is not essential to thought, merely “scaling up” large-language models by adding more data and computing power will be insufficient to develop anything that resembles human-level intelligence. My argument is intended as an explicit response to the claim of many AI enthusiasts – including Sam Altman, Bill Gates, and most recently Leopold Aschenbrenner – who believe “artificial general intelligence” will be achieved based on extrapolating from trendlines in LLM capabilities.
In a bit of fortuitous timing, Arvind Narayanan and Sayash Kapoor at Princeton just published a great essay on this topic that I recommend highly. Their argument dovetails with my own, though they offer different reasons for being skeptical of scaling leading to AGI – among other things, they observe that building bigger and more powerful models may not be economically viable, and already companies are ratcheting down their rhetoric.
But leaving aside the business aspect, my argument is that science strongly suggests this approach will not be fruitful. Leaning on the recent Nature article, I’m arguing that if many of the cognitive capabilities that we associate with general intelligence – including reasoning, abstraction, inference, and generalization – do not depend upon language, then it is highly unlikely that such capabilities will emerge artificially simply by improving a tool that models language. Scaling is not all you need.
Error correction
I made (at least) one factual mistake in my essay when I claimed that the Nature article made no reference to AI – in fact, there is a brief mention of it in a sidebar, noting that the cognitive relationship between linguistic representations and mental computations is an “open question” that large-language models should help us better understand.
More to the point, in subsequent correspondence with Fedorenko, she pointed me to this recent article she co-authored that explores the implications arising from the distinction between language and thought in the context of AI (or machine learning, in their terminology). Key quote: “To those who are looking to improve the state of machine learning systems, we suggest that, instead of, or in addition to, continuously scaling up the models, more promising solutions will come in the form of modular architectures—built-in or emergent—that, like the human brain, integrate language processing with additional systems that carry out perception, reasoning, and action.”
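To make that last idea a little more concrete, here is a minimal toy sketch of what “modular” could mean in code. This is my own illustration, not anything proposed by Fedorenko and her co-authors, and the module names (PerceptionModule, ReasoningModule, LanguageModule) are hypothetical stand-ins. The point is only that language appears at the interface, while perception and planning operate over non-linguistic representations.

```python
# Toy sketch of a "modular" agent: the language component is just one module
# among several, rather than the whole mind. Purely illustrative.

from dataclasses import dataclass


@dataclass
class WorldState:
    """Non-linguistic representation produced by a perception module."""
    objects: list[str]
    goal: str


class PerceptionModule:
    def observe(self, raw_input: str) -> WorldState:
        # Stand-in for vision/audio processing; here we simply fabricate a state.
        return WorldState(objects=["cup", "table"], goal="place cup on table")


class ReasoningModule:
    def plan(self, state: WorldState) -> list[str]:
        # Stand-in for a planner or world model that works over structured
        # state, not over text.
        return [f"grasp {state.objects[0]}", f"move to {state.objects[1]}", "release"]


class LanguageModule:
    def verbalize(self, plan: list[str]) -> str:
        # The only place language enters: turning the plan into words.
        return "I will " + ", then ".join(plan) + "."


def run_agent(raw_input: str) -> str:
    state = PerceptionModule().observe(raw_input)
    plan = ReasoningModule().plan(state)
    return LanguageModule().verbalize(plan)


if __name__ == "__main__":
    print(run_agent("camera frame showing a cup next to a table"))
```

In a pure scaling approach, by contrast, all three of those jobs are pushed through a single text-prediction engine, which is exactly the design choice the quoted passage argues against.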
Ok, now on to the counterclaims.
Counterclaim #1: The science I’ve cited is wrong, so my conclusion does not follow
The first counterclaim to my argument is not about artificial intelligence per se, but rather a rejection of my foundational premise: that the science summarized in the Nature article accurately reflects what we know about cognition today. Here, comments ranged from invocations of the Sapir-Whorf hypothesis to skepticism about using neuroscientific evidence to make strong claims about cognition, given the cultural influences on how we think.
Science is always an ongoing process. We don’t do arguments from authority around here, but when three scientists from MIT, Harvard, and Berkeley publish an extensive review of their field in the world’s leading scientific journal, they are saying plainly, “this is the scientific consensus as the evidence sits today.” I have no doubt there will be pushback from other scientists, and I’ll be reading their commentary closely, but it’s hard to get around the empirical fact that when language is taken away from humans, we can still think and reason.
As to using evidence from neuroscience to make claims about cognition, I don’t see much of a tension myself. Social cognition, theory of mind, emotional processing, cultural knowledge – all of these are vital to human cognition, and nothing in the Nature article suggests otherwise. If anything, the authors underscore that language is what enables human civilization, and that our languages share common features so we can communicate with one another more easily. The article embraces cultural cognition; it is not limited to neuroscientific evidence.
Counterclaim #2: I’m wrong to assume large-language models need to emulate human thought to achieve general intelligence
The second objection to my argument is a really interesting one, both philosophically and practically. In a comment on my post, Sreecharan Sankaranarayanan asks, “Why do large-language models need to think ‘like humans think’? They can achieve the same outcomes through different means.” Echoing this, my friend and long-time collaborator Dan Willingham asks: “Doesn’t [Ben’s] analysis presuppose that the only way to get to human-like intelligence is to build a model that thinks as humans do? Given that marquee successes of AI within a domain (chess, trivia) don’t solve their tasks as humans do, isn’t it possible that the same will be true of generalized intelligence?”
I grant that it’s possible. To reiterate, I am not arguing that AGI is an impossible goal, at least according to cognitive science. My contention is simply that additional systems or capabilities will need to be developed if that goal is to be reached. In fact, and as I allude to in my essay, many of the sharpest minds in AI – including Turing Award winners Yann LeCun and Yoshua Bengio, François Chollet at Google DeepMind, Melanie Mitchell at the Santa Fe Institute, Subbarao Kambhampati at ASU, and Gary Marcus – are working toward this in various ways. They all seem to think AGI is achievable…but none of them sees the scaling up of LLMs as the way to get there. Me neither.
Frankly, I think it would (or will?) be profoundly weird if scaling up LLMs leads to some form of intelligence that feels equivalent to our own. Yes, there’s a tremendous amount of knowledge encoded in our written texts, but there’s so much we know that is not easily reduced to language – and if you doubt this, think about how you learned to ride a bike. Further, the complex social-cultural components of human cognition are not ones LLMs are well suited to emulate.
I will invoke again the Good Will Hunting metaphor as the right mental model for LLMs – wicked smart when it comes to book learnin’, but oh so ignorant when it comes to the vast range of meaningful experience.