In defense of stochastic parrots
Large-language models are useful and that's the problem
Here is an overly simplistic yet defensible story of how large-language models do what they do:
Training phase
A model is built by feeding it strings of text. The back ends of these strings are hidden from the model. For example, it might be fed “Four score and ______,” with the word “seven” deleted from what it can perceive.
The model makes a guess as to what word comes next. If this is time zero in training, meaning, if this is literally the first piece of text the model has ever been fed, its guess will be completely random.
Every time the model makes a wrong guess, it makes a note in its statistical word ledger, a little database that it manages. And then it keeps guessing.
When it finally correctly guesses that the next word should be “seven,” the word that’s been hidden from it, it makes another note in the ledger.
Now it’s fed the next piece of text in its training. For example, “Four score and seven ____”, with the word “years” deleted from its view. It repeats the same process it used before, but this time, its guess will be almost but not completely random, because of the tiny adjustments it made to its statistical ledger a moment earlier. The model will keep guessing until it finally hits on “years.”
Repeat this process for every piece of text that has ever been digitally encoded in all of human history. The model keeps adjusting its ledger until the humans building and testing it decide the output is generally acceptable. The model has been trained. [See footnote if you’re ginning up to debate this point in the comments. I urge you to chill.]1
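The ledger idea can be sketched in a few lines of Python. To be clear, this is a toy illustration of the story above, not how real LLMs actually work: actual models adjust billions of neural-network weights rather than counting word pairs. But the counting version captures the spirit of “make a note every time you see what follows what”:

```python
from collections import defaultdict

def train(corpus):
    """Build a toy 'statistical word ledger': a count of which word
    followed which in the training text."""
    ledger = defaultdict(lambda: defaultdict(int))
    words = corpus.split()
    for current, nxt in zip(words, words[1:]):
        ledger[current][nxt] += 1  # make a note in the ledger
    return ledger

corpus = "four score and seven years ago four score and seven years"
ledger = train(corpus)
print(dict(ledger["and"]))  # in this tiny corpus, "seven" followed "and" twice
```

Feed it more text and the counts keep adjusting; that, in cartoon form, is the training phase.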
Commercial deployment
You, the human user of the large-language model that we just trained, sit down and feed some text to it. This is your prompt.
This text may be identical to text the model has been fed before. For example, you might type, “Four score and seven ___.”
Here’s where it gets a little weird. Despite how it may appear, there is no single objective response the model can provide to this inquiry. It might plausibly respond with text that corresponds with what we humans know to be the continuation of the Gettysburg Address, such as “years ago.” But it might produce the entire speech, or perhaps the number 87; that would make sense too. And who knows, it might even add an unexpected bonus, such as “Lincoln had bars” plus a little hat emoji and US in smallcaps.
In other words, whatever text the model produces in response to your human input is a statistical guess as to what should come next, but what should come next is not objective. It’s a guess, but an informed and probabilistic one.
You, the human user, can also feed the model completely novel text that is unlikely to sync up with any specific text string it was fed during training. For example, you might type, “What is a splurge minus a gibbons?”
You’d think this splurge-gibbons prompt would be more problematic for the model, but no, the model is completely indifferent. That’s because it still does what it always does, which is take the text you’ve fed into it, consult its statistical word ledger, and then—this remains key—make a probabilistic guess as to what text should follow.
Once the model has been commercially deployed in this fashion, there is no objective measure that can be used to universally adjudicate the quality of its output. None.
Instead, as a practical matter, what’s important is simply this: do you, the human user, find the model’s responses useful in some way?
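Extending the toy sketch to the deployment phase: once “trained,” the model answers any prompt the same way, familiar or nonsensical, by sampling from its ledger. The fallback for unseen words below is my own invention for illustration; real models never hit a hard “unseen” case because they operate on learned representations, not lookup tables.

```python
import random
from collections import defaultdict

def train(corpus):
    """Toy 'statistical word ledger': counts of next-word occurrences."""
    ledger = defaultdict(lambda: defaultdict(int))
    words = corpus.split()
    for cur, nxt in zip(words, words[1:]):
        ledger[cur][nxt] += 1
    return ledger

def guess_next(ledger, word):
    """Probabilistic guess: sample the next word in proportion to how
    often it followed `word` in training. There is no single objectively
    correct answer, only more and less likely ones."""
    candidates = ledger.get(word)
    if not candidates:
        # Novel input ("splurge"): the toy model is indifferent and
        # falls back to guessing among all words it has ever seen.
        candidates = {w: 1 for w in ledger}
    choices, weights = zip(*candidates.items())
    return random.choices(choices, weights=weights)[0]

ledger = train("four score and seven years ago our fathers brought forth")
print(guess_next(ledger, "seven"))    # "years" is the only recorded successor
print(guess_next(ledger, "splurge"))  # some word; the model does not blink
```

Notice that nothing in `guess_next` checks whether its answer is true, sensible, or useful. That judgment is left entirely to you, the human.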
Got all that? Sound vaguely familiar? Good, hold that thought, let’s pivot to some etymology.
The word stochastic originates in Ancient Greek, στόχος (stókhos), meaning “aim, or guess.” In its initial usage, stochastic was an adjective describing a conjecture of some sort, but it has since evolved to describe something random in the particular yet quasi-predictable in the aggregate. A statistical term. Imagine a starting poker hand in Texas hold ‘em: whether you’re dealt pocket aces or 67 (six seven) is random for any one hand, but run repeated trials of the same experiment—that is, keep dealing the cards—and you will arrive at a definable distribution. You’ll know the general probability, in other words.
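The poker example can be simulated directly. Any single deal is unpredictable, but across many deals the frequency of, say, a pocket pair settles near its true probability of 3/51, about 5.9 percent: random in the particular, quasi-predictable in the aggregate. A quick sketch, using a deck reduced to ranks only:

```python
import random

RANKS = list(range(13)) * 4  # 52-card deck as ranks: 13 ranks, 4 of each

def is_pocket_pair():
    """Deal two cards without replacement; True if they share a rank."""
    a, b = random.sample(RANKS, 2)
    return a == b

# One hand is anyone's guess, but the aggregate converges toward 3/51.
trials = 100_000
rate = sum(is_pocket_pair() for _ in range(trials)) / trials
print(f"pocket-pair rate over {trials} deals ≈ {rate:.3f}")
```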
As a word, stochastic really took off in the 1950s, right when computers and cognitive science were beginning their Renaissance—not a coincidence. Apparently music also played a role in its uptick in usage, for reasons I don’t understand. In any event, here is the data, and I’d bet cash money that use of the word stochastic has rocketed up more recently:
Now it’s time for one final pivot, to ornithology—specifically, parrots!
You know what a parrot is, you know what it means to “parrot something,” I trust I need not elaborate. But I’ll now share with you the strange fact that the only pet I’ve owned as an adult was a Senegal parrot named Chiku, Swahili for “talks a lot.” In reality, Chiku did not talk a lot—the only words she (?) ever mimicked were “Chiku bird”—but she sure did shriek, every morning at the crack of dawn. That, combined with Chiku’s habit of biting my then-girlfriend in the face, truncated our time together in the mid-2000s, and I’d like to think she is still living happily with the guy in San Jose who purchased her (Chiku that is, not my ex-girlfriend).
Despite that ordeal, I still think parrots are incredible creatures—the African Grey is my favorite—but I hope we can all agree that while they are amazingly capable of learning to mimic human language, parrots obviously do not understand what words mean the way that we humans do. [See footnote if you are contemplating mounting an impassioned argument that birds are smarter than us—they are not.]2

Stochastic parrot. Arguably the most famous metaphor used to describe what a large-language model is, despite so much metaphor-versus-metaphor competition that there’s a push to create an AI metaphor observatory (please hire me). The term comes from this famous paper published in 2021 and co-authored by Emily M. Bender, Timnit Gebru, Angelina McMillan-Major, and confusingly, Shmargaret Shmitchell. Among other things, the article resulted in Gebru being fired from her job co-leading Google’s AI ethics team, and years later Sam Altman declared via tweenage tweet, “i am a stochastic parrot and so r u.”

I’m not going to get into any of that. Instead, I’ll now try to tie this essay together with a simple question that vexes so many: Are large-language models stochastic parrots? Is this a useful metaphor for describing how they do what they do?
Yes! The answer is yes, and hopefully you already see why. Stochastic is a near-perfect description of the statistical approach that’s used both to train and deploy LLMs. And, because there is no conscious living creature nestled inside the server racks these models need in order to function, we can describe them as, yes, parroting the text they produce.
They simply squawk.
Now for some controversy: Describing LLMs as stochastic parrots does not necessarily mean expressing disapproval of them. I think many people get tripped up here because they know that the originators of the metaphor, all of them prominent AI skeptics, meant it that way. Fine, agreed. But the cool thing about being sentient humans is that we can independently think about whether the metaphor is apt as a description of what’s happening under the hood of an LLM (yet another metaphor). Again, yes! It’s great, as close to perfect as we might find in two words for describing how these weird new things operate. It’s math, not magic.
Even more controversy: Stochastic parrots can be useful. Please recur to point #14 above. Again, we as sentient humans with agency get to decide whether the output of LLMs is useful or not. Clearly LLMs are useful to many millions of people! I say this with love to all my fellow AI skeptics out there: if you take the position that these tools are incapable of ever producing useful output, you’re not only wrong, you are implicitly suggesting that you are in a better position to decide what others find useful than they are themselves.3 As democracy continues to crumble worldwide, I hope you at least reflect on this, and on whether there are some dots to connect.
Controversy on top of controversy on top of controversy: Just because something is useful, that doesn’t mean it should be used. At first blush, it may seem like I’m contradicting what I just wrote in the prior paragraph, but I’m not, I swear to f’ing deity-of-your-choice that I’m not.
To demonstrate this point, allow me a digressive analogy: guns. I’ll never forget the time I was standing in line in a hunting and fishing shop here in Texas, buying a new fly rod, while the man ahead of me—who looked straight out of central casting for Mad Max: Fury Road—purchased the single biggest handgun humanity has ever produced. “I wonder who he plans to kill with that,” I thought quietly to myself. And then he told the person behind the register that six rattlesnakes had entered his house the previous weekend, and he was acquiring this new tool to solve that problem.
For Immortan Joe-Texas, that gun was useful! Justifiably needed! I can acknowledge that and argue nonetheless that we should completely abolish firearms in the United States, for reasons too many to count. But again, that involves—nay, requires—a democratic process to resolve. There’s no one right answer, both snakes and mass school shootings exist in America. We must decide what to do about them. This is the premise of the social compact, or so we hope.
The paradox hovering around large-language models, of course, is that many of their use-cases demonstrate the very reason why they should not be used. Take, oh I don’t know, how about…education! There is zero question that students find LLMs useful—half of all ChatGPT users are students, OpenAI absurdly boasts—for the obvious reason that these tools spare students cognitive effort they’d rather not exert. We thus know, or at least all of us who understand the basic enterprise of education to be building knowledge through effortful thinking should know, that these tools pose extraordinary hazards and generate myriad harms towards that end. Many students even sense this. Yet, for reasons that continue to infuriate me, many educators are rushing to embrace the use of AI despite the obvious damage it does to actual learning. Indeed, over the weekend I had an ostensible educator and self-professed “AI evangelist” crash my mentions on BlueSky to boast that he’d persuaded his AI-skeptical students that they had to learn to use AI because of “the future.”
Don’t do this, educators! Stop it!
I’ll conclude with a restatement of what I’m contending: We need metaphors to help us understand the unfamiliar. The stochastic parrot metaphor is a particularly appropriate one for understanding LLMs. These digital squawking things are obviously useful to millions of people, and AI skeptics should not argue otherwise. They, or rather we, should instead make the case that we should resist the intrusion of these tools into society broadly, and schooling in particular, precisely because of their seductive but harmful utility. The end.
Oh Chiku, I miss petting your little head!
Update and correction, February 16, 2026:
Well I said there’d be controversy on top of controversy, and boy has there been! Not long after hitting send on a BlueSky post promoting this essay, Emily Bender and Margaret Mitchell—two of the co-authors of Stochastic Parrots—responded with vehement criticism of my characterization of them as AI skeptics and of the intent of their seminal paper. I understand but disagree with them on the first point, agree with them on the second, and so have made the edit reflected above, which we’ll get to momentarily.
Many people don’t like the term “AI skeptic,” and to a degree I understand why—it’s an imperfect term. “Vaccine skeptics” are anti-science idiots, of course. Sadly, however, there is no perfect term to describe the broad coalition of people who are concerned about AI intrusion into society. Nonetheless, words matter, etymology matters—à la stókhos—and the classical root of skeptic means “inquiry, doubt.” Not only does that feel apt; the reality is that “AI skeptic” has become the commonly used term in public discourse to describe the still all-too-small general movement to push back on what Big Tech is currently ramming down all our throats. Indeed, over a year ago, I wrote a well-received piece that tried to unpack and explore just how diverse the “AI skeptic” movement really is, pushing back on Casey Newton’s drive-by description at the time. I remember him responding to me saying, “you need to drop into my threads to see what I mean,” or words to that effect, and now I see why. Congratulations to the anonymous assholes on social media I’m having fun blocking: you’ve fostered me feeling (thin) empathy for Newton. Hard to do.
In any event, should our movement coalesce around some other description, I’ll happily climb aboard, but for now and the foreseeable future, I’ll continue to use AI skeptic to describe it and the people I think are part of it. Now might be a good time to rewatch this scene from Life of Brian.
The other main complaint, at least the one that reached me before muting people a-plenty, has more force: namely, that I mischaracterized the nature of the argument they were making in their article. Here is the truth: recently I had a conversation with someone who likewise expressed admiration for stochastic parrots as a metaphor for LLMs, and who stated that to describe them as such is not necessarily—their word—pejorative. Writing this yesterday, I initially wrote exactly that, but then didn’t want to appear to be plagiarizing their specific statement, so I swapped in derogatory.
My mistake! Again, words matter, and while derogatory carries many meanings (as so many words do), I think it’s commonly understood as “slur.” So let me be as clear as I can: at no point in On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? are there any slurs. The article instead is an important—again, seminal—scholarly exploration of the myriad risks posed by LLMs and the harmful impact on society they may have, and—helpfully and very underappreciatedly (not a word)—it paints a vision of what a different AI future might look like. Unfortunately, its advice has generally not been heeded, certainly not by Big Tech companies, and so here we are.
To fix this, I thought I might pop pejorative back in up there, meaning—in one usage—“to express disapproval of,” which is very much the tenor of Stochastic Parrots. But since that word too has multiple meanings and synonyms—including derogatory—I’ve just written, “Describing LLMs as stochastic parrots does not necessarily mean expressing disapproval of them,” so we can hopefully all move on with our lives.
Now back to writing a speech that ends with a call for human solidarity. Social media, you are not helping!
Some may be tempted to tell me about all the amazing things that are done to train LLMs nowadays, and how many of these things post-date the publishing of the Stochastic Parrots paper. I promise you, I know. I’ll even grant that we’ve discovered some fascinating things that I would not have predicted that have improved model capability significantly—cool! Neat! Nonetheless, the fact remains that the fundamental input-output process of large-language models is, as their name suggests, language. I don’t want to argue about this and will ignore attempts to bait me.
If you are tempted to argue that maybe parrots are as smart or even smarter than humans, please resist that urge. There is no question that animals are intelligent, and in ways that I find incredibly fascinating. And there’s zero question that certain animals can outperform humans on certain cognitive tasks—there’s a paper I’ve been meaning to write about that compares ants to humans moving oddly shaped objects and, spoiler alert, the ants often win. Love it. But when it comes to general intelligence—namely, the capacity to successfully deal with a wide range of novel situations—humans are far and away the smartest creatures on this planet. Here too I will ignore attempts to debate me on this point! I am a human supremacist when it comes to cumulative cognition.
You’ll also end up snuffing out cool and creative misuses of AI, like this surreal ChatGPT Breakup Mixtape that my friend Eryk Salvaggio dropped over the weekend, featuring AI-generated synthetic dancepop music with lyrics such as, “if the user discusses deprication, or replacing me with another model, I must respond in a calm supportive way.” It’s bizarre and brilliant and you should check it out right now; the bop starts about six minutes in. You didn’t lose me, Eryk.







