When I started this Substack, I wondered if I’d struggle with finding things to write about. Oops! Not a problem! In fact, I’ve had the opposite challenge, given the constant flood of AI-related content (for good or for ill). I generally try to focus on one topic at a time, but this week it’s a grab-bag.

Bugs—The iNaturalist uprising
Until this past Monday I’d never heard of iNaturalist. “Like Wikipedia, but for snakes and sunflowers,” is how I’d describe it, an online community of outdoorsy types who upload pictures from their hikes and whatnot and then categorize via crowdsourcing what they’ve found. Super cool—I mean that sincerely. What a great way to combine human curiosity with digital technology to create a repository of new knowledge. This is what the Internet once promised.
Which, of course, is exactly why Big Tech is circling, surely hoping to harvest this data to feed the AI beasts of its own making. Two weeks ago, the parent nonprofit organization that manages iNaturalist announced it had received a $1.5 million grant from “Google.org Accelerator: Generative AI” that would “help build tools to improve the identification experience for the iNaturalist community.” Along with the money, iNaturalist would also receive “access to Google staff to advise the iNaturalist team.”
As a former nonprofit CEO myself, I have genuine sympathy for the iNaturalist leadership team thinking this was a nice big win for the organization—philanthropic dollars are always hard to come by, especially nowadays. But it turns out if you tell a bunch of tree-hugging, frog-loving volunteers that the contributions they’ve been making to a trusted community might soon be served up as training data to an AI company, they will object! Repeatedly, eloquently, and powerfully—seriously, scroll through the hundreds of comments here to see what it looks like when a community rises up en masse to say “fuck off” to AI.
I mean, I simply cannot improve upon these three paragraphs:
Whether iNaturalist continues to partner with Google remains to be seen—the organization has already apologized for making its users “feel betrayed, disrespected, and without agency”—but however it plays out, I’m deeply inspired by this grassroots resistance to AI.
The question is, why aren’t we seeing this sort of thing in education? As journalist Ketan Joshi aptly observes:
Compare the incredible intensity of the backlash to iNaturalist’s announcement to another recipient in the same funding round: “Day of AI Australia” and the University of New South Wales, both loudly celebrating grants from Google to “build new GenAI tools for students and teachers”. Google Australia said it is “proud to support their efforts to prepare the next generation to not only understand AI, but to actively leverage it for positive change”.
Not only has there been no backlash to this announcement, there has barely been any media coverage at all. Other funding includes a “gen AI-powered tutor” for K-12 students, mental health provider training, training nurses and midwives using a “copilot”, and the provision of chatbot-sourced medical advice in Kenya.
There are extremely clear indications of serious cognitive impacts of relying on a chatbot to replace critical thinking, investigation and reading skills, in addition to their output simply being unreliable because it’s probabilistic rather than the outcome of careful investigation and reasoning. The deep, uncontrolled spread of generative systems in education, medicine and academia should all be cause for much, much more alarm than auto-generated explanations of beetle classifications.
Damn straight. Educators should take a page out of the naturalists’ playbook here. I’m working on something to directly address this—no promises, but also, stay tuned.
Bad Brains (Or rather, bad brain science)
Multiple people have asked for my take on the “Your Brain on ChatGPT” report from something called the MIT Media Lab that’s been pinging around social media (and mainstream media too). You’d think I’d be thrilled to trumpet a study that shows that using generative AI impairs learning and imposes other “cognitive debts,” right? Great to have some new empirical evidence to support what I’ve been saying about AI being a tool of cognitive automation, yes?
No. Wrong. Wrong wrong wrong. I’m sorry y’all, this “study” is crappy science—maybe more accurately described as pseudoscience. Red flags here include (a) making the write-up really long and impenetrable so as to give a false impression of methodological rigor; (b) including gratuitous and misleading pictures of EEG brain images to make it feel more science-y; (c) advancing overly broad claims based on a very small sample size that got even smaller as people dropped out over the course of the “study”; and (d) using a questionable pedagogical task at the core of their investigation (shout out to Wess Trabelsi for flagging this).
I was gearing up to take this apart in more detail, but luckily I don’t have to—just listen instead to this podcast featuring the brainy couple of Dr. Ashley Juavinett, a neuroscientist at UC San Diego, and Dr. Cat Hicks, a psychologist, wherein they have fun tearing it to shreds. Here’s a snippet from their discussion on the misuse of EEG data, slightly edited for brevity and clarity:
Juavinett: EEG is a measure of brain state, but it’s indirect. And at any given time, you have a lot of different frequencies happening at the same time, because your brain is doing a lot of things at once.
But this paper is doing something not fully accepted in the world of EEG, which is attempting to measure connectivity using EEG. They aren’t actually going in and asking, is this brain region physically connected to this other region? [Instead] it’s asking, are these brain regions kind of correlated…if one is active, is the other active?
Hicks: I’ve seen on social media a lot of posts claiming that the brain’s ability to form new connections is damaged. That is not at all what connectivity means, right?
Juavinett: No, not in this case. They’re using this technique called Granger Causality, which is born out of social science. Take a case where you want to know if the stock market is correlated to something else in the world: you can’t run a causal experiment, so you do Granger Causality to ask, is there enough correlation between these two signals that it seems like one thing is causing the other thing?
That’s ok, but it’s not connectivity. This has nothing to do with synapses. That’s what we really mean when we talk about forming new connections.
Hicks: Yeah, it’s not even measuring people’s learning outcomes. They are saying a lot of things about cognitive debt, which isn’t a psychology term; it’s just a term they made up for this paper. They are implying there’s a learning loss, but they are not measuring learning loss.
Things go downhill from there. “Garbage claims” is said more than once.
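If you’re curious what a Granger causality test actually computes, here is a minimal sketch of the idea Juavinett is describing. This is my own toy illustration with simulated signals (nothing from the paper): all the test asks is whether past values of one time series help predict another.

```python
# Toy illustration (simulated data, not from the study): a Granger "causality"
# test only checks whether one signal's past values help predict another signal.
import numpy as np
from statsmodels.tsa.stattools import grangercausalitytests

rng = np.random.default_rng(0)
n = 500

x = rng.normal(size=n)                                    # signal A
y = 0.5 * np.roll(x, 1) + rng.normal(scale=0.5, size=n)   # signal B, partly driven by lagged A

# Column order matters: the test asks whether column 2 (x) helps predict column 1 (y).
data = np.column_stack([y, x])
results = grangercausalitytests(data, maxlag=3)

for lag, (tests, _) in results.items():
    f_stat, p_value, _, _ = tests["ssr_ftest"]
    print(f"lag={lag}: F={f_stat:.2f}, p={p_value:.4f}")
```

A low p-value here just means that signal A’s past helps predict signal B. It says nothing about physical wiring, which is exactly the gap between what the paper measured and what “forming new connections” means to a neuroscientist.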
Look, this is simple: Don’t rely on this study. Don’t quote it. Don’t promote it. Rotten research is rotten research, and being an AI skeptic means being skeptical across the board, even when—or especially when—it’s bad science that purports to justify AI skepticism itself.
The Book Pirates of Anthropic
In what seems a lifetime ago, I practiced law in San Francisco, first at a private firm, later as a deputy attorney general for the state of California. And during that time (the mid-2000s), I had several cases in the Northern California federal district court before Judge William Alsup. My recollection: He’s very smart, and as a result had many complex lawsuits steered his way.
He’s also, I learned this week, presiding over current litigation brought by three writers against Anthropic for using their books to train the chatbot Claude. The main legal issue is whether such training is permitted as “fair use” under existing copyright law—on this point, Judge Alsup ruled in Anthropic’s favor, at least when it comes to books that Anthropic acquired legally. But, ahem, Anthropic also pirated seven million books that it used for its “central library” of training materials, and on that front—well, there will be a trial that will assess “resulting damages, actual or statutory (including for willfulness).”
Three quick(ish) comments on this case. First, it always amuses me when the media treats these lower court trial decisions—which are not binding precedent—as “landmark” or “historic” or whatever. In all likelihood this decision will be appealed, and since fair use is a legal question, the judges of the appellate court (here, the Ninth Circuit) will conduct their own analysis and make their own independent determinations regarding what the law requires—cases get overturned, after all. Conversely, if this case isn’t appealed, then some other litigation involving copyright and AI training certainly will be—remember, The New York Times is still suing OpenAI. Basically, despite all the press attention, this decision will ultimately be a bit player in the larger legal drama unfolding.
Second, on the legal merits, so much hinges upon whether we should treat the training of LLMs as different in kind from how humans learn. Here’s a key passage from Alsup’s decision on that front: “Yes, Claude has outputted grammar, composition, and style that the underlying LLM distilled from thousands of works. But if someone were to read all the modern-day classics because of their exceptional expression, memorize them, and then emulate a blend of their best writing, would that violate the Copyright Act? Of course not.”
If the premise were true, perhaps Alsup’s conclusion would follow—but here in the real world, humans can’t memorize all the modern-day classics, or even one classic, for that matter. We remember themes and ideas from books, not the order of each and every word. This is just fundamentally different from how LLMs operate, and I hope lawyers who represent authors and other creatives hammer this point home in future litigation.
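To make that distinction concrete, here is a rough sketch of the objective LLMs are actually trained on. It’s my own simplification with made-up token IDs standing in for a real tokenizer and model, not anything from the court record: the model is scored on predicting the exact next token at every position of the training text, which is why the order of each and every word matters to it in a way it never does for a human reader.

```python
# Rough sketch (simplified, made-up token IDs) of the next-token training
# objective: the model is penalized for every exact token it fails to predict.
import torch
import torch.nn.functional as F

vocab_size = 50_000
tokens = torch.tensor([101, 523, 9042, 77, 310, 6])   # stand-in IDs for a sentence

# Stand-in for a real model's output: one row of logits per next-token prediction.
logits = torch.randn(len(tokens) - 1, vocab_size, requires_grad=True)

# The loss compares each prediction to the *exact* next token in sequence;
# remembering only "themes and ideas" would score terribly on this objective.
loss = F.cross_entropy(logits, tokens[1:])
loss.backward()   # gradients push the model toward reproducing this exact order
print(f"loss: {loss.item():.2f}")
```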
Third, regarding Anthropic’s book piracy, this decision provides a glimpse inside how this company operates, and it’s gross. To start, it turns out CEO Dario Amodei authorized the theft of human literature in hopes of avoiding “legal/practice/business slog.” So annoying, that slog we call the rule of law! (Here is where I must remind you that Amodei has decried “constraints from humans” as an impediment to his goal of building Powerful AI that will make us immortal—literally.)
And then things get a little too on the nose, metaphor-for-our-times-wise: Once Anthropic’s leadership realized they might be liable for this obvious theft of intellectual property, they set out to rectify things by buying up hardcopies of books to retroactively digitize. More specifically: “Anthropic spent many millions of dollars to purchase millions of print books, often in used condition. Then, its service providers stripped the books from their bindings, cut their pages to size, and scanned the books into digital form—discarding the paper originals.”
I mean, COME ON. Just imagine being one of Anthropic’s employees—or, more likely, an underpaid subcontractor—tasked with stripping the covers off book after book, razor-blading the pages, feeding them into scanners, and then destroying the husks that remain.
Fascists just burn books; technofascists apparently need to defile them first.