Dave Karpf had a short take on OpenAI on his blog and in The Atlantic earlier this month, which he summarized as "the business model of OpenAI isn’t actually ChatGPT as a product. It’s stories about what ChatGPT might one day become. And, if you read Altman’s “The Intelligence Age” closely, what really stands out is how fantastical the stories really are."
Sounds like Amodei made a play to make sure OpenAI doesn't corner the market in fantastical stories about AGI. It does raise the question of how they pay back their investors, but at least Anthropic is still technically a non-profit.
That's again well said, Rob. Anthropic has fascinated me because while they have at least pretended to take the risks of this technology seriously, there's also been this very cult-like feel to how they operate (OpenAI too but Anthropic more so). I'm starting to realize that while AI snake oil is a major problem, what's really upsetting me is the utopian fantasies that seem utterly devoid of any historical awareness of how social engineering has gone horrifically wrong.
And all of these orgs are non-profit in name only, but that's another scandal for another post.
Curious, do you happen to know Neerav's take? He's a prominent thread connecting AI and ed reform.
I don't, but I'm curious myself. We had a brief exchange a few months ago that was somewhat telling, insofar as he seems to be suggesting we might trust AI more than humans because AI such as Claude "can bring more transparency" than human reviewers. I noted that Anthropic doesn't disclose the data Claude is trained on, among other things.
https://x.com/benjaminjriley/status/1827726458439045465
What are the basics of the training data debate? I'd always thought the companies wouldn't admit to scraping pretty much everything, including a lot of copyrighted stuff (for example, the NYT). They view it, rightly or wrongly, like Uber ("hey we're going to start a taxi company but we'll call it ride sharing because there are stupid rules against starting a taxi company.") I guess my question is - what else could they plausibly "train on?"
Well, the debate with Neerav was simply that to claim LLMs are more trustworthy than humans while also refusing to disclose what data has been used to train them is, um, interesting. The larger debate is whether these companies should be forced to disclose what they train on; policy initiatives and lawsuits aplenty are underway to address that question. And then there's an even bigger LLM-existential question about whether they can be trained on "synthetic data" created by LLMs themselves, or whether it'll ultimately be better to train them on *less* data that is more specific to particular problems someone wants to solve.
My team at National Library NZ watch the bots from multiple AI tech firms come through our firewall daily, to scrape catalogues and collections that are digitised and funded by taxpayers.
We can block these bots - but there are consequences, such as when the corporation that serves up the planet's "best" search also wants to feed its LLM.
I'm fascinated by the enduring value of an "open access" philosophy, in a world in which the scale of data harvesting to feed AI engines ultimately costs cultural heritage institutions whose work is publicly funded. As institutions we are not designed to serve the exponential scale of requests that keep hitting the publicly funded infrastructure that hosts those collections.
In this world I wrestle with what is the role of gatekeeping and safeguarding collective knowledge and memory judiciously - and making disciplined choices to do so. If we do not, the public good of that knowledge has the potential to become privatised and sold back to the same public within the envelope of a chatbot or augmented transcription, service or system.
I'm struck by how the default position that the individual is the pinnacle of the totality of the shared human construct - is fundamental to the value proposition of AI.
I always reflected that back in the halcyon days of personalised learning - how that whole construct was built on every student having a single device to call their own. And how we as educators all failed to grasp which players in that game benefited the most from that being the default. In fact we doubled down on that default and placed multiple orders of iPads and Chromebooks to ensure equity... more fool us I guess.
So we design and build for isolation and celebrate the tools' abilities to make a difference for the singular user. And wonder why a market exists for managing anxiety, loneliness and desperation in a generation of young people bombarded by all that clicks and diverts their attention.
Of course we know that when connections fail, so do communities. But we can't seem to unset our defaults, can we?
To quote DFW: "That is real freedom. That is being educated, and understanding how to think. The alternative is unconsciousness, the default setting, the rat race, the constant gnawing sense of having had, and lost, some infinite thing.
....
It is about the real value of a real education, which has almost nothing to do with knowledge, and everything to do with simple awareness; awareness of what is so real and essential, so hidden in plain sight all around us, all the time, that we have to keep reminding ourselves over and over:
“This is water.”
“This is water.”
It is unimaginably hard to do this, to stay conscious and alive in the adult world day in and day out."
Apologies. This probably should have been a blog post.
No apology necessary Tim, this is heartfelt and profound. “So we design and build for isolation.” Haunting.
The eugenics angle is important, and ever present in algorithmic bias and accelerationism. Thanks for drawing the connecting lines. I also felt compelled to waste a lot of words on this nonsense manifesto, if you're interested: https://deeplywrong.substack.com/p/how-i-learned-to-stop-worrying-and
There’s this wonderful essay written by Ruha Benjamin that reminded me of much of what you wrote! https://lareviewofbooks.org/article/the-new-artificial-intelligentsia/
If the powerful AI that Amodei describes somehow becomes available, which as you note may well be very unlikely, how would you recommend we respond? Would you advocate for the technology to be shelved and all research to be paused? Even if it could help us cure horrible diseases that cause immense pain and suffering?
I think if you're asking this question then I failed to make one of my main points, which is that speculative musings about hypothetical futures are the stuff of science fiction.
I also find it interesting that these AI prophets seem to turn to economics and capitalism while ignoring ecology or climatology. They constrain their views with economics and eugenics at the expense of much broader fields of vision.
It’s time to call out Silicon Valley’s CEOs lost in their own thoughts.
While a year ago it was okay to engage in wishful thinking, today there are plenty of papers showing the reality.
AI is a long way from AGI. OpenAI and the industry are to blame for not pushing back against hype and straight-up lies.
It’s time for a reality check before we put AI in places it wasn’t designed for. The original design was around natural language, not this narrative of an all-knowing oracle that will take your job.