I stand corrected on the Transformer invention date.
Rhetorical Q, though: Were transformers more of an evolution (a difference in degree) or were they an entirely new technology (a difference in kind)? Seems to be the former... (https://www.linkedin.com/pulse/story-ai-evolution-before-ml-era-transformers-gpt-3-beyond-ghosh/). The foundations for them were laid by Hinton when both of us were still kids, right?
~~
The cultural transference of language thing is interesting. At first blush, though, I am not seeing anything in it that precludes there being a UG. More of a belt-and-suspenders thing. Alone, it doesn't seem to explain the robustness of language growth in any given person. Indeed, at least one culture-based refutation didn't stand up to scrutiny back in the day (https://cognitionandculture.net/wp-content/uploads/nevinsEtAl_07_Piraha-Exce.pdf).
Final thought on Noam: I noticed earlier that you cast Noam as someone who sort of unscientifically dismissed his critics, while simultaneously dinging him for evolving his theories when presented with evidence. Can't have it both ways. ;-)
But, this was a great discussion (at least for me). Thank you!
_Mark
Nice pointer to the LinkedIn study, I like it! I don't know where you draw the line between evolution and revolution; in my view, every new idea builds off ideas that came beforehand. But the Transformer architecture was definitely a step-change in making language models capable of connecting vast amounts of data -- words, broken into tokens -- in context simultaneously. Combine that with massive data sets and powerful chips that can process in parallel and lo, you've got tools with human-equivalent competence with the form of language.
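(To make "connecting vast amounts of data in context simultaneously" concrete, here's a minimal sketch of single-head self-attention, the core operation of the Transformer. It is illustrative only -- real models add learned query/key/value projections, multiple heads, and positional encodings -- but it shows how a single matrix operation relates every token to every other token at once.)

```python
# Minimal single-head self-attention: every token attends to every other
# token in the context in one shot. Illustrative sketch, not a real model.
import numpy as np

def self_attention(X):
    """X: (seq_len, d) array of token embeddings (assumed given)."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)        # similarity score for every token pair
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax over each row
    return weights @ X                   # each output position mixes ALL tokens

rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))         # four "tokens" as random embeddings
out = self_attention(tokens)
print(out.shape)  # (4, 8): each position now carries context from all others
```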
I don't disagree with you here (although the date of the citation you mention doesn't predate "the recent advances in AI"...LLMs and transformers go back farther; it's the training based on all the stolen data that is the recent "advance", I think).
And all of the evolution you mention, both in Chomsky's thought and in the critics' work: it all still exists in Chomsky's sandbox, yes? I mean, NO ONE thinks the infant brain is a tabula rasa anymore, right?
As an attorney, I am just always sensitive to undermining my own point by using opposing frames. "Functional Competence" (FC) is how the AI/"AGI"-hypesters want to talk about this, because it elevates prep for the job market to the #1 goal of education.
The paper that described the Transformer architecture, "Attention Is All You Need," came out in 2017 (https://arxiv.org/abs/1706.03762).
And while no one would contend the infant mind is purely a tabula rasa, around these parts we see *cultural* evolution playing a much larger role in shaping our cognitive architecture (https://www.educationnext.org/cognitive-gadgets-theory-might-change-your-mind-literally/).
Mixed feelings on this one. I agree w/ any takedown of either Skinner or Sapolsky. Chomsky's a different category, though.
To me, Noam's position in Linguistics is similar to Keynes's in Economics: they both revolutionized their fields by cutting through the academic groupthink and positing a self-evident truth. Since the 70s, when the powers-that-be decided to start undermining Education (in every way), each became a target for intellectual coups in academe. People started making careers out of throwing shade at them. About every 10 years I read many of the papers that supposedly debunk Chomsky's "universal grammar" and I am never persuaded by them; they always have to insert some intellectual dishonesty to make their point. Same with all the neoliberals trying to say Keynes was wrong.
The problem in each case is that Chomsky's/Keynes's models work (in practice) and the others' don't. In economics, we hit great recessions and are well on our way to a Depression due to several decades of ignoring Keynes. In that field, there is still a core of experts (Krugman, Yanis, etc.) who speak truth. In your field, I would say Bender is among those filling that role.
Take the kid at MIT you cite: using "functional competency" as a bar for cognition is very similar to what those in the AI-hype camp are doing. They can't validly accomplish what they attempt (understanding, or AGI), so they want to move the goalposts by changing definitions. The AI-evangelicals *now* say that we'll achieve AGI when it can do everyone's jobs (never mind how well those jobs are done, of course). That's just changing the definition.
Same with language. Saying 'because someone can mimic language, it is therefore competent' severs language from understanding. When a child achieves the same thing, it is learning/growing because it can use language to check its own work, as it were. Not so with an LLM. Each iteration is 'the first' to it. The Chomsky article you cite (which might've been his last, I think) is of course pretty clearly correct.
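(To make the 'each iteration is the first' point concrete, here's a toy sketch. The generate() function is hypothetical, a stand-in for any model call; the point is that nothing persists between calls, so any 'memory' has to live in the caller, which re-sends the whole history every time.)

```python
# Toy illustration of statelessness: generate() is a hypothetical stand-in
# for any LLM call. The model keeps nothing between calls; the conversation
# only "continues" because the caller re-sends the accumulated history.
def generate(prompt: str) -> str:
    # hypothetical model call; replies based only on the prompt it is handed
    return f"[reply based on {len(prompt)} chars of context]"

history = []
for turn in ["What is UG?", "And who proposed it?"]:
    history.append(f"User: {turn}")
    reply = generate("\n".join(history))   # omit history and turn 1 is forgotten
    history.append(f"Model: {reply}")

print("\n".join(history))
```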
I just think it's natural to always try to kick the king of the hill off. In a world where social media is replacing the mechanisms of legacy institutions, so-called paradigm shifts no longer occur when a new idea proves its merit; they happen when someone tweaks an algorithm or bankrolls a bunch of influencers...
Thanks for the long and thoughtful reply, Mark. A couple of points to ponder:
-- You and others can celebrate Chomsky's contributions to science without committing to his idea of a Universal Grammar. But to say we must believe in UG simply because Chomsky is a BFD is not science; in fact, it's the opposite of science.
-- I didn't want to go into it in this post, but UG was already taking on heavy water, scientifically, before LLMs came around. Close examination of cultural influences on language development, combined with neuroimaging and other neuroscientific data, indicated that "language acquisition is powered not by Universal Grammar, but by domain-general processes of sequential learning; processes that use 'statistical' or 'associative' principles to encode information..." Heyes, C., Cognitive Gadgets: The Cultural Evolution of Thinking (2017), p. 185. Note the date on that citation -- this argument predates the recent advances in AI. (For a toy illustration of that kind of statistical sequence learning, see the sketch after these points.)
-- You say you've read papers refuting Chomsky and not found them convincing, but, well, Chomsky must have, because he's shifted what UG is many, many times over the years. Originally, he posited that we developed UG via "transformations"; later this became "principles and parameters"; later still, he retreated to a "Minimalist Program" of UG using "merge."
-- The big conceptual confusion I think you're making is that to say LLMs have formal linguistic competence is not, repeat not, to say that they have 'functional' linguistic competence, which would include "understanding" as you're invoking it here. In fact, we have good empirical evidence these two things are distinct -- that's why we get so-called hallucinations, and lots of 'em! But what we don't ever get is LLMs returning text that is grammatically mistaken. That's a remarkable thing. But there's a difference between performance and competence, no doubt.
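(Following up on the Heyes citation above, here's a toy sketch of what 'statistical' or 'associative' sequential learning looks like in practice. The mini-language and all the numbers are invented for illustration; the point is that transitional probabilities alone, with no grammar built in, are enough to reveal the word boundaries in a continuous stream.)

```python
# Toy statistical learning: estimate transitional probabilities between
# adjacent syllables from raw, unsegmented exposure. No grammar is built in;
# the mini-language ("tibudo", "golatu", "pabiku") is invented for illustration.
import random
from collections import Counter, defaultdict

random.seed(0)
words = ["tibudo", "golatu", "pabiku"]                      # hypothetical "words"
stream = "".join(random.choice(words) for _ in range(500))  # continuous input, no spaces
syllables = [stream[i:i + 2] for i in range(0, len(stream), 2)]

pair_counts = Counter(zip(syllables, syllables[1:]))
syl_counts = Counter(syllables[:-1])

# P(next | current) comes out ~1.0 inside a word and ~0.33 across word
# boundaries, so the dips alone are enough to locate the word edges.
trans = defaultdict(dict)
for (a, b), n in pair_counts.items():
    trans[a][b] = n / syl_counts[a]

for a in sorted(trans):
    probs = ", ".join(f"{b}: {p:.2f}" for b, p in sorted(trans[a].items()))
    print(f"after {a} -> {probs}")
```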