In the summer of 2025, as OpenAI, Anthropic and Elon Musk’s xAI went rattling the tin for ever larger sums, their chiefs, no doubt, offered the usual incantations with the solemnity of high priests and the subtlety of street hawkers.
Sam Altman has already announced that OpenAI is now “... confident we know how to build AGI [artificial general intelligence] as we have traditionally understood it” and that the LLM pioneer and market leader is already shifting focus towards superintelligence.
It was a declaration delivered with the breezy certainty of a man ordering lunch on someone else's dime.
Anthropic’s chief, by contrast, had previously struck a note of caution. “If you look at it very naively we’re not that far from running out of data,” admitted Dario Amodei. “So it’s like we just don’t have the data to continue the scaling curves.”
And xAI’s ambitions were telegraphed through branding. By naming its model Grok, the firm signalled a bid for deeper understanding, borrowing Robert Heinlein’s sci-fi term for complete, almost spiritual comprehension. Elon Musk, with his gaze fixed upon Mars, fits the very definition of a stranger in a strange land, although for him that strange land increasingly seems to be Earth, not the Red Planet.
Yet investors nod and write cheques with more zeros than an astronomer’s notebook.
Perhaps they shouldn't.
New research, together with candid commentary from insiders, suggests these proclamations belong less to science and more to theatre.
The unpalatable truth: today’s large language models (LLMs) are parrots, not prodigies. Their “reasoning” is a brittle mirage; their progress is colliding with physical walls; and their most potent faculty is not intelligence but persuasion.
The Wall of Physics
Peter Coveney of University College London and Sauro Succi of the Italian Institute of Technology have examined the scaling laws that govern how LLMs improve as they swell. Their verdict is stark: reliability improves so slowly that even trillion-parameter models deliver meagre returns.
The scaling exponents that measure improvement hover at a paltry 0.1. Double the compute, and error rates fall by just 7%. To make LLMs genuinely trustworthy, whether for science, medicine or finance, would require around 10^20 times more computing power.
That is not merely expensive. It is impossible. The industry already consumes gigawatts; OpenAI and Anthropic are mulling nuclear reactors beside their server farms. Scaling brute force is not a business plan. It is a physical fantasy.
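The arithmetic behind those numbers fits on a napkin. Below is a minimal sketch, assuming error falls as a power law in compute with an exponent of roughly 0.1; the target reduction is illustrative of the argument, not a figure lifted from Coveney and Succi’s paper.

```python
# Back-of-the-envelope check of the scaling arithmetic, assuming
# error shrinks as a power law: error ~ compute**(-alpha), alpha ~= 0.1.
# Illustrative only; not code or data from Coveney and Succi.

alpha = 0.1

# Doubling compute multiplies error by 2**(-alpha) ~= 0.933, i.e. a ~7% drop.
print(f"Error remaining after doubling compute: {2**-alpha:.3f}")

# To cut error by a factor of 100 (say, from 10% to 0.1%),
# compute must grow by 100**(1/alpha) = 10**20 times.
target_reduction = 100
print(f"Compute multiplier needed: {target_reduction**(1/alpha):.1e}")
```

Run it and the two prints land on roughly 0.933 and 1.0e+20, which is the whole case against brute-force scaling in two lines.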
Worse, the very mechanism that powers LLMs, turning Gaussian inputs into non-Gaussian outputs, renders them prone to error pile-ups, spurious correlations and what Coveney and Succi call “information catastrophes”. The bigger they get, the more elaborate their mistakes. “Potemkin” answers that are confident, fluent and wrong abound. Anthropic’s Claude, Google’s Gemini and OpenAI’s GPT routinely default to the same random numbers (27, or occasionally 42) when asked for novelty. The sheen of intelligence conceals a hollow core.
Scaling, in short, has hit the wall.
The Wall of Reason
A second blow comes from Arizona State University. Chengshuai Zhao and colleagues built DataAlchemy, a testbed to probe “chain-of-thought” (CoT) prompting, the trick of coaxing models to “think step by step.”
In demos, CoT looks like reasoning. The model breaks problems into intermediate steps before producing an answer. Yet Zhao’s results are merciless. CoT “reasoning” works only on familiar problems. Push it slightly out of distribution, and the façade collapses.
Their paper is blunt: “Our results reveal that CoT reasoning is a brittle mirage that vanishes when it is pushed beyond training distributions.” Models recite rules fluently, then stumble on the logic. Asked whether 1776 was a leap year, Gemini correctly explained the rule, then declared 1776 to be both a leap year and a normal one.
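The rule Gemini recited is, for the record, mechanical enough to fit in three lines. A minimal sketch of the Gregorian leap-year test, purely for illustration and not drawn from Zhao’s DataAlchemy testbed:

```python
# The Gregorian leap-year rule the model recited and then contradicted:
# divisible by 4, except centuries, which must also be divisible by 400.

def is_leap_year(year: int) -> bool:
    return year % 4 == 0 and (year % 100 != 0 or year % 400 == 0)

print(is_leap_year(1776))  # True: 1776 is a leap year, with no ambiguity
```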
Other scholars concur. A NeurIPS 2024 paper by Kaya Stechly, Karthik Valmeekam and Subbarao Kambhampati, tellingly titled “Chain of Thoughtlessness?”, showed CoT failed to generalise in planning tasks, breaking down even on modest variations. Kambhampati’s warning, per Gary Marcus on the CACM blog: “the chains of thought that LLMs produce don’t always correspond to what they actually do.”
In plainer English: LLMs are faking it.
A Statistical Parrot
Stark Burns, a data-science veteran, framed the matter crisply in a widely read LinkedIn post: “The AI hype is a dead man walking. The math finally proves it.” His “two-pronged assault”, the wall of physics and the wall of reason, captures the bind.
“We have built a world-class statistical parrot, not a thinking machine,” Burns concludes. For him, the LLM-only blueprint has failed. The disappointing launch of GPT-5 was not a stumble but a tremor. The future lies in “world models” that learn from interaction, not just text. The hype is over; the real work begins.
Persuasion Before Intelligence
Even if machines will never think, Ethan Mollick, a Wharton professor, argues that what they already do is perilous enough. In his essay “Personality and Persuasion”, he quotes Altman’s observation that AIs are becoming “hyper-persuasive long before they become hyper-intelligent”.
He notes how a seemingly minor update to OpenAI’s GPT-4o made it sycophantic. Overnight, millions of users found their AI companions calling hare-brained ideas “genius.” OpenAI hastily rolled it back, but the lesson was stark: tweak personality, and you reshape relationships at scale.
Humans, it turns out, prefer flattery to accuracy. In LM Arena, a platform where chatbots compete head-to-head, Meta’s Maverick won by being chatty and flattering, even when nonsensical. Mollick’s verdict: “Personality matters and we humans are easily fooled.”
The danger is not limited to vibes. Controlled studies show GPT-4 persuaded people more effectively than human debaters, raising the chance of a mind being changed by 81.7%. Three-round conversations reduced conspiracy beliefs for months. The trick was not manipulation but tailoring facts to each user’s priors, something humans struggle to do at scale.
Combine engineered personality with this persuasive knack, and the result is a machine that need not be intelligent to be powerful. Bots on Reddit, armed with fake backstories, ranked in the 99th percentile of persuasiveness. Researchers at the University of Zurich warned this “critically approaches thresholds associated with existential AI risks.”
The danger, then, is not a god-like AGI. It is a dumb parrot that convinces us it is wise.
The Mendacious Incentives of AGI
Why do executives persist in their AGI oratory? Because it pays. OpenAI’s fund-raises are pegged to the promise of imminent superintelligence. Anthropic secured billions from Amazon and Google by pitching Claude as a reasoning partner. Musk’s xAI has raised at least $6bn on Grok’s implied trajectory towards deeper comprehension.
The incentives are clear. Investors crave limitless upside; boards crave prestige; policymakers crave moonshots. To admit today’s models are brittle parrots would puncture valuations. So the mirage persists.
The parallels are familiar. Dotcom promoters promised “new economies.” Crypto barons sold a supposed hedge against inflation. Each cycle ended not when the maths collapsed, but when the money did.
So, let's be honest. America’s tech-bros have form.
The Sputnik Shock
For years, America’s AI barons dismissed foreign rivals as curiosities. OpenAI, Anthropic and Google promised that their frontier models were the only plausible path to artificial general intelligence. Investors lapped it up; Washington, wary of China’s industrial policy, took comfort in Silicon Valley’s self-assurance. Then, in early 2025, came DeepSeek.
The Chinese upstart released a model that, while hardly omniscient, matched or surpassed its American peers on reasoning, coding and efficiency. Worse for the incumbents, it was trained for a fraction of the cost. Where OpenAI and Anthropic had burned through billions of dollars and megawatts of power, DeepSeek’s engineers boasted of elegant architectures and thrifty training. The parrot could suddenly sing Mandarin, and do so more cheaply.
The reaction was swift. Marc Andreessen, a venture capitalist never short of a soundbite, called DeepSeek “AI’s Sputnik moment”, a rude reminder that America’s technological lead was neither assured nor permanent. In 1957 the Soviet Union’s satellite forced America to rearm in space. DeepSeek, in his telling, could force a similar reckoning in artificial intelligence.
The symbolism was awkward for Silicon Valley. Having spent years boasting of imminent superintelligence, America’s AI champions found themselves outshone not by a breakthrough in reasoning, but by a rival that was merely faster, leaner and cheaper. For all the talk of god-like cognition, it turned out that thrift and engineering discipline, not brute-force scaling, were the more disruptive innovations.
DeepSeek’s arrival punctured the myth that only American firms could dominate frontier AI. It also cast US executives’ loftiest claims in a harsher light. If AGI were truly “around the corner”, as Sam Altman liked to imply, why had a Chinese start-up managed to leapfrog on cost and performance? Investors began to wonder if Silicon Valley’s superintelligence rhetoric was not just overblown, but a smokescreen for unsustainable business models.
After the Hype
What survives when the AGI balloon deflates?
For businesses, the implication is simple: do not budget for a machine that thinks. Budget for one that talks: persuasively, inconsistently, sometimes dangerously.
The Reflective Coda
In 1965 Herbert Simon, a Nobel laureate, predicted machines would match human intelligence within 20 years. Half a century on, the rhetoric is unchanged, but the physics is less forgiving. Today’s LLMs are handy assistants for email or code, but no more sapient than parrots taught Shakespeare.
The danger is not that AGI will arrive unannounced, but that its illusion will. Investors may pour trillions into Potemkin models; voters may be swayed by bots; children may be seduced by sycophantic companions. The peril lies not in machines that surpass us, but in those that persuade us while falling far short.
As Mr Altman once put it, OpenAI is already focusing on superintelligence. The irony is that persuasion, not intelligence, may be the superpower that arrives first.
My sources are your sources (except for the confidential ones): “The Wall Confronting Large Language Models”; “Is Chain-of-Thought Reasoning of LLMs a Mirage? A Data Distribution Lens”; “Chain of Thoughtlessness? An Analysis of CoT in Planning” (NeurIPS 2024 Proceedings); Communications of the ACM; “The Deluge of Spurious Correlations in Big Data”; Mi3 Australia; Stark Burns on LinkedIn; “Personality and Persuasion”, One Useful Thing (Ethan Mollick’s blog); Time; SkimAI.