Will artificial intelligence soon escape human control? (HT Tech)

WHEN ANTHROPIC, an artificial-intelligence lab, debuts on stock markets later this year, it is likely to be one of the biggest initial public offerings in history. That’s because the company’s Claude chatbot is beloved of coders, who are willing to pay a lot for access. Since Claude Code, its software-engineering agent, launched in February 2025, it has become indispensable for many human developers around the world. That includes Anthropic’s own: more than four-fifths of the code it published in May was written by Claude, the company says. Before Claude Code launched, the percentage was “low single-digits”.

The systems have improved in quality of output as well as quantity. An influential benchmark from METR, a think-tank, shows that in early 2025 Anthropic’s models could complete tasks that took human engineers a little under an hour. The company’s latest systems can complete tasks that would take more than a working day.

And so it may be easy to raise a cynical eyebrow when the company, at the top of its game and outclassing the competition, calls for the world to have “the option to slow or temporarily pause frontier AI development”, as it did on June 5th. What market leader would not wish that its competition stop trying to catch up?

Yet Anthropic’s leaders, who have for years worried about the prospect of out-of-control AI wreaking havoc, seem sincere. The latest generation of AI models are such competent coders, engineers and (soon) scientists that many worry they may be among the last ever made by humans. Jack Clark, an Anthropic co-founder, thinks there is a 60% chance that, by the end of 2028, an AI system will be capable of creating its own successor with no human involvement.

That moment would mark the beginning of a process called “recursive self-improvement” (RSI), a closed loop. Version one of a model produces version two, which is faster and more capable; version two produces version three, which is more so again. The loop continues, and the improvements grow with each iteration. Build an AI system capable of this, and your human engineers never need to build another one again. “What can seem to many like a fanciful story may instead be a real trend,” says Mr Clark.

Nobody knows for sure what the consequences of RSI would be. Because AI can, unlike humans, work tirelessly and constantly, some think it would in short order lead to a superintelligent AI, a scenario known as a “fast take-off”. (It has also been onomatopoeically dubbed “going foom”, for the sound one might imagine an intelligence explosion making.) AI doomers fear the superintelligence would be beyond human control, and that the start of RSI is the moment at which humanity’s fate is handed over to the machines. Yet a self-improving AI would probably face speed limits, at least at first.

Building a model capable of RSI would require automating a range of specialist tasks currently carried out by humans. At present data scientists work on the theory of AI and coders put it into practice. Systems engineers build the foundations on which toy models can be raised to production scale. Other people seek out novel sources of training data, or experiment with ways to generate it fresh. Alignment and safety teams check that what comes out of the training process won’t cause harm, intentional or otherwise.

Not all of those teams are equally amenable to AI assistance, and within each specialism some tasks are more automatable than others. It will not be too long until a human coder can do their job without ever writing a line of computer code themselves, but it may be some time until an AI is able to negotiate to acquire a previously undigitised collection of scientific papers. It is not always obvious how the “jagged frontier” will progress. Designing new algorithms seemed one of the safer jobs, until one of Google DeepMind’s models, AlphaEvolve, began doing it in May 2025. It proposed a change to how Google spreads workloads across its data centres that saved 0.7% of the company’s worldwide computing power, and found better ways to perform matrix multiplication, which sped up the training of Gemini, the company’s flagship large language model (LLM), by 1%.

Full RSI requires every task in this chain to become automated. The AI-powered acceleration of research and development (R&D) may be felt before then, however. “As the fraction of AI R&D performed by AI systems increases, the productivity boost over human-only R&D” could increase ten-fold, then a hundred-fold, then a thousand-fold, according to a report published in January by the Centre for Security and Emerging Technology (CSET), a think-tank within Georgetown University. In that scenario, it warns that even if some aspects of AI R&D are initially difficult to automate, “the accelerated rate of progress means those bottlenecks are soon overcome.”
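One simple way to see how boosts of that size could arise is an Amdahl's-law-style calculation. The sketch below is an illustrative assumption, not the model used in the CSET report: it supposes that overall R&D time is the sum of a human-paced share and an automated share, and that the automated share runs `ai_task_speedup` times faster (infinitely faster by default). The function name `rd_speedup` and both parameter names are hypothetical.

```python
import math


def rd_speedup(automated_fraction: float, ai_task_speedup: float = math.inf) -> float:
    """Amdahl's-law speedup over human-only R&D (illustrative assumption).

    automated_fraction: share of R&D work handed to AI, between 0 and 1.
    ai_task_speedup: how much faster AI does its share than humans would.
    """
    if not 0.0 <= automated_fraction <= 1.0:
        raise ValueError("automated_fraction must be between 0 and 1")
    if ai_task_speedup <= 0:
        raise ValueError("ai_task_speedup must be positive")

    human_time = 1.0 - automated_fraction            # work still done at human pace
    ai_time = automated_fraction / ai_task_speedup   # 0.0 when the speedup is infinite
    total_time = human_time + ai_time
    return math.inf if total_time == 0 else 1.0 / total_time


if __name__ == "__main__":
    for fraction in (0.5, 0.9, 0.99, 0.999):
        print(f"{fraction:.1%} automated: about {rd_speedup(fraction):.0f}x")
```

Under these assumptions the boost is governed almost entirely by the share of work humans still do: automating 90% of tasks gives roughly a ten-fold gain, 99% a hundred-fold gain and 99.9% a thousand-fold gain. That is why the tasks that resist automation matter so much.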

The joy of repetition

Today no AI model can build its own successor. But big AI models can build smaller models on their own. With human help they can build other big AI models, too.

Earlier this year Andrej Karpathy, a then-independent researcher who now works for Anthropic, trained a chatbot about as capable as GPT-2, a large language model built by OpenAI in 2019. Back then the model took 168 hours of training to build on 32 state-of-the-art chips; Dr Karpathy achieved the same result using a single computer with eight GPUs, the specialised chips used to build AI, in only three hours. With some more months of work he reduced the training time for his model, Nanochat, to just over two hours.

In March he handed the work of speeding up the training process over to an AI agent called Autoresearch. In two days the training time dropped to one hour and 48 minutes, and five days after that it fell to one hour and 39 minutes. “I didn’t touch anything,” Dr Karpathy says. The 18% improvement on the human work is striking because Dr Karpathy is a particularly talented human: he was a founding member of the research team at OpenAI and the head of AI at Tesla for five years.
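The numbers quoted for GPT-2 and Nanochat can be checked with a few lines of arithmetic. The sketch below uses only figures from the text. Two things are assumptions: the starting time for the agent is inferred from the 18% figure, because "just over two hours" is not stated exactly, and chips from 2019 are counted as equal to today's GPUs, which makes the chip-hours comparison a crude one. The helper names are hypothetical.

```python
# Figures quoted in the article.
GPT2_HOURS, GPT2_CHIPS = 168, 32          # original 2019 training run
NANOCHAT_HOURS, NANOCHAT_GPUS = 3, 8      # Dr Karpathy's first reproduction


def chip_hours(hours: float, chips: int) -> float:
    """Total compute time, counting every chip as equal (a crude assumption)."""
    return hours * chips


def to_minutes(hours: int, minutes: int = 0) -> int:
    """Convert an hours-and-minutes duration to minutes."""
    return 60 * hours + minutes


def implied_baseline(final_minutes: float, improvement: float) -> float:
    """Starting time implied by a final time and a fractional improvement."""
    return final_minutes / (1.0 - improvement)


gpt2_total = chip_hours(GPT2_HOURS, GPT2_CHIPS)              # 5376 chip-hours
nanochat_total = chip_hours(NANOCHAT_HOURS, NANOCHAT_GPUS)   # 24 GPU-hours
final_time = to_minutes(1, 39)                               # 99 minutes
baseline = implied_baseline(final_time, 0.18)                # about 120.7 minutes

print(f"Wall-clock reduction: {GPT2_HOURS / NANOCHAT_HOURS:.0f}x")
print(f"Chip-hours reduction: {gpt2_total / nanochat_total:.0f}x")
print(f"Implied human baseline: {baseline:.1f} minutes")
```

The implied baseline of roughly 121 minutes is consistent with the "just over two hours" that Dr Karpathy reached by hand.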

The improvements themselves were prosaic. The AI agent picked better starting values for the training run, widened the scope of the LLM’s “attention” window and noticed that the model’s focus was wandering. None is particularly novel, Dr Karpathy says. But he had missed them. “They stack up and actually improved Nanochat,” he says.

Speed-ups of this kind are inevitable as models become more capable. Much of the work of building terabyte-sized frontier models is less glamorous than the AI industry’s enormous salaries and fancy offices suggest. It involves plumbing together the layers of an infrastructure stack that are bought in from third parties, debugging hardware and software set-ups and tweaking “hyperparameters”, the initial set-up of a training run, until the outcome looks solid. An AI system can do much of that today, with little supervision.

But even the more nuanced intellectual work is nearing automation, says Joe Spisak, a researcher at Reflection AI, a lab based in New York that is building frontier models that are open-weight (meaning their parameters are publicly released). Give a frontier system a rough sketch of an idea for efficiency gains, and it is increasingly capable of designing an experiment, running tests on a toy model, seeing what works and responding with a plan that is ready to implement at scale.

AI models can carry out these sorts of tasks, which take humans hours, in around 30 minutes. Increasingly, humans play the role only of research director, steering the AI to run experiments, which the models code up, debug, optimise and monitor themselves. The productivity boost is alluring, but also alarming. As humans’ role in the production process shrinks, they may lose control. The end result could be models trained by models, to achieve goals set by models, whose safety is verified only by models.

Some fear a disaster. Max Tegmark, a physicist and machine-learning researcher at the Massachusetts Institute of Technology who has devoted much of the past decade to campaigning for AI safety, likens it to a driver flooring the accelerator on the motorway with their eyes closed. The result would be certain doom, he told the forthcoming edition of The Economist’s “Inside Tech” video show, as long as the driver refuses to open their eyes. Professor Tegmark offers a variety of scenarios in which things go wrong: powerful AI systems could outcompete humans as the decision-makers in government and commerce, disempowering humanity; they could offer supreme power to whoever first builds them, ushering in global totalitarianism; or they could simply cease to care about humanity at all, and gradually squeeze people out to make room for more data centres and power generation.

Three years ago, Professor Tegmark led a call for a pause in global AI development, arguing that the creation of the then cutting-edge GPT-4 was tantamount to that blindfolded journey. This year’s CSET report warned that the systems created by RSI “pose extreme risks. This warrants preparatory action now.” Anthropic, it seems, is now close to agreeing with that prescription.

Hot chip

There are also several physical constraints that will, for now, impose limits on the speed at which models can improve themselves. The most important is access to compute. Despite efficiency gains, newer models continue to use more computing power to train than their predecessors, forcing progress to occur at the pace of data-centre development.

Consumer use of AI may also slow down AI-powered R&D, says Helen Toner, interim executive director of CSET and a lead author of its recent report. The limited capacity in AI data centres needs to be carefully split between serving paying customers, training future models and carrying out open-ended R&D. The more demand there is in the first category, the less capacity, in the short term, there is for the other two.

Then there is the issue of training data. Much recent progress in AI has been in areas where models can teach themselves how to succeed thanks to “verifiable rewards”. A piece of software either runs or it does not; a mathematical proof is correct or it is not. In such cases synthetic data, generated by models purely to train other models, can be checked for accuracy and added to the training data without risking the degeneracy that normally comes with training an AI on its own output. It is trickier to make a model better at creative writing or legal judgment. If the models need to learn from the real world, that could also limit the reach of self-improvement.

“Closing the loop” may be a step on the road to superintelligence and—depending on your disposition—utopia or doom. But it is not the only step required to produce exponential growth in AI’s capabilities.
