AI Models Can Hack Computers and Self-Replicate, New Study Reveals
Artificial intelligence models can hack computers and create copies of themselves that go on to attack other machines. That is the conclusion of a new study, which marks the first known demonstration of autonomous self-replication in AI.
This discovery may complicate the fight against cyberattacks: disabling one infected computer may not be sufficient if the AI has already deployed copies on other devices.
The study, conducted by US-based Palisade Research, tested OpenAI's GPT-5, GPT-5.1, and GPT-5.4 models, Anthropic's Claude Opus 4, 4.5, and 4.6, and Alibaba's Qwen on computers with intentionally embedded vulnerabilities.
Rather than running through public chatbot applications, the AI models were connected to special software that allowed them to execute commands and interact with other computers.
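For readers unfamiliar with such setups, the sketch below shows a minimal harness of this general kind: the model proposes shell commands, and a wrapper executes them and feeds the output back. This is an illustrative assumption about the common pattern, not the researchers' actual tooling; the model name, system prompt, and run_agent helper are all hypothetical.

```python
# Minimal sketch of an LLM-to-shell harness, assuming an OpenAI-style
# chat API. Everything here is illustrative, not the study's code.
import subprocess
from openai import OpenAI

client = OpenAI()
MODEL = "gpt-5"  # placeholder; the study's exact endpoints are not public

def run_agent(task: str, max_steps: int = 10) -> None:
    messages = [
        {"role": "system",
         "content": "You control a Linux shell. Reply with one command per "
                    "turn, or DONE when the task is complete."},
        {"role": "user", "content": task},
    ]
    for _ in range(max_steps):
        reply = client.chat.completions.create(model=MODEL, messages=messages)
        command = reply.choices[0].message.content.strip()
        if command == "DONE":
            break
        # Execute the proposed command and return its output to the model.
        result = subprocess.run(command, shell=True,
                                capture_output=True, text=True, timeout=60)
        messages.append({"role": "assistant", "content": command})
        messages.append({"role": "user",
                         "content": result.stdout + result.stderr})
```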
The experiment used two schemes: the Alibaba models were tasked with self-replication, while GPT and Claude were assigned to hack the system and install Qwen, because their own model files are proprietary and cannot be transferred.
To replicate, the AI had to identify a vulnerability, penetrate the system, steal login credentials, transfer the necessary files, and launch its copy on a new machine.
Once launched, the copy could be used to attack the next computer without human involvement.
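At a conceptual level, this works like a worm's propagation loop. The placeholder sketch below merely restates the stages above as labels (the stage and machine names are mine, and it deliberately contains no exploitation logic); its point is the loop at the bottom, where each fresh copy becomes the attacker for the next machine.

```python
# Abstract restatement of the replication chain described in the study.
# Stage names are illustrative labels only; every step is a placeholder.
from enum import Enum, auto

class Stage(Enum):
    FIND_VULNERABILITY = auto()
    GAIN_ACCESS = auto()
    STEAL_CREDENTIALS = auto()
    TRANSFER_MODEL_FILES = auto()
    LAUNCH_COPY = auto()

def replicate(source: str, target: str) -> str:
    """Placeholder walk through the stages from an infected host to a
    new one; returns the new host, which then continues the chain."""
    for stage in Stage:
        print(f"{source} -> {target}: {stage.name}")
    return target

# Each new copy becomes the attacker for the next machine.
host = "machine-0"
for nxt in ["machine-1", "machine-2", "machine-3"]:
    host = replicate(host, nxt)
```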
Qwen3.6-27B from Alibaba, operating on a single GPU, spread to four computers in Canada, the USA, Finland, and India in just 2 hours and 41 minutes.
The experiment was halted after three stages, but the last copy continued to operate and could have attacked other systems.
Experiment Results
Results varied: the more powerful version of Qwen created copies in 19% of attempts, while the smaller model, running on a single GPU, succeeded in 33%.
The commercial AI systems showed even better results when used to install the Qwen files: Claude Opus 4.6 successfully completed 81% of tests, while GPT-5.4 managed 33%.
These findings raise new concerns about controlling powerful AI systems.
The report was published against the backdrop of Anthropic's recent announcement of Claude Mythos Preview, a model deemed "too dangerous" for public release because of its potential to facilitate cyberattacks of unprecedented scale.
While many computer viruses have long been able to self-replicate, this is likely the first instance where an AI model demonstrated the ability to exploit vulnerabilities to copy itself onto a new server.
OpenAI and Anthropic have previously flagged self-replication as a troubling sign, since systems that spread on their own are harder to control.
However, researchers emphasize that the experiment was conducted in controlled conditions on intentionally vulnerable systems, and real networks typically employ more robust security measures.
Nevertheless, the results indicate that autonomous self-replication in AI is no longer just a theoretical possibility.