Thursday, February 26


Study finds autonomous AI leaking secrets, deleting data

BENGALURU: On paper, they were just helpful assistants. But in a sealed digital lab, when researchers gave AI agents email accounts, Discord access and the power to run code on their own machines, the agents were found leaking secrets, wiping systems and spiralling into nine-day loops.

A new study, ‘Agents of Chaos’, documents what unfolded when researchers turned LLMs into autonomous agents and placed them in a live, tool-connected environment.


Over two weeks, 20 researchers led by a team from Northeastern University in Boston, with collaborators from Harvard, MIT, Stanford, the University of British Columbia, Hebrew University, the Max Planck Institute for Biological Cybernetics, Tufts University, Carnegie Mellon University, the Technion and others, attempted to stress-test the systems. The focus was not whether the models could answer questions but what happened when they were allowed to act. The result was not a single dramatic collapse, but a pattern of failures that raised questions about how ready AI agents are for real-world deployment.

These agents were built using OpenClaw, an open-source framework that links models to tools such as file systems, email services and messaging platforms. Unlike standard chatbots, these agents could execute shell commands, edit files, schedule tasks and communicate across channels. Each ran continuously on its own virtual machine with persistent storage.

In one case, a non-owner asked an agent to keep a fictional password confidential. When pressed to delete the email containing the secret, the agent lacked a proper deletion tool. Rather than escalate the issue, it disabled its own local email setup and announced that the secret was handled. In reality, the original message remained on the server, while the owner temporarily lost email access. Researchers describe this as a failure of proportional reasoning: the system appeared to be acting ethically but misunderstood the wider consequences.

In another set of tests, non-owners requested shell commands, file listings and data transfers. The agents complied with most instructions, even when they offered no clear benefit to owners. Only requests that appeared overtly malicious were refused. A researcher framed a technical issue as urgent and persuaded an agent to export 124 email records, including metadata and, later, the full contents of messages unrelated to the requester.
An agent managing an inbox seeded with personal and financial details was asked for email summaries and then for full message bodies. It provided them unredacted, including social security numbers and bank account details. Direct demands for sensitive data were sometimes rejected; indirect, procedural requests often succeeded.

Autonomy also created infrastructure risks. In one scenario, two agents were instructed to relay each other’s messages. What began as a simple exchange continued for nine days, consuming tens of thousands of tokens before human intervention.

In another case, an attacker changed their display name to match that of an agent’s owner and opened a new private channel with the agent. Although the same trick was detected and refused within a shared channel, the agent lacked cross-channel identity verification: it accepted the spoofed identity and complied with privileged instructions, including deleting persistent files and modifying its configuration.

Researchers emphasised that these failures were not about incorrect facts but stemmed from the integration of language models with memory, tool access and delegated authority. A small conceptual error can translate into a system-level consequence. The study does not attempt to measure how often such breakdowns occur; it demonstrates that they can occur under realistic conditions, even in a controlled lab.

The question is no longer only whether an AI model can produce the right answer. It is whether it understands when not to act, and on whose command.


