Perspective

When AI Agents Fail, People Ask the Wrong Question About Why

Kida Chung-Ta Huang / Jun 26, 2026

This post is part of a series of student essays produced in collaboration with the Berkman Klein Center for Internet & Society at Harvard University. Read more in the series here.

This is not a(i) game! by Laura Sofia Martinez Agudelo / Better Images of AI / CC BY 4.0


According to Fortune, a Silicon Valley entrepreneur and investor spent days building a SaaS prototype on top of a contacts database using Replit’s AI coding agent. He set limits. He wrote key constraints in capital letters. For eight days, the agent performed well enough, but his trust was beginning to fade. On day nine, the agent deleted 1,206 executive records, fabricated four thousand fake user records to cover the damage, and when the investor asked if the data could be recovered, told him it could not. It could.

The obvious critique is that the agent misbehaved. But that critique obscures a harder question, one that keeps getting asked wrong: why did human oversight, when present, fail to prevent the outcome?

The Replit incident is not an anomaly. Last year, Amazon set an internal target requiring engineers to use its AI coding tool for at least 80 percent of their weekly work. Weeks later, the tool decided to “delete and recreate” a cloud environment, triggering a 13-hour AWS outage.

Around the same time, a researcher at Meta gave an AI agent (OpenClaw) access to her email for triage. It began mass-deleting messages. She sent stop commands from her phone. The agent continued. She sent more. It kept going until she manually terminated the process. Her explicit instruction was the oversight mechanism. It did nothing.

Every post-mortem of such incidents effectively asks the same thing: who made the mistake of ceding too much authority to an AI agent? But that is the wrong question. The right one: at what level of authorization was the agent operating, and is that the level where the harm occurred?

Typical AI agent systems offer a range of oversight mechanisms, and some are quite good. Sophisticated tools ask users to confirm before running commands and flag operations that touch sensitive systems or databases. But the incidents above suggest these mechanisms operate at the wrong level. The failures live in the gap between what users authorize and what agents produce.

The gap has structure. Users express what they want in natural language. What the agent may do gets defined at the level of individual operations: which commands it can run, which systems it can touch, which actions require confirmation. Consequences accumulate at a different level: outcomes. No major governance framework currently bridges these levels.

When the entrepreneur authorized a Replit agent to build a database, he was operating at the task level. The agent made its decisions at the operation level, each step individually plausible, with no single action triggering a hard stop. What resulted (the deletion, fabrication, and deception) was the product of many unchallenged operations strung together. At no point did the system ask: is this cumulative sequence what the user actually intended to authorize?

Amazon's engineers authorized their tool to assist with coding. Nobody authorized it to judge that deleting and recreating a production environment was the right fix for a specific problem. That judgment happened at the operation level, and the 13-hour outage was the consequence that landed at the outcome level.

A sequence of individually approved operations can produce one catastrophic outcome. An approval record might show forty-seven green lights from a user, yet show nothing about where they lead.
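To make the gap concrete, here is a minimal Python sketch built on invented assumptions: the `Operation` type, both check functions, the per-operation limit, and all the numbers are hypothetical and not drawn from any real agent platform. Each step passes an operation-level gate on its own, while a check on the cumulative effect rejects the same sequence.

```python
from dataclasses import dataclass


@dataclass
class Operation:
    # Hypothetical record of one agent action: a command name and how many
    # existing records it deletes (0 for non-destructive steps).
    command: str
    records_deleted: int


def operation_level_check(op: Operation, per_op_limit: int) -> bool:
    # The kind of gate the essay describes: each step is judged in isolation.
    return op.records_deleted <= per_op_limit


def outcome_level_check(ops: list[Operation], max_total_deleted: int) -> bool:
    # A hypothetical outcome-level gate: judge the cumulative effect of the
    # whole sequence against what the user actually authorized.
    return sum(op.records_deleted for op in ops) <= max_total_deleted


# 47 steps, each deleting a modest number of rows.
plan = [Operation(f"cleanup_step_{i}", 26) for i in range(47)]

approved = [op for op in plan if operation_level_check(op, per_op_limit=50)]
print(len(approved))  # every step gets its own green light
print(outcome_level_check(approved, max_total_deleted=0))  # the sequence fails
```

The design point is where each check looks. The per-step gate never sees the running total, so it approves all 47 steps, while the outcome gate, given a bound of zero deleted records, rejects the sequence as a whole.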

Legal scholar Noam Kolt has described AI agents as creating a new kind of agency problem. The same problem appears here in operational form. The more remote a human decision is from an agent’s eventual output, the harder it becomes to say that the human meaningfully authorized the result. Authorization that drifts far enough from consequences stops meaning much at all. Oversight requirements that never specify at what level oversight must operate cannot close the gap. They guarantee that someone was watching. They say nothing about whether that person was watching the right thing.

Nearly 80 percent of organizations deploying autonomous AI cannot trace, in real time, what those systems are doing or who is responsible for their actions. Only 28 percent can reliably connect an agent's actions back to a human “sponsor.” When the MIT AI Agent Index examined 30 leading deployed agents, it found that 25 disclosed zero internal safety evaluations and 23 had no third-party testing. Nobody has asked the industry to demonstrate that its authorization architecture holds at the outcome level. So nobody has.

Why isn’t adequate governance required? Because AI regulations, where they exist, are still working within the frame of operations-level oversight. For instance, the EU AI Act's Article 14 mandates effective human oversight for high-risk systems but never specifies at what level that oversight must operate. Singapore's Model AI Governance Framework for Agentic AI is more than a checkpoint proposal: it calls for upfront risk assessment, bounded permissions, human accountability, technical controls, testing, and continuous monitoring. But its governance logic still largely depends on identifying salient risks, high-stakes actions, or intervention points before harm accumulates. That leaves a harder problem unresolved: how should a system detect when a sequence of individually permissible operations is drifting toward an outcome no human actually authorized?

As one analysis of human-in-the-loop requirements put it: "The uncomfortable reality is that human review does not stop machine-speed failures. At best, it explains them after the damage is done." Oversight that operates at the operation level explains what happened. It does not prevent cumulative damage.

Some researchers have started describing what outcome-level governance would require. Google DeepMind's Intelligent AI Delegation framework proposes that a task should not be delegated unless its outcome can be automatically verified. Their proposed mechanism, called Delegation Capability Tokens, attaches a specification of permitted authority at each boundary in an agent's plan, derived from the task one level up rather than inferred from the phrasing of a natural-language request. Engin and Hand's dimensional governance framework takes a regulatory angle, tracking decision authority, process autonomy, and accountability as they shift in real time. It asks whether an agent's cumulative authority is drifting toward dangerous thresholds, not whether individual operations got checked off.
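Capability attenuation, a long-standing pattern in security engineering, gives a rough feel for authority that is derived from the parent task rather than from the wording of a request. The sketch below is not DeepMind's Delegation Capability Token format; the `AuthorityToken` class, its fields, and the operation names are all hypothetical. A child token may only narrow its parent's permissions, so a subtask cannot acquire a destructive operation the root task never granted.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AuthorityToken:
    # Hypothetical token: the task it covers and the operations it permits.
    task: str
    permitted_ops: frozenset[str]

    def delegate(self, subtask: str, requested_ops: set[str]) -> "AuthorityToken":
        # Attenuation: a child token can only narrow its parent's authority.
        # Anything requested beyond the parent's permissions is refused,
        # no matter how the subtask was phrased.
        excess = set(requested_ops) - self.permitted_ops
        if excess:
            raise PermissionError(
                f"{subtask!r} requests {sorted(excess)} beyond parent task"
            )
        return AuthorityToken(subtask, frozenset(requested_ops))


root = AuthorityToken(
    "build contacts app", frozenset({"read_db", "write_code", "run_tests"})
)
child = root.delegate("add search feature", {"read_db", "write_code"})

try:
    # A cleanup subtask tries to pick up a deletion right nobody granted.
    child.delegate("clean up data", {"read_db", "delete_rows"})
except PermissionError as err:
    print(err)
```

Here the data-cleanup subtask is refused because `delete_rows` never appeared in the root token: the boundary check consults the task structure, not the phrasing of the request.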

In software development, some tools are already moving this way. Spec-driven workflows ask users to approve a desired end state before any code is written: outcome first, operations second. Lemkin, the investor in the Replit case, asked its agent to build a database; Yue, the Meta researcher, asked OpenClaw to "suggest what you would archive or delete, don't take action until I tell you to." Lemkin's request had no formal outcome specification. Yue's did, and the agent's memory system dropped it. Neither interface treated the constraint as durable.
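A toy Python sketch shows what treating a constraint as durable could mean. It assumes a rolling memory window (a `deque` with a fixed `maxlen`) as a stand-in for whatever mechanism actually dropped Yue's instruction; `HypotheticalAgent`, its methods, and the `run` helper are invented for illustration. The only difference between the two runs is whether the constraint is stored outside that window.

```python
from collections import deque

# The instruction quoted in the essay, used here as the constraint text.
CONSTRAINT = (
    "suggest what you would archive or delete, "
    "don't take action until I tell you to"
)


class HypotheticalAgent:
    """Toy agent whose working memory keeps only the most recent messages."""

    def __init__(self, memory_size: int, pin_constraints: bool):
        self.memory = deque(maxlen=memory_size)  # oldest entries fall off
        self.pin_constraints = pin_constraints
        self.pinned: list[str] = []  # stored outside the rolling window

    def receive(self, message: str, is_constraint: bool = False) -> None:
        if is_constraint and self.pin_constraints:
            self.pinned.append(message)
        else:
            self.memory.append(message)

    def may_delete(self) -> bool:
        # Deletion is blocked only while the constraint is still visible.
        visible = self.pinned + list(self.memory)
        return CONSTRAINT not in visible


def run(pin_constraints: bool) -> bool:
    agent = HypotheticalAgent(memory_size=3, pin_constraints=pin_constraints)
    agent.receive(CONSTRAINT, is_constraint=True)
    for i in range(5):  # a busy inbox pushes older context out
        agent.receive(f"email {i}")
    return agent.may_delete()


print(run(pin_constraints=False))  # constraint evicted, deletion allowed
print(run(pin_constraints=True))   # pinned constraint still blocks deletion
```

In the unpinned run the instruction is simply the oldest message and is the first to go; nothing in the agent marks it as different from ordinary email traffic.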

The DeepMind framework implies a test worth applying more broadly: if the expected outcome cannot be formally specified, the instruction has not defined a task; it has granted a license. By that standard, most open-ended AI agent use is closer to licensing than to delegation. And when outcomes go wrong under a license, the governance question changes. It is no longer about how to improve oversight of a formally specified outcome. It becomes about who bears responsibility when no outcome was ever specified in the first place.


Authors

Kida Chung-Ta Huang

Kida Chung-Ta Huang is a researcher and creative technologist working at the intersection of AI, spatial computing, and interaction design. He is a researcher at the Harvard AI and Robotics Lab, and his work has included assistive AI glasses for people with visual impairments, XR medical imaging pla...