Perspective

AI Efficiency Can Undermine Accountability Even With Humans in the Loop

Nicolas Spatola / May 5, 2026

US policymakers are rushing to build guardrails for artificial intelligence in government. One safeguard is quickly becoming the default answer: keep a human in the loop. The idea appears across the current wave of public-sector AI governance, from the White House’s recent National Policy Framework for Artificial Intelligence to emerging state legislation requiring human review, impact assessments, and agency oversight structures.

The logic is intuitive and politically reassuring. Let the system assist, let a person review the output, and accountability will remain intact.

But this assumption deserves much closer scrutiny. The real implementation problem is not only whether a human remains somewhere in the workflow. It is whether public institutions are deploying AI in ways that preserve the practical conditions of human judgment—or quietly erode them in the name of efficiency.

That distinction matters because many current policy discussions still frame accountability too formally. They ask whether the agency disclosed the system, whether an assessment was performed, and whether a human can technically intervene before a final decision is made. Those are important questions. But they do not yet tell us whether the human reviewer is actually positioned to exercise meaningful scrutiny. In practice, systems introduced to save time, reduce workload, and standardize output can also make officials more likely to defer, less likely to question, and less able to detect failure when it occurs.

This is precisely why the growing state-level push on public-sector AI is so important—and why it may still be incomplete. A recent Center for Democracy and Technology brief highlights how states such as Maryland, Kentucky, Texas, and Montana are building stronger public-sector AI frameworks through inventories, impact assessments, centralized governance structures, and human review obligations.

That is real progress. But if “human oversight” becomes the main safeguard without a deeper understanding of how AI changes decision behavior, policymakers may confuse human presence with human judgment.

Research suggests the difference is consequential. In experimental work on human versus algorithmic expertise, I found that external expertise significantly shapes decision outcomes, confidence, and metacognitive awareness in complex judgments. Human expertise still exerts a stronger influence than algorithmic expertise. But once outside expertise enters the process, it changes how people evaluate their own decisions and whether they revise them.

This means AI assistance is not simply layered on top of an unchanged decision-maker. It alters the cognitive conditions under which decisions are made.

A second finding sharpens the policy stakes. In later research on AI integration, I found that assistance formats designed for immediate efficiency—especially systems that provide direct answers rather than support deliberation—were associated with greater reliance over time. When the system subsequently produced incorrect guidance, that previously established reliance reduced people’s ability to detect even obvious errors.

In other words, a system may look successful because it improves short-term throughput while simultaneously weakening the vigilance needed for accountable judgment later on.

This is one reason current governance discussions can remain too shallow even when they sound careful. It is now common to assume that transparency, explainability, and human review together substantially solve the accountability problem. But recent work on over-reliance points to a more difficult reality. A Stanford HAI analysis found that explanations do not automatically reduce over-reliance. They help only under specific conditions, especially when engaging with the explanation is cognitively easier than doing the task unaided and when users have meaningful incentives to scrutinize the system’s output.

If the explanation is cumbersome, symbolic, or too costly to verify in real time, users may still defer.

The behavioral effects of AI efficiency are not simply psychological but deeply contextual. The organizational setting in which systems are deployed actively shapes the conditions of human judgment. High-pressure environments—where time scarcity, productivity targets, and standardization norms dominate—encourage a shift from deliberative scrutiny to routinized acceptance. This is not only a matter of “cognitive bias.” Repeated exposure to “helpful” system outputs, combined with performance incentives and hierarchical validation, produces a context-induced automation in which relying on the system becomes both socially and cognitively rational. My research on judgment in highly automated tasks suggests that as people adapt to such environments, their capacity to inhibit conformity and detect errors erodes over time, especially when the cost of questioning the system outweighs the perceived benefit of independent verification.

The same lesson appears in work on public administration. Ruschemeier and Hondrich argue that the familiar legal distinction between fully automated decisions and human decisions is too simple, because automation bias can distort supposedly human-controlled processes from within. A human may remain responsible on paper while relying too heavily on machine-generated outputs in practice.

Under those conditions, human oversight risks becoming procedural theater: present in the workflow, but too thin to function as a real safeguard.

This should matter now because the current policy moment is moving from broad AI principles to operational governance. The White House framework has already triggered debate over whether the current federal strategy is too aspirational and too weak on accountability. This framework has been criticized for focusing on effects while sidestepping the harder question of who is responsible for the structures of power and decision-making that produce those effects.

At the same time, states are trying to turn AI governance into actual administrative rules. That is exactly where a more behaviorally informed approach is needed.

The policy question, then, is not only whether governments should use AI. It is what kinds of reliance they are designing into public workflows. Are agencies creating conditions for active scrutiny or passive acceptance? Are officials being asked to interpret outputs or merely ratify them? Are explanations usable enough to support judgment, or are they functioning mainly as a compliance ritual? Are agencies measuring speed and productivity while ignoring long-term degradation in vigilance and responsibility?

If policymakers want human oversight to mean more than a reassuring slogan, they should design for judgment, not just for review. That means evaluating whether a task structure makes scrutiny realistic under time pressure; testing whether users can detect model failures after repeated exposure; requiring post-deployment monitoring of over-reliance risks; investing in training that addresses decision behavior rather than mere tool familiarity; and ensuring that override rights are operationally meaningful, not only formally available.

The next public-sector AI fight is not whether governments will use AI. That is already happening. The real fight is whether “human oversight” will refer to genuine, effortful judgment or to a procedural checkpoint that allows accountability to remain in name while the practical capacity to exercise it is designed out of the process.

Authors

Nicolas Spatola
Nicolas Spatola is a researcher in social psychology at Artimon Perspectives and a teacher at Sciences Po Reims and ESSEC, working on human-machine interaction and the cognitive and social effects of AI systems.
