Stochastic Flocks and the Critical Problem of 'Useful' AI
Eryk Salvaggio / Feb 22, 2026
The Parliament of Birds, an 18th-century oil painting by Karl Wilhelm de Hamilton. Source: Wikimedia
AI technology is advancing. Anyone thinking critically about large language models and their impact on society now faces a more complex challenge: the agentic turn.
In the industry, agentic AI refers to idealized systems that “plan”: generating code that writes more code, executing multi-step actions across apps and models, and adapting autonomously. Agents are sold less as systems that know things than as systems that build things. Rather than just generating text or other media in response to a prompt, agentic systems produce code: custom software designed to take action in the world. Structural innovations, such as producing many more outputs and averaging or verifying the results with other agents, are addressing some of the reliability problems that plagued earlier models. Code shifts LLMs into a domain where failures are presumed to be legible and correctable.
These developments are producing real improvements in the LLM user experience — but is that truly a vindication of the AI project? The press has no shortage of tone-shift think pieces arguing that these new models are transformative and demand new conceptual frameworks to consider their impact. Crucially, two of AI’s heaviest hitters, OpenAI and Anthropic, are said to be ramping up for initial public offerings in the coming months; we should, of course, expect an escalation of hype as the rivals attempt to inflate their market valuations. But we should also expect genuine advances in the models they’re selling.
These two things can be true at once. The products of these firms feel more responsive and capable, efficiently addressing complex tasks. They can write code to solve problems, and keep rewriting it until the code works. This is not a moment to deny what seems clear to many users, but rather a time to emphasize that the underlying concerns of critical work remain, despite any popular consensus about the technology’s ‘usefulness.’
The critical position
In a foundational 2021 paper, Emily M. Bender, Timnit Gebru, Angelina McMillan-Major and Margaret Mitchell described LLMs as stochastic parrots—systems that reproduce statistically likely patterns from training data. The frame holds. But now these systems are more complex and even more inscrutable, and the temptation to attribute thoughtful intent to text must now be extended to code.
Agentic systems stack these parrots into interacting outputs — a stochastic flock. The result resembles high-frequency stock-trading algorithms, but for the production of code and language. (Appropriately, the collective noun for a group of parrots is a pandemonium.)
The user experience and applications of the technology may change, but the foundational incapacity for accountability, and the underlying ideological and material infrastructures of the AI industry, remain. The paper’s central question, “Can Language Models Be Too Big?”, is as relevant as it was five years ago, given the massive investments now flowing into data collection and processing. Other researchers who laid the groundwork to identify algorithmic bias, track the environmental costs of training, challenge the labor practices of data production, or rightfully resist the uncritical adoption of AI in academia are not simply saying no for the sake of it. They hold forth the minimum human conditions for deployment, and their work remains relevant. The flock only compounds the concerns of bias, false attribution of mind, and inefficiency.
Distinguishing system critique from model evaluation is not a concession to hype: it means focusing on collective benefits and harms rather than individual uses. We can talk about what models cannot or should not do without denying what they can do.
Implications of agentic AI
Waves of slopware
One product of the agentic turn is slopware: AI-generated software applications produced faster than they can be meaningfully reviewed, often aimed at short-term problems. Code produced this way typically centers the individual user's needs over all else. Distribute this software to other users, or allow it to interact with other systems, and you create a condition similar to an unregulated airspace, with swarms of individual, disconnected decisions creating pandemonium.
Code may seem to work in narrow circumstances through all sorts of hacks that mask underlying mistakes. Unlike malware, slopware is not intentionally disruptive; it disrupts through negligence: it’s the hard-coded variable that makes a single man's household finance calculator work, but leads to overdraft fees when a single mom uses it. Designing software requires a soft touch in deciding what values and priorities it encodes and how it distorts, discards, or misrepresents the data it processes. Tracing this requires technical literacy, but also judgment about whether the code helps or harms the problem it is meant to address.
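To make that failure mode concrete, here is a minimal, hypothetical sketch (not drawn from any real application) of the kind of hard-coded assumption described above: the code runs and looks correct for the circumstances of the person who generated it, but silently encodes those circumstances as fact for everyone else.

```python
# Hypothetical "slopware" sketch: a budget check that works for its author,
# but hard-codes that author's circumstances into the logic.

PAYCHECK_DAY = 15          # hard-coded: assumes one salary, paid mid-month
MONTHLY_CHILDCARE = 0.0    # hard-coded: assumes no dependents

def safe_to_spend(balance: float, day_of_month: int, upcoming_bills: float) -> float:
    """Estimate discretionary cash until the next paycheck."""
    days_until_pay = (PAYCHECK_DAY - day_of_month) % 30
    reserved = upcoming_bills + MONTHLY_CHILDCARE
    # For a user paid on a different schedule, or with childcare costs,
    # this overstates what is safe to spend and invites overdrafts.
    return balance - reserved - (days_until_pay * 10.0)  # arbitrary per-day buffer

if __name__ == "__main__":
    print(safe_to_spend(balance=800.0, day_of_month=3, upcoming_bills=250.0))
```

Nothing in this sketch is broken in the narrow case it was generated for; the harm appears only when the code circulates beyond the assumptions baked into its constants.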
Compounding technical failures
Despite the growing perception of reliability, LLMs can never be truth machines. So-called hallucinations are mathematically impossible to eliminate, and so that perception heightens the risk of overestimating an LLM’s suitability for a task. Where a language model produces persuasive and potentially false text or other media, an agentic system produces convincing code. This code is produced in ways that are harder to interrupt, trace, or audit than a single model output. You can’t reliably document the thought process behind code created without thinking, so such code must be considered untrustworthy until it is verified.
Compounding accountability failures
Errors in an agentic system stack up invisibly until something cracks the facade. In sensitive systems, that glitch can harm people. The government is encouraged to use these systems for tasks such as automating benefits decisions, contract analysis, and regulatory review — areas where cascading failures can have serious human consequences. A rural town may use an agent to write a scheduler for rubbish pickup, only to find that it is sending requests to an Excel file that it deletes nightly. The point of government is not to save time by shifting the burden of labor onto a retiree who is wrongly denied benefits.
Pushing computational solutionism
Not every problem is a coding problem. Access to code generation pulls us toward solving policy challenges with new lines of code, and toward focusing on problems that are legible to machines. This capture of the imagination of policy, industry, academia, and media habituates us to dehumanization. When thinking within the structural needs of a computer system, the blurry edge cases are no longer the soft fibers of a social fabric, but a technical nuisance. No simulation of a community can solve that community's problems: messiness is a required step of sense-making, and of democracy.
Scaling resource extraction & waste
Agentic systems loop repeatedly, consuming far more resources than purpose-built software. When a beginner rewrites a single line of code with an agentic system, they don’t just use one model — they activate the entire flock. That scale optimizes for individual output: more, faster, regardless of the code’s efficiency or its effects. This isn’t just an environmental cost. It degrades the information commons and creates problems for anyone downstream who relies on that code. Agential swarms expand individual computational power while compressing vast networks of labor and resource extraction into a single prompt window.
The durable questions
These are just a handful of the problems that arise from agent-based systems once we assume they are useful tools. We can oppose large language models on grounds well beyond claims of uselessness. We might examine the AI industry’s political power, its pattern of sloppy deployment based on hype, the dangers of surveillance, the original sin of inhumane data extraction, or in-built biases such as misogyny and racism. It is tempting to add: “and it doesn’t even work!” Online, critics circulate memes of language model mistakes that are good for a laugh and for solidarity, but they must not be mistaken for users' everyday experiences.
These dismissals evade precisely what needs to be addressed. Systems that don’t work would pose no threat to labor; systems nobody uses would pose no threat to the environment; and systems propped up by a failing industry will collapse — all we have to do is wait. That’s not a principled or rigorous ground for critique; it’s passivity, and it does not correspond to growing perceptions of usefulness in the real world. If agentic AI sets a new direction for tech, AI safety frameworks that center a model’s “intelligence” rather than design decisions are fundamentally insufficient.
What remains urgently in dispute are the boundaries of utility: What does usefulness mean, for whom, and under what conditions? At what cost, and from whom, are benefits derived, and how are benefits and risks distributed? What decisions are quietly removed from public deliberation and handed to automated systems controlled by corporations, governments, and other institutions? That people are using language models doesn’t make criticism of them irrelevant. It makes it urgent.