Perspective

Cloudflare’s Troubling Shift From Guardian to Gatekeeper

Luke Hogg, Tim Hwang / Jul 9, 2025

Last year, the internet infrastructure and cybersecurity behemoth Cloudflare launched a new “easy button” intended to allow websites to “declare their AIndependence” by blocking all AI bots with a single click. Now, the company is doubling down on this approach by “changing the default to block AI crawlers unless they pay creators for their content,” in effect becoming the new toll operator for the web. Cloudflare presents this as a boon for creators, promising to protect content from unauthorized scraping and to give website owners control over how their data is used. The idea, as CEO Matthew Prince said in a recent interview, is simply to “change the way the internet works.”

But it is deeply troubling that a single company, one that controls over 82 percent of the global market for DDoS and bot protection software, is making such a move. It capitalizes on fear of a new technology to undermine the very principles of the open internet. It threatens to entrench the power of incumbents by centralizing control and reinforcing digital walled gardens.

By making it trivially easy to paywall the entire internet, Cloudflare effectively fragments the web into sealed-off silos, undermining the openness that has driven decades of online innovation and knowledge-sharing. Perhaps more importantly, this one-click blockade raises serious competition concerns. It positions Cloudflare as a central arbiter of what information flows on the internet. Rather than one company dictating how web crawling and content access should work, we should be doubling down on open, community-driven solutions to give both creators and crawlers more balanced control.

Undermining the open internet

The internet relies on ideals of open access to information and common protocols that any responsible actor can use. For decades, search engine crawlers, web archivers, and researchers have peacefully traversed the web under a shared understanding: site owners could signal their preferences via robots.txt files, and well-behaved crawlers would respect those wishes. This voluntary protocol (the text file that runs the internet) was introduced nearly 30 years ago as a simple way for websites to tell bots the rules by which they wanted to be crawled. It was never a perfect system, but it embodied the cooperative spirit of the web: a clear signal and good manners in lieu of centralized control. Crucially, it allowed openness to flourish; content remained broadly accessible, and crawlers enabled discovery, archiving, and research, all without a central authority micromanaging the process.
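To make that cooperative model concrete, here is a minimal sketch, in Python's standard library, of how a well-behaved crawler checks a site's robots.txt before fetching a page. The domain and user agent names are placeholders, not real services.

```python
# Minimal sketch: a polite crawler consults robots.txt before fetching.
# "example.com" and "ExampleResearchBot" are placeholders for illustration.
from urllib import robotparser

rules = robotparser.RobotFileParser()
rules.set_url("https://example.com/robots.txt")
rules.read()  # fetch and parse the site's stated crawling preferences

target = "https://example.com/articles/some-page"
if rules.can_fetch("ExampleResearchBot", target):
    print("Permitted by robots.txt; crawl politely and honor any crawl delay.")
else:
    print("Disallowed by robots.txt; skip this URL.")
```

Nothing in this exchange is enforced technically; it works only because the crawler chooses to identify itself honestly and to honor the site's wishes.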

Cloudflare’s one-click AI bot block is a sharp break from that tradition. Instead of relying on open standards and transparency, it uses Cloudflare’s proprietary network to unilaterally shut out automated visitors. Flip the switch, and Cloudflare will automatically block a maintained list of AI crawlers from accessing your site. Cloudflare boasts that its machine learning models can even identify bots that try to disguise themselves by faking their user agent string. In Cloudflare’s view, this hard blockade is necessary because many AI data miners simply ignore robots.txt and other polite signals. Indeed, abusive scraping by AI bots has become a real headache; small web publishers are seeing surges in bandwidth usage from AI crawlers that disregard robots.txt, driving up costs and threatening site stability.

These are real issues, but Cloudflare’s cure may be as harmful as the disease. By encouraging easy, blanket blocking, Cloudflare prevents the natural experimentation that would enable publishers and AI companies to establish a new commercial equilibrium that works for both sides. Instead, it slams the door shut. Today’s AI models certainly raise valid ethical and economic concerns, but tomorrow’s innovative search engines, research crawlers, or educational AI assistants could be stunted by the content blackouts we enact now. In a stark departure from the web’s universality, we risk creating a fractured internet where only certain privileged services can crawl certain content.

Cloudflare itself acknowledges the stakes. In its 2024 announcement, the company warned that without better tools, site owners would resort to paywalls or exit the open web entirely, and “AI model providers will struggle to find and access the long tail of high-quality content on smaller sites.” Ironically, this is exactly what Cloudflare’s one-click blocker accelerates. Instead of an open commons of information, we get a patchwork of closed doors. Yes, some big websites might license data to select AI companies, but smaller sites that lack the clout to cut deals could simply vanish from the datasets that power the next wave of technology. As the executive director of Common Crawl put it, efforts to remove content from open web archives threaten to “kill the open web,” potentially sidelining researchers and entrenching the power of well-resourced tech giants in AI. If we’re not careful, Cloudflare’s solution will produce a similar outcome: an internet where large players still get what they need through paid arrangements, but the open, public benefits of web crawling—from academic research to small startup innovation—are lost.

Higher walled gardens

Beyond the philosophical issues, Cloudflare’s move poses serious competition concerns. Cloudflare is no ordinary company; its reverse-proxy and CDN service sits in front of a huge chunk of the internet. According to web surveys, Cloudflare is used by roughly 20 percent of all websites. Among sites that use a reverse proxy, Cloudflare commands over 80 percent share. And, as noted above, it controls more than 82 percent of the market for DDoS and bot protection software. This means Cloudflare’s policies become internet policy for a sizable portion of the web. When Cloudflare gives every site on its network a one-click kill-switch for bots, it is effectively centralizing decisions about who can crawl vast amounts of information. Given the power of defaults, the company’s dashboard toggle could black out access to millions upon millions of pages.

This kind of centralized control runs contrary to the decentralized nature of the internet. Traditionally, each website sets its own crawling rules, and each crawler operator abides by them or risks being blocked. It wasn’t a perfect equilibrium, but it was distributed; no single entity could unilaterally lock out crawlers across the entire web. Cloudflare’s scale changes that. If AI companies or even benign web crawlers want to access content from a significant portion of sites, they may soon find that Cloudflare stands in the way as the gatekeeper.

Cloudflare maintains the global block list of AI bots, which it updates periodically and unilaterally. This puts the company in a powerful position to decide what counts as an “offending” bot and what doesn’t. It is not hard to imagine scenarios where this power could be misused or create conflicts of interest, especially with Cloudflare itself getting into the AI game. For instance, if an upstart AI crawler emerged that promised a more open approach or competed with a Cloudflare partner, would it conveniently end up on the block list? Even if Cloudflare acts in good faith, the lack of transparency and public oversight in how these decisions are made is troubling. The internet’s crawling rules should not be dictated by a single corporation’s algorithms.

Moreover, Cloudflare’s initiative threatens to create new walled gardens and tollbooths around content. The company is openly planning to monetize this blockade. In Cloudflare’s own words, it is previewing a feature for site owners to “set a price for their site… and charge [AI] model providers based on their scans,” with Cloudflare handling the mechanics of the payments. Let that sink in: the open web, which historically allowed free indexing of information in exchange for traffic or attribution, could morph into a web where every piece of content is behind a negotiation for AI usage. And Cloudflare aims to be the broker of those deals.

This kind of centralization should make everyone very uneasy. It diminishes competition both in web hosting and CDN services and in AI and data-mining services by placing Cloudflare at the strategic center of both.

Strengthen standards, not silos

So what should we do? Creators deserve tools to set boundaries and even seek compensation for the use of their work. But those tools must be rooted in open standards and collective governance, not the dictates of any single company. We should be promoting effective, open-source methods that give both crawlers and creators more control. One promising avenue is to modernize and formalize the Robots Exclusion Protocol (robots.txt) for the AI age.

Encouragingly, this work has already begun. After decades as a de facto standard, robots.txt was finally adopted as an official internet standard (RFC 9309) in 2022, laying a foundation to extend it. Now, discussions are underway in the Internet Engineering Task Force (IETF) and among web stakeholders to enhance these protocols for AI. This includes ideas like new robots.txt directives specifically for AI, or a complementary AI.txt file where sites could express granular preferences. There are also proposals for embedding machine-readable metadata in webpages or media files themselves, creating digital markers where content carries a “Do Not Train” or “Allow AI” flag with it wherever it goes.
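To illustrate, the current standard already lets a site set different rules for different crawlers; the proposals under discussion would add a vocabulary for purposes such as training versus search. The sketch below is illustrative only: the bot names are placeholders, and the purpose-based directive at the end is a hypothetical rendering of the ideas being debated, not part of any adopted standard.

```
# Per-crawler rules are already possible under the current standard (RFC 9309).
# The bot names below are placeholders.
User-agent: ExampleAISearchBot
Allow: /

User-agent: ExampleAITrainingBot
Disallow: /

# Hypothetical purpose-based directive of the kind under discussion;
# this syntax is illustrative and not an adopted standard.
Content-Usage: ai-training=n, ai-search=y
```

The point is not the particular syntax but that any site, on any host, could publish such preferences in a form every compliant crawler can read.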

These efforts aim to create a common language of consent for web content, rather than a patchwork of proprietary solutions. An open standard approach would allow any website, whether it uses Cloudflare or not, to declare “yes” or “no” to various kinds of AI usage in a machine-readable format. Crucially, it would also treat crawlers equitably; any crawler that abides by the standard could access content where permitted, and those that don’t would clearly be in the wrong. It shifts the frame from Cloudflare’s one-off list of “good vs bad bots” to a universal rulebook built through consensus that everyone can refer to. Compliance could then be encouraged not just by technical blocks, but potentially by future legal frameworks.

Most importantly, an open, standardized solution avoids creating a single point of control. It keeps the web’s governance in the hands of multi-stakeholder processes and open-source tools, where transparency is higher and no one company monopolizes the crawler-content relationship. It also leaves room for nuance. Maybe a site operator is okay with being indexed by an AI-powered search engine that might drive traffic via citations, but not okay with being ingested wholesale into a training set. A standardized protocol could allow different flags for different purposes. Maybe individual creators on a platform want to opt out even if the platform as a whole doesn’t; we could devise ways to respect that, too.

It is tempting to see Cloudflare as a savior for embattled content creators, swooping in with an easy fix to stop the onslaught of AI scrapers. We recognize that website owners and content creators have been caught between a rock and a hard place. But rather than letting a single company set the terms of engagement, we believe in effective, open-source methods for giving crawlers and creators more control. We’ve called for strengthening the robots.txt standard for this very purpose. We should invest in open, collaborative approaches that let content owners set terms in a consistent way, and let innovative services access information when they honor those terms. That vision might not come with a shiny “easy button” or immediate profits, but it’s the one that will keep the internet truly open and vibrant for the future.

Authors

Luke Hogg
Luke Hogg is director of outreach at the Foundation for American Innovation where his work focuses on the intersection of emerging technologies and public policy.
Tim Hwang
Tim Hwang is General Counsel and a senior fellow at the Foundation for American Innovation focused on emerging technologies and national security. He is also a Senior Technology Fellow at the Institute for Progress, where he runs Macroscience. Previously, Hwang served as the General Counsel and VP O...
