Perspective

Cloudflare Wades into the Battle Over AI Consent and Compensation

Courtney Radsch / Jul 9, 2025

Courtney C. Radsch is director of the Center for Journalism and Liberty at the Open Markets Institute and a nonresident senior fellow at Brookings, the Center for International Governance Innovation, and the Center for Democracy and Technology, and serves on the board of Tech Policy Press.

Matthew Prince, co-founder and chief executive officer of Cloudflare Inc., at the Semafor World Economy Summit in Washington, DC, on Thursday, April 24, 2025. Photographer: Kent Nishimura/Bloomberg via Getty Images

Cloudflare, a company that handles about 20 percent of all internet traffic, took a significant step in the escalating war between humans and the AI companies that have treated the open web as a limitless, consequence-free training ground. It will now block AI crawlers by default—a quiet but radical reversal of the status quo where bots could crawl first and ask questions never. It also launched a Pay-Per-Crawl marketplace that lets a select group of publishers charge per‑page access fees to AI companies.
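Cloudflare has described Pay-Per-Crawl as reviving the long-dormant HTTP 402 "Payment Required" status code. The sketch below shows roughly how a paying crawler might negotiate access under that model; the header names and price format are illustrative assumptions, not Cloudflare's documented API.

```python
# Hypothetical sketch of an HTTP 402 pay-per-crawl negotiation.
# Header names ("crawler-max-price", "crawler-price", "crawler-charged")
# are assumptions for illustration, not Cloudflare's documented interface.
import requests

MAX_PRICE_USD = 0.01  # the most this crawler will pay for one page

resp = requests.get(
    "https://publisher.example/article",
    headers={"crawler-max-price": str(MAX_PRICE_USD)},
)

if resp.status_code == 402:
    # The publisher quoted a price above our cap; the refusal must be respected.
    quoted = resp.headers.get("crawler-price", "unknown")
    print(f"Blocked: page priced at {quoted}, above our cap of {MAX_PRICE_USD}")
elif resp.ok:
    charged = resp.headers.get("crawler-charged", "0")
    print(f"Fetched {len(resp.content)} bytes; charged {charged}")
```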

Last week’s announcements mark a deliberate turn away from the ‘scrape-now-ask-later’ model that has underpinned large swaths of generative AI development, embedding consent and compensation directly into web infrastructure. In one move, Cloudflare gave publishers back something they’ve been desperately trying to reclaim: control, consent, and compensation.

Cloudflare’s default setting is especially notable: new domains automatically block AI bots unless explicit permission is granted. Existing domains can toggle the setting on but will not have their defaults changed. This effectively makes opt-in the default for crawling, something publishers around the world have sought as text-and-data-mining copyright exceptions turn copyright on its head by making opt-out, rather than opt-in, the default for AI training. Cloudflare is also introducing cryptographic bot verification and transparency dashboards, which together allow publishers to see who is crawling their site, how often, and whether that crawling returns referral traffic.
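Cryptographic verification matters because a User-Agent string is trivial to spoof. Cloudflare's public proposal for verified bots builds on HTTP Message Signatures (RFC 9421); the sketch below strips that down to a bare Ed25519 signature to show the core idea, and is a simplification rather than Cloudflare's actual scheme: the bot operator publishes a public key and signs each request, and the edge checks the signature instead of trusting the label.

```python
# Minimal illustration of signed-crawler verification (not Cloudflare's
# actual scheme, which layers this idea into HTTP Message Signatures).
# Requires the "cryptography" package.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PrivateKey

# Bot operator: generate a keypair; the public half is published somewhere
# the edge can fetch it (e.g., a well-known URL). Each request gets signed.
private_key = Ed25519PrivateKey.generate()
public_key = private_key.public_key()

signed_material = b"GET /article\nhost: publisher.example\nagent: ExampleBot"
signature = private_key.sign(signed_material)

# Edge/CDN: verify the signature against the published key before letting
# the request through; spoofed or unsigned traffic falls to the default block.
try:
    public_key.verify(signature, signed_material)
    print("Crawler identity verified")
except InvalidSignature:
    print("Unverified crawler; applying default block")
```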

Spoiler alert: referral traffic has plummeted, and few think it is ever coming back as generative AI summaries and chatbots become the new interfaces for finding information and interacting online. The collapse of referral traffic and audience reach will hit not just digital advertising revenue but also conversion rates. I’ve heard from countless publishers and collective management organizations (which represent rights holders) that referral traffic is down significantly while bot traffic is up.

Cloudflare’s data paints a stark picture of how AI crawlers operate at indefensibly extractive levels: OpenAI’s scraping-to-referral ratio is 1,700:1, and Anthropic’s is 73,000:1. Traditional search engines, particularly the market-dominant Google, still drive traffic, but increasingly, initial answers come from AI models that summarize content without redirecting traffic. Google’s crawler now scrapes around 14 pages per referral, up from 6:1 six months ago (and 2:1 ten years ago), even as Google’s total search impressions increased nearly 50 percent in the year after it launched AI Overviews (and as it was found liable for maintaining an illegal monopoly in search).

These figures crystallize what publishers and creatives have long argued: AI development is built atop uncompensated creative labor and poses an existential threat to those who provide the content that makes the Internet such a valuable public resource.

Bot traffic not only fails to generate revenue but also imposes costs on the target website (think hosting, bandwidth, and overage fees), strains its servers, degrades site performance and user experience (think slower load times, broken pages, even broken sites), and poses security risks. It can even knock entire sites or archives offline. One major sports site reportedly got 13 million AI-bot visits per month but just 600 human visitors, a ratio that underscores the unsustainability of current AI crawling.

“The change in traffic patterns has been rapid, and something needed to change. This is just the beginning of a new model for the internet,” Stephanie Cohen, Cloudflare’s Chief Strategy Officer, said in an interview with Reuters.

What the laws have not delivered

Meanwhile, courts around the world are mired in decisions about how and whether to establish a consent-based system for data collection and content usage. More than 70 lawsuits have been filed against AI companies, including OpenAI, Meta, Microsoft, and Anthropic, by everyone from novelists and visual artists to software developers and news publishers. As the copyright lawsuits wind their way through the courts, they have proceeded slowly and with murky standard-setting outcomes. Furthermore, relying on reactive legal rulings rather than proactive consent structures is too slow and risks locking in the current extractive model of AI development.

Lawmakers, for their part, are debating how to balance respecting the interests of rights holders with the claims of those who argue that enforcing copyright will threaten AI innovation.

Cloudflare’s strategy sidesteps that lag. By baking verification, choice, and payment into infrastructure, it’s creating a live consent protocol that works today—precisely what policymakers have thus far failed to deliver.

“If the Internet is going to survive the age of AI, we need to give publishers the control they deserve,” said Cloudflare CEO Matthew Prince.

Infrastructure as economic arbiter

Cloudflare provides security, performance optimization, and DDoS protection to roughly one-fifth of the internet, operating at what is often called the infrastructure layer. That means its defaults play a standard-setting role, and we know just how influential defaults can be from recent antitrust trials against Big Tech behemoths like Apple and Meta. Cloudflare’s data shows that fewer than 40 percent of the top 10,000 domains even have a robots.txt file, the machine-readable file that gives crawlers basic instructions, and those instructions are voluntary and often ignored anyway.
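For the sites that do publish one, robots.txt is just a plain-text list of per-crawler rules. Below is a minimal sketch of an opt-out policy, using real published crawler names (GPTBot is OpenAI's, ClaudeBot is Anthropic's) and checked with Python's standard-library parser; note that nothing in the file itself enforces compliance.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that denies known AI training crawlers but allows everyone
# else (e.g., search indexers). Compliance is voluntary: nothing here
# *enforces* the policy, which is why default blocking at the CDN matters.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: *
Allow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

print(parser.can_fetch("GPTBot", "https://example.com/article"))     # False
print(parser.can_fetch("Googlebot", "https://example.com/article"))  # True
```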

By embedding guardrails at the Content Delivery Network (CDN) layer, the Silicon Valley-based corporation gains sweeping influence over the dynamics of AI crawling, and it could also set the standard for how large a cut marketplace intermediaries take in transactions between publishers and AI companies. A Cloudflare spokesperson would not disclose how much the company will take as an intermediary.

Danielle Coffey, President and CEO of the News/Media Alliance, which represents over 2,000 publishers primarily in the US, expressed excitement to me over the announcement, calling it “an important step towards strengthening an already-robust market for licensed content that compensates publishers for their copyrighted works.” Forthcoming results of research we are conducting at the Center for Journalism & Liberty show that more than a dozen companies have emerged to connect rights holders with AI companies that want access to rights-cleared content and data.

For now, Cloudflare’s monetization pilot is only available to a handful of major publishers in the US, though Coffey said the move could be “particularly helpful for small and local publishers who lack the resources needed to combat sophisticated AI crawlers on their own.”

Many smaller publishers and local newsroom leaders I have spoken with struggle to keep track of the constantly evolving landscape of bots, AI crawlers, and monetization opportunities. Having a major company like Cloudflare set a new precedent could help both directly and indirectly, especially if companies like Automattic, which uses Cloudflare and owns WordPress.com, the publishing platform used by many publishers and content creators, pass the default protections through to their clients.

This is policy in the form of infrastructure: it defines who can enter, under what conditions, and at what cost. In a field where legislative progress has lagged, Cloudflare is stepping into the void—challenging both market and regulatory norms.

Gatekeeping in the AI age

The shift to a permission-based and monetized AI bot marketplace introduces a new form of gatekeeping. Cloudflare isn’t just verifying crawlers and enforcing consent-based extraction—it’s creating the platform through which bot access is negotiated, metered, and monetized. This makes it a kind of regulator-by-default, which raises questions about whether Cloudflare’s new system will entrench its own market position or set de facto standards for the AI economy. On the one hand, all of the AI licensing startups I’ve spoken to are aligned on the need for a permission-based system, but lack the market power to enforce one, especially given the legal and policy vacuum that remains.

But if Cloudflare becomes a primary authority on crawler access, how will this shape the emerging AI data licensing marketplace, or the ability of smaller AI licensing startups and API providers to compete? Will they find themselves at a structural disadvantage against Cloudflare’s emerging ecosystem? Smaller players in the AI licensing ecosystem, from startups to academic researchers, may lack the scale or reach to participate meaningfully. Meanwhile, Cloudflare’s ability to set pricing structures or exclude certain bots could shape norms for the entire AI training economy.

Policy implications: consent, power, and the quiet redesign of the Web

Cloudflare’s move introduces a consent-based architecture for AI crawling—something long demanded by journalists, creators, and policymakers alike. But as with any shift at the infrastructure level, it opens up a new set of questions about governance, accountability, and concentration of power. Below are some of the core questions the move raises—and why they matter.

1. Consent vs. neutrality

At what point does an infrastructure provider move from being a neutral intermediary to a policy-setting entity?

Cloudflare’s controversial role in removing service from sites like Kiwi Farms and the Daily Stormer already sparked backlash over infrastructure neutrality. Then, it was content moderation; now, the shift is economic. Either way, the company is clearly asserting policy-like controls over the web’s plumbing.

For technologists, journalists, and policy advocates, this raises urgent questions: Should infrastructure governance require transparency and oversight? If Cloudflare structures consent, should its rules be public, standardized, and accountable? How should the company design its criteria for participation in the marketplace, and will those criteria be publicly available?

For years, Cloudflare insisted that it merely provided a conduit: routing traffic, mitigating DDoS attacks, and accelerating performance without taking a position on what content flowed through its pipes. That position was challenged in 2022, when the company terminated service to Kiwi Farms, citing an “imminent threat to human life,” and in 2017, when it refused service to The Daily Stormer. These were watershed moments in content moderation, not because of what Cloudflare did, but because of who Cloudflare is: an infrastructural player in the internet stack, one step below the application layer, where content moderation issues typically take place.

Now, with its AI crawler policy, the company is not just moderating harmful speech—it’s moderating economic relationships. This is a meaningful expansion of influence. Cloudflare is defining who can access content, under what terms, and with what mechanisms for enforcement. That’s no longer just an operational service. It’s a form of private governance—akin to a gatekeeper deciding who gets through the toll booth.

2. Power and centralization

Could Cloudflare’s new system unintentionally entrench its own market position or set de facto standards for the AI economy?

Right now, the AI licensing market is still in its early stages. A handful of startups, like TollBit, SphereAI, and ProRata, are trying to build marketplaces for licensed data. Some large publishers, like Axel Springer and the Associated Press, have signed direct deals with big AI companies, but this is neither scalable nor desirable for the long tail of publishers and content creators, nor for the long tail of AI developers that need rights-based access to reliable, cleared content. This is especially true of most of the web’s content, which falls into a gray zone: publicly accessible, often scraped, rarely licensed.

By embedding a monetization protocol at the infrastructure level, Cloudflare gives publishers a powerful new tool—but also inserts itself as a key intermediary. If the system works well, it may become the default path for anyone trying to charge for AI access. But if Cloudflare ends up controlling the authentication, verification, and pricing pathways for a significant share of global web traffic, the power dynamic shifts: from a decentralized set of publisher preferences to a unified, infrastructure-mediated framework. That could crowd out smaller intermediaries or alternative business models.

We’ve seen this before. Google’s ad stack consolidated through a mix of usefulness and ubiquity (and Google, too, was recently deemed an illegal monopolist in adtech). Cloudflare’s crawler stack could do the same.

3. Standards and interoperability

Will we see a race to build standardized protocols—or a fragmentation of crawling permissions across different infrastructure providers?

Right now, Cloudflare is rolling out its own system. But what happens when Amazon, Fastly, or Akamai launch their own bot authentication systems or monetization frameworks? Will each require separate negotiation? Separate verification? This kind of fragmentation could disadvantage small AI startups and open-source projects that can’t afford to comply with a dozen different systems.

On the flip side, if a unified standard emerges (say, a machine-readable license tag built into HTTP headers), it could make the web more navigable and more accountable. That would mirror efforts like Creative Commons or Fairly Trained, which offer standardized signals about usage rights. But for that to happen, it would likely require coordination among competitors and possibly even regulatory encouragement.
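No such tag exists today, so the sketch below is entirely hypothetical: it imagines a "Content-Usage" response header carrying per-purpose terms, and a well-behaved crawler that checks the tag before ingesting a page. Every header name and value here is invented for illustration.

```python
# Hypothetical machine-readable license tag in an HTTP response header.
# "Content-Usage" and its vocabulary are invented for this sketch; no such
# standard currently exists.
import requests

PURPOSE = "ai-train"  # what this crawler would use the fetched content for

resp = requests.get("https://publisher.example/article")
tag = resp.headers.get("Content-Usage", "")
# e.g. "Content-Usage: search-index=allow; ai-train=deny; price=0.005USD"
terms = dict(
    part.strip().split("=", 1) for part in tag.split(";") if "=" in part
)

if terms.get(PURPOSE) == "deny":
    print(f"Publisher denies '{PURPOSE}' use; skipping this page")
else:
    print(f"No declared restriction on '{PURPOSE}'; proceeding:", terms)
```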

Without such coordination, the risk is Balkanization of the open web: a patchwork of consent signals, enforcement rules, and pricing APIs that reinforce the dominance of companies big enough to pay for scale.

4. Regulatory gaps and government inaction

What does it say that infrastructure companies are doing what lawmakers have failed to? For years, the generative AI boom has proceeded in a legal vacuum. US copyright law remains unresolved on whether ingesting copyrighted content to train AI models constitutes fair use. Courts have yet to rule decisively, and litigation is slow. Policy proposals—from data trusts to licensing collectives—are still on whiteboards.

Into this vacuum stepped Cloudflare. Not with legal doctrine, but with code. The company didn’t need to wait for Congress to pass an AI data rights bill. It simply altered its bot detection protocols, rewrote default settings for crawler access, and gave (some) publishers the choice to monetize. It did in weeks what policymakers have been debating for years.

That’s impressive, but also revealing. When a single company can implement a rights-respecting, consent-based access regime at scale, it’s worth asking why public institutions have failed to do the same, and whether relying on private actors, however well-intentioned, to establish norms is a sustainable path forward. We have seen the repercussions of relying on voluntary industry initiatives over the past two decades, as a handful of companies have come to dominate the platforms we use to communicate and conduct business, and, more recently, as those platforms have become politicized.

These policy implications are not just technical. They go to the heart of who governs the web in the AI age. Cloudflare’s intervention is meaningful not only for what it enables—publisher control—but for what it signals: infrastructure can now enforce policy choices that lawmakers have not yet codified. That may be good news for consent, but it should also spark a deeper debate about power, protocol, and public accountability in a rapidly privatizing internet.

A moment worth tracking

Cloudflare’s platform-level consent tool could be a serious corrective to unfair data extraction. It also shows what is possible outside legislative channels, especially when legislation is slow to emerge. And it reminds us that consent isn’t enough: the stewards of that consent need accountability.

In the shifting landscape of generative AI, infrastructure providers are no longer passive conduits—they are hosts, regulators, and gatekeepers.
