Protecting Society from Radioactive Data
Trey Herr / Jul 21, 2025

Trey Herr, PhD, is senior director of the Cyber Statecraft Initiative (CSI) at the Atlantic Council, and an assistant professor of global security & policy at American University’s School of International Service.
Contrary to the phrase often repeated in industry and policy circles alike, data isn’t oil. In fact, that worn-out phrase, which has dominated discourse in technology circles for at least a decade, obscures more than it clarifies. It has helped lend a veneer of credibility to the rapacious collection of personal data on even the scant belief that it may hold future value. Recent breaches—such as the leak of mobile phone location data from the broker firm Gravy Analytics, or of protected health data from a laboratory services firm that counts Planned Parenthood as a customer—illustrate the danger of this lazy metaphor.
A better way to think about data is as a radioactive material, a resource both powerful and dangerous enough that continuous action is required to avoid harm from it. This distinction is both clarifying and hugely consequential when you extend the logic.
Shifting this metaphor is not an academic exercise. For a sense of the practical impact and real-world urgency, look at the unwinding bankruptcy of the genetics firm 23andMe. A 2023 data breach had already resulted in the leak of genetic data on almost half of the company’s 14 million customers, and its impending breakup has stoked fears of wildcat sales of even more such data. Or consider that large datasets remain a crucial input to generative AI models. Legal battles over how that data can be collected, retained, and protected are escalating.
Magic of a metaphor
Since the late 2000s, the metaphor “data is the new oil” has been popularized in countless policy conversations and documents. Examples include mentions by the EU’s Consumer Commissioner in 2009 and Commissioner for the Digital Agenda in 2012, by CEOs including IBM’s Ginni Rometty in 2013 and Under Armour’s Kevin Plank in 2016, and by national leaders like China’s Xi Jinping, who in 2017 called data “a new factor of production, a fundamental strategic resource…,” and Kenya’s Information and Communication Technology Principal Secretary, Jerome Ochieng, in 2019. Ochieng’s comment was particularly salient, as it came in response to questions about a novel Kenyan biometric data collection project being challenged in court over a lack of privacy protections.
“Data as oil” sought to describe the untapped value of human behavior and information recorded digitally. Data was framed as a resource to be harvested and a rich source of value to be contested and competed for, later conceptualized as an asset to be bought, sold, and traded. The metaphor led to data being discussed and understood along a spectrum defined almost exclusively by absolute market value, as if it carried none of the negative externalities that burning oil does in the real world.
This metaphor captures some truth: data is valuable, and that value grows as different types of data are combined and datasets expand. The capabilities of the current generation of generative AI models—solving complex problems like protein folding, creating rich media content, and interacting with the internet—are rooted in huge volumes of images and words concentrated together.
But we should think about the value of data relative to risk and the cost of mitigating that risk. That’s why it’s best to think of data as radioactive material.
Data holds tremendous power when collected and concentrated, aggregated, combined, and refined. Testing and refining novel drugs requires not just anecdotal patient experiences but huge volumes of health data from tens or hundreds of thousands of people. The latest generative AI models aren’t trained on a single lengthy Reddit thread or even the collected works of James Joyce, but on something on the order of every book in the Library of Congress. That power can be harnessed for significant economic and security gains, providing both commercial and social benefit. Nuclear power was a technical inflection point in civilian energy generation and underwrote a paradigm shift in the conduct of undersea warfare. The act of concentration, however, simultaneously creates conditions for catastrophic damage. The scale and sophistication of modern data collection and analysis necessitates a paradigm shift in how we conceptualize data in relation to economic activity, information security, and privacy.
Concentrate at our own risk
The harms of concentrated data have been repeatedly demonstrated in recent years. Here I focus on data generated by people: both personal and behavioral data.
- Unintended exposure of sensitive information. The 2017 Equifax breach exposed the sensitive financial information of 147 million Americans—nearly half the US population—because one company had concentrated massive amounts of financial data in a single vulnerable repository. The breach created lasting vulnerability to identity theft and financial fraud for millions who never chose to do business with Equifax directly.
- Abuse to compel, coerce, or unduly influence. The Cambridge Analytica scandal, which came to light in 2018, demonstrated how concentrated social media data could be weaponized for political manipulation, involving the data of as many as 87 million Facebook users and potentially influencing democratic processes. The harm stemmed not just from misuse, but from the combination of different data types and such a significant number of behavioral data points; concentration is what made such misuse possible.
- Deanonymization and illegal sale or use beyond any authorized purpose. Firms like Clearview AI have apparently scraped billions of facial images without consent, creating repositories that fundamentally change the landscape of privacy. By aggregating publicly available images into a searchable database, Clearview created risks that did not exist when those images were dispersed across the internet.
Why the oil metaphor falls short
The radioactivity paradigm helps change a pivotal assumption about data—namely that it can only pose a risk if misused. Too often policy debates in cybersecurity are framed around assumptions treated as ironclad—for example, that every business will collect as much data as possible, and that policy should focus on “best practices” for securing it.
"Data as oil" only encourages this mindset for both businesses and policymakers: if data is the world's most valuable unrefined commodity, why shouldn't businesses capture every drop in case it proves useful in the future?
This creates perverse incentives: companies are rewarded for hoarding data, maintaining control indefinitely, and treating the size of their data stockpiles as a direct competitive advantage. It’s a mindset that leads to excessive collection and retention with minimal consideration of the inherent risks, driving an escalating cycle in which market leaders accumulate ever-larger repositories. Applying this logic to competition between states only magnifies the perversion, facilitating arguments that the odds of national survival are correlated with the volume of information wrung from society. The result is what Gary McGraw, Dan Geer, and Harold Figueroa recently termed “data feudalism.”
Unlike oil, however, these data stockpiles don’t just represent untapped value—they generate exponentially greater risk the larger they get, a distinction the oil metaphor dangerously obscures. The oil metaphor also implies that data can only pose a risk if misused, which fails to acknowledge a critical reality: the very act of concentrating data is materially risky. Much like the radioactive material at the heart of a nuclear reactor, continual positive action is required to avoid harm. This isn’t the first piece to argue in this direction; Cory Doctorow in 2008 and Maciej Ceglowski in 2015 made much the same case.
Thinking about data’s value and risks together
Thinking of data as radioactive material reframes the value of data in relative, rather than absolute, terms. What value is worth the risk, and can you, dear data collector, establish a risk-adjusted value of your dataset? Just as radioactive material becomes exponentially more dangerous as it’s concentrated, data aggregation creates exponentially increasing risk. Like radioactive material, data has a “half-life”—its value decreases over time but it remains potentially harmful for extended periods. A data breach today can create a vulnerability for years or decades.
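To make the half-life analogy concrete, here is a purely illustrative formulation; the functional form and the separate decay constants are assumptions for exposition, not empirical claims about any dataset:

```latex
% Illustrative analogy only; T, T_v, and T_h are hypothetical half-lives.
% Radioactive decay: after each half-life T, half the material remains.
% For data, assume market value V and harm potential H decay separately,
% with harm persisting far longer than value (T_h >> T_v).
\[
  N(t) = N_0 \cdot 2^{-t/T}, \qquad
  V(t) = V_0 \cdot 2^{-t/T_v}, \qquad
  H(t) = H_0 \cdot 2^{-t/T_h}, \quad T_h \gg T_v
\]
```

On this reading, a breached dataset can retain its harm potential long after its commercial value has decayed toward zero, which is why retention decisions cannot be driven by value alone.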
Calculating the value of personal data has been a central challenge for Web3 advocates attempting to reengineer the World Wide Web, and it’s a rock on which many data governance, privacy, and protection proposals founder. There are no magical solutions—this is a hard problem, particularly if the goal is a universal solution. Judging the risk-adjusted value of concentrating data must begin with the organization collecting and concentrating it. Just as society demands justification from groups concentrating radioactive material, we should actively question the concentration of data.
How those arguments shake out—in startups, research labs, companies, and universities—can help shape a more consistent, collective approach. Here are several principles that might help.
- First, don’t reinvent the wheel; strengthen enforcement of existing data governance rules. The principles of collecting data only for a clearly defined purpose, minimizing what is collected to that purpose, and providing for consent by those from whom data is collected are all embedded in various policy models. You’ll find them in the European Union’s General Data Protection Regulation (GDPR), India’s Digital Personal Data Protection Act (DPDPA), and even the Organization for Economic Cooperation and Development (OECD)’s privacy and data flows principles. Data minimization captures these practices and offers a more easily specified, controlled, and overseen means of reducing the risk of concentrated and refined data, and it can be applied and enforced more effectively, especially within organizations seeking to collect and concentrate personal data.
- Second, promptly disaggregate and delete data once it has served its intended purpose. In data privacy circles this is the idea of limiting data retention. It mirrors how modern nuclear reactor designs increasingly rely on smaller amounts of fissile material, and how nuclear facilities minimize on-site radioactive material to reduce inherent risk. After data has been narrowly collected with consent and used for a specific purpose, it should be deleted. Policy should more aggressively enforce time-bound limits on concentration and incentivize privacy-preserving technologies that deliver insights without requiring dangerous levels of aggregation. (A brief sketch of how this principle and the first might look in practice follows this list.)
- Third, acknowledge that consent does not travel. Consent can only be given to a specific and time-bound purpose. Users navigating a thicket of legalese just to access a website aren’t equipped or prepared to consent to myriad hypotheticals on how their data might be used for some future and often ill-defined purpose. Yes, organizations must provide data subjects with clear, accessible disclosures about their practices, secure affirmative consent where appropriate, and implement strict controls to prevent unauthorized access or misuse. But user consent should also be sought for any subsequent collection and at the point that the business purpose of collecting that data changes.
- Fourth, recognize consent is not effective protection against concentration. Concentrated data creates societal-level risks that individual consent—no matter how well informed—cannot adequately address. The harms of concentration are collective; consent is individual. Policies must acknowledge concentration as a distinct act. Like radiation containment systems that function regardless of operator intent, data protection must incorporate structural safeguards that limit risk independently of how data is collected. International coordination mechanisms are necessary to prevent regulatory arbitrage through “data havens,” just as nuclear non-proliferation treaties prevent fissile materials from flowing to less-regulated jurisdictions.
- Fifth, no data governance regime should recognize the act of concentration as a business purpose. Concentrating data (aggregation, combination, and refining) is an act so impossibly broad as to render consent meaningless. On this, former Meta executive Nick Clegg was correct in 2022 when he wrote, “No value is derived from the mere collection or storage of data,” so it cannot constitute a business purpose. It would be nearly impossible to determine the risk versus the value without applying that data to a particular end. Treating concentration as a business purpose undermines most contemporary data governance protections while offering brokers and other firms what is effectively a cheat code around data governance. Few market dynamics or policy incentives can adequately check the risks of concentration, but removing recognition of concentration as a defined purpose would help address a glaring weakness.
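As promised above, here is a minimal sketch of how the first two principles, purpose-bound minimization and time-bound retention, might be implemented inside a data pipeline. Everything in it is hypothetical: the purpose registry, field names, and retention windows are illustrative assumptions, not requirements drawn from the GDPR, the DPDPA, or any other regime.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta, timezone

# Hypothetical purpose registry: every collection must name a defined purpose,
# the minimal fields that purpose requires, and a time-bound retention window.
PURPOSES = {
    "appointment_reminder": {
        "allowed_fields": {"patient_id", "phone", "appointment_time"},
        "retention": timedelta(days=30),
    },
}

@dataclass
class Record:
    purpose: str
    data: dict
    collected_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )

def collect(purpose: str, raw: dict) -> Record:
    """Principle one (minimization): keep only fields the declared purpose needs."""
    spec = PURPOSES[purpose]  # collection without a registered purpose fails loudly
    minimized = {k: v for k, v in raw.items() if k in spec["allowed_fields"]}
    return Record(purpose=purpose, data=minimized)

def purge_expired(store: list[Record]) -> list[Record]:
    """Principle two (retention limits): drop records past their purpose's window."""
    now = datetime.now(timezone.utc)
    return [r for r in store
            if now - r.collected_at <= PURPOSES[r.purpose]["retention"]]

# Usage: the extraneous field ("ssn" here) is dropped at the point of collection;
# purge_expired() would run on a schedule, not at a data holder's discretion.
record = collect("appointment_reminder", {
    "patient_id": "p-123",
    "phone": "555-0100",
    "appointment_time": "2025-08-01T09:00Z",
    "ssn": "000-00-0000",
})
store = purge_expired([record])
```

The design choice worth noting is that minimization happens at the moment of collection and deletion is a property of the declared purpose, not a discretionary decision made later; data with no registered purpose simply cannot enter the store.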
Data is an input to generative AI, one of several alongside capital and energy. The concentration of data is one of the factors driving the economic bubble and technical hype around an awesome generation of technologies. Not since at least the era of the personal computer has society seen these kinds of changes wrought on economic and social activity, not to mention the looming potential to reshape military affairs.
Data’s importance to these changes demands an all the more urgent and fundamental shift in how we conceptualize data security and privacy. These principles are a foundation for realizing data’s value while mitigating its inherent risk. Reconceptualizing data as radioactive material ensures that economic competitiveness and technological progress aren’t made oppositional to privacy and security; that society can harness data’s tremendous power while protecting individuals and society from the harms of concentration.
We don’t store radioactive material on the off chance it’s useful in the future. Would you want to do the same with huge volumes of data, especially personally identifiable data? The atomic age of data is here—evolve accordingly.