Perspective

Biological AI Is Slipping Through Europe’s AI Law — For Now

Melissa Hopkins / Nov 3, 2025

Corruption 1 by Kathryn Conrad. Better Images of AI / CC BY 4.0

As AI models increasingly capable of designing deadlier pathogens are released to the public, a critical gap in the European Union's new AI regulations leaves these highly capable models completely unregulated, despite their potential to pose far greater biosecurity risks than the models the law does cover.

On August 2, 2025, EU AI Act obligations for providers of general-purpose AI models — which include foundation models like large language models (LLMs) — took effect, with the EU AI Office recently publishing implementation guidance for companies and other providers developing general-purpose AI models. But this guidance, on its face, excludes highly capable biological AI models from the scope of the AI Act’s oversight and requirements, creating a dangerous regulatory blind spot for large AI models that could design novel pathogens, predict viral evolution, or suggest modifications to make diseases more transmissible and deadly.

EU regulators are appropriately concerned that OpenAI’s ChatGPT and similar LLM products clearly covered by the Act might help more people succeed at biological attacks by quickly providing necessary information, conveying tacit know-how critical to successful laboratory experiments, and otherwise generally lowering technical barriers to potential misuse of biology and the life sciences. But some equally compute-intensive biological AI models (BAIMs, i.e., AI models trained primarily on biological data and serving biological tasks) could significantly raise the ceiling of harmful outcomes, making biological attacks, or even laboratory accidents, far more devastating. For example, they could one day be used to predict how to make a disease like the 1918 influenza virus, which caused one of the worst pandemics in history, more transmissible and deadly.

Unlike LLMs, BAIMs are often publicly released as open-source models (e.g., Evo 2, AlphaFold, OpenFold 3). This makes them both more accessible and potentially more dangerous. To address this critical security gap and protect the public from the most severe biosecurity risks, the AI Office could issue clarifying guidance that clearly brings BAIMs within the scope of the Act, rather than making a case-by-case (or model-by-model) judgment call that could leave several capable models unregulated.

The problem

A troubling gap has emerged between the Act's stated concerns and its implementation. The new law recognizes biological risks as systemic threats, but the AI Office's guidance does not clearly include the models most likely to enable the most severe biological harms. As a result, companies and researchers developing foundation BAIMs may reasonably conclude their models fall outside the Act's scope and fail to implement safeguards required by the Act.

The AI Office published the Code of Practice and guidelines for general-purpose AI models to assist providers of general-purpose AI models in complying with the Act. But the guidance's language appears to limit the Act's scope primarily to LLMs, creating a critical blind spot that could allow the most dangerous biological AI capabilities to operate without regulatory oversight.

The Act creates a two-tier system for general-purpose AI models: (1) providers of basic general-purpose AI must maintain documentation and share information with regulators; and (2) providers of models deemed to pose 'systemic risk' face much stricter requirements such as rigorous safety evaluations, detailed security reports, and mandatory risk mitigation before market release. Classification as a general-purpose AI model is a precondition to classification as a general-purpose AI model with systemic risk.

This regulatory framework applies broadly to commercial AI models. While the Act includes some research exemptions, current BAIMs like ESM3-large and many future BAIMs fall outside these narrow carveouts, which apply only to models developed solely for scientific research rather than commercial applications.

The Code of Practice explicitly includes biological attacks and accidents among systemic risks, yet the AI Office has failed to provide enough clarity in its guidelines for the models potentially most capable of presenting such risks. Importantly, the Code of Practice recognizes two types of biological AI risk: models that could facilitate more biological attacks (a risk common to LLMs and some BAIMs), and models that could make any biological attack far more devastating (a risk found in BAIMs only).

This biological focus appears throughout the Act. Scientific panels are given multiple criteria for identifying models with systemic risk, explicitly including those that work with biological sequences alongside traditional measures like model size and training data quality.

The contradiction between the aims of the Code of Practice, the Act, and the AI Office’s subsequent guidance becomes clear in the guidelines’ technical definitions. While the Act shows obvious concern for biological AI risks (including risks found only in BAIMs), paragraph 17 of the guidelines sets the criterion for a general-purpose AI model as one whose training compute exceeds 10^23 floating point operations (FLOPs, the mathematical calculations needed to train the model) and that can generate language (whether as text or audio) or perform text-to-image or text-to-video generation. Paragraph 17 thus seemingly narrows the kinds of AI models that the AI Office considers “general-purpose AI models” to, essentially, LLMs, because of the language requirement (some BAIMs already meet or exceed the computational threshold).
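
To make the paragraph 17 criterion concrete, here is a minimal sketch, in Python, of the two-part test as the guidelines describe it: a training-compute threshold of 10^23 FLOPs plus a covered output modality. The ModelProfile fields and the is_general_purpose_ai helper are hypothetical names used only for illustration, not terms drawn from the Act or the guidelines.

```python
# Illustrative sketch of the guidelines' paragraph 17 criterion as described
# above. ModelProfile and is_general_purpose_ai are hypothetical names for
# this example, not terms from the Act or the guidelines.
from dataclasses import dataclass

FLOP_THRESHOLD = 1e23  # training-compute threshold cited in paragraph 17


@dataclass
class ModelProfile:
    name: str
    training_flop: float          # total training compute, in FLOPs
    generates_language: bool      # produces text or audio output
    text_to_image_or_video: bool  # performs text-to-image or text-to-video


def is_general_purpose_ai(model: ModelProfile) -> bool:
    """Paragraph 17 reading: compute threshold AND a covered output modality."""
    covered_modality = model.generates_language or model.text_to_image_or_video
    return model.training_flop > FLOP_THRESHOLD and covered_modality


# A BAIM such as ESM3-large (trained on roughly 1e24 FLOPs) clears the compute
# threshold, so scope turns entirely on whether generating protein sequences
# counts as "generating language."
esm3_like = ModelProfile("ESM3-large", 1e24,
                         generates_language=False,   # human-language-only reading
                         text_to_image_or_video=False)
print(is_general_purpose_ai(esm3_like))  # False under that narrow reading
```

Under a human-language-only reading of “generate language,” a protein-language model that clears the compute threshold still falls out of scope, which is exactly the blind spot at issue.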

The AI Office has at least two potential routes for addressing this gap. It could: (1) clarify that certain biological AI models already fall under existing language requirements; or (2) use a separate generality provision that covers models performing a wide range of distinct tasks.

BAIMs generate language

The first potential solution lies in how the AI Office interprets “language” in paragraph 17. Consider ESM3-large, a biological AI model currently available in the EU market that works with protein sequences rather than human text and was trained on 10^24 FLOPs.

ESM3-large required significantly more computational power than the EU's threshold for regulation and processes what could be considered a form of language — just not human language. Just as ChatGPT reads and writes human language, ESM3-large reads and writes in the language of proteins, where different amino acid combinations create meaning, similar to words and sentences in English. This is also true for models that read and write DNA language, like Evo 2 (for example, chip developer NVIDIA called Evo 2 a model that uses the “language of life”). The guidelines define regulated models as those that “generate language” but don't specify that this language must be human language. While the guidelines do mention “natural language” (which is sometimes used interchangeably with human language) in their examples of covered models, this appears in a non-exhaustive list, suggesting that other forms of language could qualify.

Foundation BAIMs like ESM3-large already meet both elements of paragraph 17: they exceed the computational threshold and generate language in the form of protein sequences. The AI Office need only clarify that biological languages qualify under existing rules.

BAIMs display sufficient generality

Even if the AI Office clarifies that “language” in paragraph 17 refers to natural language, there is still an alternate path the AI Office can take to clarify that BAIMs are within the scope of the Act’s requirements on providers of general-purpose AI models.

Paragraph 20 states that “if a general-purpose AI model does not meet that [paragraph 17] criterion but, exceptionally, displays significant generality and is capable of competently performing a wide range of distinct tasks, it is a general-purpose AI model.” This means that if a BAIM displays significant generality and competently performs a wide range of distinct tasks, it could be classified as a general-purpose AI model. Paragraph 20 thus creates the opportunity for BAIMs to be classified as general-purpose AI models with systemic risk.
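
In the same illustrative spirit, the paragraph 20 exception can be read as a fallback test applied when the paragraph 17 criterion is not met. In this self-contained sketch, the boolean inputs are hypothetical stand-ins for the qualitative judgments the guidelines leave to the AI Office; they are not defined legal tests.

```python
# Illustrative sketch of the paragraph 20 fallback described above. The
# boolean inputs are hypothetical stand-ins for qualitative judgments the
# guidelines leave to the AI Office; they are not defined legal tests.
def meets_gpai_definition(meets_paragraph_17_criterion: bool,
                          displays_significant_generality: bool,
                          performs_wide_range_of_distinct_tasks: bool) -> bool:
    if meets_paragraph_17_criterion:
        return True
    # Paragraph 20: a model missing the paragraph 17 criterion may still
    # qualify if, exceptionally, it displays significant generality and
    # competently performs a wide range of distinct tasks.
    return displays_significant_generality and performs_wide_range_of_distinct_tasks


# Under this reading, a BAIM that fails the language test could still be in
# scope if its generality and task range are judged sufficient.
print(meets_gpai_definition(False, True, True))  # True
```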

The key question becomes: what constitutes “significant generality” in the domain of biology? ESM3-large offers a clear test case. The model can communicate in protein language, store vast biological knowledge, and reason across different aspects of proteins—sequence, structure, and function. This mirrors the AI Office’s own understanding of generality, which identifies communication, knowledge storage, and reasoning as the core capabilities that make general-purpose AI models broadly useful.

BAIMs are also particularly well-suited for ensemble systems (multiple AI models working together) compared to other domains. The output of one narrow BAIM (like a viral genome) becomes the ideal input for another (like a virulence predictor model). This creates combined capabilities that may far exceed what any single narrow BAIM could achieve. The guidelines already establish this principle for narrow models, stating that "[a]lthough models that generate images or video typically exhibit a narrower range of capabilities and use cases compared to those that generate language, such models may nevertheless be considered to be general-purpose AI models" because they create "flexible content generation that can readily accommodate a wide range of distinct tasks." Thus, narrow BAIMs with ensemble potential should meet this same standard—even models that appear to have narrower biological capabilities can qualify when their flexible outputs enable diverse applications through combination with other tools.

The path forward

Critics might argue that extending regulation to BAIMs could stifle beneficial biological research. However, the Act's tiered approach means that only the most powerful models would face the strictest requirements, while the research exemption preserves space for academic work. The alternative to regulation — allowing potentially dangerous capabilities to be developed and disseminated without oversight — poses far greater risks.

As biological AI capabilities rapidly advance, the Act’s definitional gaps become increasingly dangerous. There are, however, two potential pathways to resolution: confirming in the guidance that protein language qualifies as 'language' under paragraph 17, or establishing clear criteria for biological generality under paragraph 20.

Proactively publishing subsequent biological guidance would spare the EU AI Office from having to determine, case by case, whether a given BAIM displays significant generality. That approach risks uneven application across models (something that looks more like judge-made law than statutory law) and leaves unregulated any model never brought to the AI Office for such a determination.

Given that the Act already recognizes biological risks as systemic threats, failing to capture the models most capable of creating the most severe risks would represent a critical regulatory failure. The window for preventive action is narrowing as more powerful BAIMs enter commercial development pipelines worldwide. And while the AI Office does have the power to classify specific models, case by case, as posing systemic risk (and thus subject them to additional regulatory scrutiny), a model cannot be classified as such without first qualifying as a general-purpose AI model.

As with the GDPR's global impact, the EU's approach to biological AI regulation is likely to influence other jurisdictions grappling with similar challenges, for better or for worse. Clear guidance on BAIM classification could provide a template for other major jurisdictions such as the US, UK, China, and UAE as they develop their own AI governance frameworks.

Authors

Melissa Hopkins
Melissa Hopkins is the Health Security Policy Advisor at the Johns Hopkins Center for Health Security and an Assistant Scientist in the Department of Environmental Health and Engineering at the Johns Hopkins Bloomberg School of Public Health. She is a Tech Policy Fellow with the UC Berkeley Goldman ...
