Why Context, Not Compute, is the Key to AI Governance
Kevin Frazier, Andrew W. Reddie / Aug 5, 2025
Is This Even Real II by Elise Racine / Better Images of AI / CC BY 4.0
Following the demise of the proposed 10-year moratorium on state AI regulation last month, and the likely acceleration in AI development and deployment encouraged by the AI Action Plan’s calls for increased investment in the space, the future of AI regulation is seemingly more open and consequential than ever. Hundreds of AI-related proposals are currently under consideration by state legislatures. However, the sheer number of bills masks the diversity of regulatory approaches. A common framework undergirds many of the most comprehensive and influential AI laws and proposals: the RAISE Act in New York, SB 53 in California, and the European Union’s AI Act all treat compute as a proxy for AI risk.
This is a flawed approach for several reasons. Small models deployed in sensitive contexts can cause harm comparable to, or greater than, that caused by large, general-purpose models. Additionally, emphasizing compute may lock states into a regulatory paradigm that is less responsive to technical advances, such as increased reliance on algorithmic efficiency over raw compute.
AI regulation would instead be more effectively oriented around the context of an AI system’s proposed use. A better way to determine whether a use case presents a significant risk of harm is to examine not a model’s computation but its training data (the raw material that undergirds a model’s output), which offers a more accurate and flexible basis for assessing the need for safeguards, if any. A shift toward “context-based” governance—grounded in “data-centric AI governance” that scrutinizes the suitability and characteristics of a model’s training and fine-tuning data for its intended use—carries the potential to more effectively mitigate harm while supporting innovation.
The centrality of context in AI regulation
Imagine two vehicles, a massive Ford F-150 and a compact Fiat. At first glance, their “characteristics” differ dramatically; one boasts immense size, weight, and horsepower, while the other emphasizes agility and fuel efficiency. A characteristics-based assessment might label the F-150 inherently more dangerous due to its raw power.
Yet, consider two different driving scenarios. In the first, the F-150 is driven cautiously, at 10 mph, through a narrow city street by an attentive, experienced driver. In the second, the Fiat is driven at 90 mph by an inexperienced teenager. Most would agree that, despite the F-150’s greater physical capability, the second scenario clearly presents a far higher risk.
This scenario illustrates a key point often overlooked in technology policy debates: the actual risk, or utility, of a technology often stems not from its inherent properties, but from the “context” in which it is used and the human decisions that dictate its deployment.
This context-driven logic underpins effective regulation across diverse domains. Despite the vehicles’ mechanical disparities, F-150 and Fiat owners alike must adhere to identical speed limits, traffic laws, and licensing standards. These regulations acknowledge that, irrespective of a vehicle’s internal specifications, its operation—the context of its use—is the primary determinant of its safety profile.
The swift advancement of AI now similarly warrants regulation that prioritizes context of use over the technical characteristics of a model. The technical relationship between a model’s training data and its contextual suitability operates through several interconnected mechanisms that fundamentally shape algorithmic behavior.
During pre-training, models develop internal representations—statistical patterns encoded in their parameters—that reflect the distributional characteristics of their training corpus. These learned representations create what researchers term the model’s “inductive biases”: systematic tendencies to favor certain types of predictions over others based on the regularities observed during training.
When a model encounters deployment scenarios that differ significantly from its training distribution—a phenomenon known as “distribution shift”—its inductive biases can produce unreliable or inappropriate outputs. For instance, a model trained predominantly on English-language medical literature from North American hospitals will have learned representations tailored to that specific linguistic and clinical context. Deploying the same model to analyze medical records from rural clinics in sub-Saharan Africa introduces multiple layers of distribution shift that can compromise performance.
Fine-tuning, while helpful in adjusting surface-level behaviors, operates within the constraints of these pre-established representations and cannot fully overcome fundamental misalignments between training and deployment contexts. This technical reality underscores why scrutinizing data provenance and domain alignment—rather than model size or computational requirements—provides the most reliable indicator of whether a given AI system is suited for its intended application.
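To make the idea of distribution shift concrete, the toy sketch below—our illustration, not drawn from any particular model or regulatory text—fits a simple classifier on one data distribution and then evaluates it on a shifted “deployment” distribution. The two-feature dataset, the classifier, and the size of the shift are all illustrative assumptions.

```python
# A minimal, illustrative sketch: a classifier trained on one distribution
# degrades when the deployment distribution shifts. The toy dataset and the
# magnitude of the shift are assumptions chosen only for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score

rng = np.random.default_rng(0)

def make_data(n, shift=0.0):
    """Two-class Gaussian data; `shift` moves the deployment distribution."""
    X0 = rng.normal(loc=0.0 + shift, scale=1.0, size=(n, 2))  # class 0
    X1 = rng.normal(loc=2.0 + shift, scale=1.0, size=(n, 2))  # class 1
    return np.vstack([X0, X1]), np.array([0] * n + [1] * n)

# "Training context" (e.g., one clinical setting)
X_train, y_train = make_data(1000)
model = LogisticRegression().fit(X_train, y_train)

# Evaluate in-distribution versus a shifted "deployment context"
X_in, y_in = make_data(1000, shift=0.0)
X_out, y_out = make_data(1000, shift=1.5)

print("in-context accuracy:     ", accuracy_score(y_in, model.predict(X_in)))
print("shifted-context accuracy:", accuracy_score(y_out, model.predict(X_out)))
```

The specific numbers matter less than the pattern: the same model, unchanged, becomes unreliable once the context it faces no longer resembles the data that shaped it.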
The shortcomings of characteristics-based (aka compute) governance
By contrast, recent proposals for AI regulation have largely focused on the computational resources used to train or deploy frontier models. The EU AI Act, for example, designates certain general-purpose AI models as posing “systemic risk” based on their presumed capabilities, inferred through characteristics like parameter count, training dataset size, or compute budgets (primarily the latter). These efforts reflect an intuitive but incomplete assumption: that greater compute equates to greater risk.
While characteristics such as compute may serve as rough proxies for capability—particularly in edge cases involving models with applications in autonomous scientific discovery or synthetic biology design—they are limited indicators of the broader, more routine risks posed by AI systems. As our University of California, Berkeley colleagues described in a 2024 paper, and as the recent DeepSeek episode illustrates, smaller models can sometimes match or exceed the performance of larger models, even without access to the same computational resources. Moreover, many of the most salient risks—such as algorithmic bias, reliability issues, hallucination, disinformation, or misuse by end users—arise in models far below these so-called “frontier” thresholds. In practice, a low-compute system deployed in a sensitive domain may produce more consequential harm than a high-compute model confined to a research lab.
The focus on compute as the basis for regulatory oversight may also introduce unintended consequences. By treating high-compute training runs as the primary trigger for scrutiny, regulators may inadvertently encourage firms to prioritize more algorithmically efficient—but potentially less transparent—models, or to relocate compute operations to jurisdictions with laxer regulations. This could fragment governance regimes, push development underground, and undermine the very goals such policies seek to advance. At the same time, strict compute thresholds may stifle innovation in fields like climate modeling, materials science, or public health, where larger models could generate valuable social benefits without necessarily introducing a high level of risk.
To be clear, compute-aware governance may still have a role to play—particularly as a trigger for closer scrutiny of training data and intended use cases. But treating compute as the primary or sole regulatory fulcrum risks diverting attention and resources away from policing more meaningful indicators of potential harm.
Introducing context-based, data-centric AI governance
A more effective and nuanced regulatory approach requires a fundamental shift: from focusing on what a model is—its size, architecture, or compute requirements—to how it was built and for what purpose it is being used. This involves scrutinizing the context in which a model is developed and deployed, rather than relying on static indicators of its internal complexity. It also aligns with a broader shift in technology governance discussions toward emphasizing behavioral over technical attributes.
Context matters at multiple levels. First is the context of training: the domain specificity, diversity, and provenance of the data used to pre-train and fine-tune a model. A system trained on general corpora will behave differently from one trained on curated legal texts or multilingual medical records. Second is the context of deployment: the same model may pose dramatically different risks when used as a public health chatbot, a legal advisor, or an autonomous surveillance tool in military operations. Third is the context of governance: legal frameworks, cultural norms, and institutional capacities shape both which risks are most salient and which mitigations are feasible. A system deployed in the European Union, subject to GDPR, operates in a very different governance context than the same system used in jurisdictions with fewer legal guardrails or limited oversight.
Technically, this shift is supported by an increasingly well-understood fact: training and pre-training data fundamentally shape model behavior, often in ways that downstream fine-tuning cannot fully correct. Biases (a feature, not a bug, of AI models), hallucination tendencies, and domain limitations are typically “baked in” during the initial training phase. Efforts to re-align models through fine-tuning, reinforcement learning from human feedback, or prompt engineering offer only partial remediation—and can introduce new failure modes if the original data is misaligned with the model’s intended use, although more research is required to understand this limitation.
To illustrate: consider the earlier analogy of the Fiat vs. F-150. A Fiat may be perfectly suited for city streets—its data (design parameters) optimized for an urban environment—but completely ill-suited for mountain terrain. Attempting to retrofit it for off-road use without fundamentally altering its structure would likely yield poor results. The same principle applies to AI systems: when there is misalignment between the data used to build a model and the context in which it is deployed, performance can degrade, outputs become unpredictable, or serious harm may result.
One path forward: Mandatory data disclosure and auditing
Building on this understanding of how model performance is shaped by training data and deployment context, individuals and institutions have several options for advancing a more “context-based” approach to governance. One lever involves self-governance or self-regulatory measures, similar to those adopted in other industries. For example, the Entertainment Software Rating Board (ESRB) is an industry-led effort to provide consumers with straightforward information to guide responsible use, such as age ratings (for example, “E” for Everyone), content descriptors that partially explain the age rating (such as “Comic Mischief; Mild Lyrics”), and a list of interactive features (like in-game purchases).
Some AI labs have already taken early steps in this direction by publishing model cards that provide high-level overviews of model characteristics. Adopting a fuller context-based governance approach would build on this practice by standardizing disclosures around intended and expected domains of use, helping users understand where the model is likely to perform reliably.
Importantly, this approach does not rely on statutory mandates or formal regulations. Instead, it offers an iterative and flexible approach guided by the developers and researchers most familiar with the model’s capabilities and limitations. If AI labs do not take the lead in adopting such an approach, the US Center for AI Standards and Innovation (CAISI) could be well-positioned to assume a similar role. Provided CAISI continues to retain deep technical expertise amid the Trump Administration’s efforts to cut the federal workforce, it is well-suited to evaluate and rate models based on their suitability for certain tasks and deployment contexts.
Several labs have previously entered into voluntary agreements with CAISI (then the US AI Safety Institute) to share certain sensitive information and allow for model testing. Modest amendments to these agreements could equip CAISI with the necessary information to advise end users on the appropriate use cases for a given model. This approach, however, would likely proceed more slowly than industry self-regulation, involve greater codification of ratings, and raise industry concerns around information security. In particular, labs may be reluctant to share details about a model’s training data unless strong safeguards are in place to prevent external leaks or unintended legal exposure.
Whether led by industry (in a model akin to the ESRB) or by a public entity like CAISI, labs would likely need to disclose several core pieces of information, illustrated in the sketch that follows this list:
- Domain specificity and coverage: whether the training data reflects the intended use of the model for general application or solely in specific contexts. If a lab identifies specific domains, then the private or public rating agency can bring in subject-matter experts to assess the domain fit.
- Data provenance and curation: transparency around the data’s sources and curation process, allowing the relevant body and subject-matter experts to assess whether a lab has collected data of sufficient quality and quantity to reduce the odds of inaccurate outputs in that context.
- Bias and fairness metrics: standardized bias and fairness metrics that inform ratings of how reliably a model will perform in those contexts. Labs could provide initial self-assessments, subject to independent validation.
- Data age and recency: information about how long a rating may remain “good”—a rating may “expire,” for instance, faster in some contexts than in others.
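To show what such disclosures could look like in practice, the rough sketch below encodes the items above as a machine-readable record. Every field name, value, and format is a hypothetical assumption for illustration, not an existing ESRB-style or CAISI standard.

```python
# A hypothetical sketch of a machine-readable context disclosure. All field
# names and values are illustrative assumptions, not an existing standard.
from dataclasses import dataclass, asdict
from datetime import date
import json

@dataclass
class ContextDisclosure:
    model_name: str
    intended_domains: list[str]      # domain specificity and coverage
    data_sources: list[str]          # data provenance
    curation_notes: str              # curation and filtering steps
    bias_metrics: dict[str, float]   # standardized bias/fairness metrics
    data_cutoff: date                # data age and recency
    rating_expires: date             # when the rating should be re-reviewed

disclosure = ContextDisclosure(
    model_name="example-clinical-assistant",  # hypothetical model
    intended_domains=["English-language clinical triage, North American records"],
    data_sources=["licensed, de-identified hospital notes", "medical literature"],
    curation_notes="Deduplicated; records prior to 2015 excluded.",
    bias_metrics={"demographic_parity_gap": 0.04, "equalized_odds_gap": 0.06},
    data_cutoff=date(2024, 6, 30),
    rating_expires=date(2026, 6, 30),
)

# Serialize for submission to an industry body or public rating agency
print(json.dumps(asdict(disclosure), default=str, indent=2))
```

A standardized record along these lines would let auditors and subject-matter experts compare a model’s claimed domain fit against its actual deployment, much as ESRB descriptors let parents compare a game’s content against its rating.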
Both self-regulatory and CAISI-led standards would require substantial investment, infrastructure, and expert participation. Auditors would play a central role in ensuring that assigned ratings reflect real-world use, particularly in high-stakes areas such as healthcare. Ongoing audits could also inform improvements to the process itself and, in some cases, identify labs that fail to meet the requisite disclosure requirements or provide inaccurate information.
Of course, this approach would also raise important legal questions around trade secrets, intellectual property, and antitrust, to name a few. Aggregation of sensitive model data increases the risk of misuse or competitive harm. While these issues merit further inquiry, they should not, in themselves, preclude moving forward with this type of AI governance framework.
Implementing a context-based AI governance framework is not without its hurdles. By design, it requires iteration and adaptation. Some may argue that such flexibility introduces uncertainty about when and how certain models should be used, potentially at odds with the predictability and clarity that encourage innovation and adoption. However, establishing clear principles and transparent processes can mitigate these concerns. As with the ESRB, clarity can emerge through participatory and standardized processes over time.
While many participants in the AI governance debate frame the discussion in absolutes, the reality is that any effective regime will likely combine multiple regulatory approaches. For example, although compute thresholds should not serve as the primary regulatory fulcrum, they may still function as useful triggers for enhanced data governance requirements. Similarly, data-centric governance complements other established regulatory priorities such as transparency (e.g., explainability), human oversight in high-risk applications, and post-deployment or post-market monitoring to track real-world performance and identify emergent harms.
Returning to the analogy of the Ford F-150 and the Fiat: which vehicle is safer on the road? There is no good answer without more context. Risk and utility are not determined solely by the vehicles’ inherent properties, such as horsepower or weight, but also by the conditions in which they are used: who is driving, where, and under what circumstances. The analogy highlights a core flaw in much AI governance legislation: an overemphasis on a model’s technical features rather than on how it is used or deployed.
The tremendous promise of AI calls for a regulatory approach that reflects this reality. Rather than focusing narrowly on raw computational power or model size, we must embrace a context-driven regulatory philosophy. Such a pivot acknowledges that AI's true risks and profound benefits stem directly from the training data that shapes its outputs and the environments in which it operates.