National AI Ambitions Need a Data Governance Backbone. RDaF Can Provide It.
Eva Campo, Christopher Steven Marcum / Sep 29, 2025
Eva Campo is a research consultant at Campostella Research and Consulting, LLC. Chris Marcum is a senior fellow at the Data Foundation’s Center for Data Policy. The opinions expressed here are the authors’ own and do not reflect the positions of their employers.
AI represents a critical domain for America’s science and technology research and development portfolio. Public and private investment in AI, from frontier large language models (LLMs) to computer vision for clinical diagnostics to autonomous manufacturing robotics, has quickly become a key driver of economic prosperity. Recently, the National Science Foundation (NSF), the Allen Institute, and NVIDIA announced OMAI, a $152 million public-private partnership to develop open-source, multimodal AI models trained on scientific data and literature. At the same time, the NSF signaled the next phase of the National AI Research Resource (NAIRR), awarding up to $35 million for a large-scale compute center. These moves are more than program news; they are a pivot point for US AI infrastructure.
However, investment in AI infrastructure alone is insufficient to guarantee global leadership in this emerging market. If the US wants trustworthy, efficient, and secure AI, its next investments cannot focus on compute alone. All organizations in the business of developing and using AI need to govern the data that fuels these systems—how it is collected, curated, described, accessed, reused, and audited. The National Institute of Standards and Technology’s (NIST) Research Data Framework (RDaF) is a practical way to do this now, without reinventing the wheel or creating onerous new regulations.
The missing layer in the AI Action Plan
The Trump Administration’s AI Action Plan sets an ambitious agenda, but many implementation paths still treat data governance as an afterthought. From our vantage point—shaped by years of collective experience in evidence-based policymaking and practice in federal research, statistical, and standards programs—the risk is clear: Without lifecycle data governance, America’s AI strategy will reproduce familiar problems at greater scale, including a lack of transparency, off-target training pipelines, limited reproducibility, privacy and confidentiality risks, compliance uncertainty, and weak accountability for model inputs, outputs, and decision-making capacity.
This concern is not confined to LLMs. At a National Academies workshop this past August on embedded AI systems (e.g., diffusion models, embodied and autonomous systems, and agents built on sensor and signals data), researchers and defense stakeholders raised data governance concerns around training data sparsity, simulation, and validation in safety-critical contexts. These systems depend on data provenance, metadata, updating, and disciplined access at least as much as generative LLMs do.
Such concerns highlight why strong data governance is needed for the US, or any, national AI strategy. The RDaF is an “off-the-shelf” solution. Developed by NIST with broad stakeholder input, it is a modular, role-based, lifecycle framework that helps organizations plan, generate, process, share, preserve, and retire data in consistent conformance with open standards for metadata, access controls, and documentation. Three benefits make it especially relevant for AI now:
- Security and accountability. Documented tiered access, provenance, and usage logs enable tracing of model inputs and outputs—supporting export-control enforcement and responsible sharing across NAIRR’s open and secure environments (a minimal sketch of such a provenance record follows this list). The RDaF also provides data governance principles that help mitigate risks across domains, including biosecurity, cybersecurity, and privacy.
- Interoperability and efficiency. The RDaF aligns with open standards for data governance, the Findable, Accessible, Interoperable, and Reusable (FAIR) data principles, and existing federal mandates such as the Evidence Act, agency public access policies, and the Privacy Act. It lowers integration costs for public and private organizations alike, and complements international commons efforts (e.g., EOSC, ARDC), improving cross-border scientific collaboration.
- Adoptable today. The RDaF is non-regulatory and already familiar to federal science organizations. Organizations and agencies can phase it in through guidance, funding conditions, and training—no new statute required. It is already referenced in the Office of Management and Budget’s M-25-05 implementation guidance for the Open Government Data Act.
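To make the security and accountability point concrete, here is a minimal sketch of what a machine-readable provenance record for a training dataset might look like: a stable identifier, source and license terms, an access tier, and an appendable usage log. The field names, functions, and identifiers are our own illustrative assumptions, not an RDaF or NIST schema.

```python
# Illustrative only: a hypothetical, minimal provenance record for a training
# dataset, showing the kind of lifecycle metadata (origin, access tier, usage
# log) that the governance practices above depend on. Field names are our own
# assumptions, not an RDaF or NIST schema.
import json
from datetime import datetime, timezone

def new_dataset_record(dataset_id, source, license_terms, access_tier):
    """Create a machine-readable provenance record for one dataset."""
    return {
        "dataset_id": dataset_id,          # findable: stable identifier
        "source": source,                  # provenance: where the data came from
        "license": license_terms,          # reusable: terms governing reuse
        "access_tier": access_tier,        # e.g., "open", "controlled", "secure"
        "created": datetime.now(timezone.utc).isoformat(),
        "usage_log": [],                   # accountable: who used it, for what
    }

def log_model_use(record, model_name, purpose):
    """Append an auditable entry each time a model is trained on the dataset."""
    record["usage_log"].append({
        "model": model_name,
        "purpose": purpose,
        "timestamp": datetime.now(timezone.utc).isoformat(),
    })

# Example: register a corpus and record that a model was trained on it.
record = new_dataset_record(
    dataset_id="doi:10.9999/example-corpus",   # hypothetical identifier
    source="agency open-data portal",
    license_terms="CC-BY-4.0",
    access_tier="open",
)
log_model_use(record, model_name="example-multimodal-model", purpose="pretraining")
print(json.dumps(record, indent=2))
```

Even a record this simple lets an organization answer, after the fact, which datasets fed which models and under what terms, which is exactly the traceability that export controls, audits, and public trust depend on.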
Data governance remains one of the most critical, yet underappreciated, aspects of AI policy today. From access to high-quality data assets for training LLMs, to safeguards for AI systems whose decision-making authority remains debated, to controls on information quality, strong data governance policies and practices protect intellectual property and individual privacy and keep AI systems compliant with national and international data-sharing laws. Yet, as we have seen, many frontier models—especially LLMs, but increasingly embedded systems such as computer vision and autonomous robotics—have been developed and deployed without transparent data governance strategies. The result has been a slew of avoidable copyright infringement and personal injury lawsuits and an erosion of trust in the models and their owners, which together have polluted the AI landscape.
Leading national AI strategy with strong data governance is ultimately about trust. The public deserves AI systems that are trained on appropriate, safe, timely, high-quality data; that are auditable; and that ensure public investments strengthen—not fragment—data ecosystems. Where compute brings capability, data governance builds trust.
Adopting the RDaF won’t settle every debate about AI or the data needed to train its models. It will, however, build capacity at scale for trustworthy data management across AI systems. With NAIRR and OMAI entering decisive phases, this is the moment to make data governance a first-order investment, not an afterthought.