Perspective

Why We Need a Carnegie Moment for the Age of AI

Stefaan Verhulst / Aug 19, 2025

The Carnegie Library in Syracuse, New York. Shutterstock

At the turn of the 20th century, Andrew Carnegie was one of the richest men in the world. He was also one of the most reviled, infamous for the harsh labor conditions and occasional violence at his steel mills. Determined to rehabilitate his reputation, Carnegie embarked upon a number of ambitious philanthropic ventures that would redefine his legacy and leave a lasting impact on the United States and the world.

Among the most ambitious of these were the Carnegie Libraries. Between 1883 and 1929, Carnegie spent almost $60 million (equivalent to around $2.3 billion today) to build a network of 2,509 libraries globally, with 1,689 in the United States and the rest in places as diverse as Australia, Fiji, South Africa, and his native Scotland. Carnegie supported these libraries for a number of reasons: to burnish his own reputation, to help integrate immigrants into the US, and most of all because he was “dedicated to the diffusion of knowledge.” For Carnegie, greater knowledge was key to fostering all manner of social goods, from a healthier democracy to more innovation and better health. Today, many of those libraries still stand in communities across the country, a testament to the lasting impact of Carnegie’s generosity.

The story of Carnegie’s libraries might seem like a quaint period piece, a happy tale from the past. But it has resonance in the present.

Today, we are once again presented with a landscape in which information is both abundant and scarce, offering tremendous potential for the public good yet largely accessible to and reusable by only a small (corporate) minority. This paradox stems from the fact that while more and more aspects of our lives are captured in digital form, the resulting data is increasingly locked away or otherwise inaccessible.

The centrality of data to public life is now undeniable, particularly with the rise of generative artificial intelligence, which relies on vast troves of high-quality, diverse, and timely datasets. Yet access to such data is being steadily eroded as governments, corporations, and institutions impose new restrictions on what can be accessed and reused. In some cases, open data portals and official statistics once celebrated as milestones of transparency have been defunded or scaled back, with fewer datasets published and those that remain limited to low-risk, non-sensitive material. At the same time, private platforms that once offered public APIs for research — such as Twitter (now X), Meta and Reddit — have closed or heavily monetized access, cutting off academics, civil society groups, and smaller enterprises from vital resources.

The drivers of this shift are varied but interlinked. The rise of generative AI has triggered what some call “generative AI-nxiety,” prompting news organizations, academic institutions, and other data custodians to block crawlers and restrict even non-sensitive repositories, often in (understandable) reaction to unconsented scraping for commercial model training. This is compounded by a broader research data lockdown, in which critical resources such as social media datasets used to study misinformation, political discourse, or mental health, and open environmental data essential for climate modeling, are increasingly subject to paywalls, restrictive licensing, or geopolitical disputes.

Rising calls for digital sovereignty have also led to a proliferation of data localization laws that prevent cross-border flows, undermining collaborative efforts on urgent global challenges like pandemic preparedness, disaster response, and environmental monitoring. Meanwhile, in the private sector, data is increasingly treated as a proprietary asset to be hoarded or sold, rather than a shared resource that can be stewarded responsibly for mutual benefit.

Indeed, we may be entering a new “data winter,” one marked by the emergence of new silos and gatekeepers and by a relentless — and socially corrosive — erosion of the open, interoperable data infrastructures that once seemed to hold so much promise.

This narrowing of the data commons comes precisely at a moment when global challenges demand greater openness, collaboration, and trust. Left unchecked, it risks stalling scientific breakthroughs, weakening evidence-based policymaking, deepening inequities in access to knowledge, and entrenching power in the hands of a few large actors, reshaping not only innovation but our collective capacity to understand and respond to the world.

A Carnegie-style commitment to the “diffusion of knowledge,” updated for the digital age, can help avert this dire situation. Building modern data libraries that embed the principles of the commons could restore openness while safeguarding privacy and security. Without such action, the promise of our data-rich era may curdle into a new form of information scarcity, with deep and lasting societal costs.

Libraries for the AI moment

One solution is to empower trusted institutions to steward equitable and responsible access to data, revitalizing the data commons in ways suited to the AI era. Such institutions, in effect libraries for data, could help break down silos, foster public trust and democratic participation, and empower a much broader range of actors.

Much like Carnegie’s original libraries, these proposed “data for public-interest AI libraries” would serve as trusted, community-oriented institutions; instead of books, they would curate, maintain, and share high-quality datasets for public benefit. Modeled on existing projects such as the Institutional Data Initiative at Harvard, they would be operated by multistakeholder consortia and governed transparently in a manner that ensures broad accountability. Most importantly, they would prioritize access for those currently locked out of the data economy, such as researchers and public interest actors, who are the most at risk from the impending data winter.

AI libraries could support five key public interest benefits:

  • Enabling alternative AI models: By making datasets more accessible, they would support independent AI models beyond those created by Big Tech and enable innovation from smaller players.
  • Fostering algorithmic pluralism: Public AI libraries could support the development of more varied algorithms that would deviate from dominant Big Tech models. This could allow for more inclusive and culturally relevant AI systems.
  • Addressing neglected social and economic priorities: A wider range of data could generate insights related to priorities raised by underserved populations, such as healthcare access or economic opportunity.
  • Empowering researchers and civil society groups: AI libraries could lower data and insight barriers for academic and nonprofit research, democratizing AI research and safety testing and enabling a wider set of voices and groups to drive policy change.
  • Promoting transparency and ethical AI development: These public libraries could encourage clearer standards for data ethics and transparency. More transparency, in turn, could help build public trust in AI.

Key considerations for building AI libraries

Technology is always a double-edged sword, and this is especially the case with AI.

If we are to advance a 21st century version of Carnegie’s vision of knowledge as a public good, then we must build institutions fit for the AI age. AI libraries will not be buildings of brick and stone, but of protocol, openness, connectivity, and trust.

Achieving this vision will require thoughtful design and long-term commitment in at least four key areas:

  • Governance and ethics: A governance framework for the libraries will need to define responsible access practices, protect privacy, and ensure that ethics and responsibility are built in by design, all of which is essential to building public trust.
  • Data quality and curation: It is essential that data is rigorously vetted to avoid biases that could, for instance, disproportionately affect marginalized communities. Additionally, curating data to ensure that it reflects diverse perspectives and realities will enable these libraries to be equitable sources of knowledge.
  • Collaborative infrastructure: The individual libraries that made up Carnegie’s project were part of an overall vision, but for all practical purposes they operated separately. Today’s libraries will ride on a shared infrastructure that promotes interoperability, connectivity, and openness. To fulfill their potential, it will be essential to establish clear data-sharing protocols, ensure adherence to interoperable standards, and encourage open-source code and open standards.
  • Global and local alignment: A balance between global cooperation and local relevance is necessary so that these libraries can empower regions while benefiting from a widespread exchange of knowledge and resources.

The challenges of the AI era demand reinvented institutions as ambitious and future-facing as those of the industrial age. Carnegie understood that knowledge was power, and that equitable access to it was the necessary foundation of societal progress. Today, we face a similar inflection point.

In the face of our growing data winter, we must act decisively to build a new generation of public infrastructure: AI libraries that embody principles of openness, inclusion, and global collaboration. These libraries won’t solve every problem posed by AI, much less by the digital era, but they can ensure that the future of intelligence, both human and artificial, is not just the domain of a few, but part of a shared data commons.


Authors

Stefaan Verhulst
Dr. Stefaan G. Verhulst is Co-Founder of the Governance Laboratory (The GovLab), an action research center focused on transforming decision making using advances in science and technology. He is also the Co-Founder and Principal Scientific Advisor of The DataTank, based in Brussels, a think tank tha...
