Perspective

In Critical Condition – How To Stabilize Researcher Data Access?

Mark Scott, LK Seiling / Nov 12, 2025

This piece is part of “Seeing the Digital Sphere: The Case for Public Platform Data” in collaboration with the Knight-Georgetown Institute. Read more about the series here.

Despite social media’s omnipresence, we still have little insight into the global platforms mediating a significant part of people’s daily lives.

This lack of understanding of social media has become more pressing against the backdrop of a decline in democratic norms worldwide. Democracies are seeking ways to preserve systems of checks and balances over global tech companies now worth more than many countries’ annual GDPs.

These platforms are not the sole drivers of such geopolitical developments. Social media can offer ways for people worldwide to connect with each other. But it can contribute to negative outcomes if ill-configured, abused by malicious actors, or insufficiently moderated. The effects are felt throughout society by individuals and institutions, which indicates that platform data — likes, comments, views, or shares — can offer insights far beyond the dynamics unfolding within any single tech giant.

Yet, the only legally-mandated platform data access framework, included in the European Union’s Digital Services Act, is in critical condition — only sustained by the hope that it may one day thrive outside the ICU.

The difficult origins of researcher data access

Private and public actors, including researchers, have tried to get their hands on platform data for over a decade. As Kayo Mimizuka and colleagues describe, the initially broad options to access data have narrowed with the introduction of data protection regulations and scandals like the one surrounding Cambridge Analytica.

As a result, platform data access mechanisms have adopted stricter vetting procedures, making data less available. The exception has been industry-academic partnerships, which have been open only to a select few researchers, mostly from the US.

The arrival of commercial large language models has highlighted the value of human-generated data for model training. Much of that data originates from the open web, and as platforms compete to dominate the current global AI race, they have sought to secure their competitive advantage by further limiting access to their data or making it prohibitively expensive.

Amid dwindling opportunities for independent data access, the European Union passed the Digital Services Act (DSA), a broad legislative effort to hold companies accountable, including through transparency rules around online safety.

The DSA gave birth to a new form of data access: one not reliant merely on platform goodwill, but mandated through regulation to serve the public interest in understanding how social media affects society. The rules oblige platforms with more than 45 million users in the EU to provide data access for researchers who meet specific criteria, such as being part of an academic institution and not receiving funding from companies that could be considered economic rivals.

The DSA includes two forms of data access. The first focuses on publicly accessible data, information that anyone can view, like the content and engagement on public user profiles. The second includes access to non-public platform data, such as individual usage behavior, as well as internal platform documentation.

While access to publicly available data is open to a wide range of non-commercial researchers, access to non-public data is, rightly, limited to researchers affiliated with a research organization — provided they investigate systemic risks in the EU and can ensure appropriate protection of both data subjects’ rights and platforms’ business interests.

While access to public data is already possible, privileged access to non-public data was specified only this summer and has just recently come into force. Across the English Channel, the United Kingdom is now implementing its own mandatory data access framework, which is expected to come into force in 2027 at the earliest.

On paper, these data access regimes, empowered by specific online safety legislation, transform researchers from passive recipients into active investigators. They can be equipped to probe the mechanisms of platform power, and to explore how these systems might be steered toward more beneficial societal outcomes.

The DSA and the critical state of researcher data access

The conditions for regulated data access are largely shaped by the hostile environment into which it was born. Research data access was already on the back foot before the DSA came into force, and platforms faced little pressure to do better. After two years of Europe’s digital rulebook, companies have established a baseline that has constrained data access at every stage.

Under the public data access provision, platforms themselves conduct researcher vetting based on self-designed application forms.

These forms vary widely and have created unnecessary hurdles for researchers not required by the DSA. X’s form, for instance, has just 11 required fields, while Meta’s form requires responses to 53 questions, including one about the applicant’s date of birth.

Two tactics stand out. Some forms are deceptively simple, inviting incomplete submissions. Others are so complex that they drastically increase the burdens on researchers, especially those working on cross-platform research. TikTok further restricts access to researchers based in the United Kingdom, the United States and the European Economic Area — despite the DSA allowing researchers worldwide to apply, provided their work focuses on EU-related topics. To apply, many platforms also require researchers to accept terms of service that fundamentally contradict research freedom. These include prohibitions on deriving insights into platforms’ “usage, revenue,” or any other business aspect (YouTube), disclosure of “aggregated data” (TikTok), or the “monitoring [of] sensitive events” (X).

Before any data can be accessed, platforms must vet researchers’ applications. Data collected by the DSA40 Data Access Collaboratory indicates that most applications are decided within one or two months. It is already questionable whether this constitutes the “access without undue delay” mandated within the DSA. And that figure does not include instances where platforms took over 200 days to respond to a request, based on researcher submissions to the Collaboratory.

Even after researchers are granted access to publicly accessible data, pain points persist. Often, they receive incomplete or inaccurate data, as revealed through comparisons with independently scraped platform data. Ironically, however, public-interest scraping is prohibited under the terms of service of X, YouTube, Meta, and TikTok.

The fact that regulated researcher data access was not dead on arrival is thanks to the collective effort and sustained attention of regulators, civil society, journalists, and academics. Continued and expanded collaboration among these actors is the only way data access can be stabilized, let alone invigorated.

As the authority responsible for enforcing data access under the DSA, the European Commission must ensure the rulebook’s guarantees for researcher access are treated not as optional commitments, but as enforceable rights. The initial investigations into AliExpress established a useful baseline, including multiple pathways for researcher data access. However, most investigations into US-based platforms — including Meta and X, which are under EU scrutiny over issues such as data access compliance — are still underway.

Another must-have for improved data access is standardization, especially in regard to the vetting processes for researchers. Harmonized application templates across social media platforms, as well as clear procedural guidance on what information is required, how decisions are justified, and in what timeframe they must be communicated, can prevent platforms from forestalling data access.

Standards could also help establish a common baseline on what is considered “public data,” given that platforms currently do not interpret this term consistently. The paper “Better Access: Data for the Common Good,” authored by multiple experts and coordinated by the Knight-Georgetown Institute, offers a good starting point for these discussions.

In fact, the DSA explicitly foresees the development and implementation of standards by European and international standardization bodies to facilitate compliance with the data access obligations — a process that should meaningfully involve researchers, regulators and platforms.

Improved data access can also be ensured through shared infrastructure and capacity building across the research community. A well-informed research community that can responsibly handle platform data is best positioned to make effective use of data access frameworks. This requires greater education on best practices in data protection and security protocols, as well as resilient and independent infrastructures and solid technical tooling. These could include data validation services, technical data security solutions, and data repositories that give researchers an overview of what data is available.

Still, none of this can succeed without a strong, well-funded researcher community. Researchers from civil society and academia must coordinate across disciplines to advocate collectively for functional data access. This includes sharing information about data access opportunities, co-developing legal and technical resources that support both applications and data access itself, as well as ongoing community building to widen the circle of researchers who benefit from regulated data access.

Reimagining the future of researcher access

Data access is in critical condition. But even if current mechanisms were to fail, that would not necessarily mark the end of research data access. Other forms of platform transparency — including independent data access mechanisms like public-interest scraping and direct data donations from the public to researchers — represent an untapped resource that can supplement the slow and uneven rollout of mandated data access regimes popping up in Europe.

There is also a more radical alternative.

If the DSA’s vision was to shed greater light on online discourse within the EU, its sibling regulation, the Digital Markets Act (DMA), was designed to treat the underlying market dysfunctions — the concentration of power among a few global tech giants — that hinder the emergence of fair and open digital environments.

Like the DSA, the DMA is still in its early stages. But if regulators succeed in leveling the playing field so that smaller platforms can compete more effectively with larger incumbents, this may create space for new providers to flourish in ways that allow researchers data access from the start. Bluesky, for instance, offers wholesale data access to external parties, recalling the early days of Twitter, whose pre-Elon Musk transparency practices were considered best in class.

Moreover, if interoperability becomes reality, as envisioned under the DMA, European providers could fill the gap left by non-compliant foreign platforms. In this scenario, research access would not disappear but could be rebuilt on more favorable terms, designed from the ground up to integrate meaningful transparency, accountability, and access for independent researchers. In short, the DSA offers the blueprint for building better systems, while the DMA mandates the tools to make rebuilding possible.

In this way, a “rebirth” of research data access could emerge not just from the remains of existing incumbents, but from a new generation of decentralized platforms that treat data transparency not as a regulatory burden but as a civic and scientific good.

Whether such a transformation takes hold will depend on whether funders, regulators, researchers, and civil society can turn the current failing state of researcher data access into an opportunity to reimagine what “data access” means beyond compliance.

They should see it, instead, as a cornerstone of democratic digital governance.

Authors

Mark Scott
Mark Scott is a Contributing Editor at Tech Policy Press. He is a senior resident fellow at the Atlantic Council's Digital Forensic Research Lab's Democracy + Tech Initiative, where he focuses on comparative digital regulatory policymaking topics. He is also a research fellow at Hertie School's Cent...
LK Seiling
LK (Lukas) Seiling is responsible for the coordination of the #DSA40 Data Access Collaboratory, a joint project by the European New School of Digital Studies (ENS) and the Weizenbaum Institute. Their academic background is in psychology, cognitive systems, and human factors. Since 2020, they have be...
