Analysis

Unpacking TikTok’s Data Access Illusion

Daniela Alvarado Rincón, Ognjan Denkovski, Salvatore Romano, Martin Degeling / Jun 12, 2025

Access to platform data is one of the most crucial pillars of the European Digital Services Act (DSA). Under Article 40(12) of the DSA, researchers, including those affiliated with civil society organizations (CSOs), can gain meaningful access to publicly available and real-time platform data without undue delay. This provision is fundamental for detecting and understanding systemic risks in the context of the DSA.

So far, most of the debate around Article 40(12) has focused on front-end barriers to access: a burdensome application process, lengthy delays, problematic terms of service clauses (such as mandatory pre-publication reviews, obligations to refresh or delete data, and conditions that could expose researchers to legal liability), and arbitrary rejections. Far less attention, however, has been paid to what actually happens once access to data is granted. Is the data useful? Is the infrastructure functional? Or is it simply a box-ticking exercise?

Since January 2025, Democracy Reporting International (DRI) Europe has had access to TikTok’s Virtual Compute Environment (VCE), the data access mechanism the platform offers to civil society organizations in lieu of API access, which is otherwise limited to academics. This analysis builds on our collective findings, drawing from DRI’s exploration of how the VCE could be used to monitor the online landscape ahead of Germany’s federal elections in February 2025, and from AI Forensics’ collaboration with academic researchers to evaluate TikTok’s Research API in parallel.

What we encountered were two dysfunctional and restrictive tools that fall short of fulfilling the spirit—let alone the letter—of the DSA. Both the VCE and the Research API suffer from significant conceptual and practical shortcomings, making them difficult to use meaningfully for research on TikTok.

Why TikTok’s VCE Fails As A Research Tool For Civil Society Organizations

TikTok’s VCE design envisions researchers submitting queries through a two-stage process within a virtual cleanroom—a secure environment specifically designed for conducting research on sensitive data. In the first stage, known as the testing stage, researchers can explore the VCE by running queries limited to a daily sample of up to 5,000 individual records, drawn only from accounts with at least 25,000 followers. Importantly, this data cannot be downloaded and is accessible solely through the VCE interface. In the second stage, researchers submit scripts to query the full set of publicly available data. However, rather than receiving raw data, researchers must embed their analyses—such as topic modeling or network analysis—directly within the data request script they upload. TikTok reviews these scripts to ensure that only aggregated results—never individual-level data—are shared. Once approved, researchers receive a link via email to download the aggregated results.
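To make this constraint concrete, the sketch below shows the general shape of a second-stage script under this model. It is illustrative only: TikTok does not publicly document the VCE’s in-environment interface, so the `query_videos` helper is a hypothetical stand-in, and the point is the aggregate-before-export pattern the VCE enforces.

```python
# Hypothetical sketch of a second-stage VCE script. TikTok's in-environment
# query interface is not publicly documented, so query_videos() is a
# stand-in stub for whatever the cleanroom actually exposes.
from collections import Counter


def query_videos(keywords, start_date, end_date):
    """Stand-in for the VCE's internal query mechanism (an assumption)."""
    # Inside the real cleanroom this would return individual video records.
    return [
        {"id": "v1", "create_date": "2025-02-01"},
        {"id": "v2", "create_date": "2025-02-01"},
        {"id": "v3", "create_date": "2025-02-02"},
    ]


videos = query_videos(["bundestagswahl"], "2025-01-01", "2025-02-23")

# The analysis (here, a simple per-day post count) must live inside the
# submitted script, because raw rows never leave the environment.
posts_per_day = Counter(video["create_date"] for video in videos)

# Only this aggregated table is released to the researcher, and only after
# TikTok's manual review of the script.
print(dict(posts_per_day))
```

Under this model, any error or revision in the embedded analysis means resubmitting the entire script for another round of review, which compounds the delays described below.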

This process is problematic for several reasons.

First, the restriction to aggregated results is fundamentally incompatible with how meaningful research on social media platforms works. Whether the goal is to merge platform data with external datasets, conduct cross-platform comparisons, or identify emerging patterns, researchers need direct and granular access to raw data: exploration, iteration, and methodological transparency all depend on it. This restriction also contradicts the wording and spirit of Article 40(12), which calls for researchers to have access to publicly available data. Although the DSA does not explicitly define “publicly available data”, our assessment is that the provision applies to all data visible to an average platform user, not just “selected” data. This interpretation aligns with the DSA’s transparency goals.

A deeper issue lies in TikTok’s lack of clarity about when and why certain data points may be excluded from the datasets it provides. Because researchers receive only aggregated outputs, they cannot verify which individual data points have been omitted, and consequently, from which data their analysis and insights are derived.

Second, by requiring CSOs to submit aggregated data requests through a virtual cleanroom—a setup typically recommended for high-risk processing of sensitive data—TikTok introduces platform-level gatekeeping under the pretext of data protection. Data protection is undoubtedly important, but this process, which applies only to CSOs, is unreasonable and draws an artificial distinction between CSOs and academic researchers. When applying for data access, both CSOs and academics must submit a data protection framework and adhere to data protection standards. Yet while academic researchers can directly access publicly available information through the Research API, CSO researchers must wait for TikTok’s approval. So far, TikTok is the only platform that has taken this “middleman” approach.

Finally, TikTok’s implementation of a second stage, requiring prior review of the data request script, means that data is not shared immediately but only after it has undergone internal review. This process significantly slows down data access.

During its active engagement with the VCE (from January to February 2025), DRI never received a single results file: queries remained stuck in limbo for months, with no updates or estimated timelines from TikTok. There is no transparency regarding processing times, and no mechanism is in place to track or escalate delayed jobs. This unpredictability renders the VCE functionally unusable for real-world research workflows, where deadlines are tight. By contrast, other platforms, such as YouTube and Meta, offer APIs that, despite their own issues, provide more timely and stable access to data.

Beyond these high-level issues, DRI also faced significant challenges stemming from the VCE’s design and the lack of clear documentation on how the system works. For example, the search functionality for accounts is severely limited, relying solely on basic keyword matching. This significantly hampers researchers’ ability to robustly identify accounts of interest using more advanced text processing techniques, as the sketch below illustrates. The January 2025 version of the VCE also had endpoint design choices that made it virtually impossible for DRI to investigate inauthentic political accounts on TikTok, an issue DRI repeatedly reported to TikTok. While these aspects have since improved, as has the documentation (which now clearly explains aggregated data: see the old version and the improved version), such issues suggest a rushed rollout.
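The following sketch, using invented account names, shows the gap: plain substring search of the kind the VCE offers misses common spelling variants that routine text normalization would catch.

```python
# Why plain keyword matching is too blunt for account discovery. The
# account names here are invented; the point is that exact substring
# search misses variants that basic normalization would catch.
import re
import unicodedata

accounts = ["afd_fanpage", "A.f.D-Support2025", "cdu_news", "Äfd_offiziell"]

# VCE-style search: exact substring matching only.
naive_hits = [name for name in accounts if "afd" in name]

def normalize(name: str) -> str:
    """Fold accents, case, and punctuation before matching."""
    folded = unicodedata.normalize("NFKD", name).encode("ascii", "ignore").decode()
    return re.sub(r"[^a-z0-9]", "", folded.lower())

# What researchers typically need: matching on normalized names.
robust_hits = [name for name in accounts if "afd" in normalize(name)]

print(naive_hits)   # ['afd_fanpage'] -- the two variant spellings are missed
print(robust_hits)  # all three AfD-themed accounts are caught
```

Because the VCE exposes no such normalization or pattern matching, variant and deliberately evasive account names simply fall through the cracks.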

Administrative Barriers and Lack of Support

The problems with TikTok’s research tools don’t start with the data; they begin much earlier, in the complex and opaque application and access process. For example, TikTok’s VCE provides only a single login credential per organization—a limitation not disclosed upfront—making collaborative research virtually impossible. What makes this even more confusing is that DRI had already submitted the names and CVs of its research collaborators during the initial application, as requested. Yet, after access was granted, DRI was asked to provide the same information again when requesting additional credentials. Even worse, TikTok never responded to DRI’s January 2025 request to add collaborators or clarify the process.

Most recently, DRI's VCE credentials appear to have been revoked by TikTok. This action was taken without explanation and falls outside the agreed-upon data access timeline specified in DRI’s original application. DRI’s attempt to contact TikTok’s designated support for VCE-related inquiries has not yet yielded a response.

Why TikTok’s Research API Wouldn’t Solve The Problem

Since civil society research is limited to the VCE, with all the shortcomings described above, it seems reasonable to demand access to the less mediated Research API, which is available only to academic researchers. However, investigations led by AI Forensics, in collaboration with researchers from Vrije Universiteit Brussel, reveal that this tool also prevents researchers from obtaining an accurate view of the platform.

While TikTok has fixed earlier issues, such as the API returning inaccurate information about view and follow counts, additional errors persist, making it difficult for researchers to trust the API responses.

Incorrect documentation and a lack of transparency make it difficult to determine whether a video is absent from the API or restricted in some way. TikTok documents some limitations on the data available via the API—such as excluding metadata for videos published within the last 48 hours, featuring minors, or posted by users in Canada—but there appear to be additional, undocumented, and seemingly arbitrary exclusions: videos that are publicly available through TikTok’s website and app, yet missing from the Research API.
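A simple way to surface such gaps is to take video IDs that are demonstrably public on tiktok.com and test whether the Research API returns them. The sketch below follows the endpoint and query shape of TikTok’s public Research API documentation at the time of writing; the exact field names, date formats, and response layout should be treated as assumptions to verify against the current docs.

```python
# Sketch of a cross-check for undocumented gaps: query the Research API
# for video IDs that are demonstrably public on tiktok.com. Endpoint and
# query shape follow TikTok's public Research API docs as of writing;
# verify field names, date formats, and response layout before relying
# on this.
import requests

API_URL = "https://open.tiktokapis.com/v2/research/video/query/"
ACCESS_TOKEN = "..."  # issued via TikTok's client-credentials flow

def in_research_api(video_id: str) -> bool:
    """Return True if the Research API yields any record for this video ID."""
    resp = requests.post(
        API_URL,
        params={"fields": "id,create_time,username"},
        headers={"Authorization": f"Bearer {ACCESS_TOKEN}"},
        json={
            "query": {"and": [{
                "operation": "EQ",
                "field_name": "video_id",
                "field_values": [video_id],
            }]},
            "start_date": "20250101",  # the API requires a date window
            "end_date": "20250130",    # (at most 30 days per request)
        },
        timeout=30,
    )
    resp.raise_for_status()
    return bool(resp.json().get("data", {}).get("videos"))

# Placeholder IDs standing in for videos collected from public pages.
public_ids = ["7300000000000000001", "7300000000000000002"]
missing = [vid for vid in public_ids if not in_research_api(vid)]
print(f"{len(missing)} of {len(public_ids)} public videos absent from the API")
```

Run at scale over a large sample of public videos, a check of this kind is what surfaces the undocumented exclusions described above.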

In a report published today, AI Forensics finds that TikTok’s Research API does not allow access to information on videos posted by TikTok itself, videos posted as ads, or videos from an estimated 1.5% of accounts that do not fall into any obvious category. Moreover, AI Forensics’ research indicates that prevalent algorithmic audit methods, specifically data donations and sock puppets, face significant limitations and are ultimately compromised.

Researchers Are Left In An Unsustainable Position

Civil society organizations and academic researchers play a crucial role in identifying systemic risks on platforms like TikTok—whether that involves tracking illegal content, detecting inauthentic behavior, or understanding the spread of disinformation. These efforts are especially vital during high-stakes moments, such as elections, where timely and reliable access to data is essential. Yet, based on our assessment, TikTok’s Virtual Compute Environment and Research API fail to provide the transparency, functionality, or responsiveness that rigorous research demands.

Based on our experience, TikTok’s VCE functions more as a nominal data access mechanism than one that fully meets the intent of Article 40(12). The system appears slow, tightly controlled, and lacks sufficient documentation. While it may represent an attempt to meet legal obligations, its current design significantly limits effective access. The research API, in principle, aligns more closely with the requirements of Article 40(12); however, its current implementation contains numerous errors that hinder both the accuracy and reproducibility of research.

Access to platform data is essential for carrying out our work, often under the pressure of tight deadlines and deliverables tied to donor-funded projects. When meaningful access through official platform channels is blocked or dysfunctional, we are left with two problematic options: manual data collection, which is slow and resource-intensive, or scraping, which poses technical challenges and potential legal risks.

What Needs To Change

TikTok should provide all researchers with access to a reliable, fully functional API from the outset. It is technically feasible to share individual publicly available data while protecting users' anonymity. Researchers must also be able to work with this data efficiently—requiring review and approval of data requests creates unnecessary delays, which run counter to the principle of providing data “without undue delay.”
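One established technique that supports this claim is pseudonymization: individual records are shared intact, but account handles are replaced with keyed hashes, so that rows about the same account remain linkable without exposing the handle itself. The sketch below is a generic illustration of feasibility, not a description of anything TikTok has built.

```python
# Generic illustration (not anything TikTok has built): share individual
# records while replacing account handles with keyed hashes, so rows
# about the same account stay linkable without exposing the handle.
import hashlib
import hmac

SECRET_KEY = b"platform-held-secret"  # illustrative placeholder

def pseudonymize(username: str) -> str:
    """Derive a stable, hard-to-reverse pseudonym for a username."""
    return hmac.new(SECRET_KEY, username.encode(), hashlib.sha256).hexdigest()[:16]

record = {"username": "some_public_account", "view_count": 12345}
shared = {**record, "username": pseudonymize(record["username"])}
print(shared)  # the same account always maps to the same opaque ID
```

Because the platform keeps the key, pseudonyms cannot be trivially reversed, while researchers can still track accounts across records, merge datasets, and reproduce one another’s work.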

Additionally, support for multiple credentials and collaborative access is crucial for enabling team-based research.

Crucially, access to individual-level publicly available data should not be limited to academic researchers. The DSA makes no such distinction in Article 40(12), and TikTok should ensure that civil society organizations are equally empowered to conduct independent scrutiny.

The authors are grateful to Anna Katzy-Reinshagen (Analyst at ISD, Germany) and Beatriz Saab (Digital Methods and Policy Manager at ISD, Germany) for their valuable contributions and thoughtful discussions, which significantly helped shape the arguments in this piece.

Authors

Daniela Alvarado Rincón
Daniela Alvarado Rincón is a Policy Officer at Democracy Reporting International’s Digital Democracy team, where she ensures the team’s research is effectively integrated into the EU’s digital policy discussions. Daniela has a background in law and public policy.
Ognjan Denkovski
Ognjan Denkovski is the Research Coordinator for Democracy Reporting International’s Digital Democracy team, leading its research output and activities. Ognjan’s background is in computational social science as applied to issues of disinformation, P/CVE and political consulting.
Salvatore Romano
Salvatore Romano is the Head of Research at AI Forensics and an Industrial PhD candidate at the Interdisciplinary Internet Institute (IN3) in Barcelona. His research focuses on platform accountability and algorithm audits, investigating how digital systems impact society.
Martin Degeling
Martin Degeling (PhD) is a senior researcher at AI Forensics. Previously, he worked on the tiktok-audit.com blog.
