European Commission Issues Report on Researcher Access to Online Platform Data

Gabby Miller / Apr 9, 2024

On April 5, the European Commission issued a report summarizing platforms’ data access mechanisms currently available to academic and civil society researchers across Europe and the US. Its aim is to showcase the existing modalities and encourage the use of these mechanisms, which scholars have historically used to study issues ranging from the societal impact of an online platform’s design to the mental health impact of social media on youth.

The status report follows the implementation of the Digital Services Act, which promises to require providers of Very Large Online Platforms (VLOPs) and Very Large Online Search Engines (VLOSEs) to increase transparency into their services. It also comes amid growing uncertainty around changes to platforms’ data access policies and difficulty obtaining the data necessary for completing research on the information environment.

The DSA, or the European Union’s “rulebook” for making the internet safer, fairer, and more transparent, includes a provision guaranteeing vetted researchers the ability to request data from VLOPs and VLOSEs to conduct research on systemic risks in the EU. However, questions remain about the extent to which researchers can rely on the provision, Article 40.12, especially for research that involves automated data collection like scraping, which is often against platform terms of service. Others have raised concerns over what exactly effective access looks like.

X (formerly Twitter) is one such VLOP whose recent data access policy changes are particularly burdensome for researchers. Under Elon Musk’s leadership, in early February 2023, the Twitter Development team announced it would end free access to the Twitter application programming interface. And after nearly two months without clarity, the company announced a new three-tier API price structure, which includes a limited free version, ‘basic’ access for ‘hobbyists,’ and a costly enterprise tier for scaled commercial projects.

In another instance, TikTok launched a new researcher API in February, which expanded access for researchers to once-inaccessible data. However, critics point to the API’s terms of service, which include restrictions on data retention and sharing as well as licensing agreements, as incompatible with the research process. The TikTok Research API also has a limit of 1,000 requests per day and is available only to researchers in the EU and US, the report notes.

The report is a product of the high-level principles announced at the EU-US Trade and Technology Council (TTC) Ministerial Meeting last May, reflecting a “shared commitment” to advance data access from online platforms for researchers. It was also a starting point for discussion around last week’s TTC Ministerial Meetings in Belgium, which included workshops on opening up platforms’ black boxes as well as solutions against technology-facilitated gender-based violence.

The report is broken down into two main parts: one on the mechanisms for providing publicly accessible data, like researcher programs and web scraping provisions, and one on platform advertising repositories. Both sections demonstrate discrepancies in how, and under what conditions, data can be accessed and used online, depending on the platform.

Researcher access to platform data is often predicated on geographic location, institutional affiliation, and the specifics of the research proposal, as well as other concerns around protecting users’ data and avoiding monetary conflicts of interest. Access is also usually determined at the sole discretion of the platform for nearly every mechanism the report analyzed. The Meta Content Library and API, however, is an outlier in that it partners with the Inter-university Consortium for Political and Social Research (ICPSR) at the University of Michigan, which independently reviews all applications and provides users access.

Regarding geographical access, TikTok’s Research API is only available to researchers in the EU and US. Google Request Records is even narrower, with a program only available to EU-based researchers. Most platforms offer fewer specifics on geography, but many did mention that applications will be considered “in accordance with the Digital Service Act,” according to the analysis.

Platforms’ preferences around what types of institutions they partner with also varied. Both the TikTok Research API and the YouTube Researcher Program mainly focus on working with academic institutions, whereas Meta’s Content Library and API encourages applications from researchers at both academic institutions and nonprofit entities whose primary purpose is scientific or public interest research. A researcher’s experience is also considered in some cases. Meta’s Content Library and API application requires evidence of skills in a coding or querying language. Other applications ask researchers to demonstrate that their host institution is capable of fulfilling data security and confidentiality requirements.

One near-universal requirement for data access, across both platforms and mechanisms, is specifics on a proposal’s research design and methodology. TikTok, which has a more stringent set of research design requirements, asks applicants to explain their hypotheses and submit an accompanying literature review. Other platforms, such as X, want to know why the data is needed and can’t be easily accessed through other sufficient means. Many also want a list of keywords associated with the project or examples of specific content that the researcher plans to analyze. The Reddit Researcher Access Request, for example, asks researchers for the subreddits under consideration.

The report includes an appendix of each platform’s data access mechanisms, such as researcher programs and content libraries, alongside information about who can access them and whether an application is required. A separate appendix lists all the platforms’ public ad repositories, which do not require users to be researchers. TikTok and LinkedIn do, however, require users to apply for access with basic information about their area of expertise or a project description.

All information in the table is based on publicly available information published by service providers, the report says.

Gabby Miller
Gabby Miller is a staff writer at Tech Policy Press. She was previously a senior reporting fellow at the Tow Center for Digital Journalism, where she used investigative techniques to uncover the ways Big Tech companies invested in the news industry to advance their own policy interests. She’s an alu...