Researcher Access to Platform Data and the DSA: One Step Forward, Three Steps Back

Mateus Correia de Carvalho / May 31, 2024

Illustration adapted from a vintage Spanish-language microbiology book. Calsidyrose/CC by 2.0

The EU’s Digital Services Act (DSA) promises a lot. It promises to institutionalize a modern legal framework that ensures the safety of users online, protects their fundamental rights (recitals 3 and 9, DSA), “and maintains [a] fair and open online platform environment.” It also promises to tackle the societal risks associated with the design and functioning of platforms’ services (recital 9 and art. 34(1) DSA). At the core of the DSA’s promise is its so-called ‘systemic risk management framework’ (arts. 34 and 35 DSA). Under this framework, very large online platforms and search engines (VLOPs and VLOSEs) must iteratively identify, assess, and mitigate emerging systemic risks, including the dissemination of illegal content, gender-based violence, and negative effects on individuals’ fundamental rights, health, and the integrity of democratic processes such as elections and civic discourse.

Many scholars (see here and here), technical reports, the Commission, and the language of the DSA itself (recitals 90, 92, 96-98, 137) highlight that the success of this systemic risk management framework depends on the extent to which civil society structures and, specifically, independent researchers are able to review the risk management choices made by VLOPs and VLOSEs. Researcher scrutiny of platforms’ actions can drive platform accountability by bridging the informational asymmetries on the design and functioning of platforms that naturally favor big tech (recital 96 DSA). Researchers may also help contextualize DSA enforcement discussions by shedding light on the social impact of platforms’ risk management decisions (recital 90 DSA; as the Commission pointed out here, pp. 15, 21-22).

In this sense, the DSA took one major step forward. According to article 40 DSA, researchers (including those integrated in non-academic institutions) who are vetted by national public authorities shall be given privileged access to data on the design and functioning of platforms’ services that are necessary to conduct research on the societal impacts of platforms, including their risk assessment and mitigation decisions (art. 40(4) and (12) DSA). Platforms shall give access to requested data “without undue delay” (art. 40(12) DSA), facilitating such access through appropriate interfaces (art. 40(7) DSA).

So, if researchers’ scrutiny of platforms’ actions and decisions (and therefore the DSA’s overall regulatory promise) depends on access to information, it is worth asking the following very practical question: how is vetted researchers’ access to information for DSA purposes governed at present?

This post answers this question by focusing on a recent status report on researcher access to information, published by the European Commission in early April 2024 (hereinafter, the Researcher Access to Data Report), which was prepared in support of a recent Ministerial Meeting of the EU-US Trade and Technology Council. (To be clear, the Commission did not author the report, and it does not represent the Commission’s official position; rather, the report builds on the work of The George Washington University Institute for Data Democracy & Politics and Dr. Mathias Vermeulen.) The status report focuses on the mechanisms platforms have set up to provide independent researchers with access to information (for a previous independent report that covers in detail the DSA’s requirements for access to information processes and the award of ‘vetted researcher’ status, see here).

In what follows, I unpack my main finding from studying the Researcher Access to Data Report: most platforms have created a de facto new set of rules on data access, layered on top of the DSA’s provisions, by requiring researchers to fill out an application form. These platform-designed application forms contain several requirements that excessively limit researchers’ ability to access information on platforms’ services and, therefore, run contrary to the DSA’s objective of enhancing scrutiny over platforms’ risk management actions through evidence-based research. For the one step forward that article 40 of the DSA represents, some platforms’ excessive access to information requirements take more steps back. In the next section, I highlight three of them.

Three steps back

The Researcher Access to Data Report set out to showcase the current modalities of researcher access to platform data in order to gauge how VLOPs and VLOSEs are complying with the DSA’s access to information requirements.

The report’s main finding is unsurprising: in the absence of centralized rules on how to process researchers’ access to information requests, each platform is taking a different approach. Technically speaking, and in line with art. 40(7) DSA, platforms are facilitating access to data either through APIs or by allowing researchers to scrape data from platforms’ online interfaces. Importantly, however, most platforms covered in the report (except for Booking.com and Wikipedia) condition researchers’ access to information on the completion of an application form.

Generally speaking, these application forms go well beyond the legitimate practice of asking researchers to specify what type of information they need for the purposes of their research. They contain a series of control questions and requirements that create a de facto new set of rules researchers must satisfy to get access to information within the DSA framework. Most rules and control questions posed by platforms (e.g., on the researcher’s affiliation with a research organization, on the connection of the research to DSA systemic risks, or on the researcher’s data security and personal data protection plans) duplicate the control of the researcher’s integrity and ability to safely perform the intended research that is already carried out by DSA national supervisory authorities – Digital Services Coordinators, or DSCs – when awarding the status of vetted researcher pursuant to art. 40(8) DSA. As this process of researcher vetting is still in the early stages of implementation, it was to be expected that platforms would attempt such duplication. However, to avoid an overly burdensome duplication of researcher control, platforms should put in place an expedited access to information process for those who will increasingly be awarded the status of vetted researcher by DSCs.

At the same time, some other control questions and requirements featured in these application forms are not conditions for researcher access to information according to the DSA and run contrary to the legislative objective of granting researchers wide access to information. Three requirements imposed, more or less frequently, by certain VLOPs and VLOSEs should be highlighted as three major steps back when it comes to concretizing the DSA’s access to information objectives.

First, some platforms limit the eligibility of researchers based on their geographic location (e.g., Google only grants access to information to researchers based in the EU, while TikTok only considers researchers based in the EU and US as eligible). This runs counter to the DSA’s requirements, as the only geographical limitation imposed by the regulation on researcher access to information relates to the focus of the research (it must relate to the study of systemic risks in the EU) and not to the location of the researcher. It also hampers capacity-building of civil society (the more researchers involved in DSA research, the merrier).

Second, some platforms ask researchers to share excessive details about their research projects. Platforms frame this request as a means of verifying whether the research relates to the DSA’s systemic risks. And while some platforms rightly ask this in a straightforward manner (through a closed or open-ended question on, respectively, whether or how the research relates to DSA systemic risks), most platforms ask researchers to describe their research design and methodology at length. Beyond a general description of the scope of the research, it is excessive and too burdensome for researchers (especially at early stages of research) to be asked for information such as research questions, hypotheses, methodology, or even a summary of the literature review to which the project relates (this last one is from TikTok). These control questions not only duplicate the vetting process of researchers but also create undue obstacles to the objective of wide access to information that the DSA institutes.

Third, and most strikingly, some platforms (YouTube, TikTok, Meta, and LinkedIn) require researchers to agree to terms whereby they must provide the VLOP or VLOSE in question with a copy of their research output before publication. This pre-publication disclosure is described either as a “courtesy notice” or as a step needed for platforms to check the research output for confidential information or personal data. Whatever its stated aim, this is not an acceptable practice within the DSA framework. Not only is it not legally required, but it might also put undue pressure on researchers and, therefore, curb the academic freedom with which they should be able to conduct their work.

All the described platform-imposed de facto rules excessively limit researchers’ ability to conduct DSA research based on meaningful access to information. They should be removed from platforms’ application forms as soon as possible. At the same time, public authorities (and especially the Commission) should take into account and render unacceptable the existence of these platform practices when developing centralized rules on researcher access to information. In the next and final section, I briefly discuss how those rules might come into existence.

How do we move forward from here?

There is, indeed, a need to recover from the platform-induced steps back described above and to concretize the DSA’s promise of wide researcher access to information. In short, we need a new uniform set of rules specifically setting out what platforms may and may not require from researchers in their access to information mechanisms. Since wide access to information is within the spirit of the framework instituted by article 40 DSA, these platform requirements should be limited to what is strictly necessary to confirm, on the one hand, that the envisaged research is connected to the study of the DSA’s systemic risk management framework and, on the other hand, that users’ personal data is protected and safely managed. It is imperative to have an information access framework that neither excessively obstructs nor compromises the freedom of independent research on the emergence, assessment, and mitigation of DSA-relevant systemic risks.

How can this be done? The first, best, and expected option is laid down in art. 40(13) DSA: the Commission shall adopt a delegated act laying down the conditions under which VLOPs and VLOSEs are to share data with researchers under article 40. To be clear, the adoption of this delegated act is not optional; the Commission shall do it, and is expected to do so by this fall. A second option, should the delegated act take longer to adopt, is drawing up a voluntary code of conduct (an option under art. 45 DSA) whereby platforms commit to a uniform set of practices in shaping their mechanisms of researcher access to information. This could serve as an interim and/or complementary solution to a future delegated act.

At the same time, the processes of researcher vetting by Digital Services Coordinators must move forward steadily (with some DSCs very recently or not yet instituted, this might prove a hard target in the short term); and it must be made clear to platforms that vetted researchers should not be subject to de facto double control through excessively burdensome application forms.

All in all, we need new centralized rules that do not compromise or limit DSA systemic risk research as the current practices of some platforms do. And even if the cavalry is coming (with a new delegated act and the increasing designation of vetted researchers on the way), it is, to some extent, already arriving too late. With the European Parliament elections set to take place in early June, researchers may have already missed important opportunities to conduct timely and valuable research on systemic risks emerging in this particular electoral context (the importance of which the Commission recognized here, in paras. 29-32). How much longer are we willing to wait until we resume walking forward and concretize the DSA’s promise?


Mateus Correia de Carvalho
Mateus Correia de Carvalho is a Ph.D. researcher at the European University Institute, in Florence. His research focuses on the EU’s risk regulation of artificial intelligence and digital platforms, and its relation to the traditional structures of EU governance and the European system of fundamenta...