Survey: New Laws Mandate Access To Social Media Data, But Obstacles Remain

Mark Scott / Apr 8, 2024

Mark Scott, currently POLITICO's chief technology correspondent, was a fellow at Brown University's Information Futures Lab where he spent a year and a half researching best practices for social media data access.

Anne Fehres and Luke Conroy & AI4Media / Better Images of AI / Data is a Mirror of Us / CC-BY 4.0

Getting independent researchers access to social media data is the broccoli of digital regulation. Everyone says it’s important. But when pushed, few people other than researchers understand why.

I’ve spent the last 18 months trying to unpack why better access to the treasure trove of social media data (the type of data that makes it possible to study everything from influence operations on Facebook to how TikTok videos go viral) is central to our understanding of the online information environment. That’s especially true in 2024, a year of elections in dozens of democracies where citizens still rely heavily on social media for news and information.

As a visiting fellow at Brown University’s Information Futures Lab, I interviewed more than 50 regulators, public health authorities, tech executives, civil society groups, and academics about how they accessed data in their efforts to untangle what happens on social media. This project represents one of the largest qualitative surveys of those with real-world experience on how social media data access works — and why this wide variety of actors wants to boost transparency and accountability for social media platforms.

Below is a snapshot of the results. It comes as the European Union is expected, within weeks, to outline details of its new social media data access regime, a central pillar of the bloc’s Digital Services Act (DSA), and on the heels of a transatlantic agreement between Washington and Brussels to promote greater accountability for social media platforms via outsider data access.

Other countries are now looking at what the EU has started. They want to move beyond existing voluntary commitments from social media platforms to some form of mandatory data access for outside groups. First in Europe and, most likely, soon in other liberal democracies, regulators and policymakers hope to use such independent insight to unpick what happens on social media, all while protecting people’s privacy and free speech rights.

Yet to get to that end result, we first must understand how data access currently works; what limits exist, both within and outside social media platforms; and the demands from independent researchers to better accomplish what everyone says is vital: greater transparency about how social media affects the offline world.

What follows is based on those more than 50 interviews, which were conducted anonymously to permit officials, executives, and others to speak freely about their experiences. My goal is to provide useful insights and benchmarks from which digital regulation to boost social media data access can flow.


Regulators:

With new social media regulatory regimes coming online across multiple liberal democracies, authorities now have new powers to demand that social media companies hand over in-depth data, often to substantiate how platforms comply with this new legislation. But, based on responses from the regulators I surveyed, it remains a challenge to know what to ask for from companies that are inherently skeptical about giving government agencies access to some of their most prized corporate secrets. Several officials complained that, without an index of data from which to extract such insider information, making regulatory data access requests from social media firms was still akin to looking for a needle in a haystack. “It’s like whack-a-mole,” said one official. “You think you’ve nailed what you’re looking for, and then you realize there’s another dataset that you didn’t even know existed.” A lack of awareness of what data was available, even among officials with industry experience, routinely hamstrung efforts to ask for social media data associated with regulatory investigations.

The second major obstacle for regulatory data access was widespread pushback from companies unwilling to provide comprehensive datasets that may implicate them in some form of wrongdoing. With many countries’ social media rules still in their infancy, platforms took an overly conservative approach to providing data to regulators. At least two agencies I interviewed had to rely on legal demands to eventually force companies to provide the requested information, and even then, it was unclear if that data was comprehensive. When faced with such corporate skepticism — often framed, legitimately, as caution over infringing users’ privacy — regulators were left in the dark about how much of the data was provided. “The stock initial answer is always ‘no,’” said another official. “From there, you start negotiating.”

Public health authorities:

This constituency, primarily made up of government-backed public health agencies and academic institutions with ties to health policymaking, is often overlooked in its need to access social media data. Yet during the recent COVID-19 pandemic, so-called infodemic management, or the ability to make sense of the online information environment related to an acute public health crisis, became a vital pillar in countries’ response. Almost all the officials and academics I surveyed, however, were unaware of ways to access social media data for their work, often relying on ad hoc workarounds to make sense of the digital world’s impact on what was happening offline. One academic, for instance, turned to crowdsourcing information from a closed, doctors-only Facebook group at the peak of the COVID-19 crisis, because of his inability to access social media data in more quantifiable ways. “We used what we had available,” said the public health expert. “You have to be resourceful when you don’t have many resources.”

Public health authorities’ needs for social media data also contrasted heavily with those of other groups. These individuals primarily wanted access to granular geographical information associated with how potentially false and malign information spread online. The goal: to geolocate the appropriate public health response, based on where such content was having the most impact. There was also an urgent need for demographic information on social media users, including age, socio-economic identifiers and education levels, to pinpoint the most effective information strategies. “The ability to overlay our public health experience onto granular demographic information would be a game changer,” said one official. Such granular data access, however, ran contrary to platforms’ privacy policies, leaving a divide between what information public health authorities needed to do their work and what was legally available to them.

Social media platforms:

From the outside, social media companies look like well-oiled machines. Yet tech executives tasked with collating data, both for internal and potentially external use, frequently highlighted the lack of uniformity in how such information was stored and managed inside these firms. Given the meteoric rise of these tech companies, many data repositories were built on shoestring budgets and without forward planning on how best to integrate often duplicative datasets when a social media platform suddenly grew from a digital minnow to a tech whale. A lack of centralized planning and indexing of what data was available, who could access it, and what safeguards were in place to combat abusive behaviors was endemic across those platforms surveyed. “People on the outside think that everything runs smoothly,” said one tech executive. “That just isn’t the case.”

Within companies’ trust and safety teams, primarily the individuals most empowered to promote outsider data access, there was a frustration that legal risk was placed above companies’ commitments toward accountability, responsibility and transparency. At least two respondents claimed their efforts to engage with outside researchers were thwarted by their own companies’ legal teams, which were afraid that such information sharing represented too much of a risk to justify the ongoing relationship. “The first question from legal is always: do we have to do this?” said another tech executive. “They are looking for reasons to say no.”

Academics and civil society groups:

Arguably the largest and most diverse group surveyed, these independent academics and civil society groups represent the cornerstone for what the European Union hopes will be a tidal wave of accountability and transparency fueled by social media data access.

But among those surveyed, a division persisted between those who favored relying on so-called application programming interfaces (APIs), or technical links to data repositories provided directly by social media platforms, and those who preferred so-called scraping, or the ability to independently collect social media data without having to rely on company buy-in. Both options have pros and cons. APIs offer easily accessible tools for data collection, but they may not provide information that is representative of what people actually see on social media. Scraping provides a higher degree of independence in how data is collected, but is associated with potential privacy concerns that have led some social media companies to threaten legal action if outside researchers pursue such tactics. “It’s not that one way is better than another,” said an academic. “Both come with downsides. We need to be honest about that.”

The survey also suggested that not all academics and civil society groups are treated equally. Those with direct ties to social media platforms — particularly prestigious American universities — received preferential access to data. That was often based on signing non-disclosure agreements with companies that could be perceived as undermining the independence of such accountability work. Individuals without such ties, particularly those outside of the US, were either cut out from any form of data access or had to rely on intermediaries like costly social media analytics firms to conduct their work. Such asymmetry, according to all those surveyed, undermined the effectiveness of data access by limiting the work to the few, not the many. “If you have direct access to the platforms and they trust you, you can really do good work,” said another academic. “But if you’re not in the inner circle, good luck.”

Where to go next?

Each group surveyed had its own demands for how data access regulation should be developed. But across the board, two requests stood out.

First, respondents pointed to the need for comprehensive technical infrastructure to make accessing social media data as simple and as cheap as possible. Currently, individual researchers mostly reside in silos, often tapping into datasets separately. These circumstances increase costs and limit information sharing. An independent one-stop shop for social media data, akin to what Meta’s soon-to-be-shuttered CrowdTangle analytics tool currently offers for that company’s platforms, would maximize impact by providing data access, at scale, to those who need it most.

Second, the ability to analyze data from across multiple social media platforms was hampered by a lack of common methodologies and standards from which to compare different datasets. No one suggested TikTok videos and Facebook posts were identical. But a universal approach to how social media data can be accessed, including mandated definitions for how datasets are provided and on which outside researchers can rely, was seen as critical in tracking the interplay of multiple social media platforms across the online public debate.

These inside-baseball discussions about social media data access will soon have real-world consequences.

Meta recently announced it would shut down CrowdTangle, its well-used data analytics tool, as it shifts outside researchers toward a new, arguably less comprehensive means of accessing its data. That access is now offered via the University of Michigan, whose Social Media Archive represents a broader repository of data also collected from the likes of Reddit, X and YouTube. Meta provided a $1.3 million gift to support the archive. The EU’s mandatory data access regime is still in its infancy, with limited resources, funding and expertise hampering the development of the world’s first attempt at mandating such independent accountability. Other countries, including the US, are considering similar obligations, born out of what Europe is implementing through its Digital Services Act.

In a year of global elections, the ongoing lack of understanding about what happens on social media — often the vehicle for disinformation campaigns, partisan attacks, and political polarization — is worsened by the limitations to outsider data access that I came across during my 18-month fellowship. The upside is that it’s not too late: Brussels is working on technical specifications on how to jumpstart such independent accountability, and scores of social media platforms are now offering some form of data access, based on their EU-based obligations.

What’s required, now, is a coordinated response between the actors that I interviewed to make the most of this opportunity — to boost awareness of what takes place on social media and its effects on global democracy.


