Dataset of 183 million Parler posts comes with warning: "content might be toxic, racist and hateful and can be overall disturbing"

Justin Hendrix / Jan 30, 2021

While it has been widely reported that the entire public contents of the social media site Parler were recently preserved by a self-described hacker before the platform was removed from Amazon's servers earlier this month, researchers were already gathering massive quantities of publicly available information on it and its users thanks to the site's design.

In An Early Look at the Parler Online Social Network, published January 19th, researchers from NYU, SMAT, Binghamton University, University College London, Boston University, and the Max Planck Institute for Informatics analyze a dataset of 183 million Parler posts made by 4 million users, as well as metadata from 13.25 million user profiles, both spanning August 2018 through January 2021. The authors "warn the readers that we post and analyze the dataset unfiltered; as such, some of the content might be toxic, racist, and hateful, and can overall be disturbing."

The authors detail a range of features on the site, from its data structure to its badge system, follower dynamics, growth drivers and key hashtags.

Parler's growth was driven by events. Its first key moment came in December 2018, when conservative activist Candace Owens tweeted about it. Another jump in user figures came in June 2019, when a large number of accounts from Saudi Arabia joined. In June 2020, conservative commentator Dan Bongino purchased an ownership stake in the platform and it received an endorsement from Brad Parscale, at the time the manager of Donald Trump's 2020 re-election campaign. The election and dissatisfaction with mainstream social media platforms drove further growth, with a major spike that followed Donald Trump's defeat and continued through the days after the siege on the US Capitol.

Analysis of the content of the site indicates most users support Donald Trump and many engaged in conspiracy theories. Top hashtags are a mix of MAGA and Trump messages with QAnon and "Stop the Steal" phrases.

Top 20 hashtags on Parler
Top 20 words and phrases on Parler

A new paper from the Stanford Internet Observatory, Contours and Controversies of Parler, published January 28th, presents findings from an analysis of the dataset, which it considered alongside a collection of its own. Reinforcing some of the above findings, the Stanford team also notes a significant amount of chicanery on the site, ranging from the automation of content posting using simple RSS feeds to networks of fake accounts promoting "Trump coin scams and OnlyFans profiles."

The Stanford team also details the site's challenges with content moderation, which led to its removal from Amazon's servers following the events on January 6th.

Excerpts of violent Parler posts from Amazon's court filing
Top Parler users by follower count as of January 9th, 2021, the day Amazon removed the site from its servers

The Stanford team digs into the curious arrival of a large number of users from Saudi Arabia, behavior that mirrors similar growth in Saudi accounts on Twitter that resulted in that platform removing nearly 90,000 accounts. Brazilian, Chinese and Japanese "users" also made forays onto the site, but in limited numbers: "It seems that Parler may have been more popular on Brazilian Twitter than on Parler itself," the authors note.

The researchers find that many of Parler's top users relied on automated posting rather than actually engaging on the site. Indeed, "while some of the most prominent right-leaning media figures and outlets created accounts on the site, they largely did not cultivate their Parler audience separately from other social media audiences. Instead they relied on integrations to automate their posts." The team also found Parler rotten with "spam, financial fraud and porn accounts" that proliferated across the platform, taking advantage of lax moderation practices and system design.

Following the events of January 6th, 2021 at the US Capitol, the Parler dataset is not only a unique snapshot of a social media platform but also a crime scene. Researchers, and perhaps law enforcement, will study its first incarnation for years to come.

Note: one of the authors of the Stanford study, Renée DiResta, is on the Tech Policy Press advisory board.


Justin Hendrix
Justin Hendrix is CEO and Editor of Tech Policy Press, a new nonprofit media venture concerned with the intersection of technology and democracy. Previously, he was Executive Director of NYC Media Lab. He spent over a decade at The Economist in roles including Vice President, Business Development & ...