Examining the Use of the Words “Health” and “Toxicity” in Content Moderation

Prithvi Iyer / Jan 9, 2024

Adapted from Jamillah Knowles & Reset.Tech Australia / Better Images of AI / CC-BY 4.0

Words like “toxic” and “healthy” are often used to describe social media discourse and to justify content moderation policies. However, the deeper meaning and context around the use of these words are rarely scrutinized.

A new research paper by Anna Gibson, Niall Docherty, and Tarleton Gillespie argues that researchers and policymakers should be “skeptical of the comfort, translatability, and traction that terms like health and toxicity provide.” Rather than accepting these terms at face value, we ought to “question how they have come to be so prevalent, what they allow us to say and what they do not, and, crucially, in whose interests circulating these terms serve.” Much academic research into platform accountability and governance uses these terms to justify the policing of online spaces, often based on subjective interpretations of what the words mean.

The authors say they are not interested in providing a universal definition of online health or toxicity (a task which they agree is virtually impossible). Rather, they aim to examine how these terms play out in the discourse around social media and content moderation. They do so by analyzing interview data from an ethnographic study of volunteer moderators of Facebook groups. While moderators have the authority to ban users and remove content, Facebook provides no training or guidance on making these decisions. Through their analysis, the authors investigate how conceptions of health and toxicity serve as a rationale for governing online spaces. This matters because platforms and moderators alike must explain why they choose to remove a specific post.

The authors argue that catch-all terms like “toxic” often serve as crutches to justify content removal. Toxicity, in particular, “works as an umbrella term for the array of anti-social behaviors that plague online communities and social media platforms.” Big technology companies have also championed these terms to promote their content moderation practices. Labeling some online behaviors as toxic “allows platforms to demonstrate the depth of their concern, while positioning themselves as benevolently diagnosing the problem – rather than being framed as responsible for it.” Framing “toxic” content as a scientific and quantifiable construct rather than an ambiguous label allows platforms to reduce complex socio-technical problems into something that can be solved via machine learning.

A clear example, the researchers say, is Google Jigsaw’s release of machine learning classifiers to detect toxic speech. Jigsaw trained its classifier on example posts rated by human moderators tasked with labeling a post as healthy or toxic. In this case, “toxic” speech was defined as “a rude, disrespectful, or unreasonable comment that may make you leave a discussion.” While this definition of toxicity may seem reasonable, the analysis presented by the authors shows that each person interprets its meaning based on their unique values, experiences, and beliefs, undermining the seeming objectivity of such classifiers.
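To see why human-rated training data carries this subjectivity into a classifier, consider a deliberately minimal sketch (not Jigsaw’s actual pipeline) of how per-rater labels are commonly aggregated into a single “toxicity” score. The comments, raters, and 0.5 threshold below are all hypothetical; the point is that for borderline comments, the “ground truth” a model learns from is itself a contested judgment.

```python
def toxicity_score(ratings):
    """Fraction of raters who labeled the comment toxic (1) vs. not toxic (0)."""
    return sum(ratings) / len(ratings)

# Hypothetical ratings of the same comments by three raters whose personal
# thresholds for "rude, disrespectful, or unreasonable" differ.
comment_ratings = {
    "You're wrong, read the thread again.": [1, 0, 0],  # mild pushback
    "Nobody asked for your opinion.": [1, 1, 0],        # borderline
    "Get out of this group, idiot.": [1, 1, 1],         # clear insult
}

for text, ratings in comment_ratings.items():
    score = toxicity_score(ratings)
    # A model trained on these aggregate scores inherits the raters'
    # disagreements: the binary label for the borderline comment depends
    # entirely on where the (arbitrary) threshold is drawn.
    label = "toxic" if score >= 0.5 else "healthy"
    print(f"{score:.2f}  {label:7s}  {text}")
```

Only the third comment is unanimous; the middle one is labeled “toxic” solely because a 0.5 cutoff was chosen, which is the kind of hidden judgment call the authors argue the language of objectivity conceals.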

Key Findings

  • Semantic Flexibility
    • A key takeaway from the interviews is that moderators did not have a consistent definition of healthy or toxic content. Toxicity could refer to an interaction, a user, or the group as a whole. This “semantic flexibility” also afforded moderators “a more respectable language for decisions that are, in the end, subjective judgments of what is ultimately good or bad for a community.”

  • Justifying Intuition and Expertise
    • Moderators develop intuition and expertise regarding what types of conduct can lead to undesirable outcomes. However, many could not provide specific answers when asked how they made these decisions. For instance, one interviewee said, “I cannot explain why this happens ... there are certain posts that we just know will go bad so we automatically refuse them.” Such intuitive expertise can effectively maintain healthy online communities but is difficult to justify as impartial arbitration. The authors find that framing these decisions with terms like “healthy” vs “toxic” content helps “cloak that deployment of expertise.”

  • No metaphor is universal
    • While this research paper specifically focused on the words “healthy” and “toxic,” interviews with moderators revealed a vast array of other terms used to justify decisions in online content governance. The authors found that how particular metaphors resonated depended on the individual moderator’s cultural background. For example, one moderator from Mexico felt that the term “toxic” was heavily linked to masculinity, while another found the term confusing because he had only encountered it as a description of romantic relationships. Moderators also frequently invoked words like “civility,” “fairness,” and “safety” to justify their decisions.

This research paper indicates that terms like “toxic” or “healthy,” when applied to online spaces, are not objective or quantifiable constructs. Instead, they act as metaphors, with their meaning rooted in individual moderators’ unique cultural backgrounds and experiences. The authors urge policymakers and academics to realize that while these metaphors may be useful to justify decisions that keep online communities safe, they also serve the interests of the platforms.

This is because shifting the burden of determining what is toxic onto individuals allows the platform to situate “moderation dramas in terms that do not question their position as capitalist arbiters of the theoretically collective public sphere, or their shared responsibility for the strife that has been dubbed ‘toxicity’.” These metaphors allow moderators and platforms alike to justify their interventions using the language of “diagnosis and care rather than policing, granting legitimacy to interventions that may otherwise not be as rigorous as they may seem.” Ultimately, they argue, “Talking about and acting upon toxic behavior in the moderation of social media is thus always a situated proposition, revelatory of the hopes and fears of particular historical political struggles, and justificatory of specific forms of regulatory action as appropriate to their solution.”


Prithvi Iyer
Prithvi Iyer is a Program Manager at Tech Policy Press. He completed a masters of Global Affairs from the University of Notre Dame where he also served as Assistant Director of the Peacetech and Polarization Lab. Prior to his graduate studies, he worked as a research assistant for the Observer Resea...