What a New Study Reveals About Content Moderation in Tigray
Prithvi Iyer / Apr 21, 2025
London—October 19, 2021: Protesters hold signs and flags at the "Tigray Genocide Protest" outside 10 Downing Street. Credit: Loredana Sangiuliano/Shutterstock
The civil war in Tigray, the northernmost regional state in Ethiopia, which raged from 2020 to 2022, has been deemed the deadliest conflict of the 21st century to date. Violent rhetoric flooded social media platforms, and content moderation policies failed to effectively curb hate speech and its offline impacts. While activists and victims of genocide alike have tried to sue tech companies for their role in spreading violence, little has changed in terms of platform policies or design. Civil society advocates have repeatedly urged major social media companies to strengthen their content moderation efforts by empowering local communities with the necessary linguistic expertise to assess nuanced content.
But that alone is not enough. A new paper from researchers at the Distributed AI Research Institute (DAIR) and multiple universities demonstrates that effective moderation also requires in-depth cultural knowledge and familiarity with dialects—forms of expertise that platforms often overlook. The study found significant disagreement even among experts when labeling harmful content pertaining to the Tigray conflict, with 71% of posts sparking debate. After structured deliberation, that figure dropped to 40%, highlighting the value of collaborative judgment in complex moderation decisions. Although the researchers alerted platforms to moderation failures during the Tigray war in 2020, the situation remains dire. In a public statement, the authors warn that “we are seeing an acceleration of the same type of warmongering on social media platforms that we documented at the beginning of the catastrophic Tigray war in 2020.”
The researchers note that previous work on this topic has mostly focused on Western contexts and thus cannot be readily applied to cases in the Global Majority. Even where the impact of social media on ethnic conflict or genocide has been studied (e.g., the Rohingya in Myanmar), studies have often focused on the pitfalls of automated content moderation and the need for greater linguistic diversity. The paper goes beyond these well-established recommendations and provides a “granular analysis of the expertise and procedures needed to moderate genocidal content.”
Methodology
Participants for this study included a mix of journalists, civil society advocates, and former content moderators. Seven participants spent four months annotating hateful posts about the Tigray war. As with commercial content moderation, the researchers developed a policy codebook, based on X’s policies, to deal with “Hateful Conduct, Violent Speech, Abuse and Harassment, Violent and Hateful Entities, and Glorification of Violence.” The researchers then used the now-defunct Twitter Academic API to collect all posts related to the Tigray war through keyword searches, resulting in a dataset of 5.5 million posts. Each week, participants were asked to review a random subset of 55 tweets from this dataset. This cap was meant to limit mental distress for participants and ensure they did not spend more than one hour on the work. After annotating the posts, disagreements over labeling were resolved in deliberation sessions.
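For readers who want a concrete sense of this sampling step, here is a minimal sketch of how a weekly batch of 55 posts might be drawn from a keyword-filtered collection. The keyword list, data format, and helper functions are illustrative assumptions, not the researchers’ actual pipeline.

```python
import random

# Illustrative sketch only: the keyword list, data format, and helper names
# are assumptions for this example, not the researchers' actual pipeline.
WEEKLY_BATCH_SIZE = 55  # cap intended to limit annotators' exposure to toxic content

def filter_by_keywords(posts, keywords):
    """Keep posts that mention at least one conflict-related keyword."""
    keywords = [k.lower() for k in keywords]
    return [p for p in posts if any(k in p["text"].lower() for k in keywords)]

def draw_weekly_batch(posts, already_labeled_ids, seed):
    """Randomly sample up to 55 not-yet-labeled posts for one annotation week."""
    rng = random.Random(seed)
    unlabeled = [p for p in posts if p["id"] not in already_labeled_ids]
    return rng.sample(unlabeled, min(WEEKLY_BATCH_SIZE, len(unlabeled)))

# Toy usage with invented data
posts = [{"id": i, "text": f"post {i} about Tigray"} for i in range(1000)]
relevant = filter_by_keywords(posts, ["tigray"])
week_one = draw_weekly_batch(relevant, already_labeled_ids=set(), seed=1)
print(len(week_one))  # 55
```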
The researchers also interviewed 15 former content moderators to determine how the findings from the annotation study relate to the policies of major social media platforms. These interviews helped unpack what platforms look for when hiring moderators and the extent to which moderators are allowed to voice disagreements with the companies.
Findings
The findings from the social media annotation study revealed that superficial language expertise is not enough to moderate content effectively. The researchers found that familiarity with local dialects is crucial because the meaning and cultural connotation of a word in the same language can differ across dialects. Similarly, a lack of nuanced cultural expertise was found to be a barrier to effective content moderation. At the same time, cultural context can itself lead to disagreements about how to categorize posts; for instance, participants struggled to reach consensus on whether a post was dehumanizing.
Domain expertise is also critical. The study found that journalists were better equipped than activists to flag misinformation. At the same time, advocates working on the ground seemed better suited to identify slurs against specific communities, which may go unnoticed by moderators unfamiliar with the local cultural context.
The most revealing findings from this study concern how moderators resolved disagreements over specific moderation decisions. The study revealed that even among well-qualified experts, initial agreement on how to categorize content was low—annotators disagreed on labels in 71% of cases. After structured deliberation meetings, however, the disagreement rate dropped to 40%, highlighting the value of collaborative dialogue in reaching consensus. The researchers compared this with majority voting, a tactic commercial content moderation often uses to arrive at consensus, and found that “51% of the labels reached through majority voting were different from those created through deliberation meetings.” In some cases, a single annotator convinced the others to change their minds, an outcome that majority voting would have foreclosed. Oftentimes, majority voting can cement hierarchies and merely reflect the views of the status quo.
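To make the contrast concrete, the toy sketch below computes the label a simple majority vote would produce for each post and the share of those labels that differ from the deliberated consensus, which is what the 51% figure measures in the paper. The posts, labels, and annotator counts here are invented for illustration.

```python
from collections import Counter

# Toy illustration: the posts, annotator labels, and deliberated outcomes below
# are invented; they are not data from the study.
annotations = {
    "post_1": ["hateful", "hateful", "not_hateful"],
    "post_2": ["not_hateful", "not_hateful", "hateful"],
    "post_3": ["hateful", "not_hateful", "not_hateful"],
}
deliberated = {
    "post_1": "hateful",
    "post_2": "hateful",      # deliberation overturned the initial majority here
    "post_3": "not_hateful",
}

def majority_vote(labels):
    """Return the most common label among annotators."""
    return Counter(labels).most_common(1)[0][0]

majority = {post: majority_vote(labels) for post, labels in annotations.items()}
differing = sum(majority[p] != deliberated[p] for p in majority)
print(f"{differing / len(majority):.0%} of majority-vote labels differ from deliberation")
```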
The annotation study also revealed the limitations of platform policies. Participants often disagreed with the codebook derived from X’s content moderation policy. In some cases, participants noted that new words had emerged to dehumanize certain groups but were not reflected in the codebook. The short training time (three weeks) for learning and applying the codebook, coupled with the burden of reviewing overwhelming amounts of toxic content, often led to hate speech going unchecked.
In the interviews, the researchers examined how social media platforms choose moderators, the expertise they prioritize, and how they resolve disagreements between moderators. All 15 interviewees worked for a large social media platform through an outsourcing company (pseudonymized as OutModeration); real names were withheld to ensure anonymity and protect participants from potential backlash. Based on a thematic analysis of the interviews, the researchers arrived at three key takeaways.
- Language skills: Participants unanimously agreed that language proficiency in regional contexts is key to being hired as a moderator. However, because of staffing and capacity constraints, moderators often worked with languages or dialects they did not know.
- Superficial cultural awareness: Interviewees noted that OutModeration gauged cultural awareness by testing moderators on knowledge about key public figures and socio-political events. This does not reveal deep cultural expertise, and as the researchers note, most of this information “can be easily searched and found on the Internet.”
- Understanding platform policy: Moderators are expected to know the content moderation policy document “by heart and seek clarification and guidance in cases of ambiguity.” This is challenging because moderators often complain of not having enough time to grasp the complexities of these policies, yet they must apply them to label content under enormous time constraints.
Dispute resolution in content moderation
This research also examined how social media platforms address disagreements over moderation decisions and the extent to which those processes help reduce toxic content online. While the study showed that moderators prefer having spaces to deliberate and discuss disagreements in group sessions, in practice moderators “do not have the agency to voice their disagreements with how posts are labeled and their views are often devalued or completely disregarded.” OutModeration relied on majority voting to reach consensus, which is ineffective, especially in the context of ethnic violence or genocide, because the “larger ethnic or gender group consistently wins regardless of whether or not their judgments are correct.”
Moderators often lack the agency to share their opinions because they are positioned at the bottom of the hierarchy. In most cases, the platform representatives have the final say on content moderation decisions. Relatedly, moderators shared frustrations regarding their inability to shape and inform platform policies. In the interviews, moderators often expressed disagreement with the policy language and shared their concerns with superiors, but to little avail.
Conclusion
By demonstrating how platforms undervalue deep cultural and dialectal knowledge while overemphasizing superficial awareness, the study highlights a fundamental misalignment between current platform practices and effective content moderation. The findings present a compelling case for prioritizing collaborative deliberation over hierarchical decision-making to improve content moderation in the Global Majority.
As social media continues to influence and, in many cases, exacerbate political conflict and ethnic violence, this research highlights that effective content moderation necessitates not only technical solutions but also a fundamental reorientation toward valuing contextual expertise and establishing supportive institutional structures for moderators working in crisis regions. Platforms need to be proactive in responding to these concerns, something they have failed to do previously. As DAIR’s public statement on the research paper states: “It's not enough to perform a postmortem analysis after millions have been killed, maimed, or displaced, and merely promise to do better without delivering on that promise.”