Analysis

Lost in Translation: How Content Moderation Fails Tamil Speakers Online

Prithvi Iyer / May 19, 2025

Technology platforms moderate content in many languages, but in general they under-resource moderation outside of English and other major languages. Tech Policy Press has previously covered the difficulty of content moderation for low-resource languages, a problem that disproportionately affects users in the Global Majority. For instance, the Center for Democracy and Technology (CDT) has documented this issue in the context of Maghrebi Arabic and Kiswahili online speech, finding that inaccurate moderation in these languages led to a loss of trust in platforms and the suppression of free speech in the region.

Building on this body of work, CDT’s Aliya Bhatia and Mona Elswah examine the case of Tamil, a language spoken by over 80 million people, roughly 1% of the global population. Despite its prevalence in South Asia and in diaspora communities worldwide, Tamil is a “low-resource language”: there is insufficient high-quality data available to develop robust automated systems for content moderation.

The language’s deep politicization and history of erasure compound this technical limitation. In Sri Lanka, decades of ethnic tension and civil war severely restricted Tamil expression. Events like the burning of the Jaffna Library, which destroyed nearly 100,000 Tamil texts, meant that “Sri Lankan Tamil has fewer resources than other Tamil dialects.” In India, Tamil speakers have long resisted national language policies that favor Hindi and English, viewing them as threats to their linguistic and cultural identity. Given these challenges, the report examines how Tamil’s status as a low-resource language shapes Tamil speakers’ experiences on online platforms, especially amid democratic backsliding in India and Sri Lanka, the two countries where the language is most widely spoken.

The authors conducted an online survey of 147 frequent social media users from India and Sri Lanka, coupled with 17 qualitative interviews with trust and safety representatives, content moderators, and digital rights groups. This mixed-methods approach yielded several key findings:

Circumventing platform moderation

Tamil speakers surveyed for this study explained that the Tamil used on social media differs markedly from the traditional script, most notably through mixing Tamil words with English or “transliterating the language using Latin characters” to make it easier to post online. Users also reported strategies like “algospeak,” the use of codewords to evade moderation. Similarly, some respondents blurred politically sensitive symbols in their posts to circumvent multimedia moderation.
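To see why these tactics work against automated systems, consider a minimal sketch of a naive keyword filter, the kind of string-matching approach that the report suggests breaks down for code-mixed and transliterated Tamil. The blocklist term and example posts below are hypothetical illustrations for this sketch, not real moderation rules from any platform:

```python
# Illustrative sketch only: a naive substring blocklist, the kind of
# filter that code-mixing and transliteration defeat. The Tamil term
# below ("thanikkai," roughly "censorship") is a hypothetical example,
# not a term any platform is known to block.

BLOCKLIST = {"தணிக்கை"}

def naive_filter(post: str) -> bool:
    """Flag a post if it contains any blocklisted term verbatim."""
    return any(term in post for term in BLOCKLIST)

# A post written in Tamil script is caught by the filter...
print(naive_filter("அரசு தணிக்கை அதிகரித்துள்ளது"))       # True

# ...but the same sentence transliterated into Latin characters
# ("Thanglish") or lightly obfuscated with algospeak slips through.
print(naive_filter("arasu thanikkai athigarithulladhu"))  # False
print(naive_filter("அரசு த.ணிக்கை அதிகரித்துள்ளது"))      # False
```

Catching these variants would require transliteration-aware matching or models trained on code-mixed Tamil, which is precisely the kind of training data the report says is scarce.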

Global vs. local moderation

Interviews with trust and safety workers indicate that most Western companies employ a global content moderation strategy that is language agnostic, meaning the policy is uniform irrespective of the language spoken in a particular market. One interviewee called this the “coverage model,” in which resource constraints on moderation in different languages are addressed only in times of crisis. In this “global” approach, content is often machine-translated into English, and “reviewers are not told what the original language was.” In other cases, moderators are told to translate the post into English themselves if they do not speak the language. This is particularly concerning because machine translation can make significant errors, especially for low-resource languages like Tamil. The “global” approach increasingly relies on automated tools, but for languages like Tamil, the lack of training data, the cost of procuring these tools, and systematic underinvestment in the field have stymied progress.
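As a rough illustration of the workflow the interviewees describe, here is a minimal sketch of a translate-then-review pipeline. The function names are hypothetical, and translate_to_english is a stand-in for a machine-translation call, not any platform’s actual system:

```python
# Minimal sketch, assuming the "global" pipeline described in the
# interviews: a post is machine-translated to English and queued for
# review with its source language deliberately withheld.

def translate_to_english(post: str) -> str:
    # Stand-in for a machine-translation call. For low-resource
    # languages like Tamil, translation errors introduced here
    # propagate silently into the moderation decision.
    return f"<machine translation of: {post}>"

def build_review_task(post: str, source_lang: str) -> dict:
    # Per the report, the reviewer sees only the English rendering:
    # neither the original text nor its language is surfaced.
    return {
        "text_for_reviewer": translate_to_english(post),
        "source_language": None,  # withheld from the reviewer
        "original_text": None,    # also not shown
    }

task = build_review_task("அரசு தணிக்கை அதிகரித்துள்ளது", source_lang="ta")
print(task["text_for_reviewer"])
print(task["source_language"])  # None: reviewers cannot weigh context
```

The design choice the report criticizes is visible in the returned task: any nuance lost in translation is unrecoverable, because the reviewer has no way to consult the original.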

In contrast, some Indian companies reported a localized approach to content moderation in which “platforms not only alter policies to better suit Tamil contexts, but also give moderators extra agency to communicate with users through blog posts, provide guidance, and share feedback with the company policy teams.” Interviewees also reported that the efficacy of localized content moderation depends heavily on the extent to which users with linguistic expertise proactively flag violations to the platform. The biggest challenge with this approach is cost: hiring more moderators with linguistic expertise and/or training AI models to reliably flag violations despite the paucity of training data is resource-intensive and expensive. Nonetheless, this approach is better equipped to deal with harmful content, which is crucial given that “sexist, homophobic, and caste-based harassment and slurs are rampant on Tamil-speaking online forums.”

Perceiving moderation as censorship

A majority of respondents believed that their online speech was silenced by overbroad moderation policies that stifled their political views. Some suspected they had been “shadowbanned,” particularly when they used politically contested symbols or words. One possible explanation, according to the report, is that online platforms often rely on government input when crafting moderation policies but do not disclose this to users. India’s IT Act and Sri Lanka’s Online Safety Act require platforms to take down harmful content, but the definition of “harmful” is vague and often politically motivated, leading platforms to make decisions that may inadvertently silence online speech.

So what’s next?

This report examines the case of Tamil and how platforms often cave to government pressure when moderating political content online, an issue that is especially salient in the context of the ongoing border conflict between India and Pakistan: alongside the military confrontation, a disinformation war is raging online. So what can platforms do, and what role does research like this play in making content moderation policies more equitable? In an interview with Tech Policy Press, Aliya Bhatia, the report’s co-author, emphasized how the research can empower content policy teams within companies. “Our report has been really helpful for researchers within major tech companies to make the case for language equity. Our theory of change is very much helping the internal multilingual champions or regional champions within tech companies,” she said.

The report’s findings also shed light on the resource constraints companies face when moderating low-resource languages in the Global Majority. But according to Bhatia, companies do not need to do all of this work themselves; they should look to local organizations that are creating datasets, lexicons, and other resources to fill the gap. “I think the opportunity here is to have processes within companies to actually just vet and implement these initiatives. So, you know, take away the burden or the argument that we don't have the resources to build it,” she told Tech Policy Press.

Technology companies must prioritize such collaborations rather than promote a universalist approach to content moderation that may be insensitive to local realities. Lastly, the report cautions against the broad use of automated content moderation technology, especially for languages like Tamil. As Bhatia noted, “Before scaling the use of these tools that are fundamentally going to rupture and pose barriers to people's free speech, exercise caution and consult with subject matter experts so that they can be rigorously and robustly tested.”

Authors

Prithvi Iyer
Prithvi Iyer is Program Manager at Tech Policy Press. He completed a Master's of Global Affairs from the University of Notre Dame, where he also served as Assistant Director of the Peacetech and Polarization Lab. Prior to his graduate studies, he worked as a research assistant for the Observer Resea...
