Evaluating the Forces Shaping the Trust & Safety Industry

Jan Eissfeldt, Sujata Mukherjee / Sep 25, 2023

Sujata Mukherjee heads the Trust & Safety research team at Google. Jan Eissfeldt serves as Director and Global Head of Trust & Safety at the Wikimedia Foundation, the non-profit hosting Wikidata, Wikipedia, and other free knowledge projects.

Trust & Safety policymaking is becoming a multi-dimensional balancing act: between sweeping “universalization” of global policies and excessive “balkanization” of and within individual countries/locales; between scalable, centralized moderation processes and distributed, community-reliant systems; between business interests and societal impact. Compounding this complexity is the issue of measurement: the impact of Trust & Safety policies is often indirect, can take time to materialize, and unfolds across complex, entangled systems. In this commentary, we review the forces shaping the industry’s balancing act and the challenge of building agile, resilient systems that are future-ready.

A new field emerges

Trust & Safety (T&S) is an emerging field in the tech industry that has grown rapidly in recent years. T&S is used to refer to the teams and corporate function(s) at internet companies and service providers dedicated to ensuring that users are protected from unwanted or harmful experiences. Key functions include Policy, which is the team developing the company’s rules to govern its products, platforms and services, and Enforcement or Operations, the teams that work on ensuring that said rules are followed. Larger platform providers usually have whole ecosystems of support functions for these two foundational elements, including engineering and machine learning teams that build tools and models used for enforcement; data science professionals helping with metrics and analytical research; threat intelligence and enterprise management professionals; attorneys, public policy, and communications folks dedicated to the topic area in-house; and often outsourcing providers and software vendors complementing existing capabilities required to run scaled, global operations.

This growth in T&S over the last decade is due in part to the increasing importance of online platforms in people’s lives. People rely on digital platforms for everything from communication to commerce, and expect them to be safe and trustworthy. Most internet platforms are now aware of product safety principles and social benefits as revenue enablers, not least because there is accelerating pressure from regulators and consumers to do more to protect users from harm. Most platforms realize that investing in Trust & Safety can lead to long-term benefits, such as improved user engagement and loyalty.

However, there are many headwinds facing the T&S industry. Based on our combined two decades of experience working on T&S, we seek to provide a comprehensive mapping of the issues influencing the industry at this moment in time. Our hope is that a situational summary of this type will help to ensure that proposed solutions are grounded in a solid understanding of the sociotechnical context in which this field operates. Finally, this article highlights the speed of evolution in the field, particularly the role of technology drivers. This is important because solutions that were effective in the past may not continue to be so in the future, highlighting the need for future-proof practices, processes, and infrastructure that form the foundation of the field.

Market Forces

Role in Society

Many platforms were created with the promise of the free and open web, to host and extend participation, expression, and connection across the world. But as these platforms grew, the challenges of free interaction and content creation became apparent: illegality, nonconsensual pornography, violence, abuse, and hate. Platforms found themselves needing to moderate to prevent criminal activity, and increasingly, to protect users from each other, prevent the marginalization of already vulnerable groups, and safeguard the integrity of their products and services.

The conversation about the impact of technology has expanded to include not just the impact on individuals, but also impacts on society and geopolitical relationships. Political and social movements are increasingly influenced and amplified by online conversations, and a series of world events, including the US elections, the UK Brexit Referendum, and India’s Demonetization (all in 2016), made this phenomenon painfully clear. This has also caused a paradigm shift in the perception of a platform’s “responsibility,” from accurately detecting and removing problematic content to also being able to provide transparency in its case reasoning. As platforms increase their user bases and become conduits of information access and digital participation, civil society organizations have highlighted opportunities to reduce inequality, disrupt economic and political power structures and empower individuals.

One aspect of the content moderation debate concerns how media bias is influenced by advertising dollars. In that context, research has shown that while monopolistic newspapers can report inaccurately or under-report on matters that affect their advertisers badly, in a highly competitive market, competing newspapers tend to increase the accuracy of their reportage in order to retain reader attention and win advertising dollars. Today, for-profit platforms need to balance their approach to moderation in a two-sided market as well. Ad-based platform business models seek to appear favorable to advertisers by offering greater reach, and so adopt moderation as a means to appeal to the broadest user preferences. Thus, an ad-driven platform’s market incentive to moderate is highest when both users and advertisers have congruent standards for moderation.

Not-for-profit platforms, like Wikipedia, Wikidata, or the Internet Archive, serve the public interest and are not tied to commercial incentives or business models, like advertising, that have major T&S implications. Often, this leads to distinct pathways for their organizational, technological, and ecosystem approaches. For example, they rely on small cohorts of active volunteer contributors who are creating value for the public around a charitable mission. These platforms rely on donors, marshaling limited organizational resourcing and community-reliant moderation models. Active volunteer moderators rally around norms prioritizing truth and factuality over opinion and speculation, and there are efforts to control misinformation and article vandalism through investment in community management tools. Often, non-commercial public interest platforms also serve the broader industry ecosystem as major sources for both freely licensed content and raw materials helping to train commercial AI models and other services.

The evolution of content policy and more comprehensive enforcement has led some audiences to perceive this as a crackdown on free speech. In response, several emerging platforms have consciously positioned themselves as opposed to the hegemony of “Big Tech” and framed more relaxed content moderation guidelines as a form of political defiance. It is important to note that the language used by these providers is not necessarily overtly party-political or endorsing a particular ideology. Rather, effective targeting of dominant publics requires that counterpublic discourses employ similar framing to posit alternative arguments and customer audience pitches.

People may adopt platforms to seek connection, entertainment, or other services that are useful to their day-to-day lives. Most people exhibit routine use of the same core set of sites to satisfy their day-to-day online needs. People also join platforms where their friends are, so there is a critical mass that congregates on the most popular platforms. As people become more tech-aware, they become more discerning about their engagement with technology, requiring platforms to respond accordingly to retain their attention.

The Global-Local Polarity

Historically, most of the professionalization of T&S work has focused on key markets where platform providers have established infrastructure or vital commercial interests, such as the US and EU. T&S issues, however, are of global importance and are often disproportionately impactful in other markets, home to most of the world’s population (Majority World). The user bases in the Majority World asserting their own customer, societal, and sovereign expectations and preferences is a significant trend shaping the industry into this decade. As a field, T&S is becoming more cognizant of platform and societal challenges globally, and is building the necessary capabilities and capacities to address Majority World perspectives equitably.

For the field to structurally change to include the Majority World requires re-evaluating the traditionally US-centric discourse and practices built around value-guided laws. One example is FOSTA/SESTA, the US law intended to address sex trafficking. Its provisions do not scale well when implemented by T&S teams globally. The field already has experience with reasonable adjustments, including changing EU regulations on hate speech. Based on their business model and specific mission, companies can wrestle with the challenges this trend poses in different ways. We have experience making such efforts in our respective organizations:

Google, for example, has adopted a consultative approach to policy development, working with a global network of safety and subject matter experts. The Human Rights Program is a central function responsible for ensuring that all Google products are designed to meet the organization’s commitment to the UNGPs, GNI Principles, and other civil and human rights instruments. This includes executive oversight and board governance, sustained engagement with regional civil society stakeholders, transparency reporting and periodic assessment by an independent third party.

Wikimedia recently hosted a successful global community vote and subsequently ratified the enforcement guidelines for its Universal Code of Conduct (UCoC). The UCoC sets minimal shared expectations across more than 380 different languages while inviting the self-governing communities to adapt the UCoC in their own sociocultural contexts. For example, the English-language Wikipedia’s Arbitration Committee, the community-elected volunteer adjudication body making final decisions on conduct issues on that language Wikipedia, relied on the UCoC in its May 2023 ruling confronting notable socio-historical and linguistic complexity in the “World War II and the history of Jews in Poland” case.

Moderation Models

The policies adopted by platforms are enforced through moderation. The mechanisms of detecting policy violations and taking moderation actions (demotion, removal, age-gating, etc.) generally utilize a combination of algorithmic and manual review actions. Content moderation operations are typically cost centers, and platforms can choose between many operating models. The model a platform chooses often depends on the size of the moderation cost and the magnitude of the brand risk, and that relationship is non-monotonic: it does not move steadily in one direction, but behaves differently at different points.

A demand for more moderation may not be supported if costs become sufficiently large, for example, when dealing with moderation needs that are idiosyncratic to specific countries or languages. Issues of moderation are exacerbated by an increasing number of users on the platform, and the diversity of the user base. Sociocultural norms and attitudes to content vary greatly across regions of the US, not to mention the world. While the last two factors are a question of scale versus cost, a single major incident may also trigger heavy moderation investment if it poses a large enough risk to the brand.

Certain operating models enable greater moderation than others, but models often evolve as platforms acquire greater resources. In increasing order of cost and scale, the differing models that a platform can adopt are:

  • Removing only illegal content.
  • Removing content in response to user flags and reports, including designated civil society and government-associated “trusted flaggers”.
  • Community-reliant moderation models.
  • Centralized or vendor-supported review and moderation operations.

Approaches to content moderation have already been classified in various ways. They may be federated (eg. Reddit) or centralized (eg. YouTube), community-reliant (eg. Wikipedia) or user-report driven (eg. Telegram). Some classifications have used the terms artisanal (eg. Discord), small-batch (eg. Medium) or industrial (eg. Google) moderation. Ultimately these are all descriptors of the scale and organizational structure of the moderation practice within the company that owns the platform. Which model the company chooses is both a design choice and an evolutionary one. The choice depends on the platform’s purpose, the changing financial resources available to the platform, and the technological affordances offered by its architecture and design.

Today, platforms align on a small number of content-related moderation approaches, mainly around the removal of illegal content such as Child Sexual Abuse Imagery (CSAI) and terrorism. This has led to coordination between platforms, such as sharing detection tools and signals, to remove what is collectively considered the worst content on the internet. For example, Google offers the Content Safety Toolkit, a cloud-based API to detect CSAI, which is used by Meta, Yubo, SaferNet and others.
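As a rough illustration of what sharing detection signals between platforms can mean in practice, the Python sketch below matches uploads against a shared list of fingerprints of known violating files. It is a simplified picture under stated assumptions, not a description of any tool named above: the class SharedSignalList and its methods are hypothetical, and exact-match SHA-256 hashing stands in for the more robust techniques, such as perceptual hashing or machine-learned classifiers, that production systems generally rely on.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a SHA-256 hex digest used as an exact-match fingerprint."""
    return hashlib.sha256(content).hexdigest()


class SharedSignalList:
    """Hypothetical shared store of fingerprints of known violating files.

    Only digests are stored, so participants never exchange the files
    themselves.
    """

    def __init__(self) -> None:
        self._hashes: set[str] = set()

    def contribute(self, content: bytes) -> str:
        """Add the fingerprint of a confirmed violating file to the list."""
        digest = fingerprint(content)
        self._hashes.add(digest)
        return digest

    def matches(self, content: bytes) -> bool:
        """Check whether an upload is byte-for-byte identical to a known file."""
        return fingerprint(content) in self._hashes


# One platform contributes a signal; another checks its uploads against it.
shared = SharedSignalList()
shared.contribute(b"placeholder bytes of a confirmed violating file")

is_known = shared.matches(b"placeholder bytes of a confirmed violating file")
is_unknown = shared.matches(b"some unrelated upload")
print(is_known, is_unknown)
```

An exact hash only catches identical copies, so changing a single byte defeats it. That limitation is one reason shared tooling, rather than a shared list alone, matters for this kind of coordination.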

On other content topics, though, the moderation threshold is less uniform. Motivated bad actors operate across platforms, able to exploit the gaps between different moderation approaches, using the most moderated (usually the largest) platforms to acquire an audience while funneling them to less moderated platforms where problematic content is more likely to persist. T&S efforts on these legal but problematic topics remain mostly focused on their own product surfaces.

Technological Drivers of Change

Historically, companies have built their key T&S capabilities in-house while outsourcing specific, low-complexity moderation activities to providers around the world, typically making scale versus cost trade-offs. With global scale and the need for local context and language coverage, content review operations have grown into a large business sector (~26,000 reviewers as of 2020 across Meta, YouTube and Twitter), where service providers offer services with highly controlled operating environments and oversight.

Increasing awareness of the sophistication of fraud and abuse techniques has led to the rise of vendors offering technologically-advanced and niche services that support multiple aspects of the T&S lifecycle. The improved capability offerings from mature vendors mean that most platforms can now choose between building T&S capabilities in-house or buying the services needed to satisfy their policies, from identity verification services to threat intelligence. Greater outsourcing leads to additional trade-offs, as it can improve access to best practices but also deepen co-dependencies with vendors that could be hard to undo and create distinct types of insider risks compared to the traditional in-house approach.

Capabilities also have to keep pace with new paradigms of content creation. The rise of new technologies and communication media introduces new engaging and immersive modes for community members to interact, but it also introduces new norms and, consequently, new challenges for moderation. Many platforms are adopting enhanced privacy affordances (eg. end-to-end encryption, anonymous posting, etc) and ephemeral audio / video formats (eg. self-destruct / timed messages, live audio / video streaming, etc), which while beneficial to users are challenging for content moderation, allowing for bad actors and harmful content to flourish in new, unforeseen ways. AR/VR formats have introduced the need to review conduct and content as the superrealism and virtual embodiment of these formats allow violence and abuse to be manifested in novel ways. Reviewing conduct is an approach that platforms have focused on in a limited way, notably Wikipedia’s long-standing focus on the conduct of contributors in collaborative interactions with peers across its transparent platform.

With the increased availability and adoption of AI, abusive content and automated accounts can now also be created synthetically (eg. synthetic deepfake pornography or CSAI) in addition to emergent threat vectors that are still being understood. At the time of writing, the industry invests considerable resources into exploring whether and how to effectively label the output from generative AI products and services, though it is as yet unclear whether it is money well spent. While the underlying assumption of the debate has been that AI-generated content will become endemic to UGC platforms quickly due to its cheaper production economics compared to human-created material, it strikes the authors as less than useful to label the AI output. If the debate’s axiom holds up, labeling authentic human output would likely be more beneficial, aligning with other consumer industries where high-quality product labels are common and potentially building on the established creator and influencer market mechanisms.

The industry is also seeing substantial changes in the workplace, the talent pool, and increased adoption of automation. Early T&S teams grew organically out of the need to respond to unexpected and awful use cases discovered on products. Most early T&S functions were seeded by individuals with transferable skills from adjacent or analogous professions, such as law enforcement, government, public health, crisis management, and general technology services. With increasing conversation about platform responsibility and the evolution of T&S as a practice, there are now professional organizations, efforts to establish an academic discipline, and investment in employee wellbeing and skill development programs.

Due to the sensitivity of content as well as the need to protect confidential procedures, T&S operations have often been performed in controlled-access, clean-room environments. The COVID-19 pandemic stretched these concepts and provided the impetus needed to build systems that de-risked content review being performed in remote locations. As the technology evolves, there is an ongoing conversation about the use of AI to make detecting bad content faster, improve the efficiency of review processes, and cut costs, while balancing responsible use of such capabilities.


There are growing concerns from governments and civil society that online platforms should be subject to regulation and oversight due to their outsized influence on media markets, collection of personal data, and the gatekeeping of access and distribution of information. At the same time, increased coordination between platforms and government does not necessarily lead to greater user welfare. Transparency is considered one of the cornerstones of effective platform governance, and offers an opportunity to balance these conflicting concerns. Regulation, in addition to increasing geopolitical sensitivities (eg. Sino-American security competition across the Pacific), will require platforms to be attentive to issues previously not prioritized due to opportunity costs of addressing them.

Policymakers worldwide are focusing their energies on the issue of regulating platforms. While there is no agreed-upon global standard for transparency, there is convergence on some themes. Platforms are encouraged to publish regular transparency reports that provide evidence of due process and human rights diligence, policy explainability, and cultural competence. Reports typically include statistics of total moderation actions taken and reversed by the platform provider, and responsiveness to user reports and government requests for information.

In 1996, Section 230 of the Communications Decency Act set the rule that internet platforms are not responsible for posts by users in the US. Nearly three decades since, it is fair to say the age of an unregulated internet is mostly over. Notably, the EU passed the Digital Services Act (DSA), which enhances platform responsibility and aims to foster good societal outcomes; India announced new IT Rules in 2021, which place due diligence and grievance redressal obligations on social media platforms; and the US Supreme Court has considered cases that have challenged the provisions of Section 230 around platform responsibility.

Measuring Progress

T&S teams are generally cost centers balancing competing objectives. On the one hand, these teams are highly mission-driven and focused on protecting users and increasing trust in the company’s products and services. On the other hand, policy guardrails and enforcement decisions may limit product growth, by limiting features or use cases, or adversely affect developers and advertisers, by making onboarding and usage processes more difficult. This makes measurement an important conversation, particularly in cost-constrained environments where explaining the tangible impact (on revenue or brand) of preventative Trust & Safety work becomes an existential imperative.

Metrics used by T&S teams typically cover volumetrics (eg. number of user reports or user flags, numerical distribution of moderation actions, number of appeals, etc.), the amount of bad content caught proactively or reactively, and the degree of automation in these actions. While such metrics are used for internal reviews and transparency reporting, the external pulse is usually measured through “trust” constructs: user trust, brand trust, stakeholder trust and more. Measuring trust is a significant challenge. Literature on trust-related studies shows that users employ complex mental heuristics when making trust choices, mediated by their lived experiences, exogenous factors, and sociocultural context. Additionally, trust measures rarely fluctuate in short time intervals; actions taken by product and policy teams take time to move the needle on user perception, and preventive actions, which are not visible to users or stakeholders, are hard to attribute.
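To make the volumetric side of this concrete, the following Python sketch computes a handful of such figures from a list of enforcement records. It is a minimal illustration only: the ModerationAction record, its field names, and the sample data are all hypothetical, and real teams will define proactive detection, automation, and reversal in ways specific to their own policies and tooling.

```python
from dataclasses import dataclass


@dataclass
class ModerationAction:
    """One enforcement action; every field name here is hypothetical."""
    detected_by: str   # "proactive" (found by the platform) or "reactive" (user report or flag)
    automated: bool    # True if the action was taken without human review
    appealed: bool     # True if the affected user appealed
    overturned: bool   # True if the action was reversed on appeal


def summarize(actions: list[ModerationAction]) -> dict[str, float]:
    """Compute simple volumetric rates over a batch of moderation actions."""
    total = len(actions)
    if total == 0:
        # Avoid division by zero for an empty reporting period.
        return {"total_actions": 0, "proactive_rate": 0.0,
                "automation_rate": 0.0, "appeal_rate": 0.0,
                "reversal_rate": 0.0}
    proactive = sum(a.detected_by == "proactive" for a in actions)
    automated = sum(a.automated for a in actions)
    appealed = sum(a.appealed for a in actions)
    overturned = sum(a.overturned for a in actions)
    return {
        "total_actions": total,
        "proactive_rate": proactive / total,
        "automation_rate": automated / total,
        "appeal_rate": appealed / total,
        # Share of appealed actions that were reversed.
        "reversal_rate": overturned / appealed if appealed else 0.0,
    }


# Illustrative sample data, not real figures.
sample = [
    ModerationAction("proactive", True, False, False),
    ModerationAction("proactive", False, True, True),
    ModerationAction("reactive", False, True, False),
    ModerationAction("reactive", False, False, False),
]
summary = summarize(sample)
print(summary)
```

Counts like these describe enforcement activity, not user perception, which is why they are usually paired with the trust measures discussed here.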

Google utilizes a range of survey measures and longitudinal studies to measure trust over time, across the stakeholder ecosystem. These surveys study the drivers of safety perception as experienced by users across major Google products, and try to get at the difference between the users’ perception of safety and the prevalence of bad content that they encounter. This uncovers opportunities to launch or improve safety features for better user experience and recall.

The Wikimedia Foundation publishes regular safety perception surveys for larger Wikipedia language versions. These surveys aim to help communities assess how effectively they are self-governing, identify opportunities to learn from peer groups who perform better, and also offer an additional data point for the platform provider and the public as they form and evaluate views about the projects. Complementary to volunteer community activities on the wikis, the Foundation also measures stakeholder fairness perception of its adjudication across its own staff-provided platform provider workflows.

Established metrics have some shortcomings: they are sample-driven and not generalizable across geographies and user types; categories and methods of measurement are not uniform across platforms; and they still do not capture the tangible impact on the organization’s goals, such as the bottom line or brand value in the case of for-profit organizations, or societal change in the case of not-for-profits.


We have summarized the state of the Trust & Safety industry and the delicate balancing act required to navigate the headwinds faced by T&S teams. It is clear that the industry has evolved and continues to do so in terms of skills, technology, systems, societal impact, and accountability processes. However, the authors see opportunities in the near-term that will help enhance the agility of the profession and make it more future-proof:

  • Platforms should make the investments and organizational changes necessary to increase representation and geographical diversity in trusted flagger programs to include advocacy organizations in more countries, particularly where language models are weak;
  • Responsible platforms should establish more robust mechanisms and channels for cross-platform sharing of known bad actor signatures;
  • Platforms should build robust survey instruments to consistently and comparably measure trust;
  • Platforms, governments and the research community should investigate the societal trade-offs of the different business models, from industrial to community-reliant, in more depth to help users, companies, and public policy makers navigate these differences effectively;
  • Agree on shared practices to address concerns emerging around AI, including on labeling of human vs. AI-generated content;
  • Align on issues of informed consent, bias and inference literacy in shared datasets to train ML models utilized in the content moderation process; and
  • In partnership with policy makers, other companies, and civil society, platforms should support media literacy efforts and improve access to authoritative sources.

T&S has established itself as an essential organizational capability characterized by an entangled ecosystem of platforms, service providers, researchers, civil society, regulators, users and observers. We are hopeful this summary successfully illustrates the innate complexity of the field and invites further study and thought leadership in the pursuit of building a resilient, socially beneficial industry.


Jan Eissfeldt
Jan Eissfeldt serves as Director and Global Head of Trust & Safety at the Wikimedia Foundation, the non-profit hosting Wikidata, Wikipedia, and other free knowledge projects. He is also a working group member of the Trust & Safety Professional Association; Advisory Board member of Marketplace Risk; ...
Sujata Mukherjee
Sujata Mukherjee is a Trust & Safety leader with over a decade of experience across T&S functions. Currently, she heads the Trust & Safety Research team at Google, specializing in research on societal harms and content responsibility. Previously, she led quality and client value programs at IBM and ...