By Abagail Lawson
This article was originally published as part of the Raisina Edit 2021 series.
In a year that has seen fake cures for COVID-19 proliferate online, disinformation about vaccines spread across social media platforms, and violent incidents fueled by disinformation and hate speech, it is clear that better moderation of what is said and spread online is needed. Governments are increasingly looking for ways to require social media and other online platforms to reduce harmful content, and several countries, including India, the UK, the European Union and the US, have announced new rules or have legislative proposals on the table. More and more of these rules include provisions requiring platforms to use AI-based automated tools to proactively identify and remove illegal content from their sites. However, over-reliance on such tools, which are often ill-equipped to distinguish nuance and context in human speech, is not only insufficient to solve the content moderation problem but also raises concerns about preserving the fundamental rights of freedom of expression and access to information.
The Promise and Pitfalls of Automated Moderation
Broadly speaking, the automated part of content moderation consists of machine learning or AI-based tools that identify elements of content and filter them according to algorithmic rules. These include technologies such as image recognition software; digital hashing (converting data into a unique digital signature, commonly used to identify child sexual abuse material [CSAM] and copyright violations); metadata filters; and natural language processing (NLP) tools, often used to analyse text for hate speech and extremist content. AI systems can also be taught to identify bots used by malicious actors to amplify disinformation, and to evaluate the credibility of content sources. In a platform's content moderation system, posted content is sometimes removed or blocked through a fully automated process; in other instances, tools flag content for review by a human moderator. Platforms have long touted automation as an efficient answer to growing pressure to "do more" about harmful content, as in 2018, when Facebook CEO Mark Zuckerberg told the US Congress that building AI tools would be the way to moderate at scale.
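Of the techniques listed above, hash matching is the most mechanical, and a minimal sketch helps show why it works well for known material but poorly for novel content. The example below uses an invented blocklist and a plain cryptographic hash; production systems such as PhotoDNA or PDQ instead rely on perceptual hashes that tolerate re-encoding and small edits.

```python
# Minimal sketch of hash-based matching against a hypothetical blocklist.
# Real systems (e.g., PhotoDNA, PDQ) use perceptual hashes that survive
# re-encoding and minor edits; an exact hash, as here, only catches files
# that are byte-for-byte identical to known-violating items.
import hashlib

# Hypothetical database of hashes of known-violating files.
KNOWN_BAD_HASHES = {
    "9f86d081884c7d659a2feaa0c55ad015a3bf4f1b2b0b822cd15d6c15b0f00a08",
}

def should_block(file_bytes: bytes) -> bool:
    """Return True if the file's hash matches a known-violating item."""
    digest = hashlib.sha256(file_bytes).hexdigest()
    return digest in KNOWN_BAD_HASHES

# Example: an uploaded file is checked before it is published.
print(should_block(b"test"))  # True: sha256(b"test") is in the set above
```

Note what this sketch cannot do: it has no notion of meaning or context, which is why hashing is reserved for re-uploads of previously identified material while text and novel imagery fall to the far less reliable classifiers discussed below.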
Two years after Zuckerberg's statement, the efficacy of automated content moderation was tested in real time. When the COVID-19 pandemic hit in March 2020, Facebook, YouTube and Twitter sent their human moderators home to lock down and turned to automated tools to fill the gap. It quickly became clear that the AI moderators were less effective than the humans they replaced, and many of the big tech firms subsequently acknowledged the drop in efficacy of their automated systems.
A central problem is that automated tools are blunt instruments rather than fine scalpels: they over-remove permissible content (so-called "false positives") and under-remove ("false negatives"), missing content that should be taken down. Machine learning systems are not good at understanding context, or the fact that meaning can change depending on who is posting (when is something terrorist propaganda, and when is it evidence of war crimes?). The task becomes harder still as language changes over time, and the resulting errors can fall disproportionately on minority groups.
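To make the false-positive/false-negative trade-off concrete, the sketch below uses invented classifier scores and labels. Whichever way a single removal threshold is tuned, it either sweeps up counter-speech or misses violations phrased in slang or under-represented dialects.

```python
# Illustration of the trade-off behind a "blunt instrument" classifier.
# The scores and labels are made up; real moderation models output a
# probability per post and a platform picks a removal threshold.
posts = [
    # (classifier score for "hate speech", actually violating?)
    (0.95, True),   # clear violation, caught at any reasonable threshold
    (0.70, True),   # violation phrased in slang the model barely recognises
    (0.65, False),  # counter-speech quoting a slur in order to condemn it
    (0.40, True),   # violation in a dialect under-represented in training data
    (0.10, False),  # benign post
]

def evaluate(threshold: float) -> tuple[int, int]:
    """Count false positives (benign posts removed) and false negatives
    (violating posts missed) at a given removal threshold."""
    false_positives = sum(1 for score, bad in posts if score >= threshold and not bad)
    false_negatives = sum(1 for score, bad in posts if score < threshold and bad)
    return false_positives, false_negatives

# Lowering the threshold removes more violations but sweeps up more
# permissible speech; raising it does the opposite.
for threshold in (0.3, 0.6, 0.9):
    fp, fn = evaluate(threshold)
    print(f"threshold={threshold}: false positives={fp}, false negatives={fn}")
```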
Existing social biases can also be reinforced in AI tools through the datasets from which they learn, whether because a word-embedding tool absorbs traditional gender stereotypes from reading Google News or because the human annotators labeling the data bring their own biases. Communities with particular dialects or rarer languages are often over-censored by tools that were not taught to understand them, as most datasets used to train NLP tools are in English. Even with advancements in their complexity (there is now image recognition software that can distinguish the make and model of weapons in an image, for example), AI tools are not neutral arbiters that can replace human moderators and absolve decision-makers of responsibility for the difficult choices involved in balancing free expression, access to information, and a safe online experience.
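The embedding example can be made concrete in a few lines. The sketch below, assuming the gensim library and its hosted copy of the word2vec vectors trained on Google News (a large download), simply prints how strongly a handful of occupation words associate with "he" versus "she"; it asserts no particular values, but Bolukbasi et al. (2016) documented stereotyped associations in exactly these vectors.

```python
# Minimal sketch of how corpus-derived word embeddings can encode social
# stereotypes. Assumes the gensim library and its downloadable copy of the
# Google News word2vec vectors.
import gensim.downloader as api

vectors = api.load("word2vec-google-news-300")  # returns KeyedVectors

# Compare how strongly each occupation word associates with "he" vs. "she".
for word in ["nurse", "engineer", "homemaker", "programmer"]:
    to_he = vectors.similarity(word, "he")
    to_she = vectors.similarity(word, "she")
    print(f"{word}: sim(he)={to_he:.3f}  sim(she)={to_she:.3f}")
```

Any classifier built on top of such vectors inherits whatever associations the training corpus contains, which is how bias enters a moderation pipeline without anyone explicitly putting it there.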
Automated Censorship
When such blunt methods are used to moderate a vast amount of content, there is a further concern: censorship becomes the default. Platforms, faced with the prospect of steep fines for not removing illegal or harmful content, may use overly broad automated tools, sweeping up innocent speech in the process. Smaller platforms without the resources to hire thousands of human moderators may accept some level of over-removal as the price of being able to moderate at an affordable rate and scale with automated tools.
More dangerous to democratic societies, however, is the threat of "function creep": the use of tools that were built and authorised for a specific purpose in contexts beyond their original (or authorised) use. Function creep can violate the public trust or, worse, slow-walk a society into a broad censorship regime in which proactive, automated, mass monitoring of online content becomes the norm and an infrastructure of content-blocking technology is available to governments or companies to use without accountability or legal authority. Some incidents have already raised this concern, as when the Delhi police requested the use of image-filtering software designed to identify CSAM for unrelated cases. On a broader scale, a 2020 study showed that where laws require the use of filters for harmful content like child pornography or illegal gambling, increased content blocking tends to follow more broadly (including in supposedly progressive democracies). These actions are often opaque and unexplained, leaving the public unaware of how their governments and even Internet service providers are using these tools to shape access to information.
To safeguard against function creep, governments and platforms must be transparent about how and where automated tools are used. This provides accountability and helps users feel confident in the decisions affecting their online experience. Any policy suggesting or requiring that platforms use automated tools should also require reporting on their accuracy and effectiveness, including the number and type of automated decisions that were appealed and overturned. Transparency can also provide better data for researchers, improving the knowledge base for policymakers.
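As a rough illustration of the kind of reporting such a policy could require, the sketch below aggregates hypothetical moderation-log records into basic transparency figures: total automated removals, the number appealed, and the share of automated decisions overturned on appeal. The record format and field names are invented for this example, not drawn from any platform's actual reporting.

```python
# Hypothetical sketch of figures a transparency report could include:
# automated removals, appeals, and the overturn rate on appeal.
# The log format and field names here are invented for illustration.
from dataclasses import dataclass

@dataclass
class ModerationRecord:
    removed_by_automation: bool  # was the takedown fully automated?
    appealed: bool               # did the user appeal the decision?
    overturned: bool             # was the content restored on appeal?

def transparency_summary(records: list[ModerationRecord]) -> dict[str, float]:
    automated = [r for r in records if r.removed_by_automation]
    appealed = [r for r in automated if r.appealed]
    overturned = [r for r in appealed if r.overturned]
    return {
        "automated_removals": len(automated),
        "appeals": len(appealed),
        "overturn_rate": len(overturned) / len(appealed) if appealed else 0.0,
    }

# Example with made-up records.
sample = [
    ModerationRecord(True, True, True),
    ModerationRecord(True, True, False),
    ModerationRecord(True, False, False),
    ModerationRecord(False, False, False),
]
print(transparency_summary(sample))
# {'automated_removals': 3, 'appeals': 2, 'overturn_rate': 0.5}
```

A high overturn rate on appealed automated decisions is exactly the kind of signal that would let regulators, researchers and users see how blunt a platform's tools really are.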
Solving a Human Problem with Technology
At their core, fake news and disinformation are not technology problems; they are social problems. Fake medical cures, propaganda, and violent mobs whipped up by hate mongers and conspiracy theorists have been around since well before the Internet. Expecting to eliminate toxic or illegal content from the Internet while toxic speech and illegal activity exist in the world is unrealistic. Striving for perfect enforcement online risks a slide towards mass surveillance and censorship, enabled by advanced filtering and image- and speech-detection tools.
The harms of toxic online content do need to be addressed, however, and automated tools are a necessary part of doing so when balanced with human moderation and protections against abuse. Any content moderation policy or regulation should ensure human oversight of automated decisions, reviews of and transparency reports on the performance and use of automated tools, and a human-based due process mechanism through which users can seek redress. Governments and platforms should approach this issue holistically, understanding online content as a reflection of social, political and economic dynamics in a society, and recognising that any solution to lessening toxic and illegal content online will necessarily involve both technology and people-centered action.