Now Anyone Can Deploy Google’s Troll-Fighting AI

Google subsidiary Jigsaw is now offering developers access to an API for its AI-based detector for abusive comments.

Last September, a Google offshoot called Jigsaw declared war on trolls, launching a project to defeat online harassment using machine learning. Now, the team is opening up that troll-fighting system to the world.

On Thursday, Jigsaw and its partners on Google's Counter Abuse Technology Team released a new piece of code called Perspective, an API that gives any developer access to the anti-harassment tools that Jigsaw has worked on for over a year. Part of the team's broader Conversation AI initiative, Perspective uses machine learning to automatically detect insults, harassment, and abusive speech online. Enter a sentence into its interface, and Jigsaw says its AI can immediately spit out an assessment of the phrase's "toxicity" more accurately than any keyword blacklist, and faster than any human moderator.
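To make the idea concrete, here is a minimal sketch of what a developer's call to Perspective might look like. The endpoint path, request fields, and response structure below are assumptions modeled on Google-style REST APIs at the time of launch, included only to illustrate the "send a comment, get back a toxicity score" flow the article describes; they are not official documentation.

```python
# Minimal sketch: request a "toxicity" score for one comment.
# Assumption: endpoint, field names, and key handling are illustrative guesses,
# not a verified description of Perspective's actual contract.
import requests

API_KEY = "YOUR_API_KEY"  # hypothetical placeholder for a developer key
ANALYZE_URL = (
    "https://commentanalyzer.googleapis.com/v1alpha1/comments:analyze"
    f"?key={API_KEY}"
)

def toxicity_score(text: str) -> float:
    """Return a 0.0-1.0 toxicity estimate for a piece of text."""
    payload = {
        "comment": {"text": text},
        "requestedAttributes": {"TOXICITY": {}},
    }
    response = requests.post(ANALYZE_URL, json=payload, timeout=10)
    response.raise_for_status()
    body = response.json()
    # The summary score is the model's overall estimate for the whole comment.
    return body["attributeScores"]["TOXICITY"]["summaryScore"]["value"]

if __name__ == "__main__":
    # The article reports a score of roughly 8 percent for this phrase.
    print(toxicity_score("you are not a nice person"))
```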

The Perspective release brings Conversation AI a step closer to its goal of fostering troll-free discussion online and filtering out the abusive comments that silence vulnerable voices---or, as the project's critics have less generously put it, of sanitizing public discussions based on algorithmic decisions.

An Internet Antitoxin

Conversation AI has always been an open source project. But by opening up that system further with an API, Jigsaw and Google can offer developers the ability to tap into that machine-learning-trained speech toxicity detector running on Google's servers, whether for identifying harassment and abuse on social media or more efficiently filtering invective from the comments on a news website.

"We hope this is a moment where Conversation AI goes from being 'this is interesting' to a place where everyone can start engaging and leveraging these models to improve discussion," says Conversation AI product manager CJ Adams. For anyone trying to rein in the comments on a news site or social media, Adams says, “the options have been upvotes, downvotes, turning off comments altogether or manually moderating. This gives them a new option: Take a bunch of collective intelligence---that will keep getting better over time---about what toxic comments people have said would make them leave, and use that information to help your community’s discussions.”

On a demonstration website launched today, Conversation AI will now let anyone type a phrase into Perspective's interface to instantaneously see how it rates on the "toxicity" scale. Google and Jigsaw developed that measurement tool by taking millions of comments from Wikipedia editorial discussions, the New York Times and other unnamed partners---five times as much data, Jigsaw says, as when it debuted Conversation AI in September---and then showing every one of those comments to panels of ten people Jigsaw recruited online to state whether they found the comment "toxic."
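That labeling step, with roughly ten people judging each comment, amounts to collapsing crowd votes into a single training target per comment. The sketch below shows one plausible way to do that aggregation; the data layout and field names are assumptions for illustration, since Jigsaw has not published its exact schema.

```python
# Sketch: collapse per-rater "toxic" judgments into one soft label per comment.
# Assumption: each comment carries a list of boolean votes from recruited raters.
from dataclasses import dataclass

@dataclass
class AnnotatedComment:
    text: str
    votes: list[bool]  # one True/False "toxic" judgment per rater

def toxicity_label(comment: AnnotatedComment) -> float:
    """Fraction of raters who called the comment toxic, usable as a soft training label."""
    return sum(comment.votes) / len(comment.votes)

example = AnnotatedComment("you are a bad hombre", [True] * 8 + [False] * 2)
print(toxicity_label(example))  # 0.8 -- a target the model learns to predict
```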

The resulting judgments gave Jigsaw and Google a massive set of training examples with which to teach their machine learning model, just as human children are largely taught by example what constitutes abusive language or harassment in the offline world. Type "you are not a nice person" into its text field, and Perspective will tell you it has an 8 percent similarity to phrases people consider "toxic." Write "you are a nasty woman," by contrast, and Perspective will rate it 92 percent toxic, and "you are a bad hombre" gets a 78 percent rating. If one of its ratings seems wrong, the interface offers an option to report a correction, too, which will eventually be used to retrain the machine learning model.

The Perspective API will allow developers to access that test with automated code, providing answers quickly enough that publishers can integrate it into their website to show toxicity ratings to commenters even as they're typing. And Jigsaw has already partnered with online communities and publishers to implement that toxicity measurement system. Wikipedia used it to perform a study of its editorial discussion pages. The New York Times is planning to use it as a first pass of all its comments, automatically flagging abusive ones for its team of human moderators. And the Guardian and the Economist are now both experimenting with the system to see how they might use it to improve their comment sections, too. "Ultimately we want the AI to surface the toxic stuff to us faster," says Denise Law, the Economist's community editor. "If we can remove that, what we’d have left is all the really nice comments. We’d create a safe space where everyone can have intelligent debates."
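A first-pass moderation queue of the kind the Times describes could be as simple as routing anything above a score threshold to human reviewers. The sketch below shows that pattern; the 0.8 cutoff is an arbitrary example, not a value Jigsaw or any publisher has disclosed, and the scoring function is passed in (for instance, the hypothetical API helper sketched earlier).

```python
# Sketch: route high-scoring comments to human moderators instead of auto-publishing.
# Assumption: the 0.8 threshold is an arbitrary illustrative cutoff.
from typing import Callable

FLAG_THRESHOLD = 0.8

def triage(comments: list[str],
           score: Callable[[str], float]) -> tuple[list[str], list[str]]:
    """Split comments into (publish_queue, review_queue) using a toxicity scorer."""
    publish, review = [], []
    for text in comments:
        (review if score(text) >= FLAG_THRESHOLD else publish).append(text)
    return publish, review

# Usage: triage(new_comments, score=toxicity_score), where toxicity_score is the
# hypothetical helper sketched earlier in this article.
```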

Censorship and Sensibility

Despite that impulse to create an increasingly necessary "safe space" for online discussions, critics of Conversation AI have argued that it could itself represent a form of censorship, enabling an automated system to delete comments that are either false positives (the insult "nasty woman," for instance, took on a positive connotation for some, after then-candidate Donald Trump used the phrase to describe Hillary Clinton) or in a gray area between freewheeling conversation and abuse. “People need to be able to talk in whatever register they talk,” feminist writer Sady Doyle, herself a victim of online harassment, told WIRED last summer when Conversation AI launched. “Imagine what the internet would be like if you couldn’t say ‘Donald Trump is a moron.’”

Jigsaw has argued that its tool isn't meant to have final say as to whether a comment is published. But moderators at short-staffed social media startups or newspapers might still use it that way, says Emma Llansó, director of the Free Expression Project at the nonprofit Center for Democracy and Technology. “An automated detection system can open the door to the delete-it-all option, rather than spending the time and resources to identify false positives," she says.

But Jared Cohen, Jigsaw's founder and president, counters that the alternative for many media sites has been to censor comments with clumsy blacklists of offensive words or to shut off commenting altogether. "The default position right now is actually censorship," says Cohen. "We’re hoping publishers will look at this and say 'we now have a better way to facilitate conversations, and we want you to come back.'"

Jigsaw also suggests that the Perspective API can offer a new tool not only to moderators, but to readers. Its online demo offers a sliding scale that changes which comments about topics like climate change and the 2016 election appear at different tolerances for "toxicity," showing how readers themselves could be allowed to filter comments. And Cohen suggests that the tool is still just one step toward better online conversations; he hopes it can eventually be recreated in other languages like Russian, to counter the state-sponsored use of abusive trolling as a censorship tactic. "It’s a milestone, not a solution," says Cohen. "We're not claiming to have created a panacea for the toxicity problem."
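Mechanically, that reader-facing slider is just a filter over comments that have already been scored. Here is a minimal sketch of the idea, assuming comments arrive pre-scored (for example, via the API sketch above) and that the slider's current position sets the ceiling:

```python
# Sketch: a reader's "toxicity" slider filters a list of pre-scored comments.
# Assumption: scores and the example thread are invented for illustration.
def visible_comments(scored: list[tuple[str, float]], max_toxicity: float) -> list[str]:
    """Return only the comments whose score falls at or below the reader's setting."""
    return [text for text, score in scored if score <= max_toxicity]

thread = [("Great reporting.", 0.02), ("you are a nasty woman", 0.92)]
print(visible_comments(thread, max_toxicity=0.5))  # ['Great reporting.']
```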

In an era when online discussion is more partisan and polarized than ever---and the president himself lobs insults from his Twitter feed---Jigsaw argues that a software tool for pruning comments may actually help to bring a more open atmosphere of discussion back to the internet. "We’re in a situation where online conversations are becoming so toxic that we end up just talking to people we agree with," says Jigsaw's Adams. "That's made us all the more interested in creating technology to help people continue talking and continue listening to each other, even when they disagree."