Just over a month ago, Unity bought AI-driven audio analysis platform OTO to fight toxic behavior in multiplayer games. A month before that, Discord acquired Sentropy, a company behind AI-powered software to detect and remove online harassment. We sat down with Utopia Analytics’ Sharon Fisher, a veteran of the games moderation space, to discuss the challenges the industry faces in its quest to make gaming environments safer.


Sharon, could you give us an overview of yourself and your path into the community and content moderation space?

I’ve been working in the online content moderation space for more than a decade. I started out at Disney Interactive, responsible for building up the big Spanish-speaking online Disney community. While I was there, I was in charge of implementing Spanish-language chat filters, which we rolled out across all Spanish-speaking countries to help create a safer online environment for kids.

After more than five years with Disney, I joined content moderation specialists Two Hat where I helped to grow its customer list, working with many of the big game publishers and platforms.

I had left Two Hat and was running my own consultancy when I was offered the opportunity to get involved with Utopia Analytics to help them develop their presence in gaming — which was a challenge too good to turn down, so here I am!

Before we go on, let’s get our terminology straight. Is there a consensus on what constitutes online toxicity?

Toxicity is such a broad word, and every gaming company, social platform and individual moderator defines it differently. Something that a kids’ social platform would consider toxic is very different from what a game for teens would call toxic.

If we define toxicity as disrupting social interactions to the point that another user considers ending the conversation or even leaving the platform, then the core challenge remains the same no matter the platform. The most important thing is the impact on users, and how to prevent it in the first place.

Gamers are passionate about the games they play and it’s a part of their everyday life. For many, it’s not just an online interaction, but an important part of their social life, or a way to relax and unplug at the end of a long day. If toxicity begins to creep in, then their favourite hangout space is suddenly disrupted, so it’s essential these communities are protected.

And what happens to users reported for toxic behaviour? Are there any warning systems or corrective protocols that can be deployed to keep them in the community but change the way they communicate?

That’s a great question! Ultimately, this call is for the owners of the community as they define their own rules. Some players will get warnings, some may be banned outright — it really depends. It’s fair to say that more consistency would be a good thing, so players know what to expect regardless of the game or platform.

There is also the question of whether we should continue to “punish” bad behaviour, or whether we should also start focusing on rewarding and incentivizing players who are doing good in a community.


How bad is it, anyway? Any stats you could provide to give us an idea of the scope of the toxicity and online harassment problem in the gaming world?

The latest annual report from the Anti-Defamation League is a good but worrying snapshot. Firstly, the amount of harassment reported by adult gamers has increased for the third year in a row. The ADL also found that 71% of that abuse was considered “severe,” including physical threats, stalking and long-term harassment. Not to mention, these numbers are based on reported content, so the true number is likely higher, since we know not all players report these kinds of incidents.

We’ve also just completed Utopia’s own survey of gamers in the US and the findings are very similar to the ADL report. The bottom line is that toxic behaviours from a minority have a disproportionate effect on the entire community, and current moderation strategies — where gamers are expected to flag players who are being offensive, or worse — simply don’t work.

Why don’t they work exactly?

Nowadays, there are so many different platforms, games, apps and other services that all offer consumers different ways to communicate, and all of that content needs to be assessed as part of an overall moderation problem.

Many online services use very basic word filters on their platforms. In the past, these caught the majority of bad content, but all a user needed to do to bypass the filters and confuse the system was misspell a word or break it up with a space. With the introduction of mobile chat apps, emojis, other languages, upside-down text, Unicode and voice-to-text, these challenges have been multiplied tenfold. It’s pretty clear that tools built years ago are not prepared to deal with modern online communities.
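To make that bypass concrete, here is a minimal Python sketch of the kind of basic word filter described above. The banned list and the `naive_filter` function are hypothetical illustrations, not any vendor’s actual implementation. The filter only matches whole words exactly, so a single inserted space or a swapped character is enough to slip past it.

```python
# A minimal sketch of a basic word filter. BANNED is a hypothetical,
# deliberately tiny list used only for illustration.
BANNED = {"idiot", "loser"}


def naive_filter(message: str) -> bool:
    """Return True if any whitespace-separated word exactly matches a banned word."""
    for word in message.lower().split():
        # Strip surrounding punctuation so "loser!" still matches "loser".
        if word.strip(".,!?") in BANNED:
            return True
    return False


if __name__ == "__main__":
    for msg in ["you are a loser!", "you are a l oser", "you are a l0ser"]:
        print(f"{msg!r}: flagged={naive_filter(msg)}")
```

Only the first message is flagged. The spaced-out and misspelled variants pass straight through, which is exactly the weakness players learn to exploit.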

The backlash we’ve seen against social media platforms has shown that the old-fashioned approach to moderation needs to be rethought. Many platforms use third-party moderators that take a one-size-fits-all approach to moderation.

In terms of the practical challenges, scale is the obvious one. Minecraft has 600 million player accounts and Fortnite has 350 million registered players. Years ago, you didn’t have this volume of data and chat logs to contend with, so it can quickly become unmanageable. The pandemic has only acted as a catalyst: player numbers for some games rose by as much as a third, drastically shifting the moderation challenge overnight.

So Minecraft and Fortnite pose moderation challenges because of their sheer scale. But are there any specific gaming communities that are particularly prone to toxicity? Maybe specific game genres, or even titles whose communities tend to be unsafe?

There are some major titles in the FPS, RTS and MMORPG genres that have been more prone to it in the past. But overall, it’s such a widespread issue across the industry that it’s unfair to single out specific communities.

That being said, toxicity is becoming a significant problem in esports, though it’s difficult to pinpoint exactly why. Some research suggests it’s the high-pressure environment of competitive play, or simply the advantage gained from distracting opposing players. Whatever the cause, brands have invested heavily in esports and won’t want to be associated with negative media stories, so I’m sure we’ll continue to see platforms take steps to mitigate toxicity in the future.

Steps like what? What are the solutions and approaches available to moderation professionals?

Traditionally, moderation has been done by human teams, who either build tools internally or moderate content in real time. Besides being outdated, this approach is prone to bias, less effective at scale and very costly, and not only in a monetary sense: the negative effects that exposure to this kind of content can have on human moderators have to be taken into consideration. Facebook has 15,000 full-time moderators, and Mark Zuckerberg admitted they could be making the wrong decision 10% of the time. Across billions of users, that’s a lot of errors.

Because of the limits of human moderation we’ve seen a second wave of software that filters content using rules and lists of banned words. Some of these claim to be AI-based, but they are not AI in the true sense. Really, they are extensive dictionaries of words and phrases which need to be regularly updated as players find workarounds like swapping or misspelling words to get around the filters.
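As a rough illustration of why these dictionary-based filters need constant updating, the Python sketch below adds a normalization step to a banned-word check: it lowercases the text, undoes a few common character swaps and strips out spaces and punctuation before matching. The banned list, the substitution table and the function names are hypothetical and do not describe any specific product. Normalization catches some workarounds, but it also creates new false positives and still misses any variant nobody has added to the list yet.

```python
import re

# Hypothetical banned list and character-substitution table, for illustration only.
BANNED = {"idiot", "loser"}
SUBSTITUTIONS = str.maketrans({"0": "o", "1": "i", "3": "e", "@": "a", "$": "s"})


def normalize(message: str) -> str:
    """Lowercase, undo common character swaps, then drop everything except a-z."""
    text = message.lower().translate(SUBSTITUTIONS)
    return re.sub(r"[^a-z]", "", text)


def dictionary_filter(message: str) -> bool:
    """Flag the message if any banned word appears anywhere in the normalized text."""
    text = normalize(message)
    return any(word in text for word in BANNED)


if __name__ == "__main__":
    for msg in ["you l0 $ er", "1d10t", "come closer", "you lozer"]:
        print(f"{msg!r}: flagged={dictionary_filter(msg)}")
```

Here “you l0 $ er” and “1d10t” are now caught, but the harmless “come closer” is also flagged, because “closer” contains “loser” once the spaces are gone. Meanwhile, “lozer” slips through until someone adds it to the list. Each fix to a word list tends to trade one failure mode for another.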

The current cutting edge in moderation is context-aware advanced AI, which treats every single user as equal and is built around each community’s dynamics, so the tool is bespoke to a specific community. This kind of AI learns the context within the text, meaning it can moderate intent rather than just recognising words and phrases, and it can moderate in multiple languages because it doesn’t rely on dictionaries.

The next big challenge will be to moderate voice chat. Right now it’s impossible to do this in real time, but at Utopia we are working on technology to do it, and it’s not too far away.

You mentioned this context-aware AI. Can we really trust it to tell genuine harassment apart from friendly banter, especially across different cultural contexts? Can AI “overreact” by flagging and removing content created in good faith that happens to contain certain questionable words?

This really depends on how the model and algorithms are built. There’s a common misconception around AI; just calling something ‘AI-powered’ doesn’t mean it’s a futuristic machine that can think like a human. AI is based on algorithms, and these can be flawed or biased in the same way that human decision-making can be.

An AI is just a tool; it will operate within the parameters you set for it. So if an AI overreacts, it’s because it hasn’t been configured correctly. For example, Utopia’s AI is deliberately designed to avoid the kind of bias or overreaction you mention by learning patterns in language and analysing entire sentences, so it doesn’t misconstrue the meaning of an individual word.

So the more advanced AI becomes, the less faith you have in human-based moderation?

Oh no! I love all of the human moderator folks. There’s a difference between ‘human-based’ and ‘human-powered’. Moderators aren’t superhumans, so we can’t expect them to make thousands of decisions on toxic, and sometimes mentally scarring, content as consistently and quickly as an automated system. We’ve already mentioned the number of moderators Facebook uses, yet it still faces constant problems with toxic content. So simply throwing more people at the problem will not solve it, and it would expose even more people to disturbing content.

Secondly, humans come equipped with our own inherent biases. What might be offensive to one person is OK to another, even within the same organisation. To mitigate this, companies give human moderators extensive guidelines and rules to try to prevent bias, but then we lose the thing we are great at, which is understanding context.

Human moderators are fundamental to the process. The real question is — when do you involve them? Before the toxic content is present, during or after?

It is always up to human moderators, then, to make the final call?

Humans should always make the final call on the guidelines and consequences. Technology is now able to analyse content along many different dimensions, but rather than replacing human moderators, this gives them the tools to make the most informed decision possible.

The capabilities of a specific system, and how advanced its AI is, really determine how much human input is needed. Advanced AI systems can reduce the work of human moderators by 99.9%. That being said, 0.1% of a billion messages is still a million messages to review!

Most importantly, AI is just a tool. Humans are the masters, and the ethics are defined by us. There must always be a layer of human moderation for cases where a judgement call or a nuanced understanding of context is needed. But in my experience, it’s just not healthy, practical or feasible to moderate millions of messages with an army of people; these processes have to be supported by technology.

Speaking of nuanced judgement, what about user-generated content? There are some cases, like the “Kaaba destruction” mod in Fortnite, that are obviously insulting. But then there are users creating pro-China or pro-Taiwan content in Animal Crossing. How do you solve that moderation challenge without taking sides? Or maybe you have to take sides?

I understand what you are saying, but those are examples where players have set out to use a game’s tools for purposes they were not intended for. That’s not really a moderation issue; it’s about whether a publisher or platform holder is OK with players making political statements inside their game.

But you are right, this is one more challenge that needs to be addressed, and yet another unintended consequence of popular games that are built around the idea of community and user content creation.

And how do you know if you are doing a good job of moderating a community? Is there a way to gauge the effect of moderation on a game’s performance?

A clear effect of poor moderation on a community is player churn. Riot Games published a study that showed players experiencing toxicity were over three times more likely to churn. Similarly, a study of the hugely popular League of Legends found that first-time players who experienced toxicity were 320% more likely to churn immediately and never play again.

Player churn also affects the perception of the brand, and it hurts the bottom line if new players are discouraged from trying a game or existing players stop logging on. There may not be much public data about this, but you can be sure that publishers are keeping a close eye on it internally.


Finally, what’s your forecast for the moderation market?

I think in the past there has been a fear of moderation affecting people’s ability to express themselves, but what we are seeing on Facebook and Twitter is the result of too little moderation, and there is a big worry that we’ll soon have the same problems with games. Moderation is becoming a ‘must have’, especially now that so many games are built around online and connected play.

The ongoing legislative debates we’re seeing in the USA, UK, France and Brazil will also have a huge bearing on the future of moderation. If companies are prepared when regulatory changes take place, the transition will be a lot more straightforward.

Ultimately, if gaming companies don’t move away from outdated methods of moderation, they risk losing players, and as an industry, nobody wants that. So I really do think we’ll see moderation become a much higher priority, as the risks of doing nothing are too great.

Sharon, thank you for the interview!
