JADE/Intro blog/Short story

Artificial intelligences (AIs) are very useful, but they can also mask biases. They offer Wikimedia projects (like Wikipedia) a great opportunity to work efficiently at scale. Yet they also have the potential to exacerbate our biases and limit the diversity of our participants. In this blog post, we'll discuss how we're developing AIs and working to mitigate their biases in novel ways.

The great potential of AI
At Wikimedia, we develop AIs that are trained to emulate human judgments. These AIs help us support quality control work, task routing, and other critical infrastructure for maintaining Wikipedia and other Wikimedia wikis. For example, we maintain ORES, an AI service that provides several types of machine predictions to our volunteers, who use those predictions to do counter-vandalism work. ORES flags edits that look sketchy (to a machine), and then a real live human reviews each flagged edit to make sure it gets cleaned up if it is a problem. By highlighting edits that need review, we can reduce the overall workload of our volunteers by 10X. It literally turns a 270-hour-per-day job into a 27-hour-per-day job. That means Wikipedia could get 10X bigger and our volunteers could still keep up with the workload.
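ORES is a public web service, so anyone can inspect its predictions. As a rough sketch (the revision ID below is illustrative, and the exact endpoint shape may differ from what's shown here), a counter-vandalism tool could ask ORES whether an edit looks damaging like this:

```python
import json
import urllib.request

ORES_HOST = "https://ores.wikimedia.org"

def scores_url(context: str, rev_id: int, model: str) -> str:
    """Build a URL for ORES's v3 scores endpoint."""
    return f"{ORES_HOST}/v3/scores/{context}/{rev_id}/{model}"

def fetch_score(context: str, rev_id: int, model: str) -> dict:
    """Fetch a machine prediction from ORES (requires network access)."""
    with urllib.request.urlopen(scores_url(context, rev_id, model)) as resp:
        return json.load(resp)

# Example: score an (illustrative) English Wikipedia revision with the
# "damaging" model. A tool would flag the edit for human review if the
# returned probability of damage is high.
url = scores_url("enwiki", 123456, "damaging")
```

A tool built on this would never act on the prediction alone; the prediction only routes the edit to a human reviewer.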

By deploying AIs that make Wikipedians' work more efficient, we make it easier to grow -- to keep Wikipedia's doors open. If counter-vandalism or another type of maintenance work in Wikipedia were to become overbearing, that would threaten the community's ability to keep Wikipedia the open encyclopedia that anyone can edit. This is exactly what happened recently around one of the quality control activities in Wikipedia. Reviewing new articles for vandalism and notability is a huge burden in English Wikipedia: over 1,600 new article creations need to be reviewed every day. The group of volunteers who review new articles couldn't keep up, and in the face of a growing backlog, they decided to disable new article creation for new editors. We're actively working on AIs that can help filter and route new page creations -- to lessen this load and to reopen the discussion about allowing new editors to create articles again.

Without a strategy for increasing the efficiency of work, we couldn't even begin to have that conversation.

Biases and diversity
But with all of the efficiency benefits that come with AI, we must be wary of problems. Humans have biases whether we choose to or not. When we train AIs to replicate human judgement, we can hope that, at best, those AIs will only be biased in the same ways as their instructors. Worse, these biases can appear in insidious ways, because an AI is far more difficult to interrogate than a real live human being. Recently, the media and the research literature have been discussing ways in which bias creeps into AIs like ours. For example, Zeynep Tufekci has warned that "We are in a new age of machine intelligence, and we should all be a little scared." And when we first announced ORES to our editing community, our own Wnt urged us to "Please exercise extreme caution to avoid encoding racism or other biases into an AI scheme. [...] My feeling is that editors should keep a healthy skepticism - this was a project meant to be written, and reviewed, by people." We agree. AI-enforced biases have the potential to exacerbate issues around diversity that are already present in Wikipedia -- especially when AIs are used to help our volunteers efficiently reject contributions.

In order to directly address these biases, we're trying a novel strategy, but one that will be familiar to Wikipedians: we're working to open up ORES, our AI service, to public audits by our volunteers. When we first deployed ORES, we noticed that many of our volunteers began to make wiki pages specifically for tracking its mistakes (on Wikidata, Italian Wikipedia, etc.). These mistake reports were essential for helping us recognize and mitigate issues that showed up in our AIs' predictions. Interestingly, it was difficult for any individual to see the problematic trends on their own. But when we worked together to record ORES' mistakes in a central location, it was easy to see trends and address them. See a presentation by one of our research scientists on some of the biases we discovered and how we mitigated the issues.

By observing how Wikipedians gathered these reports, we were able to recognize a set of pain points that made the process of auditing ORES difficult. From that, we've begun to design JADE -- the Judgement and Dialog Engine. JADE is intended to make the work of critiquing our AIs easier by providing standardized ways to agree or disagree with an AI's judgement. We're also building in ways for our volunteers to discuss and refine their own examples for training, testing, and checking on ORES -- work that is just too difficult to do with ad hoc wiki pages.
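To make this concrete, here's a hypothetical sketch (not JADE's actual schema -- the field names are ours) of the kind of structured record a system like JADE might collect: a volunteer's own label for an edit stored next to the AI's prediction, so that disagreements can be gathered centrally and audited for trends:

```python
from dataclasses import dataclass

@dataclass
class Judgement:
    """A hypothetical audit record: a human label beside an AI prediction."""
    rev_id: int           # the edit being judged
    model: str            # e.g. "damaging"
    ai_prediction: bool   # what the AI predicted
    human_label: bool     # the volunteer's judgement
    notes: str = ""       # free-form rationale, open for discussion

    def is_disagreement(self) -> bool:
        # Disagreements are the interesting cases for auditing bias.
        return self.ai_prediction != self.human_label

# A volunteer disputes a false positive:
j = Judgement(rev_id=123456, model="damaging",
              ai_prediction=True, human_label=False,
              notes="Good-faith copyedit, not vandalism.")
```

Unlike a free-form wiki page, records like this can be filtered, counted, and grouped automatically -- which is what made the centralized mistake-tracking pages so valuable in the first place.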

Open auditing and the future
We think we're on to something here. We need ORES and other AIs to help our volunteers scale up their wiki-work -- to build Wikipedia and other open knowledge projects efficiently and in a way that aligns with our values of open knowledge. JADE is intended to put more power in our volunteers' hands, to help us make sure that we can take full advantage of AIs while efficiently detecting and addressing the biases and bugs that will inevitably appear. In this case, we're hoping to lead by example. While other organizations that are responsible for managing online platforms prevent audits of their algorithms to protect their intellectual property (see the lawsuit by C. Sandvig et al.), we're preemptively opening up our AIs to public audits in the hope of making them better and more human.