Wikimedia Product/Perspectives/Augmentation

Overview
The Wikimedia movement wants the sum of all knowledge to be available to everyone in the world. We also want the process of assembling that knowledge to be inclusive, balanced, and safe for all participants. But there is too much knowledge needed, in too many languages, for humans to do this alone. As an example, if we assume that a Wikipedia that covers a substantial amount of knowledge has 2 million articles (likely a low estimate), and we believe that 300 languages should have access to that knowledge, we should expect there to be 600 million articles. There are currently only 48 million articles[1], which is 8% of the way there. There are simply not enough potential human editors, especially in smaller languages, to get there. Whether or not we believe that long-form articles will be the medium of the future, this illustrates the problem we face.
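The estimate above can be reproduced with a few lines of arithmetic. This is a minimal sketch: the variable names are illustrative, and the figures are the ones quoted in the paragraph (2 million articles per language, 300 languages, 48 million existing articles).

```python
# Figures quoted in the overview; the variable names are illustrative only.
ARTICLES_PER_LANGUAGE = 2_000_000   # assumed size of a "substantial" Wikipedia
TARGET_LANGUAGES = 300              # languages that should have that coverage
EXISTING_ARTICLES = 48_000_000      # articles that exist today across all languages

target_articles = ARTICLES_PER_LANGUAGE * TARGET_LANGUAGES
coverage = EXISTING_ARTICLES / target_articles
missing_articles = target_articles - EXISTING_ARTICLES

print(f"Target: {target_articles:,} articles")          # 600,000,000
print(f"Coverage so far: {coverage:.0%}")               # 8%
print(f"Still missing: {missing_articles:,} articles")  # 552,000,000
```

Under these assumptions, the gap that human editors would have to close on their own is 552 million articles.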

Augmentation for contribution activities is our path to closing these gaps. Augmentation refers to any technology that helps humans do their work, and wikis have been using augmentation almost since their beginnings: Rambot created 34,000 articles from Census data in 2002[2], Twinkle has been automating repetitive tasks since 2007[3], ClueBot has been reverting vandalism since 2011[4], and the Content Translation tool has employed machine translation to generate content since 2016. Over the next three to five years, human editors will increasingly need to wield augmentation tools, especially those that incorporate artificial intelligence, to create content, curate content, and maintain a safe environment on the wikis. Artificial intelligence will not replace human editors -- it will allow human editors to focus on the most impactful and fulfilling work, and, if used correctly, will open up more avenues for more contributors.[5]

But although artificial intelligence is a powerful editing aid, it also has the potential to powerfully magnify the problems of bias and unfairness[6][7] that already exist in the wikis, and has the potential to discourage new editors[8]. Therefore, the role of human editors will change in the future to focus on wielding these tools safely to guard the wiki values that only humans understand.[5] In pursuing any augmentation technology, we should stick to the principles we apply to code and content: transparency and the ability for anyone to contribute. We should build closed-loop systems that essentially make augmentation “editable” by community members, even non-technical ones. By making it possible for members of all communities to audit augmentation tools, contribute training data, flag errors, and tailor tools to their wikis, we will ensure that wikis are not unduly influenced by the smaller set of people who build the tools, while also opening up a new avenue of contribution.

In terms of capabilities we need to build, the Wikimedia movement should do two main things:


 * 1) Build an infrastructure platform for many people to contribute augmentation tools, coupled with Wikidata (or something like it) to serve as a repository of facts.
 * 2) Provide interfaces that make it possible for non-technical editors to adjust and contribute to those tools.

The former would likely be pursued by the Technology department, while the latter would be pursued by the Audiences operation. The Audiences work will create on-wiki tools that allow non-technical editors to record training data, identify errors in existing algorithms, and tune algorithms to fit their wiki’s culture, surfacing those tasks as first-class wiki work that other editors can see. Through these interfaces, the shepherding of augmentation tools will become a new, major way of contributing that will ensure that machines are fair and healthy contributors to every wiki.
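One way to picture such a feedback loop is sketched below. It is an illustration only, not a description of ORES, JADE, or any existing Wikimedia system: the class names, fields, wiki name, and threshold rule are all hypothetical. It shows editors recording judgments about edits a model has scored, and a per-wiki threshold being tuned from those judgments.

```python
from dataclasses import dataclass, field


@dataclass
class Judgment:
    """One editor's verdict on an edit the model scored (hypothetical schema)."""
    edit_id: int
    model_score: float            # model's estimate that the edit is damaging
    editor_says_damaging: bool    # what the human reviewer decided


@dataclass
class WikiModelConfig:
    """Per-wiki settings that editors, not only developers, can adjust."""
    wiki: str
    threshold: float = 0.5
    judgments: list[Judgment] = field(default_factory=list)

    def record(self, judgment: Judgment) -> None:
        # Keep every judgment so others can audit it or reuse it as training data.
        self.judgments.append(judgment)

    def false_positives(self) -> list[Judgment]:
        # Edits flagged at the current threshold that editors judged to be fine.
        return [j for j in self.judgments
                if j.model_score >= self.threshold and not j.editor_says_damaging]

    def tune_threshold(self) -> float:
        # Illustrative rule only: raise the threshold just above the
        # highest-scoring edit that editors judged to be fine, capped at 1.0.
        good = [j.model_score for j in self.judgments if not j.editor_says_damaging]
        if good:
            self.threshold = min(1.0, max(self.threshold, max(good) + 0.01))
        return self.threshold


config = WikiModelConfig(wiki="examplewiki")  # hypothetical wiki name
config.record(Judgment(edit_id=1, model_score=0.62, editor_says_damaging=False))
config.record(Judgment(edit_id=2, model_score=0.91, editor_says_damaging=True))

print("False positives before tuning:", len(config.false_positives()))
print("Tuned threshold:", round(config.tune_threshold(), 2))
print("False positives after tuning:", len(config.false_positives()))
```

The point of the sketch is that the recorded judgments and the threshold are ordinary, inspectable data, so reviewing and adjusting them can be treated as wiki work like any other.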

Assembling the platform and the interfaces that allow a feedback loop is the most important part of this strategy -- more important than the particular applications of augmentation. That said, particular augmentation tools will generally fall into three aspects: content generation, content curation, and community conduct (governance). We will need to develop design principles in each of these aspects that ensure augmentation tools are transparent and editable, and that ensure augmentation respects the boundaries between human work and machine work. These principles should also govern the ways we incorporate augmentation resources from third parties not controlled by the Wikimedia movement, such as machine translation services.

And finally, in order to be successful with this strategy, we will need to continuously recognize and embrace augmentation as a major way to contribute to the wikis. We can do this through strategic planning, conferences, training, and general discussion in the community.

Aspects

 * [DRAFT] Content Curation
 * [DRAFT] Content Generation
 * [DRAFT] Governance

Examples

 * Rambot (content generation)
 * Twinkle (content curation)
 * ClueBot (content curation)
 * SuggestBot (content generation)
 * HostBot (governance)
 * Bot approval process (governance)
 * ORES models in RecentChanges and Watchlist (content curation)
 * Content Translation tool (content generation)
 * Article Placeholder (content generation)

Areas of Impact

 * Wikidata[10]
 * ORES[11]
 * Experienced editors[12]
 * Volunteer developers[13]

Key External Factors

 * The rate of improvement to artificial intelligence, especially machine translation.[14]
 * Efforts by other tech companies to automatically translate English Wikipedia, or to otherwise make massive amounts of information available.[15]
 * The movement’s ability to get top talent to work on these issues as staff or volunteers.[16]

Resources and References

 * 1) Wiki segmentation 2018
 * 2) Lih, Andrew (2009). The Wikipedia Revolution: How a Bunch of Nobodies Created the World's Greatest Encyclopedia. p. 102. New York: Hyperion. ISBN 978-1-4013-0371-6.
 * 3) History of Twinkle
 * 4) History of ClueBot
 * 5) Brynjolfsson, Erik and McAfee, Andrew (2016). The Second Machine Age: Work, Progress, and Prosperity in a Time of Brilliant Technologies. New York: W. W. Norton and Company. ISBN 978-0-393-35064-7.
   * This book studies the likely economic implications of artificial intelligence by looking at the effects of the Industrial Revolution.
   * It makes the case that in the near and medium terms, artificial intelligence can create more jobs for humans than it replaces.
   * Rather than replacing humans with machines, smart businesses will overhaul the way they work to incorporate new technologies as tools wielded by humans with increasingly sophisticated skill sets. This is already the case with companies that have successfully adopted modern IT practices.
 * 6) Basic description of JADE by Aaron Halfaker
   * Gives a short description of how models can reinforce bias in the Wikimedia setting. Academic sources also exist.
 * 7) Buolamwini, Joy (2018). The Dangers of Supremely White Data and The Coded Gaze [Video from Wikimania 2018]. Retrieved from https://youtu.be/ZSJXKoD6mA8.
   * Describes how artificial intelligence for facial recognition reflects the biases of the builders of the software, and ideas for improvement.
 * 8) Halfaker, A., Geiger, R. S., Morgan, J., & Riedl, J. (2013). The Rise and Decline of an Open Collaboration System: How Wikipedia's reaction to sudden popularity is causing its decline. American Behavioral Scientist 57(5), 664-688.
   * Identifies automated curation systems as a key factor in de-personalizing the wikis and driving away new contributors.
 * 9) Idea from Aaron Halfaker, September 2018; source pending.
 * 10) Wikidata has the potential to be the abstract database of facts from which artificial intelligence could create content. We should decide whether we want this to be the case, and if so, put resources behind Wikidata.
 * 11) ORES and the way it is architected is the proof-of-concept for an open and auditable artificial intelligence abstraction in the wikis. It could either continue to grow to encompass more tasks, or it could serve as a model for future systems.
 * 12) Experienced editors will need to continuously adjust their perception of what it means to do wiki work, as technology gives them increasingly powerful tools for content generation, content curation, and governance.
 * 13) Volunteer developers will have a new way to contribute to the wikis beyond just software and content. They will be able to contribute algorithms.
 * 14) There are many varying estimates for how quickly artificial intelligence will be able to take on human tasks. It is possible that capabilities will increase so quickly that the wikis are operating fundamentally differently within five years. Or that may not happen for 30 years. We should err on the side of expecting changes sooner; otherwise the wikis may be eclipsed by other, less open and fair, projects.
 * 15) As machine translation improves, major tech companies and startups may attempt to make information, such as English Wikipedia, automatically available across all languages. The risk is that those companies would not have the same inclinations toward openness and fairness as the Wikimedia movement. If companies become suppliers of information before Wikimedia projects do, the world may wind up with an inferior dominant source of information.
 * 16) If we see artificial intelligence as a critical path toward our movement’s goals, we should be mindful of the difficulty of getting top talent to work on it. People who work on artificial intelligence are in high demand at the most elite high-paying companies in the world, but we will need them as volunteers and staff for Wikimedia projects.