Wikimedia Product/Perspectives/Augmentation

Overview
The Wikimedia movement wants the sum of all knowledge to be available to everyone in the world. We also want the process of assembling that knowledge to be inclusive, balanced, and safe for all participants. But too much knowledge is needed, in too many languages, for humans to do this alone. For example, if we assume that a Wikipedia covering a substantial amount of knowledge has 2 million articles (likely a low estimate), and we believe that 300 languages should have access to that knowledge, we should expect 600 million articles in total. There are currently only 48 million articles, which is 8% of the way there. There are simply not enough potential human editors, especially in smaller languages, to close that gap. Whether or not we believe that long-form articles will be the medium of the future, this illustrates the problem we face.
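The estimate above can be reproduced with a few lines of arithmetic. The sketch below only restates the figures from the paragraph (2 million articles per language, 300 languages, 48 million existing articles). The function and variable names are illustrative, and the inputs are rough assumptions, not measurements.

```python
# Figures taken from the paragraph above; all of them are rough estimates.
ARTICLES_PER_WIKI = 2_000_000   # assumed size of a "substantial" Wikipedia
TARGET_LANGUAGES = 300          # languages that should have that coverage
CURRENT_ARTICLES = 48_000_000   # articles across all Wikipedias, per the text


def coverage_gap(per_wiki, languages, current):
    """Return (target article count, fraction covered, articles still missing)."""
    target = per_wiki * languages
    return target, current / target, target - current


target, fraction, missing = coverage_gap(
    ARTICLES_PER_WIKI, TARGET_LANGUAGES, CURRENT_ARTICLES
)
print(f"target articles:  {target:,}")
print(f"coverage so far:  {fraction:.0%}")
print(f"articles missing: {missing:,}")
```

Changing any of the three inputs (for example, a higher per-language article count) shows how sensitive the size of the gap is to these assumptions.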

Augmentation for contribution activities is our path to closing these gaps. Augmentation refers to any technology that helps humans do their work, and wikis have been using augmentation almost since their beginnings: Rambot created 34,000 articles from Census data in 2002, Twinkle has been automating repetitive tasks since 2007, ClueBot has been reverting vandalism since 2011, and the Content Translation tool has employed machine translation to help generate content since 2016. Over the next three to five years, human editors will increasingly need to wield augmentation tools, especially those that incorporate artificial intelligence, to create content, curate content, and maintain a safe environment on the wikis. Artificial intelligence will not replace human editors. Instead, it will allow human editors to focus on the most impactful and fulfilling work and, if used correctly, will open up more avenues for more contributors.

But although artificial intelligence is a powerful editing aid, it also has the potential to magnify the problems of bias and unfairness that already exist in the wikis, and to discourage new editors. Therefore, the role of human editors will change in the future to focus on wielding these tools safely to guard the wiki values that only humans understand. In pursuing any augmentation technology, we should stick to the principles we apply to code and content: transparency and the ability for anyone to contribute. We should build closed-loop systems that essentially make augmentation “editable” by community members, even non-technical ones. By making it possible for members of all communities to audit augmentation tools, contribute training data, flag errors, and tailor tools to their wikis, we will ensure that wikis are not unduly influenced by the smaller set of people who build the tools, while also opening up a new avenue of contribution.

In terms of capabilities we need to build, the Wikimedia Foundation should do two main things:


 * 1) Build an infrastructure platform for many people to contribute augmentation tools.
 * 2) Provide interfaces that make it possible for non-technical editors to apply, adjust, and contribute to those tools.

The former would likely be pursued by the Technology department, while the latter would be pursued by the Audiences department. The Audiences work will create on-wiki tools that allow non-technical editors to record training data, identify errors in existing algorithms, and tune algorithms to fit their wiki’s culture, surfacing those tasks as first-class wiki work that other editors can see. Through these interfaces, the shepherding of augmentation tools will become a new, major way of contributing that will ensure that machines are fair and healthy contributors to every wiki.
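To make the feedback loop more concrete, here is a minimal sketch of what "recording training data, identifying errors, and tuning to a wiki" could look like as data. Everything in it is hypothetical: the class names, the fields, the idea of a single score threshold per wiki, and the wiki name are assumptions made for illustration, and none of it describes an existing Wikimedia tool or API.

```python
from dataclasses import dataclass, field


@dataclass
class Judgment:
    """One editor's verdict on one machine prediction (hypothetical record)."""
    revision_id: int
    model_score: float          # model's estimate that the edit is damaging
    editor_says_damaging: bool  # what the human reviewer decided


@dataclass
class WikiModelSettings:
    """Per-wiki settings that editors can inspect and adjust (hypothetical)."""
    wiki: str
    threshold: float = 0.5
    judgments: list = field(default_factory=list)

    def record(self, judgment):
        # Recording a judgment doubles as contributing training data.
        self.judgments.append(judgment)

    def disagreements(self):
        # Cases where the model's flag and the editor's verdict differ.
        return [
            j for j in self.judgments
            if (j.model_score >= self.threshold) != j.editor_says_damaging
        ]

    def retune(self, candidates):
        # Pick the candidate threshold that disagrees least with editors.
        def errors(t):
            return sum(
                (j.model_score >= t) != j.editor_says_damaging
                for j in self.judgments
            )
        self.threshold = min(candidates, key=errors)
        return self.threshold


settings = WikiModelSettings(wiki="examplewiki")
settings.record(Judgment(1, 0.9, True))
settings.record(Judgment(2, 0.6, False))
settings.record(Judgment(3, 0.2, False))

flagged_errors = settings.disagreements()
new_threshold = settings.retune([0.3, 0.5, 0.7, 0.9])
print(len(flagged_errors), new_threshold)
```

The point of the sketch is that every step (the recorded judgments, the list of disagreements, the chosen threshold) is plain data that could be shown on-wiki and reviewed by other editors, which is what makes the loop "editable".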

Assembling the platform and the interfaces that allow a feedback loop is the most important part of this strategy, more important than the particular applications of augmentation. That said, particular augmentation tools will generally fall under three aspects: content generation, content curation, and community conduct. We will need to develop design principles in each of these aspects that ensure augmentation tools are transparent and editable, and that ensure augmentation respects the boundaries between human work and machine work. These principles should also govern the ways we incorporate augmentation resources from third parties not controlled by the Wikimedia movement, such as machine translation services.

Finally, to succeed with this strategy, we will need to continuously recognize and embrace augmentation as a major way to contribute to the wikis. We can do this through community capacity building, holding events, providing training, and encouraging discussion in the community.

Aspects

 * Content Curation
 * Content Generation
 * Governance
 * Machine Translation

Examples

 * Rambot (content generation)
 * Twinkle (content curation)
 * ClueBot (content curation)
 * SuggestBot (content generation)
 * HostBot (governance)
 * Bot approval process (governance)
 * ORES models in RecentChanges and Watchlist (content curation)
 * Content Translation tool (content generation)
 * Article Placeholder (content generation)

Areas of Impact

 * Wikidata
 * ORES
 * Experienced editors
 * Volunteer developers

Key External Factors

 * The rate of improvement to artificial intelligence, especially machine translation.
 * Efforts by other tech companies to automatically translate English Wikipedia, or to otherwise make massive amounts of information available.
 * The movement’s ability to get top talent to work on these issues as staff or volunteers.

White Paper
DRAFT

Resources

 * Bohannon, John and Dharnidharka, Vedant (2018). Quicksilver: Training an ML system to generate draft Wikipedia articles and Wikidata entries simultaneously. [Video from Wikimedia Research Showcase August 2018]. Retrieved from https://youtu.be/OGPMS4YGDMk.
 * Chisholm, A., Radford, W., & Hachey, B. (2017). Learning to generate one-sentence biographies from Wikidata. EACL.
 * Halfaker, Aaron. 2017. Interpolating Quality Dynamics in Wikipedia and Demonstrating the Keilana Effect. In Proceedings of the 13th International Symposium on Open Collaboration (OpenSym '17). ACM, New York, NY, USA, Article 19, 9 pages. DOI: https://doi.org/10.1145/3125433.3125475
 * Halfaker, Aaron, et al. ORES: Facilitating re-mediation of Wikipedia’s socio-technical problems. From Wikimedia Commons.
 * Kaffee, L.-A., et al. (2018). Mind the (Language) Gap: Generation of Multilingual Wikipedia Summaries from Wikidata for ArticlePlaceholders. In: Gangemi, A., et al. (eds), The Semantic Web. ESWC 2018. Lecture Notes in Computer Science, vol 10843.