Wikimedia Scoring Platform team/FY2019



The Scoring Platform team is an experimental, research-focused, community-supported, AI-as-a-service team. Our work focuses on balancing the efficiency that machine classification strategies bring to wiki processes with transparency, ethics, and fairness. Our primary platform is ORES, an AI service that supports wiki processes like vandal fighting, gap detection, and new page patrolling. The current set of ORES-supported products is loved by our communities, and our team's work of supporting overloaded community processes with AI has shown great potential to enable conversations about growing our community (Knowledge Equity). In this proposal, we'll describe what we think we can accomplish given our current, minimal staffing. We'll also propose to fully staff the team along the lines of the original FY2018 Scoring Platform proposal so that we can expand our work in the critical area of bias detection and mitigation.

Overview of FY2018
Last year, we invested in the Scoring Platform team by giving Aaron a budget and staffing the team with Adam Wight as a senior engineer (80%) and Amir Sarabadani as a junior engineer (50%). We also retained a contracting budget to hire experts to develop new AIs and evaluation strategies. In total, we have a staff of 2.55 FTEs.

Despite this minimal staffing, the team has been quite successful.
 * Lots more models delivered to lots more wikis (targeting emerging communities, increasing capacity for knowledge equity)
 * Deployed ORES on a dedicated cluster and refactored the ORES extension (more uptime, evolving infrastructure)
 * Collaborated with commtech on a study of new page review issues -- trained and tested a critical technology for mitigating the issue (evolving our infrastructure and experimenting with new strategies for supporting newcomers)
 * Published papers about why people cite what they cite and the dynamics of bot governance (increasing our understanding of wiki processes)
 * Performed a community consultation and system design process for JADE, our proposed auditing support infrastructure

Contingency planning for FY2019
In order to deal with funding realities, we've prepared two annual plans for our department. The first alternative presents our ideal plan, which includes a reasonable amount of growth. The other is what we can accomplish if staffing levels are static.

Staffing increased as requested

 * We can bring the team up to a higher level of capacity and robustness by (1) promoting Amir to a full-time req holder and (2) hiring an engineering manager/tech lead to remove that burden from Aaron. This is in line with our original plan for FY2018.


 * Support more languages and wikis: we have a large backlog of requests for ORES support.
 * Bring our new auditing system, JADE, online. Start tracking algorithmic bias (the kind of problem that keeps some potential contributors out) much more effectively (Knowledge Equity).
 * More robust ORES service.
 * Develop new prediction models more quickly (Knowledge as a Service). Many of the models we will target are intended to provide a fertile ground for experimentation around mixing efficient quality control with better newcomer support (Knowledge Equity).
 * Once Aaron is wearing fewer hats, he'll be less of a bottleneck for the team and will be able to participate in thought leadership/outreach more effectively.

Staffing unchanged from FY2018 levels
In the next fiscal year, we'd like to continue our work towards making ORES more robust and expanding our prediction models to new wiki processes and under-served wiki communities.
 * Increase model support to more wikis -- targeting emerging communities
 * Experiment with the new article routing models and expand them to more communities
 * Publish datasets and papers about the process and machine-based process augmentation.

Risks and challenges

 * While the Scoring Platform team has been able to collaborate effectively with volunteers in order to supplement its minimal resourcing, the fact is that the development of ORES (useful AIs) and JADE (our auditing system) has been slowed substantially. We have the chance to help lead the industry on this front, but it may escape us.
 * Our bus factor is still far too low. Were we to temporarily lose the one full-time engineer on the team, development and deployments would nearly come to a halt. Worse, if Aaron were to be lost, the majority of the team's infrastructure would leave with him.