Wikimedia Scoring Platform team/FY2019

The Scoring Platform team is an experimental, research-focused, community-supported, AI-as-a-service team. Our work focuses on balancing the efficiency that machine classification strategies bring to wiki processes with transparency, ethics, and fairness. Our primary platform is ORES, an AI service that supports wiki processes such as vandal fighting, gap detection, and new page patrolling. The current set of ORES-supported products is loved by our communities, and our team's work of relieving overloaded community processes with AI has shown great potential to enable conversations about growing our community (Knowledge Equity). In this proposal, we'll describe what we think we can accomplish given our current, minimal staffing. We'll also propose to fully staff the team along the lines of the original FY2018 Scoring Platform proposal so that we can expand our capacity in the critical area of bias detection and mitigation.

Overview of FY2018

Last year, we invested in the Scoring Platform team by giving Aaron a budget and staffing the team with Adam Wight as a senior engineer (80%) and Amir Sarabadani as a junior engineer (50%). We also retained a contracting budget to hire experts to develop new AIs and evaluation strategies. In total, we have a staff of 2.55 FTEs.

Despite this minimal staffing, the team has been quite successful.

Delivered many more models to many more wikis (targeting emerging communities, increasing capacity for knowledge equity)
Deployed ORES on a dedicated cluster and refactored the ORES extension (more uptime, evolving infrastructure)
Collaborated with Community Tech on a study of new page review issues, training and testing a critical technology for mitigating those issues (evolving our infrastructure and experimenting with new strategies for supporting newcomers).
Published papers about why people cite what they cite, and the dynamics of bot governance (increasing our understanding of wiki processes).
Performed a community consultation and system design process for JADE, our proposed auditing support infrastructure.

Contingency planning for FY2019

In order to deal with funding realities, we've prepared two annual plans for our department. The first alternative presents our ideal plan, which includes a reasonable amount of growth. The other is what we can accomplish if staffing levels cannot be improved.

Staffing increased as requested

Ask

Bring the team up to a higher level of capacity and robustness by:

Promoting Amir to a full-time requisition holder.
Hiring an engineering manager/tech lead to remove this burden from Aaron. This was proposed in our original plan for FY2018.

Benefits

Support more languages and wikis: we have a large backlog of requests for ORES support.
Bring our new auditing system, JADE, online. Start tracking algorithmic bias (the kind of problem that keeps some potential contributors out) much more effectively (Knowledge Equity).
More robust ORES service.
Develop new prediction models more quickly (Knowledge as a Service). Many of the models we will target are intended to provide fertile ground for experimentation around the balance between efficient quality control and better newcomer support (Knowledge Equity).
Once Aaron is wearing fewer hats, he'll be less of a bottleneck for the team. With more time, Aaron will be able to participate in thought leadership/outreach more effectively.

Staffing unchanged from FY2018 levels

Ask

Continue funding the Scoring Platform Team at FY2018 levels.

Benefits

In the next fiscal year, we will continue our work of making ORES more robust, and expanding our prediction models to new wiki processes and under-served wiki communities.

Slowly increase model support to more wikis, prioritizing emerging communities.
Experiment with the new article routing models and expand them to more communities.
Publish datasets and papers about wiki processes and machine-based process augmentation.

Risks and challenges

While the Scoring Platform team has been able to collaborate effectively with volunteers to supplement its minimal resourcing, the development of ORES (useful AIs) and JADE (our auditing system) has been slowed substantially by understaffing. We have the chance to help lead the industry on this front, but it may escape us.
Our bus factor is still far too low. Were we to lose the one full-time engineer on the team, development and deployments would nearly come to a halt. Or worse, if Aaron were to be lost, the majority of the team's infrastructure would leave with him.