ORES/Newcomerquality

Overview
The newcomer quality project aims to use machine learning to predict, within their first few edits, whether newly registered editors on Wikipedia projects are damaging and whether they are acting in good faith. The technology builds on the ORES platform, aggregating lower-level predictions about individual edits into higher-level predictions about edit sessions, and finally into predictions about users.

Community Motivation
Newcomer retention is one of the largest problems facing Wikipedias today. One approach that has found success is newcomer welcoming and mentoring programs such as the en:Wikipedia:Teahouse (or fr:WP:Forum des Nouveaux). However, getting new editors to those forums usually involves either a) inviting all newcomers, which overwhelms mentors and risks inviting vandals, or b) inviting a subset of newcomers based on heuristics, which could miss some good editors. Artificial intelligence or machine learning could bridge the gap by inviting only the most promising newcomers, without humans having to sort through the hundreds or thousands of newly registered editors each day.

Technical Motivation
ORES can already predict the quality of single edits and articles; this project aims to extend that capability to sessions of multiple related edits. Being able to predict session quality paves the way for future tools such as automatically detecting promising new editors or edit wars on pages. The idea is not new: since 2014, Snuggle has been trying to detect new editors who may have been bitten by vandal-fighters, but its infrastructure relies on pre-ORES technology and is not easily generalizable. Continuing that stream of work with ORES, we could start predicting labels for collections of edits of all kinds.
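
For concreteness, here is a minimal sketch of fetching the edit-level predictions that session-level scores would aggregate. It assumes the public ORES v3 scores endpoint; the JSON unpacking reflects the usual shape of its responses but should be read as illustrative rather than definitive, and the example revision ID is made up.

 # Minimal sketch: batch-score revisions with the ORES v3 API.
 import requests
 
 ORES_URL = "https://ores.wikimedia.org/v3/scores/{wiki}/"
 
 def score_revisions(wiki, rev_ids, models=("damaging", "goodfaith")):
     """Return {rev_id: {model: P(true)}} for a batch of revisions."""
     params = {
         "models": "|".join(models),
         "revids": "|".join(str(r) for r in rev_ids),
     }
     response = requests.get(ORES_URL.format(wiki=wiki), params=params)
     response.raise_for_status()
     scores = response.json()[wiki]["scores"]
     return {
         rev_id: {
             model: scores[str(rev_id)][model]["score"]["probability"]["true"]
             for model in models
         }
         for rev_id in rev_ids
     }
 
 # e.g. score_revisions("enwiki", [871234567])
 #  -> {871234567: {"damaging": 0.03, "goodfaith": 0.97}}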

HostBot Integration
Since about 2016 en:User:HostBot has been working in tandem with the English Wikipedia Teahouse, doing the repetitive work of inviting newly registered users. To keep the number of invitees manageable for Teahouse hosts, it limits itself to inviting 300 of the approximately 2,000 users who qualify each day.

New page potential
Another possible use for the technology being developed is to classify collections of edits relating to a new page rather than to a new user. In this way we could aid the article creation and deletion processes by classifying new pages as damaging or goodfaith.

Teahouse Experiment

 * How HostBot currently works
 * The target is the 300 best users to recommend, based on goodfaith score (see the selection sketch after this list)
 * Example differences (link to page)
 * An A/B test, with a retention metric
 * When it would run, how it would be randomized, and that it would be a blind test.
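
One way the selection and randomization steps above could be realized is sketched below. Every name here is hypothetical, and the real experiment may well randomize differently (for example, against HostBot's current heuristic rather than a no-invite control).

 # Hypothetical sketch: rank one day's qualifying newcomers by predicted
 # goodfaith, keep the top 300 to stay within the hosts' capacity, and
 # split them at random into invite and control arms for the blind test.
 import random
 
 INVITE_BUDGET = 300
 
 def select_and_randomize(candidates, seed=0):
     """candidates: list of (user_name, goodfaith_score) for one day."""
     ranked = sorted(candidates, key=lambda c: c[1], reverse=True)
     top = [user for user, _score in ranked[:INVITE_BUDGET]]
     rng = random.Random(seed)       # fixed seed: reproducible assignment
     rng.shuffle(top)
     half = len(top) // 2
     return top[:half], top[half:]   # (invite arm, control arm)

Retention in the two arms would then be compared at the end of the test window.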

Labelling Campaigns
Any machine learning project requires ground-truth labels. The labelling campaigns are:

 * enwiki campaign: w:en:Wikipedia:Labels/Newcomer_session_quality

Code, Technical Details, and Research

 * What is a session (see the session-grouping sketch below).
 * Feature lists
 * The model
 * An API or a package?
 * Why goodfaith models first. And why
 * November 16 2018 - Training with 30 features, including revert-related ideas from qualitative research such as edit-war detection. Using precision at k=300 as the metric (sketched below). Results show that Logistic Regression and Gradient Boosting will do well. However, performance is slightly diminished when I subset to non-singleton sessions only, so I want to label 100 more non-singleton edits and then finally pick the model.
 * What heuristics can be used to go from session predictions to user predictions for now (a placeholder sketch follows this list).
 * The repo
 * https://github.com/notconfusing/newcomerquality/commit/db6f8811ab8ed029a73faefb10e3fc06ff0355d8
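
On "what is a session": a common convention in Wikimedia research, implemented for example by the mwsessions package, is to group a user's consecutive edits using an inactivity cutoff, often one hour. A self-contained sketch of that convention follows; the one-hour default is an assumption, not a decision recorded above.

 # Group one user's edit timestamps into sessions: a new session starts
 # whenever the gap to the previous edit reaches the cutoff (one hour).
 def sessionize(timestamps, cutoff=60 * 60):
     """timestamps: Unix times of one user's edits; returns a list of sessions."""
     sessions, current = [], []
     for ts in sorted(timestamps):
         if current and ts - current[-1] >= cutoff:
             sessions.append(current)
             current = []
         current.append(ts)
     if current:
         sessions.append(current)
     return sessions
 
 # sessionize([0, 600, 1200, 90000]) -> [[0, 600, 1200], [90000]]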
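The precision-at-k metric from the November 16 note is compact enough to state directly: of the k=300 users the model ranks highest, what fraction are actually labeled goodfaith? The names below are illustrative.

 # Precision@k: fraction of truly goodfaith users among the top-k ranked.
 def precision_at_k(scores, labels, k=300):
     """scores: {user: predicted goodfaith}; labels: {user: True/False}."""
     ranked = sorted(scores, key=scores.get, reverse=True)[:k]
     return sum(labels[user] for user in ranked) / len(ranked)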
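On the sessions-to-user heuristic: the rule actually in use is not written down above, so the following is only a placeholder assumption that scores a user by their worst session, on the theory that one bad session should flag the whole account.

 # Placeholder for the sessions -> user step (an assumption, not the
 # project's recorded heuristic): a user is only as goodfaith as their
 # least-goodfaith session.
 def user_goodfaith(session_scores):
     """session_scores: predicted goodfaith for each of a user's sessions."""
     return min(session_scores)

A mean, or an edit-count-weighted mean, would be equally defensible; choosing among these is exactly the open question listed under Future Directions.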

Future Directions and Open Questions

 * Change the user labels from the current two (damaging and goodfaith) to a more fine-grained taxonomy of four, adding vandal and golden; see meta:Research:Newcomer_quality.
 * A well-researched method to aggregate session predictions into single user predictions, rather than the current heuristic.
 * Are damaging-but-goodfaith editors more easily detectable in sessions than in single edits?