ORES/New model checklist

Overview
This is a guide to training new ORES models and enabling them on wikis.

This guide is intended for members of the Scoring Platform team.

Learn more about ORES and machine learning as a service.

See historical examples of adding new models:

Step 1: Determine which models need to be built
Each model can target one set of classifications. The current norm is to begin with  and   models for a single language wiki database.

Note: Obtaining labeled training observations can require a large investment of volunteer time. This can be a limiting factor when making decisions about which models to build.

Step 2: Create tasks in Phabricator
Create the following set of tasks for Scoring Platform in Phabricator:


 * An optional parent epic
 * A task to gather labeled test data
 * A task to engineer language features
 * A task to collect word lists
 * A task to train, test, and tune the model (blocked by the test data task)
 * A task to deploy the model (blocked by the training task)
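The "blocked by" relationships above determine the order in which the work can be done. The sketch below is purely illustrative and is not a Phabricator API call. It models the tasks as a dependency graph and uses Python's standard `graphlib` module to produce an order in which every blocking task comes first. The task names are hypothetical labels that mirror the list.

```python
from graphlib import TopologicalSorter

# Hypothetical task names mirroring the checklist; each task maps to the
# set of tasks that block it.
TASKS = {
    "gather labeled test data": set(),
    "engineer language features": set(),
    "collect word lists": set(),
    "train, test, and tune model": {"gather labeled test data"},
    "deploy model": {"train, test, and tune model"},
}


def work_order(tasks):
    """Return the tasks in an order where every blocker comes first."""
    return list(TopologicalSorter(tasks).static_order())


if __name__ == "__main__":
    for name in work_order(TASKS):
        print(name)
```

Tasks with no blockers, such as collecting word lists, can proceed in parallel with gathering labeled data.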

Step 3: Compile the test dataset

 * For an edit quality model, run a WikiLabels campaign on the target wiki:
   * File a Phabricator ticket to set up the campaign.
   * Announce the campaign on-wiki.
   * Run the campaign until it is complete, and update the community regularly while it runs. Typically, around 5,000 edits need to be labeled.
 * For an article quality model, collect all articles that have already been assessed, grouped by quality class.
 * For some models, test data may be extracted by using database queries.
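For article quality, the collected assessments often need to be grouped by class and sampled so that common classes (such as stubs) do not dominate the dataset. The following is a minimal sketch under that assumption. It takes `(page_title, quality_class)` pairs from whatever source was used (templates, database queries) and draws up to a fixed number of titles per class. The function names and sample data are hypothetical.

```python
import random
from collections import defaultdict


def group_by_class(assessments):
    """Group (page_title, quality_class) pairs into {class: [titles]}."""
    groups = defaultdict(list)
    for title, quality_class in assessments:
        groups[quality_class].append(title)
    return dict(groups)


def sample_per_class(groups, per_class, seed=0):
    """Draw up to `per_class` titles from each class so no class dominates."""
    rng = random.Random(seed)  # fixed seed keeps the sample reproducible
    return {
        cls: rng.sample(titles, min(per_class, len(titles)))
        for cls, titles in groups.items()
    }


if __name__ == "__main__":
    # Hypothetical assessments; real ones would come from on-wiki
    # assessment templates or database queries.
    assessments = [("Alpha", "Stub"), ("Beta", "Stub"), ("Gamma", "B"), ("Delta", "FA")]
    groups = group_by_class(assessments)
    print(sample_per_class(groups, per_class=1))
```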

Step 4: Create badwords and informal words lists
Badword lists (also known as BWDS lists) and informal word lists have already been generated for a number of languages. An appropriate BWDS list for the new model may already exist in the revision scoring word lists.

Learn more about how to sort BWDS-generated word lists.
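Once sorted, each list is typically used to count matches in the text an edit adds. The sketch below illustrates that idea with Python's standard `re` module, assuming each list entry is a regex fragment. It is not the revscoring API; the entries `BADWORDS` and `INFORMALS` and the helper names are hypothetical.

```python
import re

# Hypothetical entries; real lists hold language-specific regex fragments.
BADWORDS = [r"idiot\w*", r"stupid"]
INFORMALS = [r"lol+", r"haha+"]


def compile_list(patterns):
    """Join a word list into one case-insensitive, word-bounded regex."""
    return re.compile(r"\b(?:" + "|".join(patterns) + r")\b", re.IGNORECASE)


BADWORD_RE = compile_list(BADWORDS)
INFORMAL_RE = compile_list(INFORMALS)


def count_matches(regex, text):
    """Count how many list entries match in a piece of added text."""
    return len(regex.findall(text))


if __name__ == "__main__":
    added_text = "LOL you are an idiot, lolll"
    print(count_matches(BADWORD_RE, added_text))   # 1
    print(count_matches(INFORMAL_RE, added_text))  # 2
```

Keeping badwords and informal words in separate lists lets a model treat offensive language and casual chatter as distinct signals.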

Step 5: Add the new model to configuration files and Makefiles

 * For example, https://github.com/wiki-ai/editquality/pull/68/commits/4a9916e4d0291db39ca9d916600ecd39f1c1ec15
 * TBD: and more?

Step 8: Add and commit the model
If the model requires a new language dictionary, add the dictionary to the ORES base configuration in Puppet.

Step 9: Deploy the new model
The final step is to deploy the new model.

There are a number of spaces where new models can be deployed. Refer to the following chart to determine the appropriate space.

How to deploy server and MediaWiki components of ORES on a new wiki

 * 1) Follow all of the steps on the beta cluster.
 * 2) Enable   for the wiki.
 * 3) Configure wiki thresholds to reasonable defaults (how do we guess these?) in.
 * 4) Request new DB tables.
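The checklist leaves open how to choose default thresholds. One possible approach, offered here only as an assumption and not as the documented ORES procedure, is to pick the lowest score threshold that still reaches a target precision on labeled data, which maximizes recall subject to that precision. The sketch below does this for a single positive class (for example, "damaging"); the function name and sample scores are hypothetical.

```python
def threshold_for_precision(scores, labels, min_precision):
    """Return the lowest threshold whose precision is >= min_precision.

    scores: model probabilities for the positive class.
    labels: True where the item was labeled positive.
    Returns None if no threshold reaches the target precision.
    """
    pairs = sorted(zip(scores, labels), key=lambda p: p[0], reverse=True)
    best = None
    true_pos = 0
    for i, (score, label) in enumerate(pairs):
        true_pos += label
        flagged = i + 1  # items with score >= this one
        # Only evaluate once every item tied at this score has been counted.
        is_last_of_tie = i + 1 == len(pairs) or pairs[i + 1][0] != score
        if is_last_of_tie and true_pos / flagged >= min_precision:
            best = score
    return best


if __name__ == "__main__":
    scores = [0.9, 0.8, 0.7, 0.4, 0.2]
    labels = [True, True, False, True, False]
    print(threshold_for_precision(scores, labels, 0.75))  # 0.4
```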