ORES/Components

DRAFT description of the ORES system and components



Change propagation


When a new revision is encountered in the Kafka change propagation stream, we call the revscoring API to pre-cache scores for that revision, once for each revscoring model we support on that wiki. The trigger is configured here:

https://github.com/wikimedia/mediawiki-services-change-propagation-deploy/blob/master/scap/templates/config.yaml.j2#L327

The new revision trigger hits URLs of this pattern:

https://ores.wikimedia.org/v2/scores/{wiki}/{model}/{revision ID}/?precache=true

for example,

https://ores.wikimedia.org/v2/scores/enwiki/damaging/745065890/?precache=true

We're currently having some discussion about how to simplify the configuration for this system.
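As an illustration of the URL pattern above, here is a minimal Python sketch that builds one pre-cache URL per model for a given revision. The function names `precache_url` and `precache_urls` are hypothetical and are not part of the change propagation configuration. The real trigger is defined in the YAML template linked above, not in Python.

```python
from urllib.parse import quote

# Base of the v2 scores endpoint, taken from the example URL above.
ORES_BASE = "https://ores.wikimedia.org/v2/scores"


def precache_url(wiki: str, model: str, rev_id: int) -> str:
    """Build the pre-cache URL for one (wiki, model, revision) triple."""
    wiki_part = quote(wiki, safe="")    # escape anything unsafe in a path segment
    model_part = quote(model, safe="")
    return f"{ORES_BASE}/{wiki_part}/{model_part}/{int(rev_id)}/?precache=true"


def precache_urls(wiki: str, models: list, rev_id: int) -> list:
    """One URL per model supported on the wiki (hypothetical helper)."""
    return [precache_url(wiki, model, rev_id) for model in models]


if __name__ == "__main__":
    # The model list here is an assumption for demonstration only.
    for url in precache_urls("enwiki", ["damaging"], 745065890):
        print(url)
```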

ORES API
See the online documentation: https://ores.wikimedia.org/

The ORES API is a container for multiple machine learning models that take article revisions as their input. URLs are RESTful, and a given URL will always return the same results until the model is updated. The 'precache' parameter is passed through to metrics collection, with no other side effects. All requests that are intended to pre-populate ORES' cache should include precache=true.

Celery
Celery is used to manage concurrency among ORES API workers. We're guaranteed to never run more than a configured number of workers at the same time (per machine), and the web frontends can be decoupled from scoring workers.

The service refuses to create new jobs or serve requests once the queue size goes above a configured limit.
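The queue-size guard can be pictured with a small in-process sketch. This is not the ORES or Celery implementation: `BoundedJobQueue`, `QueueFull` and `max_size` are hypothetical names, and the sketch assumes that new work is refused as soon as the queue already holds the maximum number of jobs.

```python
import collections


class QueueFull(Exception):
    """Raised when the queue is at its limit (hypothetical name)."""


class BoundedJobQueue:
    """A FIFO job queue that rejects new work beyond a configured size."""

    def __init__(self, max_size: int):
        self.max_size = max_size
        self._jobs = collections.deque()

    def submit(self, job) -> None:
        # Refuse rather than let the backlog grow without bound.
        if len(self._jobs) >= self.max_size:
            raise QueueFull(f"queue already holds {len(self._jobs)} jobs")
        self._jobs.append(job)

    def pop(self):
        """Hand the oldest job to a worker."""
        return self._jobs.popleft()

    def __len__(self) -> int:
        return len(self._jobs)
```

Failing fast in this way lets the frontend return an error immediately when workers are overloaded, rather than accepting requests that would wait indefinitely.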

ORES uses Celery's task ID naming system to avoid recalculating scores when (nearly) simultaneous requests for the same score arrive. Instead, both requests read the result from the same task ID once computation of the score has completed.
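The deduplication idea can be sketched without Celery, using a deterministic task ID and a shared future per ID. This is a simplified, single-process illustration. The ID format, the class name `ScoreDeduplicator` and the `compute` callback are all assumptions, and the actual ORES code relies on Celery task IDs and its result backend instead.

```python
import threading
from concurrent.futures import Future


class ScoreDeduplicator:
    """Share one computation among requests that map to the same task ID."""

    def __init__(self, compute):
        self._compute = compute          # callable(wiki, model, rev_id) -> score
        self._lock = threading.Lock()
        self._tasks = {}                 # task ID -> Future holding the result

    @staticmethod
    def task_id(wiki, model, rev_id, version):
        # Hypothetical ID format: identical requests always get the same ID.
        return f"{wiki}:{model}:{rev_id}:{version}"

    def score(self, wiki, model, rev_id, version):
        tid = self.task_id(wiki, model, rev_id, version)
        with self._lock:
            future = self._tasks.get(tid)
            is_owner = future is None
            if is_owner:
                # First request for this ID: register the task before computing.
                future = Future()
                self._tasks[tid] = future
        if is_owner:
            try:
                future.set_result(self._compute(wiki, model, rev_id))
            except Exception as exc:
                future.set_exception(exc)
        # Later requests block here until the owner finishes, then reuse its result.
        return future.result()
```

In this sketch completed tasks are never evicted. A real deployment would expire them and rely on the score cache for later requests.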

Revscoring engine
Read more about the various machine learning models on the metawiki ORES page.

When a model is updated to a new version, all cached scores for that model are invalidated. It is up to clients that cache scores to invalidate based on version numbers as well.
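For clients that keep their own cache, version-based invalidation might look like the following sketch. The class `VersionedScoreCache` and its methods are hypothetical. The sketch assumes only that each cached score is stored with the model version that produced it, and that the client can learn the current version from elsewhere.

```python
class VersionedScoreCache:
    """Client-side cache that treats scores from old model versions as stale."""

    def __init__(self):
        self._scores = {}  # (model, rev_id) -> (version, score)

    def put(self, model, rev_id, version, score):
        self._scores[(model, rev_id)] = (version, score)

    def get(self, model, rev_id, current_version):
        """Return the cached score, or None if missing or from another version."""
        entry = self._scores.get((model, rev_id))
        if entry is None:
            return None
        version, score = entry
        if version != current_version:
            # The model was updated since this score was cached: drop it.
            del self._scores[(model, rev_id)]
            return None
        return score

    def purge_model(self, model, current_version):
        """Remove every stale score for one model and report how many were dropped."""
        stale = [
            key for key, (version, _) in self._scores.items()
            if key[0] == model and version != current_version
        ]
        for key in stale:
            del self._scores[key]
        return len(stale)
```

A lazy check on read (`get`) and a bulk purge (`purge_model`) are both shown. Which one fits depends on whether the client is told about version changes or discovers them per request.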

MediaWiki Extension:ORES
This frontend displays revscoring data on the Special:Contributions and Special:RecentChanges pages.

We create a  in response to the   event, which fetches scores from the ORES API and caches them in the local MediaWiki database for efficient access.

Varnish
Unlike most WMF services, we don't use the Varnish front-end cache.

Redis
The ORES backend stores scores in Redis as they are calculated, and will serve scores from that cache. Each score is saved under a key like,.

MediaWiki database
As Extension:ORES pulls scores from the ORES API, they are stored in the  table.

The  job checks for updates to the models, causing us to purge cached scores from previous versions of the model.

Wikilabels
Humans build the training and validation sets used to train the models by answering questions about sets of revisions in the Wikilabels interface.