ORES/Components

DRAFT description of the ORES system and components



Change propagation


When a new revision is encountered in the Kafka change propagation stream, we tickle the revscoring API to pre-cache scores for that revision, once for each revscoring model we support on that wiki (see https://github.com/wikimedia/mediawiki-services-change-propagation-deploy/blob/master/scap/templates/config.yaml.j2#L327). The new revision trigger hits URLs of this pattern:

https://ores.wikimedia.org/v2/scores/<wiki>/<model>/<revision ID>/?precache=true

for example,

https://ores.wikimedia.org/v2/scores/enwiki/damaging/745065890/?precache=true
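
As a rough illustration of the "once for each model" fan-out, the sketch below builds the precache URLs for one revision. The helper name precache_urls and the model list are assumptions made for this example; the real trigger lives in the change propagation configuration, not in Python.

```python
from urllib.parse import quote

# Base path taken from the URL pattern above.
ORES_BASE = "https://ores.wikimedia.org/v2/scores"


def precache_urls(wiki, rev_id, models):
    """Return one precache URL per model for a single revision.

    `models` is whatever list of model names is supported on `wiki`;
    the list passed below is illustrative only.
    """
    return [
        f"{ORES_BASE}/{quote(wiki)}/{quote(model)}/{int(rev_id)}/?precache=true"
        for model in models
    ]


# One request would be issued per URL; here we only print them.
for url in precache_urls("enwiki", 745065890, ["damaging", "goodfaith"]):
    print(url)
```

The first URL printed matches the example above.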

ORES API
See online documentation: https://ores.wikimedia.org/. The ORES API is a container for multiple machine learning models that take article revisions as their input. URLs are RESTful, and will always return the same results until the model is updated.

TODO: Explain the "precache" parameter. Is this passed through to metrics collection, with no other side effects?

Celery
Celery is used to manage concurrency among ORES API workers. We're guaranteed to never run more than the configured number of workers at the same time, and the web frontends can be decoupled from scoring workers.

TODO: Was that correct?

We refuse to create new jobs or serve requests once the queue size goes above a configured limit.
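
The queue-size check can be pictured with a small sketch in plain Python. It does not use Celery; the class ScoringQueue, the exception QueueFull and the max_size parameter are hypothetical names chosen for illustration, and the real check in ORES may work differently.

```python
class QueueFull(Exception):
    """Raised when accepting a job would push the queue above its limit."""


class ScoringQueue:
    """Toy stand-in for the task queue, with a hard size limit."""

    def __init__(self, max_size):
        self.max_size = max_size  # hypothetical configuration value
        self.pending = []

    def submit(self, job):
        # Refuse new work instead of letting the backlog grow without bound.
        if len(self.pending) >= self.max_size:
            raise QueueFull(
                f"queue holds {len(self.pending)} jobs, limit is {self.max_size}"
            )
        self.pending.append(job)


queue = ScoringQueue(max_size=2)
queue.submit(("enwiki", "damaging", 745065890))
queue.submit(("enwiki", "damaging", 745065891))
try:
    queue.submit(("enwiki", "damaging", 745065892))
except QueueFull as err:
    print("rejected:", err)
```

Rejecting early keeps the web frontends responsive: a caller gets an immediate error rather than waiting behind a backlog that the workers cannot clear.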

TODO: Something magical in here prevents recalculating scores when simultaneous requests arrive for the same data.

Revscoring engine
Read more about the various machine learning models on the metawiki ORES page.

When a model is updated to a new version, all cached scores for that model are invalidated.

MediaWiki Extension:ORES
This frontend displays revscoring data on the Special:Contributions and Special:RecentChanges pages.

We create a job in response to a MediaWiki event; the job fetches scores from the ORES API and caches them in the local MediaWiki database for efficient access.

Varnish
Unlike most WMF services, we don't use the Varnish front-end cache.

Redis
The ORES backend stores scores in Redis as they are calculated, and will serve scores from that cache. Each score is saved under its own key.
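
A minimal sketch of this store-then-serve behaviour follows, using an in-memory dictionary as a stand-in for Redis. The key layout (wiki, model, model version, revision ID) is an assumption for illustration and is not the actual ORES key format; including the version in the key is just one way to make scores from an older model version unreachable.

```python
class ScoreCache:
    """In-memory stand-in for the Redis score cache."""

    def __init__(self):
        self._store = {}

    @staticmethod
    def make_key(wiki, model, version, rev_id):
        # Hypothetical key layout; the real ORES key format may differ.
        return f"{wiki}:{model}:{version}:{rev_id}"

    def get_or_compute(self, wiki, model, version, rev_id, compute):
        key = self.make_key(wiki, model, version, rev_id)
        if key not in self._store:
            # Cache miss: calculate the score once and remember it.
            self._store[key] = compute(rev_id)
        return self._store[key]


calls = []


def fake_score(rev_id):
    """Placeholder for a real model; records how often it is invoked."""
    calls.append(rev_id)
    return {"prediction": False}


cache = ScoreCache()
first = cache.get_or_compute("enwiki", "damaging", "0.1", 745065890, fake_score)
second = cache.get_or_compute("enwiki", "damaging", "0.1", 745065890, fake_score)
print(len(calls))  # the second request is served from the cache
```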

MediaWiki database
As Extension:ORES pulls scores from the ORES API, they are stored in a table in the local MediaWiki database.

A job checks for updates to the models, causing us to purge cached scores from previous versions of the model.
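
The purge step might look roughly like the sketch below, which uses SQLite in place of the MediaWiki database. The table cached_scores, its columns and the function purge_stale_scores are hypothetical; the extension's real schema and job are not reproduced here.

```python
import sqlite3


def purge_stale_scores(conn, model, current_version):
    """Delete cached scores for `model` that belong to older versions."""
    cur = conn.execute(
        "DELETE FROM cached_scores WHERE model = ? AND model_version != ?",
        (model, current_version),
    )
    conn.commit()
    return cur.rowcount  # number of rows removed


# Hypothetical schema, created in memory for the demonstration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cached_scores "
    "(rev_id INTEGER, model TEXT, model_version TEXT, score REAL)"
)
conn.executemany(
    "INSERT INTO cached_scores VALUES (?, ?, ?, ?)",
    [
        (1, "damaging", "0.1", 0.12),   # stale once "damaging" reaches 0.2
        (2, "damaging", "0.2", 0.87),   # current
        (3, "goodfaith", "0.1", 0.95),  # different model, left untouched
    ],
)

removed = purge_stale_scores(conn, "damaging", "0.2")
print(removed)
```

Only rows for the updated model are touched, so scores for other models stay cached.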

Wikilabels
Humans build the training and validation sets used to train the models by answering questions about sets of revisions in the Wikilabels interface.