DRAFT description of the ORES system and components. For an overview of the network architecture as deployed to WMF servers, see wikitech.
When a new revision is encountered in the mediawiki.revision_create Kafka change propagation stream, we call the revscoring API to pre-cache scores for that revision, once for each revscoring model we support on that wiki.
The new revision trigger hits URLs of this pattern:
We're currently having some discussion about how to simplify the configuration for this system.
See online documentation: https://ores.wikimedia.org/. The ORES API is a container for multiple machine learning models that take article revisions as their input. URLs are RESTful, and will always return the same results until the model is updated. The 'precache' parameter is passed through to metrics collection, with no other side effects. All requests that are intended to pre-populate ORES' cache should include precache=<some identifying string>.
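As a rough illustration of a pre-caching request, the sketch below builds a scoring URL and attaches the precache parameter. The base path, the wiki/revision/model path layout, and the helper name build_score_url are assumptions made for this example and are not taken from this page; consult the online documentation for the authoritative URL scheme.

```python
from urllib.parse import urlencode

# Assumed base path and layout, for illustration only.
BASE_URL = "https://ores.wikimedia.org/v3/scores"


def build_score_url(wiki, rev_id, model, precache=None):
    """Return a scoring URL for one revision and one model.

    `precache` is an identifying string for requests that only exist
    to warm the cache; it is omitted for ordinary requests.
    """
    url = f"{BASE_URL}/{wiki}/{rev_id}/{model}"
    if precache is not None:
        # The parameter is only used for metrics, so any short
        # identifying string works here.
        url += "?" + urlencode({"precache": precache})
    return url


# Hypothetical identifying string for a pre-caching client.
example_url = build_score_url("enwiki", 123456, "damaging", precache="changeprop")
```

Because the URL contains no timestamp or other per-request state, repeating the same request maps to the same cached score until the model is updated.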
Celery is used to manage concurrency among ORES API workers. We never run more than CELERYD_CONCURRENCY workers at the same time on any one machine, and the web frontends can be decoupled from the scoring workers.
The service will refuse to create new jobs or serve requests once the queue size goes above a configured
ORES uses Celery's task ID naming system to avoid recalculating scores when (nearly) simultaneous requests for the same score arrive. Instead of starting a second computation, both requests wait on the same task ID and read the result once the score has been computed.
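The deduplication idea can be sketched without Celery: if the task ID is derived deterministically from what is being scored, a second request for the same score finds the in-flight task instead of creating a new one. The ID format, score_task_id, and TaskRegistry below are hypothetical names for this illustration and do not reflect the actual ORES code.

```python
import threading
from concurrent.futures import Future


def score_task_id(wiki, model, rev_id, version):
    """Derive a deterministic task ID (hypothetical format)."""
    return f"{wiki}:{model}:{rev_id}:{version}"


class TaskRegistry:
    """In-memory stand-in for a task backend keyed by task ID."""

    def __init__(self):
        self._lock = threading.Lock()
        self._tasks = {}  # task ID -> Future holding the score

    def get_or_create(self, task_id):
        """Return (future, created); created is False for a duplicate."""
        with self._lock:
            future = self._tasks.get(task_id)
            if future is not None:
                return future, False
            future = Future()
            self._tasks[task_id] = future
            return future, True


registry = TaskRegistry()
tid = score_task_id("enwiki", "damaging", 123456, "0.5.0")

# Two nearly simultaneous requests for the same score.
first, created_first = registry.get_or_create(tid)
second, created_second = registry.get_or_create(tid)

# Only the request that created the task triggers a computation;
# here a placeholder result stands in for the worker's output.
if created_first:
    first.set_result({"prediction": False})
```

Both callers hold the same future, so the score is computed once and read twice.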
Read more about the various machine learning models on the metawiki ORES page.
When a model is updated to a new version, all cached scores for that model are invalidated. It is up to clients that cache scores to invalidate based on version numbers as well.
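A client that keeps its own copy of scores can apply the same rule by storing the model version next to each score and discarding entries whose version no longer matches. The sketch below is one possible shape for such a cache; the class and method names are invented for this example.

```python
class VersionedScoreCache:
    """Client-side cache that drops scores from old model versions."""

    def __init__(self):
        self._scores = {}  # (model, rev_id) -> (version, score)

    def put(self, model, rev_id, version, score):
        self._scores[(model, rev_id)] = (version, score)

    def get(self, model, rev_id, current_version):
        """Return the cached score, or None if missing or stale."""
        entry = self._scores.get((model, rev_id))
        if entry is None:
            return None
        version, score = entry
        if version != current_version:
            # The model was updated, so this score is no longer valid.
            del self._scores[(model, rev_id)]
            return None
        return score

    def __len__(self):
        return len(self._scores)


cache = VersionedScoreCache()
cache.put("damaging", 123456, "0.5.0", 0.12)
fresh = cache.get("damaging", 123456, current_version="0.5.0")
stale = cache.get("damaging", 123456, current_version="0.5.1")
```

Checking the version on read keeps the cache correct even if the client misses the moment a model is updated, at the cost of holding stale entries until they are next requested.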
This frontend displays revscoring data on the Special:Contributions and Special:RecentChanges pages.
We create a FetchScoreJob in response to the RecentChange_save event. The job fetches scores from the ORES API and caches them in the local MediaWiki database for efficient access.
Unlike most WMF services, we don't use the Varnish front-end cache.
The ORES backend stores scores in Redis as they are calculated, and will serve scores from that cache. Each score is saved under a key like,
As Extension:ORES pulls scores from the ORES API, they are stored in the
CheckModelVersions job checks for updates to the models, causing us to purge cached scores from previous versions of the model.
Humans build the training and validation sets used to train the models, by answering questions about sets of revisions in the wiki labels interface.