ORES

ORES (/ɔɹz/) is a web service and API, maintained by the Scoring Platform team, that provides machine learning as a service for Wikimedia projects. The system is designed to help automate critical wiki-work, for example vandalism detection and removal. Currently, ORES generates two general types of scores: "edit quality" and "article quality".

ORES is a back-end service and does not directly provide a way to make use of the scores. If you'd like to use ORES scores, check our list of tools that use them. If ORES doesn't support your wiki yet, see our instructions for requesting support.

Edit quality

ORES edit quality flow: a diagram of edits flowing from "The Internet" to Wikipedia, showing the "unknown" quality of edits before ORES and the "good", "needs review", and "damaging" labeling that is possible once ORES is available.

One of the most critical concerns about Wikimedia's open projects is the review of potentially damaging contributions ("edits"). There's also a need to identify good-faith contributors (who may be causing damage inadvertently) and offer them support. These models are intended to make the work of filtering through the Special:RecentChanges feed easier. We offer two levels of support for edit quality prediction models: basic and advanced.

Basic support

Assuming that most damaging edits will be reverted and that edits that are not damaging will not be reverted, we can build a model using the history of edits (and reverts) from a wiki. This model is easy to set up, but it suffers from the problem that many edits are reverted for reasons other than damage and vandalism. To help with that, we also create a model based on bad words.

  • reverted -- predicts whether an edit will eventually be reverted

Advanced support

Rather than assuming, we can ask editors to train ORES on which edits are in fact damaging and which look like they were saved in good faith. This requires additional work on the part of community volunteers, but it affords a more accurate and nuanced prediction of the quality of an edit. Many tools will only function when advanced support is available for a target wiki.

  • damaging -- predicts whether or not an edit causes damage
  • goodfaith -- predicts whether an edit was saved in good faith
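
For illustration, here is a minimal sketch of querying these two models through the public scoring service at https://ores.wikimedia.org (see the API usage section below). It assumes the v3 scoring endpoint and the response shape it returns; the revision ID is a placeholder.

  import requests

  REV_ID = "123456789"  # placeholder revision ID; substitute a real enwiki revision

  # Ask ORES for the damaging and goodfaith scores of a single revision.
  response = requests.get(
      "https://ores.wikimedia.org/v3/scores/enwiki/",
      params={"models": "damaging|goodfaith", "revids": REV_ID},
      timeout=10,
  )
  response.raise_for_status()
  scores = response.json()["enwiki"]["scores"][REV_ID]

  for model in ("damaging", "goodfaith"):
      score = scores[model]["score"]
      # "prediction" is a boolean; "probability" gives the true/false likelihoods.
      print(model, score["prediction"], score["probability"])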


Article quality

English Wikipedia assessment table: a screenshot of the assessment table generated by the WP 1.0 bot.

The quality of encyclopedia articles is a core concern for Wikipedians. New pages must be reviewed and curated to ensure that spam, vandalism, and attack articles do not remain in the wiki. For articles that survive the initial curation, some Wikipedians periodically evaluate the quality of articles, but this is highly labor-intensive and the assessments are often out of date.

Curation support

The faster seriously problematic draft articles are removed, the better. Curating new page creations can be a lot of work. As with counter-vandalism on edits, machine predictions can help curators focus on the most problematic new pages first. Based on the comments left by admins when they delete pages (see the logging table), we can train a model to predict which pages will need quick deletion. See en:WP:CSD for the list of quick deletion reasons on English Wikipedia. For the English model, we used G3 "vandalism", G10 "attack", and G11 "spam".

  • draftquality -- predicts whether the article will need to be speedily deleted (spam, vandalism, attack, or OK)

Assessment scale support

For articles that survive the initial curation, some of the large Wikipedias periodically evaluate the quality of articles using a scale that roughly corresponds to the Wikipedia 1.0 assessment rating scale ("wp10"). Having these assessments is very useful because it helps us gauge our progress and identify missed opportunities (e.g., popular articles that are low quality). However, keeping these assessments up to date is challenging, so coverage is inconsistent. This is where the wp10 machine learning model comes in handy. By training a model to replicate the article quality assessments that humans perform, we can automatically assess every article and every revision with a computer. This model has been used to help WikiProjects triage re-assessment work and to explore the editing dynamics that lead to article quality improvements.

  • wp10 -- predicts the (Wikipedia 1.0-like) assessment class of an article or draft
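
As a sketch, assuming the same v3 scoring endpoint as above and the Wikipedia 1.0 assessment classes used on English Wikipedia (Stub, Start, C, B, GA, FA), the predicted class for a revision can be read like this; the revision ID is again a placeholder.

  import requests

  REV_ID = "123456789"  # placeholder revision ID

  response = requests.get(
      "https://ores.wikimedia.org/v3/scores/enwiki/",
      params={"models": "wp10", "revids": REV_ID},
      timeout=10,
  )
  response.raise_for_status()
  score = response.json()["enwiki"]["scores"][REV_ID]["wp10"]["score"]

  # "prediction" is the most likely assessment class; "probability" maps every
  # class (e.g. Stub, Start, C, B, GA, FA on English Wikipedia) to a likelihood.
  print(score["prediction"], score["probability"][score["prediction"]])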

Support table

The following table reports the status of ORES support by wiki and model available. If you don't see your wiki listed or support for the model you'd like to use, you can request support.


The ORES support table below shows labeling campaigns that are in progress (In progress) and models that are available (Yes); a dash means the model is not yet available. The damaging, goodfaith, and reverted columns cover edit quality; wp10 and draftquality cover article quality. If you want to add a new campaign for your wiki, please visit the Get support page.

context                          damaging     goodfaith    reverted  wp10  draftquality
arwiki Arabic Wikipedia          In progress  In progress  Yes       -     -
cawiki Catalan Wikipedia         In progress  In progress  -         -     -
cswiki Czech Wikipedia           Yes          Yes          Yes       -     -
dewiki German Wikipedia          In progress  In progress  Yes       -     -
enwiki English Wikipedia         Yes          Yes          Yes       Yes   Yes
enwiktionary English Wiktionary  -            -            Yes       -     -
eswiki Spanish Wikipedia         In progress  In progress  Yes       -     -
eswikibooks Spanish Wikibooks    In progress  In progress  Yes       -     -
etwiki Estonian Wikipedia        Yes          Yes          Yes       -     -
fawiki Persian Wikipedia         Yes          Yes          Yes       -     -
fiwiki Finnish Wikipedia         Yes          Yes          Yes       -     -
frwiki French Wikipedia          In progress  In progress  Yes       Yes   -
hewiki Hebrew Wikipedia          Yes          Yes          Yes       -     -
huwiki Hungarian Wikipedia       In progress  In progress  Yes       -     -
idwiki Indonesian Wikipedia      In progress  In progress  Yes       -     -
itwiki Italian Wikipedia         In progress  In progress  Yes       -     -
kowiki Korean Wikipedia          In progress  In progress  Yes       -     -
nlwiki Dutch Wikipedia           Yes          Yes          Yes       -     -
nowiki Norwegian Wikipedia       In progress  In progress  Yes       -     -
plwiki Polish Wikipedia          Yes          Yes          Yes       -     -
ptwiki Portuguese Wikipedia      Yes          Yes          Yes       -     -
rowiki Romanian Wikipedia        In progress  In progress  Yes       -     -
ruwiki Russian Wikipedia         Yes          Yes          Yes       Yes   -
svwiki Swedish Wikipedia         In progress  In progress  Yes       -     -
trwiki Turkish Wikipedia         Yes          Yes          Yes       -     -
ukwiki Ukrainian Wikipedia       In progress  In progress  Yes       -     -
viwiki Vietnamese Wikipedia      In progress  In progress  Yes       -     -
wikidatawiki Wikidata            Yes          Yes          Yes       -     -

API usage

ORES offers a RESTful API service for dynamically retrieving scoring information about revisions. See https://ores.wikimedia.org for more information on how to use the API.

If you're querying the service about a large number of revisions, it's recommended to batch 50 revisions in each request as described below. It's acceptable to use up to four parallel requests.
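
A minimal sketch of that pattern, assuming the v3 scoring endpoint and the damaging model: revision IDs are joined with "|" in batches of 50, and at most four batches are requested concurrently. The revision IDs here are placeholders.

  from concurrent.futures import ThreadPoolExecutor

  import requests

  ORES_URL = "https://ores.wikimedia.org/v3/scores/enwiki/"
  rev_ids = [str(r) for r in range(1000000, 1000200)]  # placeholder revision IDs

  def fetch_batch(batch):
      # One request per batch of up to 50 revisions.
      response = requests.get(
          ORES_URL,
          params={"models": "damaging", "revids": "|".join(batch)},
          timeout=30,
      )
      response.raise_for_status()
      return response.json()["enwiki"]["scores"]

  # Split the revision list into batches of 50.
  batches = [rev_ids[i:i + 50] for i in range(0, len(rev_ids), 50)]

  scores = {}
  # Keep at most four requests in flight at a time.
  with ThreadPoolExecutor(max_workers=4) as pool:
      for result in pool.map(fetch_batch, batches):
          scores.update(result)

  print(len(scores), "revisions scored")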