
About this board

This talk page is intended to be used to discuss the development of ORES. For the bots/tools that use ORES, please contact their owners/maintainers. See ORES/Applications for a list of tools/bots that use ORES.

Should ORES be aggressive to catch vandalism or should ORES be less aggressive to be nice to newcomers?

EpochFail (talkcontribs)

Imagine you’ve just spent 10 minutes working on what you earnestly thought would be a helpful edit to your favorite article. You click that bright blue “Publish changes” button for the very first time, and you see your edit go live! Weeee! But 10 seconds later, you refresh the page and discover that your edit has been reverted.

Actually, an AI system called ORES has contributed to the judgement of hundreds of thousands of edits on Wikipedia. ORES is a machine learning system that automatically predicts edit and article quality to support editing tools in Wikipedia.

I'm exploring strategies for tuning ORES predictions about quality and vandalism to your needs, and I'd like to work with you. I am looking for editors to discuss the values of Wikipedia as they relate to ORES.

If you are interested in participating, please fill out the short survey below. Thanks!

Klaas van Buiten (talkcontribs)

We have to stay at least as supportive to beginners as we were in the beginning, when we had to grow from small to gigantic like we are now. Practically everyone has good will.

Reply to "Should ORES be aggressive to catch vandalism or should ORES be less aggressive to be nice to newcomers?"

ORES template not refreshing

Chtnnh (talkcontribs)

I made a minor edit to the ORES template to update the team name that maintains ORES. Despite refreshing the page, the changes are not reflected. Can someone look into this?

Reply to "ORES template not refreshing"

ORES on FANDOM wikis?

(talkcontribs)

I edited both ORES/FAQ and ORES/Get support pages to mention FANDOM wikis as well.

ORES AI service is used by Wikimedia Foundation's projects, most notably English Wikipedia, but what about FANDOM's Unified Community Platform (UCP) wikis?

I'm curious about machine learning assisting local wiki moderators and admins, and even SOAP (formerly VSTF) members finding spam and vandalism as well as disruptive edits.

EpochFail (talkcontribs)

ORES could totally be used on any wiki. Are the FANDOM wikis running MediaWiki? If not, there would need to be some engineering work to develop the API connectors that would allow ORES to pull data from whatever wiki platform FANDOM is running on. Either way, I think the hardest part is going to be getting a server to run it, as we couldn't host a model for FANDOM in Wikimedia's production installation. Depending on the number of edits, ORES can run on relatively minimal hardware.

Reply to "ORES on FANDOM wikis?"

Edit summaries classifying edits as damaging or good-faith?

Enervation (talkcontribs)

When undoing other people's edits, is it suggested to add keywords such as "damaging", "vandalism", or "good-faith" to your edit summaries to help train ORES?

EpochFail (talkcontribs)

We don't process edit summaries to look for such annotations. Instead, the team is developing mw:Jade, a system for explicitly saying what was going on in an edit while undoing it. They are pretty close to deploying a pilot.

Reply to "Edit summaries classifying edits as damaging or good-faith?"

What changes in probabilities are significant?

Sebastian Berlin (WMSE) (talkcontribs)

As part of our work with the community wish list on SVWP, we're going to develop a gadget that gives an editor feedback using the article quality assessment. The idea is to show the quality before and after the edit. A question that has arisen is what changes to show and with what precision. Is it reasonable to show the difference for all probabilities, regardless of how small that difference is? My worry is that small changes to the probabilities may not be significant and could be misleading. Could someone give me some help with what changes are useful to show in a case like this?

EpochFail (talkcontribs)

I've developed a Javascript gadget that does something very similar to what you are planning. I wonder if we could make a modification to this tool to support what you are working on.

I've been using a "weighted sum" strategy to collapse the probabilities across classes into a single value. See this paper and the following code for an overview of how it works for English Wikipedia.

WEIGHTED_CLASSES = {FA: 6, GA: 5, B: 4, C: 3, Start: 2, Stub: 1};

weightedSum = function (score) {
  var sum = 0;
  for (var qualityClass in score.probability) {
    // Skip inherited properties on the probability map.
    if (!score.probability.hasOwnProperty(qualityClass)) continue;
    var proba = score.probability[qualityClass];
    sum += proba * WEIGHTED_CLASSES[qualityClass];
  }
  return sum;
};

This function returns a number between 1 and 6 that represents the model's prediction projected on a continuous scale.
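As a sketch, the same weighted-sum collapse in Python; the class weights mirror the snippet above, and the probabilities here are invented for illustration, not real ORES output:

```python
# Class weights as in the gadget above.
WEIGHTED_CLASSES = {"FA": 6, "GA": 5, "B": 4, "C": 3, "Start": 2, "Stub": 1}

def weighted_sum(probability):
    """Collapse per-class probabilities into a single 1-6 quality value."""
    return sum(p * WEIGHTED_CLASSES[cls] for cls, p in probability.items())

# A made-up prediction leaning toward "Start":
probability = {"Stub": 0.1, "Start": 0.6, "C": 0.2,
               "B": 0.07, "GA": 0.02, "FA": 0.01}
weighted_sum(probability)  # ≈ 2.34
```

A pure Stub prediction ({"Stub": 1.0}) collapses to exactly 1, and a pure FA prediction to 6, which is why the result lands on a continuous 1-6 scale.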

Now, how big of a change matters? That's a good question and it's a hard one to answer. I think we'll learn in practice quite quickly once we have the model for svwiki.

Sebastian Berlin (WMSE) (talkcontribs)

That looks very interesting. I found (part of?) your script earlier, but I haven't had time to go figure out exactly what's going on there. I'll have a look and see what bits are reusable for this. I'd guess that the backend stuff (API interaction, weighting etc.) should be fairly similar.

I like the idea of just having one number to present to the user, along with the quality. From what I've understood, the quality levels aren't as evenly spaced on SVWP as on ENWP; it goes directly from Stub to equivalent to B. I don't know if and how this would impact the weighting algorithm, but maybe that will become apparent once it's in use.

EpochFail (talkcontribs)

We can have non-linear jumps in the scale. E.g. {Stub: 1, B: 4, GA: 5, FA: 6}

Claudiamuellerbirn (talkcontribs)

Dear all. I am not sure if this thread is still active; however, I have a student working on an interface for representing the quality of Wikidata items. I would be happy to meet and talk about it. We are based in Berlin :)

Reply to "What changes in probabilities are significant?"

PSA: Switch from using to

EpochFail (talkcontribs)

Hey folks! I've been debugging some issues with our experimental installation of ORES in Cloud VPS recently. It looks like we're getting *a lot* of traffic there. I just want to make sure that everyone knows that the production instance of ORES is stable and available at and that will be going up and down as we use it to experiment with ORES and new models we'd like to bring to production.

Reply to "PSA: Switch from using to"

ORES downtime on July 16th @ 1500 UTC

Halfak (WMF) (talkcontribs)

Hey folks,

We expect a couple of minutes of downtime while we restart a machine tomorrow (Tuesday, July 16th @ 1500 UTC).

Halfak (WMF) (talkcontribs)

The maintenance is done and it doesn't appear that there was any downtime.

Reply to "ORES downtime on July 16th @ 1500 UTC"

It seems Phabricator is calling ORES JADE now?

Xinbenlv (talkcontribs)
94rain (talkcontribs)

ORES is not Jade, and Jade is not ORES. They are related but not the same thing.

There is also a page for Jade.

But there is a request phab:T153143 that ORES query results should include JADE refutations.

Xinbenlv (talkcontribs)

OK, I see. I was trying to find some documentation for JADE; where can I find it? Thank you!

94rain (talkcontribs)

Links to repos?

EEggleston (WMF) (talkcontribs)

Is the page the right place to link?

EpochFail (talkcontribs)

It's not a bad place to link. We keep all of our primary repos within that organization.

This post was hidden by Neil Shah-Quinn (WMF) (history)
Adamw (talkcontribs)

Unfortunately, this is out-of-date now. The wiki-ai organization still shows which repos we work in, but the most current code for those projects will be in the `wikimedia` organization.

We need to create a new entry point for developers.

Reply to "Links to repos?"

Question from a student doing independent research

Summary by FeralOink

I am marking this question as fully resolved. To summarize, a computer science student at an American university inquired of Wikimedia regarding detection of undisclosed paid edits and revisions. The student is using a particular framework for his own project of detecting paid Wikipedia edits, and wished to validate its accuracy versus the findings of Wikimedia's own methods of undisclosed paid edit detection.

An employee of Wikimedia responded to the inquiry. He provided links to recent public releases of data, as well as descriptions of ORES and a predictive model used for detection. He provided ORES parameter settings and information about API use. The student replied, acknowledging that the response he received was helpful and adequate. The student also gave permission for the conversation to be posted openly on MediaWiki.org.

Halfak (WMF) (talkcontribs)

The following is an email conversation that I had with Mark Wang about ORES. I'm posting this so that it'll maybe gain some long-term usefulness. As you can see at the end of the thread, Mark agreed to me posting this publicly.

On Sat, Nov 17, 2018 at 8:50 PM Wang, Mark <> wrote:

Hi Scoring Platform Team!

I'm Mark, a CS student at Brown Univ. I'm working on experimenting with applying Snorkel (a framework for leveraging unlabeled data with noisy label proposals) to detect paid Wikipedia edits. I've got a few selfish requests / questions for you guys.

Snorkel code:
Snorkel paper:

Some selfish questions:

1) Is it possible for me to have access to edits and page-stats data that you work with? I can scrape them myself (with a reasonable crawl rate), but of course, it's less convenient and I'll end up working with less data.

2) How do you represent revisions? I'm thinking about using character embeddings here. What are some methods that worked well for you guys? And what should I probably not try?

3) What features seem to be strongly informative in your models for detecting low-quality edits?

4) Any additional recommendations /advice?

Thank you in advance for your time, Mark Wang

On Mon, Nov 19, 2018 at 4:49 PM Aaron Halfaker <> wrote:

Hi Mark!

Thanks for reaching out! Have you seen our recent data release of known paid editors?

1) I'm not sure what page stats you are looking for, but you can see the features we use in making predictions by adding a "?features" argument to an ORES query. For example, shows the features extracted and an "is this edit damaging" prediction for
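For illustration, a sketch in Python of reading such a features-enabled response; the nesting follows the v3 scores API shape, and the revision ID and values here are invented:

```python
# Hypothetical response for a query like
#   /v3/scores/enwiki/?models=damaging&revids=123456789&features=true
# (revision ID and feature values invented for illustration).
response = {
    "enwiki": {
        "scores": {
            "123456789": {
                "damaging": {
                    "score": {
                        "prediction": False,
                        "probability": {"true": 0.03, "false": 0.97},
                    },
                    "features": {
                        "feature.revision.user.is_anon": False,
                        "feature.revision.diff.words_change": 12,
                    },
                }
            }
        }
    }
}

score = response["enwiki"]["scores"]["123456789"]["damaging"]
prediction = score["score"]["prediction"]  # the model's damaging verdict
features = score["features"]               # the extracted feature values
```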

2) A revision is a vector that we feed into the prediction model. We do a lot of manual feature engineering, but we use vector embeddings for topic modeling. We're actually looking into just using our current word2vec strategies for implementing better damage detection too. See

3) Here's an output of our feature importance weights for the same model. This is estimated by sklearn's GradientBoosting model.

feature.log((temporal.revision.user.seconds_since_registration + 1)) 0.131
feature.revision.user.is_anon 0.036
feature.english.dictionary.revision.diff.dict_word_prop_delta_sum 0.033
feature.revision.parent.markups_per_token 0.029
feature.revision.parent.words_per_token 0.028
feature.revision.parent.chars_per_word 0.027
feature.log((wikitext.revision.parent.ref_tags + 1)) 0.026
feature.revision.diff.chars_change 0.026
feature.revision.user.is_patroller 0.026
feature.english.dictionary.revision.diff.dict_word_prop_delta_increase 0.025
feature.log((wikitext.revision.parent.chars + 1)) 0.023
feature.log((AggregatorsScalar(<datasource.tokenized(datasource.revision.parent.text)>) + 1)) 0.023
feature.log((AggregatorsScalar(<datasource.wikitext.revision.parent.words>) + 1)) 0.023
feature.revision.parent.uppercase_words_per_word 0.022
feature.log((wikitext.revision.parent.wikilinks + 1)) 0.021
feature.log((wikitext.revision.parent.external_links + 1)) 0.02
feature.log((wikitext.revision.parent.templates + 1)) 0.02
feature.wikitext.revision.diff.markup_prop_delta_sum 0.02
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_sum 0.02
feature.log((AggregatorsScalar(<datasource.wikitext.revision.parent.uppercase_words>) + 1)) 0.018
feature.revision.diff.tokens_change 0.018
feature.log((wikitext.revision.parent.headings + 1)) 0.017
feature.wikitext.revision.diff.markup_delta_sum 0.015
feature.revision.diff.words_change 0.015
feature.english.dictionary.revision.diff.dict_word_delta_sum 0.015
feature.english.dictionary.revision.diff.dict_word_prop_delta_decrease 0.015
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_increase 0.015
feature.revision.diff.markups_change 0.014
feature.english.dictionary.revision.diff.dict_word_delta_increase 0.014
feature.wikitext.revision.diff.markup_prop_delta_increase 0.013
feature.wikitext.revision.diff.markup_delta_increase 0.012
feature.wikitext.revision.diff.number_prop_delta_sum 0.011
feature.wikitext.revision.diff.number_prop_delta_increase 0.011
feature.english.dictionary.revision.diff.non_dict_word_delta_sum 0.011
feature.wikitext.revision.diff.number_delta_increase 0.01
feature.revision.diff.wikilinks_change 0.01
feature.revision.comment.has_link 0.01
feature.english.dictionary.revision.diff.dict_word_delta_decrease 0.01 0.009
feature.wikitext.revision.diff.number_delta_sum 0.009
feature.wikitext.revision.diff.markup_prop_delta_decrease 0.008
feature.english.dictionary.revision.diff.non_dict_word_prop_delta_decrease 0.008 0.007
feature.revision.diff.external_links_change 0.007
feature.revision.diff.templates_change 0.007
feature.revision.diff.ref_tags_change 0.007
feature.english.informals.revision.diff.match_prop_delta_sum 0.007
feature.english.informals.revision.diff.match_prop_delta_increase 0.007
feature.wikitext.revision.diff.number_prop_delta_decrease 0.006
feature.revision.comment.suggests_section_edit 0.006
feature.english.dictionary.revision.diff.non_dict_word_delta_increase 0.006
feature.wikitext.revision.diff.markup_delta_decrease 0.005
feature.revision.user.is_bot 0.005
feature.revision.user.is_admin 0.005
feature.english.badwords.revision.diff.match_prop_delta_sum 0.005
feature.wikitext.revision.diff.number_delta_decrease 0.004
feature.wikitext.revision.diff.uppercase_word_prop_delta_sum 0.004
feature.revision.diff.headings_change 0.004
feature.revision.diff.longest_new_repeated_char 0.004
feature.english.badwords.revision.diff.match_prop_delta_increase 0.004
feature.english.informals.revision.diff.match_delta_increase 0.004
feature.english.dictionary.revision.diff.non_dict_word_delta_decrease 0.004
feature.wikitext.revision.diff.uppercase_word_delta_sum 0.003
feature.wikitext.revision.diff.uppercase_word_prop_delta_increase 0.003
feature.revision.diff.longest_new_token 0.003
feature.english.informals.revision.diff.match_delta_sum 0.003
feature.wikitext.revision.diff.uppercase_word_delta_increase 0.002
feature.wikitext.revision.diff.uppercase_word_prop_delta_decrease 0.002
feature.english.badwords.revision.diff.match_delta_sum 0.002
feature.english.badwords.revision.diff.match_delta_increase 0.002
feature.wikitext.revision.diff.uppercase_word_delta_decrease 0.001
feature.english.informals.revision.diff.match_prop_delta_decrease 0.001 0.0
feature.revision.user.has_advanced_rights 0.0
feature.revision.user.is_trusted 0.0
feature.revision.user.is_curator 0.0
feature.english.badwords.revision.diff.match_delta_decrease 0.0
feature.english.badwords.revision.diff.match_prop_delta_decrease 0.0
feature.english.informals.revision.diff.match_delta_decrease 0.0
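For context, weights like those above come from sklearn's feature_importances_ attribute on a fitted gradient boosting model; a minimal sketch on toy data (the feature names and data are stand-ins, not the ORES training set):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

# Toy data: only the first column actually predicts the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = (X[:, 0] > 0).astype(int)

model = GradientBoostingClassifier(random_state=0).fit(X, y)

# Importances sum to 1.0; the informative column should dominate.
names = ["seconds_since_registration", "is_anon", "words_change"]
for name, weight in sorted(zip(names, model.feature_importances_),
                           key=lambda pair: -pair[1]):
    print(f"{name}\t{weight:.3f}")
```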

4) You'll note that time since registration and is_anon are strongly predictive. They don't overwhelm the predictions -- we can still differentiate good from bad among newcomers and anonymous editors. But the model generally doesn't predict that an edit by a very experienced editor is bad, regardless of what's actually in the edit. The more we can move away from relying on is_anon and seconds_since_registration, the more we'll be targeting the things that people do -- rather than targeting them for their status. See section 7.4 of our systems paper for a more substantial discussion of this problem.


On Mon, Nov 19, 2018 at 6:47 PM Wang, Mark <> wrote:

Thanks a bunch for your help Aaron! This is all very informative.

One more question from me: May I borrow your features? And if so, is accessing them through the API the preferred method of access for an outsider?

Thanks again, Mark

On Tue, Nov 20, 2018 at 11:07 AM Aaron Halfaker <> wrote:

Say, I'd like to save this conversation publicly so that others might benefit from it. Would you be OK with me posting our discussion publicly on a wiki?

On Tue, Nov 20, 2018 at 10:06 AM Aaron Halfaker <> wrote:

Yes. That is a good method for accessing the features. You'll notice that the features that the API reports are actually just the basic reagents for the features the model uses.

For example, we have features like this:

  • words added
  • words removed
  • words added / words removed
  • log(words added)
  • log(words removed)
  • etc.

In all of these features, the basic foundation is "words added" and "words removed" with some mathematical operators on top. So we only report those two via the API. The full set of features for our damage detection model is defined in the editquality repo. See also a quick overview I put together for feature engineering here:
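Separately, a minimal sketch of that expansion in Python (the helper and feature names are illustrative, not revscoring's actual objects):

```python
import math

def derive_word_features(words_added, words_removed):
    """Expand the two reported base values into derived features
    like the ones listed above."""
    return {
        "words_added": words_added,
        "words_removed": words_removed,
        # Ratio, guarded against division by zero:
        "words_added_per_removed": words_added / max(words_removed, 1),
        # Log transforms tame heavy-tailed edit-size distributions:
        "log_words_added": math.log(words_added + 1),
        "log_words_removed": math.log(words_removed + 1),
    }

derive_word_features(20, 4)
```

The point is that clients only need the two base counts; every derived value can be recomputed on their side.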

If I wanted to extract the raw feature values for the English Wikipedia "damaging" model, I'd install the "revscoring" library (pip install revscoring) and then run the following code from the base of the editquality repo:

$ python
Python 3.5.1+ (default, Mar 30 2016, 22:46:26) 
[GCC 5.3.1 20160330] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> from editquality.feature_lists.enwiki import damaging
/home/halfak/venv/3.5/lib/python3.5/site-packages/sklearn/ DeprecationWarning: This module was deprecated in version 0.18 in favor of the model_selection module into which all the refactored classes and functions are moved. Also note that the interface of the new CV iterators are different from that of this module. This module will be removed in 0.20.
  "This module will be removed in 0.20.", DeprecationWarning)
>>> from revscoring.extractors import api
>>> import mwapi
>>> extractor = api.Extractor(mwapi.Session(""))
Sending requests with default User-Agent.  Set 'user_agent' on mwapi.Session to quiet this message.
>>> list(extractor.extract(123456789, damaging))
[True, True, False, 10.06581896445358, 9.010913347279288, 8.079927770758275, 3.4965075614664802, 2.772588722239781, 5.402677381872279, 2.70805020110221, 1.791759469228055, 2.1972245773362196, 7.287484510532837, 0.3940910755707484, 0.009913258983890954, 0.06543767549749725, 0.0, 2.0, -2.0, 0.04273504273504275, 0.15384615384615385, -0.1111111111111111, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 1, 1, False, False, False, False, True, False, False, 11.305126087390619, False, False, 0, 0, 0, 0.0, 0.0, 0.0, 0, 0, 0, 0.0, 0.0, 0.0, 0, 0, 0, 0.0, 0.0, 0.0, 0, 0, 0, 0.0, 0.0, 0.0]

This extracts the features for this edit:


Hi Aaron:

Thank you so much! This is all so helpful. And of course, feel free to publicize any of our conversations.