There was recently a suggestion in the Catalan Wikipedia to start using ORES again after months of inactivity. We have an edit type campaign, but there haven't been any damaging-goodfaith campaigns, which seems unusual according to ORES/Get support. Furthermore, we do not seem to have language support for Catalan. Are we on the right track? What should we do to reactivate the labelling project? Thanks in advance.
Is there a correlation between the editing environment and draftquality?
I'd like to know whether draftquality is the same for new accounts that create new articles in the visual editor as it is for new accounts that create new articles in the older wikitext editors (at the English Wikipedia).
@Nettrom, I'm assuming that this is outside the scope of your current projects. @Neil P. Quinn-WMF, is this something that you could do? I'm not sure how much work this would be, but I assume that it's not very difficult.
I'm very interested in all VE research results. Please ping me if this project goes forwards.
Confounding variables from using different self-selected populations would produce unreliable results, especially because some percentage of those "new accounts" actually represent experienced editors. (Paid editors in particular abuse throw-away accounts for each new article.) However, I have a fix: you can do a retroactive controlled study. You have to ignore whether an article was created using VE, and instead look for any difference in draftquality between the experimental and control groups of the May 2015 study of VisualEditor's effect on newly registered editors.
Comparing control group wikitext articles against experimental group wikitext+VE articles will cut your signal strength in half, but it's the only way to avoid junk data due to skewed population selection.
Your idea might measure the added value for VisualEditor's contribution (e.g., if you were trying to show that a new editor using VisualEditor is more likely to properly format a citation), but that's not actually my goal. I'm thinking that the chosen editing environment might be a useful marker.
There's obviously no value in collecting data showing that experienced editors produce higher quality drafts than new users.
That is likely what we would get if we ran your proposed data collection without modification. An experienced user with a new account is more likely to know how to switch to the secondary editor, which introduces an experienced-user bias into the study's population selection.
(Collecting reliable data from the wild isn't easy.)
My proposal is to study an objective, non-speculative condition: "new accounts that create new articles".
What value would that have, if it merely establishes that experienced users produce higher quality drafts than new users?
We can get much more valuable results by re-examining data from the controlled study.
You are welcome to re-examine that old data if you want to.
You are also welcome to study whether new accounts that you believe to be experienced editors actually produce higher quality drafts. However, it sounds circular to me: How will you divide the brand-new accounts into "experienced" and "new" editors? By looking at the quality of the draft. What are you going to study? Whether the ones that you labeled "experienced", on the basis of their higher quality drafts, produced higher quality drafts than the ones that you labeled "new", on the basis of their lower quality drafts. If you did not find a perfect correlation in such a study, then you would probably want to look for an arithmetic error.
I do not want to discourage you from researching whatever interests you, but your question does not interest me.
What??? Do you understand why you're going to get junk data?
For new accounts, you can't distinguish experienced editors from new editors. It's a confounding factor. You're proposing to use biased populations.
I also don't understand why you seem actively-averse to looking at high quality data.
Again, I'm not trying to distinguish experienced editors from new editors.
I'm trying to find out whether new accounts (= an objective, unbiased, machine-identifiable state that is only partially correlated with the actual experience level of the humans using those accounts) that either use, or don't use, the newer editing software produce the same or different results on the specific measure of ORES draftquality.
As a side note, it sounds like you're assuming that experienced editors are more likely to switch to visual editing than new editors. I don't think that there is any data to support your assumption.
Hypothesis 1: Experienced users are more likely to know how to switch to VisualEditor. Draftquality for new-accounts using VisualEditor will skew high, because you're measuring more experienced editors in VE vs newbies in wikitext.
Hypothesis 2: Experienced editors overwhelmingly prefer wikitext. Draftquality for new-accounts using VisualEditor will skew towards 'suck', because you're measuring more experienced editors in wikitext vs newbies in VE.
I find it hard to imagine any valid use for the results when you don't know what you're measuring. I can however imagine some invalid uses for a collection of random numbers.
Edit: Perhaps it would aid my understanding if you identified how you want to use the data, rather than defining the data to be collected. Knowing the intent would help me see whether I'm mistaken.
Links to repos?
Is the github.com/wiki-ai page the right place to link?
It's not a bad place to link. We keep all of our primary repos within that organization.
Labeling gadget diffs in reverse order?
I noticed that recently the labeling gadget http://labels.wmflabs.org/ui/ has swapped the before and after sides of diffs, except in cases of new page creation. Looks like a bug. Can someone verify? I am doing good/bad rating on lvwiki tasks.
This is vandalism: https://lv.wikipedia.org/w/index.php?diff=2674922 But on the labeling page the "after" version is shown in the first column.
This is a huge bug! Thank you for reporting it!
We're working on a bug fix deployment right now. It should be ready in a couple of minutes. It looks like 21 labels have been submitted since the bug was introduced. I'll be removing those. There will be an announcement going out soon.
Glad to help
Thanks again for reporting. Your timely notice was invaluable!
Rebuild 'reverted' model
Is it possible to rebuild a 'reverted' model to fine-tune it, or does it fine-tune itself with the help of newly labeled edits? Thanks.
It gets fine-tuned with new edits. Is there a problem with one of the reverted models?
Thank you for your reply. Not a problem, but some "roughness" at times, so to speak (which is to be expected, as the docs say). But I was not aware that it gets fine-tuned with new edits. In any case, I hope we can finish the damaging/good faith labeling on eswiki soon so we have better heuristics. Regards.
So it doesn't automatically get fine-tuned with new edits, but we can always retrain the model with new data. If you'd like us to give that a try, we can add it to the backlog. It's not too difficult to do.
I'm leaving that to @-jem-, because his bot, PatruBOT, is using the currently available data from ORES at eswiki. If he thinks it would be worth requesting a retrain of the reverted model so his bot can be more accurate until the damaging & good faith campaign finishes, that's his choice, as long as the labelling campaign does not reset and all our work gets lost.
Thanks, @MarcoAurelio. Well, as the other campaigns may still be some weeks or months ahead, and we still have some false positives in the reverted model, if the model can be improved with not so much work, I (and the eswiki community) will appreciate it. Or maybe you can give me some hints about fine-tuning the bot with ORES information beyond the use of the reverted probability. I'll keep on reading. Regards.
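As a rough hint for going beyond a single probability cutoff: a bot could apply tiered thresholds, auto-reverting only at very high confidence and queueing medium-confidence edits for human patrollers. The sketch below parses a response of the shape returned by the ORES v3 scores API; the `triage` function and its threshold values are invented for illustration, not PatruBOT's actual logic.

```python
# Sketch: three-way triage instead of a single "revert if p > X" rule.
# The response shape mirrors the ORES v3 scores API; thresholds are
# made-up illustrations, not recommendations.

sample_response = {
    "eswiki": {
        "scores": {
            "12345678": {
                "reverted": {
                    "score": {
                        "prediction": True,
                        "probability": {"false": 0.12, "true": 0.88},
                    }
                }
            }
        }
    }
}

def triage(response, wiki, rev_id, model="reverted",
           revert_at=0.95, review_at=0.75):
    """Return an action for one revision based on tiered thresholds."""
    score = response[wiki]["scores"][str(rev_id)][model]["score"]
    p_bad = score["probability"]["true"]
    if p_bad >= revert_at:
        return "revert"          # high confidence: act automatically
    if p_bad >= review_at:
        return "flag-for-human"  # medium confidence: queue for patrollers
    return "ignore"

decision = triage(sample_response, "eswiki", 12345678)
```

With the sample probability of 0.88, this edit falls between the two assumed thresholds, so the bot would flag it for review rather than revert on its own, which should cut false-positive reverts.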
As long as we don't lose the already-done tagging of edits (we're at 70% now), I have no objections to rebuilding the reverted model. Maybe we can exclude es:User:PatruBOT from the rebuild so its false positives do not "contaminate" the results?
When PatruBOT commits a false-positive, is the good edit generally restored via a second revert? If so, we're likely already excluding them from the model. Whenever a reverted edit is later restored by someone other than the original author, we exclude that edit as a revert example for the model.
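The exclusion rule described above can be sketched as a simple filter. The dictionary fields here (`was_reverted`, `restored_by`, `author`) are illustrative assumptions, not the actual revscoring/ORES data schema:

```python
# Hypothetical sketch of the heuristic: a reverted edit counts as a positive
# 'reverted' training example only if nobody other than the original author
# later restored it. Field names are assumptions for illustration.

def is_revert_example(edit):
    """Return True if `edit` should be kept as a 'reverted' training example."""
    if not edit["was_reverted"]:
        return False
    restored_by = edit.get("restored_by")  # None if never restored
    # A restore by someone other than the author suggests the revert was a
    # false positive (e.g., an overeager bot), so exclude the edit.
    if restored_by is not None and restored_by != edit["author"]:
        return False
    return True

edits = [
    {"author": "NewUser1", "was_reverted": True,  "restored_by": None},         # keep
    {"author": "NewUser2", "was_reverted": True,  "restored_by": "Patroller"},  # drop
    {"author": "NewUser3", "was_reverted": True,  "restored_by": "NewUser3"},   # keep (self-restore)
    {"author": "NewUser4", "was_reverted": False, "restored_by": None},         # drop (not reverted)
]
examples = [e for e in edits if is_revert_example(e)]
```

Under this rule, edits restored by a third party drop out of the training set automatically, so PatruBOT's corrected false positives would not reinforce the model.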
I think most of those good edits are restored when people check their watchlists, but pages with no active watchers remain without the edit. The same can happen with human mistakes, though, so I think things are acceptable as is. And if I can help by explaining (privately, away from vandals) how PatruBOT works, I'll be glad to do it.
Patterns based on sessions from anonymous users?
(This is a question emerging from a discussion in Catalan Wikipedia.)
Imagine this situation: an anonymous (IP) user makes 5 edits during one session. They are all subtle vandalism, introducing wrong words and concepts that require certain knowledge to detect (e.g., knowing that Town X is not on the coast).
Now imagine, that from these five edits, three are detected and reverted, but the other two still remain in place. It could be that the other two edits are legitimate, but chances are that the other two edits done by the same user in the same session are vandalism too, just not detected.
Is ORES analyzing this type of situations already? If not, is this a pattern that could be considered?
ORES currently doesn't analyze beyond a single edit. This is something we're hoping to look into, though. For Phab:T155756, we're planning to take a whole session of edits and build features from them. Once we have that infrastructure, we can experiment with other models too.
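The pattern described in the question could be sketched roughly like this: group an IP's edits into sessions separated by a period of inactivity, and when most edits in a session have already been reverted, surface the surviving edits for human review. The one-hour gap, the 50% threshold, and the field names are all assumptions for illustration, not ORES internals:

```python
# Sketch: flag non-reverted edits from sessions where most edits were
# reverted. SESSION_GAP, the threshold, and field names are assumptions.

from datetime import datetime, timedelta

SESSION_GAP = timedelta(hours=1)  # assumed inactivity cutoff between sessions

def group_sessions(edits):
    """Split edits (sorted by time) into sessions separated by > SESSION_GAP."""
    sessions, current = [], []
    for edit in edits:
        if current and edit["ts"] - current[-1]["ts"] > SESSION_GAP:
            sessions.append(current)
            current = []
        current.append(edit)
    if current:
        sessions.append(current)
    return sessions

def suspicious_survivors(session, threshold=0.5):
    """Return non-reverted edits from a session where most edits were reverted."""
    reverted = sum(e["reverted"] for e in session)
    if reverted / len(session) > threshold:
        return [e for e in session if not e["reverted"]]
    return []

t0 = datetime(2017, 1, 1, 12, 0)
edits = [
    {"ts": t0,                         "reverted": True},
    {"ts": t0 + timedelta(minutes=5),  "reverted": True},
    {"ts": t0 + timedelta(minutes=9),  "reverted": True},
    {"ts": t0 + timedelta(minutes=12), "reverted": False},
    {"ts": t0 + timedelta(minutes=20), "reverted": False},
    # A later, separate session by the same IP:
    {"ts": t0 + timedelta(hours=3),    "reverted": False},
]
sessions = group_sessions(edits)
flagged = [e for s in sessions for e in suspicious_survivors(s)]
```

In this example the first five edits form one session with 3 of 5 reverted, so its two surviving edits get flagged for review, while the later lone edit forms its own clean session and is left alone.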
Sharing 'reverted' or 'damaging'/'good faith' across projects with the same language
Hello. I was wondering if we could share the results of campaigns across projects in the same language. Spanish Wikipedia and Wikibooks, for example, both have the 'reverted' model already built, but edit tagging has been very slow. Since we operate in the same language, I assume we can share some things, right? If not, do we have to request a reverted model, and a damaging/good faith campaign, for each sister project separately? Thank you.
I'm not sure how a model would work cross-project. We've never tried that before. It sounds like we ought to do some research before we try it.
Forgot to watch this page!
Sorry for the delayed responses. Working through old questions now.
No more automatic bots for all pages
Bots should run only on finished, serious Wikipedia pages, and only if necessary.
Please, these overly fast reversions are ridiculous and authoritarian.
Obsolete ORES link in preferences
w:MediaWiki:Eri-rcfilters-beta-description-ores is displayed at w:Special:Preferences#mw-prefsection-betafeatures. Both it and its various language versions link "ORES" to the obsolete meta:Objective Revision Evaluation Service instead of mw:ORES. I'm an English Wikipedia admin and could create a local message there with the new link, but a central fix for all wikis would be much better.