Topic on Talk:Edit Review Improvements/Proposed Huggle improvements

Queue ordering and the value of icons

JMatazzoni (WMF)

Thanks so much @Excirial for your feedback. It’s essential we hear from people like you so that we avoid just the type of errors you’re describing. I wanted to start this thread to discuss one of the specific issues you raise. You write:

“For me using Huggle equals looking at the displayed diff while using keyboard shortcuts to navigate the edit queue. I wouldn't take note of any added queue icons or the ORES score….I am manually checking each edit so a machine evaluation is redundant.”

So it sounds like you generally examine edits in the queue sequentially; you don’t pick and choose based on Huggle’s icons or other clues. This is good feedback. You don’t say which filtering option you use for the queue. I’m going to guess it’s “Filtered edits,” which sorts by Huggle score? Is that right?

If so, then at the top of your queue you're seeing the edits Huggle's internal algorithm thinks are the most likely vandalism, followed by less and less likely candidates. So that would explain why going in order works for you. But what if I said there is research to suggest that ORES’s quality predictions (based on its “damaging” test) provide more reliable predictions? (I think this is true; @Halfak (WMF) can comment).

Assuming that’s so, a queue sorted by ORES score could help you eliminate the worst edits more quickly, putting your efforts where they'll make the most difference. What might work for you, I'm thinking, is if we simply offer one or more alternative sorting options based on the ORES scores. Then you could try them and decide which options you prefer. Would that be interesting, do you think?

It sounds like you find Huggle's queue icons irrelevant. I wonder if other users feel differently? There must be a reason why this complex system evolved. I’d be interested in hearing from users who do look at these icons for clues. What is useful, and what is not? If there are such users, I’m inclined to believe that implementing the ORES icons could still provide valuable data for some. (Though I assume we’ll let users turn them on or off as they prefer.)

Excirial

Hello @JMatazzoni (WMF),

The Huggle queue I use is indeed the default filtered queue, which lists all edits except those made by whitelisted users. Beyond this, I keep the page history and user info tabs closed and the ORES plugin disabled in order to minimize the amount of network traffic and parsing required for each diff.

Before disabling the ORES plugin I did monitor the scores for some time, though. In most cases the scores seem decent, but I thought there were too many false positives and false negatives to base anything beyond queue sorting on them. For example, there is currently a wave of vandalism that adds cat-like typing to articles (e.g. "Zmmajjsnd klksmww Oskkdmma wqpidif"). While this is clearly unproductive to a human editor, the ORES score for these edits generally seems to be in the 0-150 range. The inverse also occurs: I saw a new user wikilink a few general words and receive a 500+ ORES score for the effort. In the latter case I am a bit concerned that adding "Likely bad" icons may cause a patroller to go along with the machine assessment too readily and revert without thinking.

Using ORES to sort the edit queue may work, though I would note that Huggle's queue mechanic is suboptimal for this. In my case I can generally keep up with the queue, so there wouldn't be a lot to sort. Ignoring this for a second, I'd note that Huggle's queue is limited to 200 items before it stops the recent changes feed (the default setting), which would lead to lost edits regardless of scoring. The alternative setting is to let Huggle trim the queue by removing the oldest edits. In this case one could possibly use ORES to, for example, remove the lowest-scored edits and retain likely vandalism edits in the queue.
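The trimming idea above can be illustrated with a minimal Python sketch. This is not Huggle's actual implementation: the class names, the `score` field, and the score values are hypothetical, and the only assumption about ORES is that a higher score means a more suspicious edit. When the queue is full, the sketch drops the lowest-scored edit instead of the oldest one.

```python
import heapq
from dataclasses import dataclass, field


@dataclass(order=True)
class QueuedEdit:
    # Hypothetical score; assumed to be higher for more suspicious edits.
    score: float
    # Revision ID is carried along but ignored when comparing edits.
    rev_id: int = field(compare=False)


class ScoreTrimmedQueue:
    """Bounded queue that evicts the lowest-scored edit when full."""

    def __init__(self, max_size=200):
        self.max_size = max_size
        self._heap = []  # min-heap: the lowest-scored edit sits at index 0

    def add(self, edit):
        """Add an edit; return the edit dropped to make room, or None."""
        if len(self._heap) < self.max_size:
            heapq.heappush(self._heap, edit)
            return None
        # Push the new edit, then pop whichever edit now scores lowest.
        # A new edit that scores below everything queued is dropped itself.
        return heapq.heappushpop(self._heap, edit)

    def review_order(self):
        """Return queued edits, most suspicious first."""
        return sorted(self._heap, reverse=True)


if __name__ == "__main__":
    queue = ScoreTrimmedQueue(max_size=3)
    for rev_id, score in [(1, 0.1), (2, 0.9), (3, 0.5), (4, 0.7)]:
        dropped = queue.add(QueuedEdit(score, rev_id))
        if dropped is not None:
            print("dropped revision", dropped.rev_id)
    print([edit.rev_id for edit in queue.review_order()])
```

A min-heap keeps eviction cheap, since the lowest-scored edit is always at the front. The trade-off is the one raised above: any false negative with a low score is the first thing to be trimmed.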

This mechanic would have its own drawback, though. Huggle is a fantastic piece of software, but one major issue is that it coordinates very poorly between users. If two patrollers use Huggle at the same time, both will look at and evaluate the exact same edits, with the only variance being where they currently are in the queue. While two sets of eyes is generally a good thing, ten concurrent patrollers would lead to a lot of wasted effort. If all these patrollers sorted by ORES score, we could be nigh certain they would be looking at the exact same queue.

One Wikipedia addition that I have long been hoping for to counter this (and I apologize, as I may be straying somewhat outside the boundaries of this subject) is an aggregated edit feed that tracks unreviewed edits. In other words: pretty much the ReviewStream edit review improvement, extended with the ability to track unreviewed edits and the ability for a user to provide extended feedback on an edit. I imagine ReviewStream returning an edit until a quorum of editors (perhaps two?) has viewed that specific edit and found it not problematic. Not only would this solve a major issue I have seen for years (too many vandalism patrollers wasting time on the same edits in one hour and nobody reviewing at all in another), but I can imagine it being employed for other types of feedback as well. For example, more than once I have seen diffs where a user seems to struggle to achieve something. I try to leave those users an invitation to the Teahouse, but I can never really follow up on this. If one could simply flag the edit as "Promising new editor", another tool or editor could fetch these edits or flagged editors and follow up on them. Heck, I can imagine a situation where a user would set Huggle's queue to "Promising new editors". This would in turn query ReviewStream for all new editors registered as good who haven't been welcomed by another user yet.
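To make the quorum idea concrete, here is a minimal in-memory Python sketch. It does not use any real ReviewStream API; `ReviewFeed`, its methods, the quorum of two, and the flag string are all hypothetical names chosen for illustration. An edit keeps being handed out until enough distinct reviewers have marked it as fine, and a reviewer can attach a flag that another tool could query later.

```python
class ReviewFeed:
    """Hypothetical shared feed that tracks which edits still need review."""

    def __init__(self, quorum=2):
        self.quorum = quorum
        # rev_id -> set of reviewers who found the edit unproblematic.
        # A dict keeps insertion order, so older edits are served first.
        self._pending = {}
        # flag name -> set of rev_ids carrying that flag.
        self._flags = {}

    def publish(self, rev_id):
        """Register a new, unreviewed edit."""
        self._pending.setdefault(rev_id, set())

    def next_for(self, reviewer):
        """Return the oldest pending edit this reviewer has not yet approved."""
        for rev_id, approvers in self._pending.items():
            if reviewer not in approvers:
                return rev_id
        return None

    def mark_ok(self, reviewer, rev_id, flag=None):
        """Record that a reviewer found the edit fine, optionally flagging it."""
        if flag is not None:
            self._flags.setdefault(flag, set()).add(rev_id)
        approvers = self._pending.get(rev_id)
        if approvers is None:
            return  # already settled or never published
        approvers.add(reviewer)
        if len(approvers) >= self.quorum:
            del self._pending[rev_id]  # quorum reached: stop serving it

    def mark_reverted(self, rev_id):
        """A reverted edit needs no further review."""
        self._pending.pop(rev_id, None)

    def flagged(self, flag):
        """Return the rev_ids carrying a flag, for follow-up by other tools."""
        return sorted(self._flags.get(flag, set()))


if __name__ == "__main__":
    feed = ReviewFeed(quorum=2)
    feed.publish(101)
    feed.publish(102)
    feed.mark_ok("PatrollerA", 101)
    feed.mark_ok("PatrollerB", 101, flag="Promising new editor")
    print(feed.next_for("PatrollerA"))           # 101 is settled, so 102 is served
    print(feed.flagged("Promising new editor"))
```

Because the approval state is shared, a second patroller is steered toward edits that still lack a quorum rather than repeating settled work, which is the cooperation the proposal is after.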

If anything, I believe that providing a means to cooperate with other users would be more valuable than providing an individual vandalism patroller with more details about an edit.

Excirial

Correcting pingback and fixing some typos.

Smalljim

Great thoughts on the next generation of AV programs, Excirial. I really think a new approach is needed instead of trying to ginger up the faithful old workhorse.
