Help:New filters for edit review/Quality and Intent Filters

New filters for edit review introduces two filter groups—Contribution Quality and User Intent—that work differently from other edit-review filters. The filters in these groups offer probabilistic predictions about, respectively, whether or not edits are likely to contain problems and whether the users who made them were acting in good faith. Knowing a bit about how these unique tools work will help you use them more effectively.

Based on machine learning (only on some wikis)
The predictions that make the Quality and Intent filters possible are calculated by ORES, a machine learning program trained on a large set of edits previously scored by human editors. Machine learning is a powerful technology that lets machines replicate some (limited) aspects of human judgement.

The Quality and Intent filters are available only on wikis where the “damaging” and “good faith” ORES “models” are supported. The ORES “damaging” model powers Quality predictions, while its “good faith” model powers Intent. (Enabling ORES requires volunteers to score edits on the relevant wiki. This page explains the process and how you can get it started on your wiki.)

Choosing the right tool
Looking at the Quality and Intent filters, you may notice something different about them. Unlike filters in other groups, the various options don’t target different edit properties. Instead, many of them target the same property, but offer different levels of accuracy.

Why would anyone choose to use a tool that's less accurate? Because such accuracy comes at a cost.

Increase prediction probability (higher ‘precision’)
The more “accurate” filters on the menu return a higher percentage of correct versus incorrect predictions and, consequently, fewer false positives. (In the lingo of pattern recognition, these filters have a higher “precision”.) They achieve this accuracy by being narrower, stricter. When searching, they set a higher bar for probability. The downside of this is that they return a smaller percentage of their target.
 * Example: The “Very likely have problems” filter is the most accurate of the Quality filters. Its predictions are right about 90% of the time. The tradeoff is that it finds less than 10% of all the problem edits in a given set—because it passes over problems that are harder to detect. The problems this filter finds will often include obvious vandalism.
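To see how a higher bar trades coverage for accuracy, here is a minimal Python sketch. It is an illustration only, not how ORES works internally: the scores, the human judgements, and the two cutoff values are made up, and `precision_and_coverage` is a hypothetical helper.

```python
def precision_and_coverage(scored_edits, cutoff):
    """Return (precision, coverage) for a filter that returns every edit
    whose predicted problem probability is at or above `cutoff`.

    scored_edits: list of (score, has_problem) pairs.
    precision: share of returned edits that really have problems.
    coverage: share of all problem edits that the filter returns
              (called "recall" in pattern recognition).
    """
    returned = [has_problem for score, has_problem in scored_edits if score >= cutoff]
    total_problems = sum(1 for _, has_problem in scored_edits if has_problem)
    true_positives = sum(returned)
    precision = true_positives / len(returned) if returned else 0.0
    coverage = true_positives / total_problems if total_problems else 0.0
    return precision, coverage


# Made-up data: (predicted probability of a problem, human judgement)
EDITS = [
    (0.97, True), (0.91, True), (0.80, False), (0.72, True),
    (0.55, False), (0.41, True), (0.33, False), (0.20, False),
    (0.12, False), (0.05, False),
]

strict = precision_and_coverage(EDITS, cutoff=0.90)  # narrow, strict filter
broad = precision_and_coverage(EDITS, cutoff=0.40)   # broad filter
print("strict:", strict)
print("broad:", broad)
```

With this made-up data, the strict cutoff is right every time but finds only half of the problem edits, while the broad cutoff finds all of them at the cost of two false positives.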

Find more of your target (higher ‘recall’)
If your priority is finding all or most of your target, then you’ll want a broader, less accurate filter. These find more of what they’re looking for by setting the bar for probability lower. The tradeoff here is that they return more false positives. (In technical parlance, these filters have higher “recall”, defined as the percentage of the stuff you’re looking for that your query actually finds.)
 * Example: The “May have problems” filter is the broadest Quality filter. It catches about 90% of problem edits. On the downside, this filter is right only about 15% of the time.

If 15% doesn’t sound very helpful, consider that problem edits actually occur at a rate of fewer than 5 in 100, or under 5%. So 15% is at least a 3x boost over random. And of course, patrollers don’t sample randomly; they’re skilled at using various tools and clues to increase their hit rates. Combined with those techniques, “May have problems” provides a significant edge.
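The arithmetic behind that comparison can be worked through in a few lines of Python. This is a rough sketch using the round figures quoted above (a 5% problem rate, 15% precision, 90% recall) and an assumed batch of 1,000 edits; `review_workload` is a hypothetical helper, not part of any tool.

```python
def review_workload(total_edits, problem_rate, precision, recall):
    """Estimate how many edits a filter returns and how many are real problems."""
    problems = total_edits * problem_rate  # problem edits in the batch
    found = problems * recall              # problem edits the filter catches
    returned = found / precision           # all edits the filter returns
    return returned, found


returned, found = review_workload(1000, problem_rate=0.05, precision=0.15, recall=0.90)
lift = 0.15 / 0.05  # filter hit rate divided by the random hit rate

print(f"Filter returns about {returned:.0f} of 1000 edits")
print(f"Those contain about {found:.0f} of the 50 problem edits")
print(f"Hit rate is about {lift:.0f}x the random rate")
```

Under these figures, the filter narrows 1,000 edits down to about 300 that contain 45 of the 50 problem edits.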

Get the best of both worlds (with highlighting)
The filtering system is designed to let users get around the tradeoffs described above. You can do this by filtering broadly while Highlighting the information that matters most.

To use this strategy, it’s helpful to understand that the more accurate filters, like “Very likely have problems,” return results that are a subset of the less accurate filters, such as “May have problems”. In other words, all “Very likely” results are also included in the broader “May have problems” set—like the bullseye of a target contained within the outer rings. (The diagram at right illustrates this concept.)
 * Example: Find almost all damage while emphasizing the worst/most likely:
 * 1) With the default settings loaded,
 * 2) Check the broadest Quality filter, “May have problems.”
 * 3) At the same time, highlight—without checking the filter boxes—“Likely have problems”, in yellow, and “Very likely have problems”, in red.
 * Because you are using the broadest Quality filter, your results will include about 90% of problem edits (high “recall”). But by scanning for the yellow and orange (i.e., blended red + yellow) bands, you will easily be able to pick out the most likely problem edits. (Find help on using highlights without filtering.)
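The nesting that makes this strategy work can be sketched in Python. The cutoff values below are assumptions chosen for illustration (actual cutoffs are not given on this page), and `matching_filters`, `in_results`, and `highlight` are hypothetical helpers.

```python
# Assumed cutoffs on the predicted problem probability, for illustration only.
QUALITY_FILTERS = {
    "May have problems": 0.15,
    "Likely have problems": 0.50,
    "Very likely have problems": 0.90,
}

# Highlight colors from the example above (no filter boxes checked for these).
HIGHLIGHTS = {
    "Likely have problems": "yellow",
    "Very likely have problems": "red",
}


def matching_filters(score):
    """Every filter whose cutoff the score meets; stricter ones nest inside broader ones."""
    return [name for name, cutoff in QUALITY_FILTERS.items() if score >= cutoff]


def in_results(score):
    """Only the checked filter, "May have problems", decides what is listed."""
    return "May have problems" in matching_filters(score)


def highlight(score):
    """Blend the colors of all highlighted filters that match the score."""
    colors = {HIGHLIGHTS[name] for name in matching_filters(score) if name in HIGHLIGHTS}
    if colors == {"yellow", "red"}:
        return "orange"  # red and yellow both apply, so they blend
    if colors == {"yellow"}:
        return "yellow"
    return None          # listed (if in_results) but not highlighted


for score in (0.95, 0.60, 0.20, 0.05):
    print(score, in_results(score), highlight(score))
```

Because every edit that clears the strictest cutoff also clears the two lower ones, the highest-scoring edits pick up both colors, which is why they appear orange.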

Re-use your settings
Use the above example as a jumping-off point for your own experiments. Find setting combinations that work for you. When you do, you can save your settings and re-use them. To do so, simply set everything as you want it, then copy the page URL and save it somewhere handy, such as a document. Clicking on the URL later will reinstate all the settings that were in effect when it was copied.

This technique works on mobile browsers, too, even though the new user interface for filtering doesn’t display on mobile currently. Even without the interface, all your settings will be activated.

Find the good (and reward it)
Good faith is easy to find, literally! So are good edits.

The “Very likely good faith” filter and the “Very likely good” (Quality) filter give you new ways to find and encourage users who are working to improve the wikis. For example, you might use the “Very likely good” filter in combination with the “Newcomers” filter to thank new users for their good work.

Or, since research shows that new users are particularly vulnerable to having their edits reverted, you might use the settings below to find new users who are making mistakes but who are, nonetheless, working in good faith, and then offer constructive comments and support.
 * Example: Find problem edits by good-faith new users
 * 1) With the default settings loaded,
 * 2) Uncheck the Type of Change filter “Logged actions” (because we’re not interested in this type of change).
 * 3) Check the medium-level Quality filter, “Likely have problems.”
 * 4) Check the Experience Level filter “Newcomers” (this has the hidden effect of limiting your results to registered users).
 * 5) Highlight—without checking the filter boxes—the User Intent filters “May be bad faith”, in yellow, and “Very likely good faith”, in green.
 * Filters: All edits in your results will be by Newcomers (users with fewer than 10 edits and 4 days of activity). The “Likely have problems” filter has medium accuracy, so a little less than half of the results should have some kind of problem.
 * Highlighting: The highlight settings demonstrate an advanced technique. Your results should show some edits colored both yellow and green. How is it possible for something to be both bad faith and good faith? It isn't, but both filters may apply to edits in an intermediate range of good-faith scoring because the medium-level bad-faith filter, “May be bad faith,” overlaps somewhat with the “Very likely good faith” category (the diagram above illustrates the concept). The pure green edits are those most likely to be good faith. The yellow-green ones are in the middling, overlapping zone; most are probably good faith, but a small portion may be bad. Combining colors strategically like this can give you more information to work with.
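The overlap between the two highlighted Intent filters can be pictured as two score ranges that share a middle zone. The sketch below is a rough illustration in Python: the range boundaries are assumptions rather than real cutoffs, and `highlight_colors` is a hypothetical helper.

```python
# Assumed ranges on the good-faith score (0 = bad faith, 1 = good faith).
# Each entry: (filter name, highlight color, lowest score, highest score)
INTENT_HIGHLIGHTS = [
    ("May be bad faith", "yellow", 0.00, 0.85),
    ("Very likely good faith", "green", 0.75, 1.00),
]


def highlight_colors(good_faith_score):
    """Return the color of every highlighted filter whose range contains the score."""
    return [
        color
        for _name, color, low, high in INTENT_HIGHLIGHTS
        if low <= good_faith_score <= high
    ]


for score in (0.95, 0.80, 0.30):
    print(score, highlight_colors(score))
```

With these assumed ranges, a score of 0.95 is pure green, 0.30 is pure yellow, and 0.80 falls in the shared zone and gets both colors.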

Good is everywhere!
The “good” filters mentioned above are both accurate and broad, meaning they aren’t subject to the tradeoffs described in the previous section (they combine high “precision” with high “recall”). These filters are correct about 99% of the time and find well over 90% of their targets. How can they do that?

The happy answer is that the “good” filters perform so well because good is more common than bad. That is, good edits and good faith are much, much more plentiful than their opposites, and therefore easier to find. It may surprise some patrollers to hear this, but on English Wikipedia, for example, only about one out of every 20 edits has problems, and only about half of those problematic edits are intentional vandalism.
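One way to see why a common target is easier to find: even a “filter” that returns every edit is already right most of the time when most edits are good. The Python sketch below uses the one-in-20 rate quoted above; the list of 20 labeled edits and the `select_all_precision` helper are made up for illustration.

```python
def select_all_precision(labels, target):
    """Precision of a 'filter' that returns every edit, for a given target label."""
    return sum(1 for label in labels if label == target) / len(labels)


# Twenty edits matching the quoted rate: one has problems, nineteen do not.
edits = ["problem"] + ["good"] * 19

good_baseline = select_all_precision(edits, "good")
problem_baseline = select_all_precision(edits, "problem")
print(f"good: {good_baseline:.0%}, problem: {problem_baseline:.0%}")
```

The “good” filters start from a 95% baseline, so reaching about 99% is a small step. A problem filter starts from 5%, so high accuracy is only possible by being very selective.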

Contribution quality predictions
Very likely good

Highly accurate at finding almost all problem-free edits.

May have problems

Finds most flawed or damaging edits but with lower accuracy.

Likely have problems

Finds half of flawed or damaging edits with medium accuracy.

Very likely have problems

Highly accurate at finding the most obvious 10% of flawed or damaging edits.

User intent predictions
Very likely good faith

Highly accurate at finding almost all good-faith edits.

May be bad faith

Finds most bad-faith edits but with lower accuracy.

Likely bad faith

With medium accuracy, finds the most obvious 25% of bad-faith edits.