Topic on Talk:Edit Review Improvements/New filters for edit review

Suggestion: Possible new filters

Summary by 197.218.89.34

Automatic summaries (T167656), Popularity metric (T167655), Number of editors (T167696), Blocked users (T167698)

197.218.82.118 (talkcontribs)

Disclaimer: This is not a suggestion to add all of these, rather a listing of possible ideas to implement.

> Please let us know on this page if you have any good ideas for more new stuff.[1]

Perhaps it would be useful to look through the recent changes backlog (https://phabricator.wikimedia.org/maniphest/query/cXue32spU5oz/#R) to see whether anything there is simple to add to the existing filters.

As a general note, filtering by some page metadata (e.g. Pageprops) might be useful:

  • Disambiguation
  • Redirect
  • Page creator
  • Automatic_edit_summaries - This might be useful to see clear vandalism, e.g. page blanking, changing almost all content, etc. The easiest option would be to add standard tags for all of these, although tags can be removed by admins, so it has drawbacks, maybe "irremovable" tags would work...

Here are more future looking ones, some of which are possibly not feasible:

  • Popularity metric - a percentage score (wikirank!!), maybe based on pageviews, links, interlanguage links, interwiki links and search rank.
    • Page views - popular pages are probably a huge incentive for spamming and vandalism
    • Search rank - is this a frequently searched page?
    • links - does this page have a lot of incoming links?
    • interlanguage links - an important topic will likely have many links
  • Number of editors - A page that has only been edited by 1 person is more likely to have problems such as proofreading issues.
  • Category - if page has a category or not
  • Page image - it might be useful to know if a page has images
  • links - if page has links (a good page must have at least 2 links) and is not an orphan
    • backlinks - if page has links to it (not orphan)
    • internal links - if the page has links to local pages
  • references - if the page has references; a Wikipedia page must always have these.
  • editor metadata
    • Block status - is the page creator banned? Maybe they were banned for vandalizing. All pages they edited may need re-review
    • Creation / deletion ratio - if someone creates 1000 pages, and 800 are deleted, that shows likelihood of them not being good editors or not producing content that wiki users want.
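
To make the popularity metric a bit more concrete, here is a rough sketch of a weighted percentage score. Everything in it is an assumption for illustration: the signal names, the weights and the ceilings are hypothetical, and nothing like this exists in MediaWiki today. Each signal is log-scaled so that one huge number (for example a sudden pageview spike) cannot dominate the result.

```python
import math

# Hypothetical weights (they sum to 1.0); the discussion does not propose values.
WEIGHTS = {
    "pageviews": 0.4,
    "incoming_links": 0.3,
    "interlanguage_links": 0.2,
    "search_frequency": 0.1,
}

# Hypothetical ceilings: a signal at or above its ceiling counts as "maximal".
CEILINGS = {
    "pageviews": 1_000_000,
    "incoming_links": 10_000,
    "interlanguage_links": 300,
    "search_frequency": 100_000,
}


def popularity_score(signals, weights=WEIGHTS, ceilings=CEILINGS):
    """Return a 0-100 score from raw per-page counts (missing signals count as 0)."""
    score = 0.0
    for name, weight in weights.items():
        value = max(0, signals.get(name, 0))
        # Log scaling, clamped to 1.0 once the ceiling is reached.
        scaled = min(1.0, math.log1p(value) / math.log1p(ceilings[name]))
        score += weight * scaled
    return round(100 * score, 1)
```

A filter could then offer ranges of this score rather than the raw numbers, which would keep the interface simple even if the formula changes later.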

Generally, the future looking ones would be a perfect way to bring some of the functionality of Extension:PageTriage (see Special:NewPagesFeed) to all wikis without all the extra cruft it has that nobody needs. It might be worth looking into where it succeeds and where it fails.

It might also be useful to look into the concept of an MVP (minimum viable page) score filter. A useful page on any wiki needs at least two things: enough content and a category. An MVP on a Wikipedia needs content, a category, references, and internal links, so this might be customizable per wiki type.
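
As a rough illustration of how a per-wiki MVP score could be configured, here is a minimal sketch. The profile names, field names and thresholds (for example 500 bytes as "enough content") are all hypothetical and would need to be decided by each wiki.

```python
from dataclasses import dataclass


@dataclass
class PageFacts:
    """Hypothetical per-page facts a filter would need to know."""
    content_bytes: int
    categories: int
    references: int
    internal_links: int


# Hypothetical minimum requirements per wiki type.
MVP_PROFILES = {
    "generic": {"content_bytes": 500, "categories": 1},
    "wikipedia": {
        "content_bytes": 500,
        "categories": 1,
        "references": 1,
        "internal_links": 1,
    },
}


def mvp_score(page, wiki_type="generic"):
    """Return the fraction (0.0 to 1.0) of the profile's requirements the page meets."""
    requirements = MVP_PROFILES[wiki_type]
    met = sum(
        1 for field, minimum in requirements.items()
        if getattr(page, field) >= minimum
    )
    return met / len(requirements)
```

With this shape, the same page can be a full MVP on a generic wiki and only a partial one on a Wikipedia, which is the customization described above.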

Along with other ideas, filtering using this MVP score and Popularity filters would be pretty useful.

Trizek (WMF) (talkcontribs)

Thank you for this big list of possible improvements! In the comments I've written below, I sometimes challenge your suggestions a bit, just to see whether we can go deeper, or whether an idea can be merged with another or solved in a different way. Nothing personal. ;)

Filtering by "metadata" will soon be possible through the integration of namespace, user and tag filtering. For example, you will be able to filter disambiguation pages (through categories).

I'm keeping the automatic summaries option, to see whether it can be added to tags: T167656

The popularity metric is an interesting option, to keep an eye on pages that are trendy but not visible: T167655

Incoming links may not surface the pages you expect. All roads lead to Rome, and all Wikipedia articles lead to philosophy. ;) Don't you think the popularity metric will solve this? Or maybe weighting by the number of incoming links?

Interlanguage links may be artificial, given that many cities or butterflies have multiple articles, but created by bots. So is it relevant?

Filtering by number of editors may give some false positives: you can have a problematic page edited by 4 users: one for the whole content, one for the categories, one to quickly add banners and the last one to fix a typo. This is a very common case that we observed on English Wikipedia when some people were gathering data about the recent articles pending revisions, and it is also something I've observed on French Wikipedia. Those fixes are not systematically done using an automated (and tagged as such) editing tool.

Can you give me an example of why a filter would be helpful for a page that has no images or categories, or is an orphan? Don't you think those cases can be solved by using tags?

Editor metadata may be handled with the future namespace filtering, where you will be able to filter a particular user's contributions (like in Special:Contributions). A look at the logs already gives you what you expect (deletion ratio, status...). But I may be missing the context.

MVP is an option that will open big debates. The expectations may vary from wiki to wiki, and from user to user (the volunteer contributor side of me doesn't agree with your "enough content and a category" idea, for instance). Our idea is to give reviewers a large number of filters so they can decide what to search for. Forcing an MVP definition may be a blocker: reviewers may be inclined to reject all cases that do not match it perfectly. Unless I'm wrong, defining filters to identify articles without sources or categories, and filtering on those tags, does the job.

197.218.80.218 (talkcontribs)

Summary: Popularity rank is the most useful metric / filter, but others may still be important. Sorry for the long reply...

> Don't you think the popularity metric will solve this? Or maybe weighting by the number of incoming links?

I would say that it needs to be part of the weighting formula. Pageviews may spike suddenly for any number of reasons, such as a bot or some random news. But page links are carefully curated and harder to randomly insert into any page. Interlanguage links, along with internal links and a popularity metric, will give a much better view of popularity (or importance). Compare:

Barack Obama probably has more views than tree, air or water. But in the grand scheme of things, we can live without Barack Obama, but we can't live without water or trees. Indeed, it is more important in encyclopedic terms to detail what water is than who Barack Obama is.

> Interlanguage links may be artificial, given that many cities or butterflies have multiple articles, but created by bots. So is it relevant?

This is partly true. Perhaps making it possible to filter pages based on page quality or assessment (e.g. Extension:PageAssessments), or pages with badges, might help them stand out more.

> Filtering by number of editors may give some false positives

Indeed, but not all wikis are Wikipedias. Wikis such as mediawiki.org or Wiktionary don't use as many bots, so a whole page might only be edited by one user. I'd say that it depends: for a massive wiki such as Wikipedia, it might be useful to have ranges, e.g. pages edited by 1 - 5 users, 6 - 20 users, or more than 20 users.
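
The ranges could be computed with something as simple as the sketch below. The function names and the idea of passing in a plain list of revision authors are assumptions for illustration; the boundaries are just the ones suggested above and could be configured per wiki.

```python
def count_distinct_editors(revision_authors):
    """Count distinct editors from a list of revision author names or IPs."""
    return len(set(revision_authors))


def editor_range(distinct_editors):
    """Map an editor count onto the ranges suggested in this discussion."""
    if distinct_editors < 1:
        raise ValueError("a page with revisions has at least one editor")
    if distinct_editors <= 5:
        return "1-5"
    if distinct_editors <= 20:
        return "6-20"
    return "more than 20"
```

For example, a page whose history lists the authors `["A", "B", "A"]` has two distinct editors and falls into the "1-5" range.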

> Can you give me an example of why a filter would be helpful for a page that has no images or categories, or is an orphan? Don't you think those cases can be solved by using tags?

Tags should suffice, but currently the software doesn't list any, unlike PageTriage.

> Editor metadata may be handled with the future namespace filtering, where you will be able to filter a particular user's contributions (like in Special:Contributions). A look at the logs already gives you what you expect (deletion ratio, status...). But I may be missing the context.

Say an editor creates 30 pages and edits 40 pages, and 20 of them might be deleted as vandalism. Going to the Special:Contributions of each person is a massive waste of time, especially if there are several people doing this independently. The people may then be banned, but the pages remain.

> Unless I'm wrong, defining filters to identify articles without sources or categories, and filtering on those tags, does the job.

This isn't really accurate: every page on every wiki has a certain expectation. Indeed, a page is only counted as an "article" by MediaWiki when it has a link, has content and isn't a redirect (https://www.mediawiki.org/wiki/Manual:Article_count).
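
As a very rough approximation of that rule, here is a sketch that works directly on wikitext. It is not how MediaWiki implements the count (the real check is configurable and works on parsed links, not a regular expression), and the function name, the namespace handling and the patterns are assumptions for illustration only.

```python
import re

# Crude patterns: a redirect starts with "#REDIRECT", a link looks like [[Target]].
REDIRECT_RE = re.compile(r"^\s*#REDIRECT\b", re.IGNORECASE)
LINK_RE = re.compile(r"\[\[[^\[\]]+\]\]")


def is_countable(wikitext, namespace=0, content_namespaces=(0,)):
    """Approximate the 'has a link, has content, is not a redirect' rule."""
    if namespace not in content_namespaces:
        return False
    if not wikitext.strip():
        return False
    if REDIRECT_RE.match(wikitext):
        return False
    return LINK_RE.search(wikitext) is not None
```

Comparing the result for the old and the new revision of an edit would show whether the edit turned an article into a mere page, or the other way round.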

I'd say it is indisputable that on every Wikipedia every encyclopedic article should also have at least one reference (<ref>) to ensure a neutral point of view (NPOV), as far as Wikimedia (https://meta.wikimedia.org/wiki/Founding_principles) is concerned (https://en.wikipedia.org/wiki/Wikipedia:Five_pillars).

Some projects (Wikinews, Wikidata) don't really need that, and they can simply ignore the filter or it can be disabled there.

Trizek (WMF) (talkcontribs)

Your reply is indeed long, but it is not a problem because it deserves it. :)

I've added your idea of weight by links and interwiki links to the ticket about metrics.

Concerning filtering by the number of people who have edited an article, I honestly don't know how to define the need. Is it really a need for a Recent Changes page? You want to check whether an article was written by one user alone. You may go faster with a bot request or another tool to check on quality over time. Do you have an idea how to describe the need to someone who hasn't read our conversation?

About banned users, would a filter to only display contributions from blocked users be enough?

A <ref> tag is not proof of NPOV: it can be a simple footnote, or the article can be made of three lines of content plus a bibliography section. :) Again, tagging to identify pages without a ref tag will do the job. You can then filter the tagged edits on RecentChanges. Those tags can be set for each wiki, according to their (reasonable) needs.

197.218.80.218 (talkcontribs)

>  Do you have an idea how to describe the need to someone who hasn't read our conversation?

As a user, I'd like to see which pages are edited by:

  • many users - so that I can easily identify highly edited pages, which may be experiencing high rates of edit wars, vandalism or spam
  • few users - so that I can easily identify low edit rate pages that may need improvement, proofreading or removal

I'm guessing that this would be valuable for admins too (talk pages with many editors may indicate a heated discussion related to a hoax or harassment on an article).

> About banned users, would a filter to only display contributions from blocked users be enough?

Yes, I suppose it might be enough. It is probably too much of a hassle to calculate the create / delete rate for each user, and might be better done by another tool.

> A <ref> tag is not proof of NPOV: it can be a simple footnote, or the article can be made of three lines of content plus a bibliography section.

Indeed, very true. I had forgotten about the possibility of empty or inaccurate references.

The general idea of filters is to provide tools that indicate pages that likely need help. For example, if an article has 200 references and someone removes all of them, then it is likely vandalism, unless of course they are spam; either way the edit needs to be verified. It might be generally useful to add filters that indicate when a "page" becomes an "article" (as defined by MediaWiki) and when it ceases to be one (e.g. all links removed). The MVP concept would be great for this, but if it is controversial then at least the default MediaWiki definition of an article can be used as a filter.
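
The reference example could be checked with something like the sketch below, which compares the number of <ref> tags before and after an edit. The function names and the thresholds (at least 3 references before, 90% of them gone) are hypothetical, and counting tags in raw wikitext is only an approximation.

```python
import re

# Matches "<ref>", "<ref name=...>" and "<ref ... />", but not "<references />".
REF_OPEN_RE = re.compile(r"<ref[\s>/]", re.IGNORECASE)


def count_refs(wikitext):
    """Count opening <ref> tags in raw wikitext."""
    return len(REF_OPEN_RE.findall(wikitext))


def references_removed(old_text, new_text, min_before=3, drop_ratio=0.9):
    """Flag edits that remove (almost) all references from a referenced page."""
    before = count_refs(old_text)
    after = count_refs(new_text)
    if before < min_before:
        return False  # too few references to say anything meaningful
    return (before - after) / before >= drop_ratio
```

An edit flagged this way is not necessarily vandalism (the references may have been spam), it only marks the change as worth a look.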

Using AbuseFilter for this (tagging) is overkill in some cases, and many wiki admins / users may not even know how to do so.

Trizek (WMF) (talkcontribs)

I've created the ticket for most edited and least edited pages: T167696

Concerning the blocked users, I've created a ticket as well (T167698). The filter will at least give you a list of all edits made by blocked users. Combined with ORES (available on certain wikis), you may be able to list the most problematic contributions, or maybe see unjustified blocks. :)

Don't you think the MVP can be handled with ORES support? I don't know if you have tried it, but cases like the ones you describe seem to be handled by it.

AbuseFilter can be used in a fine-grained way. Maybe improving the documentation, so that people can easily understand the system and copy best practices, would help.

197.218.80.218 (talkcontribs)

> Don't you think the MVP can be handled with ORES support? I don't know if you have tried it, but cases like the ones you describe seem to be handled by it.

ORES works well where it is supported. The problem is that it is not supported everywhere, and enabling it seems to require considerable effort by many users. So far fewer than two dozen wikis seem to be supported. Many wikis seem to have too few editors to generate unbiased data for ORES, and even some big wikis are too different from the usual ORES use case for it to be used. For instance, ORES doesn't support Commons (big in terms of media), and for some strange reason it isn't even enabled on mediawiki.org, even though it would be pretty easy to identify vandalism here.

The biggest weakness of ORES is that it doesn't have a "standard" configuration that can be used on any wiki by default. Some types of vandalism are pretty much the same in every language, e.g. people repeating the same text 300 times ("blah buu blah buu blah buu blah"), people replacing everything with something completely unrelated (e.g. replacing all content with 2 or 3 images), or people spamming the same text on multiple pages. Neither ORES nor AbuseFilter seems to be really good at detecting this.
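
The repeated-text case, at least, does not depend on the language. One language-independent way to spot it (a sketch only, not something ORES or AbuseFilter is known to do) is to check how well the added text compresses: highly repetitive text shrinks to almost nothing. The function names and the thresholds below are hypothetical.

```python
import zlib


def repetition_ratio(text):
    """Return the share of bytes saved by compression (close to 1.0 = very repetitive)."""
    data = text.encode("utf-8")
    if not data:
        return 0.0
    return 1 - len(zlib.compress(data)) / len(data)


def looks_repetitive(text, min_length=200, threshold=0.9):
    """Flag long texts that compress suspiciously well."""
    return len(text) >= min_length and repetition_ratio(text) >= threshold
```

Short texts are skipped because compression overhead makes the ratio meaningless for them, so this would only catch the "same text 300 times" kind of edit.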

General recent changes tools are much better because they can be enabled immediately on any wiki with less effort.

Trizek (WMF) (talkcontribs)

ORES is supported by more and more communities; they just need to have a little bit of help, for motivation. :)

Repeated text or large removals can be detected by AbuseFilter. The other weaknesses of ORES you describe can't be handled by an existing tool, I'm afraid. The new filters are "just" a refinement of some existing RecentChanges page features, plus integration of the ORES service. We hope those changes will make people's lives easier on all wikis.

197.218.89.34 (talkcontribs)

> The other weaknesses of ORES you describe can't be handled by an existing tool, I'm afraid.

It is dead easy to catch many of these, because as stated previously, some of these edits will change the definition of the page, it will change from being an "article" to becoming a "page", and will essentially lower the {{articlecount}} of the wiki whenever they happen. Perhaps it is a task for future tools.

Anyway, thanks for taking the time to read the long ideas. It seems like most of them are covered by Phabricator tasks, and maybe the developers / designers will find some of them useful and feasible.

Trizek (WMF) (talkcontribs)

Thank you for your suggestions! They are really appreciated. :)

AS (talkcontribs)

Hi. I'd also appreciate a filter by any user group (for example, I didn't find a filter for edits made by admins).

Trizek (WMF) (talkcontribs)

Hi @AS, I've documented your idea on T168887.
