User:Krinkle/Patrolling

This subject is very close to my heart. I write this in my role as senior staff and operations manager of the Countervandalism Network.

Preamble
At the moment the foundation's investment in counter-vandalism and quality assurance is insufficient or entirely absent (depending on the type of content). I find this unacceptable, and it needs to change.

The only projects relevant to this subject that are backed by the foundation are:
 * prevention (AbuseFilter)
 * restriction (FlaggedRevs)

But those are hardly actively maintained last I checked, and they aren't designed to be used for patrolling edits. In theory you could enable FlaggedRevs in all namespaces for all pages, but that seems like a bad idea (and it would still be a hard-to-use interface that is non-standard from MediaWiki core's point of view, so all the other initiatives have no way of knowing about this data, as FlaggedRevs has its own system independent from the patrol flag in core).

There is a basic patrolling feature in MediaWiki core, but it has no workable interface and is actually disabled on the biggest wikis for questionable reasons (though the community is partly to blame for the latter). The backend for this in core isn't bad, and I'd like to consider it the common ground to start from.
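For a sense of how usable that core backend already is programmatically: marking an edit as patrolled comes down to two API requests. A minimal sketch, assuming the modern token API and using nl.wikipedia.org purely as an example endpoint; the helpers only build the request parameters, and actually sending them requires a logged-in session with the patrol right:

```python
# Sketch of the two MediaWiki API calls behind "mark as patrolled".
# The endpoint is an example; any wiki with the patrol backend enabled
# exposes the same interface.

API = "https://nl.wikipedia.org/w/api.php"

def token_request():
    """Request parameters for fetching a patrol token."""
    return API, {
        "action": "query",
        "meta": "tokens",
        "type": "patrol",
        "format": "json",
    }

def patrol_request(revid, token):
    """Request parameters for marking revision `revid` as patrolled."""
    return API, {
        "action": "patrol",
        "revid": str(revid),
        "token": token,
        "format": "json",
    }
```

Any gadget, bot or standalone tool issuing these same two requests reads and writes the same shared flag.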

Initiatives
We currently rely completely on community-driven initiatives to fight vandalism. To name a few:


 * Cobi's ClueBot: A highly effective, self-learning bot that detects vandalistic edits. Seems like a candidate to integrate into the cluster and perhaps expose in AbuseFilter to prevent an edit if it scores above a certain threshold – instead of the current situation (running in Labs and reverting edits milliseconds after they are saved).


 * STiki, Huggle and the like: Standalone programs that need installing on a computer. They rely on irc.wikimedia.org. Naturally they don't integrate into any usable workflow and are not accessible from the web (though Huggle is working on a web-app version, it would still be standalone, not integrated). They each maintain their own database for keeping track of which changes have been patrolled. Once the patrol backend is enabled on all wikis, the tools could share that. Though (as outlined below) there will be official patrolling tools, there is nothing wrong with people using these tools, and they should interface with MediaWiki's API so that there are no duplicate efforts.


 * RTRC, LiveRC and the like: Gadgets that implement an interface for the core patrolling feature. It is basically an enhanced version of Special:RecentChanges. It features a live-reloading queue of edits and/or page creations, various filters (e.g. only show edits by anonymous users between 3 and 4 PM), inline preview of each edit, and the ability to mark it as patrolled.


 * CVN's database, IRC bot channels and SWMT: Similar to tools like Huggle and RTRC, except the feed of "potentially interesting events" is output via IRC instead of through a web interface or standalone application. One interesting aspect is that the CVN has a public API to its database, which contains a shared watchlist for all CVN patrollers (both per-wiki and global) and a blacklist of users and IP addresses whose activity should be watched. Most of the items on the blacklist are maintained automatically by SWMTBot: whenever a user is blocked on a wiki, the user is blacklisted in CVN so that activity on other wikis is highlighted. To my knowledge this has been the most valuable system (and also the only system) for catching cross-wiki vandalism and repeat offenders. The blacklist duration is typically double the duration of the block. That way, if a vandal is active on, say, nl.wikipedia.org, gets blocked there, and during that block goes to de.wikipedia.org or Commons, the CVN patrollers will pay extra attention to his edits. This is especially useful for catching more subtle vandalism that took a while to be discovered on one wiki but is immediately caught on the second wiki because the user has been globally flagged thanks to the CVN database. This allows wiki sysops, patrollers and rollbackers from different projects and languages to collaborate.

And all of these (except for RTRC) either have no way of keeping track of what has already been reviewed, or they implement their own database for keeping track of which edits are reviewed. They should be using MediaWiki's patrol flag, which would allow users to use any tool to contribute and work off the same queue – instead of duplicating efforts (right now each of these tools has its own database, so the same edit is flagged in all of them and "marked as patrolled" in each of them: lots of wasted effort).

Using the patrol flag of MediaWiki core would make the information available in the wiki interface and in statistics, and would enable interoperability with other extensions.

In addition to sharing the same queue between different interfaces, enabling the patrol flag will also enable MediaWiki's behaviour of automatically patrolling edits that are reverted (with rollback). There are other details like this that MediaWiki takes care of. It makes no sense for the patrol backend to be disabled in production.
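To illustrate what the shared queue buys: once the flag is enabled, any tool can ask the wiki itself which edits still need review instead of consulting a private database. A sketch of such a query, assuming the example endpoint below; `rcshow=!patrolled` is the core API's own filter for unpatrolled changes:

```python
# Sketch: reading the shared unpatrolled-edits queue through the
# standard MediaWiki API. Endpoint and filter values are examples.

API = "https://nl.wikipedia.org/w/api.php"

def unpatrolled_query(namespace=0, limit=30):
    """Request parameters listing recent edits not yet marked as patrolled."""
    return API, {
        "action": "query",
        "list": "recentchanges",
        "rcshow": "!patrolled|!bot",  # skip reviewed edits and bot edits
        "rctype": "edit|new",         # edits and page creations
        "rcnamespace": str(namespace),
        "rclimit": str(limit),
        "rcprop": "ids|user|comment|timestamp",
        "format": "json",
    }
```

A tool that marks an edit as patrolled via `action=patrol` removes it from this queue for every other tool at once.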

Workflow

 * This section is a work in progress

Here is a case study of the review workflow on Dutch Wikipedia.

Before we continue, a quick definition of terms and subroutines:
 * Checklist: A wiki page listing each day of the week, divided into blocks of 2 to 6 hours (depending on the time of day). Users keep track of the progress here.

Patrollers work either live or from a checklist. Those patrolling live work from a queue that is continuously updated and sorted in descending order (latest edits on top).

Those patrolling from the checklist (nl:Wikipedia:Checklist countervandalism) review the edits that were not patrolled by the live patrollers (sometimes nobody is watching, or certain edits may not have seemed in need of immediate review based on their edit summary).


 * Reviewing an edit: Before we get into the workflows, and to avoid repetition, here is what it means to review an edit (e.g. when viewing the diff page in a web browser):
   * Review the edit.
   * If it is problematic, the user might undo it, revert it or make a follow-up edit correcting the mistake.
   * If the edit is vandalism, the user leaves a message on the editor's talk page.
   * If the user should be blocked (e.g. for repeated vandalism), the patroller blocks him or her (if he or she is an administrator) or places a block request on the administrators' noticeboard.
   * Click "Mark as patrolled".
 * Live patrol:
   * IRC: Some users join the irc:cvn-wp-nl channel on IRC, where CVNBot reports edits from nl.wikipedia.org (filtered to show only edits by new and anonymous users, edits by blacklisted users, edits to a watched page, and other patterns). Users of this channel have the "patrol" user right. They click the diff URLs in their IRC client, which opens them in their web browser, where they review the edit. Though this is a common way of patrolling, the downside is that the log of the IRC channel is fixed: once a user reviews an edit, the channel can't be "updated" to remove it from the queue. This leads to situations where a link is easily missed in the long queue, and the opposite happens too: duplicate effort (many people attempting to review the same edit).
   * RTRC: Some users enable the nl:Gebruiker:Krinkle/RTRC gadget, which creates an interface with a configurable filter and real-time updating of the queue. For example, users can choose to see the latest 30 unpatrolled edits by anonymous users. The end result is similar to the IRC channel, except that it is integrated into the wiki environment and has real-time updating (user A reviews an edit, and within a second the edit disappears from the queue for user B as well). It also features inline viewing of the edit (no need to open a link in a new window or tab).
 * Edit patrol:
   * RTRC: With a different filter (chronological order), RTRC is used for edit patrol. Where the "most recent" filter is for live patrolling (where some edits will inevitably be skipped), the "oldest edits" filter is for taking care of the backlog. One-click patrolling allows one to work through a backlog rapidly. Though the real-time updating is great for live patrolling, when working on a backlog there is still a chance of two users picking the same edit. For this reason, the Dutch Wikipedia community maintains a checklist that divides each day into time slots. Patrollers set the time range in RTRC and work their way through it while someone else works on a different part of the backlog.
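The live and checklist workflows boil down to filtering and sorting the same change feed differently. A minimal sketch of the checklist (backlog) filter, assuming each change is a dict with illustrative `timestamp`, `anon` and `patrolled` fields, as the recentchanges API might return them:

```python
from datetime import datetime

def in_time_slot(change, start, end):
    """True if the change falls inside a checklist time slot [start, end)."""
    ts = datetime.strptime(change["timestamp"], "%Y-%m-%dT%H:%M:%SZ")
    return start <= ts < end

def backlog_queue(changes, start, end):
    """Oldest-first unpatrolled anonymous edits within one checklist slot.
    Field names are illustrative, not a fixed schema."""
    slot = [c for c in changes
            if c.get("anon") and not c.get("patrolled")
            and in_time_slot(c, start, end)]
    return sorted(slot, key=lambda c: c["timestamp"])
```

The live queue would be the same filter without the time slot, sorted newest-first and capped at the last N edits.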

Proposal
Officially start focussing on counter-vandalism and the reviewing of contributions. Create a team within Features (or extend the EE team) to work on the review-related infrastructure.

Configuration changes

 * Enable the core patrol backend by default on all wikis
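Concretely, this is a small LocalSettings.php fragment using MediaWiki core's existing patrol settings; the group assignments shown here are illustrative, not a recommendation:

```php
# Enable patrolling of recent changes and of new pages
$wgUseRCPatrol = true;
$wgUseNPPatrol = true;

# Example group assignments: let trusted groups patrol,
# and auto-patrol edits by sysops
$wgGroupPermissions['autoconfirmed']['patrol'] = true;
$wgGroupPermissions['sysop']['autopatrol'] = true;
```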

Projects

 * Web-compatible changes feed: Implement a modern service for listening to recent changes events, with machine-readable (preferably JSON) information about each event (both recent changes and log events). Keep irc.wikimedia.org for backwards compatibility, possibly re-implemented by listening to this very feed. See also RFC/Structured data push notification support for recent changes.
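As a sketch of what one event on such a feed might look like – the field names here are purely illustrative, loosely modelled on the recentchanges API, not a proposed schema:

```json
{
  "type": "edit",
  "wiki": "nlwiki",
  "title": "Amsterdam",
  "namespace": 0,
  "revid": 31337,
  "old_revid": 31300,
  "user": "127.0.0.1",
  "anon": true,
  "comment": "/* Geschiedenis */ typo",
  "timestamp": "2012-05-01T15:10:00Z",
  "patrolled": false
}
```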
 * Extension:ActivityMonitor: An extension (or multiple) inspired by RTRC and CVN.
 * The special page will open a socket to the changes feed (with fallback to polling the API if the socket is not available, if sockets are not supported by the browser, or if the wiki doesn't have a changes feed) and start populating and updating the queue based on filters (timeframe, patrolled status, change type, namespace etc., like Special:RecentChanges and RTRC).
 * Database - meant to be globally shared across wikis in the same wiki farm
 * User data (blacklist, greylist)
 * Watch patterns (page titles, edit summary regex, page title regex, user name regex)
 * API - Querying (by the activity monitor) and updating (by patrollers) of this data
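The watch patterns could be applied by the activity monitor against each incoming change event. A minimal sketch, assuming each pattern targets one field of the event; all names here are illustrative, not part of any existing CVN or MediaWiki schema:

```python
import re

# Hypothetical watch patterns as (field, regex) pairs, as they might be
# stored in the shared ActivityMonitor database.
WATCH_PATTERNS = [
    ("title", r"^Amsterdam$"),           # a watched page
    ("comment", r"(?i)replaced.*with"),  # a suspicious edit summary
    ("user", r"^192\.0\.2\."),           # an IP range on the blacklist
]

def matches_watch(change, patterns=WATCH_PATTERNS):
    """Return the patterns that flag this change for extra attention."""
    return [(field, rx) for field, rx in patterns
            if re.search(rx, str(change.get(field, "")))]
```

Matched events would be highlighted in the queue, much like CVN's blacklist highlights edits in the IRC channels today.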