Page Curation

'''This document is a work in progress. Comments are appreciated but this is not a final draft.'''

This document describes the design of a new interface for triaging "New Pages". Feedback is welcome on the talk page.

This project is envisioned with multiple phases. By necessity (both of resourcing and change control), the project has been split into sections.

The design as published is incomplete and has many holes. This document is intended as a starting point for discussion about improving the overall experience of patrolling New Pages.

Notes on Nomenclature
This document introduces a new term, "triaged". We believe that "triage" is a more descriptive term than "patrolled." Further, it does not evoke feelings of militarism or police work; rather, it evokes a doctor trying to save patients, not one keeping them from treatment.

This document uses "New Pages" to mean any page that has not been marked as "Patrolled". Edit count on the page is not taken into consideration.

Further, pages that are marked for speedy deletion will be simply referred to as "marked for deletion"; these pages are also technically marked as "patrolled".

For simplicity, this document will assume three states of an article:


 * Unpatrolled - The article has not been marked patrolled.
 * Marked for Deletion - The article has had one or more speedy deletion criteria applied, or has been nominated for BLPprod or even AFD.
 * Patrolled - The article has been marked as "patrolled". Tags requesting improvement may or may not have been applied.

Rationale

 * New Page Patrol is a complicated process that is poorly supported by the MediaWiki software itself.
 * No two patrollers seem to utilize the same process.
 * Users who perform New Page Patrol report high levels of frustration and burnout due to feeling overworked, for these reasons:
   * Inexperienced patrollers aggressively over-template, requiring work to be rechecked;
   * They often don't identify or fix major problems with new articles, requiring work to be rechecked;
   * Too few users choose to become patrollers:
     * Because education about the patrolling process is difficult;
     * Because optimizing a system for page patrol is a "Power User" job, requiring greater-than-average computer savvy as well as (oftentimes) downloading of third-party software.

Hypotheses
A native, easy-to-use interface for New Page Patrol would:
 * increase the number of users who choose to become patrollers, reducing workload.
 * help establish better education about the process as is, resulting in lower "false positive" rates.
 * allow expansion and modification of the system to support different backend systems and logic screens.
 * serve as an engagement point for mobile and tablet users, for whom editing is currently not feasible.
 * utilize positive messaging features to reduce new editor bite, thus promoting editor retention.

Feature Requirements

 * Track the users who have triaged a page and the dates that they did so.
 * Provide a list view of New Pages.
   * This list must be filterable.
   * This list must easily show the state of a page, whether or not it has been triaged.
   * This list view must provide as much useful information as possible about an article.
   * This list view must allow for selection of multiple articles to be brought into the "zoom" view.
 * Provide a pageable, easy-to-use, and intuitive "zoom" interface that allows page examination and tagging in situ.
   * This interface must provide metadata about the article.
   * This interface must show the article in the interface.
   * This interface must be pageable without leaving the interface.
   * Ideally, the interface's "paging queue" will be smart and modify itself according to behaviors of other patrollers and their work.
     * This helps to prevent a race condition wherein two patrollers work on the same article simultaneously and generate edit conflicts.

Triage Principles
In order to combat the problem of inexperienced users marking pages as "patrolled" and requiring their work to be double-checked by other, more experienced patrollers, the process of triaging a page to "patrolled" status will change. Pages may become "fully triaged" (patrolled) when one of the following criteria has been met:


 * A user with the PAGEPATROLLER right has marked it as "Triaged" (see below); or
 * A community-defined number of users without the PAGEPATROLLER right have marked it as "Triaged". This number should be between 3 and 5.
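
The two criteria above can be sketched as a small check. This is only an illustration under stated assumptions: the `TriageMark` record, the `is_fully_triaged` function, and the default threshold of 3 are hypothetical names and values chosen for the example, not part of any existing MediaWiki code.

```python
from dataclasses import dataclass

# Hypothetical default; the document only says the community picks a value from 3 to 5.
DEFAULT_THRESHOLD = 3


@dataclass(frozen=True)
class TriageMark:
    """One user's "Triaged" mark on a page (illustrative record)."""
    username: str
    has_pagepatroller: bool


def is_fully_triaged(marks, threshold=DEFAULT_THRESHOLD):
    """Return True when a page counts as fully triaged (patrolled)."""
    if not 3 <= threshold <= 5:
        raise ValueError("threshold should be between 3 and 5")
    # Criterion 1: a single mark from a PAGEPATROLLER holder is enough.
    if any(mark.has_pagepatroller for mark in marks):
        return True
    # Criterion 2: enough distinct users without the right have marked the page.
    distinct_users = {mark.username for mark in marks}
    return len(distinct_users) >= threshold
```

Counting distinct user names, rather than raw marks, is a design choice made here so that one user marking a page repeatedly cannot reach the threshold alone.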

PAGEPATROLLER User Right
A new user right, PAGEPATROLLER (name change possible), will be created. This is a user right that must be granted by an administrator (or possibly by others with the same user right). Users with this right are assumed to know what they are doing with regard to page triaging and will not require their work to be double-checked.
 * The PagePatroller user right was opposed by a small majority in the NPP survey and is unlikely to get consensus support on EN wiki.

The "Triage Stack"
Currently, problems exist with users selectively editing from either the front of the queue or the back of the queue. This results in too many pages languishing without attention in the "middle" of the queue. Further, when patrolling, edit conflicts and duplicative work are rampant (since most patrollers will be operating in nearly the same space).

A solution is proposed, therefore, of a "Randomized Stack" of pages that are keyed to a single user within a "Triaging Session". Upon choosing to start a session (by selecting one or more articles for triage), the user's "stack" is stuffed with anywhere from 5 to 10 random pages drawn from across the entire queue. These pages are "tagged" to the user in question and cannot show up in another person's stack (for the period that the session lasts, or until the page has been marked as partially triaged).
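
A minimal sketch of the randomized stack, assuming an in-memory queue. The `TriageQueue` class and its method names are hypothetical, and the sketch models only the end of a session as the release condition; a real implementation would also release a page once it has been marked, and would need persistent storage and locking.

```python
import random


class TriageQueue:
    """In-memory stand-in for the queue of untriaged pages (hypothetical)."""

    def __init__(self, page_titles):
        self.pages = list(page_titles)
        self.claims = {}  # page title -> user name holding the claim

    def start_session(self, username, stack_size=5, rng=random):
        """Claim up to `stack_size` random unclaimed pages for `username`."""
        if not 5 <= stack_size <= 10:
            raise ValueError("stack_size should be between 5 and 10")
        unclaimed = [page for page in self.pages if page not in self.claims]
        # Draw from the whole queue, not only its front or back.
        stack = rng.sample(unclaimed, min(stack_size, len(unclaimed)))
        for page in stack:
            self.claims[page] = username
        return stack

    def end_session(self, username):
        """Release every page still claimed by `username`."""
        self.claims = {p: u for p, u in self.claims.items() if u != username}
```

Because claimed pages are excluded before sampling, two sessions never receive the same page while both are active.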

Front or Back Stack Flow Mechanics
When a user chooses to operate in the front or the back of the queue, the stack will automatically add the next item in the queue that has not already been claimed. In the case of a race condition (two or more users starting at the same time), the next N pages will be shuffled and dealt to each user.
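
The front/back behaviour might look like the sketch below. It assumes that index 0 of the `queue` list is the front of the queue and that simultaneous requests arrive together as one `users` list; `deal_next_pages` is a hypothetical helper, not an existing function.

```python
import random


def deal_next_pages(queue, claimed, users, direction="front", rng=random):
    """Deal one unclaimed page to each user in `users`.

    With one user this simply hands out the next unclaimed page. With several
    users (the race condition) the next N unclaimed pages are shuffled and
    dealt, where N is the number of users. `claimed` is a set that is updated
    in place.
    """
    ordered = list(queue) if direction == "front" else list(reversed(queue))
    unclaimed = [page for page in ordered if page not in claimed]
    candidates = unclaimed[:len(users)]
    rng.shuffle(candidates)
    dealt = dict(zip(users, candidates))  # users beyond the supply get nothing
    claimed.update(candidates)
    return dealt
```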

Selected Article Stack Flow Mechanics
From the List View, users may define the contents of their stacks by selecting articles to be held. Once these articles have been selected and submitted, they are considered "claimed". In the case of a race condition, articles that have already been claimed will be silently removed from the user's stack.
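
As a rough sketch of the claiming step, assuming claims are held in a plain dictionary mapping page title to user name (`claim_selected` is a hypothetical helper; a real system would need a database transaction or lock to make the check-and-claim safe across requests):

```python
def claim_selected(selected, claims, username):
    """Claim the selected pages, silently dropping ones claimed by others."""
    stack = []
    for page in selected:
        # setdefault records the claim only if nobody holds the page yet.
        owner = claims.setdefault(page, username)
        if owner == username and page not in stack:
            stack.append(page)
    return stack
```

The caller receives only the pages it actually holds, so no error is shown for pages another patroller claimed first.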

User Experience: List View


The proposed List View interface explodes the current "unpatrolled" list into a more readable and scannable format.

Filter Mechanisms
Ideally, there will be multiple ways to filter the List Interface:


 * Show/Hide Triaged Pages
 * Show/Hide Bot pages
 * Show/Hide redirects
 * By Creator Username
 * By Namespace
 * By Category
 * By WikiProject (ohman, this would be awesome)
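
Most of these filters can be expressed as simple predicates over a list entry. The sketch below is illustrative only: `PageEntry` and `filter_entries` are hypothetical names, and the WikiProject filter is left out because it depends on data this document does not describe.

```python
from dataclasses import dataclass


@dataclass
class PageEntry:
    """Minimal list-view record (illustrative fields only)."""
    title: str
    creator: str
    namespace: str = "Main"
    categories: frozenset = frozenset()
    triaged: bool = False
    bot_created: bool = False
    is_redirect: bool = False


def filter_entries(entries, *, show_triaged=True, show_bots=True,
                   show_redirects=True, creator=None, namespace=None,
                   category=None):
    """Return the entries that pass every active filter, in original order."""
    result = []
    for entry in entries:
        if entry.triaged and not show_triaged:
            continue
        if entry.bot_created and not show_bots:
            continue
        if entry.is_redirect and not show_redirects:
            continue
        if creator is not None and entry.creator != creator:
            continue
        if namespace is not None and entry.namespace != namespace:
            continue
        if category is not None and category not in entry.categories:
            continue
        result.append(entry)
    return result
```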

Bulk Selection
Currently, many users who perform patrolling simply go down the list of new pages and open pages that they wish to patrol in new tabs. This is inefficient.

The List Interface allows the user to place a check next to each entry that they wish to add to their Triage Stack. The user then clicks the "Triage Checked" button and is immediately brought to the "Zoom Interface", centered on the first item in the list.

Individual Entries
Each entry within the List View contains the following elements:


 * A "Bulk Selection" checkbox
 * The page title, along with its size and number of edits
 * A count of images and categories
   * If there are no images or categories, this shall be called out in bold and red
   * If the page is an orphan, this too shall be called out boldly
 * The date the page was created
 * The user name of the page creator, his or her edit count, and when he or she started editing
 * The summary message of the creation
 * A "Triage" button
 * A "Triaged" or "Not Triaged" indicator
   * If a page has been fully triaged, a green checkmark will appear
   * If the page has not been fully triaged, a red alert mark will appear
   * The user names and edit counts of the individuals who have triaged the article will appear to the right of the icon
     * If any one of these individuals has the PAGEPATROLLER right, this will be called out
Clicking on the "Triage" button next to any page entry will bring the user to the "Zoom" interface, centered on the selected page. The user's stack will randomly auto-populate in both directions (nominally by one entry only).

The List Interface is envisioned to be infinitely scrolling. The "Triage Checked" controls will persist at the top and the bottom of the page, while scrolling within the page will continue through the entire queue.

User Experience: Zoom Interface
Currently, New Page Patrol requires that all actions taken on an article from the list interface happen on a separate page outside any specialized patrolling interface. By contrast, the "zoom" interface is a close-up, actionable interface for New Page Patrolling. It is heavily AJAX-dependent, so JavaScript is required.

Queue Direction
The default load of the queue "direction" (oldest-to-newest versus newest-to-oldest) is a complex topic because the needs of the queue direction change. One feature of New Page Patrol is that the newest entries have to be reviewed first, while the oldest ones are left to languish (and cause the queue to grow longer). There has long been a need to recruit more experienced patrollers to focus on the back of the queue once they've become proficient at patrolling at the front of the queue.

The bulk of the most "scandalous" pages and revisions are found and dealt with at the front of the queue. These include vandalism, attack pages, and others that could also possibly create legal problems, such as unsourced negative BLPs. The review of these edits is a high priority, but as most of these are simple, clear-cut issues, it is an easier task than dealing with the residue at the back of the queue. Most new articles, including almost all bad-faith ones, are patrolled or tagged for speedy deletion while at the front of the queue, and usually the ones that are unusual or borderline are left to the more experienced patrollers at the back of the queue. CorenSearchBot tags the copyvio ones to be resolved mid-queue.

As currently designed, the Zoom interface works from the rear of the queue, with the option for the user to switch directions at any time. However, to be acceptable on EN wiki this default needs to be changed: it would be unacceptable to make a change that resulted in hundreds of extra attack pages on Wikipedia at any one time, and we don't want to restrict NPPzoom to the experienced editors who are ready to work the back of the queue.

One option - a difficult one, resource-wise - would be to develop an automated system that marks a new page as "Suspect" when it detects certain patterns of text that are typically associated with attack pages (e.g., "SO AND SO IS GAY!"), or when the editor's previous pages include ones deleted under G3 or G10. Suspect pages would then form their own sub-queue, and would float to the top regardless of the direction that the user is currently patrolling.
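
A toy sketch of this first option follows. The single regular expression (an all-caps "X IS Y!" shout, modelled on the example above) is a placeholder assumption and nowhere near a usable detector; `is_suspect` and `order_queue` are hypothetical names, and pages are represented as plain dictionaries for brevity.

```python
import re

# Placeholder pattern list: all-caps "X IS Y!" shouting, as in the example above.
SUSPECT_PATTERNS = [re.compile(r"\b[A-Z][A-Z ]+ IS [A-Z]+!")]
# Speedy deletion criteria named in the text.
SUSPECT_DELETION_CRITERIA = {"G3", "G10"}


def is_suspect(text, creator_deletion_history):
    """Flag a page by its text or by the creator's earlier deletions."""
    if any(pattern.search(text) for pattern in SUSPECT_PATTERNS):
        return True
    return bool(SUSPECT_DELETION_CRITERIA & set(creator_deletion_history))


def order_queue(pages, direction="front"):
    """Order pages by creation time, then float suspect pages to the top.

    Each page is a dict with "title", "created" and "suspect" keys.
    "front" means newest first; "back" means oldest first.
    """
    by_date = sorted(pages, key=lambda p: p["created"],
                     reverse=(direction == "front"))
    # sorted() is stable, so date order is preserved inside each group.
    return sorted(by_date, key=lambda p: not p["suspect"])
```

Sorting twice keeps the suspect sub-queue on top in either direction while leaving the rest of the queue in the order the patroller chose.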

A second option would be to change the default according to the userrights of the patroller - reviewers would default to the back of the queue and others including newer editors would default to the front.

A third option would be to make this a user preference that would default to the front of the queue, but with perhaps a bot message after a certain number of patrols, or a barnstar and review from an admin saying that an editor is now accurate enough that they might want to try the back of the queue.

Interface Layout
In the Zoom interface, the user is presented with a dynamic screen that consists of three primary elements and several secondary, context-sensitive elements:


 * Interface Filters and Meta Information - this section (at the top) includes controls that allow the user to change the filters applied to the list of pages that have entered the queue, as well as providing additional metadata that is of use.
 * Article Viewing and Tagging Pane - this pane is context aware and associated with the article that is being reviewed. This section has several sub-components:
   * Article Metadata - size, create date, incoming links, etc.
   * User Metadata - creator, information about that user, etc.
   * Article Viewer - displays the article itself
   * Patrolled Tagging Pane - provides an easy-to-use pane for tagging articles for improvement
   * Deletion Tagging Pane - provides an easy-to-use pane for tagging articles for deletion
 * Pagination Controls - Two sections, one at the top and one at the bottom, where the user can simply skip to the next or previous article in the stack (which are shown to the user)

Workflow
Currently, it is assumed that there are three possible actions a Patroller can take when viewing a page in the Zoom interface:


 * 1) Ignore the article - the User clicks "next" or "previous" and skips this article. The article remains unchanged.
 * 2) Nominate the article for deletion - the User selects one or more of the common tags for deletion and then clicks the appropriate "Mark and Next" button.
 * 3) Mark as Patrolled - the User can select zero or more of the common tags to mark the article as needing improvement and then clicks the appropriate "Mark and Next" button.

Ignoring an Article
If the user opts to ignore an article, the currently viewed article will not change state. The article viewing pane will be replaced with the next or previous article, depending on which control the user clicked.

Nominating an Article for Deletion
The user will select (via checkboxes) all appropriate "Deletion" tags. Some tags are multi-leveled (e.g., there are child tags for more specific cases). The system will be smart and only select the correct tags if they exist within the tree (done as flyouts), but the root level checkbox will remain.

The system will then insert the tags onto the page and mark the article as patrolled.

Attempting to nominate an article for deletion without selecting one or more tags will result in an error.


 * The system will automatically inform the creator of the article that the article has been nominated for deletion. It will do this by leaving a note on the user's talk page, which for most editors who have enabled email will result in them getting emailed.
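
The nomination steps above can be sketched as follows. All names are hypothetical: pages are plain dictionaries, `talk_pages` is a dictionary standing in for user talk pages, and tag names are opaque strings.

```python
class NoTagsSelectedError(ValueError):
    """Raised when a nomination is submitted without any deletion tag."""


def nominate_for_deletion(page, selected_tags, talk_pages):
    """Apply deletion tags, mark the page patrolled, and notify the creator.

    `page` is a dict with "title", "creator", "tags" and "patrolled" keys.
    `talk_pages` maps a user name to the list of notes on that talk page.
    """
    if not selected_tags:
        # Nominating without at least one tag is an error.
        raise NoTagsSelectedError("Select at least one deletion tag.")
    for tag in selected_tags:
        if tag not in page["tags"]:
            page["tags"].append(tag)
    page["patrolled"] = True
    note = f"The page '{page['title']}' has been nominated for deletion."
    talk_pages.setdefault(page["creator"], []).append(note)
    return page
```

The validation runs before any change is made, so a rejected nomination leaves the page untouched.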

Marking an Article as Patrolled
Marking an article as patrolled will do just that. Selecting additional tags for improvement is not required: some articles are fine just the way they are when they enter the system.

If a tag is selected, the proper template will be inserted upon clicking the "Mark and Next" button.

Other possible improvements
These are flaws in the current system or requests for changes:

For initial phases of the implementation, the tool can work within the existing template/tag system by automatically adding templates to the article.
 * New users would be gradually taught to patrol correctly and could work with what they feel comfortable with, eventually graduating up to areas of additional difficulty
 * Includes automated systems to aid in patrolling
 * Includes a more crowd-sourced, moderation-queue-like process
   * This will increase workload overall, but probably decrease it per user
 * Has multiple flags other than simply "patrolled" vs. "not patrolled"
   * tagged for deletion, so that if a deletion is declined the article reverts to unpatrolled
   * submitted by an editor whose previous article was deleted as bad-faith (hoax or attack)
   * second opinion requested
 * Allows for the re-viewing and flagging of the article in situ
 * Could easily be used on tablets and mobiles
   * Gesture support would be awesome
 * New article reports for the 700 or so wikiprojects would get more experienced editors who are interested in the various subject areas. If these reports were in effect special new pages filtered by wikiproject then it would bring extra patrolling to the mid queue. See Meta:Research_talk:Patroller_work_load
 * A less convoluted and much more efficient way to do this is to simply display the mark as patrolled box to any autoconfirmed editor who views the article. Active WikiProjects and many Wikipedians who have a special interest already look at the new articles that appear in the categories they are interested in, and do much of the mid queue deletion tagging. But currently they don't get the opportunity to mark the acceptable articles as patrolled.
 * Notice when an article is moved and keep the moved article, not the redirect, in the queue. (This is an annoying bug in the current system.)
 * Don't have a cutoff for unpatrolled articles. Currently anything that makes it to thirty days is automatically patrolled. We need a system that has no cutoff.
 * When people convert redirects into articles, the New Page Patrol system should treat that as a new article. Currently this is a bit of a loophole (see 16705).
 * When people move drafts from userspace to mainspace, the New Page Patrol system should treat the newly moved page as a new article (moves within mainspace are different, but any page moved in from other spaces is potentially a new page sidestepping New Page Patrol). Currently this is a bit of a loophole (see bug 12363).
 * We need an intelligent edit filter that notices when someone is creating an article in the wrong language and can say to them, in the language they are writing in: "Hi, I just noticed you were about to add an article written in Arabic to the English Wikipedia. Would you rather add that to the Arabic Wikipedia?" with the choices "Yes, take me to AR wiki" and "No, I'm going to translate this into English before I publish it on the English Wikipedia". On EN wiki alone we get several editors every day who make this mistake, so easily over a thousand newbies a year to whom we could give a much better start.
 * We also need an edit filter to deal with the large proportion of newbies who are from the copy-paste generation and need to be taught that writing your own words is not the same as copying from other websites. Currently we do this laboriously and painfully by bots checking after the event, and people sweeping up after the bots. But that is a design from a different internet era. What we should have today is an edit filter that incorporates the search that CorenSearchBot does, so if someone clicks save on a paragraph of new text the system can spot that this is a straight copy of foo.com and explain to them why we don't do that. I think that would be less bitey for the newbies who need to be taught about copyvio, and less work for the rest of us. Apparently this would require changes to MediaWiki, and at least for the next few years there would be a serious processing overhead, so initially you would have to throttle this to some random newbie-created articles. But as Moore's Law kicks in, the filter can be unthrottled.

Currently:


 * There is no way to tag an article for improvement without marking it as patrolled. This needs to be added because there will be articles where a patroller is sure the article is unreferenced but not sure whether to mark it as patrolled, or where a patroller has adopted the "if in doubt, categorise" policy. It is essential that articles are only marked as patrolled when someone consciously thinks they are fit to be so marked.

 * Tooltips
   * Mouseover tooltips should show a preview of the intended template uw message (see talk, re Twinkle)
 * Meta info
   * Should show if a page under that title has already been deleted.
   * Should show if the creator's talkpage has previously been relinked, and official warnings, or if creator has been previously blocked. (Possible through AJAX?)

Training
Video tutorial for page patrollers (currently under experimentation).