Page Curation

'''This document is a work in progress. Comments are appreciated but this is not a final draft.'''

This document describes the design of a new interface for triaging pages in MediaWiki. Feedback is welcome on the talk page.

This project is envisioned as having multiple phases. By necessity, the bulk of this document focuses on what is currently targeted as "release one" of the Page Triage project. Notes about additional releases follow.


 * Wait, what? I thought this was called "New Page Triage".
 * Well, it was until today. We decided the technology should be broader than this, so, in fine WMF form, we've renamed it. Now you know we're serious.

Notes on Nomenclature
This document introduces a new term, "triaged". We believe that "triage" is a more descriptive term than "patrolled". Further, it does not evoke feelings of militarism or police work; rather, it evokes a doctor trying to save patients, not keep them from treatment.

This document uses "New Pages" to mean any page that has not been marked as "Patrolled". Edit count on the page is not taken into consideration.

For the sake of verbiage, this document will assume two states of an article:
 * Unpatrolled - The article has not been marked patrolled.
 * Patrolled - The article has been marked as "patrolled". Tags requesting improvement may or may not have been applied.

Rationale

 * "New Page Patrol" is a complicated process that is poorly supported by the MediaWiki software itself.
 * No two patrollers seem to utilize the same process.
 * Users who perform New Page Patrol report high levels of frustration and burnout due to feeling overworked, for these reasons:
   * Inexperienced patrollers aggressively over-template, requiring work to be rechecked.
   * Inexperienced patrollers often don't identify or fix major problems with new articles, requiring work to be rechecked.
 * Too few users choose to become patrollers:
   * Because education about the patrolling process is difficult.
   * Because optimizing a system for page patrol is a "Power User" job, requiring greater-than-average computer savvy as well as (oftentimes) downloading third-party software.

Hypotheses
A native, easy-to-use interface for New Page Patrol would:
 * increase the number of users who choose to become patrollers, reducing workload.
 * help establish better education about the process as is, resulting in lower "false positive" rates.
 * allow expansion and modification of the system to support different backend systems and logic screens.
 * serve as an engagement point for mobile and tablet users, for whom editing is currently not feasible.
 * utilize positive messaging features to reduce new editor bite, thus promoting editor retention.

Feature Requirements

 * Track the users who have triaged a page and the dates that they did so.
 * Provide a list view of New Pages.
 * This list must be filterable.
 * This list must easily show the state of a page, whether or not it has been triaged.
 * This list view must provide as much useful information as possible about an article.
 * Eliminate the 30 day "expiry" to the New Pages queue.
 * Provide a "Mark as Patrolled/Triaged" link on every page that has not been triaged, without requiring the user to have first visited Special:PageTriage.
 * Optionally mark every untriaged page as NOINDEX.

Easement of 30 Day Expiration
A major feature of Page Triage phase 1 is the removal and replacement of the current New Pages queue. The current implementation of the queue is an artifact of the Recent Changes queue, which is only kept for thirty days.

With Page Triage, a new queuing infrastructure will be implemented. This queue will operate thus:


 * When a new page is created, it will be added to this queue with the "patrolled" bit set to "false".
 * When a page is marked "patrolled", the "patrolled" bit will flip to "true", and the date of the event as well as the user who did the action will be stored.
 * Pages that are marked as "patrolled" will remain in the queue for an additional sixty days, after which they will be removed from the queue.
 * Pages that have not been marked as "patrolled" will remain in the queue indefinitely.
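
The queue rules above can be sketched in a few lines. This is only an illustration under stated assumptions, not the planned implementation: the names `QueueEntry`, `TriageQueue` and `purge` are hypothetical, and storage is an in-memory dictionary rather than a database table.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Optional

# Retention period for patrolled pages, taken from the rules above.
PATROLLED_RETENTION = timedelta(days=60)


@dataclass
class QueueEntry:
    page_id: int
    created: datetime
    patrolled: bool = False
    patrolled_by: Optional[str] = None
    patrolled_at: Optional[datetime] = None


class TriageQueue:
    """In-memory stand-in for the proposed queue (hypothetical names)."""

    def __init__(self) -> None:
        self.entries: Dict[int, QueueEntry] = {}

    def add_page(self, page_id: int, created: datetime) -> None:
        # New pages enter the queue with the "patrolled" bit set to false.
        self.entries[page_id] = QueueEntry(page_id, created)

    def mark_patrolled(self, page_id: int, user: str, when: datetime) -> None:
        # Flip the bit and record who patrolled the page and when.
        entry = self.entries[page_id]
        entry.patrolled = True
        entry.patrolled_by = user
        entry.patrolled_at = when

    def purge(self, now: datetime) -> None:
        # Only patrolled entries expire; unpatrolled ones stay indefinitely.
        expired = [
            page_id
            for page_id, entry in self.entries.items()
            if entry.patrolled and now - entry.patrolled_at >= PATROLLED_RETENTION
        ]
        for page_id in expired:
            del self.entries[page_id]


queue = TriageQueue()
start = datetime(2012, 1, 1)
queue.add_page(1, start)
queue.add_page(2, start)
queue.mark_patrolled(1, "ExamplePatroller", start + timedelta(days=1))
queue.purge(start + timedelta(days=90))
print(sorted(queue.entries))  # [2]
```

In this sketch the sixty-day clock starts when the page is marked as patrolled, which is one reading of "an additional sixty days".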

Issues with Queue Modification
A driving factor for the continued patrolling of new pages is the existence of the 30 day expiry. This expiration date serves as a motivator: patrollers are encouraged to clear the queue.

It is unknown how removing this limit will affect patroller motivation. Other motivations will be experimented with. If statistics show that the removal of the expiration date radically impacts the performance of the patrolling system, it is suggested that it be re-enabled until such time as better systems can be found.

Addition of "Mark as Patrolled/Triaged" link
With the modification of the queuing infrastructure, it should be possible to add a "mark as patrolled" or "mark as triaged" link to any unpatrolled page and have it be persistent without causing cache fragmentation.

This control will be hidden by default. A user preference will be created that, when enabled, causes the control to appear.

NOINDEX Flag on Un-Triaged Pages
Since a major problem with new pages is the proliferation of potentially illegal content being indexed by search engines, a default stance of marking pages as NOINDEX until they have been patrolled is an option.

When an article is marked as "patrolled", the NOINDEX flag will be removed from the article.
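
As a rough sketch of this rule, the function below picks the value of a page's robots meta tag from its patrol state. The function name, the `noindex_untriaged` switch (standing in for the "optional" nature of the feature) and the choice of `noindex,follow` for untriaged pages are assumptions made for illustration; the document does not specify them.

```python
def robots_directive(patrolled: bool, noindex_untriaged: bool = True) -> str:
    """Return the content of the robots meta tag for a page (hypothetical helper)."""
    if noindex_untriaged and not patrolled:
        # Untriaged pages stay out of search indexes while the option is on.
        return "noindex,follow"
    # Patrolled pages, or any page when the option is off, are indexable.
    return "index,follow"


print(robots_directive(patrolled=False))  # noindex,follow
print(robots_directive(patrolled=True))   # index,follow
```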

Issues with NOINDEX
The primary concern with automatically adding a NOINDEX flag to unpatrolled/untriaged pages is "Current Events" articles. It is desirable for these articles to be quickly indexed by search engines; hiding them until they have been patrolled may be undesirable.

User Experience: List View


The proposed List View interface explodes the current "unpatrolled" list into a more readable and scannable format.

For Discussion: Mark as Unpatrolled
It may be desirable to provide a mechanism to flip an article back into an "unpatrolled" state. While such a feature would (in theory) be easy to implement, questions have arisen as to its desirability in the workflow.

The reason for this is that articles that are incorrectly marked as patrolled are often headed for the "delete" pile anyway, and it is a simple matter to PROD such an article.

Header Section
The header section will provide controls for simple and advanced filtering as well as the ability to switch queue direction (back or front).

A count of the number of untriaged articles will also be displayed.

Filter Mechanisms
Ideally, there will be multiple ways to filter the List Interface:

Basic Filters:
 * Show/Hide Triaged Pages
 * Show/Hide Bot pages
 * Show/Hide redirects

Advanced Filters:
 * By Creator Username
 * By Namespace
 * By Category
 * By WikiProject (ohman, this would be awesome)
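
One way to picture how the basic and advanced filters combine is shown below. It is a sketch only: `PageEntry`, `ListFilter` and their field names are hypothetical, and the WikiProject filter is left out because this document does not say how WikiProject membership would be determined.

```python
from dataclasses import dataclass, field
from typing import Iterable, List, Optional, Set


@dataclass
class PageEntry:
    title: str
    creator: str
    namespace: int
    triaged: bool = False
    bot_created: bool = False
    is_redirect: bool = False
    categories: Set[str] = field(default_factory=set)


@dataclass
class ListFilter:
    # Basic filters: True means "show", False means "hide".
    show_triaged: bool = True
    show_bot_pages: bool = True
    show_redirects: bool = True
    # Advanced filters: None means "do not filter on this field".
    creator: Optional[str] = None
    namespace: Optional[int] = None
    category: Optional[str] = None

    def matches(self, page: PageEntry) -> bool:
        if page.triaged and not self.show_triaged:
            return False
        if page.bot_created and not self.show_bot_pages:
            return False
        if page.is_redirect and not self.show_redirects:
            return False
        if self.creator is not None and page.creator != self.creator:
            return False
        if self.namespace is not None and page.namespace != self.namespace:
            return False
        if self.category is not None and self.category not in page.categories:
            return False
        return True


def apply_filter(pages: Iterable[PageEntry], list_filter: ListFilter) -> List[PageEntry]:
    """Return the pages that pass every active filter, in their original order."""
    return [page for page in pages if list_filter.matches(page)]


pages = [
    PageEntry("Alpha", "Alice", 0, categories={"Physics"}),
    PageEntry("Beta", "ExampleBot", 0, bot_created=True),
    PageEntry("Gamma", "Alice", 0, triaged=True),
]
visible = apply_filter(pages, ListFilter(show_triaged=False, show_bot_pages=False))
print([page.title for page in visible])  # ['Alpha']
```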

Footer
The footer of the list interface will show statistical information about the performance of the queue:


 * Top 5 patrollers
 * Median age of unpatrolled article
 * Age of 25th, 50th, 75th, 90th percentile of unpatrolled articles
 * Age of oldest article
 * A link/button that leads to a more detailed analysis of queue performance
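
The footer statistics could be computed along these lines. This is a sketch with assumed inputs: a list of ages (in days) of unpatrolled articles and a log of patroller user names, one entry per patrol action. The function names are hypothetical, and the nearest-rank percentile definition is one choice among several; with it, the 50th percentile stands in for the median.

```python
import math
from collections import Counter
from typing import Dict, List, Sequence


def percentile(sorted_values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile of a non-empty, ascending sequence."""
    rank = max(1, math.ceil(pct * len(sorted_values) / 100))
    return sorted_values[rank - 1]


def queue_statistics(unpatrolled_ages_days: List[float], patrol_log: List[str]) -> Dict:
    """Summarise queue performance for the footer (hypothetical helper)."""
    ages = sorted(unpatrolled_ages_days)
    stats: Dict = {
        # Users with the most patrol actions, at most five of them.
        "top_patrollers": Counter(patrol_log).most_common(5),
        "age_percentiles": {},
        "oldest": None,
    }
    if ages:
        stats["age_percentiles"] = {p: percentile(ages, p) for p in (25, 50, 75, 90)}
        stats["oldest"] = ages[-1]
    return stats


stats = queue_statistics(
    unpatrolled_ages_days=[1, 2, 3, 4, 5, 6, 7, 8, 9, 10],
    patrol_log=["A", "B", "A", "C", "A", "B"],
)
print(stats["age_percentiles"])  # {25: 3, 50: 5, 75: 8, 90: 9}
print(stats["top_patrollers"])   # [('A', 3), ('B', 2), ('C', 1)]
```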

Individual Entries


Each entry within the List View contains the following elements:


 * A quick-scannable "triaged or not" badge/notification
 * If a page has been triaged, a green checkmark will appear
 * If the page has not been triaged, a red alert mark will appear
 * When the user hovers over the icon, if the article has been triaged, a tooltip will appear showing who did the triaging and when.


 * The Page title, along with a link to its history, its size and number of edits
 * A count of categories and images
 * If there are no categories, this shall be called out in bold and red
 * If the page is an orphan, this too shall be called out boldly


 * The date the page was created
 * The user name of the page creator, his or her edit count, and when he or she started editing
 * The first 500 characters of the page/article (not the edit summary)
 * A "Triage" button (in phase one this will open the article in a new tab)

The List Interface is envisioned to be infinitely scrolling. The filter controls will persist at the top of the screen and the statistics pane will persist at the bottom of the screen. Scrolling within the page will move through the entire queue.
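
Infinite scrolling is usually backed by fetching the queue in batches as the user nears the end of the loaded entries. The sketch below assumes a simple offset cursor and a hypothetical `next_batch` helper; the document does not prescribe a paging mechanism, and an offset cursor can skip or repeat entries if the queue changes between requests.

```python
from typing import List, Optional, Tuple


def next_batch(
    page_ids: List[int], cursor: Optional[int], batch_size: int = 20
) -> Tuple[List[int], Optional[int]]:
    """Return the next slice of the queue and the cursor for the following request."""
    start = 0 if cursor is None else cursor
    batch = page_ids[start:start + batch_size]
    end = start + len(batch)
    # A cursor of None tells the client that the end of the queue was reached.
    next_cursor = end if end < len(page_ids) else None
    return batch, next_cursor


queue_ids = list(range(1, 8))          # seven pages in the queue
first, cursor = next_batch(queue_ids, None, batch_size=3)
second, cursor = next_batch(queue_ids, cursor, batch_size=3)
third, cursor = next_batch(queue_ids, cursor, batch_size=3)
print(first, second, third, cursor)    # [1, 2, 3] [4, 5, 6] [7] None
```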

Feature Requirements

 * Provide a pageable, easy-to-use, and intuitive "zoom" interface that allows page examination and tagging in situ
 * This interface must provide meta-data about the article
 * This interface must show the article in the interface
 * This interface must allow for editing in situ
 * This interface must work with Twinkle
 * This interface must promote positive feedback/welcoming actions
 * This interface must be pageable without leaving the interface
 * Ideally, the interface's "paging queue" will be smart and modify itself according to behaviors of other patrollers and their work.
 * This helps to prevent a race condition wherein two patrollers work on the same article simultaneously and generate edit conflicts.
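
One possible shape for such a "smart" paging queue is a short-lived claim on each page that is handed out, so that other patrollers are routed past it. The sketch below is an assumption about how this could work, not a described design: `PagingQueue`, the five-minute lease and the per-user "seen" set are all hypothetical, and a real deployment would need shared storage rather than in-process state.

```python
import time
from typing import Callable, Dict, Iterable, Optional, Set, Tuple


class PagingQueue:
    """Hands each patroller the next page that nobody else is working on."""

    def __init__(
        self,
        page_ids: Iterable[int],
        lease_seconds: float = 300.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._pages = list(page_ids)
        self._lease_seconds = lease_seconds
        self._clock = clock
        self._claims: Dict[int, Tuple[str, float]] = {}  # page_id -> (user, expiry)
        self._seen: Dict[str, Set[int]] = {}             # user -> pages already shown

    def next_page(self, user: str) -> Optional[int]:
        now = self._clock()
        seen = self._seen.setdefault(user, set())
        for page_id in self._pages:
            if page_id in seen:
                continue  # already shown to this user (triaged or skipped)
            claim = self._claims.get(page_id)
            if claim is not None and claim[1] > now:
                continue  # another patroller holds a live claim
            self._claims[page_id] = (user, now + self._lease_seconds)
            seen.add(page_id)
            return page_id
        return None

    def mark_triaged(self, page_id: int) -> None:
        # Triaged pages leave the paging queue for everyone.
        if page_id in self._pages:
            self._pages.remove(page_id)
        self._claims.pop(page_id, None)


fake_now = [0.0]
paging = PagingQueue([1, 2, 3], lease_seconds=300.0, clock=lambda: fake_now[0])
print(paging.next_page("Alice"))  # 1
print(paging.next_page("Bob"))    # 2
print(paging.next_page("Alice"))  # 3
```

Because claims expire, a page abandoned by one patroller becomes available to others again once the lease runs out.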

User Experience: Zoom Interface
This section is a work in progress.

Currently, New Page Patrol requires that all actions taken on an article from the list interface happen on a separate page outside any specialized patrolling interface. In contrast, the "zoom" interface is a close-up, actionable interface for New Page Patrolling. It is heavily AJAX-dependent, so JavaScript is required.

Clicking on the "Triage" button next to any page entry will bring the user to the "Zoom" interface, centered on the selected page. The selected article will open directly within the list.

The selected article will have the following elements:


 * Interface Filters and Meta Information - this section (at the top) provides similar metadata about the article as is shown in the list view (e.g., size of the article, edit count, user history)
 * Sidebar - This section has several sub-components:
   * Possible Problems - This section highlights issues that the system has detected with the article. This information is designed to guide inexperienced patrollers, showing what may be problematic with the article.
   * Mark as Triaged - This button marks the page as being "triaged". Clicking this button will advance the user to the next article.
   * Skip this Page - This button will advance the user to the next article without making changes to the shown article.
 * Page Viewing Pane - The article is loaded into this pane in full.
   * A prominent "Edit" button is to be provided. Editing the article (or an article's section) will open the editor within this pane directly.

Other possible improvements
These are flaws in the current system or requests for changes:

For initial phases of the implementation, the tool can work within the existing template/tag system by automatically adding templates to the article.
 * New users would be gradually taught to patrol correctly and could work with what they feel comfortable with, eventually graduating up to areas of additional difficulty
 * Includes automated systems to aid in patrolling
 * Includes a more crowd-sourced, moderation-queue-like process
   * This will increase workload overall, but probably decrease it per user
 * Has multiple flags other than simply "patrolled" vs. "not patrolled"
   * tagged for deletion, so that if a deletion is declined the article reverts to unpatrolled
   * submitted by an editor whose previous article was deleted as bad faith (hoax or attack)
   * second opinion requested
 * Allows for the re-viewing and flagging of the article in situ
 * Could easily be used on tablets and mobiles
   * Gesture support would be awesome
 * New article reports for the 700 or so wikiprojects would get more experienced editors who are interested in the various subject areas. If these reports were in effect special new pages filtered by wikiproject then it would bring extra patrolling to the mid queue. See Meta:Research_talk:Patroller_work_load
 * A less convoluted and much more efficient way to do this is to simply display the mark as patrolled box to any autoconfirmed editor who views the article. Active WikiProjects and many Wikipedians who have a special interest already look at the new articles that appear in the categories they are interested in, and do much of the mid queue deletion tagging. But currently they don't get the opportunity to mark the acceptable articles as patrolled.
 * Notice when an article is moved and keep the moved article in the queue not the redirect. (This is an annoying bug in the current system)
 * Don't have a cutoff for unpatrolled articles. Currently anything that makes it to thirty days is automatically patrolled. We need a system that has no cutoff.
 * When people convert redirects into articles, the New Page Patrol system should treat that as a new article. Currently this is a bit of a loophole (see 16705).
 * When people move drafts from userspace to mainspace, the New Page Patrol system should treat the newly moved page as a new article. Moves within mainspace are different, but any page moved in from other spaces is potentially a new page sidestepping New Page Patrol. Currently this is a bit of a loophole (see bug 12363).
 * We need an intelligent edit filter that notices when someone is creating an article in the wrong language and can say to them, in the language they are writing in: "Hi, I just noticed you were about to add an article written in Arabic to the English Wikipedia. Would you rather add that to the Arabic Wikipedia?" with the choices "Yes, take me to AR wiki" and "No, I'm going to translate this into English before I publish it on the English Wikipedia". On EN wiki alone we get several editors every day who make this mistake, so easily over a thousand newbies a year to whom we could give a much better start.
 * We also need an edit filter to deal with the large proportion of newbies who are from the copy-paste generation and need to be taught that writing your own words is not the same as copying from other websites. Currently we do this laboriously and painfully by bots checking after the event, and people sweeping up after the bots. But that is a design from a different internet era. What we should have today is an edit filter that incorporates the search that CorenSearchBot does, so if someone clicks save on a paragraph of new text the system can spot that this is a straight copy of foo.com and explain to them why we don't do that. I think that would be less bitey for the newbies who need to be taught about copyvio, and less work for the rest of us. Apparently this would require changes to MediaWiki, and at least for the next few years there would be a serious processing overhead, so initially you would have to throttle this to some random newbie-created articles. But as Moore's Law kicks in, the filter can be unthrottled.
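
The two loopholes in the list above (redirects converted into articles, and pages moved into mainspace from other namespaces) amount to a rule for deciding which events put a page into the queue. The sketch below states that rule under assumptions: `PageEvent`, its field names and the event kinds are hypothetical, and only mainspace (namespace 0 in MediaWiki) is considered.

```python
from dataclasses import dataclass
from typing import Optional

MAIN_NAMESPACE = 0  # MediaWiki's main (article) namespace
USER_NAMESPACE = 2  # MediaWiki's user namespace, where drafts often live


@dataclass
class PageEvent:
    kind: str                            # "create", "edit" or "move" (hypothetical names)
    namespace: int                       # namespace of the page after the event
    was_redirect: bool = False           # page was a redirect before the event
    is_redirect: bool = False            # page is a redirect after the event
    old_namespace: Optional[int] = None  # source namespace, for moves only


def counts_as_new_page(event: PageEvent) -> bool:
    """Decide whether an event should place the page in the triage queue."""
    if event.namespace != MAIN_NAMESPACE:
        return False  # this sketch only considers mainspace
    if event.kind == "create":
        return True
    if event.kind == "edit":
        # A redirect turned into an article is effectively a new article.
        return event.was_redirect and not event.is_redirect
    if event.kind == "move":
        # Moves within mainspace are not new; moves in from elsewhere are.
        return event.old_namespace is not None and event.old_namespace != MAIN_NAMESPACE
    return False


print(counts_as_new_page(PageEvent("edit", MAIN_NAMESPACE, was_redirect=True)))            # True
print(counts_as_new_page(PageEvent("move", MAIN_NAMESPACE, old_namespace=USER_NAMESPACE)))  # True
print(counts_as_new_page(PageEvent("move", MAIN_NAMESPACE, old_namespace=MAIN_NAMESPACE)))  # False
```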

Currently:


 * There is no way to tag an article for improvement without marking it as patrolled. This needs to be added because there will be articles where a patroller is sure the article is unreferenced but not sure whether to mark it as patrolled, or where a patroller has adopted the "if in doubt, categorise" policy. It is essential that articles are only marked as patrolled when someone consciously thinks they are fit to be so marked.

 * Tooltips
   * Mouseover tooltips should show a preview of the intended uw template message (see talk, re Twinkle)


 * Meta info
 * Should show whether a page under that title has already been deleted.
 * Should show whether the creator's talk page has previously been relinked, whether it carries official warnings, and whether the creator has previously been blocked. (Possible through AJAX?)