Page Curation

This page describes the design of an interface for curating new pages in MediaWiki, called Page Curation (Extension:PageTriage). This product was developed and released by the Wikimedia Foundation in September 2012. It received a series of upgrades from June to October 2018. Read more on the overview page, watch this video tour – or try it for yourself on the New Pages Feed.

This quick video tour (3:30) shows how curators can use the New Pages Feed and Curation Toolbar to review new pages more productively – and give better feedback to page creators. You can also view it on [//www.youtube.com/watch?feature=player_detailpage&v=BfECMY67Eb0 YouTube] and [//vimeo.com/49947114 Vimeo]. Note: the video tour predates many of the latest components of these features.

This design document is a work in progress, and we expect to make upgrades to this product in the future. Feedback is welcome on the talk page. Additional design specifications are provided on this Mingle board. (Note that this product was previously code-named 'Page Triage').

Notes on Nomenclature
This document introduces a new term, "reviewed".

This document will refer to "New Pages" as any page that has not been marked as "reviewed" (or "patrolled"). Edit count on the page is not taken into consideration.

For simplicity, this document will assume three states of an article:
 * unreviewed - The article has not been marked as patrolled.
 * reviewed - The article has been marked as patrolled or auto-patrolled. Tags requesting improvement may or may not have been applied.
 * Marked for Deletion - The article has been nominated for deletion.

Rationale

 * "New Page Patrol" is a complicated process that is poorly supported by the MediaWiki software itself.
 * No two patrollers seem to utilize the same process.
 * Users who perform New Page Patrol report high levels of frustration and burn out due to feeling overworked for these reasons:
 * Inexperienced patrollers aggressively over-template, requiring work to be rechecked;
 * They often don't identify or fix major problems with new articles, requiring work to be rechecked
 * Too few users choose to become patrollers or curators.
 * Because education about the triaging process is difficult
 * Because optimizing a system for page patrol is a "Power User" job, requiring greater-than-average computer savvy as well as (oftentimes) downloading third-party software

Upgrade 2018: adding AfC, ORES, and copyvio detection
During 2018, the WMF Community Tech and Growth teams jointly undertook a set of upgrades to the New Pages Feed. The project and its motivations are described in detail on this page. That page also contains links to discussions in the NPP and AfC project containing exact specifications and instructions for use of these features. The results of this project are:


 * Articles for Creation: the New Pages Feed now includes a toggle to switch between "New Page Patrol" and "Articles for Creation" (AfC). The new "Articles for Creation" section includes all pages that are in the Draft namespace, and can be filtered by their state in the community-built AfC process.  It can also be sorted by the dates of when drafts moved through different states.
 * ORES: the New Pages Feed now includes the ability to filter pages on their quality, both in terms of their likely issues (e.g. spam, vandalism, attack) and their likely class (stub, start, C-class, etc.). This ability is backed by ORES models built by the WMF Scoring Platform team. The model classifications are also listed with each page in the feed.
 * Copyvio detection: the New Pages Feed now includes the ability to filter to pages that are likely to contain copyright violations. This ability is backed by the Turnitin service via the CopyPatrol tool.  If a page is flagged by CopyPatrol, there is a link present through which the reviewer can open CopyPatrol's interface for investigating potential violations.

Articles for Creation
Below are instructions for how to use the New Pages Feed in the Articles for Creation workflow.


 * 1) Go to the New Pages Feed.
 * 2) Select the "Articles for Creation" toggle at the top of the feed.
 * 3) Select a "State" to filter the list. These are the available states, all of which exclude redirects:
   * Unsubmitted: drafts that have never been submitted for review.
   * Awaiting review: drafts that have been submitted for review and correspond to this category. This is the option to select to find drafts to review.
   * Under review: drafts that have been marked as under review with the AFCH gadget, and are therefore being reviewed at that moment.
   * Declined: drafts that have been submitted and declined, but have not yet been resubmitted.
   * All: all drafts in English Wikipedia. This adds up to the other four categories combined.
 * 4) Sort the list by a date:
   * Created date (newest): drafts created most recently are first.
   * Created date (oldest): drafts created least recently are first.
   * Submitted date (newest): drafts submitted most recently to AfC are first. Only available when "State" filter is "Awaiting review" or "Under review".
   * Submitted date (oldest): drafts that have been waiting the longest for AfC review are first. Only available when "State" filter is "Awaiting review" or "Under review".
   * Declined date (newest): drafts that were declined most recently are first. Only available when "State" filter is "Declined".
   * Declined date (oldest): drafts that were declined least recently are first. Only available when "State" filter is "Declined".
 * 5) Click the title of a draft to open it in a new tab.
 * 6) Review as usual.
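
As a rough illustration of how the available sort options depend on the selected state, the rules above can be sketched in Python. The names `SORTS_BY_STATE` and `available_sorts` are hypothetical and are not part of PageTriage; only the state and sort labels come from the list above.

```python
# Sketch only: the labels mirror the instructions above; the identifiers are hypothetical.
SORTS_BY_STATE = {
    "Unsubmitted": [],
    "Awaiting review": ["Submitted date"],
    "Under review": ["Submitted date"],
    "Declined": ["Declined date"],
    "All": [],
}


def available_sorts(state: str) -> list[str]:
    """Return the sort options offered for a given "State" filter."""
    if state not in SORTS_BY_STATE:
        raise ValueError(f"unknown state: {state}")
    # "Created date" is always available; other date fields depend on the state.
    fields = ["Created date"] + SORTS_BY_STATE[state]
    # Every date field can be sorted newest-first or oldest-first.
    return [f"{name} ({order})" for name in fields for order in ("newest", "oldest")]


print(available_sorts("Declined"))
```

For the "Declined" state this lists the two "Created date" options followed by the two "Declined date" options.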

ORES
It's important to note, as was referenced many times in the community discussion around planning these enhancements, that these predictions are only predictions. Because they are only suggestions from an algorithm, they are often wrong. Reviewers are meant to use them to find pages that are more likely to have those characteristics, in order to help make reviewing work more efficient. They can also be taken into account when doing a review. But at the end of the day, as several experienced reviewers emphasized in the community discussion, human judgment is still what should be deciding whether a page is of high quality or not.

This component works as follows:


 * Scores all pages in the New Pages Feed with two ORES models:
 * Predicted class: this estimates a class for each page (Stub, Start, C-class, B-class, Good, Featured).
 * Potential issues: this identifies which pages are most likely to be spam, attack, or vandalism.
 * The predictions are listed along with each page in the feed for reviewers to reference.
 * Reviewers are also able to filter the feed to only pages of certain predicted classes or with potential issues.
 * As pages change, so do their predictions. For instance, if a reviewer removes spam content from a page, that page would likely stop being shown as potentially spam in the feed.
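
The filtering behavior described above might look roughly like the following sketch. It does not call ORES; the `FeedPage` type, the `filter_feed` function, and the sample predictions are all made up for illustration. It also assumes that a page matches when it has any of the selected classes and any of the selected issues, which the text above does not specify.

```python
from dataclasses import dataclass, field


@dataclass
class FeedPage:
    title: str
    predicted_class: str                                  # e.g. "Stub", "Start", "C-class"
    potential_issues: set = field(default_factory=set)    # e.g. {"spam"}


def filter_feed(pages, classes=None, issues=None):
    """Keep pages matching the selected classes and issues.

    A filter left as None is not applied (assumed behavior).
    """
    result = []
    for page in pages:
        if classes is not None and page.predicted_class not in classes:
            continue
        if issues is not None and not (page.potential_issues & issues):
            continue
        result.append(page)
    return result


# Invented sample data; real predictions would come from the ORES models.
feed = [
    FeedPage("Example A", "Stub", {"spam"}),
    FeedPage("Example B", "Start"),
    FeedPage("Example C", "Stub"),
]
print([p.title for p in filter_feed(feed, classes={"Stub"}, issues={"spam", "attack"})])
```

Because predictions are recomputed as pages change, a page's `potential_issues` would be refreshed before filtering, so a page cleaned of spam would drop out of a "spam" filter.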

Copyvio detection
To detect potential copyright violations, the feed uses the same system that backs CopyPatrol. CopyPatrol is backed by the external service Turnitin, which is primarily used by academic institutions to detect plagiarism. Turnitin scans books, articles, and websites for text matches. CopyPatrol runs all diffs over 500 bytes through Turnitin and flags diffs where there is over a 50% match with some other document.

Pages in the New Pages Feed get flagged as potential copyright violations if any of their diffs (including the initial creation of the article) are flagged by CopyPatrol. The flag will remain with the page in the New Pages Feed as long as the page is in the feed -- even if the violation is resolved in CopyPatrol. For a full explanation of the rules we're using and for the way Turnitin works, see the in-depth discussion from the planning process.
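
The two thresholds and the page-level rule described above can be summarized in a short sketch. The function names and the sample values are hypothetical, and the match fraction is assumed to be supplied by the external service; this is not CopyPatrol's actual code.

```python
MIN_DIFF_BYTES = 500     # only diffs over this size are checked (from the text above)
MATCH_THRESHOLD = 0.50   # a diff is flagged when the match exceeds 50% (from the text above)


def diff_is_flagged(diff_bytes: int, match_fraction: float) -> bool:
    """Apply both thresholds to a single diff."""
    return diff_bytes > MIN_DIFF_BYTES and match_fraction > MATCH_THRESHOLD


def page_is_flagged(diffs) -> bool:
    """A page is flagged if any of its diffs, including its creation, is flagged."""
    return any(diff_is_flagged(size, match) for size, match in diffs)


# (diff size in bytes, fraction of text matching another document); invented values
diffs = [(1200, 0.10), (800, 0.65), (300, 0.90)]
print(page_is_flagged(diffs))
```

Here the second diff is large enough and matches strongly enough to flag the page, while the third is ignored because it is under the size limit.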

It is important to note, as many reviewers discussed during feature development, that these flags are meant only to draw reviewer attention to potential issues -- they are not meant to be taken as absolute truth. Since they are predictions from an algorithm, it is very common for CopyPatrol to flag an edit that is not a violation at all. In other words, this flag is for drawing reviewer attention to those articles that need their judgment. Similarly, when an article does not have the copyvio flag, it does not necessarily mean that it is not a copyvio.

Here is how to use the feature:


 * 1) Open the "Set filters" menu at the New Pages Feed.
 * 2) Check the "Copyvio" box under the list of "Potential issues". This filters the feed to articles that have potential violations.
 * 3) Those articles have a link in their entry in the feed that says "Potential issues: Copyvio". Clicking that opens up an entry in CopyPatrol where it is possible to investigate the violating text side by side with the source where it may have come from.

Hypotheses
A native, easy-to-use Page Curation interface would enhance the current page patrol process as follows:
 * increase the number of users who choose to become curators (or patrollers), reducing workload.
 * help establish better education about the process as is, resulting in lower "false positive" rates.
 * allow expansion and modification of the system to support different backend systems and logic screens.
 * serve as an engagement point for mobile and tablet users, for whom editing is currently not feasible.
 * utilize positive messaging features to reduce new editor bite, thus promoting editor retention.

Feature Requirements

 * Track the users who have reviewed a page and the dates that they did so.
 * Provide a list view of New Pages.
 * This list must be filterable.
 * This list must easily show the state of a page, including whether or not it has been reviewed or has been nominated for deletion.
 * This list view must provide as much useful information as possible about an article.
 * Eliminate the 30 day "expiry" to the New Pages queue.
 * Provide a "Mark as reviewed" link on every page that has not been reviewed without requiring the user to have first visited Special:NewPagesFeed.
 * Optionally mark every un-reviewed page as NOINDEX
 * Optionally mark every "Marked for Deletion" page as NOINDEX

Easement of 30 Day Expiration
A major feature of Page Curation phase 1 is the removal and replacement of the current New Pages queue. The current implementation of the queue is an artifact of the Recent Changes queue, which is only kept for thirty days.

With Page Curation, a new queuing infrastructure will be implemented. This queue will operate thus:


 * When a new page is created, it will be added to this queue with the "reviewed" bit set to "false".
 * When a page is marked "reviewed", the "reviewed" bit will flip to "true", and the date of the event as well as the user who did the action will be stored.
 * Pages that are marked as "reviewed" will remain in the queue for an additional sixty days, after which they will be removed from the queue.
 * Pages that have not been marked as "reviewed" will continue to remain in the queue indefinitely.
 * Pages that have been nominated for deletion will naturally fall out of the queue when and if they are deleted.
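
A minimal in-memory sketch of these queue rules follows. The class and method names are hypothetical, and a real implementation would live in the database rather than in a Python dictionary; it is meant only to make the lifecycle concrete.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

REVIEWED_RETENTION = timedelta(days=60)  # reviewed pages stay this long, per the rules above


@dataclass
class QueueEntry:
    title: str
    created: datetime
    reviewed: bool = False
    reviewed_by: Optional[str] = None
    reviewed_at: Optional[datetime] = None


class CurationQueue:
    def __init__(self):
        self.entries = {}

    def add_page(self, title, now):
        # New pages enter the queue with the "reviewed" bit set to false.
        self.entries[title] = QueueEntry(title, created=now)

    def mark_reviewed(self, title, user, now):
        # Flip the bit and record who reviewed the page and when.
        entry = self.entries[title]
        entry.reviewed, entry.reviewed_by, entry.reviewed_at = True, user, now

    def delete_page(self, title):
        # Deleted pages simply fall out of the queue.
        self.entries.pop(title, None)

    def purge(self, now):
        # Drop reviewed pages after the retention period; unreviewed pages stay indefinitely.
        expired = [t for t, e in self.entries.items()
                   if e.reviewed and now - e.reviewed_at > REVIEWED_RETENTION]
        for title in expired:
            del self.entries[title]


start = datetime(2012, 9, 1)
queue = CurationQueue()
queue.add_page("Reviewed page", start)
queue.add_page("Unreviewed page", start)
queue.mark_reviewed("Reviewed page", "ExampleUser", start)
queue.purge(start + timedelta(days=61))
print(sorted(queue.entries))
```

After 61 days only the unreviewed page remains in the queue.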

Issues with Queue Modification
A driving factor for the continued reviewing of new pages is the existence of the 30 day expiry. This expiration date serves as a motivator: reviewers are encouraged to clear the queue.

It is unknown how removing this limit will affect reviewer motivation. Other motivations will be experimented with. If statistics show that removing the expiration date radically degrades the performance of the triaging system, it is suggested that the expiration be re-enabled until better systems can be found.

Addition of "Mark as Reviewed" link
With the modification of the queuing infrastructure, it should be possible to add a "mark as reviewed" link to any unreviewed page and have it be persistent without causing cache fragmentation.

This control will be hidden by default. There will be a user preference created that, when enabled, will cause the control to appear.

Alternatively, this can be a system-set cookie: the controls will continue to appear as long as the user visits Page Curation or utilizes the control for a set time (say, seven days).

NOINDEX Flag on Un-Reviewed Pages
Since a major problem with new pages is the proliferation of potentially problematic content being indexed by search engines, a default stance of marking pages as NOINDEX until they have been reviewed is an option.

When an article is marked as "reviewed", the NOINDEX flag will be removed from the article.
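
The indexing decision could be expressed as a small function, sketched here under the assumption that both NOINDEX behaviors are configurable switches, as the Feature Requirements describe them as optional. The function and parameter names are hypothetical.

```python
def robots_policy(reviewed: bool, marked_for_deletion: bool = False,
                  noindex_unreviewed: bool = True,
                  noindex_marked_for_deletion: bool = True) -> str:
    """Return "noindex" or "index" for a page under the optional rules above."""
    if marked_for_deletion and noindex_marked_for_deletion:
        return "noindex"
    if not reviewed and noindex_unreviewed:
        return "noindex"
    # Marking a page as reviewed removes the NOINDEX flag.
    return "index"


print(robots_policy(reviewed=False))
print(robots_policy(reviewed=True))
```

Turning `noindex_unreviewed` off would let unreviewed pages be indexed immediately.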

Issues with NOINDEX
Of primary concern with the addition of an automatic NOINDEX flag on unreviewed edits is that of "Current Events" articles. It is desirable for these articles to be quickly indexed by search engines; hiding them until they have been reviewed may be undesirable.

User Experience: List View


The proposed List View interface expands the current "untriaged" list into a more readable and scannable format.

Log:Marked for Deletion
A new system log, "Marked for deletion", will be created. This will be a "standard" MediaWiki log.

When a deletion nomination template is added to a page, an entry will be inserted into the log with the page name and the user who nominated the article for deletion. This functionality should work regardless of how the template is injected into the page; it should be an "on save" hook, rather than a special function of the Page Triage software.

If a deletion template is removed, an additional entry will be included in the log, indicating that the tag was deleted.

This log may be viewed in a global context or as one of the user's logs (like contributions, blocks, etc.)
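
The "on save" behavior can be sketched as a comparison of the page text before and after each save, independent of which tool made the edit. This is an illustration in Python rather than MediaWiki hook code; the template names in `DELETION_TEMPLATES` and the log entry format are assumptions.

```python
import re

# Hypothetical template name prefixes; each wiki would configure its own list.
DELETION_TEMPLATES = ("db", "prod", "afd")
_PATTERN = re.compile(r"\{\{\s*(%s)\b" % "|".join(DELETION_TEMPLATES), re.IGNORECASE)


def has_deletion_tag(wikitext: str) -> bool:
    """True if the text transcludes one of the configured deletion templates."""
    return bool(_PATTERN.search(wikitext))


def on_save(log, page, user, old_text, new_text):
    """Compare the text before and after a save and log tag changes."""
    before, after = has_deletion_tag(old_text), has_deletion_tag(new_text)
    if after and not before:
        log.append({"page": page, "user": user, "action": "tag added"})
    elif before and not after:
        log.append({"page": page, "user": user, "action": "tag removed"})


log = []
on_save(log, "Example", "UserA", "Some text", "{{db-spam}}\nSome text")
on_save(log, "Example", "UserB", "{{db-spam}}\nSome text", "Some text")
print([entry["action"] for entry in log])
```

Because the check runs on every save, it records nominations made by hand, by gadgets, or by the Page Curation tools alike.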

Header Section
The header section will provide controls for filtering as well as the ability to switch queue direction (back or front).

The header primarily shows the currently engaged filters. If too many filters are enabled for full display, the list will be truncated with a "..." indicator that is "hot". Hovering over this will reveal the full list of filters.

Clicking on the "set filters" link will open a tooltip dialog for setting any and all filters.

By default, the only enabled filters are "show reviewed edits" and "show nominated for deletion".

A count of the number of unreviewed articles will also be displayed.

Filter Mechanisms
The following ways will exist for filtering the List Interface:


 * Show Unreviewed Pages
 * Show Reviewed Pages
 * Show pages Marked for deletion
 * Show Bot pages
 * Show Redirects
 * By Creator Username
 * By Namespace
 * By Tag
 * Only orphans
 * Only no categories
 * Only by blocked users
 * Only by non-autoconfirmed users
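
One way to combine these filters is a single predicate applied to every page in the list, sketched below. The `Page` and `Filters` types are hypothetical, the defaults are arbitrary, and the tag, blocked-user, and non-autoconfirmed filters are omitted for brevity. The sketch also assumes a page marked for deletion is governed only by the "marked for deletion" switch.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Page:
    title: str
    creator: str
    namespace: str = "Main"
    reviewed: bool = False
    marked_for_deletion: bool = False
    is_redirect: bool = False
    by_bot: bool = False
    incoming_links: int = 0
    categories: list = field(default_factory=list)


@dataclass
class Filters:
    show_unreviewed: bool = False
    show_reviewed: bool = False
    show_marked_for_deletion: bool = False
    show_bots: bool = False
    show_redirects: bool = False
    creator: Optional[str] = None
    namespace: Optional[str] = None
    only_orphans: bool = False
    only_uncategorized: bool = False


def matches(page: Page, f: Filters) -> bool:
    # State filters: each page is in exactly one of the three states.
    if page.marked_for_deletion:
        if not f.show_marked_for_deletion:
            return False
    elif page.reviewed:
        if not f.show_reviewed:
            return False
    elif not f.show_unreviewed:
        return False
    # Type filters hide bot-created pages and redirects unless enabled.
    if page.by_bot and not f.show_bots:
        return False
    if page.is_redirect and not f.show_redirects:
        return False
    # Value filters apply only when set.
    if f.creator is not None and page.creator != f.creator:
        return False
    if f.namespace is not None and page.namespace != f.namespace:
        return False
    # "Only" filters narrow the list to pages with the given problem.
    if f.only_orphans and page.incoming_links > 0:
        return False
    if f.only_uncategorized and page.categories:
        return False
    return True


pages = [
    Page("Alpha", "UserA"),
    Page("Beta", "UserB", reviewed=True),
    Page("Gamma", "UserA", is_redirect=True),
    Page("Delta", "UserA", incoming_links=2, categories=["Examples"]),
]
selected = Filters(show_unreviewed=True, creator="UserA", only_orphans=True)
print([p.title for p in pages if matches(p, selected)])
```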

Footer
The footer of the list interface will show statistical information about the performance of the queue:


 * Median age of unreviewed article
 * Age of oldest article
 * A link/button that leads to a more detailed analysis of queue performance
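
The two footer statistics can be computed directly from the creation dates of the pages still awaiting review. The following is a minimal sketch; the function name, the choice of days as the unit, and the sample dates are assumptions.

```python
from datetime import datetime
from statistics import median


def queue_stats(created_dates, now):
    """Return the median and oldest age, in days, of the unreviewed pages given."""
    ages = [(now - created).total_seconds() / 86400 for created in created_dates]
    if not ages:
        return None  # an empty queue has no statistics to show
    return {"median_age_days": median(ages), "oldest_age_days": max(ages)}


now = datetime(2012, 9, 30)
unreviewed_created = [datetime(2012, 9, 29), datetime(2012, 9, 20), datetime(2012, 9, 1)]
print(queue_stats(unreviewed_created, now))
```

With ages of 1, 10, and 29 days, the median is 10 days and the oldest page is 29 days old.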

Individual Entries


Each entry within the List View contains the following elements:


 * A quick-scannable "reviewed/unreviewed/marked for deletion" badge/notification
 * If a page has been reviewed, a green checkmark will appear
 * If the page has not been reviewed, a red alert mark will appear
 * If the page has been marked for deletion, a black trashcan mark will appear.
 * When the user hovers over the icon, if the article has been reviewed, a hover will appear showing who did the review and when.


 * The Page title, along with a link to its history, its size and number of edits
 * A count of categories
 * If there are no categories, this shall be called out in bold
 * If the page is an orphan, this too shall be called out boldly


 * The date the page was created
 * The user name of the page creator, their edit count, and when they started editing
 * The first 500 characters of the page/article (not the edit summary)
 * A "review" button (in phase one this will open the article in a new tab)
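
The elements of a list entry can be assembled as in the sketch below. The function names and dictionary keys are hypothetical, and the sketch assumes that the "marked for deletion" badge takes precedence over the reviewed state and that an orphan is a page with no incoming links.

```python
def status_badge(reviewed: bool, marked_for_deletion: bool) -> str:
    """Pick the quick-scan badge for an entry."""
    if marked_for_deletion:
        return "black trashcan"
    return "green checkmark" if reviewed else "red alert mark"


def entry_summary(title, text, categories, incoming_links,
                  reviewed=False, marked_for_deletion=False):
    callouts = []
    if not categories:
        callouts.append("No categories")
    if incoming_links == 0:
        callouts.append("Orphan")
    return {
        "badge": status_badge(reviewed, marked_for_deletion),
        "title": title,
        "category_count": len(categories),
        "callouts": callouts,      # displayed in bold in the list view
        "excerpt": text[:500],     # first 500 characters of the page, not the edit summary
    }


entry = entry_summary("Example", "Lorem ipsum " * 100, categories=[], incoming_links=0)
print(entry["badge"], entry["callouts"], len(entry["excerpt"]))
```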

The List Interface is envisioned to be infinitely scrolling. The filter controls will persist at the top of the screen and the statistics pane will persist at the bottom of the screen. Scrolling within the page will scroll continuously through the entire queue.

Feature Requirements

 * Provide an easy-to-use, intuitive "toolbar" interface that allows page curation and tagging in situ
 * This interface must provide meta-data about the article
 * This interface must show the article in the interface
 * This interface must allow for editing in situ
 * This interface must work with Twinkle
 * This interface must promote positive feedback/welcoming actions
 * This interface must enable the user to skip to the next page
 * Ideally, the interface's "paging queue" will be smart and modify itself according to behaviors of other reviewers and their work.
 * This helps to prevent a race condition wherein two reviewers work on the same article simultaneously, and generate edit conflicts.
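
One possible way to keep two reviewers off the same article is to have the paging queue hand out claims, as in this single-process sketch. The design and all names are hypothetical; the requirements above do not prescribe a mechanism, and a real system would need shared, expiring claims.

```python
class PagingQueue:
    def __init__(self, titles):
        self._pending = list(titles)   # pages still awaiting review, in queue order
        self._claims = {}              # title -> reviewer currently working on it
        self._skipped = set()          # (reviewer, title) pairs the reviewer passed over

    def next_for(self, reviewer):
        """Hand the reviewer the first page nobody else is working on."""
        for title in self._pending:
            if title not in self._claims and (reviewer, title) not in self._skipped:
                self._claims[title] = reviewer
                return title
        return None

    def skip(self, reviewer, title):
        """Release a page without reviewing it so that others can pick it up."""
        self._claims.pop(title, None)
        self._skipped.add((reviewer, title))

    def finish(self, title):
        """Remove a page from the queue once it has been reviewed."""
        self._claims.pop(title, None)
        if title in self._pending:
            self._pending.remove(title)


queue = PagingQueue(["Page A", "Page B"])
print(queue.next_for("Alice"), queue.next_for("Bob"), queue.next_for("Carol"))
```

Alice and Bob receive different pages, and Carol receives nothing until a page is skipped or more pages arrive.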

User Experience: Curation Toolbar


The Curation Toolbar attempts to address several issues involving the general page review and curation workflow. It is designed to be extensible, context-aware, and allow for modification by the greater community on wikis where it has been enabled.

The Curation Toolbar allows users to review pages, mark them as "reviewed/patrolled", tag them with various templates (including deletion templates), and provide gratitude to the authors of articles.

The Curation Toolbar requires JavaScript.

Curation Mode
The Curation Toolbar introduces a new concept to the page patrol workflow: Curation Mode. When a user is in Curation Mode, the toolbar will appear on all pages (though various functions will be disabled, depending upon context).

A User can enter Curation Mode through the following ways:


 * Accessing a page via a link from the NewPagesFeed "list view"
 * Accessing a page via a link from w:Special:NewPagesFeed
 * Activating "Curation Mode" from a link in the sidebar Toolbox, "Curate this page".

Exiting Curation Mode is handled by clicking the "X" control at the top of the Curation Bar. This will prompt the user for confirmation.

First Time in Curation Mode
The first time a user enters Curation Mode, a simple modal dialog box will appear to the user that explains what Curation Mode is, how it works, and how to exit Curation Mode.

Toolbar Buttons
The following buttons will exist in the first release of the Curation Toolbar:


 * Close - this will be a small button that closes the Curation Toolbar when clicked. There will be a confirmation dialog.
 * Page Info - This will activate the "Page Info" flyout.
 * WikiLove - This will activate the "WikiLove" flyout
 * Mark as Reviewed - This will activate the "Mark as Reviewed" flyout.
 * Add Tags - This will activate the "Add Tags" flyout.
 * Mark for Deletion - This will activate the "Mark for Deletion" flyout.
 * Skip - This will skip the current article and move the user to the next article in the queue.

Specific details about the buttons and their behaviors will be included in their corresponding flyout sections, below.

Flyout Behaviors
Clicking on the various buttons on the toolbar will activate "flyout" dialogs with additional information or options.

Note that not all buttons will activate flyouts:
 * The "Close" control will cause the user to exit Curation Mode
 * The "Skip" button will simply advance the user to the next article in the queue.

Flyout: Page Info


The information flyout provides the user with meta information about the page being viewed, including a system-generated "possible problems list".

If the system detects that there are "possible problems" with the article, a red notifications badge will be displayed on the button with the number of detected problems highlighted.

The flyout panel itself can be broken up into three sections:

Metadata
This section displays several basic bits of information about the page being viewed:


 * Its reviewed status (as an icon)
 * Article size and number of edits
 * Date of creation
 * Author name, including the number of edits the author has.

Possible Problems
This section displays a list of system-determined metadata problems that the article may have (such as "no images", "no references," etc.). Each entry will have a short description of what the list entry means and why it may or may not be important.
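
As a rough sketch, such problems could be detected with simple checks on the page's wikitext, with the number of detected problems driving the badge on the Page Info button. The two heuristics below are assumptions chosen for illustration; the actual checks are not specified in this document.

```python
import re

# Hypothetical heuristics on raw wikitext, keyed by the label shown to the reviewer.
CHECKS = {
    "No images": lambda text: re.search(r"\[\[\s*(File|Image)\s*:", text, re.IGNORECASE) is None,
    "No references": lambda text: re.search(r"<ref[\s>/]", text, re.IGNORECASE) is None,
}


def possible_problems(wikitext: str) -> list[str]:
    """Return the labels of all checks that the page fails."""
    return [name for name, is_problem in CHECKS.items() if is_problem(wikitext)]


problems = possible_problems("A short article with a citation.<ref>Source</ref>")
print(len(problems), problems)
```

The length of the returned list is the number that would appear in the red notification badge.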

Simple History View
This section is a simplified view of the article's history, bundled by date. This section will scroll independently.

History entries will be collated by date, and then listed by time, user name, and commit summary.
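
The collation described above amounts to grouping revisions by calendar date and listing each group in time order. Here is a minimal sketch with invented revision data; the function name and the choice of oldest-first ordering are assumptions.

```python
from collections import defaultdict
from datetime import datetime


def group_history(revisions):
    """Group (timestamp, user, summary) tuples by date, oldest first."""
    grouped = defaultdict(list)
    for timestamp, user, summary in sorted(revisions, key=lambda rev: rev[0]):
        day = timestamp.date().isoformat()
        grouped[day].append((timestamp.strftime("%H:%M"), user, summary))
    return dict(grouped)


revisions = [
    (datetime(2012, 9, 2, 9, 30), "UserB", "Added references"),
    (datetime(2012, 9, 1, 14, 5), "UserA", "Created page"),
    (datetime(2012, 9, 2, 8, 0), "UserA", "Fixed typo"),
]
for day, entries in group_history(revisions).items():
    print(day, entries)
```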

Flyout: Tag & Tag Details


The tag flyout allows the reviewer to select multiple tags and add them to the article in bulk.

Various tags will be organized into tabs based on type or category. The user can flip between tabs, selecting tags as needed. The user does not need to click the "add selected tags" button until they are finished: the system remembers which tags have been selected in which categories.

If tags are selected in "deep" tabs, this information is provided to the user by markers in each tab indicating how many tags have been selected in that tab. Further, a "total" number of tags selected will be displayed in the pane next to the "commit" button.

For each tag, a simplified name is used. For example, the "Needs Cleanup" template is listed as "Cleanup" (the better for alphabetization). Each tag includes a short (max 200 character) description as to what it is to be used for.

When a tag requires parameters, checking it will open a sub-dialog indicating that such parameters are required and that the system cannot continue until the parameters have been filled out.

Clicking the "Add Selected Tags" button will add the selected tags to the article in a single edit, cause the page to refresh, and reopen the tag dialog.
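
The selection state that the flyout needs to remember can be modeled as a mapping from tab to selected tags, as sketched here. The class, the tab names, and the tag names are illustrative assumptions; handling of required parameters is left out.

```python
class TagSelection:
    def __init__(self):
        self._by_tab = {}   # tab name -> set of selected tag names

    def toggle(self, tab, tag):
        """Check or uncheck a tag; selections persist when switching tabs."""
        tags = self._by_tab.setdefault(tab, set())
        if tag in tags:
            tags.remove(tag)
        else:
            tags.add(tag)

    def count(self, tab):
        """Number shown on the marker for a single tab."""
        return len(self._by_tab.get(tab, ()))

    def total(self):
        """Total shown next to the "Add Selected Tags" button."""
        return sum(len(tags) for tags in self._by_tab.values())

    def commit(self):
        """Return every selected tag for one edit, then clear the selection."""
        selected = sorted(tag for tags in self._by_tab.values() for tag in tags)
        self._by_tab.clear()
        return selected


selection = TagSelection()
selection.toggle("Common", "Cleanup")
selection.toggle("Sources", "Unreferenced")
selection.toggle("Common", "Copy edit")
print(selection.count("Common"), selection.total(), selection.commit())
```

Returning all tags from `commit` in one list reflects the requirement that they be added to the article in a single edit.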

The structure of the tag dialog should be definable within MediaWiki (similar to the way that WikiLove handles configurations). This will allow local communities to add or remove tags as needed or desired.

Certain tags, such as Close Paraphrasing or Copy & Paste, require details so that there is better context for people who will use the tags to improve article quality. In these cases, editors must provide the details in order to add the tags to articles.

As seen alongside, when an editor selects a "Required" tag, an inline panel automatically opens up where details must be saved. Details can be edited once saved using an "Edit Details" handle. If a tag with optional details is selected, the panel will not open automatically; instead, an "+ Add Details" action will appear below the link. Blue chevrons will serve as light actions that open the community template, where one can get extensive information about the tags.

Adding details is not meant to slow down patrollers; it is meant to provide thorough and succinct feedback for critical tags.

Flyout: Mark as Reviewed/Unreviewed


The "Mark as Reviewed" flyout is a simple dialog. It simply requires confirmation that the user intends to mark the page as reviewed.

If the article has already been marked as reviewed, the icon will change state indicating such.

There will also be a "mark as unreviewed" feature, with a dialog that allows an already-reviewed article to be marked as unreviewed.

Flyout: WikiLove


The WikiLove panel is a two-stage system. Activating the panel will show a list of all the users who have edited the article, in a roll-up format (e.g., showing a total of edits). The sort order should be based on the degree of contributions with the page creator first and then all other editors listed by the number of contributions, descending.
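
The sort order described above can be sketched as follows. The function name and the sample user names are hypothetical, and breaking ties alphabetically is an assumption that the text does not specify.

```python
from collections import Counter


def wikilove_recipients(revision_authors, creator):
    """Roll up edits per user: page creator first, then others by edit count, descending."""
    counts = Counter(revision_authors)
    others = sorted((user for user in counts if user != creator),
                    key=lambda user: (-counts[user], user))
    return [(creator, counts[creator])] + [(user, counts[user]) for user in others]


# One entry per revision, naming the user who made it (invented data).
authors = ["Creator", "Helper", "Fixer", "Fixer", "Helper", "Fixer"]
print(wikilove_recipients(authors, creator="Creator"))
```

The creator is listed first even with a single edit, followed by "Fixer" (three edits) and "Helper" (two edits).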

Next to each user will be a checkbox. Checking this box will include them in stage two of the system. A "check all" link will check everyone in the list.

Once the user has selected the users they desire, they can click the "send wikilove" button. This will then open the WikiLove dialog box, whereupon they can go about constructing the WikiLove award of their choice.

Note that the WikiLove extension must be modified in order to allow for multiple recipients.

Flyout: Mark for Deletion


The deletion flyout will carry the three main deletion categories. These categories are mutually exclusive. Each category will carry a brief 'When to use' instruction to help new patrollers make clear decisions.
 * Speedy Deletion
 * Proposed Deletion
 * Discuss for Deletion

Once a category has been selected, more details are required before the article can be marked for deletion. Mark for Deletion is not a terminal delete action.

Preferences
Eventually, we should create an interface for setting PageTriage preferences. These preferences could include:
 * Use infinite scrolling
 * Save my filter settings
 * Open pages in new tabs
 * Float nav bars

To Do
Screen Mockups for:


 * Tag flyout behavior when reason is required
 * Deletion flyout
 * Exit Curation Mode confirmation
 * Welcome to Curation Mode