Article feedback/Public Policy Pilot Phase 2

What we've learned from Phase 1
Here is a quick summary of what we've learned (and are still learning) from Phase 1. More detail may be found here.
 * Anonymous users submit roughly 10 times as many ratings as Registered users.
 * For many articles, Registered users do not submit enough ratings to provide meaningful information about article quality.
 * Ratings by Anonymous users skew high: most anonymous users give either a 4 or a 5 on every dimension. We do not yet know whether this skew will prevent Anonymous ratings from being a meaningful measure of article quality (e.g., if an article is significantly improved but its ratings from Anonymous users do not change noticeably).
 * Ratings by Registered users are lower on average and less skewed than ratings by Anonymous users. This could suggest that Registered users are more critical and/or give higher-quality ratings, though more data is needed to support this.
 * In its current form, the tool is not a good on-ramp for editing: it does not offer an explicit invitation to edit (e.g., "Did you know you can edit this article?").
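The volume and skew observations above come down to a few summary statistics per user group. A minimal sketch, assuming each rating is available as a hypothetical `(user_type, value)` pair on the 1-5 scale (one pair per dimension rated); the record layout and function name are illustrative, not part of the tool:

```python
from collections import defaultdict
from statistics import mean


def summarize_by_user_type(ratings):
    """Summarize 1-5 ratings per user type (e.g. "anonymous", "registered").

    ratings: iterable of (user_type, value) pairs; this layout is hypothetical.
    Returns {user_type: (count, mean_rating, share_of_4_or_5)}.
    """
    groups = defaultdict(list)
    for user_type, value in ratings:
        groups[user_type].append(value)

    summary = {}
    for user_type, values in groups.items():
        high = sum(1 for v in values if v >= 4)  # ratings of 4 or 5
        summary[user_type] = (len(values), mean(values), high / len(values))
    return summary


# Illustrative input only, not Phase 1 data.
sample = [("anonymous", 5), ("anonymous", 4), ("anonymous", 5),
          ("registered", 3), ("registered", 4)]
print(summarize_by_user_type(sample))
```

Comparing the counts gives the volume ratio between groups; a higher mean together with a higher share of 4s and 5s is what "skews high" means here.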

Goals for Phase 2
To build on what we've learned in Phase 1, the goals of Phase 2 focus on the strategic objectives of Participation and Quality:
 * 1) Participation: How can this feature be improved so that it becomes a better on-ramp for editing? As noted above, the current interface fails as an on-ramp, which is unsurprising given that it contains no explicit call for the user to edit.
 * 2) Quality: How useful is this tool for helping the community measure the quality of an article? Specifically, how useful is it for measuring the quality of an article over time? Do some segments of users give higher-quality ratings than others?

We will pursue these two goals through feature development and the selective targeting of articles.

Scope
The proposed list of features for Phase 2 is here.

Target Articles
To better understand how these ratings reflect article quality, we are targeting articles that are expected to undergo substantial revision. Ratings from before a substantial revision can then be compared with ratings from after it to see whether the revision produces a noticeable change in ratings. We will deploy this feature on two sets of articles:
 * 1) Public Policy articles (currently deployed): We will continue placing the feature on select articles as part of the Public Policy Project.
 * 2) Articles that are likely to undergo substantial change (to be deployed): We will place the feature on general Wikipedia articles that, by their nature, are likely to be substantially revised in the near future (e.g., articles about upcoming movies or elections). The current list of pages is here (please contribute!).
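The before/after comparison described above can be sketched as splitting an article's ratings at the time of a substantial revision and comparing the two means. This assumes a hypothetical list of `(timestamp, value)` pairs for one article and a revision time chosen by hand; how a "substantial revision" is identified is not specified here:

```python
from datetime import datetime
from statistics import mean


def before_after_means(ratings, revision_time):
    """Compare mean ratings before and after a revision.

    ratings: iterable of (timestamp, value) pairs for one article (hypothetical layout).
    revision_time: when the substantial revision was saved.
    Returns (mean_before, mean_after); a side with no ratings is None.
    """
    ratings = list(ratings)  # allow generators, since we scan the data twice
    before = [value for ts, value in ratings if ts < revision_time]
    after = [value for ts, value in ratings if ts >= revision_time]
    return (mean(before) if before else None,
            mean(after) if after else None)


# Illustrative data only.
ratings = [
    (datetime(2010, 11, 1), 3),
    (datetime(2010, 11, 2), 2),
    (datetime(2010, 12, 1), 4),
    (datetime(2010, 12, 2), 5),
]
print(before_after_means(ratings, revision_time=datetime(2010, 11, 15)))
```

A raw difference in means does not show whether the change exceeds normal fluctuation; with few ratings per article, that check matters before drawing conclusions.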

Target Users
The target users for Phase 2 of this feature are:
 * Readers:
    * Rate article
    * Edit article
 * Editors:
    * Rate article
    * Edit article (increase editing activity, though this is a secondary priority for Phase 2)

For Phase 2, we are not prioritizing the use of Article Feedback to collect detailed feedback from readers (i.e., feedback more detailed than the four rating categories) on which areas of an article need improvement.

Feedback Survey
The draft of the survey that will be used for Phase 2 is available here.

Measurement
We will be able to conduct the following analytics on the feature:

Participation

To measure the effect of rating an article on participation, we will need the following analytics:
 * (# of users who edit an article after rating AND seeing the Edit call to action) / (# of users who rated AND saw the Edit call to action)
 * If there are multiple Edit calls to action, we will need this ratio for each call to action separately.
 * Fallout of the rating-to-edit flow:
    * Click-through rate of call to action: (# of clicks of Edit call to action) / (# of times Edit call to action is displayed)
    * Edit conversion: after clicking the Edit call to action, (# of saves) / (# of times the Edit page is viewed)
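The ratios above can be computed from per-user event records and raw counts. A minimal sketch, assuming a hypothetical `RaterSession` record (which Edit call to action, if any, was shown; whether the user rated; whether they later edited); none of these names come from the actual logging schema, and what counts as "after rating" (same visit, within a time limit) is left open:

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class RaterSession:
    """One user's visit to a rated article (hypothetical record, illustrative fields)."""
    call_to_action: Optional[str]  # which Edit call to action was shown, if any
    rated: bool
    edited_after_rating: bool  # definition of "after" is an assumption


def rating_to_edit_ratio(sessions):
    """Per call to action: (# who rated, saw it, and then edited) / (# who rated and saw it)."""
    shown = defaultdict(int)
    edited = defaultdict(int)
    for s in sessions:
        if s.rated and s.call_to_action is not None:
            shown[s.call_to_action] += 1
            if s.edited_after_rating:
                edited[s.call_to_action] += 1
    return {cta: edited[cta] / n for cta, n in shown.items()}


def funnel_rates(impressions, clicks, edit_page_views, saves):
    """Click-through rate and edit conversion for one call to action.

    Returns (clicks / impressions, saves / edit_page_views); None where a denominator is 0.
    """
    ctr = clicks / impressions if impressions else None
    conversion = saves / edit_page_views if edit_page_views else None
    return ctr, conversion


# Illustrative data only.
sessions = [
    RaterSession("A", rated=True, edited_after_rating=True),
    RaterSession("A", rated=True, edited_after_rating=False),
    RaterSession("B", rated=True, edited_after_rating=False),
    RaterSession(None, rated=True, edited_after_rating=True),  # no CTA shown: excluded
]
print(rating_to_edit_ratio(sessions))
print(funnel_rates(impressions=1000, clicks=50, edit_page_views=40, saves=10))
```

Keying the ratio by call to action covers the case of multiple Edit calls to action. Returning `None` for an empty denominator keeps "never shown" distinct from "shown but never clicked".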

Quality

As mentioned above, the main Quality test for Phase 2 is to determine whether substantial changes to an article are reflected in its ratings. We are doing this by applying the rating tool to articles that are likely to undergo substantial revision. We currently have the data to construct moving averages of ratings per article for Anonymous and Registered users. We will also need to be able to easily construct moving averages per article, per rater segment (based on survey responses). For example:



The chart above shows hypothetical ratings of US Constitution from users who identified themselves in the survey as having a degree in the field. The actual survey categories (e.g., "degree in field") are still being developed.
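A per-article, per-segment moving average like the one described above could be built by grouping ratings on `(article, segment)` and averaging a trailing window. A minimal sketch, assuming a hypothetical `(timestamp, article, segment, value)` record where the segment comes from the rater's survey response; the count-based window is an assumption (a time-based window such as "last 7 days" would also work):

```python
from collections import defaultdict, deque


def moving_averages(ratings, window):
    """Trailing moving average of ratings per (article, rater segment).

    ratings: iterable of (timestamp, article, segment, value); hypothetical layout.
    window: number of most recent ratings to average (count-based, an assumption).
    Returns {(article, segment): [(timestamp, average), ...]} in time order.
    """
    if window < 1:
        raise ValueError("window must be at least 1")

    by_key = defaultdict(list)
    for ts, article, segment, value in ratings:
        by_key[(article, segment)].append((ts, value))

    result = {}
    for key, points in by_key.items():
        points.sort(key=lambda p: p[0])  # oldest first
        recent = deque(maxlen=window)  # drops the oldest rating once full
        series = []
        for ts, value in points:
            recent.append(value)
            series.append((ts, sum(recent) / len(recent)))
        result[key] = series
    return result


# Illustrative data only; segment labels are placeholders while the survey
# categories are still being developed.
ratings = [
    (1, "US Constitution", "degree in field", 3),
    (2, "US Constitution", "degree in field", 5),
    (3, "US Constitution", "degree in field", 4),
    (1, "US Constitution", "no degree", 2),
]
for key, series in moving_averages(ratings, window=2).items():
    print(key, series)
```

Early points average fewer ratings than `window`, so the start of each series is noisier; plotting one series per segment gives a chart like the one described.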

Timeline
Here is the proposed timeline (subject to change):
 * Design: Start Nov 8
 * Development sprint: Nov 15-29
 * Feature on prototype: Early December
 * Testing/bug fixing: Early-mid December (approx. 3 weeks)
 * Launch: Second week of January