Article feedback/Research

Overview
This page presents preliminary data on the Article Feedback tool. It is based on approximately 1,470 ratings across 289 articles, collected during roughly the first week of the Pilot. A running list of articles is maintained here, but please keep in mind that the list is subject to change.

Overall Ratings Data
The following table summarizes the aggregate rating data.


 * Overall, it’s difficult to conclude whether the differences in category averages are meaningful. On average, however, raters view the categories similarly: taken as a whole, the articles in the Pilot are perceived as about as well sourced as they are neutral, complete, and readable.
 * Completion rates for each category (defined as the number of ratings for the category divided by the total number of ratings) are between 87% and 93%. From a usability standpoint, four categories appears to be an acceptable number for users to rate, though further research would help us understand this better (e.g., users may simply be clicking through, or they may think rating all four categories is a requirement). The following table breaks down the number of ratings by the number of categories completed:

The vast majority of ratings (83%) have all four categories rated, while 17% are missing at least one category.
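The completion-rate definition above can be sketched in a few lines of Python. This is only an illustration: the record layout, the category keys, and the sample values are assumptions and do not come from the Pilot data pull.

```python
from collections import Counter

# Assumed category keys; the real data pull may name these differently.
CATEGORIES = ("well_sourced", "neutral", "complete", "readable")

# Hypothetical sample: each dict is one submitted rating, and None means
# the category was left blank. These values are illustrative, not Pilot data.
ratings = [
    {"well_sourced": 4, "neutral": 3, "complete": 4, "readable": 5},
    {"well_sourced": 2, "neutral": None, "complete": 3, "readable": 4},
    {"well_sourced": 5, "neutral": 4, "complete": None, "readable": None},
    {"well_sourced": 3, "neutral": 3, "complete": 2, "readable": 3},
]


def completion_rates(ratings):
    """Ratings with the category filled in, divided by all ratings."""
    total = len(ratings)
    return {
        cat: sum(r[cat] is not None for r in ratings) / total
        for cat in CATEGORIES
    }


def categories_completed_histogram(ratings):
    """Count ratings by how many of the four categories were filled in."""
    return Counter(
        sum(r[cat] is not None for cat in CATEGORIES) for r in ratings
    )


rates = completion_rates(ratings)
histogram = categories_completed_histogram(ratings)
```

With real data, `rates` would correspond to the per-category completion rates, and `histogram` to the breakdown of ratings by number of categories completed.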

Comparing Anon Reviewers to Registered Reviewers
In total, there were 1,300 users (defined by unique IPs and registered accounts). Of the 1,300, 1,138 (88%) were anon and 162 (12%) were registered accounts. When anon and registered reviewers are analyzed separately, some interesting patterns start to appear.

A few things worth noting:


 * It appears as though registered users are “tougher” in their grading of the articles than are anon users. This is especially notable in the area of “well sourced” (3.7 mean for anon vs. 2.8 mean for registered) and “complete” (3.5 vs. 2.7).  It’s interesting to note that the means for “neutral” are almost identical.


 * The standard deviation of ratings across all categories is lower for registered than for anon. At an aggregate level, it appears as though there is slightly more consensus about the rating among registered users than anon users.


 * The completion rate for reviews is higher for registered users as well. It’s worth noting that “Neutral” had the lowest completion rate for both registered and anonymous users.
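As a rough sketch, the comparison of means and standard deviations noted above could be computed as follows. The sample scores are hypothetical, and the use of the sample standard deviation (`statistics.stdev`) is an assumption; this page does not say which variant was used.

```python
from statistics import mean, stdev

# Hypothetical (reviewer_type, score) pairs for a single category.
# Illustrative values only, not Pilot data.
samples = [
    ("anon", 5), ("anon", 2), ("anon", 4), ("anon", 3),
    ("registered", 3), ("registered", 2), ("registered", 3), ("registered", 4),
]


def summarize(samples):
    """Group scores by reviewer type and report n, mean, and sample stdev."""
    groups = {}
    for reviewer_type, score in samples:
        groups.setdefault(reviewer_type, []).append(score)
    return {
        reviewer_type: {
            "n": len(scores),
            "mean": mean(scores),
            "stdev": stdev(scores),  # assumption: sample, not population
        }
        for reviewer_type, scores in groups.items()
    }


summary = summarize(samples)
```

Running the same summary once per category would yield the side-by-side figures discussed in the notes above.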

Finally, registered users are more likely to rate multiple articles.

Anon Reviewers

Registered Reviewers

10 most frequently rated articles
(Simply sorted by number of submitted "well sourced" ratings.)


 * http://en.wikipedia.org/wiki/United_States_Constitution - 80 ratings -- linked from Wikimedia blog post
 * http://en.wikipedia.org/wiki/Don't_ask,_don't_tell - 61 ratings -- linked from Wikimedia blog post
 * http://en.wikipedia.org/wiki/Capital_punishment - 37 ratings
 * http://en.wikipedia.org/wiki/Terrorism - 35 ratings
 * http://en.wikipedia.org/wiki/United_States_Declaration_of_Independence - 32 ratings
 * http://en.wikipedia.org/wiki/DREAM_Act - 32 ratings
 * http://en.wikipedia.org/wiki/LGBT_rights_in_the_United_States - 30 ratings
 * http://en.wikipedia.org/wiki/5_centimeters - 28 ratings -- third item in public policy category
 * http://en.wikipedia.org/wiki/Pollution - 27 ratings
 * http://en.wikipedia.org/wiki/Abortion - 22 ratings
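A minimal sketch of the ranking used for this list, assuming the per-article counts of "well sourced" ratings are available as a mapping from title to count. The mapping below reuses a few of the counts listed above; the function name and data layout are hypothetical.

```python
# Counts of "well sourced" ratings for a few of the articles listed above.
well_sourced_counts = {
    "Pollution": 27,
    "United_States_Constitution": 80,
    "Abortion": 22,
    "Capital_punishment": 37,
    "Terrorism": 35,
}


def most_rated(counts, n=10):
    """Return the n (title, count) pairs with the highest counts, descending."""
    # sorted() is stable, so tied articles keep their original order.
    return sorted(counts.items(), key=lambda item: item[1], reverse=True)[:n]


top_three = most_rated(well_sourced_counts, n=3)
```

Because only one category's count is used as the sort key, an article whose raters often skip "well sourced" could rank lower than its total number of ratings would suggest.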

To Do

 * Breakdown of ratings (particularly num. ratings) by user (username or IP)
 * Top 10 (most rated) article comparison
 * Top 10 (most prolific raters) user comparison
 * Short article (with rating tool visible) vs. others comparison
 * Comparison of average ratings to current Wikipedia rating system (FA, GA, etc)
 * Investigate why completion rates are 87% or higher for all four categories (forced choice? felt mandatory? confidence in some over others?)
 * Email questionnaire to users about confidence in the accuracy of their ratings
 * Investigate whether those rating articles have also contributed/edited that article (could be done in the questionnaire)
 * Ask Roan if we can have a cumulative Page View column in our CSV data pull
 * Investigate "neutrality" - changing the word? description? placement?
 * Investigate the relation of "completeness" to article length