Article Feedback

Project History and Timeline
Over the past few months, the Wikimedia Foundation has been experimenting with a feature that captures reader assessments of article quality. Originally designed to support the Public Policy Initiative, the first phase of the feature was rolled out last September on the English Wikipedia. In November, the feature was deployed to another 50-60 articles in addition to those improved through the Public Policy Initiative.

In March 2011, v2.0 of the feature, with a redesigned user interface, was released to approximately 3,000 English Wikipedia articles.

We also added calls to action: invitations presented to the rater, once they have completed the rating process, to take an additional action. The three calls we’ve tested so far are invitations to create an account, to take a survey, or to edit the page.

Finally, version 2 includes a set of checkboxes (currently A/B tested on 50% of users) that raters can use to qualify their expertise:
 * I am highly knowledgeable about this topic (optional):
    * I have a relevant college/university degree
    * It is part of my profession
    * It is a deep personal passion
    * The source of my knowledge is not listed here
 * I would like to help improve Wikipedia, send me an e-mail (optional)

The e-mail option was recently added to test systematic e-mail calls-to-action targeting self-identified expert raters.

Since launching the tool in September, we’ve continually analyzed the results and made small modifications based on quantitative and qualitative data. We’re still learning from the feature, and based on the work so far, we think that reader feedback has a lot of potential:
 * Quality assessment: it can complement the community’s internal quality assessment of Wikipedia articles with a new source of data, surfacing content of potentially very high or very low quality and measuring change over time.
 * Reader engagement: it represents a way to encourage participation from the reader community.

The current plan is to roll out the tool to approximately 100,000 articles during the second week of May, to continue to analyze data, and to conduct further tests especially with an eye to the second category, reader engagement. We are likely to continue to expand the rollout, both in the English Wikipedia and beyond, after that. We will also make anonymized data from the tool continually available.

Reader Feedback and Article Quality
Can readers meaningfully measure the quality of an article, e.g. how complete it is? This is a challenging question to answer because the underlying measures of quality are difficult to quantify (how would article completeness be measured?). We’ve made some simplifying assumptions, and based on the data we’ve analyzed so far, we find that in some categories, reader feedback appears to be correlated with some objective measures of an article.

For example, for articles under 50 kB, there appears to be a correlation between an article’s length and its Trustworthy and Complete ratings: the longer the article, the more trustworthy and complete readers consider it. In other categories, there does not appear to be much of a correlation between objective measures of the article and its ratings.
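
To illustrate, here is a minimal sketch of how this kind of correlation could be checked against a rating data dump, for example the anonymized data we will be releasing. The file name (ratings.csv) and column names are hypothetical; the actual dump schema may differ:

    # Sketch: correlating article length with average ratings.
    # Assumes a hypothetical CSV with one row per article and columns
    # page_id, length_bytes, avg_trustworthy, avg_complete.
    import csv
    import math

    def pearson(xs, ys):
        """Pearson correlation coefficient of two equal-length sequences."""
        n = len(xs)
        mx, my = sum(xs) / n, sum(ys) / n
        cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
        sy = math.sqrt(sum((y - my) ** 2 for y in ys))
        return cov / (sx * sy)

    lengths, trustworthy, complete = [], [], []
    with open("ratings.csv", newline="") as f:
        for row in csv.DictReader(f):
            length = int(row["length_bytes"])
            if length >= 50 * 1024:  # restrict to articles under 50 kB, as above
                continue
            lengths.append(length)
            trustworthy.append(float(row["avg_trustworthy"]))
            complete.append(float(row["avg_complete"]))

    print("length vs Trustworthy: r =", round(pearson(lengths, trustworthy), 3))
    print("length vs Complete:    r =", round(pearson(lengths, complete), 3))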

At an aggregate level, there appears to be some alignment between reader assessments and community assessments of an article’s quality. Of the 25 most highly rated articles in the sample (average rating of 4.54), 10 are either Featured Articles or Good Articles (3 Featured and 7 Good). Of the 25 most poorly rated articles (average rating of 3.29), there is only one Good Article and no Featured Articles.

Feedback from Experts

We also provided a method for users to self-identify as knowledgeable about the topic. A set of checkboxes allows readers to indicate that they are knowledgeable (in general) about a topic and, if so, the specific source of their knowledge. The main goal of this feature is to determine whether self-identified experts rate differently than users who do not self-identify as experts. While there are many more things we could do to verify expertise (see the current plan and longer-term ideas below), self-identification is an easy first step toward understanding expert rating behavior.

Our preliminary results indicate that, overall, experts show a similar distribution of ratings to non-experts. But when individual articles are analyzed, a different pattern emerges. Users who claim general knowledge do not appear to rate substantially differently than non-experts, but users who claim specific knowledge from either their studies or their profession do appear to rate differently. Based on the limited data collected so far, it appears as though expert ratings of an article are more diverse than non-expert ratings of the same article.
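
For those who want to probe this further once the data are public, here is a rough sketch of one way to compare the spread of expert and non-expert ratings per article. The per-rating file name, the column names, and the expert_source values are assumptions for illustration:

    # Sketch: comparing the spread of expert vs. non-expert ratings per article.
    import csv
    import statistics
    from collections import defaultdict

    by_article = defaultdict(lambda: {"expert": [], "non_expert": []})
    with open("ratings_raw.csv", newline="") as f:
        for row in csv.DictReader(f):
            # Count only specific knowledge claims (studies or profession)
            # as "expert", mirroring the distinction drawn above.
            is_expert = row["expert_source"] in ("studies", "profession")
            group = "expert" if is_expert else "non_expert"
            by_article[row["page_id"]][group].append(float(row["rating"]))

    for page_id, groups in sorted(by_article.items()):
        if len(groups["expert"]) > 1 and len(groups["non_expert"]) > 1:
            print(page_id,
                  "expert stdev:", round(statistics.stdev(groups["expert"]), 2),
                  "non-expert stdev:", round(statistics.stdev(groups["non_expert"]), 2))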

We have only scratched the surface of analyzing the correlation between ratings and actual changes in article quality. We hope that the Wikimedia community and the research community will use the rating data dumps, which we will make publicly available, to continue this analysis.
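
As one possible starting point for that analysis, the sketch below computes a per-article monthly average rating from a hypothetical timestamped per-rating export; the resulting series could then be compared with edit activity or assessment changes. The file name and columns are assumptions:

    # Sketch: per-article monthly average ratings over time.
    # Assumes a hypothetical CSV with columns page_id, timestamp (ISO 8601),
    # and rating (1-5).
    import csv
    from collections import defaultdict
    from datetime import datetime

    monthly = defaultdict(list)  # (page_id, "YYYY-MM") -> list of ratings
    with open("ratings_raw.csv", newline="") as f:
        for row in csv.DictReader(f):
            month = datetime.fromisoformat(row["timestamp"]).strftime("%Y-%m")
            monthly[(row["page_id"], month)].append(float(row["rating"]))

    for (page_id, month), values in sorted(monthly.items()):
        mean = sum(values) / len(values)
        print(page_id, month, round(mean, 2), f"(n={len(values)})")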

Ratings as a Way to Engage Readers
Invitations to participate

As part of the v2.0 release in March, we introduced some “calls to action”: invitations to the reader to participate once they’ve submitted a rating. There are three different calls to action currently being tested:
 * Create an account
 * Edit the article
 * Take a survey

Here is a summary of the results (the detailed click-through analysis may be found here):

The data show that 40% of users who are presented with the option to take a survey after completing their rating end up clicking through. Even though the call to action asks the user to complete a survey, some readers took the opportunity to provide feedback on the content of the article via the open text field. We observed something similar during the first phase of the feature, which included a “Give us feedback about this feature” link.

Though the link specifically asked for feedback on the feature (as opposed to the content of the article), some readers provided rather detailed comments on the article content. While the comments field of the survey had its fair share of vandalism and useless comments, there are clearly some readers who want to provide constructive feedback about the content of the article. That these readers want to contribute to Wikipedia is reflected both in our user interviews and in the survey results: 44% of survey respondents indicated that they rated because they hoped their rating “would positively affect the development of the page,” and 37% rated because they “wanted to contribute to Wikipedia.” These results show that an easy-to-use feedback tool is a promising way to engage these readers.

The “Edit” call to action received a 15% click-through rate. While lower than the 40% click-through rate for the survey, a rate in the low teens is still significant, especially considering that these users probably had no intention of editing the article at the time they submitted the rating. This result suggests that a certain set of users, when presented with the option, would like to edit the article they just rated. More analysis, however, needs to be done on what happens after a user clicks on the call to action. The preliminary data indicate that approximately 20% of these users end up successfully completing an edit, though the absolute numbers are still very small and we need more observations to establish statistical significance. We also need a measurement of the quality of those edits: we don’t know whether they are constructive, vandalism, or bail actions (the user clicks save just to get out of the screen).
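
As a rough illustration of why more observations are needed, the sketch below computes 95% Wilson score confidence intervals for a 20% proportion at a few made-up sample sizes; with only a handful of click-throughs, the interval around 20% is very wide:

    # Sketch: 95% Wilson score interval for a binomial proportion,
    # showing how the uncertainty around a 20% estimate shrinks with n.
    import math

    def wilson_interval(successes, n, z=1.96):
        p = successes / n
        denom = 1 + z * z / n
        centre = (p + z * z / (2 * n)) / denom
        margin = (z / denom) * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n))
        return centre - margin, centre + margin

    for n in (25, 100, 1000):  # hypothetical numbers of edit click-throughs
        lo, hi = wilson_interval(round(0.2 * n), n)
        print(f"n={n}: point estimate 20%, 95% CI [{lo:.0%}, {hi:.0%}]")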

We intend to experiment further with these and other calls to action in the future.

Volume of Ratings

The Article Feedback tool is currently deployed to approximately 3,000 articles, less than 0.1% of the total number of articles on the English Wikipedia. Over the past month and a half, however, over 47,000 individual users have rated articles.

In comparison, the English Wikipedia has approximately 35,000-40,000 active editors each month across all articles. Given that this experimental deployment covers less than 0.1% of articles, the number of users willing to rate an article appears to exceed the number willing to edit one by at least an order of magnitude. Not only does the feedback tool offer a way to engage more users; some of these users may end up editing, as the call-to-action data show.

Current Plan
Given the encouraging results of the trial so far, we are planning to roll out the feedback tool to approximately 100,000 articles within the next week. Articles will be selected at random (versus the stratified random sampling we used for the current 3,000 articles). The primary goal of this rollout is to test the scalability of the feature. We will also continue to analyze the ratings, survey, and click-through data as we have in the past. In addition, we’re working on an experimental dashboard that would surface to the editing community articles receiving very high or very low ratings. How this information is used, if it is used at all, is up to the community. Please provide feedback on this feature on the talk page.

In the longer term, we hope to be able to explore some more complex ideas, including:
 * enabling readers to leave free-text comments specifying issues with the article
 * enabling readers to praise the authors of an article, and enabling authors to receive praise
 * enabling editors to collaboratively filter rater comments and issue reports, and to promote comments of special significance to the talk page
 * enabling raters to attach credentials to a Wikipedia user account, which will then be associated with their ratings
 * improving the user interface integration of the tool to reduce initial screen real estate usage while increasing visibility/discoverability

These ideas are described in more detail on the extended review page, including draft wireframes for a more comprehensive rating and feedback tool.