Article feedback

June 22, 2011: First anonymized data dumps for the Article Feedback deployment on the English Wikipedia are available. (read more)

Article feedback is a Wikimedia Foundation project to engage Wikimedia readers in the assessment of article quality, one of the five priorities defined in the strategic plan. It is based on the WMF-developed ArticleFeedback MediaWiki extension and currently deployed on a subset of pages on the English Wikipedia. This page serves as a central hub to document research, project updates, and development timelines, and to allow for centralized community conversation. Frequently asked questions on this project can be found on this page.

Project History and Background

 * For a detailed log of Article Feedback milestones and releases, please refer to this page

The Wikimedia Foundation has been experimenting with a feature to capture reader quality assessments of articles since September 2010, when the first phase of the feature, originally designed to support the Public Policy Initiative, was rolled out on the English Wikipedia. In November 2010, the feature was added to another 50-60 articles beyond those improved through the Public Policy Initiative.



In March 2011, v2.0 of the feature was released to approximately 3,000 English Wikipedia articles. The user interface of the feature was redesigned.



We also added calls to action. These are invitations presented to the rater, after the completion of the rating process, to take an additional action.



The three calls we’ve tested so far are invitations to create an account, to take a survey, or to edit the page.

Finally, version 2 includes a checkbox (currently A/B tested to 50% of users) that can be used to qualify the expertise of the rater. The checkbox includes the following options:
 * I am highly knowledgeable about this topic (optional):
   * I have a relevant college/university degree
   * It is part of my profession
   * It is a deep personal passion
   * The source of my knowledge is not listed here
 * I would like to help improve Wikipedia, send me an e-mail (optional)

The e-mail option was recently added to test systematic e-mail calls-to-action targeting self-identified expert raters.

Since launching the tool in September, we’ve continually analyzed the results and made small modifications based on quantitative and qualitative data. We’re still learning from the feature, and based on the work so far, we think that reader feedback has a lot of potential:
 * Quality assessment: article feedback can help complement internal quality assessment of Wikipedia articles with a new source of data on quality, surfacing content of potentially very high or very low quality, and measuring change over time.
 * Reader engagement: article feedback represents a way to encourage participation from the reader community.

Research Findings

 * For detailed research updates on Article Feedback, please refer to this page

Reader Feedback and Article Quality
Can readers meaningfully measure the quality of an article, e.g. how complete it is? This is a challenging question to answer because the base measures of quality can be difficult to quantify (e.g., how can article completeness be measured?). We’ve made some simplifying assumptions, and based on the data we’ve analyzed so far, we find that in some categories, reader feedback appears to be correlated with some objective measures of an article.

For example, in articles under 50kb, there appears to be a correlation between the Trustworthy and Complete ratings of an article and the article’s length. The longer the article is, the more Trustworthy and Complete it is considered by readers (analysis may be found here). In other categories, there does not appear to be much of a correlation between objective measures of the article and ratings.
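The length comparison described above can be illustrated with a minimal sketch. It assumes the analysis is a plain Pearson correlation between article length and mean rating, which the text does not confirm; the `articles` list is hypothetical illustration data, not taken from the rating dumps.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

# Hypothetical (length in kB, mean "Complete" rating) pairs, not real data.
articles = [(4, 2.9), (12, 3.4), (25, 3.8), (38, 4.1), (47, 4.3), (80, 4.0)]

# Restrict to articles under 50 kB, as in the analysis described above.
short = [(kb, rating) for kb, rating in articles if kb < 50]
lengths = [kb for kb, _ in short]
ratings = [rating for _, rating in short]

r = pearson(lengths, ratings)
print(f"r = {r:.2f} over {len(short)} articles")
```

A coefficient close to 1 would indicate that longer articles tend to receive higher ratings within the chosen length range; it says nothing about why.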

At an aggregate level, there appears to be some alignment between reader assessments and community assessments of an article’s quality. Of the 25 most highly rated articles in the sample (average rating of 4.54), 10 are either Featured Articles or Good Articles (3 Featured and 7 Good). Of the 25 most poorly rated articles (average rating of 3.29), there is only one Good Article and no Featured Articles.



Feedback from Experts

We also provided a method for users to self-identify as knowledgeable about the topic. A set of checkboxes allows readers to indicate whether they are knowledgeable (in general) about a topic and, if so, the specific source of their knowledge. The main goal of this feature is to determine whether self-identified experts rate differently than users who did not self-identify as experts. While there are many more things we can do to verify expertise, self-identification is an easy first step toward understanding expert rating behavior.



Our preliminary results indicate that overall, experts show a distribution of ratings similar to that of non-experts (see analysis here). But when individual articles are analyzed, a different pattern emerges. It appears that users who claim general knowledge do not rate substantially differently from non-experts. But users who claim specific knowledge from either their studies or their profession appear to rate differently (see analysis here). Based on the limited data collected so far, it appears as though expert ratings are more diverse than non-expert ratings of the same article.
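One way to make "more diverse" concrete is to compare the spread of ratings per article for each group. The sketch below assumes standard deviation as the measure of diversity, which is an assumption rather than the method used in the linked analysis; the `ratings` rows and the function name `spread_by_group` are hypothetical.

```python
from collections import defaultdict
from statistics import pstdev

# Hypothetical ratings: (article, rater_is_expert, rating on a 1-5 scale).
ratings = [
    ("Article A", True, 2), ("Article A", True, 5), ("Article A", True, 3),
    ("Article A", False, 4), ("Article A", False, 4), ("Article A", False, 3),
]

def spread_by_group(rows):
    """Return {(article, is_expert): population std. dev. of ratings}."""
    groups = defaultdict(list)
    for article, is_expert, rating in rows:
        groups[(article, is_expert)].append(rating)
    return {key: pstdev(values) for key, values in groups.items()}

spread = spread_by_group(ratings)
for (article, is_expert), sd in sorted(spread.items()):
    label = "expert" if is_expert else "non-expert"
    print(f"{article} ({label}): sd = {sd:.2f}")
```

Comparing the two values for the same article keeps the comparison within an article, which matches the per-article framing of the finding.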

We have only scratched the surface of analyzing the correlation between ratings and actual changes in article quality. We hope that the Wikimedia community and the research community will use the rating data dumps, which we will make publicly available, to continue this analysis.

Ratings as a Way to Engage Readers
Invitations to participate

As part of the v2.0 release in March, we introduced some “calls to action”, or invitations to the reader to participate once they’ve submitted a rating. There are three different calls to action currently being tested:
 * 1) Create an account
 * 2) Edit the article
 * 3) Take a survey

Here is a summary of the results (The detailed click-through analysis may be found here):

The data show that 40% of users who are presented with the option to take a survey after completing their rating end up clicking through. Even though the call-to-action asks the user to complete a survey, some readers took the opportunity to provide feedback on the content of the article via the open text field. We observed something similar during the first phase of the feature, when there was a “Give us feedback about this feature” link.

Though the link specifically asked for feedback on the feature (as opposed to the content of the article), some readers provided rather detailed comments on the article content. While the comments field of the survey had its fair share of vandals and useless comments, there are clearly some readers who want to provide constructive feedback about the content of the article. The notion of these readers wanting to contribute to Wikipedia is reflected both in our user interviews and in the survey results. Forty-four percent of survey respondents indicated that they rated because they hoped that their rating “would positively affect the development of the page” and 37% of respondents rated because they “wanted to contribute to Wikipedia.” These results suggest that an easy-to-use feedback tool is a promising way to engage these readers.



The “Edit” call-to-action received a 15% click-through rate. While lower than the 40% who clicked through to the survey, a 15% click-through rate is still significant, especially considering that these users probably had no intention of editing the article at the time they submitted the rating. This result suggests that a certain set of users, when presented with the option, would like to edit the article they just rated. More analysis, however, needs to be done on the actions after a user clicks on the call-to-action. The preliminary data indicate that approximately 17% of these users end up successfully completing an edit, though the absolute numbers are still very small and we need more observations to establish statistical significance. We also need a measurement of the quality of the edits. We don’t know whether these edits are constructive, vandalism, or bail actions (the user clicks save just to get out of the screen).
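Chaining the two rates above gives a rough sense of the overall funnel: about 15% of raters click the invitation and about 17% of those save an edit, so roughly 2.5% of raters shown the “Edit” invitation complete an edit. The sketch below treats both figures as point estimates, which the small sample noted above does not really support; the helper `expected_edits` is hypothetical.

```python
# Rates reported above, treated here as point estimates.
edit_click_through = 0.15   # raters who click the "Edit" call to action
edit_completion = 0.17      # of those who click, share who save an edit

# Share of raters shown the "Edit" invitation who end up saving an edit.
overall_edit_rate = edit_click_through * edit_completion

def expected_edits(raters_shown):
    """Expected saved edits for a hypothetical number of raters shown the invitation."""
    return raters_shown * overall_edit_rate

print(f"overall rate: {overall_edit_rate:.2%}")
print(f"per 1,000 raters shown the invitation: {expected_edits(1000):.1f} edits")
```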

We intend more experimentation with these and other calls to action in the future.

Volume of Ratings

The Article Feedback tool is currently on approximately 3,000 articles, less than 0.1% of the total number of articles on the English Wikipedia. Over the past 1.5 months, however, more than 47,000 individual users have rated articles.



In comparison, the English Wikipedia has approximately 35,000-40,000 active editors each month. With this experimental deployment, which covers less than 0.1% of articles, we can see that the number of users willing to rate an article exceeds the number of users willing to edit an article by at least an order of magnitude. Not only does the feedback tool offer a way to engage more users, but some of these users may end up editing, as the call-to-action data show.
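A back-of-envelope check of the order-of-magnitude claim is to normalize both figures per article per month. This is one possible reading of the comparison, not necessarily the reasoning behind it, and it rests on crude assumptions: the 0.1% figure is used as an upper bound to derive a minimum total article count, unique raters are divided evenly across the 1.5 months, and editors are spread evenly across all articles.

```python
raters = 47_000            # individual raters over the trial window
window_months = 1.5
trial_articles = 3_000
trial_share = 0.001        # "less than 0.1%" of articles, used as an upper bound
active_editors = 40_000    # upper end of the monthly range quoted above

# Lower bound on the total article count implied by the 0.1% figure.
total_articles = trial_articles / trial_share

# Crude per-article, per-month normalization of both populations.
raters_per_article_month = raters / window_months / trial_articles
editors_per_article_month = active_editors / total_articles

ratio = raters_per_article_month / editors_per_article_month
print(f"raters per article per month:  {raters_per_article_month:.2f}")
print(f"editors per article per month: {editors_per_article_month:.4f}")
print(f"ratio: {ratio:.0f}x")
```

Under these assumptions the ratio comes out in the hundreds, comfortably above a factor of ten, though editors typically work on many articles, so the per-article editor figure understates editing activity.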

Additional Findings

 * Three out of six raters in a small-scale user test did not complete their rating action, neglecting to press the "Submit" button. Recent revisions of the feature add a reminder to submit the rating.
 * Based on interviews with raters, in the second version of the feature, the category "readable" was changed to "well-written", "neutral" to "objective", and "well-sourced" to "trustworthy".
 * Raters who completed the survey call-to-action used the "other comments" section to express opinions not only about the tool itself, but also about the article they were rating and about Wikipedia as a whole. Among these responses are many that would make very useful talk page contributions, and many respondents also seem likely to be the kinds of people who could be motivated to edit. However, there is also a significant percentage of useless/noise responses, highlighting the need for moderation or filtering to the extent that free-text comments are integrated with the tool.
 * Our user studies have highlighted that readers do not consider rating necessarily to be a form of "feedback". The tool does not currently use the term "feedback" in the user interface, and we may add feedback features such as free-text comments in future, so this does not have major implications at this time.

Current Plan (June 2011)
Given the encouraging results of the trial so far, we rolled out the feedback tool to approximately 100,000 articles of the English Wikipedia in the week of May 9, 2011. Articles were selected at random (versus the stratified-random sampling we used for the 3,000 articles). The primary goal of this rollout was to test the scalability of the feature. We will continue to analyze the ratings, survey, and click-through data, and we are working on an experimental dashboard.
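The difference between the two selection methods mentioned above can be sketched as follows. The strata used for the 3,000-article sample are not stated here, so the length bands, the `length_band` helper, and the generated `articles` list are hypothetical stand-ins.

```python
import random
from collections import defaultdict

def simple_random_sample(articles, k, rng):
    """Pick k articles uniformly at random."""
    return rng.sample(articles, k)

def stratified_random_sample(articles, stratum_of, per_stratum, rng):
    """Pick up to per_stratum articles at random from each stratum."""
    strata = defaultdict(list)
    for article in articles:
        strata[stratum_of(article)].append(article)
    sample = []
    for members in strata.values():
        sample.extend(rng.sample(members, min(per_stratum, len(members))))
    return sample

# Hypothetical articles as (title, length in kB); strata are length bands.
articles = [(f"Article {i}", i % 90 + 1) for i in range(900)]

def length_band(article):
    _, kb = article
    return "short" if kb < 10 else "medium" if kb < 50 else "long"

rng = random.Random(0)  # fixed seed so the sketch is repeatable
uniform = simple_random_sample(articles, 30, rng)
stratified = stratified_random_sample(articles, length_band, 10, rng)
```

A simple random sample mirrors the overall population, so rare strata (here, short articles) may be barely represented, while a stratified sample guarantees a fixed number from each stratum.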

The idea behind the dashboard is to surface articles to the editing community that are being rated very highly or very poorly. The use of this information, if it is used at all, is up to the community. Please provide feedback on this feature on the talk page of the feature.
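The dashboard idea can be sketched as a simple filter over mean ratings. This is an illustrative sketch only, not the dashboard's actual logic: the function `surface_extremes`, the thresholds, and the minimum rating count are all hypothetical choices.

```python
from statistics import mean

def surface_extremes(ratings_by_article, low=2.0, high=4.5, min_ratings=10):
    """Split articles into (very low, very high) lists by mean rating.

    Articles with fewer than min_ratings ratings are skipped, since a mean
    over a handful of ratings is too noisy to act on.
    """
    lowest, highest = [], []
    for title, ratings in ratings_by_article.items():
        if len(ratings) < min_ratings:
            continue
        avg = mean(ratings)
        if avg <= low:
            lowest.append((title, avg))
        elif avg >= high:
            highest.append((title, avg))
    lowest.sort(key=lambda item: item[1])
    highest.sort(key=lambda item: item[1], reverse=True)
    return lowest, highest

# Hypothetical ratings on a 1-5 scale.
sample = {
    "Article A": [5] * 9 + [4] * 3,   # high mean
    "Article B": [1] * 8 + [2] * 4,   # low mean
    "Article C": [3] * 12,            # mid-range, not surfaced
    "Article D": [5, 5],              # too few ratings, skipped
}

lowest, highest = surface_extremes(sample)
```

Requiring a minimum number of ratings keeps a single enthusiastic or hostile rater from pushing an article onto either list.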

On June 30, 2011, the feature was enhanced to add explanatory tooltips for each rating (i.e., "What does one star mean?").

We're planning to ramp up use of the feature starting July 12 (at the earliest) due to a dependency on some udp2log work that will help with the performance of this and other features. This schedule may still change.

In the longer term, we hope to be able to explore some more complex ideas, including:


 * enabling readers to leave free-text comments specifying issues with the article
 * enabling readers to praise the authors of an article, and enabling authors to receive praise
 * enabling editors to collaboratively filter rater comments and issue reports, and to promote comments of special significance to the talk page
 * enabling raters to attach credentials to a Wikipedia user account, which will then be associated with their rating
 * improving the user interface integration of the tool to reduce initial screen real estate usage while increasing visibility/discoverability

These ideas are described in more detail on the extended review page, including first wireframes for a more comprehensive rating and feedback tool.

Future Enhancements
As we are still learning about the use of this feature, no decisions on future enhancements have been made. Ideas are being collected in the Idea Log.

Data
Anonymized rating data are available for download.