Article feedback/FAQ

What is the "Article Feedback Tool"?
The Article Feedback Tool (AFT) is an experimental feature that allows any reader of an article (whether they're editors or not) to quickly and easily assess the sourcing, completeness, neutrality, and readability of a Wikipedia article on a five-point scale. It's a way to increase reader engagement by getting feedback from readers on how they view the article, and it gives editors an easy way to see where an article needs improvement. In general, the pilot of this tool also reflects a shift in the Wikimedia Foundation's development processes towards more systematic experimentation and trials with new technology. We believe small experiments like this can be very useful in helping Wikimedia to innovate and learn.

How do I use the tool?
Click the number of stars you want to assign in each category, then press the submit button. To see the average ratings other users have given the same article, click the "View page ratings" link.

How do I disable the tool?
To prevent the AFT from appearing in articles, you must be logged in as an editor to a site such as Wikipedia that uses the AFT. Go to the "Appearances" tab of "My preferences", and check the "Don't show the Article feedback widget on pages" option in the "Advanced options" panel.

What are the goals of this project?
The Wikimedia community cares a great deal about quality, and employs a number of mechanisms for article self-assessment (varying by project and language edition). The English Wikipedia, for example, utilizes processes for nominating and selecting the best articles, and for tagging articles associated with specific WikiProjects by quality class.

There is, to date, no standard mechanism by which large numbers of readers can easily engage in quality assessment. We believe that such a mechanism provides multiple potential advantages:
 * A simple rating process could be an entry point for strong invitations to readers to edit, discuss, or participate in other ways, as the initial evidence suggests.
 * Extremely high or extremely low ratings may be useful indicators that support community cleanup or article nomination processes.
 * We may be able to build upon this tool to develop a strong, standardized rating framework for content.
 * Large numbers of readers who rate an article to be of high quality provide an element of external validation of our self-assessments.
 * Even if the ratings are imperfect, they may reveal useful trends over time.
 * An easy-to-use tool is likely to scale well to a large number of articles.

We are also experimenting with the use of an easy-to-use feedback mechanism as a way to engage readers. We are currently testing several invitations to readers after they submit a rating (e.g., edit this page, create an account) and are measuring the levels of participation resulting from these invitations. More information may be found on the project page.

How can I send comments and feedback on this project?
Feedback and suggestions to improve this tool are welcome on this talk page.

Where is it being rolled out?
The Article Feedback Tool is currently deployed on approximately 100,000 randomly selected articles on the English Wikipedia (see announcement and blog post). It was previously deployed on a smaller sample of 3,000 articles. The tool is also enabled on approximately 700 articles that are part of the WMF's Public Policy Initiative.

How are pages selected for the AFT?
The Article Feedback Tool can be activated on an individual page by adding the article to a hidden category, Category:Article Feedback Pilot. The activation of the AFT on the random sample of 100,000 articles was done via a "lottery" mechanism based on each article's unique ID. Every time a new page is created, MediaWiki assigns it a unique number (the page_id, whose value is preserved across edits and renames, but not across deletion and recreation). This number is what the ArticleFeedback extension used (until the ramp-up) to determine whether a page should be rateable. If the last three digits of an article's page_id fall within a specified range (e.g. [000,026]), the page is added to the list; if they fall outside this range, the page is not added. This implies that, from time to time, new pages are added to the list. For example, the article French cuisine is rateable because the last three digits of its page_id (11002) are 002, which is within the AFT activation range.
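
The lottery check described above can be sketched in a few lines of Python. This is only an illustration of the rule, not the ArticleFeedback extension's actual code: the function name is_rateable and its parameters are hypothetical, and the default range simply reuses the [000,026] example from the text.

```python
def is_rateable(page_id: int, low: int = 0, high: int = 26) -> bool:
    """Return True if the last three digits of page_id fall in [low, high].

    The default bounds mirror the example range [000,026]; the range
    actually configured on a given wiki may differ.
    """
    suffix = page_id % 1000  # last three decimal digits of the page_id
    return low <= suffix <= high


if __name__ == "__main__":
    # 11002 ends in 002, which is inside the example range.
    print(is_rateable(11002))
    # 11527 ends in 527, which is outside the example range.
    print(is_rateable(11527))
```

With the example range, 27 of the 1,000 possible three-digit suffixes qualify, so roughly 2.7% of pages would be selected, and newly created pages join the list whenever their page_id happens to end in a qualifying suffix.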

What are some anticipated obstacles?
We'll keep a close watch on some possible issues:
 * There may be attempts to game the system. We will therefore carefully analyze rating behavior in this first pilot.
 * The tool may be used by people whose knowledge on the topic they are assessing is limited. We will survey the users of the system, and will also compare the rating information we will receive against other rating instruments.
 * Some rating designs may reduce the incentive to edit or discuss issues. In this pilot, we will not get things exactly right, but we are well aware of this potential issue, and plan to carefully study how ratings influence user interaction as a whole.
 * Low-traffic articles may receive an insufficient number of ratings, or ratings may date too quickly to be useful as articles change.

Won't these ratings measure information a computer could easily predict (e.g. number of citations)?
We believe that the ratings will correlate very strongly with heuristics that could be developed to predict the same quality characteristics of an article, such as standardized readability tests. Aside from some toolserver scripts, we don't currently have standardized heuristics of this type, and this may be an area of future development and exploration.

However, we also hypothesize that the most interesting human-generated ratings are those that substantially deviate from what the heuristics would tell us. For example, if a large number of readers rate an article as poorly sourced, even though it has many citations, that is likely an article worthy of further examination.

Do you have plans to roll it out to a larger number of articles?
As of July 2011, the tool was deployed on approximately 100,000 pages on the English Wikipedia. We continued to monitor both the performance characteristics of the feature and the value the feature provides. On July 13 we started a gradual deployment of AFT to the English Wikipedia. For those interested, we continue to do quantitative and qualitative research and use the results of the research to refine the tool. We also welcome members of the Wikimedia community to join our discussion to evaluate the feature and think about the future of content assessment in Wikimedia projects.

Are you suggesting that a high or low rating means that an article needs to be improved?
No. At this point, we're experimenting with this tool. It's possible, for example, that reader-driven quality assessment will only be useful on some types of articles, or will not be useful at all. We've introduced an experimental dashboard feature, which may help in assessing the usefulness of reader-submitted ratings.

How will the tool deal with multiple votes from the same account or IP (i.e., ballot stuffing)?
Only the most recent ratings from a single user are used in the calculation. If you rate an article with three stars, and then later rate it with four stars, only the four-star rating will be included in the calculations.
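
As a rough illustration of this rule, the following Python sketch keeps only the latest rating per user. The Rating record, its field names, and the latest_per_user function are assumptions made for the example; the real tool stores one value per rating category and may identify users differently.

```python
from typing import Dict, Iterable, NamedTuple


class Rating(NamedTuple):
    user: str       # account name or IP address (hypothetical key)
    timestamp: int  # submission time; any increasing number works here
    stars: int      # 1 to 5, simplified to a single category


def latest_per_user(ratings: Iterable[Rating]) -> Dict[str, Rating]:
    """Keep only the most recent rating submitted by each user."""
    latest: Dict[str, Rating] = {}
    for rating in ratings:
        current = latest.get(rating.user)
        if current is None or rating.timestamp > current.timestamp:
            latest[rating.user] = rating
    return latest


if __name__ == "__main__":
    submitted = [
        Rating("alice", 1, 3),
        Rating("bob", 2, 5),
        Rating("alice", 3, 4),  # replaces alice's earlier three-star rating
    ]
    for user, rating in latest_per_user(submitted).items():
        print(user, rating.stars)
```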

How will out-of-date ratings be handled?
Your rating will become "expired" after 30 revisions of the article. This will be denoted by a message advising you that you may wish to re-rate the article. Simply click stars again to submit your new rating.

How are the averages calculated?
Average ratings for each article are calculated based on an arithmetic average of all the ratings submitted against the last 30 revisions of the article (i.e., all "unexpired" ratings). The idea behind this calculation is to balance the need to reflect changes in the article (e.g., different revisions may have different qualities) with the fact that some articles receive very few ratings. Averaging over a longer series of revisions yields more ratings, but the ratings will correspond to more revisions, some of which may be substantially different. Averaging over fewer revisions yields fewer ratings, so the averages may be more easily skewed by ratings that reflect strong opinions (and perhaps vandalism). We will continue to experiment with different methods of showing averages and welcome discussion on the project talk page.
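
A minimal sketch of this calculation, under stated assumptions: each rating is represented as a (revision_number, stars) pair, where revision_number is a hypothetical per-article counter (1 for the first revision, 2 for the second, and so on), and a rating counts as unexpired while it falls within the 30 most recent revisions. The exact boundary the tool uses is not specified here, so the cutoff below is one reasonable reading.

```python
from typing import List, Optional, Tuple

WINDOW = 30  # number of most recent revisions whose ratings still count


def average_rating(
    ratings: List[Tuple[int, int]],
    current_revision: int,
    window: int = WINDOW,
) -> Optional[float]:
    """Arithmetic mean of the stars given against the last `window` revisions.

    ratings holds (revision_number, stars) pairs. Returns None when no
    unexpired ratings exist, since an average is undefined in that case.
    """
    oldest_counted = current_revision - window + 1
    unexpired = [
        stars
        for revision, stars in ratings
        if oldest_counted <= revision <= current_revision
    ]
    if not unexpired:
        return None
    return sum(unexpired) / len(unexpired)


if __name__ == "__main__":
    # The article is at revision 100, so revisions 71 to 100 are counted.
    ratings = [(1, 1), (71, 4), (100, 5)]
    print(average_rating(ratings, current_revision=100))
```

Changing the window argument shows the trade-off described above: a larger window includes more ratings but mixes in older revisions, while a smaller one tracks the current text more closely at the cost of fewer ratings.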

What's the ArticleFeedback dashboard all about?
We've introduced a dashboard which ranks the pages with the highest and lowest ratings over the past 24 hours. The idea behind the dashboard is to surface articles that have recently received both high and low ratings and experiment with different ranking algorithms. It is up to the community to determine the usefulness of this information (if it is useful at all).

Currently, a dashboard of highs and lows is updated every hour and includes articles that have received at least 10 edits within the past 24 hours. We decided to apply the 10-edit threshold to avoid the situation where the highs (as an example) simply consist of articles that have received, say, a single ratings submission of all 5s. As a result, the articles that appear on the dashboard reflect, to a certain degree, what's popular on Wikipedia within a given 24-hour period (see post). We will continue to experiment with the dashboard, so please provide your feedback on the project talk page.

Is data generated by the AFT publicly available?
We are releasing regular weekly dumps of the data we collect via AFT as well as making anonymized data available on the Toolserver. Further information on the data dumps can be found on this page.