Talk:Article feedback/Public Policy Pilot/Workgroup

__NEWSECTIONLINK__  This page is a place for you to tell the Wikimedia Tech team what issues you encounter when using the Article feedback experimental tool during its trial period on Public Policy articles. See also the Frequently asked questions.

Please help us by leaving a comment below, or join the workgroup if you'd like to stay involved down the road.

We welcome your ideas, but please focus on the issues rather than on possible solutions.


Other projects
What about testing the extension on smaller wikis? Some Wiktionaries use a JavaScript-based tool to gather feedback, I suppose they would be interested. --Nemo 06:55, 15 September 2010 (UTC)
 * Ok, now I'll add the link: wikt:en:Wiktionary:Feedback (see also interwiki); and don't forget strategy:Special:RatedPages (there are lots of comments on the wiki about it). I don't understand why you're developing a new feature with a pilot on (a small part of) Wikipedia while there are several other projects that eagerly need such a feature, and in fact are already using something similar (but much more crappy). --Nemo 07:01, 24 September 2010 (UTC)
 * Hey Federico!
 * I totally missed this given the flurry I've been trapped in over the past couple days and for that I apologize.
 * The answer is this: part of the reason we are doing this on such a small article subset is to actually ensure that the technology works and see immediate problems. While I don't see any moral or political reasons not to enable it in other places, the extension is slated for a series of rather rapid, iterative changes (hopefully improvements). So my advice is to wait a bit; I'm about to start on design for phase 2 (some of the feedback we've already gotten dovetails with what was expected, and we're going ahead and implementing it).
 * In the meantime, I'd love it if you joined the workgroup and gave some ideas. You're a smart guy and can see around corners a lot.
 * I know that's not the answer you were looking for but I hope that helps.--Jorm (WMF) 19:16, 24 September 2010 (UTC)

Assorted comments
This feature has a lot of potential, but the current implementation sucks.

A bit of background
First, we need to establish that rating articles/entries is not a new idea. The English Wiktionary, for example, has been doing this for years. You can look at wikt:User:Conrad.Irwin/feedback.js for the code that the English Wiktionary uses. A few key points about the English Wiktionary's implementation:
 * because it's implemented in JavaScript, it only works for users who load and run JavaScript on this domain;
 * it only displays for anonymous users;
 * it displays in the sidebar;
 * it appears to only work in the (now antiquated) Monobook skin currently;
 * it uses a number of simple metrics for articles with a simple one-click interface; the options to choose from are: [this entry is] "good," "bad," "messy," "mistake in definition," "confusing," "could not find the word I want," "incomplete," "entry has inaccurate information," "definition is too complicated," and finally "if you have time, leave us a note."

Current ArticleAssessment implementation
The current implementation of ArticleAssessment has a few niceties:
 * it's implemented in PHP with a proper database backend;
 * it has a nice UI for rating an article (the stars are pretty).

But the main issues I see with it are:
 * it's enormous &mdash; the entire "view results" box shouldn't be shown at all until the user clicks something;
 * the metrics are terrible;
 * it's located at the bottom of lengthy articles, making it unlikely that anyone will see it; those who do see it will likely not want to participate because it looks complicated (as opposed to the one-click system that the English Wiktionary uses).

Room for improvement
My suggestions:
 * look at how a site like ted.com uses user feedback; the Wikipedias have hundreds of awesome articles that nobody knows about and they aren't sorted by anything useful currently; this tool could be adapted to create useful metrics, e.g., [this article is] informative, interesting, sloppy, boring, unintelligible, confusing (math articles, anyone?), biased
 * once you have ratings from users, you can generate all sorts of nifty tools; you can have the most interesting articles listed in a dynamic report; or you can have "select a random informative, well-sourced history article"; this is actually something that would be useful;
 * I understand and appreciate the desire to be unobtrusive, but the rating system needs to be more visible somehow; the sidebar is a good place to look at (esp. if you can reasonably collapse some of the interwiki links on long articles); it might also be possible to put an unobtrusive icon near the top of the page (the central focal point for nearly any article); mashable.com has been using a blue box at the top of articles&mdash;that's a bit much, I think;
 * further simplify the interface, but allow for more in-depth comments if the user wants to provide them.

Hope that helps, --MZMcBride 22:54, 24 September 2010 (UTC)

Response to MZMcBride's Comments
A couple of responses, so that the design rationale is better understood:

First, we decided specifically against allowing user comments with ratings. My opinion was (and I still hold it) that such comments will be either a) of little value or b) better as comments on the corresponding Discussion page. The options thus left us with two directions:


 * 1) They aren't stored anywhere except some random table. They would quickly become outdated or useless. They would require additional development to allow them to be visible, likely resulting in yet another tab ("View Rating Comments" or somesuch);
 * 2) We inject the comments as a new item on the Discussion page (either standard Talk or LiquidThreads). They are visible to be sure but since they are going to be left by users who do not normally engage in Discussion pages, any responses will be either ignored/unseen or become confusing to the user. (Users who understand Discussion pages will already know to leave comments there).

Further, comments in such a form are likely (at this stage) to be about the tool and not the article.

I agree that, from the viewpoint of a Wiktionary, comments attached to ratings would be valuable, but Wiktionary entries do not spawn the same types of discussions that Wikipedia entries do, and this tool is targeted at encyclopedic content.

Second, the placement of the ratings box. The placement of the box at the bottom of the article is not by chance; it is very specifically by design. Placing it above, within the article space (or even in the side bar) does not help to ensure that the article has actually been read. If the article is 7 screens long and the tool is located on the first screen (say, below the language links), then users will be encouraged to rate the article before they have read it completely.

I agree that the current placement is sub-optimal; I'd prefer it to arrive before the reference list. However, we decided for ease-of-impact to place it as low as possible.

We are not mashable, nor are we Netflix or even Yelp. They have entirely different motivations for their ratings tools (they boil down to generation of clicks, which generates ad revenue [with the exception of Netflix, whose rating system is interestingly outside of scope]).

The exact unobtrusiveness of the tool is specific as well, for a couple reasons:


 * 1) Community Acceptance. It was early on determined that a "loud" ratings box would be received negatively by the community. My initial design had the View Ratings box and the Rate this Page box completely decoupled, with the View box at the top of the article (which is where I expect it will eventually live, should the tool be accepted).  We decided that this was too much for the community to accept in one dollop, so I decided to visually connect them (I personally see the tool as two "tools" with discrete purposes - purposes that, for all intents, are at odds with one another).  The vulnerability of the system to information cascade and anchoring is why results are hidden at the outset.
 * 2) It's Not the Point. The point of the article is the article, not the ratings box.  Sure, the ratings are another aspect of an article, but they are not the article itself (just as the History is not, nor the Discussion - even though I believe those are just as important).  I personally view the ratings histogram to be another vector (hah) within the History.

Regarding the display of the "View Results" pane at the outset: it must be obvious to the user that they can see the results of the article. A primary goal was to make the tool as minimal to use as possible (and a planned design is even more minimal than this one).

I cannot speak to the choice of metrics except to say:


 * 1) They are configurable. We can change them at any time (pending translations, of course)
 * 2) They are an experiment. I personally believe that we can approximate the expected values of three of them using analytics; the outlier is "Neutrality," which we may find is entirely useless on a metric scale (but may still be useful as a type of "honeypot" for reader venom).  One of the answers we hope to get out of the workgroup is a better set of metrics. (The workgroup goes beyond metrics as well: I want to get better formulas for "stale" and "expired", for instance.)

This tool, too, is effectively implemented in JavaScript. The design decision behind that was one of performance: we want to reduce calls to the server database as much as possible. There are two questions we have to ask each time:


 * 1) Should the tool be displayed? (handled in PHP)
 * 2) Does the tool need to display existing ratings? (this is the big one, and handled via JavaScript)

As a result, simply injecting the HTML as the page gets rendered would have been easier but would also have been more of a burden. A full-scale roll-out would clearly be implemented in PHP, but for now it's done client-side.

As far as graphs and histograms go, that's on the plan. We did not have sufficient design or development time to include them (though in the early design comps there are indications as to where they should go).

There's a lot of stuff that is "in the plan" that didn't make it into this revision, by the way. There is a roadmap, and I'm currently working to get a framework written for it. For example, the concept of "expired" ratings isn't in the current version, and I'm keen to get it into the design. Also, the idea of "self-identified experts" - we'd like to track that. Even if we don't apply weight to self-identified experts, it makes for an interesting line in the histogram. There are even more aspects that lie further in The Deep (ways to tie this into discussion systems, or viewing user rating histories and the like).

This became a book. Sorry. --Jorm (WMF) 05:33, 25 September 2010 (UTC)

Thanks for a 95% constructive and helpful comment, Mz. ;-) I'll add to Brandon's note that the design of the system reflects its primary intentions. These are explained in Article feedback/Public Policy Pilot/FAQs, but specifically, the quantitative assessment of change-over-time across defined quality dimensions is one of the objectives for this deployment. That's a lot easier when you're dealing with a four variable / five point scale where the vast majority of ratings submit complete data for all variables, as opposed to a tagging system with forced prioritization, where your objective is to highlight predominant characteristics (you're surfacing that a video is "inspiring", but you end up with very little information about how many people think it's "long-winded").

That is not to say that we didn't discuss tagging systems -- we did, and I thank you for bringing up TED; I had only seen the output side of it, and your post inspired me to look at the input side. It's a very cool system, and I agree that a system like this could be very useful for precisely the kind of purposes you describe: surfacing articles with specific characteristics. Another direction to explore is the system employed by Newstrust, which offers a similar initial rating system to ours, and expands to allow for additional input for those who would like to provide it.--Eloquence 07:45, 29 September 2010 (UTC)

Workgroup open
Who is welcome to join the workgroup? Thorncrag 05:47, 27 September 2010 (UTC)
 * Hi. I already answered your question in the blog comments. guillom 13:50, 27 September 2010 (UTC)
 * Oh, sorry; I did not see that the last time I checked.  Thorncrag 20:15, 27 September 2010 (UTC)

Colour of the Stars
Good morning, I've come to the test from an article in the German Signpost. Not sure whether this is the right place for feedback, but here goes: I do not like the colour of the rating stars for three - partly cross-cultural - reasons: Would you consider changing the colour of the stars? To a dark green or a blue possibly? --Minderbinder 05:51, 29 September 2010 (UTC)
 * In Central Europe, the Red Star is perceived as a symbol of Communism in general, and the Russian Army in particular. Neither of these are very friendly connotations for large sections of the user population here. Now I realize that your star is a bit more bulky than a pentagram, but a five-pointed red star is a five-pointed red star.
 * When teachers grade term papers and the like over here, red is used to mark errors. The more red in your paper, the worse it is. This runs exactly counter to the meaning implied here.
 * Red means stop, green means go. Again: red flags as markers of quality are trouble signs, not good things.
 * colour significances vary. In the US, red is currently the symbol of the (right-of-center) Republican party  DGG 01:09, 30 September 2010 (UTC)

Feature Ideas

 * Include some context so that readers know why they are seeing the tool (e.g., a "What's this?" link with an explanation that the tool is part of the public policy project). See original post. Howief 20:08, 22 September 2010 (UTC)


 * Provide the ability to generate a graph of how the ratings have changed over time. The point of the software is to see how good an article is at any given time, but it would be super useful to actually see if the article was deemed to have improved over time. -- Witty lama.

Shimgray's comments
See http://www.generalist.org.uk/blog/2010/article-ratings/. guillom 03:20, 1 October 2010 (UTC)

Comments from en-wiki

 * These are copied over from an ill-conceived discussion page on English Wikipedia. Sorry for the confusion, Nifboy and Peregrine Fisher. -Sage 


 * The one GA, Yucca_Mountain_nuclear_waste_repository, has mediocre ratings. Whatever that means. - Peregrine Fisher (talk) 20:21, 7 October 2010 (UTC)
 * Looking at the early data, are registered users and IPs rating the same articles or is there a difference in which articles IPs choose to rate versus registered users? If so, would this explain the discrepancy between registered/IP ratings (if e.g. IPs are rating mostly good articles and registered users are rating mostly stubs)? Nifboy (talk) 21:04, 7 October 2010 (UTC)

Comments from Fetchcomms
I haven't seen anything too objectionable, and I'm not familiar with the technical side, but the one thing that bothers me is its placement. Can we move the feedback box after the categories? I think it makes the page flow a bit better. Also, is there a way to turn on the feedback tool for articles in a more "secure" way than just a category? I don't know what (maybe a MediaWiki-space listing of pages that need the feedback tool, or some special page to configure it, or something else), but that might be more useful for keeping people from inadvertently removing the category. Anyway, it seems to have worked fine for me so far. Fetchcomms 02:20, 8 October 2010 (UTC)

Re: Requesting feedback about the Article Feedback Tool
IMHO, the Article Feedback Tool looks and runs fine. It is in the right place, at the VERY bottom of the page, after all other boxes and templates. It is probably too wide: in a browser window of about 700 px, a horizontal scrollbar appears. The two boxes, Your feedback and Page rating, could perhaps be merged into one box, with "Your feedback" in the first column and "Page rating" in the second. The whole box could be collapsed by default, showing only the header and a "show" link, to reduce its impact on the page.

The areas covered and the metrics used for evaluating articles seem good enough, but I think we can wait for the data analysis based on the results of the USPPI project to see whether they need changes. Probably the correct sequence should be: 1. Complete; 2. Well-Sourced; 3. Neutral; 4. Readable. An article consisting of only a paragraph or two could be well-sourced, neutral and readable, but not complete.

It would be good to have two types of results: 1. the historical data with all the ratings; 2. an updated version that takes into account "recent" ratings only. Obviously, what counts as "recent" differs from article to article: some articles are edited rarely and have no more than a hundred edits in their whole history, while others get the same number of edits in one day.

As the tool is a public tool, I think it would be better (and desirable) to use a measure such as variance, standard deviation, median absolute deviation, ... that takes the deviations into account. For example, if an article has an average of 4.5 points in one area, then we could cut out all the new ratings with values less than 1 (or 2) for that same area. And, conversely, if an area has an average of 1.5 or so, we could cut out future ratings with a value of 5 (or 4) for that area. More complicated is the case in which the area has not yet been rated, or has an average of 2.5 points. I think that Amy can help on this point.

In the future, the possibility of introducing those "horrible" CAPTCHAs or similar systems (for example, making the tool available to registered users only) should be taken into account, to ensure that a response is not generated by a computer. Well, that's all for now. All the best. –p joe f (talk • contribs) 10:42, 8 October 2010 (UTC)
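
To make the deviation-based idea above a bit more concrete, here is a rough Python sketch. It is not taken from the Article Feedback tool: the function names, the cutoff of 2 scaled median absolute deviations (MAD), and the "last N ratings" definition of "recent" are all assumptions made up for illustration. It shows one way to drop ratings that sit far from the median and to report both an all-time and a recent average for a single rating area.

```python
from statistics import mean, median


def mad_filter(ratings, cutoff=2.0):
    """Keep ratings within `cutoff` scaled MADs of the median (hypothetical rule)."""
    ratings = list(ratings)
    if not ratings:
        return []
    med = median(ratings)
    mad = median(abs(r - med) for r in ratings)
    if mad == 0:
        # More than half of the ratings are identical, so there is no usable
        # spread estimate; this sketch keeps everything in that case.
        return ratings
    # 1.4826 scales the MAD so it is comparable to a standard deviation
    # for normally distributed data.
    limit = cutoff * 1.4826 * mad
    return [r for r in ratings if abs(r - med) <= limit]


def summarize(ratings, recent_n=20):
    """Return all-time and recent averages after filtering (None if no data)."""
    all_kept = mad_filter(ratings)
    recent_kept = mad_filter(list(ratings)[-recent_n:])
    return {
        "all_time": mean(all_kept) if all_kept else None,
        "recent": mean(recent_kept) if recent_kept else None,
    }


if __name__ == "__main__":
    # Ratings on a five-point scale for one area; values are invented.
    sample = [4, 5, 4, 3, 5, 4, 1, 4, 5, 3]
    print(mad_filter(sample))
    print(summarize(sample, recent_n=5))
```

On a five-point scale the MAD is often zero, so a real implementation would need a fallback for that case, and it would still have to decide what to do for areas with very few ratings, which is the "more complicated" case mentioned above.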

P.S.: "Formatting" is not a subject matter for everyone including many "regulars", but what about adding "Illustrations"? –p joe f (talk • contribs) 13:20, 8 October 2010 (UTC)

Serious privacy concerns
Readers who do not have a Wikipedia account (or those who do, but aren't logged in) probably do not expect that their IP address is logged and then made publicly available when they vote on the feedback poll. I really can't see the benefit to anybody of displaying the IP addresses of previous voters, and I think that this is also a serious privacy/consent concern (people may expect their IP address to be logged when they participate in an online poll to prevent multiple voting etc, but not to have it displayed!). This would be a particular concern if "article feedback" ratings were extended to controversial content (e.g. sexual fetishes). When anonymous users make an edit to a page there is a large explicit warning that their IP address will be collected and put on public display with their edit in the article history. There is no such warning message for the poll. I'm not really convinced that usernames of logged-in users who voted need to be put on public display either, but putting up IP addresses without an explicit warning sounds like it could breach privacy expectations and maybe policy. TheGrappler 16:53, 9 October 2010 (UTC)

The neutrality field doesn't catch puff pieces
Public Citizen was, before I discovered and completely reverted/rewrote it, primarily a PR piece written mostly by Public Citizen staff (see w:User:HSDOnline). This was apparent to me the moment I saw the article. It had a neutrality rating of 3.5/5 across four ratings, which, if the early data is to be believed, is about average.

After getting over my initial rage and disappointment, I think that, whatever data comes out of the neutrality field, it isn't even close to an actual measure of neutrality. I suspect what it will actually do is be a measure of conformity to public opinion, which will naturally be substantially lower for polarizing subjects (e.g. Abortion). Nifboy 04:44, 11 October 2010 (UTC)