Talk:Article feedback/Public Policy Pilot/Workgroup

Other projects
What about testing the extension on smaller wikis? Some Wiktionaries use a JavaScript-based tool to gather feedback, I suppose they would be interested. --Nemo 06:55, 15 September 2010 (UTC)
 * Ok, now I'll add the link: wikt:en:Wiktionary:Feedback (see also interwiki); and don't forget strategy:Special:RatedPages (there are lots of comments on the wiki about it). I don't understand why you're developing a new feature with a pilot on (a small part of) Wikipedia while there are several other projects that eagerly need such a feature, and in fact are already using something similar (but much more crappy). --Nemo 07:01, 24 September 2010 (UTC)
 * Hey Federico!
 * I totally missed this given the flurry I've been trapped in over the past couple days and for that I apologize.
 * The answer is this: part of the reason we are doing this on such a small article subset is to actually ensure that the technology works and see immediate problems. While I don't see any moral or political reasons not to enable it in other places, the extension is slated for a series of rather rapid, iterative changes (hopefully improvements). So my advice is to wait a bit; I'm about to start on design for phase 2 (some of the feedback we've already gotten dovetails with what was expected, and we're going ahead and implementing it).
 * In the meantime, I'd love it if you joined the workgroup and gave some ideas. You're a smart guy and can see around corners a lot.
 * I know that's not the answer you were looking for but I hope that helps.--Jorm (WMF) 19:16, 24 September 2010 (UTC)

Download Extension
I would be interested to download this as extension for other wikis. --Pdcemulator 09:34, 15 September 2010 (UTC)
 * There's no reason why you cannot. The extension itself is named "ArticleAssessmentPilot".  However, you should know that this is a pilot program, and the extension itself should not be considered "final". If we decide to move forward, it will likely be renamed as well.--Jorm (WMF) 18:52, 15 September 2010 (UTC)
 * FYI, there are other more "finalized" extensions available for this general purpose as well, such as Extension:ReaderFeedback and Extension:AjaxRatingScript, in case Pdcemulator is not aware. Thorncrag 02:37, 24 September 2010 (UTC)

Assorted comments
This feature has a lot of potential, but the current implementation sucks.

A bit of background
First, we need to establish that rating articles/entries is not a new idea. The English Wiktionary, for example, has been doing this for years. You can look at wikt:User:Conrad.Irwin/feedback.js for the code that the English Wiktionary uses. A few key points about the English Wiktionary's implementation:
 * because it's implemented in JavaScript, it only works for users who load and run JavaScript on this domain;
 * it only displays for anonymous users;
 * it displays in the sidebar;
 * it appears to only work in the (now antiquated) Monobook skin currently;
 * it uses a number of simple metrics for articles with a simple one-click interface; the options to choose from are: [this entry is] "good," "bad," "messy," "mistake in definition," "confusing," "could not find the word I want," "incomplete," "entry has inaccurate information," "definition is too complicated," and finally "if you have time, leave us a note."

Current ArticleAssessment implementation
The current implementation of ArticleAssessment has a few niceties:
 * it's implemented in PHP with a proper database backend;
 * it has a nice UI for rating an article (the stars are pretty).

But the main issues I see with it are:
 * it's enormous: the entire "view results" box shouldn't be shown at all until the user clicks something;
 * the metrics are terrible;
 * it's located at the bottom of lengthy articles, making it unlikely that anyone will see it; those who do see it will likely not want to participate because it looks complicated (as opposed to the one-click system that the English Wiktionary uses).

Room for improvement
My suggestions:
 * look at how a site like ted.com uses user feedback; the Wikipedias have hundreds of awesome articles that nobody knows about and they aren't sorted by anything useful currently; this tool could be adapted to create useful metrics, e.g., [this article is] informative, interesting, sloppy, boring, unintelligible, confusing (math articles, anyone?), biased
 * once you have ratings from users, you can generate all sorts of nifty tools; you can have the most interesting articles listed in a dynamic report; or you can have "select a random informative, well-sourced history article"; this is actually something that would be useful;
 * I understand and appreciate the desire to be unobtrusive, but the rating system needs to be more visible somehow; the sidebar is a good place to look at (esp. if you can reasonably collapse some of the interwiki links on long articles); it might also be possible to put an unobtrusive icon near the top of the page (the central focal point for nearly any article); mashable.com has been using a blue box at the top of articles, though that's a bit much, I think;
 * further simplify the interface, but allow for more in-depth comments if the user wants to provide them.

Hope that helps, --MZMcBride 22:54, 24 September 2010 (UTC)

Response to MZMcBride's Comments
A couple of responses, so that the design rationale is better understood:

First, we decided specifically against allowing user comments with ratings. My opinion was (and I still hold it) that such comments will be either a) of little value or b) better as comments on the corresponding Discussion page. The options thus left us with two directions:


 * 1) They aren't stored anywhere except some random table. They would quickly become outdated or useless. They would require additional development to allow them to be visible, likely resulting in yet another tab ("View Rating Comments" or somesuch);
 * 2) We inject the comments as a new item on the Discussion page (either standard Talk or LiquidThreads). They are visible to be sure but since they are going to be left by users who do not normally engage in Discussion pages, any responses will be either ignored/unseen or become confusing to the user. (Users who understand Discussion pages will already know to leave comments there).

Further, comments in such a form are likely (at this stage) to be about the tool and not the article.

I agree that, from the viewpoint of a Wiktionary, comments attached to ratings would be valuable, but Wiktionary entries do not spawn the same types of discussions that Wikipedia entries have, and this tool is targeted at encyclopedic content.

Second, the placement of the ratings box. The placement of the box at the bottom of the article is not by chance; it is very specifically by design. Placing it above, within the article space (or even in the side bar) does not help to ensure that the article has actually been read. If the article is 7 screens long and the tool is located on the first screen (say, below the language links), then users will be encouraged to rate the article before they have read it completely.

I agree that the current placement is sub-optimal; I'd prefer it to arrive before the reference list. However, we decided for ease-of-impact to place it as low as possible.

We are not mashable, nor are we Netflix or even Yelp. They have entirely different motivations for their ratings tools (they boil down to generation of clicks, which generates ad revenue [with the exception of Netflix, whose rating system is interestingly outside of scope]).

The exact unobtrusiveness of the tool is specific as well, for a couple reasons:


 * 1) Community Acceptance. It was early on determined that a "loud" ratings box would be received negatively by the community. My initial design had the View Ratings box and the Rate this Page box completely decoupled, with the View box at the top of the article (which is where I expect it will eventually live, should the tool be accepted).  We decided that this was too much for the community to accept in one dollop, so I decided to visually connect them (I personally see the tool as two "tools" with discrete purposes - purposes that, for all intents, are at odds with one another).  The vulnerability of the system to information cascade and anchoring is why results are hidden at the outset.
 * 2) It's Not the Point. The point of the article is the article, not the ratings box.  Sure, the ratings are another aspect of an article, but they are not the article itself (just as the History is not, nor the Discussion - even though I believe those are just as important).  I personally view the ratings histogram to be another vector (hah) within the History.

Regarding the display of the "View Results" pane at the outset: it must be obvious to the user that they can see the results of the article. A primary goal was to make the tool as minimal to use as possible (and a planned design is even more minimal than this one).

I cannot speak to the choice of metrics except to say:


 * 1) They are configurable. We can change them at any time (pending translations, of course)
 * 2) They are an experiment. I personally believe that we can approximate the expected values of three of them using analytics; the outlier is "Neutrality," which we may find is entirely useless on a metric scale (but may still be useful as a type of "honeypot" for reader venom).  One of the answers we hope to get out of the workgroup is a better set of metrics. (The workgroup goes beyond metrics as well: I want to get better formulas for "stale" and "expired", for instance.)

This tool, too, is effectively implemented in JavaScript. The design decision behind that was one of performance: we want to reduce calls to the server database as much as possible. There are two questions we have to ask each time:


 * 1) Should the tool be displayed? (handled in PHP)
 * 2) Does the tool need to display existing ratings? (this is the big one, and handled via JavaScript)

As a result, simply injecting the HTML as the page gets rendered would have been easier but would also have been more of a burden on the server. A full-scale roll-out would clearly be implemented in PHP, but for now it's done client-side.
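
The split between the two questions above can be sketched roughly as follows. This is only an illustration of one possible reading, modelled in Python for brevity rather than in the PHP and JavaScript the extension actually uses; `RatingStore`, `should_display_tool`, `render_page`, `load_ratings_on_demand` and the page titles are all hypothetical names, not part of ArticleAssessmentPilot.

```python
from dataclasses import dataclass, field


@dataclass
class RatingStore:
    """Hypothetical stand-in for the ratings database; counts its queries."""
    ratings: dict = field(default_factory=dict)
    queries: int = 0

    def fetch(self, title):
        self.queries += 1
        return self.ratings.get(title, [])


def should_display_tool(title, pilot_pages):
    """Question 1 (server side): a cheap membership check, no ratings query."""
    return title in pilot_pages


def render_page(title, pilot_pages):
    """Emit only an empty placeholder; ratings are never fetched at render time."""
    html = f"<h1>{title}</h1>"
    if should_display_tool(title, pilot_pages):
        html += '<div id="rating-tool"></div>'
    return html


def load_ratings_on_demand(title, store):
    """Question 2 (client side): hit the store only when results are needed."""
    return store.fetch(title)


# Assumed sample data, for illustration only.
store = RatingStore(ratings={"Public policy": [4, 5, 3]})
pilot = {"Public policy"}

pilot_html = render_page("Public policy", pilot)
other_html = render_page("Unrelated article", pilot)
queries_after_render = store.queries  # rendering alone costs no ratings query

shown = load_ratings_on_demand("Public policy", store)
queries_after_load = store.queries
```

In this sketch, rendering any number of pages leaves the query counter untouched; the store is only consulted once a reader's browser actually asks for the existing ratings.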

As far as graphs and histograms go, that's on the plan. We did not have sufficient design or development time to include them (though in the early design comps there are indications as to where they should go).

There's a lot of stuff that is "in the plan" that didn't make it into this revision, by the way. There is a roadmap, and I'm currently working to get a framework written for it. For example, the concept of "expired" ratings isn't in the current version, and I'm keen to get it into the design. Also, the idea of "self-identified experts" - we'd like to track that. Even if we don't apply weight to self-identified experts, it makes for an interesting line in the histogram. There are even more aspects that lie further in The Deep (ways to tie this into discussion systems, or viewing user rating histories and the like).

This became a book. Sorry. --Jorm (WMF) 05:33, 25 September 2010 (UTC)

Thanks for a 95% constructive and helpful comment, Mz. ;-) I'll add to Brandon's note that the design of the system reflects its primary intentions. These are explained in Article feedback/Public Policy Pilot/FAQs, but specifically, the quantitative assessment of change-over-time across defined quality dimensions is one of the objectives for this deployment. That's a lot easier when you're dealing with a four variable / five point scale where the vast majority of ratings submit complete data for all variables, as opposed to a tagging system with forced prioritization, where your objective is to highlight predominant characteristics (you're surfacing that a video is "inspiring", but you end up with very little information about how many people think it's "long-winded").

That is not to say that we didn't discuss tagging systems -- we did, and I thank you for bringing up TED; I had only seen the output side of it, and your post inspired me to look at the input side. It's a very cool system, and I agree that a system like this could be very useful for precisely the kind of purposes you describe: surfacing articles with specific characteristics. Another direction to explore is the system employed by Newstrust, which offers a similar initial rating system to ours, and expands to allow for additional input for those who would like to provide it.--Eloquence 07:45, 29 September 2010 (UTC)
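
To make the "four variable / five point scale" point concrete, here is a minimal sketch of how complete submissions could be turned into per-dimension histograms and averages. The dimension names other than "neutrality" are assumptions made for illustration, as is the sample data; nothing here is taken from the extension's code.

```python
from collections import Counter

# Assumed dimension names; only "Neutrality" is named in this discussion.
DIMENSIONS = ("well_sourced", "neutrality", "completeness", "readability")
SCALE = range(1, 6)  # five-point scale: 1..5


def histograms(submissions):
    """Return {dimension: {score: count}} over complete submissions."""
    counts = {d: Counter() for d in DIMENSIONS}
    for sub in submissions:
        for d in DIMENSIONS:
            score = sub[d]
            if score not in SCALE:
                raise ValueError(f"{d} score out of range: {score}")
            counts[d][score] += 1
    # Include zero counts so every histogram has the same five bins.
    return {d: {s: counts[d][s] for s in SCALE} for d in DIMENSIONS}


def mean_scores(submissions):
    """Return the average score per dimension."""
    if not submissions:
        raise ValueError("no submissions to average")
    n = len(submissions)
    return {d: sum(sub[d] for sub in submissions) / n for d in DIMENSIONS}


# Made-up sample ratings, for illustration only.
sample = [
    {"well_sourced": 4, "neutrality": 5, "completeness": 3, "readability": 4},
    {"well_sourced": 2, "neutrality": 5, "completeness": 3, "readability": 5},
    {"well_sourced": 3, "neutrality": 1, "completeness": 4, "readability": 3},
]

hist = histograms(sample)
means = mean_scores(sample)
```

Because every submission carries a value for every dimension, comparing two snapshots of `means` or `hist` taken at different times is straightforward, which is harder with a tagging system where most tags are simply absent from most responses.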

Workgroup open
Who is welcome to join the workgroup? Thorncrag 05:47, 27 September 2010 (UTC)
 * Hi. I already answered your question in the blog comments. guillom 13:50, 27 September 2010 (UTC)
 * Oh, sorry; I did not see that the last time I checked.  Thorncrag 20:15, 27 September 2010 (UTC)

Colour of the Stars
Good morning, I've come to the test from an article in the German Signpost. Not sure whether this is the right place for feedback, but here goes: I do not like the colour of the rating stars for three - partly cross-cultural - reasons:
 * In Central Europe, the Red Star is perceived as a symbol of Communism in general, and the Russian Army in particular. Neither of these are very friendly connotations for large sections of the user population here. Now I realize that your star is a bit more bulky than a pentagram, but a five-pointed red star is a five-pointed red star.
 * When teachers grade term papers and the like over here, red is used to mark errors. The more red in your paper, the worse it is. This runs exactly counter to the meaning implied here.
 * Red means stop, green means go. Again: red flags as markers of quality are trouble signs, not good things.
Would you consider changing the colour of the stars? To a dark green or a blue possibly? --Minderbinder 05:51, 29 September 2010 (UTC)