Ok, I'm a bit lost here between the info on this page and the pieces of information scattered in all the private emails I've missed. Email discussions are not the ideal channel for data analysis. It's very nice that we have data, but perhaps we could try to start by writing down the questions we want to answer? See usability:Multimedia:Preliminary user research for example. Can we do that here? It will give us a sense of what we want to know. guillom 16:54, 30 September 2010 (UTC)
- Thanks for the link. Parul and I do have a list of questions, but I don't think it's documented anywhere. We can definitely post some questions to help guide the analysis. Howief 20:57, 30 September 2010 (UTC)
- After discussing this with Howie and Parul, I've posted a list of questions at Article feedback/Public Policy Pilot/Workgroup#Questions. guillom 00:15, 1 October 2010 (UTC)
Note: there was an error in the formula for the standard deviations where 0's (no ratings) were included in the calculations. The data has been corrected. Also, the averages were checked to make sure they did not include 0's either, which they didn't. Howief 19:24, 4 October 2010 (UTC)
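To make the correction above concrete, here is a minimal sketch (invented numbers, not the actual spreadsheet formula) of why 0's must be dropped before computing the statistics, assuming 0 encodes "no rating submitted":

```python
import statistics

ratings = [4, 5, 0, 3, 0, 5, 4]  # 0 = no rating submitted, not a score of zero

rated = [r for r in ratings if r != 0]  # exclude the 0's first

mean_wrong = statistics.mean(ratings)    # 0's drag the average down
mean_right = statistics.mean(rated)

stdev_wrong = statistics.pstdev(ratings)  # 0's inflate the spread
stdev_right = statistics.pstdev(rated)
```

With these numbers, including the 0's pulls the mean from 4.2 down to 3.0 and roughly triples the standard deviation, which is the kind of error that was corrected in the data.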
Difference in articles being rated?
I don't think the current data is enough to support the preliminary conclusion that "registered users are “tougher” in their grading of the articles than are anon users". I think IPs are more likely to see, and subsequently rate, good articles, while registered users are more likely to rate stubs, working off the main category or some other more representative sample. The charts for registered users look a lot like what I'd expect a breakdown by article to look like, while I expect IPs to be landing on high-profile pages that have had a lot of editorial attention, like w:United States Constitution. Nifboy 05:56, 8 October 2010 (UTC)
Forget about 5 stars, go for like/dislike
Hi, excellent work on this ratings study. I am happy to see progress in this direction. Google has already switched from 5 stars to the more binary like/dislike, and the same goes for Facebook and Slashdot comment ratings.
I am using these examples because these big names should raise your eyebrows. There are also statistical arguments. First, imagine well-meaning users who genuinely try to rate content on a scale. Of course, we are all different: one user will set "good" at 3 stars, another will set "good" at 5 stars, and since the best is rare, the latter will conflate best and good. In fact, the diversity of judgment makes it hard to really use a scale, and what most likely happens is that we end up with useless data. Statistically, what do we expect? A log-normal curve? A Gaussian? Worryingly, some of the plots you report are very close to a uniform distribution, which means absolutely no exploitable information!
On the contrary, notice that the anonymous users all provide easily interpretable information: they rate in a quasi-Manichean, binary fashion (either 5 stars or 1, rarely agonizing over whether something deserves 2 or 3). Most of the plots for anonymous users fit a binomial distribution. If you take the mean of that kind of ratings, you lose the interesting information. Really, you lose it all! What is interesting in a binomial distribution is how the two extremes compete; in other words, how many people like a piece of content, and how many dislike it.
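A minimal sketch of the argument above, with invented numbers: two sets of ratings with identical means, where only the like/dislike breakdown preserves the distinction.

```python
polarized = [1, 1, 5, 5, 5, 1, 5, 1]   # quasi-binary ratings (1's and 5's)
lukewarm  = [3, 3, 3, 3, 3, 3, 3, 3]   # everyone rates "average"

mean_p = sum(polarized) / len(polarized)  # 3.0
mean_l = sum(lukewarm) / len(lukewarm)    # 3.0 -- the mean can't tell them apart

# Counting likes and dislikes keeps the information the mean throws away
# (the >= 4 / <= 2 thresholds are an arbitrary choice for this sketch):
likes    = sum(r >= 4 for r in polarized)  # 4
dislikes = sum(r <= 2 for r in polarized)  # 4
```

Both articles average 3.0, but one is controversial (4 likes vs 4 dislikes) and the other is uniformly mediocre (0 of each), which is exactly what a like/dislike button would surface and a star average would hide.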
We could go further into the statistical details, but I urge you to notice that these arguments are mathematical, and that a number of important players have already made a step in this direction.
Multidimensional Quality Analysis Based on Wikipedia Article Feedback Data about Biographies
- What Makes a Good Biography? Multidimensional Quality Analysis Based on Wikipedia Article Feedback Data. Lucie Flekova, Oliver Ferschke, and Iryna Gurevych, 2014.