Talk:Reading/Multimedia/Media Viewer/Survey

From MediaWiki.org

Bias

"Is this media viewer useful for viewing images and learning about them?" The response to such a question must not be summarised as a response to "Is it useful?", because that's not what you asked. The two questions would be equivalent only if "for viewing images and learning about them" were irrelevant, i.e. if the one and only criterion for usefulness of our wikis were "viewing images and learning about them". --Nemo 06:59, 26 April 2014 (UTC)

+1. Moreover, the interesting question would have been "Is this media viewer better than the previous solution?" --84.130.143.14 09:39, 9 June 2014 (UTC)
Totally biased. I actually hate the interface, but I answered yes. Because it is "useful." Had I known they were using this as an approval survey I would have said no. I also submitted a response to the text field that I noticed was not included in the tabulated responses. "How do I turn it off for good?" 50.59.88.46 17:16, 1 August 2014 (UTC)

Biased survey wording

Copied from w:en:Wikipedia:Media_Viewer/June_2014_RfC#Biased_survey_wording

The survey pre-selects by only eliciting feedback from users who remain on the image page long enough to find the small feedback link; users who are not comfortable with the image viewer will have closed the page and sought other ways to get to the Commons page, and so would be least likely to leave feedback; users who are comfortable with the image viewer because they are image focused rather than information focused will be more inclined to find the feedback link and leave feedback. And when the feedback page is found, the question is: "Is this media viewer useful for viewing images and learning about them?" rather than: "Is this media viewer more or less useful than going direct to the Commons page?" If the survey doesn't offer an alternative, but only focuses on the current item, then the response is going to be ill-informed and limited, and will incline to what the user is looking at. It's like putting £10 on a table and asking people: "Would this £10 be useful to you?" A fairer question would be: "Which is more useful to you: £10 or the equivalent in your own currency?" Offer people appropriate alternatives, and you get more accurate feedback. SilkTork ✔Tea time 16:21, 23 June 2014 (UTC)

I also note the survey is introduced with this wording: "We'd like your feedback on the 'Media Viewer' feature you are now using. This feature improves the way images are displayed on Wikipedia, to create a more immersive experience. What do you think about this new multimedia experience?" So, even before the user takes the survey they are planted with the assertion that the image viewer "improves" the way images are displayed rather than the more neutral "changes" the way images are displayed. SilkTork ✔Tea time 16:44, 23 June 2014 (UTC)
@SilkTork: these are good points. There are many reasons to question the usefulness of the statistics provided, as discussed in other sections; but I find your comments especially insightful. Just wanted to acknowledge. -Pete (talk) 16:45, 26 June 2014 (UTC)
Wholly agree the feedback option is fairly unnoticeable (took me a couple of weeks). When I first encountered it, I did as you said: "sought other ways to get to the Commons page". Finally frustrated beyond composure, I have been trying to find a way to eliminate this thing for the past hour (I still don't know how to disable this thing, BTW). The survey I finally discovered looked like it came from the offices of Goebbels. Talk about biasing the results in one's own favor. :( — al-Shimoni (talk) 00:42, 7 July 2014 (UTC)


Survey Renormalization To Match Our Readership

The survey tries to look at general Wikipedia use across languages, but the sampling was way, way off from our actual readership. This has resulted in very misleading results. Readership figures were obtained from https://stats.wikimedia.org/wikimedia/squids/SquidReportPageViewsPerLanguageBreakdown.htm

Note that 3.3% of Wikipedia pageviews are for the language-neutral and content-free Wikipedia Portal page; this 3.3% is factored out in my final calculations.

  • Hungarian accounts for 0.3% of Wikipedia readership, but makes up 5.8% of survey responses, a 19.3x over representation. Hungarian is the 19th most used Wikipedia language; I have no idea how it landed on a top-8 survey language list.

  • Dutch accounts for 1% of Wikipedia readership, but makes up 9% of survey responses, a 9x over representation. Dutch is the 11th most used Wikipedia language; it doesn't belong on a top-8 language list.

  • Catalan accounts for 0.09% of Wikipedia readership, but makes up 0.7% of survey responses, a 7.7x over representation. Catalan is the 33rd most used Wikipedia language, ranking it a solid almost-one-tenth-of-one-percent above Klingon.

  • French accounts for 4.7% of Wikipedia readership, but makes up 21.2% of survey responses, a 4.51x over representation.

  • Portuguese accounts for 2.8% of Wikipedia readership, but makes up 7.7% of survey responses, a 2.75x over representation. Portuguese is the 9th most used Wikipedia language, and shouldn't appear in a list of 8 either.

  • Spanish accounts for 7.1% of Wikipedia readership, but makes up 18.6% of survey responses, a 2.62x over representation.

  • German accounts for 6.6% of Wikipedia readership, but makes up 6.3% of survey responses, a 0.96x under representation.

  • English accounts for 45% of Wikipedia readership, but makes up 30.8% of survey responses, a 0.68x under representation.

The 3rd largest Wikipedia readership, Russian, is completely unsampled. Japanese (5th), Chinese (7th), and a long list of rare (but cumulatively significant) languages are also absent. All told, 26.4% of Wikipedia readership was left out. There is no way to truly fix this absence, but we can recalculate the over/under representation as if this 26.4% of the world didn't exist. Restricting our work to the 8 sampled languages, we get the following adjusted normalization factors:

Hungarian is over represented by 13.07x

Dutch is over represented by 6.08x

Catalan is over represented by 5.26x

French is over represented by 3.05x

Portuguese is over represented by 1.85x

Spanish is over represented by 1.77x

German is under represented at .65x (The survey counted the German Readership as less than 2/3rds of a person each)

English is under represented at .46x (The survey counted English Readership as less than half a person each.)
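For anyone who wants to recheck the factors, here is a minimal Python sketch. It assumes the readership and survey percentages quoted in this comment are the only inputs, and that the adjusted factor is the survey share divided by the readership share after rescaling readership to the 8 sampled languages. The function and variable names are made up for illustration; this is not necessarily the exact procedure used above.

```python
# Shares in percent, as quoted in the comment above.
READERSHIP = {
    "Hungarian": 0.3, "Dutch": 1.0, "Catalan": 0.09, "French": 4.7,
    "Portuguese": 2.8, "Spanish": 7.1, "German": 6.6, "English": 45.0,
}
SURVEY = {
    "Hungarian": 5.8, "Dutch": 9.0, "Catalan": 0.7, "French": 21.2,
    "Portuguese": 7.7, "Spanish": 18.6, "German": 6.3, "English": 30.8,
}


def representation_factors(readership, survey, restrict_to_sampled=False):
    """Return survey share / readership share for each surveyed language.

    With restrict_to_sampled=True, readership shares are first rescaled so
    that the sampled languages sum to 100% (unsampled languages are ignored).
    """
    # Denominator for the readership shares: all readers (100%) or only
    # the readers of the sampled languages.
    base = sum(readership[lang] for lang in survey) if restrict_to_sampled else 100.0
    return {
        lang: survey[lang] / (readership[lang] / base * 100.0)
        for lang in survey
    }


raw = representation_factors(READERSHIP, SURVEY)
adjusted = representation_factors(READERSHIP, SURVEY, restrict_to_sampled=True)

for lang in SURVEY:
    print(f"{lang:<11} raw {raw[lang]:6.2f}x   adjusted {adjusted[lang]:6.2f}x")
```

A factor above 1 means the language is over represented in the survey, below 1 under represented. Run on the quoted shares, the adjusted values agree with the list above to within rounding.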

When the raw survey data is reweighted using the above factors to match our actual readership, I get the following final survey totals:

  • Useful for viewing images and learning about them: 39%

  • Not useful for viewing images and learning about them: 50%

  • Not Sure: 10%
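The reweighting step itself can be sketched as follows. This is a generic post-stratification example, not the exact calculation behind the totals above: the per-language response counts are not reproduced on this page, so the two groups "A" and "B" and all their numbers are invented purely to show the mechanics. The function name is hypothetical too.

```python
def reweighted_shares(responses, population_share):
    """Post-stratified answer shares.

    responses:        {stratum: {answer: count}}, at least one response per stratum
    population_share: {stratum: share of the real population}
    Each stratum's answer proportions are weighted by its population share,
    rescaled so the sampled strata sum to 1.
    """
    total_share = sum(population_share[s] for s in responses)
    result = {}
    for stratum, counts in responses.items():
        n = sum(counts.values())
        weight = population_share[stratum] / total_share
        for answer, count in counts.items():
            result[answer] = result.get(answer, 0.0) + weight * count / n
    return result


# Invented toy data: group A is 75% of readers but only 100 of 400 responses.
toy_responses = {
    "A": {"yes": 40, "no": 60},
    "B": {"yes": 240, "no": 60},
}
toy_population = {"A": 0.75, "B": 0.25}

raw_yes = (40 + 240) / 400  # unweighted share of "yes" answers
weighted = reweighted_shares(toy_responses, toy_population)
print(f"raw yes: {raw_yes:.0%}, reweighted yes: {weighted['yes']:.0%}")
```

In this toy case the unweighted "yes" share is 70%, but it drops to 50% once the undersampled majority group is given its proper weight, which is the same kind of shift described here.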

The fairly dramatic shift in percentages is due to the unfortunate coincidence that the two languages where MediaViewer was heavily rejected were both grossly undersampled, the fact that these two languages alone comprise more than half of all Wikipedia readership (!), and the fact that there was wild over representation of fringe languages where MediaViewer polled positively. I couldn't even begin to speculate why MediaViewer polled so much higher in fringe languages, but whatever the reason, the polling methodology generated very misleading results. The conspicuous absence of the major languages Russian, Japanese, and Chinese also undermines the value of the results, but probably not by more than a few points.

As a significant side note, this reexamination of survey data may help explain some of the division between the WMF's positive office view of MediaViewer statistics and wiki editors' "in the field" view that MediaViewer is worse than our normal image page.

I'm a math geek, but renormalizing survey data isn't something I've had to do very often, chuckle. I heartily invite anyone to recheck my results. I only used two significant figures in some steps, but any rounding effects should be negligible compared to the noise inherent in the original data. Alsee (talk) 23:47, 23 August 2014 (UTC)

Thanks for the calculations: I've not reviewed the method or numbers, but it's true this is common practice and it was useful. I have a doubt now: what's the result if you consider number of active editors instead of page views? The reader doesn't exist, but responding to a survey is definitely some form of activity, probably comparable to at least one edit: well, en.wiki had, in June, 104032 users making 1 edit vs. 206280 for all Wikipedias. I guess it doesn't change much compared to portion of visits, but for some wikis the difference may be larger. --Nemo 07:29, 29 August 2014 (UTC)
@Nemo I did a few preliminary calculations. Catalan has an exceptionally high proportion of editors (or an exceptionally low readership), German has a notably low proportion of editors (or a notably high readership), but overall it just doesn't matter much. Across all languages, larger readerships consistently have larger editorships. Across all languages, editors are consistently more negative on Media Viewer. (Maybe readers are more likely to interpret "Is it useful" as asking "Does it work", whereas editors are more likely to interpret the question as asking "Is it better".) Once you account for the horribly skewed sampling in the survey, any tweaks in interpretation don't substantially alter the final results. Alsee (talk) 01:48, 6 September 2014 (UTC)
Thanks for validating the gut feeling. --Nemo 07:49, 6 September 2014 (UTC)

Survey Results Update

Hi Alsee, I appreciate your perspective on the survey results, but think it is inappropriate for you to edit the Media Viewer survey results page to present your own interpretation of the approval ratings, based on weighted results — and to keep removing our original findings based on unweighted results.

I am not comfortable with these edits, because they unfairly favor your interpretation of the data over our original interpretation, when both are reasonable ways to analyze the data. So I have updated the survey results to present both weighted and unweighted approval ratings, in order to strive for a neutral point of view and present both sides of the analysis. And to avoid confusion, I have removed both your graph and ours, inviting readers to read the text notes instead. I believe this is a reasonable compromise that presents both perspectives in this matter, without taking sides.

More importantly, I have noted that these approval ratings should not be cited as conclusive evidence for this optional survey, as they are subject to self-selection bias, as explained in detail in these research clarifications. Keep in mind that the primary purpose of the survey was to collect user comments to improve the tool -- not to use approval ratings as a definitive measure of success. Though these ratings can point to some informative trends, they are not very reliable because the survey was not mandatory.

So I encourage all parties in this discussion to refrain from citing these approval ratings as primary evidence, particularly given that they are for a much older version of the software, which was very different than the new version. The new behavioral metrics we have been citing in our recent updates are much more reliable, and we recommend focusing our discussions on this new data instead. Thanks for your understanding. Fabrice Florin (WMF) (talk) 23:02, 9 October 2014 (UTC)

Fabrice Florin (WMF), Thanks for responding. I accept your compromise version. I would merely like to add a comment for your consideration. It would be silly at best, and deceptive at worst, to say "90% of global respondents consider combs to be not-useful" after surveying 900 bald people and 100 non-bald people. I hope you can see the problem there, chuckle.

Maybe I'm being sensitive, but I'd like to defend myself when you called my edits "inappropriate", and that I "keep removing". We are talking about only two instances. My first edit was a good-faith attempt to improve the clarity and value of information on the page. You seem to agree that my efforts were not unreasonable, you agreed to accept the substance of my edit into the page. The second instance was a revert to repair a conflict between the image and the text. Normally such an edit would be made immediately with merely an edit summary or a comment on the talk page. Out of an abundance of respect for the WMF's page, and a desire to avoid an appearance of warring over the page, I left a comment on your page and waited for you to comment on the issue or to fix it. After leaving the problem on the page for a week I fixed it myself without any apparent opposition. You seem to agree some sort of fix was appropriate, you just applied a different fix.

The only thing I see as possibly "inappropriate" is that I was bold enough to edit the page at all. But if reasonable good faith edits are viewed as "inappropriate" then I have to wonder why the WMF website is a wiki at all. Alsee (talk) 01:50, 10 October 2014 (UTC)
Indeed "inappropriate" sounds unwarranted. I've read the update on the other talk page but I don't see how it's related to this survey. If this survey stops being referenced to say things it doesn't prove, however, all the better. --Nemo 12:37, 10 October 2014 (UTC)