Topic on Talk:Reading/Readers contributions via Android

Let's look at Commons: what does the affected project say about this?

Sänger (talkcontribs)

I'd like to show you this thread, and will cite the (at this time) last post in there:

Totally inadequate. Proponents need to explain what they've learned from c:Commons:Village pump/Archive/2013/04#Mobile Web Uploads turned off in stable, c:Commons:Village pump/Archive/2013/04#Missing author/source parameters on mobile uploads: fix coming, c:Commons:Village pump/Proposals/Archive/2016/08#Rfc: Should we request a configuration change to shut down cross-wiki uploads?, c:Commons:Mobile access/Mobile upload needing check#Background and c:Category:MobileUpload-related deletion requests and how they plan to avoid making the same mistakes all over again. LX (talk, contribs) 12:22, 23 January 2017 (UTC)

Who came up with this new selfie-bomb-device this time? Grüße vom Sänger ♫(Reden) 14:57, 23 January 2017 (UTC)

Melamrawy (WMF) (talkcontribs)

@Sänger, thanks for keeping an eye on this and for relaying part of the Commons feedback here. To echo what @Jkatz (WMF) says: these are ideas, and we are sharing them in an elaborate, visual way because visuals are clearer than text alone. The idea is to listen to feedback and then decide whether to reject, iterate on, radically change, or simply continue developing a given idea. This is not meant to "sneakily" push certain ideas; we just want to start the conversation very early, in the "thinking" phase, as a better practice of collaborative planning. Let's not panic :) Ideas don't hurt, as long as they remain ideas and are not decided upon without discussion. Grüße vom :)

Jkatz (WMF) (talkcontribs)

@Sänger Thank you for cross-posting this and alerting us to the activity on the other thread. We posted on Commons precisely to make sure we got this kind of feedback: many of us here at the Foundation have heard about the various rounds of selfie-pocalypse, but don't have all of the details. Thanks to @LX for the list; I totally agree with your stated requirement. The reason we are talking so early in the development process, at the idea stage, is to make sure we are aware of all of the details and specific concerns before we take any further steps. This isn't something we are committed to doing, just something we wanted to explore. Along those lines, the current wireframes are not intended to be prototypes, but simply ways of illustrating the ideas, because we had heard that the textual descriptions were inadequate. I also want to recognize your reasonable frustration that we're talking about this again. I think the reason this comes up time and time again is that it represents a meaningful opportunity, if we can address the issues raised. I haven't read through them yet, so I will refrain from further comment until I do. Rest assured, these are not being built and will not be built until we acknowledge and address the core issues raised. I'll post this to the Commons thread as well.

197.218.80.182 (talkcontribs)

This is a matter of lack of resources. It isn't simply a thing that can be solved by better tooling. Here's why:

  • Copyright is not clear cut - Even experienced lawyers may need to do extensive research for days to determine the copyright status of certain items
  • Users lack training - the notion that one can just wake up one day and understand all the ins and outs of copyright is naive.

This is quite simply a case where people have bitten off more than they can chew, and this can be frustrating. To give some examples of the scale of work:

Content type  Number  Average size  Type of work          Time taken
Articles      20      300 words     Proofreading          400 minutes
Video         20      120 minutes   Check for usefulness  > 2000 minutes
Articles      20      300 words     Copyright checking    > 600 minutes (less with automation)
Video         20      120 minutes   Copyright checking    > 2500 minutes
Images        20      500 KB        Check for usefulness  < 30 minutes
Images        20      500 KB        Check for copyright   > 60 minutes

Now, these numbers may seem like they've been pulled out of nowhere, and they could be somewhat off. But they simply serve to illustrate that, no matter what degree of automation one has, a whole video needs to be viewed to verify that no frame infringes copyright; the same applies to audio. Images such as GIFs may contain more than one frame that could be copyrighted. Then there's also the prohibitive bandwidth cost, e.g. +/- 20 GB (20 videos) vs +/- 2 MB (20 articles).
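To make the scale comparison concrete, here is a minimal sketch that turns the estimates quoted above into per-item figures. It assumes the bounds (such as "> 2000 minutes") can be treated as plain numbers and that 1 GB = 1000 MB; the names `ESTIMATES` and `minutes_per_item` are made up for this illustration, and the inputs are the poster's rough estimates, not measured data.

```python
# Rough per-item review cost, using the estimates quoted in the post above.
# Assumption: bounds such as "> 2000" are treated as plain numbers.
ESTIMATES = [
    # (content type, type of work, item count, total minutes)
    ("Articles", "Proofreading", 20, 400),
    ("Video", "Check for usefulness", 20, 2000),
    ("Articles", "Copyright checking", 20, 600),
    ("Video", "Copyright checking", 20, 2500),
    ("Images", "Check for usefulness", 20, 30),
    ("Images", "Check for copyright", 20, 60),
]


def minutes_per_item(count, total_minutes):
    """Average review time per item, in minutes."""
    return total_minutes / count


per_item = {
    (kind, work): minutes_per_item(count, total)
    for kind, work, count, total in ESTIMATES
}

for (kind, work), minutes in per_item.items():
    print(f"{kind:8} {work:22} {minutes:6.1f} min per item")

# Bandwidth comparison from the same post: about 20 GB for 20 videos
# versus about 2 MB for 20 articles (assumption: 1 GB = 1000 MB).
video_mb = 20 * 1000
article_mb = 2
bandwidth_ratio = video_mb / article_mb
print(f"Videos need roughly {bandwidth_ratio:,.0f}x the bandwidth of articles")
```

Under these assumptions, copyright-checking one video costs about as much reviewer time as checking roughly four articles or about forty images.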

This is the reason why most free video / image upload services rely on reports for takedowns rather than checking every single upload. Even if the Commons community got 100 more contributors, better tools, and WMF completely disabled uploads for a month, they wouldn't be able to completely clean up the existing backlog. There is also the political debate that while content may be free it may not be wanted there, e.g. a selfie may be in the public domain, yet the "community" decides it doesn't want it.

The issue here is that media curation needs many more contributors (possibly 1000s) AND easy-to-use media curation tools.

Jkatz (WMF) (talkcontribs)

@"IP address", do you happen to have a source for this? I couldn't find it and would love to see additional context. Thanks!

197.218.88.75 (talkcontribs)

Ha, interesting, this topic is still ongoing. These were mostly common-sense facts that I evaluated using simple estimates of the cost of reviewing text vs the cost of reviewing media. Anyway, as an individual who has done academic research in the past, it is always enjoyable to provide evidence for a hypothesis. So below are some articles from non-verifiable sources and others from what seem like legitimate academic sources; they provide evidence that those assertions may even have been too conservative:

Non-academic (but recommended reading for all Wikimedia staff members)

Academic

Media content

Text based content

The discussion on The Verge clearly notes how many thousands of man-hours and contributors are needed to clean up multimedia content (both day-shift and night-shift workers). The text-based studies on plagiarism, and by extension copyright infringement (if enough is copied), mention that "the average time for finding a match was 3.8 minutes"[1].

Note that some of the academic sources are full theses that highlight in great detail how simple measures are ineffective at truly detecting copyright infringements, and how attempting to curate content by forcibly allowing only free content is time-intensive and ends up ignoring legitimately useful and free content.

Even Wikia (with thousands of wikis) has something [2] to review images. No matter how desperately Wikimedia wishes to take a laissez-faire approach to media curation, such an approach will come back to bite the organization in the end, as it did in the past; it needs proper, well-trained staff reviewers.

As lawyers (in movies) like to say, this undeniably provides reliable evidence to prove the point, and "I rest my case".

[1] - http://search.proquest.com/openview/f8d85f41657225e2ed943f3cc0455166/1?pq-origsite=gscholar&cbl=35369

197.218.88.75 (talkcontribs)

It is also easy to provide evidence for the note above about frames. A study [1] looked into 2000 videos for harmful content and captured 630,733 frames at 10-second intervals. The authors then manually (using humans) identified that "the proportion of harmful images contained in the harmful videos is 66%". This means that they had to view at least 5.3 minutes per video, and their videos were about 53 minutes long on average. That doesn't count download time or waiting on the connection speed. Depending on resolution, a video may be around 50 MB, or about 1 GB to download just 20 videos.
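The figures in the paragraph above can be checked with a short sketch. The video count, frame count, sampling interval and 50 MB size are taken from the post; the one-second-per-frame viewing time is an assumption introduced here to show one way the 5.3-minute figure could arise, and 1 GB is taken as 1000 MB.

```python
# Figures quoted in the post above.
VIDEOS = 2000
FRAMES = 630_733
SAMPLE_INTERVAL_S = 10  # one frame captured every 10 seconds

# Assumption (not stated in the post): a reviewer spends about one second
# looking at each sampled frame.
SECONDS_PER_FRAME_VIEWED = 1

frames_per_video = FRAMES / VIDEOS
avg_video_minutes = frames_per_video * SAMPLE_INTERVAL_S / 60
review_minutes = frames_per_video * SECONDS_PER_FRAME_VIEWED / 60

# Download estimate from the post: around 50 MB per video, 20 videos
# (assumption: 1 GB = 1000 MB).
MB_PER_VIDEO = 50
download_gb_20 = 20 * MB_PER_VIDEO / 1000

print(f"Sampled frames per video: {frames_per_video:.1f}")
print(f"Average video length:     {avg_video_minutes:.1f} minutes")
print(f"Review time per video:    {review_minutes:.1f} minutes")
print(f"Download for 20 videos:   {download_gb_20:.1f} GB")
```

With these inputs the average video length comes out at roughly 53 minutes and the review time at roughly 5.3 minutes per video, matching the numbers quoted.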

The differences are that they took a sample of the videos, and humans can't possibly analyse it that fast, as Wikimedia / MediaWiki extensions provide no way of skipping through video like that. Also, by legal standards, even if the whole video contains 100% free images, someone could stash 120 minutes of several copyrighted albums in it. In fact, the study concluded that audio analysis was one relevant area to look into for future research.

[1] - https://www.researchgate.net/publication/226536139_A_Comparative_Study_of_the_Objectionable_Video_Classification_Approaches_Using_Single_and_Group_Frame_Features

Jkatz (WMF) (talkcontribs)

Thank you for this! You can rest your case ;)

Jkatz (WMF) (talkcontribs)

Thanks for this. It definitely helps me understand the scope of the problem. I think the close-to-zero tolerance policy for potential copyright violations is very onerous. It's a topic for another thread, but I wonder if we are all okay with the cost (in terms of labor, limited contributions, and upset innocents) of such a stringent approach. The reliance on user reports for takedowns seems to work legally for other orgs, but I know we have higher standards. Perhaps there is something in between.
