Casual uploading has been tried before both for mobile and through the cross-uploader in VE, and caused outrage from the Commons community each and every time. Have you reviewed the previous discussions? What would your implementation bring to the table that hasn't been tried before? One idea that could be tested would be not to allow the user to upload a previous picture, but only an image taken by the app accessing the camera, but that would not really solve copyright issues in non-FoP-friendly countries.
Talk:Reading/Readers contributions via Android
Hi @Strainu. Thank you! We are familiar with the fact that there has been outrage about content and copyright violations in the past, but I have not been able to find the specific discussions. You seem to have a strong grasp, however. Can you send links or convey your understanding of the specifics?
You can start with phab:T120867 and related bugs and read the links given there. Not all of them will be relevant, of course, but you will most likely get a good picture of previous attempts and why they are considered as failed by the Wikimedia Community.
One of the best ways to obtain new uploads for articles would be based on wikidata. For example, if someone reads an article about a Tree in Brazil that doesn't have an image, the interface itself could urge the reader to help out by sharing an image of the tree if they are close to it (geoip).
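A minimal sketch of the "reader is close to an article that lacks an image" check, assuming each item carries coordinates and a flag for whether it already has an image. The record fields (`title`, `lat`, `lon`, `has_image`), the sample coordinates and the 5 km radius are all hypothetical, and a geoip location is usually far coarser than the example suggests.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, in kilometres."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

def nearby_items_without_image(items, reader_lat, reader_lon, max_km=5.0):
    """Titles of image-less items within max_km of the reader, nearest first."""
    hits = []
    for item in items:
        if item["has_image"]:
            continue  # nothing to ask the reader for
        d = haversine_km(reader_lat, reader_lon, item["lat"], item["lon"])
        if d <= max_km:
            hits.append((d, item["title"]))
    return [title for _, title in sorted(hits)]

# Hypothetical sample data: two nearby items and one far away.
items = [
    {"title": "Tree A", "lat": -22.9100, "lon": -43.1700, "has_image": False},
    {"title": "Tree B", "lat": -22.9100, "lon": -43.1700, "has_image": True},
    {"title": "Tree C", "lat": -3.1000, "lon": -60.0000, "has_image": False},
]
suggestions = nearby_items_without_image(items, -22.9068, -43.1729)
```

Only "Tree A" qualifies here: "Tree B" already has an image and "Tree C" is far outside the radius.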
One way to reduce the copyright issues is, for example, to allow even anonymous editors to suggest images from public domain / free resources for a particular article, e.g. Pixabay, Flickr, etc.
Instead of immediately adding them to an article, they'd be placed in a queue that editors can choose to use, if deemed useful.
@22.214.171.124 - Interesting idea. We hadn't thought of connecting to other public domain or other free resource APIs as a tool for pulling in individual images. Something to look into for sure.
Hey there, the ability to look for a more suitable image from Wikimedia Commons rather than uploading from the user's own gallery is something we have incorporated in the mocks as an option (see wireframe: https://wikimedia.invisionapp.com/share/8H9YRVDZF) but we could probably flesh out this use case further given it is potentially as/more likely that an image may exist already in Commons for an article.
@126.96.36.199 – agree with your point as well regarding Moderation. The proposal for both the image and audio contribution ideas is to indeed add a 'moderation queue' to the mobile app first for acceptance/rejection rather than allowing immediate contribution. I've updated the page to show the specific wireframes detailing how the queue could work.
There is one interesting finding of this academic study of video uploads :
It shows that media popularity is a good signal for detecting copyright infringement. Illegal media, or even just an ordinary movie or piece of music, will likely be shared somewhere, and if it becomes popular there will be an odd surge of views and downloads.
Currently contributors have no way to even filter media (e.g. movies, images or audio) by any metrics that would facilitate detection of copyright violations. So one possibility is simply to surface and prioritize such popular media, so readers and editors can quickly help evaluate it. Wikimedia seems to have this data, yet it is not surfaced anywhere.
Exposing such data (e.g. through an API or special page) would even help desktop editors prioritize their efforts. In this specific case it would help prioritize audio for review because there is a reasonable chance that a sudden surge of popular audio (downloaded by users) is related to some copyrighted music.
The link doesn't work, but it's a cool idea for videos and music--it probably doesn't apply to the random image that happens to have a coca-cola logo in the background. I wonder if @Halfak (WMF) and his team of learned machines would be interested in tackling this?
That's odd, it can still be downloaded from here. The study looked into various indicators of copyright including the video metadata and search results. It is certainly an interesting study on automatically detecting video copyright violations:
Swati Agrawal and Ashish Sureka. 2013. Copyright Infringement Detection of Music Videos on YouTube by Mining Video and Uploader Meta-data. In Proceedings of the Second International Conference on Big Data Analytics - Volume 8302 (BDA 2013), Vasudha Bhatnagar and Srinath Srinivasa (Eds.), Vol. 8302. Springer-Verlag New York, Inc., New York, NY, USA, 48-67. DOI=http://dx.doi.org/10.1007/978-3-319-03689-2_4
This can also apply to images though. If someone uploads a copyrighted image, say of some nude famous individual, and shares it somewhere, its views may quickly increase. It can even be used to fight more sophisticated copyright infringers who embed computer games or other content inside images or other seemingly unpopular media (see T12847).
Some practical uses include detecting seemingly innocuous files that were uploaded ages ago, and then are suddenly downloaded at a very high rate, a new filtering tool to facilitate editor moderation (on the desktop), and scoring search results.
Specifically, for reader contributions this would fit in rather well by prioritizing some images or audio with very high views in the moderation queue, instead of adding all files to the queue randomly or by upload date.
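As a rough sketch of that prioritization, assuming per-file daily view counts were available (the post above notes they are not currently surfaced): score each file by how far its latest day exceeds its own recent baseline, then review the highest scores first. The `daily_views` field, the scoring formula and the sample numbers are all hypothetical, not an existing Wikimedia API.

```python
from statistics import median

def surge_score(daily_views):
    """Latest day's views relative to the median of the earlier days."""
    *history, latest = daily_views
    baseline = median(history) if history else 0
    return latest / (baseline + 1)  # +1 avoids dividing by zero for unseen files

def prioritize_queue(queue):
    """Order the moderation queue so files with a sudden surge come first."""
    return sorted(queue, key=lambda f: surge_score(f["daily_views"]), reverse=True)

# Hypothetical queue entries with made-up view counts (oldest day first).
queue = [
    {"name": "quiet_landscape.jpg", "daily_views": [3, 4, 2, 3, 4]},
    {"name": "old_audio.ogg", "daily_views": [1, 0, 2, 1, 950]},
    {"name": "steady_popular.png", "daily_views": [500, 520, 480, 510, 530]},
]
ordered = [f["name"] for f in prioritize_queue(queue)]
```

Comparing against the file's own baseline rather than raw views means a file that has always been popular is not flagged, while a long-dormant file that is suddenly downloaded at a high rate goes to the front of the queue.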
There are many practical uses of this data.
Let's look at Commons: what does the affected project say about this?
I'd like to show you this thread, and will cite the (at this time) last post in there:
Totally inadequate. Proponents need to explain what they've learned from c:Commons:Village pump/Archive/2013/04#Mobile Web Uploads turned off in stable, c:Commons:Village pump/Archive/2013/04#Missing author/source parameters on mobile uploads: fix coming, c:Commons:Village pump/Proposals/Archive/2016/08#Rfc: Should we request a configuration change to shut down cross-wiki uploads?, c:Commons:Mobile access/Mobile upload needing check#Background and c:Category:MobileUpload-related deletion requests and how they plan to avoid making the same mistakes all over again. —LX (talk, contribs) 12:22, 23 January 2017 (UTC)
@Sänger, thanks for keeping an eye and making sure you deliver part of the feedback from Commons here. Just to echo what @Jkatz (WMF) says, these are ideas, and we are sharing them in an elaborate and visual way because visual is clearer than just text. The idea is to listen to feedback, then decide whether we need to reject, iterate, radically change, or just continue to develop a certain idea. This is not meant to "sneakily" push certain ideas; it is just our need to start a conversation very early, in the "thinking" phase, as a better practice of collaborative planning. Let's not panic :) ideas don't hurt, as long as they are ideas, and are not decided upon without discussion. Grüße vom :)
@Sänger Thank you for cross-posting this and alerting us to the activity on the other thread. We posted on Commons exactly to make sure we got this kind of feedback. Many of us here at the foundation have heard about the various rounds of selfie-pocalypse, but don't have all of the details. Thanks to @LX for the list, and I totally agree with your stated requirement. The reason we are talking so early in the development process, at the idea stage, is to make sure we are aware of all of the details and specific concerns before we take any further steps. This is not something we are committed to doing, just something we wanted to explore. Along those lines, the current wireframes are not intended to be prototypes, but simply ways of illustrating the ideas because we had heard that the textual descriptions were inadequate. I also want to recognize your reasonable frustration that we're talking about this again. I think the reason this comes up time and time again is that it represents a meaningful opportunity--if we can address the issues raised. I haven't read through them yet, so will refrain from further comment until I do. Rest assured, these are not being built and will not be built until we acknowledge and address the core issues raised. I'll post this to the Commons thread as well.
This is a matter of lack of resources. It isn't simply a thing that can be solved by better tooling. Here's why:
- Copyright is not clear cut - Even experienced lawyers may need to do extensive research for days to determine the copyright status of certain items
- Users lack training - the notion that one can just wake up one day and understand all the ins and outs of copyright is naive.
This is quite simply a case where people have bitten off more than they can chew, and this can be frustrating. To give some examples on the scale of work:
| Content type | Number | Average size | Type of work | Time taken |
| Articles | 20 | 300 words | Proofreading | 400 minutes |
| Video | 20 | 120 minutes | Check for usefulness | > 2000 minutes |
| Articles | 20 | 300 words | Copyright checking | > 600 minutes (less with automation) |
| Video | 20 | 120 minutes | Copyright checking | > 2500 minutes |
| Images | 20 | 500 KB | Check for usefulness | < 30 minutes |
| Images | 20 | 500 KB | Check for copyright | > 60 minutes |
Now these numbers seem like they've been pulled out of nowhere and they could be somewhat off. But they simply serve to illustrate that no matter what degree of automation one has, a whole video needs to be viewed to verify that no frame infringes on copyright; the same applies to audio. Images like GIFs may contain more than one frame that could be copyrighted. Then there's also the prohibitive bandwidth cost, e.g. +/- 20 GB (20 videos) vs +/- 2 MB (20 articles).
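To make the scale easier to compare, the table's totals can be turned into minutes per item. This is only arithmetic on the estimates above (which the poster says may be somewhat off); the bounds (">" and "<") are carried through as labels rather than exact values.

```python
# (content type, type of work, number of items, total minutes, bound as given in the table)
rows = [
    ("Articles", "Proofreading", 20, 400, ""),
    ("Video", "Check for usefulness", 20, 2000, ">"),
    ("Articles", "Copyright checking", 20, 600, ">"),
    ("Video", "Copyright checking", 20, 2500, ">"),
    ("Images", "Check for usefulness", 20, 30, "<"),
    ("Images", "Check for copyright", 20, 60, ">"),
]

# Minutes of reviewer time per single item, keyed by (content type, type of work).
per_item = {(kind, work): total / count for kind, work, count, total, _ in rows}

for kind, work, _, _, bound in rows:
    print(f"{kind:8} {work:22} {bound}{per_item[(kind, work)]:g} min per item")

# How much more a video copyright check costs than an image copyright check.
video_vs_image = per_item[("Video", "Copyright checking")] / per_item[("Images", "Check for copyright")]
```

On these estimates a single video copyright check takes over two hours, roughly forty times the cost of checking one image.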
This is the reason why most free video / image upload services rely on reports for takedowns rather than checking every single upload. Even if the Commons community got 100 more contributors, better tools, and WMF completely disabled uploads for a month, they wouldn't be able to completely clean up the existing backlog. There is also the political debate that while content may be free it may not be wanted there, e.g. a selfie may be in the public domain, yet the "community" decides it doesn't want it.
The issue here is that media curation needs many more contributors (possibly 1000s) AND easy to use media curation tools.
@"IP address", do you happen to have a source for this? I couldn't find it and would love to see additional context. Thanks!
Ha, interesting, this topic is still ongoing. These were mostly common sense facts that were evaluated using simple estimates of the cost of reviewing text vs the cost of reviewing media. Anyway, as an individual who has done academic research in the past, it is always enjoyable to provide evidence for a hypothesis. So below are some articles from non-verifiable sources, and others from what seem like legitimate academic sources, that provide evidence that those assertions may have even been too conservative:
Non-academic (but recommended read) for all wikimedia staff members
- Copyright in the Real World: Making Archival Material Available on the Internet
- Detecting Copyright Infringement on YouTube Videos using YouTube Metadata
Text based content
The discussion on the verge site clearly notes how many thousands of man-hours and contributors are needed to clean up multimedia content (both day shift and night shift workers). The text based studies on plagiarism, and by extension copyright infringement (if enough is copied), mention that "the average time for finding a match was 3.8 minutes".
Note that some academic sources are full theses, and in great detail highlight how simple measures are ineffective at truly detecting copyright infringements, and how attempting to curate content by forcibly allowing only free content is time intensive and ends up ignoring legitimately useful and free content.
Even wikia (with thousands of wikis) has something  to review images. No matter how desperately wikimedia wishes to take a laissez-faire approach to media curation (it needs proper well trained staff reviewers) such an approach will come back to bite the organization in the end, as it did in the past.
As lawyers (in movies) like to say, this undeniably provides reliable evidence to prove the point, and "I rest my case".
It is also easy to provide evidence for the note above about frames. A study looked into 2000 videos for harmful content, and captured 630,733 images at 10 second intervals. They then manually (using humans) identified that "the proportion of harmful images contained in the harmful videos is 66%". This means that they had to view at least 5.3 minutes per video, and their videos were about 53 minutes long on average. That doesn't count download time and waiting on the connection speed. Depending on resolution it may be around 50 MB per video, or 1 GB to download just 20 videos.
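The figures in that paragraph can be reproduced with a few lines of arithmetic. The one-second-per-image viewing rate is an assumption made here to recover the 5.3 minute figure; the post does not state the rate it used.

```python
videos = 2000             # videos in the study, as quoted above
images = 630_733          # images captured across all videos
interval_s = 10           # one image captured every 10 seconds
seconds_per_image = 1.0   # ASSUMPTION: a reviewer spends about one second per image

images_per_video = images / videos
avg_length_min = images_per_video * interval_s / 60         # average video length
review_min_per_video = images_per_video * seconds_per_image / 60

mb_per_video = 50         # rough size per video, from the post
download_gb_for_20 = 20 * mb_per_video / 1000

print(f"{images_per_video:.0f} images per video")
print(f"about {avg_length_min:.0f} minutes of video on average")
print(f"about {review_min_per_video:.1f} minutes of viewing per video")
print(f"{download_gb_for_20:g} GB to download 20 videos")
```

Under that assumption the sampled review already costs about a tenth of the running time of each video, before any download or buffering time is counted.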
The differences are that they took a sample of the videos, and humans can't possibly analyse it that fast as Wikimedia / mediawiki extensions provide no way of skipping through video like that. Also by legal standards, even if the whole video contains 100% free images, someone could stash 120 minutes of several copyrighted albums. In fact, the study concluded that audio analysis was one relevant area that they could look into for future research.
Thank you for this! You can rest your case ;)
Thanks for this. It definitely helps me understand the scope of the problem. I think the close-to-zero tolerance policy for potential copyright violations is very onerous. It's a topic for another thread, but I wonder if we are all okay with the cost (in terms of labor, limited contributions, and upset innocents) of such a stringent approach. The reliance on user reports for takedowns seems to work legally for other orgs, but I know we have higher standards. Perhaps there is something in between.
Awkward user model
I've been thinking about commenting on "reader contributions" for weeks, but I keep putting it off because I want to be positive, and because it's hard to explain what I'm trying to say. But I'll try.
The current Wikipedia model (basically) has two categories of users:
- Readers: Simple. They come seeking information, they are hopefully well served, and they hopefully leave happy. Our core mission is accomplished.
- Editors: Editors have free access to powerful tools, editors are largely trusted to do almost anything at will.
Why it works:
- Editing tools are publicly accessible, but they are presented with low-profile links in a basically Pull technology. Someone who isn't particularly interested in editing will ignore the links. If they do randomly explore the links they'll realize it's not what they're looking for. They move on. The tools are generally only used by people who take a serious interest in the project.
- We expect to be able to communicate with other editors. That is how we teach and socialize them.
- People who make bad edits generally quit, or eventually get blocked. Either way, the inflow of bad contributions is terminated. That quality review happens in the natural course of editor-editor interaction.
- We don't complain (much) about poor edits from newbies because it's part of the learning process. If they stick around, we expect them to start learning policies and how we work. It's the on-ramp to becoming a skilled, powerful, and trusted editor. One experienced, skilled, trusted editor is more valuable than a horde of contributions by a horde of random internet yahoos.
The Readers Contributions idea basically suggests creating a third class of user.
- The "reader tools" are closer to a Push technology. Some prominent call-to-action is splashed on top of each article served, actively soliciting contributions from people who didn't come to Wikipedia to contribute.
- There is zero expectation for these users to communicate. Anyone who is unwilling or unable to participate in discussions is a problem... a problem that we can only fix with the BLOCK button.
- There is zero expectation that these users know anything.
- There is zero expectation that these users learn anything.
- There is zero expectation that these users have any competence.
- There is zero expectation that these users have any meaningful interest in the project.
- The target audience is explicitly "casual", with no real expectation of progress. The only avenue of progress is to abandon the "reader tools" and find the edit button & community pages.
- These users are walled-off from, or shielded from, the general powerful editing tools. (When in fact the tools are just a click or two away.)
- These users are treated like children who need extraordinary supervision by "real editors", because of the above points.
It's almost silly to impose elaborate "editor moderation" systems when these people could simply make the edit themselves by clicking the edit button. But those systems are necessary because of how these features are being presented to users, because of the audience you are targeting.
Perhaps it's not politically correct to say this out loud, but the average editor is more intelligent than the average internet commenter. Seriously... editors are people who think writing an encyclopedia is a fun hobby. That's a pretty unique group with a rather intellectual interest. That's a major reason that Article Feedback Tool failed. The average Article-feedback contribution was significantly below the average editor contribution, and half of the feedback contributions were below its own average. I wasn't active during Article-feedback, but I know some of the feedback was referred to as borderline illiterate. It wasn't worth spending editor time reviewing it for anything of value. Skilled editor time is better spent on editing.
I'll discuss probably the best proposal, and perhaps the worst proposal. You may be surprised by my comments on worst :)
Geolocation picture contributions. This is a high value contribution. This is not work that can be done, and be done better, simply by having a random experienced editor show up at the article. This is a case where it's worth it if one person takes the picture and they need someone else to add it to the article. This is a highly targeted call-to-action, which is likely to draw minimal garbage-uploads. This is also something where people can develop a real motivation and investment in the project. I can see this contributor seeking out many other important things to photograph, and perhaps exploring editing in general.
One of the proposals mentioned identifying typos/spelling-errors. The obvious point here, and the wrong point, is that it's a really lousy division of labor to have skilled-labor review those submissions to carry out trivial edits. That work is best done by an editor spotting the issue and simply doing it themselves. However there's a less obvious aspect here. Let's say there's a typo. Ninety-nine readers spot it, and do nothing. Then the next reader comes along, they see the obvious and trivial typo, and they think it's dumb to leave it there. They are motivated by that typo to try out the EDIT button. They find they're able to fix it, and they save the edit. They think "Oh wow! I just edited Wikipedia!". That's how people get hooked. That's how we get new editors. We want the reader to make the edit.
I've heard stories from the early days of Wikipedia where people would deliberately add typos and spelling errors to articles. They did it explicitly to bait readers into trying the edit button. Now we're so fast at cleaning up vandalism, and we're so good at cleaning up the trivial stuff, that we undermine our best on-ramp for new editors.
The last thing we would ever want is a dead-end interface for non-editors to summon an editor to fix a typo.
@Alsee Thank you for this thoughtful perspective. More than anything, I appreciate you prefacing it with the genuine desire to be supportive - that goes a long way. The dead-endedness of these explorations is the only thing I would push back against*, saying that the lack of progress for a reader-contributor is temporary--if this were successful we would want to look at moving people up the engagement ladder or through the funnel (choose your metaphor). The same goes for the walled-off garden.
Your point about the futility of creating a project for people of whom we have zero expectations is very good--I think in some ways we have lapsed into thinking about making the 'easiest' experience and your comment actually convinces me we need to be more proactive and targeted.
Here is why I think the answer is to be more targeted...not to abandon the concept. If we expect these users to be mindless idiots who don't care, I agree that this would be futile. However, I also believe there is a lot of human space between the <1% of the population that are capable of being encyclopedia editors and the garbage-spewers we see on most comment boards. We need to find a way to set barriers so that we let people in who can actively contribute--people we have serious expectations of. In other words, I don't think there are only two buckets of users: the <1% of people capable of using pull technology and becoming encyclopedia editors and the > 99% of zero-expectation-worthy users. I think within that 99% of non-encyclopedia writers is a spectrum and that maybe as low as 5% of the total population aren't encyclopedia writers, but have something to contribute and would be interested in contributing or may have even tried and been rebuffed because they didn't read 10 pages of rules. We get more than 1M people who give us donations every year, so we know that readers want to help beyond just getting their knowledge: the push model works for some people for some things. Even if it is just 500k people who might have something to offer, we will be adding value, provided we find the right tasks. I, for instance, love taking pictures and uploading them with the commons app, but I am not and never will be a big encyclopedia editor. We don't want or need the 100% of readers to contribute--we just need the 1% who love us and have something small to give. Based on your comment, I think the trick is how to weed them out besides having confusing or obtuse literature and controls--there have to be other ways, like fun tests where you prove your skills.
As to best and worst. I agree. The problem with the nearby images is that there are 1M entities in Wikipedia with geo-coordinates, and 750k of them already have images. So there are just 250k in English (presumably the largest)...I don't know if we can build a feature that has such a low opportunity. It might be that we need to identify when there is a picture that is too small, too old, etc.
*Philosophically, we could also argue about whether or not people who write useless things are less intelligent, less educated or depends, but that would be a long sidetrack.
Re "there are just 250k in English (presumably the largest)...I don't know if we can build a feature that has such a low opportunity." Adding images is an ongoing process - new articles without images keep being added and millions of new images flood in to commons. But if your ambition is only to add features where the opportunity is greater than the hundreds of thousands of Wikipedia articles without images then you might as well give up now. None of the other opportunities being discussed are close to that size, or that impact. Adding the first image to an article doesn't just double the number of people who open the thumbnail in search engines. In many cases it adds huge amounts of information to the article. Fixing typos for example isn't just a smaller opportunity, and one with a lot of competition from existing editors, but it doesn't make much impact on an article unless it is being viewed through Google translate. When you fix a typo on an article you are rarely fixing something that has persisted for more than a few months, but many of our articles without images have been so for years. There are some tasks such as adding alt text that are bigger in scale, but adding alt text only helps a small minority of readers. Referencing unreferenced information is bigger as an opportunity, but traditionally we have treated that as the advanced task for experienced editors, cracking that with an app that encourages people to fact check wikipedia would be brilliant but I'm not sure how to do it. Adding commons images to articles is by several orders of magnitude the biggest task anyone is envisaging doing by app.
@WereSpielChequers Thanks - this is useful perspective!
If you want big tasks that could be apped another would be upgrading images in use. To some extent this could be semi automated, there are lots of articles still illustrated by 50kb image files from up to a decade ago despite 5mb pictures now being available. Images can be very "sticky" once in use. In theory even a minor improvement in image quality is worthwhile, but there are sensitivities - some people have gone to a lot of effort to get photos they took into articles, and you need to be cautious about replacing such images unless you can clearly show that the replacement is "better". However some of our lowest quality images in use in articles are from some low quality mass imports such as most of the geograph import. For example most of the Geograph images are less than 100kb. Replacing one of them with another, better photo of the same object could be a big plus for the project, but you would also need to review the caption and the alt text, and of course you need a human to look at the images - sometimes the photos may be taken at the same place and even facing in the same direction, but if one image is illustrating an author then the low quality image of their grave is better to use than a high quality image of the church behind it.
Biggest and least contentious task that I know of is adding alt text. Most images don't have it, almost all images could do with some sort of additional information that isn't in the caption but which would be useful to the blind and other users of screen readers. The slight problem is that getting people to add useful alt text is a non trivial task.
I'm skeptical of the new tier of user, and hardcoding microtasks, but here's what I picture as the best chance for success.
You could have something like
You can help Wikipedia: Tasks available, which expands when clicked. That won't have as high engagement, but it would largely avoid "not interested" people playing with a call-to-action pasted on top of the article. The good aspect here is that you can always ramp up engagement of a successful product, rather than risking a poor contribution stream on a high profile product.
Another big thing is the potential to communicate. The community has pretty well abandoned Flow, and the WMF has pretty much abandoned any work on Talk pages. But what could a minimal solution look like?
They obviously need to be able to read their talk page. There's an option for "basic" and "advanced", where advanced lets them edit as usual. In "basic" they can tap
Reply to someone. The page is scanned for links to user pages, user_talk pages, and maybe user_contributions. You have just identified signatures, mentions, and possibly some harmless false-positives. You merge consecutive links to the same user as one. You overlay a
Reply on each. Then they can type plaintext. Tapping save adds the message to the end of the section like this:
@[[user:ReplyTo|]] This is the message. ~~~~
The user mention and the signature are added automatically. This doesn't let them put a reply into the middle of a thread, and they probably don't know how to indent, but it gets the job done. Formatted discussions are nice, but we can deal with a person who can append plaintext.
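The scan-and-append steps above can be sketched as follows. This is a minimal illustration rather than how any existing MediaWiki tool works: the regular expression is a simplification that only recognizes English namespace names, and the function names and sample section are hypothetical.

```python
import re

# Links to user pages, user talk pages, or Special:Contributions/<name>.
USER_LINK = re.compile(
    r"\[\[\s*(?:User(?:[ _]talk)?\s*:|Special:Contributions/)\s*([^|\]#/]+)",
    re.IGNORECASE,
)

def reply_targets(section_wikitext):
    """Usernames found in the section, in order, merging consecutive repeats."""
    targets = []
    for match in USER_LINK.finditer(section_wikitext):
        name = match.group(1).strip().replace("_", " ")
        if not targets or targets[-1] != name:
            targets.append(name)  # signature + talk link for one user count once
    return targets

def append_reply(section_wikitext, target, message):
    """Append a plaintext reply with an automatic mention and signature."""
    return f"{section_wikitext.rstrip()}\n@[[user:{target}|]] {message.strip()} ~~~~\n"

# Hypothetical section with two signed comments.
section = (
    "Is this right? [[User:Alice|Alice]] ([[User talk:Alice|talk]]) 10:00, 1 January 2017 (UTC)\n"
    ":I think so. [[User:Bob|Bob]] ([[Special:Contributions/Bob|contribs]]) 11:00, 1 January 2017 (UTC)\n"
)
targets = reply_targets(section)
updated = append_reply(section, targets[0], "Thanks, that helps.")
```

Each entry in `targets` would get its own Reply overlay. Harmless false positives, such as an ordinary link to a user page inside a comment, simply yield an extra overlay.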
Maybe show a preview of the section, with the new comment, before save.
If you want to get more involved, you could put a template on the page to activate our talk-archiver bot, to prevent pages from getting too long. And you could scan their "plaintext" message for any wiki markup.... in the most minimal mode you might even warn/block/nowiki any markup, so they don't accidentally mangle the page. There should probably be a way to turn off the markup protection, but people wanting to use markup are probably using advanced mode anyway.
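One way the "most minimal mode" markup protection might look, as a sketch: detect common wiki markup in the plaintext message and, if any is present, wrap the whole message in `<nowiki>` so it is shown literally. The pattern is a heuristic chosen for illustration, not a complete description of wikitext syntax, and the function names are hypothetical.

```python
import re

# Heuristic: links, templates, bold/italic quotes, HTML-like tags,
# list/heading/indent characters at the start of a line, and signature tildes.
MARKUP = re.compile(r"\[\[|\]\]|\{\{|\}\}|'''?|<[^>]+>|^[=*#:;]|~~~", re.MULTILINE)

def contains_markup(message):
    """True if the message looks like it contains wiki markup."""
    return MARKUP.search(message) is not None

def protect_plaintext(message):
    """Return the message unchanged, or wrapped in <nowiki> if it has markup."""
    if not contains_markup(message):
        return message
    # Escape a literal closing tag so the wrapper cannot be ended early.
    safe = message.replace("</nowiki>", "&lt;/nowiki&gt;")
    return f"<nowiki>{safe}</nowiki>"
```

A basic-mode client could call `contains_markup` to show a warning instead, or skip the check entirely once the user turns the protection off.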
@Alsee Thanks for this about talk pages. You're so right: I think this is definitely a huge barrier we have to address at some point if we are going to pull people from very limited tasks into more meaningful contributions, because ultimately communication is at the heart of this work. There are at least two approaches that have been considered: complete overhaul (flow) and incremental changes (liquid threads). Given the recency of Flow, it feels like incremental changes, like the one you described, are what we would want to approach. I don't know how successful an attempt at structuring what are essentially unstructured talk pages in order to abstract them a little for new users would be, but we can try. As others have mentioned on this talk page, however, unless changes happen on or are compatible with desktop, there are going to be major issues.
Thanx. This sounds promising. I'd like to clarify my thoughts a bit. I wasn't suggesting anything remotely as elaborate as Liquid threads. I was suggesting going after some very very low hanging fruit. Basically just expanding on the trivial "New Section" button. Add some newbie-mode buttons that append signed text to sections. The only fancy part is for the software to be able to grab a username to prepend a reply-ping. This concept is effectively limited to the user interface, with zero change to pages themselves. That virtually eliminates any possibility of objections. Not only is it compatible with desktop, the next step would be to add the buttons on desktop too.
Evolving Talk pages to support genuine structure may have potential, but I struggle to come up with concrete ideas. It's hard to mix structured with unstructured. There are reasons we like our unstructured wiki-world. A talk page is simply an article page at a different address. There is a certain simplicity, power, and flexibility in that. It's part of the wiki appeal.
One of the proposals mentioned identifying typos/spelling-errors. The obvious point here, and the wrong point, is that it's a really lousy division of labor to have skilled-labor review those submissions to carry out trivial edits.
This seems like a severe case of cognitive dissonance. If it is really true that this is the "worst concept", then it would be prudent to create discussions in English wikipedia perhaps even at meta and all the particular wikis to get rid of all "cleanup" type of templates:
These are created out of a desperate need to plead for others to come by and clean it up. If people should "just fix them" then those templates are unneeded, as it is easy enough to just add categories. The only other conceivable uses of such templates are either to inform readers that the article might be bad (and wrong) or to create a "pseudo-worklist" of articles to clean up.
Regardless of how desperately editors enjoy abusing templates for such reasons, both of those potential "use cases" are not what templates were designed for. In fact, such abuse is one of the reasons such templates are sometimes hidden on mobile devices (T76678).
Not only did you miss the point, you literally copy-pasted text stating it was "the wrong point".
The hardest and most important step to becoming an editor is making a first edit. People will often be worried if they can or should make an edit. When it comes to something like a spelling fix, they can be motivated that it should definitely be fixed, and confident that no one could possibly blame them for writing something they shouldn't have. They don't need to know anything except being able to find and type-over a spelling error. And once they do that, they realize "wow I edited Wikipedia", and that feeling of accomplishment encourages them to keep going.
<blockquote>When it comes to something like a spelling fix, they can be motivated that it should definitely be fixed, and confident that no one could possibly blame them for writing something they shouldn't have.</blockquote>
The first edit is indeed a hard step, but a harder step is finding what to help out with. A list of typos or useful simple tasks that experienced editors may not be interested in doing may be a perfect starting point for a novice editor to gain confidence and experience before performing complex edits. It may also directly highlight things that are perfectly fine to fix without needing to read huge guidelines.
As a counter example of trivial cleanup, editing an infobox should be trivial except for the fact that a good portion of the infobox data isn't even on the page itself. A typo on an infobox may be coming from a deeply nested template, Wikidata, a Lua module, or some other protected page. An editor would have to go through the rabbit hole to find said page, and finally when they do, they'll either give up because they can't understand it, or they may add something to the talk page "summoning editors" to fix it.
If experienced editors find such spelling issues a waste of time then it can simply be surfaced to readers and require deliberate action from editors to actually see them.
I find it strange that you believe that some meaningful percentage of editors consider it a "waste of time" to fix a spelling error. If they are editing the page and see the spelling error they would almost certainly include that fix. And even if they weren't planning to edit the page, almost any editor would take ten seconds to make that trivial edit on sight. (Note: Virtually everyone uses the wikitext editor for good reasons. One of those reasons is that VE load time ranges from poor to browser-time-out errors.)
If we did implement this, the outcome is obvious from a community perspective. What would happen is that a hat collector or other editor would grab this trivial list and do a speed-run to rack up their edit count.
The list cannot "simply be surfaced to readers"; you need someone to do that work. It makes no sense for an experienced editor to spend ~10 seconds tagging a spelling error when they can fix it in ~10 seconds, and we don't want new users dead-ending here. We want the new user to click the edit button and discover that they can (and did!) make the edit themselves.
This really is the most counter-productive idea here. We don't want to divert "non-editors" away from actually making the easiest possible first edit themselves, and you would just end up with editors doing speed runs blanking the entire list for potentially disreputable reasons. It would diminish our on-ramp of editors.
I do a lot of typo fixing, and I have various tools and searches to find typos. I haven't yet given up hope of designing a newbie friendly typo finding and fixing app, but there are some barriers.
Much closer to market for a new entry path for newbies are ideas like adding images to articles.
Whilst I'd agree that most maintenance templates should be replaced by hidden categories, and ones like dead end and orphan should be auto generated, good luck getting consensus for such a change. There is a strong lobby of template bombers who consider their activity to be worthwhile, convincing them that what they are doing is not a net positive will not be easy, and yes it has been tried.
Broken connection - readers to OTRS to wiki
This is an old and basic problem which I wish to raise again.
Many readers simply wish to make a comment and fail to find a way to access the talk page.
OTRS, Wikimedia's own information queue, receives lots of these comments by email. Many or most of the people who write to OTRS really would post to Wikipedia talk pages if (1) they knew that was an option and (2) somehow they could get a notice by email that someone had replied on their talk page.
Ideally, OTRS should be used by people who either need confidentiality or as a last resort for people who need to communicate by email due to inability to post to wiki. Instead, lots of readers email valuable editing suggestions to OTRS, where they are lost and never get cross posted to wiki. The reason why OTRS respondents do not post editing suggestions to talk pages is because emails are copyrighted and people who write to OTRS have no idea that they are involuntarily claiming copyright over an editing suggestion that they wish to make public.
To complement OTRS, there should be a suggestion queue which seems like an email form, but also includes an option for the user to cross post their comment to an article talk page.
To make a guess, I think that 25% of emails to OTRS are actually the sorts of comments that submitters would want on wiki. If we could somehow prompt people to create wiki accounts in order to make an OTRS-like public request, then we might get access to a major new communication channel.
Thanks @Bluerasberry. I was totally unaware of this phenomenon on OTRS. I would add that the barrier isn't just about finding the talk page, but knowing how to make sense of it.
Hello - would you please look at https://en.wikipedia.org/wiki/Wikipedia_talk:Manual_of_Style/Dates_and_numbers#400_year_frequency_problem
If you have OTRS access please see the email. A user writes in with a suggestion when what they really should do is post to the talk page, but see how many steps it takes for a third party to post someone's message to the talk page.
A lot of labor is offered and lost when so many people write in by email; at OTRS, most editing suggestions for 10+ years have just been trashed.
Check out the insightful response on the talk page. Obviously this was a good comment, and it belonged there, but if it were not for my posting it there with huge trouble then it would not have connected to those respondents.
Great example. Thanks!
I don't know if there is a good *bulletproof* solution for this problem, or not. But there are three different barriers here, one of which is a helpdocs kind of barrier (potential-contributor education), one of which is a technological barrier (ton of work for OTRS volunteers to transition an email-based-comment over to an on-wiki-comment), and the final one that is a legal barrier (copyright release of the email'd paragraphs and privacy concerns w.r.t. the email-metadata such as email-address and email-header-IP-addresses and such).
I suggest attacking all the barriers separately, since lowering any of the 3 main barriers will be helpful to improving the overall situation. First and easiest, why not just improve the helpdocs? The helpdocs which explain how talkpages work, are already good enough probably, but clearly plenty of people don't use talkpages (either they prefer email or they don't wanna learn from the talkpage-helpdocs). So why not improve the helpdocs which explain how to email OTRS? Put a big note, smack dab in the middle of the how-to-contact-OTRS-helpdocs, which says "IF YOU WANNA make a suggestion about article content then please paste THIS EXACT sentence into the top of your email to wikipedia: Dear wikipedia, the textual body-prose of this email, not counting headers and footers which contain personally identifiable information, are explicitly dual-licensed under CC-BY-SA v3 or later as well as GFDL". Modify the phrasing to taste, obviously, but my point should be clear: some people are emailing OTRS about a private matter, but a lot of people are just emailing them about a content-matter. Why not suggest that the person sending an email, *prior* to sending it, add a bit of text which will properly license that body-prose? It would still be up to the OTRS agent whether to *actually* copy the body-prose over into some enWiki talkpage, and they might still need to be careful when doing so to properly elide any email-sigs or email-header-info, but better helpdocs would eliminate some of the hassle (if the person emailing followed the instructions).
Second avenue of attack, why make it more difficult than necessary, for the OTRS folks? It would certainly be possible, using community tech team devs for instance, to whip up a system which would speed up the process of "routing content requests" that happen to arrive via email. First of all, it makes sense to have a gadget which identifies whether a particular email contains the CC-BY-SA release boilerplate (mentioned above), and also a gadget which attempts to automagically-extract just the CC-BY-SA portion (stripping off the email-header and the email-body-sig portions plus inserting a properly wikified OTRS ticket-link at the top to comply with the attribution-required-part of the CC-BY-SA license that the email's body-prose was released under). Then, with a couple of clicks, the OTRS agent will be able to get the properly-licensed properly-anonymized portion of the email, into their clipboard, and can then paste it into an appropriate talkpage. Additional speedups would also be possible, such as having the gadget automatically insert a boilerplate "otrs edit request" header similar to the "semi-protected edit-request" headers used for locked articles, and having the talkpage post auto-watchlisted by the OTRS agent, but those are nice-to-have features.
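To make the gadget idea above a bit more concrete, here is a rough Python sketch of the "detect the release boilerplate, strip headers and signature, add a ticket reference" step. Everything specific in it is an assumption for illustration: the `RELEASE_MARKER` wording, the `extract_released_body` name, the plain "Via OTRS ticket" line (a real gadget would use whatever ticket-link format the wiki prefers), and the use of the conventional "-- " signature delimiter. It is not how OTRS works today.

```python
import email
from email import policy
from typing import Optional

# Assumed marker; the real wording would be whatever the help docs settle on.
RELEASE_MARKER = "dual-licensed under cc-by-sa"


def extract_released_body(raw_email: str, ticket_id: str) -> Optional[str]:
    """Return the body prose ready for a talk page, or None if no release was given."""
    msg = email.message_from_string(raw_email, policy=policy.default)
    part = msg.get_body(preferencelist=("plain",))
    if part is None:
        return None
    body = part.get_content()
    if RELEASE_MARKER not in body.lower():
        # No licence statement: treat the email as private and do nothing.
        return None
    kept = []
    for line in body.splitlines():
        # Stop at the conventional signature delimiter ("-- " on its own line).
        if line.rstrip() == "--":
            break
        kept.append(line.rstrip())
    prose = "\n".join(kept).strip()
    # Headers (From, Received, ...) are never copied; only the body prose is.
    return f"Via OTRS ticket {ticket_id}:\n\n{prose}"
```

The OTRS agent would still review the result before pasting it anywhere; the sketch only removes the mechanical part of the work.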
Finally, methinks there might be some ways to overcome the legal hurdles. Currently there is an OTRS ticket-system. It contains copyrighted emails, which are not meant for publication. Adding the body-prose of such an email, into an enWiki talkpage, would require *re-licensing* the body-prose of that email, under a wikipedia-compatible license. However, I note that it is fine to HYPERLINK from any enWiki page, to copyrighted material hosted elsewhere -- we have plenty of refs in enWiki mainspace (as well as talkspace) which are EL'd unto. It is difficult to explain to a person that is emailing OTRS about a content-matter, that the OTRS agent needs them to "blah blah CC-BY-SA blah blah GFDL blah blah sign here". It would likely be considerably simpler, to simply have a specific domain name such as https://emails.wikimedia.org which contained *copyrighted* body-prose from properly-anonymized emails, that the sender was willing to make public. This obviously ought not be automatic, some emails to OTRS are private. But in cases where the sender was specifically wanting to "make a suggestion to the wikipedia editing community" about a content-matter, it seems likely that they would be happy to have the body-prose of their suggestion displayed on a world-visible website. The downside here is that, if they are making a specific suggestion like "please add sentence1 + sentence2 into articleXyz" it will be essential to get the full CC-BY-SA dual-licensed with GFDL release of that stuff. But many questions are of the form "this seemed incorrect / non-neutral / awkward / etc to me can somebody please fix it?"
...and in those latter sorts of cases, the OTRS agent ought to be able to quickly strip off the headers & footer with personally identifying info (see gadgets in previous paragraph), shove the bodyprose excerpt over to emails.wikimedia.org or some similar site (probably still requires permission of the sender of the email -- but is far easier of a request-for-permission to EXPLAIN to them), and then in the talkpage the OTRS agent can link to the OTRS ticket, and for people unable to see the raw ticket, link to the anonymized-but-copyrighted-body-prose over on the separate domain-name.
I recommend fixing the helpdocs first (which is pretty easy and needs no programmers), and then incrementally upgrading the OTRS tools & gadgets second (programmers required), then if-and-only-if-necessary considering the creation of a separate "copyrighted email bodies" domain name (significant obvious downsides but *might* have a practical net upside... try the other options first however). Best, ~~~~
Add similarity checks to the app using Perceptual hashing (similarity analysis)
Something that would benefit both commons, and editors of every wiki is some form of similarity analysis (perceptual hashing) for media:
- Images - http://en.cnki.com.cn/Article_en/CJFDTOTAL-DZXU200807028.htm
- Video - https://github.com/rednoah/VASH , http://iosrjournals.org/iosr-jce/papers/Vol18-issue6/Version-5/P1806058486.pdf
- Audio - http://link.springer.com/article/10.1155/ASP.2005.1780
While most of these would require some research to evaluate and integrate into an app or even Commons, perceptual hashing for images is already supported by the backend software that Wikimedia currently uses:
If exposed using an API or even a special page that enhances or supersedes Special:FileDuplicateSearch, this would be a huge gain for curation, and readers.
This can also be exposed in the app, by asking editors whether these images are similar, or even implemented as image captcha. It can detect attacks such as "rotation, skew, contrast adjustment and different compression/format". In some cases it will even identify but useful images. A similar algorithm is blockhash (http://blockhash.io/), which claims to be more efficient in some instances. A non-academic comparison of the two is here:
Other use cases include surfacing similar images in search results (like Google), or even in the media search tool within VisualEditor / new wikitext editor, or within the image file page, or in the mediaviewer. The applications are endless.
Even if these tools for the app fail and are eventually removed this technology can continue being used in other products. Win Win !
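For anyone unfamiliar with the technique, here is a minimal Python sketch of the idea behind perceptual hashing. It uses the "average hash", one of the simplest variants, and is not the pHash or blockhash algorithm mentioned above. The function names and the plain list-of-lists grayscale input are assumptions to keep the example self-contained; a real implementation would decode actual image files. This simple variant tolerates brightness and contrast changes but, unlike the more robust algorithms, not rotation.

```python
def average_hash(pixels, size=8):
    """Hash a grayscale image given as a 2D list (at least size x size) into size*size bits."""
    height, width = len(pixels), len(pixels[0])
    cells = []
    for gy in range(size):
        y0, y1 = gy * height // size, (gy + 1) * height // size
        for gx in range(size):
            x0, x1 = gx * width // size, (gx + 1) * width // size
            # Shrink the image by averaging each block of pixels into one cell.
            block = [pixels[y][x] for y in range(y0, y1) for x in range(x0, x1)]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        # One bit per cell: is it brighter than the overall mean?
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming_distance(a, b):
    """Number of differing bits; a small distance suggests visually similar images."""
    return bin(a ^ b).count("1")
```

Two uploads would be flagged as possible duplicates when the distance between their hashes falls under some threshold, which would need tuning against real Commons files.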
As far as curation is concerned, some ways to help moderators / editors / admins include:
- Fingerprinting deleted files (create an index of deleted files)
- For readers in the app - quickly get them to determine false positives, and help mark them for deleting
- Identify duplicates - possibly shown in recent changes or special:newfiles
- Surface this in abusefilter - to block or tag images that are proven to be the same, e.g. rotation
- Automatically tag them
- Suggest adding them to categories related to file it matches
- Tag these - As files of interest
- Adding Images to article - Inform readers / editors that possible duplicate exists, and encourage them to reuse instead of upload
- Make finding duplicates more efficient
- When determining files to delete, an editor could search for duplicates to make the cleanup more efficient
Of course, some of these depend on the speed at which it can process new files and match them against the existing fingerprints. But even if it is a scheduled operation running hours or days later, it will still be way better than the status quo.
At least the demo at phash is very fast:
Automated image labelling
Mobile operating systems can be leveraged for their existing image recognition libraries, e.g.:
And many others. Perhaps it would be possible to automatically detect, classify, and reject selfies if there's a need for it.
It could also be used to automatically help provide classifications and descriptions for existing commons images by leveraging thousands of mobile processors.
Thanks. Yes, we currently use face detection libraries on both apps to better center the images we show at the top of each page. There are still errors though, so getting help recentering images (or confirming/retagging them in the case of object detection) would be useful.
A great idea that I found on Phab that also might be a possible contribution
This ticket is the suggestion to turn captchas into actually useful micro-edits, like the classic reCAPTCHA (http://en.wikipedia.org/wiki/ReCAPTCHA) did with scanned images. This could be a number of different things: image editing/cropping, OCR (commons to wikisource), or edits as currently developed by WikiGrok.
For example you could first show a Wikidata item without an "instance of" to a number of users and let them decide if it's a person or not. Then once some data has been gathered you could use the same question as a captcha and check against the previous answers you got.
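The two-phase flow described here (first gather answers, then reuse the question as a captcha) could be sketched roughly as follows in Python. The function names and the thresholds (`min_votes=5`, `min_share=0.8`) are made-up illustrations, not anything from the ticket.

```python
from collections import Counter


def consensus(answers, min_votes=5, min_share=0.8):
    """Return the agreed answer, or None if there is not enough agreement yet."""
    if len(answers) < min_votes:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count / len(answers) >= min_share else None


def check_captcha(response, prior_answers):
    """True or False once a consensus exists; None while still gathering data."""
    agreed = consensus(prior_answers)
    if agreed is None:
        # Question is still in the data-gathering phase and cannot grade anyone.
        return None
    return response == agreed
```

While `check_captcha` returns None, the response would simply be stored as one more answer rather than used to pass or fail the user.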
Yep, similar idea is below:
Although as far as media is concerned, this kind of thing will only work well once Commons has a structured way of storing information about media.
The trick is to surface user tailored captchas, e.g. if someone is reading an article about bats, and wikidata has no information about what type of animal it is, then it could easily surface a question, like 'A bat is a "fish", "insect", or "bird"?'.
Oh wow, I love the idea of relevancy when thinking about this.
Comments on the mockups
Some comments on the mockups:
- Image via nearby - Looks good
- Add/edit lead image to the article - Reasonable, although it might be a good idea to surface images from commons before uploading
- Moderation queue for image submissions - Addressed in another post, but mockups seem reasonable. Make sure not to surface other users' moderation (e.g. +1 / -1), as this may cause bias and bad moderation, e.g. most experienced editors are negatively biased towards new contributions (despite claiming or believing that they aren't).
- Lead image editing - Good concept. Looks uncontroversial, and easy to revert.
- Article Feedback - This is complicated, +1 or -1 votes may be inaccurate because they may refer to a prior revision that changed wildly, may make people vote on the topic and not the article, and may cause issues.
- Downvoting regardless - People against the theory of evolution may "downvote severely" despite the quality of the topic.
- People also don't read instructions - This is better done indirectly, for example, if an article has lots of spelling /grammar mistakes, incorrect information or is unintelligible, then the user is basically noting that the rating is low.
- It is considerably harder to identify a good article.
- Wikilabels, wikigrok - This seems good.
- Report an issue
- The good : Nice implementation of tapping the problem directly, and structured problems.
- The bad : Again, freeform input is a problem, it is better to focus on common issues and add onto them later than allowing free form and dealing with the endless drama and headache with "i hate Puppies" or "No, the true God is ..." kind of comments. Structured comments don't need any moderation, and will not place an extra burden on editors.
It might also be a good idea to review how Extension:UIFeedback works, and incorporate it for visible UI issues, maybe a blank page, a red error on the page, a mediawiki fatal error, etc.
Thanks! point taken re: open fields and good point about voting on previous versions and the other concerns...not sure how we would handle that. We've thought about algorithmically providing article quality scores, but that seems like a massively controversial and potentially harmful approach. We'd have to move verrrry slowly on that one if it is deemed of value.
> We've thought about algorithmically providing article quality scores, but that seems like a massively controversial and potentially harmful approach
Indeed, I'd suggest staying away from the word "quality", as it will always be contested. Instead focus on things that can be quantified scientifically. For example, for a good number of languages it is possible to detect with reasonable accuracy the number of spelling mistakes some text has, so you can have a metric of things such as:
- Spelling accuracy
- readability - https://en.wikipedia.org/wiki/Flesch%E2%80%93Kincaid_readability_tests , an educated person may still have a lot of trouble understanding nuclear fission if it uses excessive jargon.
- References - This is a bit trickier, but clearly, references from academic journals and scientific publications should rate higher than some links to youtube or facebook
- illustration - a picture is worth a thousand words, and while some concepts are not easy to illustrate, it is incredibly hard to explain some without them.
Of course references are troublesome because people may just dump things that don't contain any cited materials, but good references make the content verifiable.
Such ratings may be very useful to help readers identify inaccurate content, and turn them into editors by making it easy to find and fix it.
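As an illustration of the readability metric in the list above, here is a small Python sketch of the Flesch reading ease score (206.835 - 1.015 x words per sentence - 84.6 x syllables per word), one of the tests described on the linked page. The syllable counter is a crude vowel-group heuristic of my own, only good enough for a rough English estimate; other languages would need their own rules.

```python
import re


def count_syllables(word):
    """Rough heuristic: count groups of consecutive vowels, at least one per word."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text):
    """Higher scores mean easier text; heavy jargon pushes the score down."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        raise ValueError("text must contain at least one sentence and one word")
    syllables = sum(count_syllables(w) for w in words)
    words_per_sentence = len(words) / len(sentences)
    syllables_per_word = syllables / len(words)
    return 206.835 - 1.015 * words_per_sentence - 84.6 * syllables_per_word
```

A score like this says nothing about whether the content is correct, which is why it would sit next to the other metrics rather than replace them.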
Article Feedback is indeed complex. The questionnaire needs careful design so that thousands of readers, who are far more interested in a topic than in its presentation, will provide information which dozens of editors can easily use.
Adding an image by phone also has complexities. I have done a few by current methods, which stink. A new system should provide ways to alert us logged-in users when we're near an article that lacks a picture, and automatically categorize and geotag the photo we snap in response. Not load it directly into the article, as field work tends to be too rough, but alert it to list watchers (and put it on our watchlist). Also when we deliberately look at an article about a nearby place, give an upload button doing those same things.
Reporting an issue
I've experimented with anonymous issue reporting in a list I maintain and found out there is some good information to be had from "popular wisdom" but also a lot of garbage. Do you plan to add some way to help communities handle that garbage? Otherwise, smaller wikis might soon be overwhelmed by the quantity of feedback and people sending useful information would become disappointed by the lack of follow-up. Also, a way to say "thank you" or "your change has been acted upon" seems appropriate.
@Strainu For most of these approaches, the moderation and validation approaches we are considering try to make up for the weakness of readers (lack of context/expertise) by leveraging their strength (scale):
- multiple readers saying the same thing - for instance, we might only validate an issue if more than one reader highlights it.
- readers moderate readers as a first step - for instance, we might show an upload or a reported issue to another reader and ask them to confirm. Only if X users confirm do we show it to an editor... something like that.
Do you think that would/could address the concern? As to thank yous, I totally agree that this would be an important element of the user experience. We're trying to sort out a way that we can provide people with satisfaction/accomplishment without promoting the unhelpful or harmful contributions that incentives can sometimes generate.
The steps you describe seem to address the concern I have. I'd love to see some AI solution that identifies what "the same thing" means, but I guess on Wikipedia's scale, just plain text comparison would work.
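A plain text comparison along those lines could look something like this Python sketch. The normalization rules, the function names, and the default of two distinct readers are assumptions for illustration, not part of the proposal.

```python
import re
from collections import defaultdict


def normalize(text):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^\w\s]", "", text.lower()).split())


def validated_issues(reports, min_readers=2):
    """reports: iterable of (reader_id, text) pairs.

    Returns the normalized issue texts reported by at least min_readers
    distinct readers; repeat reports from one reader count only once.
    """
    readers_by_issue = defaultdict(set)
    for reader_id, text in reports:
        readers_by_issue[normalize(text)].add(reader_id)
    return sorted(
        issue for issue, readers in readers_by_issue.items()
        if len(readers) >= min_readers
    )
```

This only catches near-identical wording, so it fits structured or short reports much better than free-form comments.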
> readers moderate readers as a first step - for instance, we might show an upload or a reported issue to another reader and ask them to confirm. Only if X users confirm do we show it to an editor... something like that.
Great idea. I'd say that this is the most important concept. I'd just be wary of bias in the tool, for example, people from Arabic countries are negatively biased towards topics / content related to nudity, porn, and people from some African countries are negatively biased towards topics on homosexuality, and North Americans may be negatively biased towards certain concepts related to certain Asian cultures.
It is quite important to use proper sampling to prevent this sort of systematic bias.
I don't have any particular thoughts regarding volume or quality of reports, not having been around for the original Article Feedback tool. However I see the current mockup says that reports are posted to the talk page. This is excellent because it means that established moderation tools, i.e. reverting and suppression (or Flow equivalents) could be used without having to build a new system.
P.S. the current mockup involves an optional "include a screenshot". This is very problematic because the file, which I guess would have to be automatically uploaded, has licensing questions. Text is fine, but if any images are included, their licenses and authorship would have to be determined and automatically included in the screenshot description page. Also if there is a fair use image, it would be unlikely to be allowable according to a wiki's non-free content rules, when used in a screenshot of a page at large.
@BethNaught Argh! Once again, we've forgotten the need to attribute Wikipedia content. Thank you for the reminder. There is also the potential that someone would take a screenshot of something non-wiki or inappropriate...Given the complexities, a link will probably have to suffice.
Surely this faces the same problems as doomed the AFT, in particular how could this be done without diverting potential editors from improving articles to critiquing them for others to edit, how do you avoid this being a swamp of fan comments and hates re particular pop stars, and how do you maintain neutrality when various political candidates have their bios liked on Wikipedia?
@WereSpielChequers All valid points here.
Regarding junk or popularity, per the thread on user model elsewhere on this page, raising the bar for feedback (have to sign in, have to complete a tutorial) might be needed for anything beyond a simple thank.
As for diverting potential editors, if anything proved to be successful (in terms of adoption or impact) I think we would immediately explore the next step for bringing these users into the fold both with regard to cultural expectations and more meaningful contributions.
Rather than "Report an issue", I would prefer a more neutral message, like "How can we improve the article?"
Use readers to choose images from wikidata for articles
One common problem with Wikipedia articles is that they often don't have images or in some cases editors use images that don't really represent the concept covered, for instance, the article about money contains an ambiguous image in the infobox that doesn't necessarily mean money. It will also be hard to edit a page to change images due to them being added from templates such as infoboxes.
Instead, one approach is to use the image from wikidata to surface the general image of the concept, e.g. "money" regardless of what editors choose locally, in a similar manner to how wikidata descriptions are being surfaced regardless of what editors write in their lede. Indeed, currently the image in English wikipedia is not neutral, as it represents only two currencies used in two continents, so a non-native English speaker who may have never seen either a dollar or a Euro may be confused. This could also be designed to only show up whenever none of the article images are the same as wikidata.
With the help of readers, and an interface similar to or exactly like WikiGrok, this could instantly add Commons images to tens of thousands, if not millions, of pages with little effort, by applying the same idea of only allowing it once a certain threshold of users agrees with it. For example, it could ask questions like "Is this the image of Albert Einstein?" or "Is this the image of the Eiffel Tower?", using the image labels / descriptions / metadata as a hint.
It would also help with the problem of Extension:Popups in cases where it doesn't have any image to show by just falling back to a wikidata image. Perhaps it could be implemented as the idea described in (https://phabricator.wikimedia.org/T95026).