User:NeilK/UploadWizardMisdesigns

Misdesigns in UploadWizard

UploadWizard has been good enough to be used for millions of uploads, but it could be much better.

Since UW is now getting serious attention again, I write this so that the Multimedia Team will not feel overly constrained by decisions made before.

Why is it so monolithic?
At the time, some senior Wikimedia employees would revert you for adding any project which required JS. I had to keep it all in a branch for about six months.

A community-based attempt to improve JS also went down in flames around this time. Perhaps due to an excess of caution, I made UploadWizard more monolithic.

ResourceLoader later alleviated some of the monolithic-ness.

Why is it so ugly?
No design resources, and a rush job.

UploadWizard was designed in about 1.5 days with the intention that it would be in production in 1.5 months. Basically, everything the Multimedia Team have spent the last few years on was supposed to be done by me in one year, including MMV.

I tried to keep everything in jQuery UI so it could be styled at some later date.

Why isn’t it embeddable?
I made a half-assed attempt at this, but it still depends on divs with IDs. If that were solved, it would probably be embeddable.

Why can’t we use this on wiki pages?
At the time there was an entirely separate media uploader project, Add Media Wizard, that handled uploads from within article editing.

But for political reasons as well as concerns about code quality, Add Media Wizard was shelved.

UploadWizard is possibly flexible enough to have a mode that could work in wiki editing. But it would take work.

Why is UploadWizard optimized for uploading batches at a time?
The product manager of UploadWizard used to take lots of pictures, especially at Wikimedia events, and upload them all later. But as far as I can tell he’s one of the few who had this user interaction pattern.

The code would be dramatically simpler if it was one upload at a time. Then you could just add multiple widgets on the page to get a multi-upload effect.

A GroupProgressBar (i.e. estimate of when all uploads will complete) would still be possible if the widgets published events.
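As a rough sketch of how that could work, assuming each widget handles one upload and publishes its byte count: the names `UploadWidget`, `GroupProgress`, `onProgress` and `publishProgress` are hypothetical and are not taken from UploadWizard's code.

```typescript
// Sketch only: UploadWidget and GroupProgress are hypothetical names.

type ProgressListener = (bytesSent: number) => void;

// One widget handles exactly one upload and publishes its progress.
class UploadWidget {
  readonly totalBytes: number;
  private listeners: ProgressListener[] = [];

  constructor(totalBytes: number) {
    this.totalBytes = totalBytes;
  }

  onProgress(listener: ProgressListener): void {
    this.listeners.push(listener);
  }

  // A real widget would call this from its transport; here it is called by hand.
  publishProgress(bytesSent: number): void {
    for (const listener of this.listeners) {
      listener(bytesSent);
    }
  }
}

// Subscribes to every widget and keeps only the latest byte count for each.
class GroupProgress {
  private widgets: UploadWidget[];
  private sent: number[];
  private startedAtMs: number;

  constructor(widgets: UploadWidget[], startedAtMs: number) {
    this.widgets = widgets;
    this.sent = widgets.map(() => 0);
    this.startedAtMs = startedAtMs;
    widgets.forEach((widget, i) => {
      widget.onProgress((bytesSent) => {
        this.sent[i] = bytesSent;
      });
    });
  }

  // Fraction of all bytes sent so far (0 when nothing is queued).
  fraction(): number {
    let sent = 0;
    let total = 0;
    this.widgets.forEach((widget, i) => {
      sent += this.sent[i];
      total += widget.totalBytes;
    });
    return total === 0 ? 0 : sent / total;
  }

  // Naive estimate: assumes the average rate so far continues.
  // Returns null until some bytes have been sent.
  estimatedMsRemaining(nowMs: number): number | null {
    const done = this.fraction();
    if (done <= 0) {
      return null;
    }
    return ((nowMs - this.startedAtMs) * (1 - done)) / done;
  }
}
```

The group object never reaches into a widget; it only listens. That is the property that would let the widgets stay simple and independent.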

Why is there an option for the batch license applied to all uploads, and for each upload to have its own license?
See above. If I was doing it again today I’d make it one license per batch, if it even had batches at all.

If you really have heterogeneously licensed files, do separate upload batches. Easy.

Why (in its original incarnation) did it not upload items immediately?
The idea was that because you might upload images by mistake, you should see the preview thumbnail to verify that’s what you wanted.

The Multimedia Team have since improved on this by making that behavior configurable; I believe the default is now instant upload. But delayed upload is probably a dumb idea, and it should be removed.

Why does UploadWizard use its own “transports” for getting files across to the server?
There were plenty of good upload transport libraries around at the time. Even ones that did a kind of chunking.

It was forbidden to consider any upload-transport library that used Flash as a fallback.

IE6 and other such browsers couldn’t really do uploads and show progress, let alone multiple uploads, in any sane way, without using Flash. Most people considered that to be important, visually. Programmatically it helped hide the differences between browsers to the main logic - you got the same kinds of progress update and completion events.

However, the Foundation considered being 100% open source to be more important, so UploadWizard had to have a more complex design.

Possibilities for doing it differently: If we jettison IE6, maybe there are some third-party libraries that would then be Flash-free. Or, we could abandon this policy for Flash shims. Flash is dead as a doornail, and we’re not prolonging its life by using it to interface with obsolete browsers.

Why is it all callback hell?
Purely my fault. I could have done things more simply with events.

Alternative patterns like Promises were not really a thing yet.

I invented a notion called “Ready Events” (events where you can check if they fired before you started listening to them) and used some half-baked pubsub. Promises and proper events might solve all those issues. (MarkTraceur may have fixed this with his epic refactor.)
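To make the “Ready Event” idea concrete, here is a minimal reconstruction, assuming the event fires at most once and remembers its value. The class and method names (`ReadyEvent`, `hasFired`, `fire`, `listen`) are hypothetical; this is not the code that was in UploadWizard.

```typescript
// Sketch only: a hypothetical reconstruction, not UploadWizard's actual code.
class ReadyEvent<T> {
  private fired = false;
  private value: T | undefined = undefined;
  private listeners: Array<(value: T) => void> = [];

  // Lets a caller check whether the event already happened.
  hasFired(): boolean {
    return this.fired;
  }

  // Fires at most once; later calls are ignored.
  fire(value: T): void {
    if (this.fired) {
      return;
    }
    this.fired = true;
    this.value = value;
    const pending = this.listeners;
    this.listeners = [];
    for (const listener of pending) {
      listener(value);
    }
  }

  // A listener added after the event fired is called right away
  // with the remembered value, so late subscribers miss nothing.
  listen(listener: (value: T) => void): void {
    if (this.fired) {
      listener(this.value as T);
    } else {
      this.listeners.push(listener);
    }
  }
}
```

A Promise gives the same guarantee without custom code: a `then` callback attached after the Promise has resolved still runs with the resolved value.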

Why is data stored in the DOM?
I have myself to blame here. I knew better, but the general jQuery style encouraged that. This was my first big jQuery project.

Why does it use jQuery UI?
That was the state of the art back then. There are better UI frameworks now.

I’m not convinced OOUI is the way to go (I’m worried about NIH syndrome; I’d rather use a more established framework). But we should consider moving off of jQuery UI.

Why is it single-threaded?
We worked on this from 2009-2011. For most of that period, Web Workers were not very useful on most of our target browsers. We tried using Web Workers to process images and it still locked the browser up. However, I’ve used them since then, and they seem to work great now.

Why do lots of people not like UploadWizard?
The purpose of UploadWizard is to make it very easy to upload things.

However, administrators at sites like Wikimedia Commons are already overburdened with the existing number of uploads which are inappropriate, poorly licensed, or simply stolen from Google searches. Consequently, they tried to block UploadWizard for years. (I’m not sure what has changed to allow its use.)

IMO it isn’t fair to them to open the floodgates to more images, without *also* making it much easier to combat spam and other inappropriate uploads.

UI is important, but we could also dramatically simplify UploadWizard if we had better workflow tools on Commons, to have images in a kind of holding pen before being published.

Then UploadWizard could be a very, very simple uploader, and you could get licensing right at your leisure, later, perhaps with help from other people in the community.

Why does this always feel like we’re putting a square peg in a round hole?
MediaWiki is the wrong tool for media. MediaWiki is based on this user interaction pattern:


 * Blobs of text, freely modified
 * Uniform licensing
 * Contributions are easy to improve and modify
 * Publishing contributions immediately helps the content get better faster

Image uploading should look like this:


 * Binary data
 * With structured metadata
 * Different licenses are required for each media object
 * However, only some licenses are approved
 * In most cases, only the original uploader can improve it or provide accurate data about it
 * A work published immediately may simply be ignored; it might be used only to host images for other websites, and a copyvio may last for years if not caught

All of the above constraints lead to some designs that are brittle and frustrating for the user. We have to jump up and down to make the user notice any error, because no one else can check their work. We preserve freeform text as our final format, but also allow almost any custom data, which again complicates the interface.

If you want to make the frontend of UploadWizard simpler, the best thing you can do is make the backend more structured, especially with licensing. Licenses should have rich metadata that also tell MediaWiki what kinds of sharing they allow and what sort of works they are good for. Perhaps Wikidata?
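A sketch of what structured license metadata might look like, under stated assumptions: the field names, the `WorkKind` values, and the approval rule (approved means both commercial use and derivatives are allowed) are all chosen here for illustration and do not describe an existing MediaWiki or Wikidata schema. The `suitableFor` values in particular are placeholders.

```typescript
// Sketch only: field names and the approval rule are assumptions.
type WorkKind = "photo" | "diagram" | "audio";

interface LicenseInfo {
  id: string;
  allowsCommercialUse: boolean;
  allowsDerivatives: boolean;
  requiresAttribution: boolean;
  suitableFor: WorkKind[]; // placeholder values below
}

const LICENSES: LicenseInfo[] = [
  {
    id: "CC0-1.0",
    allowsCommercialUse: true,
    allowsDerivatives: true,
    requiresAttribution: false,
    suitableFor: ["photo", "diagram", "audio"],
  },
  {
    id: "CC-BY-SA-4.0",
    allowsCommercialUse: true,
    allowsDerivatives: true,
    requiresAttribution: true,
    suitableFor: ["photo", "diagram", "audio"],
  },
  {
    id: "CC-BY-NC-ND-4.0",
    allowsCommercialUse: false,
    allowsDerivatives: false,
    requiresAttribution: true,
    suitableFor: ["photo", "diagram", "audio"],
  },
];

// Assumed rule: a license is approved only if it permits
// both commercial use and derivative works.
function isApproved(license: LicenseInfo): boolean {
  return license.allowsCommercialUse && license.allowsDerivatives;
}

// The uploader UI could offer only these, instead of parsing freeform text.
function approvedLicensesFor(kind: WorkKind): LicenseInfo[] {
  return LICENSES.filter(
    (license) => isApproved(license) && license.suitableFor.indexOf(kind) !== -1
  );
}
```

With data like this, the frontend can build its license picker from a query rather than from hard-coded templates, and the backend can reject an unapproved license without a human having to read the description page.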

Why do chunked uploads keep failing?
PHP's uploading model makes this pretty hard. The stash code is complicated because of this.

I personally would consider writing an entirely new daemon for handling uploads, perhaps in Node.js, especially for large files, getting thumbnails, chunked uploads, and such. Node.js is much better at handling streams.