UploadWizard/Operations assessment

= What Wikimedia Ops needs to know about UploadWizard =

I was asked about the ops impact of Extension:UploadWizard. Here's what we know, which amounts to a lot of educated guesses without any measurement.

Executive summary: nothing much will change. We are not any more vulnerable to floods of uploads than we were before. However, increased activity will demand more attention to metrics, and more policing of temporary storage.

== Potential for some kinds of resource exhaustion ==
This is not a new problem, but UploadWizard may make it easier for users to upload zillions of small files and leave them in the stash (the FileRepo temporary zone). See bug 26063. Note: this potential problem exists in the API, and has existed since late fall 2010; it is not actually UploadWizard-specific.

The proposed solution, to be implemented by UploadWizard and/or MediaWiki developers, is to cap the number of files a user can have in the stash at any given time (say, 100 or so). When a user uploads more than that, some of their existing stashed files are removed, even if the upload process for those files is incomplete.
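The per-user cap can be sketched as follows. This is a toy in-memory model in Python, not MediaWiki code: the `Stash` class, its method names, and the choice to evict the oldest files first are all assumptions made for illustration (the proposal only says that "some" files are removed).

```python
from collections import defaultdict, deque

MAX_STASHED_PER_USER = 100  # assumed value; the proposal says "100 or so"


class Stash:
    """Hypothetical in-memory stand-in for the temporary zone."""

    def __init__(self, max_per_user=MAX_STASHED_PER_USER):
        self.max_per_user = max_per_user
        self._files = defaultdict(deque)  # user -> file keys, oldest first

    def add(self, user, file_key):
        """Stash a file and return the keys evicted to stay under the cap."""
        files = self._files[user]
        files.append(file_key)
        evicted = []
        while len(files) > self.max_per_user:
            # Assumption: the oldest stashed file goes first,
            # whether or not the user had finished with it.
            evicted.append(files.popleft())
        return evicted

    def count(self, user):
        """Number of files the user currently has stashed."""
        return len(self._files[user])
```

With a cap of 3, stashing five files for one user would evict the first two and leave three in place. A real implementation would also need to delete the evicted files from storage.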

Recommendation for ops: the per-user check mentioned above can be implemented in MediaWiki software, so no ops action is needed there. But for extra security against this class of problem, cronjobs should sweep the stash clean of files older than 6 hours or so, and also keep the total number of stashed files below some threshold.
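A sweep of this kind might look like the sketch below, which a cronjob could run periodically. It assumes the stash is a plain directory tree on local disk and that file modification time is a good enough proxy for upload time; the function name, the `MAX_TOTAL_FILES` value, and the oldest-first trimming policy are hypothetical.

```python
import os
import time

MAX_AGE_SECONDS = 6 * 60 * 60  # "6 hours or so"
MAX_TOTAL_FILES = 10_000       # hypothetical threshold; pick to suit capacity


def sweep_stash(stash_dir, max_age=MAX_AGE_SECONDS,
                max_total=MAX_TOTAL_FILES, now=None):
    """Delete stale files, then trim the oldest until under max_total.

    Returns the list of removed paths.
    """
    now = time.time() if now is None else now

    # Collect (mtime, path) for every file under the stash directory.
    entries = []
    for root, _dirs, names in os.walk(stash_dir):
        for name in names:
            path = os.path.join(root, name)
            entries.append((os.path.getmtime(path), path))
    entries.sort()  # oldest first

    removed = []
    survivors = []
    for mtime, path in entries:
        if now - mtime > max_age:
            os.remove(path)  # pass 1: too old
            removed.append(path)
        else:
            survivors.append(path)

    # Pass 2: still over the total threshold, so drop the oldest survivors.
    excess = max(len(survivors) - max_total, 0)
    for path in survivors[:excess]:
        os.remove(path)
        removed.append(path)
    return removed
```

The sketch ignores races with in-progress uploads (a file could vanish or appear mid-sweep); a production job would need to tolerate those.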

== Increased volume of uploads ==
We expect that the increased ease of use will, over time, increase the rate of uploads. We have no idea by how much.

Recommendation: monitor growth in the number of files and their average size, to see whether increased volume will accelerate storage growth needs.

== Increased 'burstiness' of uploads from a single user ==
UploadWizard allows users to upload multiple files relatively easily. This will cause numerous files to be uploaded within a few seconds of each other.

The tool currently limits each invocation to ten uploads in total. However, this limit is configured on the client and thus can be changed by the client. Alternatively, the user can simply open the tool in multiple browser tabs.

We can block abuse server-side with the measures discussed above under "Potential for some kinds of resource exhaustion".

Recommendation: monitor daily, hourly, and minute-by-minute variation in the number of uploaded files, to see whether increased burstiness calls for extra capacity.
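One simple way to put a number on burstiness is to bucket upload timestamps by minute (or hour, or day) and compare the busiest bucket to the average. The sketch below assumes upload times are available as Unix timestamps, for example from an upload log; the function names and the peak-to-mean metric are illustrative choices, not an existing Wikimedia tool.

```python
from collections import Counter


def uploads_per_bucket(timestamps, bucket_seconds=60):
    """Count uploads per time bucket (bucket index = timestamp // size)."""
    return Counter(int(ts) // bucket_seconds for ts in timestamps)


def peak_to_mean(counts, num_buckets):
    """Ratio of the busiest bucket to the mean over the whole window.

    num_buckets is the number of buckets in the observation window,
    including empty ones; values well above 1 indicate bursty traffic.
    """
    if not counts or num_buckets <= 0:
        return 0.0
    mean = sum(counts.values()) / num_buckets
    return max(counts.values()) / mean
```

Running the same calculation with `bucket_seconds` set to 60, 3600, and 86400 gives the minute, hour, and day views of the same data.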

== More simultaneous uploads from the same user ==
The tool is designed to allow simultaneous uploads. Currently, these are turned off (see 26179 for the exciting (not) details).

When simultaneous uploads are turned on, the tool will allow some configured number of simultaneous transactions. For example, if the user is trying to upload seven items, it will at first start three of them. When one finishes, the tool will start another upload, and so on, until the full list of seven has been uploaded.
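The scheduling behaviour described above is an ordinary bounded-concurrency pattern. The sketch below models it in Python with `asyncio` rather than in the tool's own JavaScript; `upload_all`, `fake_upload`, and the limit of three are stand-ins for illustration, not UploadWizard code.

```python
import asyncio


async def upload_all(items, upload_one, max_simultaneous=3):
    """Upload every item, with at most max_simultaneous in flight at once."""
    semaphore = asyncio.Semaphore(max_simultaneous)

    async def guarded(item):
        # A new upload starts only when one of the slots frees up.
        async with semaphore:
            return await upload_one(item)

    return await asyncio.gather(*(guarded(item) for item in items))


async def demo():
    """Upload seven fake items and record the peak number in flight."""
    in_flight = 0
    peak = 0

    async def fake_upload(item):
        nonlocal in_flight, peak
        in_flight += 1
        peak = max(peak, in_flight)
        await asyncio.sleep(0.01)  # stand-in for the network transfer
        in_flight -= 1
        return item

    results = await upload_all(range(7), fake_upload, max_simultaneous=3)
    return results, peak
```

Calling `asyncio.run(demo())` uploads all seven items while never having more than three in flight. As the text notes, a limit like this is only advisory when it is enforced on the client.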

This can be circumvented client-side by hacking the configuration (it's just JavaScript) or by opening multiple browser windows.

We can block abuse server-side with the measures discussed above under "Potential for some kinds of resource exhaustion". However, in general, we are not aware of any design limitations in MediaWiki or its backend file store that cause problems when the same user makes simultaneous accesses.

Recommendation for ops: none, other than cronjobs already mentioned

== Increased number of files going to temporary zone (and/or being abandoned there) ==
Currently, the temporary stash area is only used when a file has a problem that can be corrected by some user action. With UploadWizard, the stash is the first step on every file's journey to publication. A user who doesn't complete the process will leave the file abandoned in the stash.

Recommendation for ops: none, other than cronjobs already mentioned