Extension:SwiftMedia

OpenStack (http://OpenStack.org) includes an object storage system called Swift. This code lets you use a Swift cluster to store MediaWiki media files. It has two parts. The first is middleware for Swift's proxy server, which converts MediaWiki image URLs into the URL format Swift needs. The second is an extension to MediaWiki.

Swift middleware
Swift hands its files out to users via a proxy. You can access the cluster directly, but then you need to know as much about the system as the proxy does, so we use the proxy. The proxy requires a URL with three parts: an account name, a container name, and an object name. The account name is determined by the authentication system and is a long hex string, effectively a UUID. The container name is an opaque string that must not contain slashes. The object name may contain anything.
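As a rough illustration of the three-part URL, the sketch below builds and splits a proxy path. The "v1" version prefix, the function names, and the sample account string are assumptions made for the example; they are not taken from this extension's code.

```python
def build_swift_path(account, container, obj, version="v1"):
    """Join account, container and object into a proxy path.

    The leading version segment ("v1") is an assumption for this sketch.
    """
    if "/" in container:
        # Container names are opaque strings without slashes.
        raise ValueError("container name must not contain a slash")
    return "/" + "/".join([version, account, container, obj])


def split_swift_path(path):
    """Split a proxy path back into (account, container, object)."""
    parts = path.lstrip("/").split("/", 3)
    if len(parts) != 4:
        raise ValueError("expected /version/account/container/object")
    _version, account, container, obj = parts
    # Only the object name may contain further slashes.
    return account, container, obj
```

Splitting with a limit of three separators is what lets the object name keep its own slashes, such as the hashed subdirectories.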

Our media store URLs, on the other hand, start with the name of the host (or possibly a separate host), followed by the string "images" (by default), possibly several hashed subdirectory levels, and the name of the object. In the case of Wikipedia, the host is "upload.wikimedia.org", followed by "wikipedia/commons" instead of "images", followed by two levels of hashing, and the name of the file. Thumbnails, archived files, and deleted files have a prefix on the hashing. These URLs have been published, and people link directly to them. We have decided to preserve these links, hence the middleware to rewrite the URLs.

The middleware inserts the account name into the URL, converts the "wikipedia/commons" section into a Swift container name by replacing slash with %2F, adds "%2Fthumb" or "%2Farchived" or "%2Fdeleted" to the container name and adds the rest of the hashing and filename as the object name. Swift doesn't need the hashing since it does its own hashing; it can take or leave our hashing. For backwards compatibility and ease of finding files, we leave it there. Once the URL has been rewritten, it gets handed to the remainder of the Swift proxy, which then hands the file back.
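The rewrite described above can be sketched roughly as follows. This is a minimal illustration, not the actual middleware: the function name, the "v1" version prefix, the base_depth parameter, and the sample account string are assumptions. The zone names are the ones listed in the text.

```python
# Zone prefixes named in the text; each maps to its own container.
ZONES = ("thumb", "archived", "deleted")


def rewrite_media_path(path, account, base_depth=2):
    """Rewrite a media URL path into a Swift proxy path.

    base_depth is the number of leading segments that form the base
    container name (2 for "wikipedia/commons"). Hypothetical parameter.
    """
    parts = path.lstrip("/").split("/")
    if len(parts) <= base_depth:
        raise ValueError("path has no object name")
    container_parts = parts[:base_depth]
    rest = parts[base_depth:]
    # A zone prefix sits between the base and the hashed directories.
    if len(rest) > 1 and rest[0] in ZONES:
        container_parts.append(rest.pop(0))
    # Slashes inside the container name are replaced with %2F.
    container = "%2F".join(container_parts)
    # The hashing and the filename are kept as the object name.
    obj = "/".join(rest)
    return "/v1/{}/{}/{}".format(account, container, obj)
```

For example, with a hypothetical account "0123456789abcdef", the path "/wikipedia/commons/thumb/a/ab/Foo.jpg/120px-Foo.jpg" becomes "/v1/0123456789abcdef/wikipedia%2Fcommons%2Fthumb/a/ab/Foo.jpg/120px-Foo.jpg".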

So yes, Swift's proxy is serving up image files to our caching front-ends. Usually a token is needed to access files, but we've marked some containers as "public", meaning that no token is needed.

404 handler
The middleware intercepts the return value from Swift, and looks at the result. If it's a 404 error, the 404 handler is invoked. Currently it contacts the existing thumbnail server and fetches the file. In the future it will create a scaled version of the file.
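The interception step might look something like the WSGI wrapper below. It is a sketch under stated assumptions: the class name is invented, and fetch_fallback is a hypothetical callable standing in for the request to the existing thumbnail server. It takes the request path and returns the file's bytes, or None if it has nothing to offer.

```python
class Fallback404:
    """WSGI wrapper that retries 404 responses through a fallback."""

    def __init__(self, app, fetch_fallback):
        self.app = app
        self.fetch_fallback = fetch_fallback  # hypothetical callable

    def __call__(self, environ, start_response):
        captured = {}

        def capture(status, headers, exc_info=None):
            # Record the response instead of sending it right away.
            captured["status"] = status
            captured["headers"] = headers
            return lambda data: None  # legacy WSGI write() callable

        # Assumes the wrapped app calls start_response before returning.
        body = self.app(environ, capture)

        if captured["status"].startswith("404"):
            data = self.fetch_fallback(environ.get("PATH_INFO", ""))
            if data is not None:
                if hasattr(body, "close"):
                    body.close()
                # A real handler would also set a Content-Type header.
                start_response("200 OK",
                               [("Content-Length", str(len(data)))])
                return [data]

        # Anything else passes through unchanged.
        start_response(captured["status"], captured["headers"])
        return body
```

Deferring start_response until the status is known is the design choice that makes the fallback possible: once headers have gone to the client, a 404 can no longer be replaced.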

MediaWiki Extension
Swift provides no access to a filesystem; it is an object server, not a file server. To allow our media handlers to do their work, the extension pulls files in from Swift, runs the media handler, and writes the resulting file out to the object store in the appropriate location. When a file is uploaded, it is stored as a Swift object rather than in the filesystem.
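The pull, transform, and write-back cycle can be illustrated with the sketch below. It is written in Python purely for illustration and is not the extension's code. The store argument is a plain dict standing in for the object store, and handler is a hypothetical callable that reads one local file and writes another.

```python
import os
import tempfile


def transform_object(store, src_name, dst_name, handler):
    """Run a file-based handler against an object held in a store.

    store is a dict of name -> bytes used as a stand-in for Swift.
    """
    with tempfile.TemporaryDirectory() as tmp:
        src_path = os.path.join(tmp, "source")
        dst_path = os.path.join(tmp, "result")
        # 1. Pull the object into a local temporary file.
        with open(src_path, "wb") as f:
            f.write(store[src_name])
        # 2. Let the handler work on ordinary files.
        handler(src_path, dst_path)
        # 3. Write the result back to the store under its new name.
        with open(dst_path, "rb") as f:
            store[dst_name] = f.read()
```

The temporary directory is removed when the block exits, so nothing is left on the local filesystem after the result has been written back.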

Several configuration variables are needed for LocalSettings.php in the $wgLocalFileRepo array.

 // You must let MW know that the class of the repo is SwiftRepo:
 'class' => 'SwiftRepo',
 // MUST be 'local':
 'name' => 'local',
 // Your Swift username.
 'user' => 'system:media',
 // Your Swift password.
 'key' => 'secret',
 // A URL pointing to a proxy which is also running the auth server.
 'authurl' => 'http://alsted.wikimedia.org/auth/v1.0',
 // This wiki's base container name. Must not contain a forward slash.
 // Other container names will be generated by appending %2Fthumb, %2Ftemp, and %2Fdeleted.
 'container' => 'images%2Fswift',
 // The URL pointing to scripts.
 'scriptDirUrl' => $wgScriptPath,
 'scriptExtension' => $wgScriptExtension,
 // The URL containing the container name, so "http://alsted.wikimedia.org/images/swift"
 'url' => $wgUploadBaseUrl ? $wgUploadBaseUrl . $wgUploadPath : $wgUploadPath,
 'hashLevels' => $wgHashedUploadDirectory ? 2 : 0,
 'transformVia404' => !$wgGenerateThumbnailOnParse,
 'deletedHashLevels' => 3,

Use Cases
Since the plan is to switch Wikipedia over to this media storage system, we're trying to be as conservative and not-break-it as possible. If you are in the habit of manipulating files on your MediaWiki, or on Wikipedia itself, could you take a few minutes to document your particular combination of operations? Obviously, we've got test cases for "upload a file", "delete a file", "upload another file", "revert an older file". Those are the simple things to test. We're looking for your "idioms" or "use cases", where you do things we don't expect. Please add four tildes to the end of your description in case we need clarification.

Your help is appreciated. I'll prime the pump with two entries:
 * I will adjust the brightness of an image if it doesn't look good with the other images on a page, and then upload the edited file under the same name. RussNelson 01:15, 10 August 2011 (UTC)
 * Sometimes I upload one version of an image, decide I don't like it, upload another version, decide that I don't like that either, then change my mind and revert to the original image. RussNelson 01:15, 10 August 2011 (UTC)
 * The only other thing, although more of a MediaWiki side of things, is protecting files from being (re)uploaded or touched (eg: reverted). KPeachey
 * Sometimes we use a file to keep track of something that continuously changes, such as a chapter map or organization chart. We upload many different versions under the same file name over a long period of time. Cbrown1023 (talk) 01:44, 10 August 2011 (UTC)
 * Embed a file using [[File:Foo.jpg]] syntax
 * Link to a file:
 ** Using File:Foo.jpg
 ** As well as [[Media:Foo.jpg]]
 * Also make sure {{filepath}} works