Requests for comment/Extensionless files


 * Date: 2010-03-31
 * Author: RobLa
 * Status: checked in on extensionless-files branch
 * Tracking bug: 4421

Problem
Currently, every image name must include an extension that specifies the format of the image (such as .jpg, .png, .gif, or .svg).

If a new version of the image is uploaded which is in a different format, it must be uploaded under a different image name. Then all the pages that use the image have to be changed, and the history of the old image is lost. This is a lot of unnecessary hassle.

Ideally the image name should not have to include this information, since it doesn't matter to those who use the image whether it's a JPEG or a PNG. For example, it would be much better to be able to refer to an image by a bare name than to have to spell out a format-specific extension. The author of the article shouldn't have to know (or care) what format the image is in.

User Perspective
From a user's perspective, this removes many common error messages without adding new ones. Previously, when the user uploaded a file and gave it a name that didn't conform to the extension naming rules for a particular file type, an error would be reported. After implementing this change, the extension of the uploaded file still needs to conform to whatever whitelist/blacklist rules are in place, and the detected MIME type needs to also conform, but the ultimate page title for the file can be any valid page title.

Design
Most of the complexity comes from needing to store the files on the filesystem with appropriate file extensions, since Apache serves them directly from the filesystem. Thus, there's a lot of extra logic for tacking on the file extension when it's needed.

Generally speaking, the design involves:
 * Adding new getFilename* counterparts to getName* functions, and using getFilename* in place of getName* where appropriate
 * Storing the file extension in the database

The file extension is stored in a new 'img_file_ext' field in the 'image' table (with similar fields added to the 'oldimage' and 'filearchive' tables). This field defaults to null. When it is null, the file name and the page title are the same.
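The null-default rule can be illustrated with a small sketch. It is written in Python for brevity rather than in MediaWiki's PHP, and `on_disk_filename` is a hypothetical helper, not a function from the branch. The assumption that a non-null extension is appended after a dot follows from the "tacking on" description above and is not confirmed by the code itself.

```python
from typing import Optional


def on_disk_filename(page_title: str, img_file_ext: Optional[str]) -> str:
    """Return the assumed on-disk name for a file page.

    img_file_ext mirrors the new nullable column: None means the page
    title already works as the file name.
    """
    if img_file_ext is None:
        return page_title
    # Assumption: the stored extension is appended after a dot.
    return f"{page_title}.{img_file_ext}"


print(on_disk_filename("Foo.jpg", None))  # Foo.jpg
print(on_disk_filename("Foo", "jpg"))     # Foo.jpg
```

Under this reading, existing rows need no migration: their 'img_file_ext' stays null and their file names keep matching their titles.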

On upload of a new file, and upon rename/move, the page title is still reconciled against the MIME type. However, instead of a mismatch being an error condition, it is merely used as a trigger for storing a file extension in img_file_ext. When the File object for a file is queried for the filename,

New APIs:
 * MimeMagic::getPreferredExtensionForType( $mime ) - maps the MIME type back to the preferred file extension.
 * File::getNormalizedExtensionFromName( $name ) - Given a file name, return the normalized extension. (e.g. for "foo.JPeG", return "jpg")
 * File::getFilenameFromTitle( $title, $mime = NULL ) - Return the file name, given the page title, and possibly a MIME type. This function replaces getNameFromTitle for those uses where the actual on-disk filename is what is needed (e.g. in LocalFileMoveBatch)
 * Splitting out prepTarget from publishBatch in FSRepo. This was some generally good code hygiene anyway (it replaces some duplicated code blocks with prepTarget function calls), but became essential because the duplicated parts were what needed to be expanded with more complicated logic.
 * (not done yet...)
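How the helpers listed above might fit together can be sketched as follows. This is an illustrative Python stand-in, not the PHP on the branch: the function names are snake_case analogues, the two lookup tables are a tiny hypothetical subset of what MimeMagic knows, and `extension_to_store` is an invented helper that models the "trigger" described earlier. The choice of "jpg" as the preferred extension for image/jpeg is an assumption based on the "foo.JPeG" example.

```python
from typing import Optional

# Hypothetical, deliberately tiny tables; the real MimeMagic data is larger.
PREFERRED_EXT = {"image/jpeg": "jpg", "image/png": "png", "image/gif": "gif"}
EXT_ALIASES = {"jpeg": "jpg", "jpe": "jpg"}


def get_preferred_extension_for_type(mime: str) -> Optional[str]:
    """Map a MIME type back to its preferred extension, if one is known."""
    return PREFERRED_EXT.get(mime)


def get_normalized_extension_from_name(name: str) -> Optional[str]:
    """Return the lowercased, alias-resolved extension, or None if absent."""
    base, dot, ext = name.rpartition(".")
    if not dot or not base or not ext:
        return None
    ext = ext.lower()
    return EXT_ALIASES.get(ext, ext)


def extension_to_store(title: str, mime: str) -> Optional[str]:
    """Value for img_file_ext: None when the title already matches the type."""
    preferred = get_preferred_extension_for_type(mime)
    # Assumption: an unknown MIME type leaves the title untouched.
    if preferred is None or get_normalized_extension_from_name(title) == preferred:
        return None
    return preferred


def get_filename_from_title(title: str, mime: Optional[str] = None) -> str:
    """Assumed on-disk name: the title, plus an extension only when needed."""
    ext = extension_to_store(title, mime) if mime else None
    return title if ext is None else f"{title}.{ext}"


print(get_normalized_extension_from_name("foo.JPeG"))     # jpg
print(get_filename_from_title("Foo", "image/jpeg"))       # Foo.jpg
print(get_filename_from_title("Foo.jpeg", "image/jpeg"))  # Foo.jpeg
print(get_filename_from_title("Foo.gif", "image/jpeg"))   # Foo.gif.jpg
```

The last call shows the case that used to be an error: a JPEG titled "Foo.gif" keeps its title, and in this sketch the mismatch only changes the name used on disk.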

Test plan

 * Image renaming:
   * Upload Foo.jpg
   * Rename Foo.jpg to Foo
   * Rename Foo to Foo.jpeg
   * Rename Foo.jpeg to Foo.gif
   * Upload Bar (GIF file)
   * Rename Bar to Bar.gif
 * Set $wgSaveDeletedFiles=true
 * Set $wgFileStore['deleted']['directory'] to valid directory
 * Delete, then undelete an image
 * Upload a new version of an image
   * With no extension
   * With proper extension
 * Change the configured default extension from "jpg" to "jpeg", then verify the handling of images uploaded before the transition
 * Install the previous major version of MediaWiki, set $wgCheckFileExtensions=false, upload images (with and without matching extensions), then upgrade to the new version and check the images
 * On a fresh install of MediaWiki, upload images both with and without a matching extension in the title