Requests for comment/Extensionless files


 * Date: 2010-03-31
 * Author: RobLa
 * Status: checked in on extensionless-files branch
 * Tracking bug: 4421

Problem
Currently, every image name must include an extension that specifies the format of the image (such as .jpg, .png, .gif or .svg).

If a new version of an image is uploaded in a different format, it must be uploaded under a different name. Every page that uses the image then has to be changed, and the history of the old image is lost. This is a lot of unnecessary hassle.

Ideally the image name should not have to include this information, since it doesn't matter to those who use the image whether it's a JPEG or a PNG. For example, it would be much better to be able to refer to an image by a name with no extension than to have to include the extension in the name. The author of the article shouldn't have to know (or care) what format the image is in.

User Perspective
From a user's perspective, if the feature is enabled (it is not by default; see "Sysadmin's Perspective" below), this removes many common error messages without adding any new ones. Previously, when a user uploaded a file and gave it a name that did not conform to the extension naming rules for its file type, an error was reported. With this change, the extension of the uploaded file must still conform to whatever whitelist/blacklist rules are in place, and the detected MIME type must also conform, but the final page title for the file can be any valid page title.
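A minimal sketch of the upload checks described above. The function name, the small extension-to-MIME table and the exact form of the MIME check are hypothetical illustrations, not MediaWiki's actual code or configuration. The point it shows is that the checks apply to the uploaded file's own extension and detected MIME type, while the chosen page title is never inspected.

```php
<?php
// Hypothetical sketch: validate an upload when extensionless titles are enabled.
// $uploadedName is the name of the file as uploaded (e.g. "photo.JPG"),
// $detectedMime is the MIME type detected from the file contents, and
// $title is the page title the user chose. The title is deliberately unused.
function sketchCheckUpload(
    string $uploadedName,
    string $detectedMime,
    string $title,
    array $whitelist,
    array $blacklist,
    array $mimeForExt
): bool {
    $ext = strtolower(pathinfo($uploadedName, PATHINFO_EXTENSION));
    // The uploaded file's extension must pass the whitelist/blacklist rules.
    if (in_array($ext, $blacklist, true) || !in_array($ext, $whitelist, true)) {
        return false;
    }
    // Assumption: "conform" is modelled as the detected MIME type agreeing
    // with the MIME type expected for that extension.
    if (($mimeForExt[$ext] ?? null) !== $detectedMime) {
        return false;
    }
    // Any valid page title is accepted; $title plays no part in the decision.
    return true;
}

// Illustrative (hypothetical) settings.
$whitelist = ['png', 'gif', 'jpg', 'jpeg'];
$blacklist = ['exe', 'php'];
$mimeForExt = [
    'png' => 'image/png',
    'gif' => 'image/gif',
    'jpg' => 'image/jpeg',
    'jpeg' => 'image/jpeg',
];
```

With these settings, uploading "photo.JPG" with detected type image/jpeg succeeds whether the chosen title is "Foo", "Foo.png" or anything else.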

Sysadmin's Perspective
This modification alters the behavior of the $wgCheckFileExtensions variable. In MediaWiki 1.15, there were four different behaviors based on the setting of this variable and $wgStrictFileExtensions:

This change alters that behavior.

Design
Most of the complexity comes from the need to store files on the filesystem with appropriate file extensions, since Apache serves them directly from the filesystem. As a result, there is a lot of extra logic for tacking on the file extension when it is needed.

Generally speaking, the design involves:
 * Adding new getFilename* counterparts to getName* functions, and using getFilename* in place of getName* where appropriate
 * Storing the file extension in the database

The file extension is stored in a new 'img_file_ext' field in the 'image' table (with similar fields added to the 'oldimage' and 'filearchive' tables). This field defaults to null; when it is null, the file name and the page title are the same.

On upload of a new file and on rename/move, the page title is still reconciled against the MIME type. However, a mismatch is no longer an error condition; it is merely used as a trigger for storing a file extension in img_file_ext. When the File object for a file is queried for its filename, the stored extension is tacked onto the page title if img_file_ext is set; otherwise the page title itself is the filename.
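The reconciliation between page title, MIME type and img_file_ext can be sketched as follows. This is a minimal illustration assuming hypothetical helpers (sketchAddedExtension, sketchFilename) and a caller-supplied table of extension synonyms; it is not the code behind File::getFilename, and how the real implementation handles a title whose extension names a different format is an assumption here.

```php
<?php
// Hypothetical sketch of deriving the on-disk name from the page title and
// the img_file_ext column. Not MediaWiki's actual implementation.

// Returns the value to store in img_file_ext: null when the title's own
// extension already matches the file's preferred extension, otherwise the
// preferred extension that must be tacked on to form the filename.
function sketchAddedExtension(string $title, string $preferredExt, array $synonyms): ?string
{
    $titleExt = strtolower(pathinfo($title, PATHINFO_EXTENSION));
    // Treat synonyms such as "jpeg" as equal to the preferred "jpg".
    $titleExt = $synonyms[$titleExt] ?? $titleExt;
    return $titleExt === $preferredExt ? null : $preferredExt;
}

// Builds the filename: identical to the page title when img_file_ext is null,
// otherwise the page title with the stored extension appended.
function sketchFilename(string $title, ?string $fileExt): string
{
    return $fileExt === null ? $title : $title . '.' . $fileExt;
}

$synonyms = ['jpeg' => 'jpg', 'jpe' => 'jpg'];
```

For example, a JPEG uploaded under the title "Foo" gets img_file_ext = "jpg" and is stored as Foo.jpg, while the same file under the title "Foo.JPEG" needs no added extension, so its filename equals its title.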

API modifications
The main API changes are to publish and publishBatch. The previous versions of these APIs did not accommodate the possibility that the new archive name might differ from the old archive name.


 * Old function: function publish( $srcPath, $dstRel, $archiveRel, $flags = 0 )
 * New function: function publish( $srcPath, $dstRel, $currentRel, $archiveRel, $flags = 0 )

The new function has a new "$currentRel" parameter, which is the current relative path of the existing file. It is usually the same as $dstRel, but may differ if the MIME type (and thus the file extension) changes.
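To make the role of $currentRel concrete, here is a simplified, hypothetical model of publish working on a plain local directory. The function name sketchPublish and its $root parameter are illustrative assumptions; the real FSRepo code also handles flags, status objects and other details omitted here. The example archive name follows MediaWiki's "timestamp!name" pattern purely for illustration.

```php
<?php
// Hypothetical, simplified model of publish() with the new $currentRel
// parameter, operating on a local directory. Not FSRepo's actual code.
function sketchPublish(
    string $root,
    string $srcPath,
    string $dstRel,
    string $currentRel,
    string $archiveRel
): bool {
    // Archive whatever file currently exists for this title. It is looked up
    // at $currentRel, which may carry a different extension than $dstRel.
    $currentPath = "$root/$currentRel";
    if (is_file($currentPath)) {
        $archivePath = "$root/$archiveRel";
        if (!is_dir(dirname($archivePath)) && !mkdir(dirname($archivePath), 0777, true)) {
            return false;
        }
        if (!rename($currentPath, $archivePath)) {
            return false;
        }
    }
    // Copy the new version into place under its (possibly new) name.
    $dstPath = "$root/$dstRel";
    if (!is_dir(dirname($dstPath)) && !mkdir(dirname($dstPath), 0777, true)) {
        return false;
    }
    return copy($srcPath, $dstPath);
}
```

With the old signature, the existing file would have been looked for at $dstRel ("Foo.png" in a JPEG-to-PNG re-upload) and the old Foo.jpg would never have been archived; passing $currentRel = "Foo.jpg" lets it be moved to the archive first.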

Similarly, publishBatch needed to be modified. The previous version took a list of 3-tuples (triplets) as its first parameter; the new version takes a list of 4-tuples:
 * Old version: function publishBatch( $triplets, $flags = 0 ), where $triplets is a list of triplets, each containing ( $srcPath, $dstRel, $archiveRel )
 * New version: function publishBatch( $tuples, $flags = 0 ), where $tuples is a list of 4-tuples, each containing ( $srcPath, $dstRel, $currentRel, $archiveRel )
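The relationship between the two argument shapes can be shown with a small hypothetical adapter (sketchTripletsToTuples is not part of MediaWiki). When a file's extension has not changed, the existing file lives at the destination path, so an old triplet maps to a 4-tuple with $currentRel equal to $dstRel; only entries whose format changes need a distinct $currentRel.

```php
<?php
// Hypothetical adapter from old-style triplets to new-style 4-tuples.
// ( $srcPath, $dstRel, $archiveRel ) becomes
// ( $srcPath, $dstRel, $currentRel, $archiveRel ) with $currentRel = $dstRel.
function sketchTripletsToTuples(array $triplets): array
{
    $tuples = [];
    foreach ($triplets as [$srcPath, $dstRel, $archiveRel]) {
        $tuples[] = [$srcPath, $dstRel, $dstRel, $archiveRel];
    }
    return $tuples;
}

$tuples = sketchTripletsToTuples([
    ['/tmp/upload1', 'Foo.jpg', 'archive/20100331000000!Foo.jpg'],
]);

// An entry whose format changed from JPEG to PNG: the current file (Bar.jpg)
// is at a different path than the destination (Bar.png).
$tuples[] = ['/tmp/upload2', 'Bar.png', 'Bar.jpg', 'archive/20100331000000!Bar.jpg'];
```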

New methods

 * Dealing with MIME types and file extensions:
    * MimeMagic::getPreferredExtensionForType( $mime ) - maps the MIME type back to the preferred file extension.
    * File::getNormalizedExtensionFromName( $name ) - given a file name, returns the normalized extension (e.g. for "foo.JPeG", returns "jpg").
    * LocalFile::getAddedFileExt
    * getAddedExtensionFromTitle( $title, $mime = NULL )
 * getName counterparts:
    * File::getFilenameFromTitle( $title, $mime = NULL ) - returns the file name, given the page title and possibly a MIME type. This function replaces getNameFromTitle for those uses where the actual on-disk filename is what is needed (e.g. in LocalFileMoveBatch).
    * File::getFilename - replacement for getName in contexts where the filename is what is really needed.
    * OldFile::getArchiveFilename - replacement for getArchiveName in contexts where the filename is what is really needed.
 * Refactoring to accommodate new code:
    * prepTarget( $targetRel ) - split out of publishBatch in FSRepo. This was good code hygiene anyway (it replaces several duplicated code blocks with prepTarget calls), but it became essential because the duplicated parts were exactly what needed to be expanded with more complicated logic.
 * New thumb directory functions:
    * File::getThumbRel - the thumb directory is named after the page title rather than the file name, so this new function replaces getRel (which returns a path to the file itself) wherever the thumb directory is what is really needed.
    * File::getThumbUrlRel - thumb directory counterpart to getUrlRel.
 * Upgrade functions:
    * LocalFile::fixAddedOldFileExtensionsInDB - run at the end of LocalFile::dbDBUpdates
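The two extension-mapping methods listed above can be illustrated with hypothetical stand-ins. The names sketchPreferredExtensionForType and sketchNormalizedExtensionFromName, and the tiny hard-coded table, are assumptions for this sketch; MediaWiki's MimeMagic derives its mappings from its own MIME type data rather than a table like this.

```php
<?php
// Hypothetical MIME table: MIME type => known extensions, preferred first.
function sketchMimeExtensions(): array
{
    return [
        'image/jpeg' => ['jpg', 'jpeg', 'jpe'],
        'image/png' => ['png'],
        'image/gif' => ['gif'],
        'image/svg+xml' => ['svg'],
    ];
}

// Stand-in for MimeMagic::getPreferredExtensionForType( $mime ):
// maps a MIME type back to its preferred extension, or null if unknown.
function sketchPreferredExtensionForType(string $mime): ?string
{
    $map = sketchMimeExtensions();
    return $map[$mime][0] ?? null;
}

// Stand-in for File::getNormalizedExtensionFromName( $name ): lower-cases the
// extension and folds known synonyms onto the preferred extension.
function sketchNormalizedExtensionFromName(string $name): ?string
{
    $ext = strtolower(pathinfo($name, PATHINFO_EXTENSION));
    if ($ext === '') {
        return null; // Name has no extension at all.
    }
    foreach (sketchMimeExtensions() as $extensions) {
        if (in_array($ext, $extensions, true)) {
            return $extensions[0];
        }
    }
    // Unknown extensions are returned lower-cased but otherwise unchanged.
    return $ext;
}
```

This reproduces the example given for File::getNormalizedExtensionFromName: "foo.JPeG" normalizes to "jpg", which is also the preferred extension for image/jpeg, so a title ending in ".JPeG" needs no added extension for a JPEG file.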

Test plan

 * Image renaming:
    * Upload Foo.jpg
    * Rename Foo.jpg to Foo
    * Rename Foo to Foo.jpeg
    * Rename Foo.jpeg to Foo.gif
    * Upload Bar (a GIF file)
    * Rename Bar to Bar.gif
 * Set $wgSaveDeletedFiles=true
 * Set $wgFileStore['deleted']['directory'] to a valid directory
 * Delete, then undelete an image
 * Upload a new version of an image:
    * With no extension
    * With the proper extension
 * Change the configured default extension from "jpg" to "jpeg", and check that images uploaded before the transition are still handled correctly
 * Install the previous major version of MediaWiki, set $wgCheckFileExtensions = false, upload images with and without matching extensions, then upgrade to the new version and check the images
 * Do a fresh install of MediaWiki and upload images both with and without a matching extension in the title