Requests for comment/Extensionless files

From mediawiki.org
Request for comment (RFC): Extensionless files
Component: General
Creation date:
Author(s): RobLa
Document status: See Phabricator.

This was implemented in a development branch against MediaWiki 1.15, but that branch is now stale and abandoned.

Problem

Currently, every image name must include an extension that specifies the format of the image (such as .jpg, .png, .gif, or .svg).

If a new version of an image is uploaded in a different format, it must be uploaded under a different image name. All the pages that use the image then have to be changed, and the history of the old image is lost. This is a lot of unnecessary hassle.

Ideally the image name should not have to include this information, since it doesn't matter to those who use the image whether it's a JPEG or a PNG. For example, it would be much better to be able to say [[Image:Map of Europe]] than to have to say [[Image:Map of Europe.png]]. The author of the article shouldn't have to know (or care) what format the image is in.

Implementation

User Perspective

From a user's perspective, enabling the feature (it is off by default; see "Sysadmin's Perspective" below) removes many common error messages without adding new ones. Previously, an error was reported when a user uploaded a file and gave it a name that didn't conform to the extension naming rules for its file type. With this change, the extension of the uploaded file must still conform to whatever whitelist/blacklist rules are in place, and so must the detected MIME type, but the resulting page title for the file can be any valid page title.
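
Going by the tables under "Sysadmin's Perspective", a wiki would opt in from LocalSettings.php roughly as follows. This is only a sketch that assumes the behavior of the abandoned development branch; on stock MediaWiki 1.15 the same settings merely disable the $wgFileExtensions check. The extension list shown is illustrative.

```php
<?php
// LocalSettings.php fragment (sketch; assumes the proposed branch behavior).

// Allow any valid page title for uploaded files; the title no longer
// needs to carry a file extension.
$wgCheckFileExtensions = false;

// Still reject uploads whose own file extension is not in $wgFileExtensions.
$wgStrictFileExtensions = true;

// Illustrative list of permitted upload extensions; adjust for the wiki.
$wgFileExtensions = array( 'png', 'gif', 'jpg', 'jpeg' );
```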

Sysadmin's Perspective

This modification alters the behavior of the $wgCheckFileExtensions variable. In MediaWiki 1.15, there were four different behaviors based on the setting of this variable and $wgStrictFileExtensions:

Current Behavior (MediaWiki 1.15)

| $wgCheckFileExtensions | $wgStrictFileExtensions | Behavior |
| false | false | $wgFileExtensions variable is not checked at all. All page titles must include a file extension which is checked to make sure it corresponds to the MIME type advertised upon upload. Any file matching $wgFileBlacklist or $wgMimeTypeBlacklist is rejected. |
| false | true (default) | This is effectively equivalent to setting both $wgCheckFileExtensions and $wgStrictFileExtensions to false. $wgFileExtensions variable is not checked at all. All page titles must include a file extension which is checked to make sure it corresponds to the MIME type advertised upon upload. Any file matching $wgFileBlacklist or $wgMimeTypeBlacklist is rejected. |
| true (default) | false | $wgFileExtensions variable is checked, and a warning is displayed if the upload doesn't match $wgFileExtensions. User is given the option to override the warning. All page titles must include a file extension which is checked to make sure it corresponds to the MIME type advertised upon upload. Any file matching $wgFileBlacklist or $wgMimeTypeBlacklist is rejected. |
| true (default) | true (default) | $wgFileExtensions variable is checked, and the upload is rejected if it doesn't match $wgFileExtensions. All page titles must include a file extension which is checked to make sure it corresponds to the MIME type advertised upon upload. Any file matching $wgFileBlacklist or $wgMimeTypeBlacklist is rejected. |


This change alters that behavior as follows:

Proposed behavior

| $wgCheckFileExtensions | $wgStrictFileExtensions | Behavior |
| false | false | $wgFileExtensions variable is checked, and a warning is displayed if $wgFileExtensions is not empty and the uploaded file doesn't match one of the listed extensions. The page title of an uploaded file can be any valid title. Any file matching $wgFileBlacklist or $wgMimeTypeBlacklist is rejected. |
| false | true (default) | $wgFileExtensions variable is checked, and the upload is rejected if it doesn't match $wgFileExtensions. The page title of an uploaded file can be any valid title. Any file matching $wgFileBlacklist or $wgMimeTypeBlacklist is rejected. |
| true (default) | false | $wgFileExtensions variable is checked, and a warning is displayed if $wgFileExtensions is not empty and the uploaded file doesn't match one of the listed extensions. All page titles must include a file extension which is checked to make sure it corresponds to the MIME type advertised upon upload. Any file matching $wgFileBlacklist or $wgMimeTypeBlacklist is rejected. |
| true (default) | true (default) | (Same as MediaWiki 1.15.) $wgFileExtensions variable is checked, and the upload is rejected if it doesn't match $wgFileExtensions. All page titles must include a file extension which is checked to make sure it corresponds to the MIME type advertised upon upload. Any file matching $wgFileBlacklist or $wgMimeTypeBlacklist is rejected. |
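
Read as a decision matrix, the proposed behavior separates the two variables cleanly: $wgStrictFileExtensions picks between a warning and a rejection, while $wgCheckFileExtensions decides whether the page title must carry an extension. The sketch below restates that matrix; proposedUploadPolicy() and its return keys are hypothetical names used for illustration, not code from the branch.

```php
<?php
/**
 * Sketch of the proposed behavior matrix (hypothetical helper).
 *
 * @param bool $checkFileExtensions  value of $wgCheckFileExtensions
 * @param bool $strictFileExtensions value of $wgStrictFileExtensions
 * @return array 'onMismatch' => 'warn'|'reject', 'titleNeedsExtension' => bool
 */
function proposedUploadPolicy( $checkFileExtensions, $strictFileExtensions ) {
	return array(
		// What happens when the upload does not match $wgFileExtensions.
		'onMismatch' => $strictFileExtensions ? 'reject' : 'warn',
		// Whether the page title must include a MIME-matching extension.
		'titleNeedsExtension' => (bool)$checkFileExtensions,
	);
}

// Print all four combinations of the two settings.
foreach ( array( false, true ) as $check ) {
	foreach ( array( false, true ) as $strict ) {
		$policy = proposedUploadPolicy( $check, $strict );
		printf(
			"check=%s strict=%s => %s on mismatch, title extension %s\n",
			var_export( $check, true ),
			var_export( $strict, true ),
			$policy['onMismatch'],
			$policy['titleNeedsExtension'] ? 'required' : 'optional'
		);
	}
}
```

In every combination, files matching $wgFileBlacklist or $wgMimeTypeBlacklist are still rejected; the sketch leaves that check out.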


Design

Most of the complexity comes from needing to store the files on the filesystem with appropriate file extensions, since Apache serves them directly from the filesystem. Thus, there's a lot of extra logic for tacking on the file extension when it's needed.

Generally speaking, the design involves:

  • Adding new getFilename* counterparts to getName* functions, and using getFilename* in place of getName* where appropriate
  • Storing the file extension in the database

The file extension is stored in a new 'img_file_ext' field in the 'image' table (with similar fields added to the 'oldimage' and 'filearchive' tables). This field defaults to null. When it is null, the file name and the page title are the same.
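
A minimal sketch of how a file name might be derived from such a row is shown below. It assumes that a non-null stored extension is appended to the page title as ".ext"; the text above only specifies the null case, so that part is a guess. onDiskFilename() is a hypothetical helper, not a function from the branch.

```php
<?php
/**
 * Hypothetical helper: derive the on-disk file name from an 'image' table row.
 * Assumption: a non-null img_file_ext is appended to the title as ".ext".
 *
 * @param array $row keys 'img_name' (page title) and 'img_file_ext' (string|null)
 * @return string
 */
function onDiskFilename( array $row ) {
	if ( $row['img_file_ext'] === null ) {
		// No added extension: file name and page title are the same.
		return $row['img_name'];
	}
	return $row['img_name'] . '.' . $row['img_file_ext'];
}

// A title that already carries its extension needs nothing added.
echo onDiskFilename( array( 'img_name' => 'Map_of_Europe.png', 'img_file_ext' => null ) ), "\n";
// An extensionless title gets the stored extension on disk.
echo onDiskFilename( array( 'img_name' => 'Map_of_Europe', 'img_file_ext' => 'png' ) ), "\n";
```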

On upload of a new file and upon rename/move, the page title is still reconciled against the MIME type. However, a mismatch is no longer an error condition; it is merely used as a trigger for storing a file extension in img_file_ext. When the File object for a file is queried for the filename,

API modifications

The main API change is to publishBatch and publish. The previous versions of these APIs did not accommodate the possibility that the new archive name might be different than the old archive name.

  • Old function:
    • function publish( $srcPath, $dstRel, $archiveRel, $flags = 0 ) {
  • New function:
    • function publish( $srcPath, $dstRel, $currentRel, $archiveRel, $flags = 0 ) {

The new function adds a "$currentRel" parameter, which is the relative path to the existing file. It is usually the same as $dstRel, but may differ if the MIME type (and thus the file extension) changes.

Similarly, publishBatch needed to be modified. The previous version took a list of 3-tuples (triplets) as its first parameter. The new version takes a list of 4-tuples:

  • Old version:
    • function publishBatch( $triplets, $flags = 0 ) {
    • $triplets is a list of triplets, each containing ($srcPath, $dstRel, $archiveRel)
  • New version:
    • function publishBatch( $tuples, $flags = 0 ) {
    • $tuples is a list of tuples, each containing ($srcPath, $dstRel, $currentRel, $archiveRel)
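
To illustrate the difference between the two shapes, the sketch below converts old-style triplets into 4-tuples by assuming the common case where $currentRel equals $dstRel, then adds one tuple by hand for a format change. tripletsToTuples() is a hypothetical adapter, and the paths are made up for illustration (hash directories are omitted).

```php
<?php
/**
 * Hypothetical adapter: turn old-style triplets into the new 4-tuples by
 * assuming the existing file sits at the destination path.
 *
 * @param array $triplets list of array( $srcPath, $dstRel, $archiveRel )
 * @return array list of array( $srcPath, $dstRel, $currentRel, $archiveRel )
 */
function tripletsToTuples( array $triplets ) {
	$tuples = array();
	foreach ( $triplets as $triplet ) {
		list( $srcPath, $dstRel, $archiveRel ) = $triplet;
		// Common case: no extension change, so $currentRel equals $dstRel.
		$tuples[] = array( $srcPath, $dstRel, $dstRel, $archiveRel );
	}
	return $tuples;
}

// Same-format re-upload: the adapter is enough.
$tuples = tripletsToTuples( array(
	array( '/tmp/upload1', 'Foo.jpg', 'archive/20090101000000!Foo.jpg' ),
) );

// Format change (a JPEG replaced by a PNG under the title "Bar"):
// $currentRel is given explicitly because it no longer matches $dstRel.
$tuples[] = array( '/tmp/upload2', 'Bar.png', 'Bar.jpg', 'archive/20090101000000!Bar.jpg' );

print_r( $tuples );
```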


New methods

  • Dealing with MIME types and file extensions:
    • MimeMagic::getPreferredExtensionForType( $mime ) - maps the MIME type back to the preferred file extension.
    • File::getNormalizedExtensionFromName( $name ) - Given a file name, return the normalized extension. (e.g. for "foo.JPeG", return "jpg")
    • LocalFile::getAddedFileExt()
    • getAddedExtensionFromTitle( $title, $mime = NULL )
  • getName counterparts
    • File::getFilenameFromTitle( $title , $mime = NULL ) - Return the file name, given the page title, and possibly a MIME type. This function replaces getNameFromTitle for those uses where the actual on-disk filename is what is needed (e.g. in LocalFileMoveBatch)
    • File::getFilename() - Replacement for getName() in contexts where the filename is what is really needed.
    • OldFile::getArchiveFilename() - Replacement for getArchiveName() in contexts where the filename is what is really needed.
  • Refactoring to accommodate new code
    • prepTarget( $targetRel ) - this was split out from publishBatch in FSRepo. This was some generally good code hygiene anyway (it replaces some duplicated code blocks with prepTarget function calls), but became essential because the duplicated parts were what needed to be expanded with more complicated logic.
  • New thumb directory functions
    • File::getThumbRel() - The thumb directory is named after the page title rather than the file name, so a new function is needed to replace getRel() in places where the thumb directory is what is really needed (getRel() returns a path to a filename)
    • File::getThumbUrlRel() - Thumb directory counterpart to getUrlRel()
  • Upgrade functions
    • LocalFile::fixAddedOldFileExtensionsInDB() - done at the end of LocalFile::dbDBUpdates()
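
The MIME type and extension helpers at the top of this list could behave roughly as sketched below. These are hypothetical free-standing stand-ins, not the branch's methods; the alias and MIME maps are small illustrative tables rather than MediaWiki's real ones, and returning null for an unknown MIME type is an assumption.

```php
<?php
/** Lower-case the extension of $name and fold known aliases ("jpeg" => "jpg"). */
function normalizedExtensionFromName( $name ) {
	$pos = strrpos( $name, '.' );
	if ( $pos === false ) {
		return ''; // no extension at all
	}
	$ext = strtolower( (string)substr( $name, $pos + 1 ) );
	$aliases = array( 'jpeg' => 'jpg', 'tiff' => 'tif', 'htm' => 'html' );
	return isset( $aliases[$ext] ) ? $aliases[$ext] : $ext;
}

/** Map a MIME type to one preferred extension, or null if unknown. */
function preferredExtensionForType( $mime ) {
	$preferred = array(
		'image/jpeg' => 'jpg',
		'image/png' => 'png',
		'image/gif' => 'gif',
		'image/svg+xml' => 'svg',
	);
	return isset( $preferred[$mime] ) ? $preferred[$mime] : null;
}

/**
 * Extension to store in img_file_ext: null when the title already ends in
 * the extension matching the MIME type, otherwise the preferred extension.
 */
function addedExtensionFromTitle( $title, $mime ) {
	$preferred = preferredExtensionForType( $mime );
	if ( $preferred === null || normalizedExtensionFromName( $title ) === $preferred ) {
		return null;
	}
	return $preferred;
}

var_dump( normalizedExtensionFromName( 'foo.JPeG' ) );           // string 'jpg'
var_dump( addedExtensionFromTitle( 'Foo.jpeg', 'image/jpeg' ) ); // null: title already matches
var_dump( addedExtensionFromTitle( 'Foo', 'image/jpeg' ) );      // string 'jpg'
var_dump( addedExtensionFromTitle( 'Foo.gif', 'image/jpeg' ) );  // string 'jpg': title and MIME type disagree
```

Under these assumptions, a title that disagrees with the detected MIME type does not cause an error; it simply causes an extension to be stored.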

Test plan

  • Image renaming:
    • Upload Foo.jpg
    • Rename Foo.jpg to Foo
    • Rename Foo to Foo.jpeg
    • Rename Foo.jpeg to Foo.gif
    • Upload Bar (GIF file)
    • Rename Bar to Bar.gif
  • Set $wgSaveDeletedFiles=true
  • Set $wgFileStore['deleted']['directory'] to valid directory
  • Delete, then undelete an image
  • Upload a new version of an image
    • With no extension
    • With proper extension
  • Change the configured default extension from "jpg" to "jpeg", then check that images from before the transition are still handled correctly
  • Install the previous major version of MediaWiki, set $wgCheckFileExtensions=false, upload images (with and without matching extensions), then upgrade to the new version and check the images
  • On a fresh install of MediaWiki, upload images both with and without a matching extension in the title

External links