Manual:MIME type detection

From MediaWiki.org

MediaWiki tries to detect the MIME type of every file you upload, and rejects the file if its extension does not match the detected MIME type ("The file is corrupt or has an incorrect extension"). If you are getting this error for valid files, try using an external command for detecting the MIME type (see below).

Note: Before the configured method for MIME detection is called, some hard-coded checks are applied. Use debug logging to find out whether those checks cause false positives. (For example, 1.15.3 may misdetect .doc files from MS Word 2007 as ZIP files.)

For configuring which types of files MediaWiki will accept for uploads, use $wgFileExtensions.

MIME detection

If installed, MediaWiki uses PHP's FileInfo module, or the older MimeMagic module. If you are getting an error like "mime_magic could not be initialized, magic file is not available", this module is not configured correctly. Refer to the PHP documentation for information on how to fix this, or use an external MIME detector command instead (see below). If the FileInfo module is installed but not loaded automatically, you can also try setting $wgLoadFileinfoExtension = true; so that the PECL module is loaded at runtime.

Alternatively, an external command can be configured for detecting the MIME type by setting the $wgMimeDetectorCommand option. The most common setting is:

$wgMimeDetectorCommand = "file -bi";
$wgMimeDetectorCommand = "file -bI"; // on Mac OS X

This uses the GNU file utility to determine the type of the file, which should work right away under Linux. Note that the file utility provided by other Unix systems may not support the -i option, in which case this setting will not work. The GNU file utility is also available for Mac OS X, and for Windows via Cygwin.
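
To see what such a detector command returns, the following Python sketch runs it on a file and reduces the output to a bare MIME type. It is only an illustration, not MediaWiki's own code: the function names are hypothetical, and the sample output format ("type/subtype; charset=...") is an assumption about typical `file -bi` output.

```python
import shlex
import subprocess


def parse_file_output(output):
    """Extract the bare MIME type from `file -bi` style output, or None."""
    # Assumed typical output: "image/png; charset=binary"
    stripped = output.strip()
    first_line = stripped.splitlines()[0] if stripped else ""
    mime = first_line.split(";", 1)[0].strip().lower()
    # Anything that does not look like "type/subtype" counts as a failure.
    return mime if "/" in mime and " " not in mime else None


def detect_mime(path, command="file -bi"):
    """Run the external detector on `path`; return a MIME type or None."""
    try:
        result = subprocess.run(
            shlex.split(command) + [path],
            capture_output=True, text=True, check=False,
        )
    except OSError:
        return None  # the detector command is not installed or not executable
    if result.returncode != 0:
        return None
    return parse_file_output(result.stdout)
```

Running `detect_mime` by hand on a file that MediaWiki rejects is a quick way to check whether the external command reports the type you expect before changing the wiki configuration.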

If no MIME module is installed and no external MIME detector command is configured, MediaWiki relies on PHP's GD module to detect the MIME type. Note that this only works for some well-known image types; all other files will be accepted without any additional checks!

You can also disable the MIME type check completely by setting $wgVerifyMimeType = false;. Note, however, that this is very insecure: arbitrary files can then be uploaded with a "harmless" file extension, but may still be executed or interpreted in a harmful way on the client computer or the web server. Pending: how does this relate to $wgCheckFileExtensions?

MIME type validation

MediaWiki uses two files to check and interpret the MIME type. Both are plain text files with one entry per line and the items on each line separated by whitespace; they are located in the includes directory of your MediaWiki installation. If you want to upload uncommon types of files, you may need to add the appropriate information here:

mime.types is used to map MIME types to file extensions, and vice versa. It contains one line per MIME type; the first item on the line is the MIME type (in its canonical form, see below), and the items following it are the file extensions that are allowed for this MIME type (this is the same format used for the standard mime.types files on Linux/Unix systems). For example, for JPEG files, the following line applies:

 image/jpeg jpeg jpg jpe

Note that the MIME type of some file formats may be detected too broadly — any XML-based format may show up as text/xml, any ZIP-based format as application/zip, etc. Consequently, the file extensions for such formats must be associated with the broader MIME type, e.g.:

 text/xml xml xsl xslt rss rdf
 application/zip zip jar xpi  sxc stc  sxd std   sxi sti   sxm stm   sxw stw odt ott oth odm odg otg odp otp ods ots odc odf odb odi oxt
 application/msword doc xls ppt
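
The mapping described above can be sketched in Python as follows. This is an illustration of the file format only, not MediaWiki's implementation: the function names are hypothetical, the sample is shortened from the lines above, and treating `#` as a comment marker is an assumption.

```python
from collections import defaultdict


def parse_mime_types(text):
    """Return (MIME type -> extensions, extension -> MIME types)."""
    mime_to_exts = defaultdict(list)
    ext_to_mimes = defaultdict(list)
    for line in text.splitlines():
        line = line.split("#", 1)[0]  # assumption: '#' starts a comment
        fields = line.split()  # items are separated by any whitespace
        if len(fields) < 2:
            continue  # blank line, or a type without extensions
        mime = fields[0].lower()
        for ext in fields[1:]:
            ext = ext.lower()
            if ext not in mime_to_exts[mime]:
                mime_to_exts[mime].append(ext)
            if mime not in ext_to_mimes[ext]:
                ext_to_mimes[ext].append(mime)
    return dict(mime_to_exts), dict(ext_to_mimes)


def extension_matches(mime, extension, mime_to_exts):
    """True if `extension` is listed for the detected MIME type."""
    return extension.lower().lstrip(".") in mime_to_exts.get(mime.lower(), [])


SAMPLE = """\
image/jpeg jpeg jpg jpe
text/xml xml xsl xslt rss rdf
application/zip zip jar xpi  sxc stc  odt ott
"""

mime_to_exts, ext_to_mimes = parse_mime_types(SAMPLE)
print(extension_matches("image/jpeg", ".JPG", mime_to_exts))     # True
print(extension_matches("application/zip", "odt", mime_to_exts))  # True
print(extension_matches("image/jpeg", "png", mime_to_exts))       # False
```

The second call shows why the broad type matters: a file detected as application/zip only passes the check for the odt extension because odt is listed on the application/zip line.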

mime.info is used to resolve aliases for MIME types, and to assign a media type to them. It contains one line per MIME type; the first item on the line is the canonical MIME type name (which will be used internally), and the last item is of the form [XXX] and defines the media type for the MIME type. All items in between are secondary names (aliases) of the MIME type. Some examples:

 image/png image/x-png     [BITMAP]
 image/svg image/svg+xml application/svg+xml application/svg    [DRAWING]
 audio/mp3 audio/mpeg3 audio/mpeg       [AUDIO]
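
As a rough illustration of how these lines can be read, the Python sketch below builds an alias table and a media type table from the examples above. The function name is hypothetical, and two details are assumptions rather than documented behaviour: `#` is treated as a comment marker, and a line without a trailing [XXX] item falls back to UNKNOWN.

```python
import re

# The last item on a line has the form [XXX], e.g. [BITMAP].
MEDIA_TYPE = re.compile(r"^\[([A-Z]+)\]$")


def parse_mime_info(text):
    """Return (name -> canonical MIME type, canonical MIME type -> media type)."""
    alias_to_canonical = {}
    canonical_to_media = {}
    for line in text.splitlines():
        fields = line.split("#", 1)[0].split()  # assumption: '#' starts a comment
        if not fields:
            continue
        media = "UNKNOWN"  # assumption: fallback when no [XXX] item is present
        match = MEDIA_TYPE.match(fields[-1])
        if match:
            media = match.group(1)
            fields = fields[:-1]
        if not fields:
            continue
        canonical = fields[0].lower()
        canonical_to_media[canonical] = media
        # The canonical name maps to itself; every alias maps to it as well.
        for name in fields:
            alias_to_canonical[name.lower()] = canonical
    return alias_to_canonical, canonical_to_media


SAMPLE = """\
image/png image/x-png     [BITMAP]
image/svg image/svg+xml application/svg+xml application/svg    [DRAWING]
audio/mp3 audio/mpeg3 audio/mpeg       [AUDIO]
"""

aliases, media_types = parse_mime_info(SAMPLE)
canonical = aliases["image/svg+xml"]
print(canonical, media_types[canonical])  # image/svg DRAWING
```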

Note that for Ogg files, the media type is determined programmatically: AUDIO for Vorbis, VIDEO for Theora, MULTIMEDIA otherwise.

The media type is specific to MediaWiki, and describes what kind of media is contained in the file, as opposed to what format the file is in. This information is stored in the image table, along with the MIME type. It is currently not used for much, but could be used in the future to determine how to present a file to the user. The following types are defined:

 UNKNOWN     // unknown format
 BITMAP      // some bitmap image or image source (like psd, etc). Can't scale up.
 DRAWING     // some vector drawing (SVG, WMF, PS, ...) or image source (oo-draw, etc). Can scale up.
 AUDIO       // simple audio file (ogg, mp3, wav, midi, whatever)
 VIDEO       // simple video file (ogg, mpg, etc; do not include formats here that may contain executable sections or scripts!)
 MULTIMEDIA  // Scriptable Multimedia (flash, advanced video container formats, etc)
 OFFICE      // Office Documents, Spreadsheets (office formats possibly containing applets, scripts, etc)
 TEXT        // Plain text (possibly containing program code or scripts)
 EXECUTABLE  // binary executable
 ARCHIVE     // archive file (zip, tar, etc)

Forbidden files

In addition to the $wgFileExtensions option, the following settings may cause files to be rejected (even if $wgStrictFileExtensions = false; is set):

In addition, MediaWiki rejects all files that look like scripts that could be accidentally executed on either the web server or the user's browser. Notably, anything that looks like one of the following formats will be rejected, regardless of detected MIME type or file extension: HTML, JavaScript, PHP, shell scripts. Note that the detection of HTML and JavaScript is rather broad, and may report false positives. This is because Microsoft Internet Explorer is known to interpret files that look like HTML regardless of the file extension or the MIME type reported by the web server, which would leave the site vulnerable to cross-site scripting attacks. If you really want to allow even such dangerous files, you can hack the detectScript function in the UploadBase.php file to always return false.

Virus scans

Pending. For now, see $wgAntivirus and $wgAntivirusSetup.

MIME types when downloading

Note that the MIME type used when the actual file is served to the user's browser is not determined by MediaWiki's MIME detection: files are not served through MediaWiki, but directly by the web server. Thus, the web server must be configured to use the correct MIME type for each file extension. For example, if you are having trouble viewing SVG files in your browser, make sure the server is configured to deliver them as image/svg+xml. (For Apache, read about mod_mime.)

See also

Older discussion on meta:
