Manual:MIME type detection

From mediawiki.org
MediaWiki tries to detect the MIME type of each file you upload, and rejects the file if its extension does not match the detected MIME type ("The file is corrupt or has an incorrect extension"). If you get this error for valid files, try using an external command for detecting the MIME type (see below).

Before the configured MIME detection method is called, some hard-coded checks are applied. Use debug logging to find out whether those checks cause false positives. (For example, MediaWiki 1.15.3 may misdetect .doc files from MS Word 2007 as ZIP files.)
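
As a minimal sketch of turning on debug logging for this purpose, the following LocalSettings.php fragment writes MediaWiki's debug output to a file. The path is only an assumed example; pick a location that the web server can write to and that is not reachable from the web.

```php
// LocalSettings.php
// Assumption: the log path below is illustrative, not a recommended location.
// Debug logs can contain sensitive data, so disable this again after troubleshooting.
$wgDebugLogFile = '/var/log/mediawiki/debug.log';
```

After a failed upload, the log can be searched for MIME-related messages to see which check rejected the file.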

To configure which types of files MediaWiki accepts for uploads, use $wgFileExtensions.
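
For illustration, a sketch of extending the list of accepted extensions in LocalSettings.php might look like this. The extensions 'pdf' and 'odt' are arbitrary examples, not a recommendation.

```php
// LocalSettings.php
// Append to the default list instead of replacing it, so that the
// extensions MediaWiki allows by default remain accepted.
// 'pdf' and 'odt' are example values chosen for this sketch.
$wgFileExtensions = array_merge( $wgFileExtensions, [ 'pdf', 'odt' ] );
```

Merging rather than assigning a new array keeps the default extensions in place.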

MIME detection

If installed, MediaWiki uses PHP's FileInfo module, or the older MimeMagic module. If you get an error like "mime_magic could not be initialized, magic file is not available", this module is not configured correctly. Refer to the PHP documentation for information on how to fix this, or use an external MIME detector command instead (see below).

Alternatively, an external command can be configured for detecting the MIME type by setting the $wgMimeDetectorCommand option. The most common setting is:

$wgMimeDetectorCommand = "file -bi"; # on Linux
$wgMimeDetectorCommand = "file -bI"; # on macOS

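This uses the GNU file utility to determine the type of the file, which should work right away under Linux. The -b option suppresses the file name in the output, and -i (-I on macOS) makes the utility print a MIME type instead of a human-readable description. Note that the file utility provided by other Unixes may not support the -i option, and will thus not work. The GNU file utility is also available for macOS, and for Windows via Cygwin.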
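
If the same LocalSettings.php is shared between Linux and macOS machines, the choice between the two commands could be made at runtime. This is only a sketch: it assumes PHP 7.2 or later (for the PHP_OS_FAMILY constant) and that a suitable file utility is on the web server's PATH.

```php
// LocalSettings.php
// PHP_OS_FAMILY is 'Darwin' on macOS, where the MIME option is -I.
// On other systems this sketch assumes a file utility that supports -i.
$wgMimeDetectorCommand = PHP_OS_FAMILY === 'Darwin'
    ? 'file -bI'
    : 'file -bi';
```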

If no MIME module is installed and no external MIME detector command is configured, MediaWiki relies on PHP's GD module to detect the MIME type. Note that this only works for some well-known image types (see [1]); other files will be accepted without any additional checks!

You can also disable the MIME type check completely by setting $wgVerifyMimeType = false; in LocalSettings.php. Note, however, that this is very insecure: arbitrary files can then be uploaded with a "harmless" file extension, but may still get executed or interpreted in a harmful way on the client computer or the web server. Pending: how does this relate to $wgCheckFileExtensions?

Improve MIME type detection

If a more specific type such as chemical/x-jcamp-dx is misdetected as text/plain, the MimeMagicImproveFromExtension or MimeMagicGuessFromContent hooks can help:

/**
 * Example for adding extra file-extension-based MIME detection via LocalSettings.php
 * @param MimeAnalyzer $mimeAnalyzer
 * @param string $ext File extension.
 * @param string &$mime MIME type (out).
 */
$wgHooks['MimeMagicImproveFromExtension'][] = static function ( $mimeAnalyzer, $ext, &$mime ) {
    // Strict comparison avoids PHP's loose type juggling.
    if ( in_array( $ext, [ 'dx', 'jdx', 'jcm' ], true ) ) {
        $mime = 'chemical/x-jcamp-dx';
    }
};

/**
 * Example for adding extra file-content-based MIME detection via LocalSettings.php
 * @param MimeAnalyzer $mimeAnalyzer
 * @param string &$head First 1024 bytes of the file (in; do not alter!).
 * @param string &$tail At least the last 65558 bytes of the file (in; do not alter!).
 * @param string $file File path.
 * @param string &$mime MIME type (out).
 */
$wgHooks['MimeMagicGuessFromContent'][] = static function ( $mimeAnalyzer, &$head, &$tail, $file, &$mime ) {
    // str_contains() requires PHP 8.0 or later.
    if ( str_contains( $head, '##JCAMP' ) ) {
        $mime = 'chemical/x-jcamp-dx';
    }
};

MIME type validation

MediaWiki stores its default MIME types and media types in MimeMap.php.

To support extra MIME types for uploads on your wiki, you can use the MimeMagicInit hook (available since MediaWiki 1.24).

/**
 * Example for adding extra MIME types via LocalSettings.php
 * @param MimeAnalyzer $mime
 */
$wgHooks['MimeMagicInit'][] = static function ( $mime ) {
    $mime->addExtraTypes( 'text/plain md' );
    $mime->addExtraInfo( 'text/example [OFFICE]' );
};

Extra types

The MIME type definitions specify which file extensions are allowed for a given MIME type. To recognise .md (Markdown) files as text/plain:

$mime->addExtraTypes( 'text/plain md' );

Remember to also add the extension to $wgFileExtensions to allow it to be used for new uploads on your wiki.
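
Putting the two steps together, a LocalSettings.php fragment for the Markdown case above might look like the following sketch. It only combines the MimeMagicInit hook and $wgFileExtensions as described on this page.

```php
// LocalSettings.php
// Step 1: associate the .md extension with text/plain.
$wgHooks['MimeMagicInit'][] = static function ( $mime ) {
    $mime->addExtraTypes( 'text/plain md' );
};

// Step 2: allow the extension for new uploads.
$wgFileExtensions[] = 'md';
```

Without the second step, the extension is known to the MIME mapping but still not accepted for upload.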

You can also specify multiple file extensions. For example, the following is what MediaWiki would have done internally for JPEG:

$mime->addExtraTypes( 'image/jpeg jpg jpeg jpe' );

Note that the MIME type of some file formats may be detected too broadly: any XML-based format may show up as text/xml, and any ZIP-based format as application/zip. Consequently, the file extensions for such formats must be associated with their broader MIME type, e.g.:

text/xml xml xsl xslt rss rdf
application/zip zip jar xpi
application/msword doc xls ppt

Extra info

The "mime info" data is used to resolve aliases for MIME types, and to assign a media type to them. It contains one line per MIME type; the first item on the line is the canonical MIME type name (which will be used internally), the last item is of the form [XXX] and defines the media type for the MIME type.

To assign text/example under the "OFFICE" media type:

$mime->addExtraInfo( 'text/example [OFFICE]' );

Some examples:

image/png image/x-png	[BITMAP]
image/svg image/svg+xml application/svg+xml application/svg	[DRAWING]
audio/mp3 audio/mpeg3 audio/mpeg	[AUDIO]
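
To illustrate the line format described above (canonical name first, then aliases, then the media type in brackets), here is a sketch that registers an info line with one alias. The MIME type names text/markdown and text/x-markdown, and the choice of the TEXT media type, are assumptions made for this example only.

```php
// LocalSettings.php
$wgHooks['MimeMagicInit'][] = static function ( $mime ) {
    // Format: <canonical type> [<alias> ...] [<MEDIATYPE>]
    // 'text/markdown' (canonical) and 'text/x-markdown' (alias) are
    // illustrative names, not taken from MediaWiki's built-in data.
    $mime->addExtraInfo( 'text/markdown text/x-markdown [TEXT]' );
};
```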

Note that for OGG files, the media type is determined programmatically: AUDIO for vorbis, VIDEO for theora, MULTIMEDIA otherwise.

The media type is specific to MediaWiki, and determines what kind of media is contained in the file, as opposed to what format the file is in. This information is stored in the image table, along with the MIME type. It is currently not used for much, but could be used in the future to determine how to present a file to the user. The following types are defined:

UNKNOWN: unknown format
BITMAP: some bitmap image or image source (like psd, etc). Can't scale up.
DRAWING: some vector drawing (SVG, WMF, PS, ...) or image source (oo-draw, etc). Can scale up.
AUDIO: simple audio file (ogg, mp3, wav, midi, whatever)
VIDEO: simple video file (ogg, mpg, etc; do not include formats here that may contain executable sections or scripts!)
MULTIMEDIA: scriptable multimedia (flash, advanced video container formats, etc)
OFFICE: office documents, spreadsheets (office formats possibly containing applets, scripts, etc)
TEXT: plain text (possibly containing program code or scripts)
EXECUTABLE: binary executable
ARCHIVE: archive file (zip, tar, etc)

Forbidden files

In addition to the $wgFileExtensions option, the following settings may cause files to be rejected (even if $wgStrictFileExtensions = false; is set):

In addition, MediaWiki rejects all files that look like scripts that could be accidentally executed on either the web server or the user's browser. Notably, anything that looks like one of the following formats will be rejected, regardless of detected MIME type or file extension: HTML, JavaScript, PHP, shell scripts. Note that the detection of HTML and JavaScript is rather broad and may report false positives. This is because Microsoft Internet Explorer is known to interpret files that look like HTML regardless of the file extension or the MIME type reported by the web server, which would leave the site vulnerable to cross-site scripting attacks. If you really want to allow even such dangerous files, you can hack the detectScript function in the UploadBase.php file to always return false.

Virus scans

Pending. For now, see $wgAntivirus and $wgAntivirusSetup.
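
Until this section is written, the following sketch shows one possible way to enable scanning of uploads. It assumes that ClamAV's clamscan is installed on the server and that the default $wgAntivirusSetup of your MediaWiki version contains a 'clamav' entry; check the documentation of both settings before relying on it.

```php
// LocalSettings.php
// Assumption: 'clamav' is a key defined in the default $wgAntivirusSetup.
$wgAntivirus = 'clamav';

// Reject uploads if the scanner cannot be run, rather than
// accepting them unscanned.
$wgAntivirusRequired = true;
```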

MIME types when downloading

Note that the MIME type used when the actual file is served to the user's browser is not determined by MediaWiki's MIME detection: files are not served through MediaWiki, but directly by the web server. Thus, the web server must be configured to use the correct MIME type for each file extension. For example, if you are having trouble viewing SVG files in your browser, make sure the server is configured to deliver them as image/svg+xml. (For Apache, read about mod_mime.)

See also

Older discussion on meta: