Manual:MIME type detection

From mediawiki.org

MediaWiki tries to detect the MIME type of uploaded files and rejects files whose extension does not match the detected MIME type ("The file is corrupt or has an incorrect extension"). If you get this error for valid files, try using an external command to detect the MIME type (see below).

Before the configured MIME detection method is called, some hard-coded checks are applied. Use debug logging to find out whether those checks cause false positives. (For example, MediaWiki 1.15.3 may misdetect .doc files from MS Word 2007 as ZIP files.)
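
A minimal sketch of turning on debug logging in LocalSettings.php so the outcome of these checks can be inspected. The log path is an assumption chosen for illustration; any location writable by the web server user will do.

```php
// LocalSettings.php
// Assumed path for illustration; it must be writable by the web server user.
$wgDebugLogFile = '/var/log/mediawiki/debug.log';
```

Debug logs can contain sensitive data, so this is best kept outside the web root and switched off again after troubleshooting.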

To configure which types of files MediaWiki accepts for upload, use $wgFileExtensions.
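
As a sketch, extensions can be appended to the default list in LocalSettings.php. The two extensions shown are only examples, not a recommendation.

```php
// LocalSettings.php
// Allow two more extensions on top of the defaults ('pdf' and 'svg' are illustrative choices).
$wgFileExtensions = array_merge( $wgFileExtensions, [ 'pdf', 'svg' ] );
```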

MIME detection

If installed, MediaWiki uses PHP's FileInfo module or the older MimeMagic module. If you get an error like "mime_magic could not be initialized, magic file is not available", the module is not configured correctly; refer to the PHP documentation to find out how to fix this, or use an external MIME detector command (see below).
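
To check whether FileInfo works on a given server independently of MediaWiki, a small standalone script along these lines may help. The helper name detectWithFileinfo is hypothetical and not part of MediaWiki; only PHP's own finfo_open() and finfo_file() are used.

```php
<?php
/**
 * Return the MIME type that PHP's FileInfo extension reports for $path,
 * or null if the extension is missing or detection fails.
 * (Hypothetical helper for diagnosis; not part of MediaWiki.)
 */
function detectWithFileinfo( string $path ): ?string {
    if ( !extension_loaded( 'fileinfo' ) ) {
        return null;
    }
    // FILEINFO_MIME_TYPE asks for the bare type without charset information.
    $finfo = finfo_open( FILEINFO_MIME_TYPE );
    if ( $finfo === false ) {
        // The magic database could not be loaded.
        return null;
    }
    $type = finfo_file( $finfo, $path );
    return $type === false ? null : $type;
}
```

A null result for a file that certainly exists suggests the extension or its magic database is the problem rather than the upload itself.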

Alternatively, you can use an external command to detect the MIME type by setting the $wgMimeDetectorCommand option. The most common settings are:

$wgMimeDetectorCommand = "file -bi"; # on Linux
$wgMimeDetectorCommand = "file -bI"; # on macOS

This uses the GNU file utility to determine the type of the file; it works out of the box on Linux. The file utility provided by other Unix flavors may not work because it may not support the -i option. GNU file is also available for macOS, and for Windows via Cygwin.
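
With these options, file typically prints a line such as "text/plain; charset=us-ascii". The sketch below shows one way to reduce such a line to the bare MIME type; parseFileCommandOutput is a hypothetical helper, and MediaWiki's own handling of the command output may differ.

```php
<?php
/**
 * Extract the bare MIME type from one line of `file -bi` output.
 * Returns null when the line does not look like "type/subtype".
 * (Hypothetical helper; not MediaWiki's actual parsing code.)
 */
function parseFileCommandOutput( string $output ): ?string {
    // Keep only the part before the first ';' (drops parameters such as charset).
    $type = strtolower( trim( explode( ';', $output, 2 )[0] ) );
    // Accept only a single token of the form "type/subtype".
    return preg_match( '!^[a-z0-9.+-]+/[a-z0-9.+_-]+$!', $type ) === 1 ? $type : null;
}
```

Rejecting anything that is not shaped like a MIME type keeps error messages printed by the command from being mistaken for a detected type.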

If no MIME module is installed and no external MIME detector command is configured, MediaWiki relies on PHP's GD module to detect the MIME type. This only works for well-known image types (see [1]); all other files are accepted without any further checks!

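You can also disable the MIME type check completely by setting $wgVerifyMimeType = false; note, however, that this is very insecure: arbitrary files could then be uploaded with an "innocent" file extension and be executed or interpreted on the client's computer or on the web server. TODO: how does this relate to $wgCheckFileExtensions?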

Improve MIME type detection

If a more specific type such as chemical/x-jcamp-dx is misdetected as text/plain, the MimeMagicImproveFromExtension or MimeMagicGuessFromContent hooks can help:

/**
 * Example of adding extra extension-based MIME detection via LocalSettings.php.
 * @param MimeAnalyzer $mimeAnalyzer
 * @param string $ext File extension.
 * @param string &$mime MIME type (out).
 */
$wgHooks['MimeMagicImproveFromExtension'][] = static function ( $mimeAnalyzer, $ext, &$mime ) {
    if ( in_array( $ext, ['dx', 'jdx', 'jcm'] ) ) {
        $mime = 'chemical/x-jcamp-dx';
    }
};
/**
 * Example of adding extra content-based MIME detection via LocalSettings.php.
 * @param MimeAnalyzer $mimeAnalyzer
 * @param string &$head First 1024 bytes of the file (in; do not alter!).
 * @param string &$tail At least the last 65558 bytes of the file (in; do not alter!).
 * @param string $file File path.
 * @param string &$mime MIME type (out).
 */
$wgHooks['MimeMagicGuessFromContent'][] = static function ( $mimeAnalyzer, &$head, &$tail, $file, &$mime ) {
    if ( str_contains( $head, '##JCAMP' ) ) {
        $mime = 'chemical/x-jcamp-dx';
    }
};

MIME type validation

MediaWiki stores its default MIME types and media types in MimeMap.php.

To support extra MIME types for uploads on your wiki, you can use the MimeMagicInit hook (available since MediaWiki 1.24).

/**
 * Example for adding extra MIME types via LocalSettings.php
 * @param MimeAnalyzer $mime
 */
$wgHooks['MimeMagicInit'][] = static function ( $mime ) {
    $mime->addExtraTypes( 'text/plain md' );
    $mime->addExtraInfo( 'text/example [OFFICE]' );
};

Extra types

The MIME types data defines which file extensions are allowed for a given MIME type. To recognise .md (Markdown) files as text/plain:

$mime->addExtraTypes( 'text/plain md' );

Remember to also add the extension to $wgFileExtensions to allow it to be used for new uploads on your wiki.
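
Put together, the two steps might look like this in LocalSettings.php. This is a sketch that only combines the hook and setting already described above.

```php
// LocalSettings.php
// Step 1: associate the .md extension with text/plain.
$wgHooks['MimeMagicInit'][] = static function ( $mime ) {
    $mime->addExtraTypes( 'text/plain md' );
};
// Step 2: allow .md as an upload extension.
$wgFileExtensions[] = 'md';
```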

You can also specify multiple file extensions. For example, the following is what MediaWiki would have done internally for JPEG:

$mime->addExtraTypes( 'image/jpeg jpg jpeg jpe' );

Note that the MIME type of some file formats may be detected too broadly: any XML-based format may show up as text/xml, and any ZIP-based format as application/zip. Consequently, the file extensions for such formats must be associated with their broader MIME type, e.g.:

text/xml xml xsl xslt rss rdf
application/zip zip jar xpi
application/msword doc xls ppt

Extra info

The "mime info" data is used to resolve aliases for MIME types, and to assign a media type to them. It contains one line per MIME type; the first item on the line is the canonical MIME type name (which will be used internally), the last item is of the form [XXX] and defines the media type for the MIME type.

To assign text/example under the "OFFICE" media type:

$mime->addExtraInfo( 'text/example [OFFICE]' );

Examples:

image/png image/x-png	[BITMAP]
image/svg image/svg+xml application/svg+xml application/svg	[DRAWING]
audio/mp3 audio/mpeg3 audio/mpeg	[AUDIO]
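
Following the same line format, an alias can be registered together with the media type. In this sketch, text/x-example is a made-up alias used purely for illustration.

```php
// LocalSettings.php
$wgHooks['MimeMagicInit'][] = static function ( $mime ) {
    // Canonical name first, then aliases, then the media type in brackets.
    // 'text/x-example' is a hypothetical alias, not a registered MIME type.
    $mime->addExtraInfo( 'text/example text/x-example [OFFICE]' );
};
```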

Note that for OGG files, the media type is determined programmatically: AUDIO for vorbis, VIDEO for theora, MULTIMEDIA otherwise.

The media type is specific to MediaWiki, and determines what kind of media is contained in the file, as opposed to what format the file is in. This information is stored in the image table, along with the MIME type. It is currently not used for much, but could be used in the future to determine how to present a file to the user. The following types are defined:

UNKNOWN unknown format
BITMAP some bitmap image or image source (like psd, etc). Can't scale up.
DRAWING some vector drawing (SVG, WMF, PS, ...) or image source (oo-draw, etc). Can scale up.
AUDIO simple audio file (ogg, mp3, wav, midi, whatever)
VIDEO simple video file (ogg, mpg, etc; do not include formats here that may contain executable sections or scripts!)
MULTIMEDIA Scriptable Multimedia (flash, advanced video container formats, etc)
OFFICE Office Documents, Spreadsheets (office formats possibly containing applets, scripts, etc)
TEXT Plain text (possibly containing program code or scripts)
EXECUTABLE binary executable
ARCHIVE archive file (zip, tar, etc)

Forbidden files

In addition to the $wgFileExtensions option, the following settings may cause files to be rejected (even if $wgStrictFileExtensions = false; is set):

In addition, MediaWiki rejects all files that look like scripts that could be accidentally executed on either the web server or the user's browser. Notably, anything that looks like one of the following formats is rejected, regardless of the detected MIME type or file extension: HTML, JavaScript, PHP, shell scripts. The detection of HTML and JavaScript is deliberately broad and may report false positives. This is because Microsoft Internet Explorer is known to interpret files that look like HTML regardless of the file extension or the MIME type reported by the web server, which would leave the site vulnerable to cross-site scripting attacks. If you really want to allow even such dangerous files, you can hack the detectScript function in the UploadBase.php file to always return false.
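
To illustrate why such content sniffing is broad, here is a deliberately simplified sketch of the idea. It is not MediaWiki's detectScript implementation; the function name and the marker list are assumptions made for this example.

```php
<?php
/**
 * Simplified illustration of sniffing a file for script-like content.
 * NOT MediaWiki's detectScript; the marker list is an assumption.
 */
function looksLikeScript( string $path ): bool {
    // Look only at the start of the file, as a content-sniffing browser would.
    $head = file_get_contents( $path, false, null, 0, 1024 );
    if ( $head === false ) {
        // Fail closed: an unreadable file is treated as suspicious.
        return true;
    }
    $head = strtolower( $head );
    $markers = [ '<html', '<script', '<body', '<?php', '#!/' ];
    foreach ( $markers as $marker ) {
        if ( strpos( $head, $marker ) !== false ) {
            return true;
        }
    }
    return false;
}
```

A plain-text file that merely quotes one of these markers near its start would also be flagged, which is the kind of false positive described above.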

Virus scanning

TODO. For now, see $wgAntivirus and $wgAntivirusSetup.
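
As a starting point, a configuration along these lines is a common way to enable scanning. It assumes that ClamAV is installed on the server and that 'clamav' is a key defined in $wgAntivirusSetup; check both against your MediaWiki version.

```php
// LocalSettings.php
// Select the scanner definition to use (assumes a 'clamav' entry exists in $wgAntivirusSetup
// and that the ClamAV command-line scanner is installed).
$wgAntivirus = 'clamav';
// Reject the upload if the scan itself could not be run.
$wgAntivirusRequired = true;
```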

MIME types when downloading

Note that the MIME type used when the actual file is served to the user's browser is not determined by MediaWiki's MIME-detection: files are not served through MediaWiki, but directly by the web server. Thus, the web server must be configured to use the correct MIME type for each file extension - for example, if you are having trouble viewing SVG files in your browser, make sure the server is configured to deliver them as image/svg+xml. (For Apache, read about mod_mime.)

See also

Older discussion on meta: