User:Melissa 4.0/Extension:FileFetcher

MediaWiki extensions manual
File Fetcher
Release status: experimental
Implementation: Parser function, Tag
Description: Fetches files from outside the wiki for inclusion in pages using standard image syntax functionality
Author(s): Douglas Williams (Melissa 4.0)
Latest version: 0.2.1 (2009-04-12)
MediaWiki: 1.14.0
License: GPL
Download: No link

Description

Whilst MediaWiki has solid support for the display of uploaded media, it does not always extend this functionality to external files (which have not been uploaded to the wiki). This can be an inconvenience for users who wish to maintain their media files separately from their wiki (for example, within a hierarchical directory structure on their hard disk), but would nonetheless like to take advantage of MediaWiki's image syntax functionality for formatting (border, frame, thumb, frameless), resizing, and captioning images.

This extension works by fetching any specified file from its path or URI, storing it in a repository, and making it accessible through the standard image syntax. Before using this extension, check whether one of the simpler alternatives (see "Simpler alternatives") would suffice for your needs.

Warning

This extension is intended only for personal or corporate wikis deployed in a high-trust environment. Its aim is to facilitate the retrieval of files from external sources, including the local filesystem. Used in an inappropriate setting, it would pose a severe security risk, since every user with write access to the wiki would be able to access any file on the local filesystem (including files exposed through network shares) and to inject malicious content.

Usage

The {{fetchfile:}} parser function fetches the file from the specified path or URI, and returns its file name as stored in the repository. Thus, it could easily be used to combine a web address with image syntax; for example:

[[File:{{fetchfile:http://upload.wikimedia.org/wikipedia/mediawiki/a/a9/Example.jpg}}|thumb|left|200px|This is a sample image.]]

The same syntax may be used with local paths:

[[File:{{fetchfile:C:\Downloaded Images\Example.jpg}}|thumb|left|200px|This is a sample image.]]

Either of the above would give a result similar to:

(Image: left-aligned thumbnail with the caption "This is a sample image.")

The same syntax may also be used for image galleries, using the <fetchgallery> parser extension tag:

<fetchgallery widths="200px" heights="150px">
File:{{fetchfile:http://upload.wikimedia.org/wikipedia/mediawiki/a/a9/Example.jpg}}|Image fetched from web address.
File:{{fetchfile:C:\Downloaded Images\Example.jpg}}|Image fetched from local path.
</fetchgallery>

Download instructions

Please download the archive containing the PHP source files from the link given in the infobox above, and extract it into the $IP/extensions directory. (Note: $IP stands for the root directory of your MediaWiki installation, the same directory that holds LocalSettings.php.)

Installation

To install this extension, add the following to LocalSettings.php:

require_once("$IP/extensions/FileFetcher/FileFetcher.php");
# Add configuration settings here.

Documentation

Most of the extension is implemented through the following classes (with indentation indicating subclassing):

  • FileFetcher: The main file fetcher class, instantiated using the singleton pattern. Responsible for fetching a file from its source using the appropriate SourceHandler subclass instance. In order to support the fetching of files from diverse sources using specific protocols (wrappers), all the functionality required for handling source files has been delegated to the SourceHandler class (as implemented in its protocol-specific subclasses). Similarly, in order to support the storing of files in diverse repositories, all the functionality required for handling target files has been delegated to the FileRepoHandler class (as implemented in its repository-specific subclasses).
  • SourceHandler: Base class defining methods and members for handling the fetching of files from sources using a specific protocol (wrapper).
    • FileSourceHandler: Base class defining methods and members for handling the fetching of files from file-based sources using the PHP in-built file functions.
      • FSSourceHandler: Source handler responsible for the fetching of files from the local filesystem.
      • HttpSourceHandler: Source handler responsible for the fetching of files over the HTTP or HTTPS protocol.
  • FileRepoHandler: Base class defining methods and members for handling the storing of files into the associated file repository.
    • ForeignRepoHandler: Base class defining methods and members for handling the storing of files into a foreign file repository registered with the MediaWiki core through the $wgForeignFileRepos array.
      • FSRepoHandler: File repository handler for storing files through a FSRepo (filesystem foreign repository) instance initialized by the MediaWiki core.

Both source handlers and file repository handlers may be configured by modifying the values of their public class members. Source handlers are instantiated using the singleton pattern; the singleton instance of each concrete source handler class (namely, FSSourceHandler and HttpSourceHandler) may be accessed through its getInstance() static method. Each source handler has an associated file repository handler, which may be accessed through the former's $mFileRepoHandler member.

Source handlers

The main configuration members of the source handlers are the following:

  • SourceHandler::$mFileRepoHandler: The file repository handler associated with this source handler. It is allowed for multiple source handler instances to share the same file repository handler instance; in such cases, the settings for the latter would also be shared.
  • SourceHandler::$mFileSizeMax: The maximum size of a fetched file, in bytes. A file exceeding this size is not fetched; an exception is thrown instead. Set to a negative value to refrain from enforcing a maximum size (not recommended).
  • SourceHandler::$mTolerateUnknownFileSize: Whether to tolerate unknown file sizes. A file size may be unknown either because the source does not support reporting it, or because an error was encountered whilst the source handler was determining it. If set to true, a file of unknown size is fetched anyway. If set to false, a file of unknown size is not fetched.
  • FSSourceHandler::$mHandleNonUris: Whether to handle any path which is not recognized as a URI. Set to true to handle non-URI file path representations, such as "C:\Image.jpg" on Windows and "/home/name/Image.jpg" on Linux. Set to false to refrain from handling any path which is not recognized as a URI. Valid URIs belonging to the file URI scheme (starting with "file:") are always handled in either case.

For example:

# Do not allow the filesystem source handler to handle non-URI paths.
# All filesystem paths would need to be specified using the file URI scheme.
FSSourceHandler::getInstance()->mHandleNonUris = false;

# Set the maximum size of files to be fetched over HTTP to 200 KiB.
HttpSourceHandler::getInstance()->mFileSizeMax = 200 * pow( 1024, 1 );
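
The remaining members listed above can be set in the same way. The following is a minimal sketch for LocalSettings.php, assuming the members behave as documented in this section; the chosen values are illustrative only, and assigning one handler's $mFileRepoHandler to another is an assumed way of sharing it, not one confirmed by the source.

```php
# Refuse to fetch files over HTTP when their size cannot be determined.
HttpSourceHandler::getInstance()->mTolerateUnknownFileSize = false;

# Assumption: let the HTTP source handler share the file repository handler
# of the filesystem source handler, so that both use the same settings.
HttpSourceHandler::getInstance()->mFileRepoHandler =
    FSSourceHandler::getInstance()->mFileRepoHandler;
```

Once a handler is shared in this way, any setting changed through either source handler would apply to both.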

File repository handlers

The target file name for a file to be saved in the associated repository of a file repository handler is constructed from the following components:

  • the source file name (excluding extension)
  • a space character, ' ', if both the source file name and the generated hash are non-empty
  • the generated hash
  • a dot character, '.', if the source file extension is non-empty
  • the source file extension

The FileRepoHandler class defines three length preservation members affecting the construction of target file names:

  • FileRepoHandler::$mPreservedFileNameLength: The maximum length of the source file name (excluding extension) to be preserved when constructing the target file name.
  • FileRepoHandler::$mPreservedFileExtensionLength: The maximum length of the source file extension to be preserved when constructing the target file name.
  • FileRepoHandler::$mPreservedHashLength: The maximum length of the generated hash to be preserved when constructing the target file name. If set to 0, the hash is not generated at all.

These members determine how their associated components are used in the construction of the target file name, in the following manner:

  • Set to a positive value n to preserve the first n characters from the component.
  • Set to 0 to preserve nothing of the component.
  • Set to a negative value to preserve the full component (not recommended for arbitrary-length components).

If the constructed target file name is empty, an exception is thrown. This happens if, for example, all the component length preservation members are set to 0.
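
The construction rules above can be sketched as a small standalone function. This is not the extension's actual code: the function names preserveComponent and buildTargetFileName are hypothetical, the hash is passed in as a ready-made string, and it is an assumption that the separators are decided after the components have been truncated.

```php
<?php
// Hypothetical helper: apply a length preservation setting to one component.
function preserveComponent( string $component, int $length ): string {
    if ( $length < 0 ) {
        return $component;                    // negative: keep the full component
    }
    if ( $length === 0 ) {
        return '';                            // zero: keep nothing
    }
    return substr( $component, 0, $length );  // positive n: keep the first n bytes
}

// Hypothetical sketch of the target file name construction described above.
function buildTargetFileName(
    string $name, string $extension, string $hash,
    int $nameLength, int $extensionLength, int $hashLength
): string {
    $name = preserveComponent( $name, $nameLength );
    $extension = preserveComponent( $extension, $extensionLength );
    $hash = preserveComponent( $hash, $hashLength );

    $target = $name;
    if ( $name !== '' && $hash !== '' ) {
        $target .= ' ';                       // space only between two non-empty parts
    }
    $target .= $hash;
    if ( $extension !== '' ) {
        $target .= '.' . $extension;          // dot only before a non-empty extension
    }
    if ( $target === '' ) {
        throw new RuntimeException( 'The constructed target file name is empty.' );
    }
    return $target;
}

// Keep up to 20 name characters, 4 extension characters and 8 hash characters.
echo buildTargetFileName( 'Example', 'jpg', 'a9f0e61a137d86aa9db53465e0801612', 20, 4, 8 );
// prints "Example a9f0e61a.jpg"
```

Note that substr() counts bytes rather than characters, so this sketch is only reliable for ASCII file names.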

The other configuration members of the file repository handlers are the following:

  • FileRepoHandler::$mUseMemoryString: Whether to read the contents of the source file into a string in memory before storing it into the file repository. If set to true, the file would be read into a string in memory before being saved to the file repository. If set to false, the file would be copied directly to the file repository.
  • FileRepoHandler::$mHashContents: Whether to generate the hash from the actual contents of the source file, rather than from its path or URI. Set to true to generate the hash from the actual contents of the source file. Set to false to generate the hash from the file path or URI of the source file. This setting has no effect if $mPreservedHashLength is set to 0.
  • FileRepoHandler::$mParenthesizeHash: Whether to introduce parenthetical marks, "(...)", around the generated hash (if non-empty) when constructing the target file name.
  • ForeignRepoHandler::$mForeignRepoInfo: The foreign repository structure (array) from which the FileRepo instance associated with this handler is initialized by the MediaWiki core. One may change its element values or replace it altogether; however, it should never be set to null after being registered with the MediaWiki core $wgForeignFileRepos array.

Provided that $mPreservedHashLength is not set to 0, hash generation works as follows:

  • If $mHashContents is set to false, then the hash is generated from the source file path or URI (i.e. from the string containing the path or URI itself), without paying any consideration to the file contents.
  • If $mHashContents is set to true, then the hash is generated from the file contents, in the following manner:
    • If $mUseMemoryString is set to true, then the hash is generated from the file contents as previously read into a string in memory.
    • If $mUseMemoryString is set to false, then the hash is generated from the file contents as read during the hash operation itself.
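
The decision logic above can be sketched as follows. This is an illustration only: the function name sketchGenerateHash is hypothetical, and the use of MD5 is an assumption, since the hash algorithm used by the extension is not stated here.

```php
<?php
// Hypothetical sketch of the hash selection logic; 'md5' is an assumed algorithm.
function sketchGenerateHash( string $source, bool $hashContents, bool $useMemoryString ): string {
    if ( !$hashContents ) {
        // Hash the path or URI string itself; the file is never opened.
        return hash( 'md5', $source );
    }
    if ( $useMemoryString ) {
        // Read the contents into memory once, then hash that string.
        $contents = file_get_contents( $source );
        if ( $contents === false ) {
            throw new RuntimeException( "Cannot read $source" );
        }
        return hash( 'md5', $contents );
    }
    // Otherwise let the hash operation read the file itself.
    $digest = hash_file( 'md5', $source );
    if ( $digest === false ) {
        throw new RuntimeException( "Cannot hash $source" );
    }
    return $digest;
}
```

Both content-based branches yield the same digest for the same file; they differ only in whether the contents are kept in memory for the subsequent store operation.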

For example:

# Set the file repository handler associated with the HTTP source handler to generate the hash for the 
# target file name from the source URI itself rather than from the source file contents.
# This way, the file would not have to be retrieved at all if it already exists in the repository.
HttpSourceHandler::getInstance()->mFileRepoHandler->mHashContents = false;
		
# Set the file repository handler associated with the filesystem source handler to read the contents of 
# the source file into a string in memory before storing it in the repository.
# This way, should its $mHashContents be set to true, the file would 
# not need to be read twice (once for the hash and once for the direct copy),
# since the hash would be performed on the string in memory. 
FSSourceHandler::getInstance()->mFileRepoHandler->mUseMemoryString = true;

# Set the directory into which the file repository handlers associated with both source handlers
# store the fetched files.
  FSSourceHandler::getInstance()->mFileRepoHandler->mForeignRepoInfo[ 'directory' ] = "$IP/images/fsfetched";
HttpSourceHandler::getInstance()->mFileRepoHandler->mForeignRepoInfo[ 'directory' ] = "$IP/images/httpfetched";

# Set the base URLs through which the files will be accessed by clients to correspond
# with the above directory. Always ensure that the paths correspond (unless you
# have set up aliases through your web server).
  FSSourceHandler::getInstance()->mFileRepoHandler->mForeignRepoInfo[ 'url' ] = "$wgScriptPath/images/fsfetched";
HttpSourceHandler::getInstance()->mFileRepoHandler->mForeignRepoInfo[ 'url' ] = "$wgScriptPath/images/httpfetched";
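
The length preservation members and $mParenthesizeHash are not covered by the examples above. The following is a minimal sketch for LocalSettings.php, assuming these members are set in the same way as the others; the values are illustrative only and are not recommendations from the extension's author.

```php
# Work with the file repository handler associated with the HTTP source handler.
$repoHandler = HttpSourceHandler::getInstance()->mFileRepoHandler;

# Keep at most 40 characters of the source file name and 5 of its extension.
$repoHandler->mPreservedFileNameLength = 40;
$repoHandler->mPreservedFileExtensionLength = 5;

# Keep the first 8 characters of the generated hash, wrapped in parentheses.
$repoHandler->mPreservedHashLength = 8;
$repoHandler->mParenthesizeHash = true;
```

With these settings, a file named Example.jpg would be expected to receive a target name of the form "Example (xxxxxxxx).jpg", where xxxxxxxx stands for the leading characters of the hash.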

Simpler alternatives

If one simply wishes to allow inline images hosted externally (hotlinking) without requiring any image syntax functionality, it suffices to set $wgAllowExternalImages to true. For images from the local filesystem, one could use the file URI scheme after enabling it through $wgUrlProtocols. (Note that some browsers, including Mozilla Firefox, will not follow file URLs on pages that have been loaded via HTTP, as a security measure. For details and workarounds, see the MozillaZine article "Links to local pages do not work".)

# Allow external images to be rendered inline with text.
$wgAllowExternalImages = true;

# Enable links to files on the local filesystem through the file: protocol.
$wgUrlProtocols[] = "file://";

MediaWiki does offer some support for manipulating external media with standard image syntax functionality if one registers the foreign repository with the $wgForeignFileRepos array. For the local filesystem, the relevant core repository class would be the FSRepo.

# Define the filesystem foreign repository structure and register it with MediaWiki.
$wgForeignFileRepos[] = array
(
    # The class name for the repository.
    # The core repository classes are LocalRepo, ForeignDBRepo, FSRepo, and ForeignAPIRepo.
    'class'      => "FSRepo",
    # A unique name for the repository.
    'name'       => "MyRepo",
    # The root directory in which the files are located.
    'directory'  => "$IP/myfiles",
    # The number of directory levels for hash-based division of files.
    'hashLevels' => 0,
    # The base public URL.
    'url'        => "$wgScriptPath/myfiles",
);

FSRepo has two limitations (which we aimed to address through our extension):

  • All files must reside in a single directory. Subdirectories are only permitted for dividing files according to the hash level.
  • All files must be accessible through a URL.

Finally, if one's aim is simply to automate the upload of large quantities of files, then this extension should not be used. The file repository handler implemented in this extension does not register any of the fetched files with the MediaWiki local repository (LocalRepo), and so loses the benefits that such integration would offer. This was a deliberate design decision, since we wanted a solution that kept the files separate from MediaWiki. In fact, the file fetching process does not make any changes to the wiki's database, and the directory in which the fetched files are stored may be safely deleted at any time without breaking the system. (After such a deletion, the files would be fetched again the next time one of the pages in which they appear is visited.)

Bugs and limitations

  • Unicode is not supported yet.