User:Melissa 4.0/Extension:FileFetcher

Description
Whilst MediaWiki has solid support for the display of uploaded media, it does not always extend this functionality to external files (which have not been uploaded to the wiki). This can be an inconvenience for users who wish to maintain their media files separately from their wiki (for example, within a hierarchical directory structure on their hard disk), but would nonetheless like to take advantage of MediaWiki's image syntax functionality for formatting, resizing, and captioning images.

This extension works by fetching any specified file from its path or URI, storing it in a repository, and making it accessible through the standard image syntax. Before using this extension, check whether any of the simpler alternatives would suffice for your needs.

Warning
This extension is only intended for use in personal or corporate wikis deployed in a high-trust environment. The aim of this extension is to facilitate the retrieval of files from external sources, including the local filesystem. If used in an inappropriate setting, this extension would pose a severe security risk, since all users with write access to the wiki would be capable of accessing any file on the local filesystem (including those exposed through network shares) and injecting malicious content.

Usage
The  parser function fetches the file from the specified path or URI, and returns its file name as stored in the repository. Thus, it could easily be used to combine a web address with image syntax; for example:

The same syntax may be used with local paths:

Either of the above would give a result similar to:

The same syntax may also be used for image galleries, using the  parser extension tag:  File:|Image fetched from web address. File:|Image fetched from local path.

Download instructions
Please download the archive containing the PHP source files from the link given in the infobox above, and extract it into the  directory. (Note:  stands for the root directory of your MediaWiki installation, the same directory that holds  .)

Installation
To install this extension, add the following to :

Documentation
Most of the extension is implemented through the following classes (with indentation indicating subclassing):


 * : The main file fetcher class, instantiated using the singleton pattern. Responsible for fetching a file from its source using the appropriate  subclass instance. In order to support the fetching of files from diverse sources using specific protocols (wrappers), all the functionality required for handling source files has been delegated to the   class (as implemented in its protocol-specific subclasses). Similarly, in order to support the storing of files in diverse repositories, all the functionality required for handling target files has been delegated to the   class (as implemented in its repository-specific subclasses).


 * : Base class defining methods and members for handling the fetching of files from sources using a specific protocol (wrapper).
 * : Base class defining methods and members for handling the fetching of files from file-based sources using the PHP in-built file functions.
 * : Source handler responsible for the fetching of files from the local filesystem.
 * : Source handler responsible for the fetching of files over the HTTP or HTTPS protocol.


 * : Base class defining methods and members for handling the storing of files into the associated file repository.
 * : Base class defining methods and members for handling the storing of files into a foreign file repository registered with the MediaWiki core through the  array.
 * : File repository handler for storing files through a  (filesystem foreign repository) instance initialized by the MediaWiki core.

Both source handlers and file repository handlers may be configured by modifying the values of their public class members. Source handlers are instantiated using the singleton pattern; the singleton instance of each concrete source handler class (namely,  and  ) may be accessed through its   static method. Each source handler has an associated file repository handler, which may be accessed through the former's  member.

Source handlers
The main configuration members of the source handlers are the following:


 * : The file repository handler associated with this source handler. It is allowed for multiple source handler instances to share the same file repository handler instance; in such cases, the settings for the latter would also be shared.


 * : The maximum size of a fetched file, in bytes. Files exceeding this size are not fetched; attempting to fetch one throws an exception. Set to a negative value to refrain from enforcing a maximum size (not recommended).


 * : Whether to tolerate unknown file sizes. A file size may be unknown either because the source does not support reporting it, or because an error was encountered whilst the source handler was determining it. If set to true, the file is fetched anyway when its size is unknown. If set to false, the file is not fetched when its size is unknown.


 * : Whether to handle any path which is not recognized as a URI. Set to true to handle non-URI file path representations, such as " " on Windows and " " on Linux. Set to false to refrain from handling any path which is not recognized as a URI. Valid URLs belonging to the file URI scheme (starting with " ") are always handled in either case.
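The interplay between the maximum size and the tolerance for unknown sizes can be illustrated with a short, language-neutral sketch. It is written in Python rather than the extension's PHP, and it is not the extension's code: the function name `check_size`, its parameter names, and both exception classes are hypothetical. The source states only that oversized files throw an exception; raising one for an untolerated unknown size is an assumption made here, as is checking the unknown-size rule even when no maximum is enforced.

```python
from typing import Optional


class FileTooLargeError(Exception):
    """Hypothetical error for a file exceeding the configured maximum."""


class UnknownSizeError(Exception):
    """Hypothetical error for an unknown size that is not tolerated."""


def check_size(size: Optional[int], max_size: int,
               tolerate_unknown: bool) -> None:
    """Raise if a file of the given size should not be fetched.

    size is None when the source could not report a size.
    A negative max_size disables the maximum-size check.
    """
    if size is None:
        # Assumption: an untolerated unknown size is reported as an error.
        if not tolerate_unknown:
            raise UnknownSizeError("file size is unknown")
        return
    if max_size >= 0 and size > max_size:
        raise FileTooLargeError(
            f"file is {size} bytes, limit is {max_size} bytes")


# A 2 MiB file against a 1 MiB limit is rejected.
try:
    check_size(2 * 1024 * 1024, 1024 * 1024, tolerate_unknown=False)
except FileTooLargeError as error:
    print("rejected:", error)
```

A negative limit accepts any known size, which is why the text above advises against it.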

For example:

File repository handlers
The target file name for a file to be saved in the associated repository of a file repository handler is constructed from the following components:
 * the source file name (excluding extension)
 * a space character, ' ', if both the source file name and the generated hash are non-empty
 * the generated hash
 * a dot character, '.', if the source file extension is non-empty
 * the source file extension

The  class defines three length preservation members affecting the construction of target file names:


 * : The maximum length of the source file name (excluding extension) to be preserved when constructing the target file name.
 * : The maximum length of the source file extension to be preserved when constructing the target file name.
 * : The maximum length of the generated hash to be preserved when constructing the target file name. If set to 0, the hash is not generated at all.

These members determine how their associated components are used in the construction of the target file name, in the following manner:
 * Set to a positive value n to preserve the first n characters from the component.
 * Set to 0 to preserve nothing of the component.
 * Set to a negative value to preserve the full component (not recommended for arbitrary-length components).

If the constructed target file name is empty, an exception is thrown. This happens if, for example, all the component length preservation members are set to 0.
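The construction rules above can be sketched as follows. This is a language-neutral illustration in Python, not the extension's PHP implementation: the names `preserve`, `build_target_name`, and all parameter names are hypothetical. The sketch assumes that the space and dot separators are decided after truncation, which the description leaves open, and it ignores the optional parenthetical marks around the hash.

```python
def preserve(component: str, max_length: int) -> str:
    """Apply one length preservation setting to one component."""
    if max_length < 0:
        return component            # negative: keep the full component
    return component[:max_length]   # 0 keeps nothing, n keeps the first n


def build_target_name(name: str, extension: str, file_hash: str,
                      name_length: int, extension_length: int,
                      hash_length: int) -> str:
    """Join name, hash, and extension into a target file name."""
    kept_name = preserve(name, name_length)
    kept_extension = preserve(extension, extension_length)
    kept_hash = preserve(file_hash, hash_length)

    # Assumption: separators depend on the truncated components.
    space = " " if kept_name and kept_hash else ""
    dot = "." if kept_extension else ""

    target = kept_name + space + kept_hash + dot + kept_extension
    if not target:
        raise ValueError("constructed target file name is empty")
    return target


print(build_target_name("holiday", "jpg", "9f86d081884c7d65", 32, 8, 8))
# prints "holiday 9f86d081.jpg"
```

Setting all three lengths to 0 leaves nothing to join, so the sketch raises an error, mirroring the exception described above.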

The other configuration members of the file repository handlers are the following:


 * : Whether to read the contents of the source file into a string in memory before storing it into the file repository. If set to true, the file is read into a string in memory before being saved to the file repository. If set to false, the file is copied directly to the file repository.


 * : Whether to generate the hash from the actual contents of the source file, rather than from its path or URI. Set to true to generate the hash from the actual contents of the source file. Set to false to generate the hash from the file path or URI of the source file. This setting has no effect if   is set to.


 * : Whether to introduce parenthetical marks, " ... ", around the generated hash (if non-empty) when constructing the target file name.


 * : The foreign repository structure (array) from which the  instance associated with this handler is initialized by the MediaWiki core. One may change its element values or replace it altogether; however, it should never be set to   after being registered with the MediaWiki core   array.

Provided that  is not set to , hash generation works as follows:
 * If  is set to , then the hash is generated from the source file path or URI (i.e. from the string containing the path or URI itself), without paying any consideration to the file contents.
 * If  is set to , then the hash is generated from the file contents, in the following manner:
 * If  is set to , then the hash is generated from the file contents as formerly read to a string in memory.
 * If  is set to , then the hash is generated from the file contents as read during the hash operation itself.
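The three hashing paths described above can be sketched for a local file as follows. This is an illustration in Python, not the extension's PHP code: the function and parameter names are hypothetical, SHA-1 is an assumed algorithm (the source does not name one), and only local paths are handled.

```python
import hashlib


def generate_hash(source: str, hash_length: int, hash_contents: bool,
                  read_into_memory: bool) -> str:
    """Return the (possibly truncated) hex hash for a source file."""
    if hash_length == 0:
        return ""  # the hash is not generated at all

    if not hash_contents:
        # Hash the path or URI string itself, ignoring the file contents.
        digest = hashlib.sha1(source.encode("utf-8")).hexdigest()
    elif read_into_memory:
        # Hash the contents after reading the whole file into memory.
        with open(source, "rb") as handle:
            data = handle.read()
        digest = hashlib.sha1(data).hexdigest()
    else:
        # Hash the contents while reading the file in chunks.
        hasher = hashlib.sha1()
        with open(source, "rb") as handle:
            for chunk in iter(lambda: handle.read(65536), b""):
                hasher.update(chunk)
        digest = hasher.hexdigest()

    # A negative length keeps the full digest.
    return digest if hash_length < 0 else digest[:hash_length]
```

Both content-based paths yield the same hash for the same file; they differ only in memory use. Hashing the path is cheaper, but the resulting name does not change when the file at that path is replaced.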

For example:

Simpler alternatives
If one simply wishes to allow inline images hosted externally (hotlinking) without requiring any image syntax functionality, simply set  to. For images from the local filesystem, one could use the file URI scheme after enabling it through. (Note that some browsers, including Mozilla Firefox, will not follow file URLs on pages that have been loaded via HTTP as a security measure. For details and workarounds, see the MozillaZine article "Links to local pages do not work".)

MediaWiki does offer some support for manipulating external media with standard image syntax functionality if one registers the foreign repository with the  array. For the local filesystem, the relevant core repository class would be the.

has two limitations (which we aimed to address through our extension):
 * All files must reside in a single directory. Subdirectories are only permitted for dividing files according to the hash level.
 * All files must be accessible through a URL.

Finally, if one's aim is simply to automate the upload of large quantities of files, then this extension should not be used. The file repository handler implemented in this extension does not register any of the fetched files with the MediaWiki local repository, and so loses the benefits that such integration would offer. This was a design decision, since we wanted a solution which kept the files separate from MediaWiki. In fact, the file fetching process does not make any changes to the wiki's database, and the directory in which the fetched files are stored may be safely deleted at any time without breaking the system. (In the case of deletion, the files would be fetched again the next time one of the pages in which they appear is visited.)

Bugs and limitations

 * Unicode is not supported yet.