InstantCommons

These are draft development specifications, not documentation. This feature does not exist yet.

(Mockup: enter the name of an image from Commons on any MediaWiki installation, and the image is fetched from Commons and embedded into the page.)

InstantCommons is a proposed feature for MediaWiki that would allow any MediaWiki installation worldwide to use any media file uploaded to the Wikimedia Commons. InstantCommons-enabled wikis would cache Commons content so that it is downloaded only once, with subsequent pageviews served from the local copy.

Basic feature set

During the installation, the administrator can choose whether to enable InstantCommons. This could be tied to the wiki being under a free content license (see #Scalability considerations). Ideally, however, the feature should be enabled by default to allow the largest possible number of users to use Wikimedia Commons content.

If the feature is enabled, the wiki behaves like a Wikimedia project: any image or other media file that exists on Commons can be included in a wiki page just like a locally uploaded file, simply by specifying its name. Local filenames take precedence over Commons filenames.

While the Wikimedia Commons would be the default repository for images, the implementation would not be repository-specific. Instead, it would be an extension to the existing shared image repository functionality in MediaWiki (used by Commons), which currently only allows filesystem-based usage of an external image repository (though image description pages are already fetched via HTTP). A single Boolean parameter ($wgUseInstantCommons) should be sufficient to enable or disable access to Wikimedia Commons, while access to a different repository would require more configuration.
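
For the default case, enabling the feature might then be a single line in LocalSettings.php. The sketch below assumes the proposed $wgUseInstantCommons variable; the additional repository settings shown are purely hypothetical illustrations of the extra configuration a non-default repository would need:

# LocalSettings.php (sketch)
# Enable fetching of files from the default repository, Wikimedia Commons.
$wgUseInstantCommons = true;

# A different repository would require more configuration; these keys are
# hypothetical and only illustrate the kind of settings involved.
$wgInstantCommonsRepository = array(
    'name'   => 'examplecommons',                              # internal repository name
    'apiUrl' => 'http://commons.example.org/wiki/Special:API', # XML-RPC endpoint (see below)
    'fetchDescription' => true,                                # also fetch description pages
);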

Implementation details

When a filename from Commons (or another repository) that does not exist locally is entered into the wiki and the page is parsed, the wiki sends an XML-RPC [1] request to the repository, asking whether a file with this exact name exists and what its size is. If the file exists, a response containing the file size and URL is returned. Multiple lookups should be aggregated into a single request (using multiple methodCall and methodResponse elements).

Request example:

<?xml version="1.0"?>
<methodCall>
  <methodName>files.getInformation</methodName>
  <params>
    <param>
      <value><string>Karachi - Market.jpg</string></value>
    </param>
  </params>
</methodCall>

Response example:

<?xml version="1.0"?>
<methodResponse>
  <params>
    <param>
      <value>
        <struct>
          <member>
            <name>fileSize</name>
            <value><i4>169885</i4></value>
          </member>
          <member>
            <name>fileURL</name>
            <value><string>http://upload.wikimedia.org/wikipedia/commons/c/c5/Karachi_-_Pakistan-market.jpg</string></value>
          </member>
        </struct>
      </value>
    </param>
  </params>
</methodResponse>

If the file does not exist, an XML-RPC <fault> structure should be used to describe the cause of the problem. In the first implementation, only one error code will be supported, which will indicate that the file does not exist. The request should be handled by a new special page, Special:API, which could be extended later to provide other functionality.
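
As a sketch, a fault response for a missing file could follow the standard XML-RPC <fault> layout; the numeric error code shown here is a placeholder, since only the "file does not exist" condition is defined so far:

<?xml version="1.0"?>
<methodResponse>
  <fault>
    <value>
      <struct>
        <member>
          <name>faultCode</name>
          <!-- placeholder code; only "file does not exist" is defined so far -->
          <value><i4>1</i4></value>
        </member>
        <member>
          <name>faultString</name>
          <value><string>The requested file does not exist.</string></value>
        </member>
      </struct>
    </value>
  </fault>
</methodResponse>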

If the file exists, the wiki downloads it with an HTTP request (using PHP's fopen). If the download is successful, the file will be treated like a local upload: its metadata will be processed and added to the IMAGE table, and a copy of the file will be placed in the local directory structure. However, the file will not have an associated history or image description page. To distinguish InstantCommons content from local uploads, a new flag may have to be added to the IMAGE table indicating that the file comes from a remote server, or the img_user field could be set to 0 for these files; the img_timestamp field will record the time of the import.
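
In rough PHP terms, the download and import step could look like the following sketch; the helper functions and the remote-file flag are hypothetical and only illustrate the flow described above:

# Sketch only: fetch a file whose existence and size Commons has confirmed,
# and register it as a local copy. Helper names and the remote flag are hypothetical.
function instantCommonsFetchFile( $fileName, $fileURL, $fileSize ) {
    # Download via PHP's URL-aware fopen, as proposed above.
    $remote = fopen( $fileURL, 'rb' );
    if ( !$remote ) {
        return false; # download failed; the page keeps a red link
    }
    $data = '';
    while ( !feof( $remote ) ) {
        $data .= fread( $remote, 65536 );
    }
    fclose( $remote );

    if ( strlen( $data ) != $fileSize ) {
        return false; # size mismatch; discard rather than store a broken file
    }

    # Place the file into the local upload directory structure ...
    file_put_contents( instantCommonsLocalPath( $fileName ), $data );

    # ... and record it in the IMAGE table like a local upload, flagged as
    # remote content (or with img_user = 0); img_timestamp records the import time.
    instantCommonsRegisterImage( $fileName, $fileSize, true /* remote */ );
    return true;
}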

As a consequence, future pageviews of the same page will no longer have to query Commons, as the file will now exist as a local copy.

The description page will use the existing functionality to load metadata from Commons via interwiki transclusion; however, a new caching table will be created to store and retrieve the returned HTML description once it has first been downloaded. This will reduce automated queries to Commons, especially since we cannot rely on InstantCommons wikis doing proper search engine exclusion.
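
A minimal sketch of how that cache might be consulted on a description pageview, with the helper functions and the caching table purely hypothetical:

# Sketch only: return the description HTML for a Commons file, fetching it
# once and serving the cached copy afterwards. All names are hypothetical.
function instantCommonsGetDescription( $fileName ) {
    # First look for a previously stored copy in the new caching table.
    $html = instantCommonsCacheLookup( $fileName );
    if ( $html !== false ) {
        return $html;
    }

    # Not cached yet: fetch the rendered description from Commons once,
    # using the existing interwiki transclusion mechanism, then store it
    # so later pageviews do not query Commons again.
    $html = instantCommonsFetchDescription( $fileName );
    if ( $html !== false ) {
        instantCommonsCacheStore( $fileName, $html );
    }
    return $html;
}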

There will be no mechanism to automatically push changes to files or description pages from Commons to InstantCommons-enabled wikis. Instead, the description page of each file that has been loaded through InstantCommons will offer logged-in users a link to re-fetch the locally cached data (image and description page).

Media links and non-embedded filetypes

In addition to images, MediaWiki also supports other file types. These can be linked to using [[Media:]] links, which result in a direct URL pointing to the file on the server, or using [[Image:]] links (renamed to File: in MediaWiki 1.6) where the file description page will show a link to the uploaded file instead of embedding the image. These links and description pages will be treated no differently from embedded images: When a media link is found in a page, or a file description page is viewed, and the file does not exist locally, an inquiry is sent to Commons, and if the file exists there, a local copy is made.

Scalability considerations

Because the InstantCommons feature would allow a wiki user to trigger downloads from the Wikimedia servers, it is crucial that there be no possibility of a denial-of-service attack against either the requesting wiki or the Wikimedia Commons, for example by pasting 30K of links to the largest files on Wikimedia Commons into a wiki page and pressing "preview".

Therefore, every successful InstantCommons request will have to be logged by the InstantCommons-enabled wiki together with the originating user or IP address and the time of the request. If an individual user exceeds a generous internal bandwidth limit (which could be as high as 1 GB by default, but should be configurable), no further files will be downloaded for that user within a 24-hour period. This limit should not apply to administrators: if wiki admins want to conduct a denial-of-service attack against their own wiki, they do not need to be stopped from doing so; if they want to attack Wikimedia, they cannot be stopped except on Wikimedia's end.
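
A sketch of the proposed check, assuming a hypothetical log of download sizes per user or IP address; the variable and helper names are illustrative, and 1 GB is only the suggested default:

# Sketch only: decide whether another InstantCommons download is allowed.
$wgInstantCommonsDailyLimit = 1073741824; # 1 GB per user per 24 hours (default, configurable)

function instantCommonsAllowDownload( $user, $fileSize ) {
    global $wgInstantCommonsDailyLimit;

    # Administrators are exempt, as discussed above.
    if ( instantCommonsUserIsAdmin( $user ) ) {
        return true;
    }

    # Sum the file sizes logged for this user or IP address over the last 24 hours.
    $used = instantCommonsBytesInLast24Hours( $user );

    # Refuse further downloads once the limit would be exceeded; the request is
    # still logged, so repeated attempts keep counting against the limit.
    return ( $used + $fileSize ) <= $wgInstantCommonsDailyLimit;
}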

Future potential

In the future, it may be desirable to offer a publish/subscribe model for changes, which would require wiki-to-wiki authentication and a database of images in use on subscribing wikis. This would also open up the threat of cross-wiki vandalism, which could be addressed with a delay of 24 hours or more before changes take effect.

Two-way functionality is another possibility, that is, to allow uploading free media directly to Commons from any wiki installation. However, this will require federated authentication as a minimum. It may also necessitate cross-wiki communication facilities to notify users from other wikis about Commons policies, which could be part of a larger project like LiquidThreads.

Functionality similar to InstantCommons could also be offered for extensions -- if extensions like WikiTeX are run within a secure environment on the Wikimedia servers, access to them could be provided to any free content wiki. The benefit for Wikimedia would be that the generated data could be stored on Wikimedia's servers as well, and potentially useful content could be reviewed and added to the Wikimedia projects. (A subscriber database would again be useful to record the source and context of use, perhaps even allowing for a browsable library of recently generated extension-derived content on outside wikis.)