InstantCommons
From MediaWiki.org
These are draft development specifications, not documentation. This feature does not exist yet.
- Actually, it does: it is called $wgForeignFileRepos. The information on this page is outdated and unlikely to be correct.
InstantCommons is a proposed feature for MediaWiki to allow the usage of any uploaded media file from the Wikimedia Commons in any MediaWiki installation world-wide. InstantCommons-enabled wikis would cache Commons content so that it would only be downloaded once, and subsequent pageviews would load the locally existing copy.
Rationale
As of February 2006, the Wikimedia Commons, a central media archive operated by the Wikimedia Foundation, contains about 450,000 files uploaded by nearly 30,000 registered users. Each of these files is available under a free content license or in the public domain; there are no restrictions on use beyond those relating to the use of official insignia. Licenses which limit commercial use are considered non-free.
As awareness of the Commons grows, so does the desire of external parties to use content included therein, and to contribute new material. It is currently technically possible to load images directly from Wikimedia's servers in the context of any webpage. This is bad for multiple reasons:
- It does not respect the license terms of the image, and does not allow for other metadata to be reliably transported
- It does not give credit to Wikimedia
- It consumes Wikimedia bandwidth on every pageview (unless the image has been cached on the client side or through a proxy)
- It does not facilitate useful image operations such as thumbnail generation and captioning, and it is difficult to use in the context of a wiki, particularly for standard layout operations
- It is tied to URLs as resource identifiers, which complicates mirroring
- It creates an untrackable external usage web, where any change on Wikimedia's side necessarily affects these external users
- It does not permit offline viewing, which is crucial in countries which have only intermittent network access.
The InstantCommons proposal seeks to address all this by providing an easy method for cached loading of images and metadata from Wikimedia's servers. The first implementation of InstantCommons will be within MediaWiki, allowing for all MediaWiki image operations (thumbnailing, captioning, galleries, etc.) to be performed transparently. However, other wiki engines can implement InstantCommons-like functionality using the API operations described below.
Basic feature set
During installation, the site administrator can choose whether to enable InstantCommons. This could be tied to the wiki being under a free content license (see #Scalability considerations). Ideally, however, the feature should be enabled by default (provided a writable upload directory is specified) to allow the largest possible number of users to use Wikimedia Commons content.
If the feature is enabled, the wiki would behave like a Wikimedia project: any image or other media file that exists on Commons can be included in a wiki page by specifying its name, just like a locally uploaded file. Local filenames take precedence over Commons filenames.
While the Wikimedia Commons would be the default repository for images, the implementation would not be repository-specific. Instead, it would be an extension to the existing shared image repository functionality in MediaWiki (used by Commons), which currently only allows filesystem-based usage of an external image repository (though image description pages are already fetched via HTTP). A single Boolean parameter ($wgUseInstantCommons) should be sufficient to enable or disable access to Wikimedia Commons, while access to a different repository would require more configuration.
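The lookup order described above (local file first, then the cached copy, then a one-time download from the repository) can be sketched roughly as follows. This is an illustrative Python sketch, not MediaWiki code: `resolve_file`, `local_files`, `cache`, `fetch_remote` and `instant_commons_enabled` are hypothetical names, and the dictionaries stand in for the upload directory and the cache.

```python
from typing import Callable, Dict, Optional


def resolve_file(
    name: str,
    local_files: Dict[str, bytes],
    cache: Dict[str, bytes],
    fetch_remote: Callable[[str], Optional[bytes]],
    instant_commons_enabled: bool = True,
) -> Optional[bytes]:
    """Return file contents, preferring local uploads over the shared repository.

    All names here are hypothetical; the dicts model on-disk storage.
    """
    # 1. A locally uploaded file takes precedence over a repository file
    #    with the same name.
    if name in local_files:
        return local_files[name]
    # 2. With the feature disabled, only local files are visible.
    if not instant_commons_enabled:
        return None
    # 3. A previously downloaded copy is served without contacting the repository.
    if name in cache:
        return cache[name]
    # 4. Otherwise fetch once, keep the copy, and serve it.
    data = fetch_remote(name)
    if data is not None:
        cache[name] = data
    return data
```

The `instant_commons_enabled` flag plays the role of the single Boolean switch mentioned above; pointing the wiki at a different repository would amount to passing a different `fetch_remote` callable.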
Implementation details
Needs to be updated with the ForeignAPIRepo/ForeignAPIFile specs.
Scalability considerations
Because the InstantCommons feature would allow a wiki user to download resources from the Wikimedia servers, it is crucial that there is no possibility of a Denial of Service attack against either the using wiki, or the Wikimedia Commons, for example, by pasting 30K of links to the largest files on Wikimedia Commons into a wiki page and pressing "preview".
Therefore, every successful InstantCommons request will have to be logged by the InstantCommons-enabled wiki together with the originating user or IP address and the time of the request. If an individual user exceeds a generous internal bandwidth limit (could be as high as 1 GB by default, but should be configurable), no further images will be downloaded for that user for a 24-hour period. This limit should not apply to wiki administrators (if wiki admins want to conduct a denial of service attack against their own wiki, they do not need to be stopped from doing so; if they want to conduct an attack against Wikimedia, they cannot be stopped from doing so except on Wikimedia's end).
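A minimal sketch of such a per-user limit is shown below. It assumes a rolling 24-hour window and an in-memory log, neither of which is specified by this proposal; `DownloadQuota` and its methods are hypothetical names. Administrator downloads bypass the limit but are still logged.

```python
from collections import defaultdict
from typing import DefaultDict, List, Tuple

DAY_SECONDS = 24 * 60 * 60


class DownloadQuota:
    """Tracks logged downloads per user (or IP) over a rolling 24-hour window.

    Hypothetical helper: the rolling window and in-memory log are assumptions.
    """

    def __init__(self, limit_bytes: int = 10**9) -> None:
        # Default of roughly 1 GB, as suggested in the text; configurable.
        self.limit_bytes = limit_bytes
        # user or IP -> list of (timestamp, size in bytes) of successful downloads
        self.log: DefaultDict[str, List[Tuple[float, int]]] = defaultdict(list)

    def used(self, user: str, now: float) -> int:
        """Bytes this user has downloaded within the last 24 hours."""
        return sum(size for ts, size in self.log[user] if now - ts < DAY_SECONDS)

    def allow(self, user: str, size: int, now: float, is_admin: bool = False) -> bool:
        """Decide whether a download may proceed, and log it if it does."""
        # Non-admins are blocked once their recent usage has reached the limit.
        if not is_admin and self.used(user, now) >= self.limit_bytes:
            return False
        self.log[user].append((now, size))
        return True
```

For example, with `DownloadQuota(limit_bytes=100)`, two 60-byte downloads by the same address are allowed (the second one pushes usage past the limit), after which further requests are refused until the earlier entries are more than 24 hours old.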
In addition to the per-user bandwidth limit, there could be a limit on the size of files which should be downloaded transparently. This would primarily be because files above a certain size would delay pageviews significantly and might even cause the page request to time out. It might be desirable to use an external application to download these files, so that the download can run in the background without holding up the page request. Finally, there could be a total maximum size for the InstantCommons cache; if this size is exceeded, no further files would be downloaded.
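The two size checks above could be combined into a single decision, sketched here in Python. The thresholds, the `Decision` enum and `plan_download` are hypothetical, and the sketch assumes that a download which would push the cache past its cap is refused up front; the proposal does not fix any of these details.

```python
from enum import Enum


class Decision(Enum):
    DOWNLOAD_INLINE = "download during the page request"
    DOWNLOAD_IN_BACKGROUND = "hand off to a background downloader"
    SKIP_CACHE_FULL = "do not download; cache limit reached"


def plan_download(
    file_size: int,
    cache_used: int,
    max_inline_size: int = 5 * 1024 * 1024,   # assumed 5 MiB inline limit
    max_cache_size: int = 10 * 1024**3,       # assumed 10 GiB cache cap
) -> Decision:
    """Choose how (or whether) to download a file of the given size in bytes."""
    # The total cache cap comes first: a full cache means no further downloads.
    if cache_used + file_size > max_cache_size:
        return Decision.SKIP_CACHE_FULL
    # Large files would delay or time out the page request, so defer them.
    if file_size > max_inline_size:
        return Decision.DOWNLOAD_IN_BACKGROUND
    return Decision.DOWNLOAD_INLINE
```

Checking the cache cap before the inline limit keeps a large file from being queued for a background download that could never be stored.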
While it is unlikely that individual wikis using the InstantCommons feature would cause a significant increase in cost for the Wikimedia Foundation (since every file only has to be downloaded once, and there are per-user bandwidth limitations), it would nevertheless be fair and reasonable for projects using the feature to include a notice on InstantCommons description pages such as: "This file comes from Wikimedia Commons, a media archive hosted by the Wikimedia Foundation. If you would like to support the Wikimedia Foundation, you can donate here ..."
Future potential
In the future, it may be desirable to offer a publish/subscribe model of changes, which will require wiki-to-wiki authentication and a database of images which are used in subscribing wikis. This would also open up the threat of cross-wiki vandalism, which could be addressed using a delay phase of 24 hours or more for changes to take effect.
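As a rough illustration of the delay phase, a subscribing wiki could hold incoming changes and apply only those that have aged past the delay. Everything in this sketch is hypothetical, including the `PendingChange` record and the `reverted` flag, which assumes that vandalism reverted at the source during the delay is simply dropped.

```python
from dataclasses import dataclass
from typing import List

DAY_SECONDS = 24 * 60 * 60


@dataclass
class PendingChange:
    """A change notification received from the publishing wiki (hypothetical)."""
    file_name: str
    published_at: float     # time the change was made at the source
    reverted: bool = False  # set if the source reverted it during the delay


def changes_to_apply(
    pending: List[PendingChange],
    now: float,
    delay_seconds: int = DAY_SECONDS,
) -> List[PendingChange]:
    """Return the changes old enough to take effect and not reverted meanwhile."""
    return [
        change
        for change in pending
        if not change.reverted and now - change.published_at >= delay_seconds
    ]
```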
Two-way functionality is another possibility, that is, to allow uploading free media directly to Commons from any wiki installation. However, this will require federated authentication as a minimum. It may also necessitate cross-wiki communication facilities to notify users from other wikis about Commons policies, which could be part of a larger project like LiquidThreads.
Finally, the biggest challenge for making Commons content available is making it searchable throughout all languages -- new approaches such as meaning-based tagging will be necessary to accomplish this. This functionality will hopefully be enabled by the OmegaWiki project; see a simple demonstration of the concept.
Similar functionality to InstantCommons could be offered for extensions: if extensions like WikiTeX are run within a secure environment on the Wikimedia servers, access to them could be provided to any free content wiki. The benefit for Wikimedia would be that the generated data could be stored on Wikimedia's servers as well, and potentially useful content could be reviewed and added to the Wikimedia projects. (A subscriber database would again be useful to record the source and context of use, perhaps even allowing for a browsable library of recently generated extension-derived content on outside wikis.)