Extension:Memento

What can this extension do?
The Memento extension implements support for the Accept-Datetime HTTP header to perform content negotiation in the date-time dimension, built on the principles of RFC 2295. This enables MediaWiki to be used as a web archive.

The extension works in two simple steps:
 * The Memento Plugin will return the URL to the TimeGate for the article requested.
 * When the client navigates to the TimeGate, the TimeGate redirects it to the version of the requested resource that was live at the date-time given as the value of the Accept-Datetime header.
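
The value of the Accept-Datetime header is an HTTP date expressed in GMT, for example Sat, 03 Oct 2009 10:00:00 GMT. As a minimal sketch that is not part of the extension itself, the following Python shows one way a client could build and parse such a value. The function names are illustrative assumptions.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def accept_datetime_value(moment: datetime) -> str:
    """Format a timezone-aware datetime as an HTTP date in GMT."""
    # Convert to UTC first; usegmt=True then emits "GMT" instead of "+0000".
    return format_datetime(moment.astimezone(timezone.utc), usegmt=True)


def parse_accept_datetime(value: str) -> datetime:
    """Parse an HTTP date such as 'Sat, 03 Oct 2009 10:00:00 GMT'."""
    return parsedate_to_datetime(value)


# Request headers a client might send (the header name comes from the text above).
request_headers = {
    "Accept-Datetime": accept_datetime_value(
        datetime(2009, 10, 3, 10, 0, 0, tzinfo=timezone.utc)
    )
}
```

A naive datetime passed to accept_datetime_value is interpreted as local time by astimezone, so passing timezone-aware values avoids surprises.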

This plug-in uses the same handlers that MediaWiki does to connect to the database, so all existing database permissions and page access permissions are honored. It only uses a 'DB_SLAVE' database connection, which can only read from the tables, so the plug-in makes no changes to the database.

Download instructions
Please copy and paste the code for each file listed below into a file with the same name. Note: $IP stands for the root directory of your MediaWiki installation, the same directory that holds LocalSettings.php.
 * The code for the Memento Plugin can be found at memento.php.
 * There are four files for the TimeGate: timegate.php, timegate_body.php, timegate.alias.php and timegate.i18n.php.
 * To generate the TimeMaps, use the following four files: timemap.php, timemap_body.php, timemap.alias.php and timemap.i18n.php.

Installation
To install this extension, add the following to LocalSettings.php:

TimeGates, TimeMaps and their Workings
Earlier implementations of this plugin combined the functionality of a TimeGate and the Memento Plugin into a single extension. This implementation, however, provides a separate TimeGate. A TimeGate is a resource that enables transparent datetime content negotiation. The TimeMap exposes all the available mementos for a resource, which enables cross-archive services and provides finer datetime granularity.

The current implementation has three separate plugins:
 * 1) Memento Plugin: When a client visits any article in your wiki, this plugin returns the URL of the article's TimeGate in the Link header. This URL can be used to retrieve the mementos of the article.
 * 2) TimeGate: The TimeGate parses the datetime in the Accept-Datetime header, performs the content negotiation and redirects the client to the version that was live at the datetime expressed in the header.
 * 3) TimeMap: The TimeMap retrieves the mementos of the article and serializes them in XML.

The TimeGate can be accessed directly at http://your.wikiserver.here/index.php/Special:TimeGate. The Memento plugin returns the URL of the TimeGate in the Link header using the following URL format: http://your.wikiserver.here/index.php/Special:TimeGate/http://your.wikiserver.here/index.php/Title. The Memento page returns the URL of the TimeMap in the Link header. The URL format is: http://your.wikiserver.here/index.php/Special:TimeMap/http://your.wikiserver.here/index.php/Title
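
The two URL formats above follow the same pattern: the full URL of the article is appended to the special page's URL. As a small sketch of that pattern, the helpers below build both URLs. The helper names are assumptions made for illustration, and the wiki base is the placeholder host used above.

```python
def article_url(wiki_base: str, title: str) -> str:
    """URL of the live article, e.g. <base>/index.php/Title."""
    return f"{wiki_base}/index.php/{title}"


def timegate_url(wiki_base: str, title: str) -> str:
    """URL of the TimeGate for an article: Special:TimeGate/<article URL>."""
    return f"{wiki_base}/index.php/Special:TimeGate/{article_url(wiki_base, title)}"


def timemap_url(wiki_base: str, title: str) -> str:
    """URL of the TimeMap for an article: Special:TimeMap/<article URL>."""
    return f"{wiki_base}/index.php/Special:TimeMap/{article_url(wiki_base, title)}"


base = "http://your.wikiserver.here"  # placeholder host from the text above
print(timegate_url(base, "Title"))
print(timemap_url(base, "Title"))
```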

Usage
Once installed, the extension can be tested and used in two ways:

 * 1) Using a Firefox browser: Install the Modify Headers Firefox extension. Then set the <tt>Accept-Datetime</tt> header from the Tools/Modify Headers menu option. The syntax to use is <tt>Accept-Datetime: Sat, 03 Oct 2009 10:00:00 GMT</tt>. Set it to a date-time at which your wiki was already generating history pages. Then enter the URL of a page from your wiki that has associated history pages around the date-time you chose. Using the Live HTTP Headers Firefox extension, you can see the request and response headers involved in this transaction. The URL of the TimeGate can be obtained from the <tt>Link</tt> header using this extension, and can then be used to navigate to the TimeGate. A detailed explanation of the headers is given below.
 * 2) Using the UNIX command line tool <tt>curl</tt>:

Then look in headers.txt to retrieve the <tt>Link</tt> header. The URL we are interested in has <tt>rel="timegate"</tt> in the <tt>Link</tt> header, as shown below.

This URL in the <tt>Link</tt> header is the URL of your TimeGate, and the original URL is passed to the TimeGate as a parameter. When the <tt>curl</tt> command is repeated with this new URL from the <tt>Link</tt> header, the headers.txt file will look similar to:

The TimeGate performs datetime content negotiation and returns the URI of the memento in the Location header. Notice that the URLs of the oldest and the most recent versions of this resource are also returned in the <tt>Link</tt> header, as is the URL of the original resource. Following the Location URL, again using the <tt>curl</tt> command, returns the following result:

The <tt>Content-Datetime</tt> header returns the datetime of the content that was returned. The Link header with <tt>rel="timemap"</tt> returns the URL of the TimeMap for this resource.

In essence, the Memento plugin returns the URL of the TimeGate for the original resource, and the TimeGate performs the content negotiation and redirects us to a memento. The <tt>Link</tt> and <tt>Content-Datetime</tt> headers are returned to inform us that the returned resource is indeed a memento.
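
To follow this flow from a script rather than by reading headers.txt by hand, a client has to pick the right URL out of the Link header by its rel value. The following Python is a minimal sketch of such a parser. It is not code from the extension, and the sample header value is made up for illustration. Quoted attribute values are matched as a unit, so a comma inside a quoted date does not split an entry.

```python
import re

# One link entry: <target> followed by zero or more ;name=value attributes.
_LINK = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+\s*=\s*(?:"[^"]*"|[^;,\s]+))*)')
_PARAM = re.compile(r'([\w-]+)\s*=\s*(?:"([^"]*)"|([^;,\s]+))')


def parse_link_header(value):
    """Return a list of (target URL, attribute dict) pairs."""
    links = []
    for target, params in _LINK.findall(value):
        attrs = {
            name.lower(): quoted or bare
            for name, quoted, bare in _PARAM.findall(params)
        }
        links.append((target, attrs))
    return links


def find_rel(value, rel):
    """Return the first target whose rel attribute contains the given token."""
    for target, attrs in parse_link_header(value):
        if rel in attrs.get("rel", "").split():
            return target
    return None


# Hypothetical header value, shaped like the ones described above.
sample = (
    '<http://your.wikiserver.here/index.php/Title>; rel="original", '
    '<http://your.wikiserver.here/index.php/Special:TimeGate/'
    'http://your.wikiserver.here/index.php/Title>; rel="timegate"'
)
print(find_rel(sample, "timegate"))
```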

Namespaces
The extension renders the requested page the same way MediaWiki does. It queries the wiki database table <tt>page</tt> with the requested title. Both MediaWiki reserved namespaces and custom namespaces are accounted for by retrieving the <tt>namespace_id</tt> from the object <tt>$wgTitle</tt>. If the namespace does not exist, the plug-in treats the namespace as part of the title. For example, if the requested title is 'Memento:Main_Page', the plug-in first checks whether a namespace exists for "Memento" and retrieves its corresponding <tt>namespace_id</tt>. It then queries the <tt>page</tt> table for the title 'Main_Page' with that <tt>namespace_id</tt>. Otherwise, it treats "Memento" as part of the title and searches the <tt>page</tt> table for the title 'Memento:Main_Page'. If the title cannot be found in the database, an <tt>HTTP/1.1 404 Not Found</tt> is returned.
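
The namespace rule can be illustrated with a short Python sketch. This is not the extension's PHP code. The namespace table, the ID 100 for "Memento" and the set standing in for the <tt>page</tt> table are hypothetical; 0 is MediaWiki's ID for the main namespace.

```python
MAIN_NAMESPACE = 0  # MediaWiki's ID for the main (article) namespace


def resolve_title(requested, namespaces):
    """Split 'Prefix:Rest' into (namespace_id, title).

    If the prefix is not a known namespace, the whole string is treated
    as a title in the main namespace.
    """
    prefix, separator, rest = requested.partition(":")
    if separator and prefix in namespaces:
        return namespaces[prefix], rest
    return MAIN_NAMESPACE, requested


def lookup_status(requested, namespaces, pages):
    """Return 200 if the resolved (namespace_id, title) exists, else 404."""
    return 200 if resolve_title(requested, namespaces) in pages else 404


# Hypothetical wiki with a custom "Memento" namespace.
namespaces = {"Memento": 100}
pages = {(100, "Main_Page"), (0, "Main_Page")}
print(resolve_title("Memento:Main_Page", namespaces))
print(resolve_title("Memento:Main_Page", {}))
```

With the custom namespace defined, the first call resolves to the title 'Main_Page' in namespace 100. Without it, the second call keeps 'Memento:Main_Page' as a title in the main namespace.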

Templates
By default, MediaWiki retrieves the most recent version of a template when it is transcluded in an article. This extension cannot perform datetime content negotiation on transcluded templates. However, we have written a quick fix that performs this operation by adding the following code to the file <tt>Parser.php</tt> in the directory <tt>path/to/wiki/includes/parser/</tt>. Paste the code above in the function <tt>statelessFetchTemplate(...)</tt>, immediately after the variable  is declared. This code fetches the revision ID of the template for the requested datetime and directs MediaWiki to fetch that <tt>rev_id</tt> instead of fetching the latest version of the template by title. This code was written for MediaWiki version 1.8+.

Caching
By default, MediaWiki searches its cache for templates by title and retrieves the most recent version. For best results, it is recommended that caching be disabled for templates so that MediaWiki always queries the database for the revision. This can be done either by commenting out the respective lines in the function <tt>getTemplateDom</tt> in Parser.php, or by writing a small piece of code that skips the caching step when the <tt>Accept-Datetime</tt> header is detected.
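
The second option, skipping the cache only when the <tt>Accept-Datetime</tt> header is present, is sketched below in Python to show the control flow. It does not reproduce <tt>getTemplateDom</tt>. The function name, the dictionary cache and the <tt>load_revision</tt> callback are assumptions made for illustration.

```python
def fetch_template(title, request_headers, cache, load_revision):
    """Return template content, bypassing the cache for datetime requests.

    load_revision(title, accept_datetime) is a hypothetical callback that
    reads from the database; accept_datetime is None for the latest revision.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalised = {name.lower(): value for name, value in request_headers.items()}
    accept_datetime = normalised.get("accept-datetime")

    if accept_datetime is not None:
        # Datetime negotiation: always go to the database and do not store
        # the result, so the cache keeps holding only the latest version.
        return load_revision(title, accept_datetime)

    if title not in cache:
        cache[title] = load_revision(title, None)
    return cache[title]
```

Keeping negotiated results out of the cache means ordinary requests still get the latest template, at the cost of one database query per datetime request.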

Special Pages
Special pages under the URL http://your.wikiserver.here/index.php/Special:SpecialPages do not have a history, i.e. there are no revisions to these pages. Hence, the Memento extension will return an <tt>HTTP/1.1 406 Not Acceptable</tt>.

Deleted Contributions
To do date-time negotiations for the deleted revisions in MediaWiki, most installations require "Administrator" privileges. Even with administrative access, MediaWiki can only show the revisions in "Edit" mode.

To enable this feature, set the configuration variable <tt>$wgMementoConfigDeleted</tt> to <tt>true</tt>.

Timestamps
This extension searches for and retrieves the revisions of an article using the timestamp at which each revision was generated. Timestamps are not unique identifiers, and an article may have more than one revision with the same timestamp. The extension handles this situation by returning an <tt>HTTP/1.1 300 Multiple Choices</tt> with the list of URIs of the revisions that were created at the same time. MediaWiki does not identify deleted revisions by revision ID in their URIs, but by timestamp. Hence, we could not come up with a way to resolve the situation where more than one deleted revision has the same timestamp.
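
The timestamp rule can be sketched as follows in Python. This is an illustration under stated assumptions, not the extension's actual query: revisions are modelled as (revision ID, timestamp) pairs, the sample history is invented, and the case where no revision exists at or before the requested time is left to the caller because the text above does not specify it.

```python
from datetime import datetime


def revisions_live_at(revisions, when):
    """Return the IDs of the revision(s) that were live at 'when'.

    The live revision is the one with the latest timestamp at or before
    'when'. More than one ID is returned when several revisions share
    that timestamp.
    """
    eligible = [(stamp, rev_id) for rev_id, stamp in revisions if stamp <= when]
    if not eligible:
        return []
    latest = max(stamp for stamp, _ in eligible)
    return sorted(rev_id for stamp, rev_id in eligible if stamp == latest)


def needs_multiple_choices(matches):
    """True when the response would be 300 Multiple Choices."""
    return len(matches) > 1


# Invented history: revisions 102 and 103 share a timestamp.
history = [
    (101, datetime(2009, 9, 1, 8, 0, 0)),
    (102, datetime(2009, 10, 1, 9, 30, 0)),
    (103, datetime(2009, 10, 1, 9, 30, 0)),
    (104, datetime(2009, 11, 5, 12, 0, 0)),
]
matches = revisions_live_at(history, datetime(2009, 10, 3, 10, 0, 0))
print(matches, needs_multiple_choices(matches))
```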