Extension:Memento

Memento

Release status: experimental

Implementation: Data extraction, User interface
Description: Performs content negotiation in the DateTime dimension.
Author(s): Harihar Shankar and Robert Sanderson
Last version: 0.5
MediaWiki: 1.6.0+
License: GPL
Download: No link
Hooks used: BeforePageDisplay

What can this extension do?

The Memento extension supports the Accept-Datetime HTTP header to perform content negotiation in the date-time dimension, following the principles of transparent content negotiation in RFC 2295 [1]. This enables MediaWiki to be used as a web archive.[2]

The extension works in two simple steps:

  • The Memento Plugin returns the URL of the TimeGate for the requested article.
  • When the client navigates to the TimeGate, the TimeGate redirects it to the version of the requested resource that was live at the date-time given in the Accept-Datetime header.


This plug-in uses the same handlers that MediaWiki does to connect to the database and hence all the existing database permissions and page access permissions are honored. This plug-in only uses a 'DB_SLAVE' database connection, which means that the database connection can only read from the tables. Hence, this plug-in makes no changes to the database.

Download instructions

  • Unzip the file memento.zip to your MediaWiki extensions folder.

Installation

To install this extension, add the following to LocalSettings.php:

    require_once "$IP/extensions/memento/memento.php";

Note: $IP stands for the root directory of your MediaWiki installation, the same directory that holds LocalSettings.php.

Configuration parameters

$wgMementoConfigDeleted = true; # Set to true or false. Toggles datetime content negotiation for deleted pages.

Short URL

This plugin works with Short URLs only. If your server is not set up for Short URLs, add these lines to your LocalSettings.php file:

$wgArticlePath = "$wgScriptPath/index.php/$1";
$wgUsePathInfo = true;

Server Setup

In addition to the default MediaWiki installation, this plugin will also work in a setup with URL rewriting. This plugin will also work with wikis in a proxy setup.

TimeGates, TimeBundles/TimeMaps and their Workings

Earlier implementations of this plugin combined the functionality of a TimeGate and the Memento Plugin in a single extension. This implementation, however, provides a separate TimeGate. A TimeGate is a resource that enables transparent datetime content negotiation. The TimeMap exposes all the available mementos for a resource, which enables cross-archive services and provides finer datetime granularity.

The current implementation has three separate plugins:

  1. Memento Plugin: When a client visits any article in your wiki, this plugin returns the URL of the article's TimeGate in the Link header. This URL can be used to retrieve the mementos of the article.
  2. TimeGate: The TimeGate parses the datetime in the Accept-Datetime header, performs the content negotiation, and redirects the client to the version that was live at the datetime expressed in the header.
  3. TimeBundle/TimeMap: The TimeMap retrieves the mementos of the article and serializes them as text/csv.

The TimeGate can be accessed directly by http://your.wikiserver.here/index.php/Special:TimeGate. The memento plugin returns the URL to the TimeGate in the Link header using the following URL format: http://your.wikiserver.here/index.php/Special:TimeGate/http://your.wikiserver.here/index.php/Title

The Memento page returns the URL to the TimeMap in the Link header. The URL format is: http://your.wikiserver.here/index.php/Special:TimeMap/http://your.wikiserver.here/index.php/Title
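The two URL formats above can be reproduced with a couple of small helpers. This is only a sketch of the string layout described here; the function names `timegate_url` and `timemap_url` are hypothetical and are not part of the extension.

```python
def timegate_url(wiki_base: str, article_url: str) -> str:
    """Build a TimeGate URL: the article URL is appended as a path parameter."""
    return f"{wiki_base.rstrip('/')}/Special:TimeGate/{article_url}"


def timemap_url(wiki_base: str, article_url: str) -> str:
    """Build a TimeMap URL using the same layout as the TimeGate URL."""
    return f"{wiki_base.rstrip('/')}/Special:TimeMap/{article_url}"


# Placeholder host taken from the examples in this page.
base = "http://your.wikiserver.here/index.php"
article = f"{base}/Title"

print(timegate_url(base, article))
print(timemap_url(base, article))
```

The article URL is passed through unchanged, matching the examples above, where it appears unencoded after the special page name.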

Usage

Once installed, the extension can be tested and used in two ways:

  1. Using a Firefox browser: Install the Modify Headers Firefox extension, then set the Accept-Datetime header from the Tools/Modify Headers menu option. The syntax to use is Accept-Datetime: Sat, 03 Oct 2009 10:00:00 GMT. Set it to a date-time at which your wiki was already generating history pages. Then enter the URL of a page from your wiki that has history pages around the date-time you chose. The Live HTTP Headers Firefox extension shows the request and response headers involved in this transaction. The URL of the TimeGate can be read from the Link header and used to navigate to the TimeGate. A detailed explanation of the headers is given below.
  2. Using the UNIX command line tool curl:
           curl -o null.html -D headers.txt -H \
                "Accept-Datetime: Sat, 03 Oct 2009 10:00:00 GMT" \
                http://your.wikiserver.here/your-title-here

Then look in headers.txt for the Link header. The URL of interest has rel="timegate", as shown below.

Link: <http://your.wikiserver.here/wiki/index.php/Special:TimeGate/http://your.wikiserver.here/wiki/index.php/Title>; rel="timegate"

This URL in the Link header is the URL of your TimeGate, and the original URL is passed to the TimeGate as a parameter. After repeating the curl command with this URL, the headers.txt file will look similar to:

Link:
<http://your.wikiserver.here/wiki?title=index.php&oldid=3122>; rel="last-memento"; datetime="Wed, 20 Jan 10 22:00:20 +0000",
<http://your.wikiserver.here/wiki?title=index.php&oldid=60>; rel="first-memento"; datetime="Thu, 10 Dec 09 18:18:07 +0000",
<http://your.wikiserver.here/wiki/index.php>; rel="original"
Location: http://your.wikiserver.here/wiki/index.php?title=index.php&oldid=505
Vary: negotiate,datetime

The TimeGate performs datetime content negotiation and returns the URI of the memento in the Location header. The URLs of the oldest and the most recent versions of the resource, as well as the URL of the original resource, are also returned in the Link header. Following the Location URL with curl returns the following result:

Link: 
<http://your.wikiserver.here/wiki/index.php?title=index.php&oldid=3122>; rel="last-memento"; datetime="Wed, 20 Jan 10 22:00:20 +0000",
<http://your.wikiserver.here/wiki/index.php?title=index.php&oldid=60>; rel="first-memento"; datetime="Thu, 10 Dec 09 18:18:07 +0000",
<http://your.wikiserver.here/wiki/index.php?title=index.php&oldid=504>; rel="prev-memento"; datetime="Wed, 16 Dec 09 17:00:12 +0000", 
<http://your.wikiserver.here/wiki/index.php?title=index.php&oldid=509>; rel="next-memento"; datetime="Wed, 16 Dec 09 19:00:13 +0000",
<http://your.wikiserver.here/wiki/index.php/Special:TimeMap/http://your.wikiserver.here/wiki/title>; rel="timemap",
<http://your.wikiserver.here/wiki/index.php/index.php>; rel="original"
Memento-Datetime: Wed, 16 Dec 09 18:00:09 +0000

The Memento-Datetime header returns the datetime of the content that was returned. The Link header with rel="timemap" returns the URL of the TimeMap for this resource.
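A client that wants to use these headers has to split the Link header into its individual links. The sketch below shows one way to do that with regular expressions, assuming every parameter value is quoted as in the examples above. A plain split on commas would not work, because the datetime values themselves contain commas. The name `parse_link_header` is hypothetical.

```python
import re

# One link: <url> followed by any number of ; name="value" parameters.
LINK_RE = re.compile(r'<([^>]*)>((?:\s*;\s*[\w-]+="[^"]*")*)')
PARAM_RE = re.compile(r'([\w-]+)="([^"]*)"')


def parse_link_header(value: str) -> dict:
    """Map each rel value to a dict holding the link URL and its other parameters."""
    links = {}
    for url, raw_params in LINK_RE.findall(value):
        params = dict(PARAM_RE.findall(raw_params))
        # A rel attribute may list several space-separated relation types.
        for rel in params.pop("rel", "").split():
            links[rel] = {"url": url, **params}
    return links


# Two entries shortened from the example response above.
sample = (
    '<http://your.wikiserver.here/wiki/index.php?title=index.php&oldid=504>; '
    'rel="prev-memento"; datetime="Wed, 16 Dec 09 17:00:12 +0000", '
    '<http://your.wikiserver.here/wiki/index.php/Special:TimeMap/'
    'http://your.wikiserver.here/wiki/title>; rel="timemap"'
)
links = parse_link_header(sample)
print(links["prev-memento"]["datetime"])
print(links["timemap"]["url"])
```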

In essence, the Memento Plugin returns the URL of the TimeGate for the original resource, and the TimeGate performs the content negotiation and redirects the client to a memento. The Link and Memento-Datetime headers are returned to indicate that the returned resource is indeed a memento.
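The whole exchange can be sketched from the client's side as follows. This is an illustration under stated assumptions, not code from the extension: `find_memento` and `fake_fetch` are hypothetical names, `fetch` stands for any caller-supplied function that performs a GET without following redirects, and the canned responses (including the 302 status) only imitate the headers shown above.

```python
import re
from datetime import datetime, timezone
from email.utils import format_datetime

# Simplification: expects rel to be the first parameter after the URL.
TIMEGATE_RE = re.compile(r'<([^>]*)>\s*;\s*rel="timegate"')


def find_memento(fetch, article_url, when):
    """Follow article -> TimeGate -> memento and return the memento URL.

    `fetch(url, headers)` must return (status_code, response_headers).
    `when` must be a timezone-aware datetime in UTC.
    """
    request_headers = {"Accept-Datetime": format_datetime(when, usegmt=True)}

    # Step 1: the article response advertises its TimeGate in the Link header.
    _, headers = fetch(article_url, request_headers)
    match = TIMEGATE_RE.search(headers.get("Link", ""))
    if match is None:
        raise LookupError('no rel="timegate" link in the response')

    # Step 2: the TimeGate names the memento in the Location header.
    _, headers = fetch(match.group(1), request_headers)
    return headers["Location"]


def fake_fetch(url, headers):
    """Canned responses standing in for a real wiki (illustrative only)."""
    if "Special:TimeGate" in url:
        return 302, {"Location": "http://your.wikiserver.here/index.php?title=Title&oldid=505"}
    return 200, {"Link": '<http://your.wikiserver.here/index.php/Special:TimeGate/'
                         'http://your.wikiserver.here/index.php/Title>; rel="timegate"'}


when = datetime(2009, 10, 3, 10, 0, 0, tzinfo=timezone.utc)
print(find_memento(fake_fetch, "http://your.wikiserver.here/index.php/Title", when))
```

Passing `fetch` in as a parameter keeps the sketch independent of any particular HTTP library; a real client would supply a function that issues the request and disables automatic redirects.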

Namespaces

The extension renders the requested page the same way MediaWiki does. It queries the wiki database table page with the requested title. Both MediaWiki reserved namespaces and custom namespaces are accounted for by retrieving the namespace_id from the object $wgTitle. If the namespace does not exist, the plug-in treats the namespace prefix as part of the title. For example, if the requested title is 'Memento:Main_Page', the plugin first checks whether a namespace exists for "Memento" and, if so, retrieves its corresponding namespace_id. It then queries the page table for the title 'Main_Page' with that namespace_id. Otherwise, it treats "Memento" as part of the title and searches the page table for the title 'Memento:Main_Page'. If the title cannot be found in the database, an HTTP/1.1 404 Not Found is returned.
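The lookup order described above can be sketched as follows. This is a simplified model, not the extension's code: `resolve_title` is a hypothetical name, the `namespaces` dict and the `pages` set stand in for MediaWiki's namespace registry and the page table, and the namespace id 100 for "Memento" is made up for the example.

```python
MAIN_NAMESPACE = 0  # MediaWiki's id for the main (article) namespace


def resolve_title(requested, namespaces, pages):
    """Return (namespace_id, title) for a requested title, or None if no page matches.

    `namespaces` maps namespace names to ids; `pages` is a set of
    (namespace_id, title) pairs standing in for the page table.
    """
    prefix, separator, rest = requested.partition(":")
    if separator and prefix in namespaces:
        key = (namespaces[prefix], rest)
    else:
        # Unknown prefix: keep it as part of the title.
        key = (MAIN_NAMESPACE, requested)
    # None corresponds to the 404 Not Found case.
    return key if key in pages else None


namespaces = {"Memento": 100, "Help": 12}
pages = {(100, "Main_Page"), (0, "Foo:Bar")}

print(resolve_title("Memento:Main_Page", namespaces, pages))
print(resolve_title("Foo:Bar", namespaces, pages))
print(resolve_title("Missing_Page", namespaces, pages))
```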

Templates

By default, MediaWiki retrieves the most recent version of a template when it is transcluded in an article. This extension cannot perform datetime content negotiation on transcluded templates. However, we have written a quick fix that performs this operation; add the following code to the file Parser.php in the directory path/to/wiki/includes/parser/.

            # Query the database for the template's rev_id at the requested datetime.
                if ( isset( $_SERVER['HTTP_ACCEPT_DATETIME'] ) ) {
                    // The Accept-Datetime header is present; parse it into a Unix timestamp.
                    $ts = strtotime( $_SERVER['HTTP_ACCEPT_DATETIME'] );
                    if ( $ts !== false ) {
                        // MediaWiki stores rev_timestamp as YYYYMMDDHHMMSS in UTC.
                        $dt = gmdate( 'YmdHis', $ts );
                        $pg_id = intval( $title->getArticleID() );

                        // Read-only connection; no transaction is needed for a single SELECT.
                        $dbr = wfGetDB( DB_SLAVE );
                        $tbl_rev = $dbr->tableName( 'revision' );
                        $res = $dbr->query( "SELECT rev_id FROM $tbl_rev
                                             WHERE rev_page = $pg_id
                                             AND rev_timestamp <= " . $dbr->addQuotes( $dt ) . "
                                             ORDER BY rev_timestamp DESC, rev_id DESC
                                             LIMIT 1"
                                          );
                        // Keep $id unchanged when no revision is old enough.
                        $row = $res ? $dbr->fetchObject( $res ) : false;
                        if ( $row ) {
                            $id = $row->rev_id;
                        }
                    }
                }

Paste the code above in the function statelessFetchTemplate(...), immediately after the variable

$id = false;

is declared. This code fetches the rev_id of the template for the requested datetime and directs MediaWiki to fetch that revision instead of the latest version of the template by title. The code was written for MediaWiki version 1.8+.
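The selection rule behind the query (the latest revision whose timestamp is at or before the requested datetime) can be checked in isolation. The sketch below models it in Python over an in-memory list; `to_mw_timestamp` and `revision_at` are hypothetical names, and the revision data is made up for the example.

```python
from bisect import bisect_right
from datetime import timezone
from email.utils import parsedate_to_datetime


def to_mw_timestamp(accept_datetime: str) -> str:
    """Convert an Accept-Datetime value to a 14-digit YYYYMMDDHHMMSS string in UTC."""
    parsed = parsedate_to_datetime(accept_datetime)
    return parsed.astimezone(timezone.utc).strftime("%Y%m%d%H%M%S")


def revision_at(revisions, accept_datetime):
    """Return the rev_id of the latest revision at or before the requested time.

    `revisions` is a list of (rev_timestamp, rev_id) tuples sorted ascending.
    Returns None when every revision is newer than the requested datetime.
    """
    wanted = to_mw_timestamp(accept_datetime)
    # The infinite second element makes revisions with an equal timestamp count as matches.
    index = bisect_right(revisions, (wanted, float("inf")))
    return revisions[index - 1][1] if index else None


# Made-up revision history for one template.
revisions = [
    ("20090901120000", 60),
    ("20091001080000", 504),
    ("20091005090000", 509),
]

print(revision_at(revisions, "Sat, 03 Oct 2009 10:00:00 GMT"))
print(revision_at(revisions, "Sat, 01 Aug 2009 00:00:00 GMT"))
```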

Caching

By default, MediaWiki searches its cache for templates by title and retrieves the most recent version. For best results, disable caching for templates so that MediaWiki always queries the database for the revision. This can be done either by commenting out the respective lines in the function getTemplateDom in Parser.php or by writing a small piece of code that skips the cache when the Accept-Datetime header is detected.
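The second option, skipping the cache only for datetime-negotiated requests, amounts to a check like the one below. This is a language-neutral sketch of the control flow written in Python, not a patch for Parser.php; `fetch_template`, the `cache` dict, and the `load_revision` callback are all hypothetical.

```python
def fetch_template(title, request_headers, cache, load_revision):
    """Return template text, skipping the cache for datetime-negotiated requests.

    `cache` is a dict keyed by title. `load_revision(title, accept_datetime)`
    is a caller-supplied loader that reads the matching revision from the database.
    """
    # HTTP header names are case-insensitive, so normalize before looking up.
    lowered = {name.lower(): value for name, value in request_headers.items()}
    accept_datetime = lowered.get("accept-datetime")

    if accept_datetime is not None:
        # The cached copy is the latest revision, which may not be the one requested.
        return load_revision(title, accept_datetime)

    if title not in cache:
        cache[title] = load_revision(title, None)
    return cache[title]


calls = []


def demo_loader(title, accept_datetime):
    """Stand-in loader that records how often the database would be queried."""
    calls.append(accept_datetime)
    return f"{title} @ {accept_datetime or 'latest'}"


cache = {}
print(fetch_template("Template:Box", {}, cache, demo_loader))
print(fetch_template("Template:Box", {}, cache, demo_loader))
print(fetch_template("Template:Box",
                     {"Accept-Datetime": "Sat, 03 Oct 2009 10:00:00 GMT"},
                     cache, demo_loader))
```

Negotiated results are deliberately not written back to the cache, so later requests without the header still receive the latest version.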

Special Pages

Special pages under the URL http://your.wikiserver.here/index.php/Special:SpecialPages do not have a history, i.e. there are no revisions to these pages. Hence, the Memento extension will return an HTTP/1.1 406 Not Acceptable.

Deleted Contributions

To do date-time negotiations for the deleted revisions in MediaWiki, most installations require "Administrator" privileges. Even with administrative access, MediaWiki can only show the revisions in "Edit" mode.

To enable this feature, set the configuration variable $wgMementoConfigDeleted to true.

Timestamps

This extension searches for and retrieves the revisions of an article using the timestamp at which each revision was generated. Timestamps are not unique identifiers, and an article may have more than one revision with the same timestamp. The extension handles this situation by returning an HTTP/1.1 300 Multiple Choices with the list of URIs that were created at the same time. MediaWiki identifies deleted revisions in their URIs by timestamp rather than by revision id. Hence, we could not come up with a way to resolve the situation in which more than one deleted revision has the same timestamp.
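The tie handling can be illustrated with a small model. It is only a sketch of the rule stated above: `choose_mementos` and `describe` are hypothetical names, the URIs and timestamps are made up, and what the extension does when no revision is old enough is not described here, so the sketch merely labels that case.

```python
def choose_mementos(revisions, wanted):
    """Return the URIs of all revisions sharing the best timestamp at or before `wanted`.

    `revisions` is a list of (rev_timestamp, uri) pairs with 14-digit timestamps.
    """
    eligible = [timestamp for timestamp, _ in revisions if timestamp <= wanted]
    if not eligible:
        return []
    best = max(eligible)
    return [uri for timestamp, uri in revisions if timestamp == best]


def describe(uris):
    """Label the outcome: one match can be redirected to, several are ambiguous."""
    if len(uris) > 1:
        return "300 Multiple Choices"
    return "redirect" if uris else "no matching revision"


# Made-up history in which two revisions share one timestamp.
revisions = [
    ("20091210181807", "http://your.wikiserver.here/index.php?title=Title&oldid=60"),
    ("20091216180009", "http://your.wikiserver.here/index.php?title=Title&oldid=505"),
    ("20091216180009", "http://your.wikiserver.here/index.php?title=Title&oldid=506"),
]

for wanted in ("20091211000000", "20091217000000"):
    uris = choose_mementos(revisions, wanted)
    print(wanted, describe(uris), len(uris))
```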

Wikis with Memento Plug-in Installed

JISC Dev8D

W3C Wiki

DCMI Wiki

See also

  1. RFC 2295. http://www.ietf.org/rfc/rfc2295.txt
  2. Van de Sompel et al.: Memento: Time Travel for the Web. http://arxiv.org/pdf/0911.1112v2 (preprint)