Extension:Memento

From mediawiki.org
MediaWiki extensions manual
Memento
Release status: beta
Implementation Data extraction, User interface
Description Performs content negotiation in the DateTime dimension.
Author(s) Harihar Shankar and Robert Sanderson (Hariharshankar)
Latest version 1.0 (05/31/2012)
MediaWiki 1.19.0+
PHP 5.3.2+
License GPL
Download No link
Help Help:Extension:Memento
Example Dublin Core Wiki

What can this extension do?

The Memento extension aims to make accessing past versions of articles as straightforward as accessing their current versions.

The Memento framework lets you view versions of articles as they existed at some date in the past. All you need to do is enter the URL of an article in your browser and specify the desired date in a browser plug-in; this way you can browse the Web of the past. The Memento extension then presents the version of the article as it existed on, or very close to, the selected date. Obviously, this only works if previous (archived) versions are available on the Web. Fortunately, MediaWiki is a content management system that keeps every revision made to an article. This extension leverages this archiving functionality to provide native Memento support for MediaWiki.

Use Cases

Users often want to see versions of a resource both before and after certain events: for example, the page about Michael Jackson before and after his death, or the evolution of the description of the TSA's approach to air travel security since 2001.

Additionally, editors can use Memento access to see where the hot spots of activity are, and to compare articles before and after edit wars.

By exposing Memento TimeGates and giving HTTP access to past versions, MediaWiki makes it easy for software agents to perform time-series analysis of its resources, either by extracting information from articles (text mining, data extraction, etc.) or from the upcoming data platform Wikidata. Because this information may change many times, fine-grained access to every version is extremely valuable compared to the DBpedia implementation[1].

How Memento Works

This extension accesses older articles by implementing support for the Accept-Datetime HTTP header to perform content negotiation in the date-time dimension, built on the principles of RFC 2295 [2]. This enables MediaWiki to be used as a web archive[3]. The IETF Memento Internet Draft provides complete technical information about the Memento framework.

The extension works in two simple steps:

  • When a client requests an article, this extension returns the URL of the article's TimeGate in the HTTP Link header.
  • When the client navigates to the TimeGate, the TimeGate redirects it to the version of the requested resource that was live at the date-time expressed.
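For the second step, the hypothetical function below lists the response headers a TimeGate could send once it has chosen a revision. It assumes the pattern used in the Memento specification (a 302 redirect carrying Vary: accept-datetime) and assumes mementos are addressed with MediaWiki's standard index.php?title=Title&oldid=ID permalinks; the extension's actual headers may differ.

```php
<?php
// Hypothetical sketch: headers a TimeGate could send after choosing revision
// $revId of $title. The permalink format and exact header set are assumptions.
function timeGateRedirectHeaders(string $indexPhp, string $title, int $revId): array
{
    $original = "$indexPhp/$title";
    return [
        'HTTP/1.1 302 Found',
        // The redirect target depends on the request's Accept-Datetime value,
        // so caches must not reuse it for other date-times.
        'Vary: accept-datetime',
        "Location: $indexPhp?title=" . rawurlencode($title) . "&oldid=$revId",
        "Link: <$original>; rel=\"original\", "
            . "<$indexPhp/Special:TimeMap/$original>; rel=\"timemap\"; type=\"application/link-format\"",
    ];
}

// A real handler would pass each line to header(); here they are printed.
foreach (timeGateRedirectHeaders('http://your.wikiserver.here/index.php', 'Main_Page', 42) as $line) {
    echo $line, "\n";
}
```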

The date-time is expressed as the value of the Accept-Datetime HTTP request header, for example Accept-Datetime: Sat, 03 Oct 2009 10:00:00 GMT. The MementoFox Firefox extension can set this value for you, letting you browse the past Web and turning your browser into a time machine.
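As an illustration of the header format, the hypothetical helper below converts an Accept-Datetime value (an HTTP date in RFC 1123 format, always in GMT) into the 14-digit YYYYMMDDHHMMSS UTC format in which MediaWiki stores revision timestamps. It is a sketch in modern PHP (7.1+), not code from the extension, and it rejects malformed values rather than guessing.

```php
<?php
// Hypothetical helper: convert an Accept-Datetime value such as
// "Sat, 03 Oct 2009 10:00:00 GMT" into MediaWiki's YmdHis (UTC) format.
// Returns null when the value does not match the expected HTTP date format.
function acceptDatetimeToMwTimestamp(string $header): ?string
{
    $dt = DateTime::createFromFormat(
        'D, d M Y H:i:s \G\M\T',
        trim($header),
        new DateTimeZone('UTC')
    );
    // Also reject values that only parsed with warnings,
    // e.g. an impossible calendar date such as 31 Feb.
    $errors = DateTime::getLastErrors();
    if ($dt === false || ($errors !== false && $errors['warning_count'] > 0)) {
        return null;
    }
    return $dt->format('YmdHis');
}

var_dump(acceptDatetimeToMwTimestamp('Sat, 03 Oct 2009 10:00:00 GMT'));
var_dump(acceptDatetimeToMwTimestamp('yesterday'));
```

A strict format is used instead of strtotime() because strtotime() also accepts relative expressions such as "yesterday", which are not valid Accept-Datetime values.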

This plug-in connects to the database through the same handlers as MediaWiki, so all existing database permissions and page access permissions are honored. It only uses a 'DB_SLAVE' database connection, i.e. it only reads from the tables; it makes no changes to the data in the wiki.

Download instructions

  • Unzip the file memento.zip into your MediaWiki extensions folder.

Installation

To install this extension, add the following to LocalSettings.php:

    require_once "$IP/extensions/memento/memento.php";

Note: $IP stands for the root directory of your MediaWiki installation, the same directory that holds LocalSettings.php.

Configuration parameters

Optional: $wgTimemapNumberOfMementos sets the number of mementos the TimeMap returns for an article (per TimeMap page). The default value is 500.

Note: Building a TimeMap is data-intensive. For wikis whose articles may have many revisions, avoid setting this number too high.

$wgTimemapNumberOfMementos = 500;

Optional: Set $wgMementoConfigDeleted to true to enable Memento functionality for deleted articles. If you omit this variable, the extension treats it as false.

$wgMementoConfigDeleted = false; # Toggles the feature to do datetime content negotiation for deleted pages. False by default.

Short URL

This plugin only works with Short URLs. If your server does not support Short URLs, add these lines to your LocalSettings.php file:

$wgArticlePath = "$wgScriptPath/index.php/$1";
$wgUsePathInfo = true;

Server Setup

In addition to the default MediaWiki installation, this plugin also works in setups with URL rewriting and with wikis behind a proxy.

TimeGates, TimeBundles/TimeMaps and their Workings

Earlier implementations of this plugin combined the functionality of the TimeGate and the Memento plugin into a single extension. This version, however, implements a separate TimeGate. A TimeGate is a resource that enables transparent datetime content negotiation. The TimeMap lists all the available mementos of a resource, which enables cross-archive services and provides finer datetime granularity.

The current implementation has three separate plugins:

  1. Memento Plugin: When any article in your wiki is visited, this plugin returns the URL of the article's TimeGate in the Link header. This URL can be used to retrieve the mementos of the article.
  2. TimeGate: The TimeGate parses the datetime in the Accept-Datetime header, performs the content negotiation, and redirects the client to the version that was live at the datetime expressed in the header.
  3. TimeMap: The TimeMap retrieves the mementos of the article and serializes them in application/link-format. The TimeMap is paged: for articles with many revisions, it returns only the number of mementos specified by the configuration parameter $wgTimemapNumberOfMementos (500 if the variable is not set). It also provides links for retrieving additional mementos, using the Link relation types "timemap next" and "timemap prev".
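The paging behaviour of the TimeMap can be sketched as follows. This is a hypothetical illustration, not the extension's serializer: it assumes mementos are MediaWiki index.php?title=Title&oldid=ID permalinks and that the caller already knows the URL of the next page, whose format the text above does not specify.

```php
<?php
// Hypothetical sketch of one TimeMap page in application/link-format.
// Assumptions: mementos are MediaWiki "index.php?title=T&oldid=N" permalinks,
// and the caller supplies the URL of the next page.

// MediaWiki timestamps are YYYYMMDDHHMMSS (UTC); Memento uses RFC 1123 dates.
function mwTimestampToHttpDate(string $ts): string
{
    $unix = gmmktime(
        (int) substr($ts, 8, 2),  // hour
        (int) substr($ts, 10, 2), // minute
        (int) substr($ts, 12, 2), // second
        (int) substr($ts, 4, 2),  // month
        (int) substr($ts, 6, 2),  // day
        (int) substr($ts, 0, 4)   // year
    );
    return gmdate('D, d M Y H:i:s', $unix) . ' GMT';
}

// $revisions: newest first, each ['id' => int, 'ts' => 'YYYYMMDDHHMMSS'].
// Query $limit + 1 rows: a row beyond $limit means another page exists.
function timeMapPage(string $indexPhp, string $title, array $revisions,
                     int $limit, string $nextPageUrl): string
{
    $links = ["<$indexPhp/$title>;rel=\"original\""];
    foreach (array_slice($revisions, 0, $limit) as $rev) {
        $links[] = sprintf('<%s?title=%s&oldid=%d>;rel="memento";datetime="%s"',
            $indexPhp, rawurlencode($title), $rev['id'], mwTimestampToHttpDate($rev['ts']));
    }
    if (count($revisions) > $limit) {
        $links[] = "<$nextPageUrl>;rel=\"timemap next\";type=\"application/link-format\"";
    }
    return implode(",\n", $links);
}

echo timeMapPage(
    'http://your.wikiserver.here/index.php',
    'Main_Page',
    [['id' => 42, 'ts' => '20091003100000'], ['id' => 17, 'ts' => '20090101000000']],
    1,
    'http://your.wikiserver.here/hypothetical-timemap-page-2'
), "\n";
```

Fetching one row more than the page size is a common way to detect whether a further page exists without a separate COUNT query.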

The TimeGate can be accessed directly with the URL: http://your.wikiserver.here/index.php/Special:TimeGate. This extension will return the URL of the TimeGate in the Link header for every article that is visited. The URL format is: http://your.wikiserver.here/index.php/Special:TimeGate/http://your.wikiserver.here/index.php/Title

The Memento page returns the URL to the TimeMap in the Link header. The URL format is: http://your.wikiserver.here/index.php/Special:TimeMap/http://your.wikiserver.here/index.php/Title
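Putting the two URL formats together, an article response might advertise its TimeGate and TimeMap as in the hypothetical sketch below. It uses the Memento relation types original, timegate and timemap; the exact Link header the extension emits may differ.

```php
<?php
// Hypothetical sketch: build the Link header value for an article, using the
// Special:TimeGate and Special:TimeMap URL formats described above.
function articleLinkHeader(string $indexPhp, string $title): string
{
    $article = "$indexPhp/$title";
    return implode(', ', [
        "<$article>; rel=\"original\"",
        "<$indexPhp/Special:TimeGate/$article>; rel=\"timegate\"",
        "<$indexPhp/Special:TimeMap/$article>; rel=\"timemap\"; type=\"application/link-format\"",
    ]);
}

echo 'Link: ', articleLinkHeader('http://your.wikiserver.here/index.php', 'Main_Page'), "\n";
```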

Usage

The best way to experience this extension is to install the MementoFox plugin for the Firefox browser. After installing MementoFox, enter the URL of a page in your wiki and set the desired date-time in MementoFox. MementoFox will use the TimeGate installed in the wiki to redirect you to the version of the article that was live at the requested date-time.

After setting the date-time in MementoFox, a user can click both the internal and external links in the page and navigate the web in the past.


This extension can also be used and tested in two other ways:

  1. Using a Firefox browser: Install the Modify Headers Firefox extension. Then set the Accept-Datetime header from the Tools/Modify Headers menu option, using the syntax Accept-Datetime: Sat, 03 Oct 2009 10:00:00 GMT. Choose a date-time for which your wiki already has page history. Then enter the URL of a page in your wiki that has revisions around the chosen date-time. With the Live HTTP Headers Firefox extension, you can inspect the request and response headers of this transaction; the Link response header contains the URL of the TimeGate, which you can then navigate to.
  2. Using the UNIX command line tool curl:

        curl -o null.html -D headers.txt \
             -H "Accept-Datetime: Sat, 03 Oct 2009 10:00:00 GMT" \
             http://your.wikiserver.here/your-title-here
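Whichever method you use, the TimeGate URL has to be picked out of the Link response header (with curl, it ends up in headers.txt). The hypothetical helper below does this with regular expressions; it assumes quoted rel values, as in the Memento examples, and is not a complete RFC 5988 (Web Linking) parser.

```php
<?php
// Hypothetical helper: return the URL of the first link in a Link header
// value whose rel contains $wantedRel, or null. Assumes quoted rel values
// and URLs without '>'; it is not a complete RFC 5988 parser.
function linkForRel(string $linkHeader, string $wantedRel): ?string
{
    // One match per link-value: <URL> followed by its ;-separated parameters.
    // Quoted values may contain commas, e.g. datetime="Sat, 03 Oct 2009 10:00:00 GMT".
    $pattern = '/<([^>]*)>((?:\s*;\s*(?:[^;,<"]|"[^"]*")+)*)/';
    preg_match_all($pattern, $linkHeader, $links, PREG_SET_ORDER);
    foreach ($links as $link) {
        // rel may list several space-separated types, e.g. "timemap next".
        if (preg_match('/;\s*rel\s*=\s*"([^"]*)"/i', $link[2], $rel)
            && in_array($wantedRel, preg_split('/\s+/', trim($rel[1])), true)) {
            return $link[1];
        }
    }
    return null;
}

$header = '<http://your.wikiserver.here/index.php/Main_Page>; rel="original", '
    . '<http://your.wikiserver.here/index.php/Special:TimeGate/'
    . 'http://your.wikiserver.here/index.php/Main_Page>; rel="timegate"';
var_dump(linkForRel($header, 'timegate'));
```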

For complete information about the Memento framework and its request and response headers, please refer to the IETF Memento Internet Draft.

Templates

By default, MediaWiki retrieves the most recent version of a template when it is transcluded in an article, and this extension cannot perform datetime content negotiation on transcluded templates. However, we have written a quick fix that does this; add the following code to the file Parser.php in the directory path/to/wiki/includes/parser/.

    # Memento: if the request carries an Accept-Datetime header, look up the
    # revision of the template that was current at that date-time.
    if ( isset( $_SERVER['HTTP_ACCEPT_DATETIME'] ) ) {
        $unixTime = strtotime( $_SERVER['HTTP_ACCEPT_DATETIME'] );
        // strtotime() returns false for an unparseable header; $id then stays false.
        if ( $unixTime !== false ) {
            $dbr = wfGetDB( DB_SLAVE );
            // Latest revision made on or before the requested date-time;
            // revisions sharing a timestamp resolve to the highest rev_id.
            $row = $dbr->selectRow(
                'revision',
                'rev_id',
                array(
                    'rev_page' => $title->getArticleID(),
                    'rev_timestamp <= ' . $dbr->addQuotes( $dbr->timestamp( $unixTime ) )
                ),
                __METHOD__,
                array( 'ORDER BY' => 'rev_timestamp DESC, rev_id DESC' )
            );
            if ( $row ) {
                $id = $row->rev_id;
            }
        }
    }

Paste the code above in the function statelessFetchTemplate(...), immediately after the variable

$id = false;

is declared. This code fetches the rev_id of the template revision that was current at the requested date-time and makes MediaWiki use that rev_id instead of fetching the latest version of the template. The code was written for MediaWiki version 1.8+.

This patch to the MediaWiki core is NOT MANDATORY for this plugin to work.

Caching

By default, MediaWiki looks up templates in its cache by title and retrieves the most recent version. For best results, disable caching for templates so that MediaWiki always queries the database for the revision. You can do this either by commenting out the relevant lines in the function getTemplateDom in Parser.php, or by adding a simple check that skips the cache when the Accept-Datetime header is present.

Special Pages

Special pages (listed at http://your.wikiserver.here/index.php/Special:SpecialPages) have no history, i.e. there are no revisions of these pages. Hence, for special pages the Memento extension returns HTTP/1.1 406 Not Acceptable.

Deleted Contributions

To perform date-time negotiation for deleted revisions in MediaWiki, most wiki setups require the logged-in user to have "Administrator" privileges. Even with administrative access, MediaWiki can only show these revisions in "Edit" mode.

To enable this feature, set the configuration variable $wgMementoConfigDeleted to true in LocalSettings.php.

Timestamps

This extension searches for and retrieves the revisions of an article by their timestamps, i.e. the time at which each revision was saved. Timestamps are not unique identifiers: an article can have more than one revision with the same timestamp. This extension handles this situation by redirecting to the revision with the highest revision id.
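The selection rule above can be sketched as a small function over an in-memory list of revisions. This is an illustration of the rule, not the extension's code; the function name and array shape are hypothetical. A database-backed version would express the same rule with ORDER BY rev_timestamp DESC, rev_id DESC and a limit of one row.

```php
<?php
// Hypothetical sketch of the TimeGate's choice among an article's revisions.
// Timestamps are MediaWiki's 14-digit YmdHis strings (UTC), so comparing them
// as strings orders them chronologically.
// $revisions: any order, each ['id' => int, 'ts' => 'YYYYMMDDHHMMSS'].
function revisionLiveAt(array $revisions, string $requestedTs): ?int
{
    $best = null;
    foreach ($revisions as $rev) {
        if (strcmp($rev['ts'], $requestedTs) > 0) {
            continue; // saved after the requested date-time
        }
        // Keep the latest timestamp; on a tie, keep the highest revision id.
        if ($best === null
            || strcmp($rev['ts'], $best['ts']) > 0
            || ($rev['ts'] === $best['ts'] && $rev['id'] > $best['id'])) {
            $best = $rev;
        }
    }
    // null: no revision existed yet at the requested date-time.
    return $best === null ? null : $best['id'];
}

$revisions = [
    ['id' => 10, 'ts' => '20090101000000'],
    ['id' => 11, 'ts' => '20091003100000'],
    ['id' => 12, 'ts' => '20091003100000'], // same timestamp as revision 11
    ['id' => 13, 'ts' => '20100101000000'],
];
var_dump(revisionLiveAt($revisions, '20091003100000'));
```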

In their URIs, MediaWiki identifies deleted revisions by timestamp rather than by revision id. Hence, we have not found a way to resolve the case where more than one deleted revision has the same timestamp.

Wikis with Memento Plug-in Installed

JISC Dev8D

W3C Wiki

DCMI Wiki

References

  1. An HTTP-Based Versioning Mechanism for Linked Data. http://arxiv.org/abs/1003.3661
  2. RFC 2295: Transparent Content Negotiation in HTTP. http://www.ietf.org/rfc/rfc2295.txt
  3. Van de Sompel et al.: Memento: Time Travel for the Web. http://arxiv.org/pdf/0911.1112v2 (preprint)