Extension:ActiveAbstract

From MediaWiki.org

ActiveAbstract

Release status: stable

Implementation: Extended syntax
Description: Generate XML feed for Yahoo's Active Abstracts project
Author(s): Brion Vibber
License: GPL

Purpose

This extension is used by the XML dumps to pull a formatted copy of an article's initial text and its organizational structure. It is one of the filters available for the backup system.

It pulls the first two sentences of the article along with each section heading, wraps them in XML tags, and trims the entry to a maximum length of 1024.
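
The extraction step described above can be sketched in a few lines. This is an illustration only, not the extension's actual PHP code: the function names are hypothetical, the sentence splitting is deliberately naive, and treating the 1024 limit as a character count is an assumption.

```python
import re

MAX_LENGTH = 1024  # limit quoted above; counting characters is an assumption


def first_sentences(text, count=2):
    """Return the first `count` sentences, splitting naively after ., ! or ?."""
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    return " ".join(sentences[:count])


def make_abstract(text, max_length=MAX_LENGTH):
    """Take the opening sentences of plain text and cap the result's length."""
    return first_sentences(text)[:max_length]
```

A real implementation would also have to strip wiki markup before splitting and would need to handle abbreviations such as "e.g.", which this sketch does not attempt.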

Parameters

The abstract filter is run as a plugin to dumpBackup.php and can be invoked as follows:

php dumpBackup.php \
  --plugin=AbstractFilter:extensions/ActiveAbstract/AbstractFilter.php \
  --current \
  --output=gzip:/dumps/abstract.xml.gz \
  --filter=namespace:NS_MAIN \
  --filter=noredirect \
  --filter=abstract
The filter can optionally convert the output text to a given language variant:
  --filter=abstract:variant=zh-cn

Where

  • current - pulls only the latest article revision
  • output - sets where our output stream will go
  • filters
    • namespace - only pull documents from this namespace
    • noredirect - don't show redirects
    • abstract - registers the abstracts filter and runs it from within dumpBackup.php
    • variant - optional argument to the abstract filter; if language variants exist, pulls the latest copy in the given variant and outputs it to a separate abstracts file
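
When the dump is launched from a script, the options listed above can be assembled as an argument list. The sketch below is one possible way to do that; `build_dump_command` is a hypothetical helper, and it uses only the flags shown on this page.

```python
def build_dump_command(output, variant=None,
                       plugin="extensions/ActiveAbstract/AbstractFilter.php"):
    """Build the dumpBackup.php argument list for an abstracts run.

    Hypothetical helper: `output` is a sink such as
    "gzip:/dumps/abstract.xml.gz"; `variant` is an optional language
    variant such as "zh-cn".
    """
    abstract_filter = "abstract"
    if variant is not None:
        abstract_filter = f"abstract:variant={variant}"
    return [
        "php", "dumpBackup.php",
        f"--plugin=AbstractFilter:{plugin}",
        "--current",                   # latest revision only
        f"--output={output}",
        "--filter=namespace:NS_MAIN",  # main namespace only
        "--filter=noredirect",         # skip redirects
        f"--filter={abstract_filter}",
    ]
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` from the maintenance directory. Building a list rather than a single string avoids shell quoting problems in paths.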

A possible way to run this in production would be:

/usr/bin/php -q /apache/common/php-1.5/maintenance/dumpBackup.php --wiki='quwiki' \
   --plugin=AbstractFilter:/apache/common/php-1.5/extensions/ActiveAbstract/AbstractFilter.php \
   --current --report=1000 --force-normal --server='x.x.x.x' \
   --output=file:/mnt/dumps/public/quwiki/20090613/quwiki-20090613-abstract.xml \
   --filter=namespace:NS_MAIN --filter=noredirect --filter=abstract

Note: --force-normal is added for better UTF-8 conversion.

Formatting

The required tags for each entry are:

  • doc
  • title
  • url
  • abstract
  • links
  • sublink
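
A consumer of the feed may want to check that each entry carries these tags. The following is a minimal sketch using Python's standard library; `missing_tags` is a hypothetical name, and `sublink` is deliberately not checked, on the assumption that an article without section headings has none.

```python
import xml.etree.ElementTree as ET

# Direct children expected inside every <doc> entry.
REQUIRED_CHILDREN = ("title", "url", "abstract", "links")


def missing_tags(doc_xml):
    """Return the required tags absent from a single <doc> entry."""
    doc = ET.fromstring(doc_xml)
    if doc.tag != "doc":
        return ["doc"]
    return [tag for tag in REQUIRED_CHILDREN if doc.find(tag) is None]
```

An empty list means the entry has every tag that the check covers.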

Example Listing

<doc>
<title>Wikipedia: An American in Paris</title>
<url>http://en.wikipedia.org/wiki/An_American_in_Paris</url>
<abstract>An American in Paris is a symphonic composition by American composer George Gershwin, composed in 1928. Inspired .. </abstract>
<links>
<sublink linktype="nav"><anchor>Instrumentation</anchor><link>http://en.wikipedia.org/wiki/An_American_in_Paris#Instrumentation</link></sublink>
<sublink linktype="nav"><anchor>Recordings</anchor><link>http://en.wikipedia.org/wiki/An_American_in_Paris#Recordings</link></sublink>
<sublink linktype="nav"><anchor>Film</anchor><link>http://en.wikipedia.org/wiki/An_American_in_Paris#Film</link></sublink>
</links>
</doc>

Note: the <abstract> text above is truncated so that it does not run off the page on this wiki.
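
To show how an entry like the one above might be consumed, here is a short sketch that parses a single `<doc>` element with Python's standard library. The function name `parse_doc` and the dictionary layout are illustrative choices, not part of the extension.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<doc>
<title>Wikipedia: An American in Paris</title>
<url>http://en.wikipedia.org/wiki/An_American_in_Paris</url>
<abstract>An American in Paris is a symphonic composition by American composer George Gershwin, composed in 1928. Inspired .. </abstract>
<links>
<sublink linktype="nav"><anchor>Instrumentation</anchor><link>http://en.wikipedia.org/wiki/An_American_in_Paris#Instrumentation</link></sublink>
<sublink linktype="nav"><anchor>Recordings</anchor><link>http://en.wikipedia.org/wiki/An_American_in_Paris#Recordings</link></sublink>
<sublink linktype="nav"><anchor>Film</anchor><link>http://en.wikipedia.org/wiki/An_American_in_Paris#Film</link></sublink>
</links>
</doc>"""


def parse_doc(doc):
    """Convert one <doc> element into a plain dictionary."""
    return {
        "title": doc.findtext("title", default=""),
        "url": doc.findtext("url", default=""),
        "abstract": doc.findtext("abstract", default=""),
        # One (anchor text, link URL) pair per section heading.
        "sections": [
            (s.findtext("anchor", default=""), s.findtext("link", default=""))
            for s in doc.iterfind("links/sublink")
        ],
    }


entry = parse_doc(ET.fromstring(SAMPLE))
```

For a full abstracts file, which can be large, `xml.etree.ElementTree.iterparse` allows entries to be processed one at a time instead of loading the whole document into memory.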


This extension is currently used to generate part of the Wikimedia XML database dumps. It is not meant to be used exclusively within MediaWiki; rather, it allows content to be exported from your wiki installation into a specific format.