Mediawiki-utilities/List

From mediawiki.org

Datasource

mwxml -- XML dump processing

pip install mwxml • docs • source

This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. There are two important concerns that this module intends to address: complexity and performance of streaming XML parsing.

Complexity
Streaming XML parsing is gross. XML dumps consist of (1) some site metadata, (2) a collection of pages that contain (3) collections of revisions. The module allows you to think about dump files in this way and ignore the fact that you’re streaming XML. An mwxml.Dump contains an mwxml.SiteInfo and an iterator of mwxml.Page objects. An mwxml.Page contains page metadata and an iterator of mwxml.Revision objects. An mwxml.Revision contains revision metadata and text.
Performance
Performance is a serious concern when processing large XML database dumps. Regrettably, Python’s Global Interpreter Lock prevents threads from running Python code on multiple CPUs at once. This library provides mwxml.map(), a function that maps a dump-processing function over a set of dump files, using multiprocessing to distribute the work over multiple CPUs.
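The dump, page, revision nesting described above can be illustrated with the standard library alone. The following is a minimal sketch, not mwxml's implementation or API: the `iter_pages` helper and the tiny sample dump are hypothetical, real dumps use an XML namespace and carry far more metadata, and this sketch collects each page's revisions into a list rather than iterating them lazily.

```python
import io
import xml.etree.ElementTree as ET

# A tiny, made-up stand-in for a MediaWiki XML dump (assumption: real dumps
# are namespaced and much larger).
SAMPLE_DUMP = b"""<mediawiki>
  <siteinfo><sitename>ExampleWiki</sitename></siteinfo>
  <page>
    <title>Foo</title>
    <revision><id>1</id><text>first</text></revision>
    <revision><id>2</id><text>second</text></revision>
  </page>
  <page>
    <title>Bar</title>
    <revision><id>3</id><text>only</text></revision>
  </page>
</mediawiki>"""


def iter_pages(stream):
    """Yield (title, [(rev_id, text), ...]) for one page at a time."""
    # iterparse reads the stream incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        if elem.tag == "page":
            title = elem.findtext("title")
            revisions = [
                (int(rev.findtext("id")), rev.findtext("text"))
                for rev in elem.iter("revision")
            ]
            yield title, revisions
            elem.clear()  # drop the finished page's children to limit memory use


pages = list(iter_pages(io.BytesIO(SAMPLE_DUMP)))
for title, revisions in pages:
    print(title, [rev_id for rev_id, _text in revisions])
```

The point of the sketch is the shape of the loop: the caller sees pages and revisions, and the streaming details stay inside one generator.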

See also dumps.wikimedia.org, Special:Export, and Manual:DumpBackup.php.

mwapi -- API querying and session management

pip install mwapi • docs • source

This library provides a set of basic utilities for interacting with MediaWiki’s “action” API – usually available at /w/api.php. The most salient feature of this library is the mwapi.Session class that provides a connection session that sustains a logged-in user status and provides convenience functions for calling the MediaWiki API. See get() and post().

Authentication
mwapi.Session provides convenient login() and logout() methods.
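To make the "action" API endpoint concrete, here is a small standard-library sketch that only builds a request URL for /w/api.php and sends nothing. It does not use mwapi; `build_query_url` is a hypothetical helper, and the host is just an example wiki. The `action=query` and `meta=siteinfo` parameters are standard action API parameters.

```python
from urllib.parse import urlencode


def build_query_url(host, **params):
    """Build a GET URL for the action API at /w/api.php (no request is sent)."""
    # Ask for JSON output and add the caller's API parameters after it.
    query = {"format": "json", **params}
    return f"https://{host}/w/api.php?{urlencode(query)}"


url = build_query_url("en.wikipedia.org", action="query", meta="siteinfo")
print(url)
```

A session class adds value on top of this by keeping cookies (and therefore login state) across such requests.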

See also API and w/api.php.

mwdb -- Database connection and querying

pip install mwdb • source

This library provides a set of utilities for connecting to and querying a MediaWiki database.

Authentication & authorization

mwoauth -- OAuth connection handler for MediaWiki

pip install mwoauth • docs • source

This library provides a simple means of performing an OAuth handshake with a MediaWiki installation that has the OAuth extension installed.

Data processing

mwdiffs -- Revision diff processing

pip install mwdiffs • docs • source

This library provides a set of utilities for generating information about the difference between revisions.
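As a rough illustration of what "information about the difference between revisions" can look like, the sketch below diffs two revision texts token by token with Python's `difflib`. This is an assumption-laden stand-in, not mwdiffs' API or algorithm: `token_diff` is a hypothetical helper and whitespace splitting is a deliberately naive tokenizer.

```python
import difflib


def token_diff(old_text, new_text):
    """Return (operation, old_tokens, new_tokens) tuples between two texts."""
    a = old_text.split()
    b = new_text.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    # Each opcode names an operation and the token ranges it covers.
    return [
        (tag, a[a1:a2], b[b1:b2])
        for tag, a1, a2, b1, b2 in matcher.get_opcodes()
    ]


ops = token_diff("the quick brown fox", "the slow brown fox jumps")
for op in ops:
    print(op)
```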

mwreverts -- Revert detection

pip install mwreverts • docs • source

This library provides a set of utilities for detecting reverts (see mwreverts.Detector and mwreverts.detect()) and identifying the reverted status of edits to a MediaWiki wiki.

See also m:R:Revert detection.
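One common way to detect reverts is the "identity revert": a revision whose content exactly matches an earlier revision is treated as reverting everything in between. The sketch below shows that idea with checksums from the standard library. It is not the mwreverts API; `detect_reverts` and its `radius` default are assumptions made for illustration.

```python
import hashlib


def detect_reverts(texts, radius=15):
    """Return (reverting_index, reverted_indexes, reverted_to_index) tuples.

    `texts` is a page's revision texts in chronological order. `radius`
    (an assumed value) limits how far back a match is searched for.
    """
    checksums = [hashlib.sha1(t.encode("utf-8")).hexdigest() for t in texts]
    reverts = []
    for i, checksum in enumerate(checksums):
        start = max(0, i - radius)
        # Search the window backwards, nearest revision first.
        for j in range(i - 1, start - 1, -1):
            if checksums[j] == checksum:
                reverted = list(range(j + 1, i))
                if reverted:  # an immediate duplicate reverts nothing
                    reverts.append((i, reverted, j))
                break
    return reverts


history = ["a", "a b", "a b VANDAL", "a b"]
reverts = detect_reverts(history)
print(reverts)
```

In this toy history, revision 3 restores the content of revision 1, so revision 2 is reported as reverted.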

mwsessions -- Edit session processing

pip install mwsessions • docs • source

This library provides a set of utilities for grouping MediaWiki user actions into sessions. mwsessions.Sessionizer and mwsessions.sessionize() can be used by Python scripts to group activities into sessions, or the command-line utilities can be used to operate directly on data files. Such methods have been used to measure editor labor hours[1].

See also m:R:Activity session.
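Grouping actions into sessions usually comes down to an inactivity cutoff: a user's action joins their current session if it follows the previous action closely enough, and otherwise starts a new one. The sketch below shows that rule with plain Python. It is not the mwsessions API; this `sessionize` function, its one-hour `cutoff`, and the integer timestamps are assumptions for illustration.

```python
def sessionize(events, cutoff=3600):
    """Group (user, timestamp) events into per-user sessions.

    `events` must be sorted by timestamp (seconds). A gap longer than
    `cutoff` seconds between a user's actions starts a new session.
    """
    sessions = []        # (user, [timestamps]) in order of session start
    open_sessions = {}   # user -> timestamp list of their latest session
    for user, timestamp in events:
        current = open_sessions.get(user)
        if current is not None and timestamp - current[-1] <= cutoff:
            current.append(timestamp)
        else:
            current = [timestamp]
            open_sessions[user] = current
            sessions.append((user, current))
    return sessions


events = [
    ("alice", 0),
    ("bob", 100),
    ("bob", 200),
    ("alice", 600),
    ("alice", 10_000),
]
sessions = sessionize(events)
for user, timestamps in sessions:
    print(user, timestamps)
```

Summing the span of each session (last timestamp minus first) is one simple way to approximate time spent editing.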

mwpersistence -- Content persistence processing

pip install mwpersistence • docs • source

This library provides a set of utilities for measuring content persistence and tracking authorship in MediaWiki revisions.

See also m:R:Content persistence.
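The idea behind content persistence can be sketched in a few lines: for each word a revision introduces, count how many of the following revisions still contain it. This is a deliberately crude model, not mwpersistence's API or algorithm. The `words_persisted` helper is hypothetical, and it treats each revision as a set of distinct words, ignoring position, repetition, and words that are removed and later re-added.

```python
def words_persisted(revision_texts):
    """For each revision, map every newly introduced word to the number of
    consecutive later revisions that still contain it."""
    word_sets = [set(text.split()) for text in revision_texts]
    seen = set()
    results = []
    for i, words in enumerate(word_sets):
        added = words - seen  # words never seen in any earlier revision
        seen |= words
        persistence = {}
        for word in sorted(added):
            survived = 0
            for later in word_sets[i + 1:]:
                if word not in later:
                    break
                survived += 1
            persistence[word] = survived
        results.append(persistence)
    return results


history = ["apples are red", "apples are green", "apples are green fruit"]
persistence = words_persisted(history)
for index, counts in enumerate(persistence):
    print(index, counts)
```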

mwparserfromhell -- Easy-to-use parser for wikitext

pip install mwparserfromhell • docs • source

This library provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.


Basic utilities

mwtypes -- A basic type system for MediaWiki data

pip install mwtypes • docs • source

This library provides a set of standardized types to be used when processing MediaWiki data. All of the types in this package make use of jsonable and therefore can be trivially serialized as JSON documents.
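The value of JSON-serializable types is that a record can be written out and read back without loss. The sketch below shows that round trip with a standard-library dataclass. It does not use mwtypes or jsonable; the `Revision` class and its fields here are hypothetical and only stand in for the kind of type the library provides.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class Revision:
    """A hypothetical, simplified revision record (not mwtypes.Revision)."""
    id: int
    page_title: str
    user_text: str
    timestamp: str


rev = Revision(
    id=12345,
    page_title="Example",
    user_text="ExampleUser",
    timestamp="2015-01-01T00:00:00Z",
)

doc = json.dumps(asdict(rev), sort_keys=True)  # serialize to a JSON document
restored = Revision(**json.loads(doc))         # rebuild the typed record
print(doc)
```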

mwcli -- Utilities for unix command-line data processing

pip install mwcli • source

Incubator

These libraries are experimental and may change dramatically or be discontinued.

mwmetrics -- A collection of statistics and measurements for MediaWiki

source

mwevents -- A generalized event extraction and processing framework

pip install mwevents • source
1. R. Stuart Geiger & Aaron Halfaker (2013). Using Edit Sessions to Measure Participation in Wikipedia. CSCW (pp. 861-870). DOI:10.1145/2441776.2441873.