Mediawiki-utilities/List

mwxml -- XML dump processing
This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. There are two important concerns that this module intends to address: complexity and performance of streaming XML parsing.


 * Complexity: Streaming XML parsing is gross. XML dumps consist of (1) some site metadata and (2) a collection of pages, each of which contains (3) a collection of revisions. The module lets you think about dump files in these terms and ignore the fact that you’re streaming XML. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Dump mwxml.Dump] contains a [//pythonhosted.org/mwxml/iteration.html#mwxml.SiteInfo mwxml.SiteInfo] and an iterator of [//pythonhosted.org/mwxml/iteration.html#mwxml.Page mwxml.Page] objects. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Page mwxml.Page] contains page metadata and an iterator of [//pythonhosted.org/mwxml/iteration.html#mwxml.Revision mwxml.Revision] objects. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Revision mwxml.Revision] contains revision metadata and text.
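To show what mwxml hides, here is a minimal stdlib sketch of streaming a dump with `xml.etree.ElementTree.iterparse`: it yields one page at a time and clears each page element once it has been handled. The function `iter_pages`, its `(title, revisions)` output and the dict-per-revision shape are hypothetical and not part of mwxml; with mwxml itself, the equivalent loop is `for page in mwxml.Dump.from_file(f):` followed by `for revision in page:`. This simplified version also collects a page's revisions into a list, whereas mwxml iterates over them lazily.

```python
import io
import xml.etree.ElementTree as ET


def _local(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, revisions) for each <page> in a MediaWiki XML dump.

    `source` is a filename or binary file object. Each revision is returned
    as a dict mapping its direct child elements ('id', 'timestamp', 'text',
    ...) to their text content.
    """
    for _, elem in ET.iterparse(source, events=("end",)):
        if _local(elem.tag) != "page":
            continue  # only act once a whole <page> has been read
        title = None
        revisions = []
        for child in elem:
            name = _local(child.tag)
            if name == "title":
                title = child.text
            elif name == "revision":
                revisions.append({_local(c.tag): c.text for c in child})
        yield title, revisions
        # Drop the finished page's subtree so memory use is dominated by the
        # current page rather than by the whole dump.
        elem.clear()


# A tiny hypothetical dump in the export-0.10 namespace, for demonstration.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><sitename>Example</sitename></siteinfo>
  <page>
    <title>Foo</title>
    <id>1</id>
    <revision><id>10</id><timestamp>2015-01-01T00:00:00Z</timestamp><text>Hello</text></revision>
    <revision><id>11</id><timestamp>2015-01-02T00:00:00Z</timestamp><text>Hello world</text></revision>
  </page>
</mediawiki>"""

for title, revisions in iter_pages(io.BytesIO(SAMPLE)):
    print(title, [r["id"] for r in revisions])  # Foo ['10', '11']
```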


 * Performance: Performance is a serious concern when processing large XML database dumps. Regrettably, Python’s Global Interpreter Lock prevents threads from executing Python code on multiple CPUs in parallel. This library provides [//pythonhosted.org/mwxml/map.html mwxml.map], a function that maps a dump-processing function over a set of dump files, using multiprocessing to distribute the work across multiple CPUs.
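A simplified sketch of the same idea using only `multiprocessing.Pool`: each file is handed to a separate worker process, and the per-file results are merged into a single stream. `map_files` is a hypothetical name. Unlike mwxml.map, whose processor receives a parsed dump together with its path, this sketch passes only the path, and it buffers each file's results in memory before yielding them.

```python
from multiprocessing import Pool


def _run(job):
    """Worker: apply the user-supplied processor to one file, materialize output."""
    process, path = job
    return list(process(path))


def map_files(process, paths, processes=None):
    """Apply `process(path)` to every path and yield all of its results.

    `process` must be a module-level (picklable) function returning an
    iterable. `processes=None` lets Pool use one worker per CPU;
    `processes=1` runs everything in the current process.
    """
    paths = list(paths)
    if processes == 1:
        # In-process path: handy for debugging, no pickling involved.
        for path in paths:
            yield from process(path)
        return
    with Pool(processes=processes) as pool:
        # imap_unordered gives each file to whichever worker is free and
        # returns each file's result list as soon as it is ready.
        jobs = [(process, path) for path in paths]
        for results in pool.imap_unordered(_run, jobs):
            yield from results
```

Because `process` is sent to the workers by pickling, it has to be defined at module level, and on platforms that start workers with the "spawn" method the call to `map_files` should sit under `if __name__ == "__main__":`.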

See also [//dumps.wikimedia.org dumps.wikimedia.org], Special:Export, and Manual:DumpBackup.php.

mwapi -- API querying and session management
This library provides a set of basic utilities for interacting with MediaWiki’s “action” API, which is usually available at /w/api.php. Its most salient feature is the [//pythonhosted.org/mwapi/session.html#mwapi.Session mwapi.Session] class, which provides a connection session that maintains a logged-in user status and offers convenience functions for calling the MediaWiki API (see its get and post methods).


 * Authentication: [//pythonhosted.org/mwapi/session.html#mwapi.Session mwapi.Session] provides convenient login and logout methods.
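A minimal stdlib sketch of what such a session does under the hood: one cookie jar shared across requests keeps the login cookies, and `get`/`post` send action API parameters with `format=json`. `ApiSession` and its methods are hypothetical and only mirror the general shape of mwapi.Session. The login step follows the action API's two-step flow (fetch a token with `meta=tokens&type=login`, then call `action=login`); many wikis expect a bot password for that call.

```python
import json
from http.cookiejar import CookieJar
from urllib.parse import urlencode
from urllib.request import HTTPCookieProcessor, Request, build_opener


class ApiSession:
    """Hypothetical minimal client for the MediaWiki action API (not mwapi)."""

    def __init__(self, host, user_agent, api_path="/w/api.php"):
        self.url = host.rstrip("/") + api_path
        self.user_agent = user_agent
        # A single cookie jar for the whole session: cookies set by the wiki
        # at login are sent back automatically on every later request.
        self.cookies = CookieJar()
        self._opener = build_opener(HTTPCookieProcessor(self.cookies))

    def build_request(self, method, **params):
        """Build (but do not send) a GET or POST request to api.php."""
        params.setdefault("format", "json")
        query = urlencode(params)
        headers = {"User-Agent": self.user_agent}
        if method.upper() == "GET":
            return Request(f"{self.url}?{query}", headers=headers, method="GET")
        return Request(self.url, data=query.encode("utf-8"),
                       headers=headers, method="POST")

    def _send(self, request):
        with self._opener.open(request) as response:
            return json.load(response)

    def get(self, **params):
        return self._send(self.build_request("GET", **params))

    def post(self, **params):
        return self._send(self.build_request("POST", **params))

    def login(self, username, password):
        # Step 1: fetch a login token. Step 2: post the credentials with it.
        tokens = self.get(action="query", meta="tokens", type="login")
        token = tokens["query"]["tokens"]["logintoken"]
        return self.post(action="login", lgname=username,
                         lgpassword=password, lgtoken=token)
```

For example, `ApiSession("https://en.wikipedia.org", user_agent="my-tool/0.1").get(action="query", meta="siteinfo")` would return the parsed JSON response (this needs network access).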

See also API and [//mediawiki.org/w/api.php w/api.php].

mwdb -- Database connection and querying
To do

mwtypes -- A basic type system for MediaWiki data
This library provides a set of standardized types to be used when processing MediaWiki data. All of the types in this package are built on jsonable and can therefore be trivially serialized as JSON documents.
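A minimal sketch of the pattern, assuming the `to_json`/`from_json` convention that jsonable types follow: each type converts itself to plain JSON-compatible values and can be rebuilt from them, so nested types serialize with a single `json.dumps`. The `Timestamp` and `Revision` classes below are hypothetical stand-ins written with dataclasses, not mwtypes’ own classes.

```python
import json
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

ISO_FORMAT = "%Y-%m-%dT%H:%M:%SZ"  # the timestamp style used in XML dumps


@dataclass(frozen=True)
class Timestamp:
    """Hypothetical timestamp type: seconds since the Unix epoch (UTC)."""
    unix: int

    def to_json(self):
        return datetime.fromtimestamp(self.unix, tz=timezone.utc).strftime(ISO_FORMAT)

    @classmethod
    def from_json(cls, doc):
        parsed = datetime.strptime(doc, ISO_FORMAT).replace(tzinfo=timezone.utc)
        return cls(int(parsed.timestamp()))


@dataclass(frozen=True)
class Revision:
    """Hypothetical revision type that nests a Timestamp."""
    id: int
    timestamp: Timestamp
    comment: Optional[str] = None

    def to_json(self):
        # Nested types delegate to their own to_json().
        return {"id": self.id, "timestamp": self.timestamp.to_json(),
                "comment": self.comment}

    @classmethod
    def from_json(cls, doc):
        return cls(doc["id"], Timestamp.from_json(doc["timestamp"]),
                   doc.get("comment"))


rev = Revision(10, Timestamp.from_json("2015-01-01T00:00:00Z"), "fix typo")
print(json.dumps(rev.to_json()))
# {"id": 10, "timestamp": "2015-01-01T00:00:00Z", "comment": "fix typo"}
```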

mwoauth -- OAuth connection handler for MediaWiki
This library provides a simple means of performing an OAuth handshake with a MediaWiki installation that has the OAuth extension installed.
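In the OAuth 1.0a handshake that mwoauth performs (its `Handshaker` class exposes `initiate`, `complete` and `identify` steps), every request is signed. As a sketch of one piece a handshake library takes care of, the functions below build an RFC 5849 signature base string and an HMAC-SHA1 signature using only the standard library. The function names are hypothetical. The sketch assumes the URL is already normalized (lowercase scheme and host, no query string); a real handshake also adds the `oauth_*` protocol parameters (consumer key, nonce, timestamp and so on) and sends the signature in an `Authorization` header.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote


def _enc(value):
    # RFC 5849 percent-encoding: everything except unreserved characters.
    return quote(str(value), safe="-._~")


def signature_base_string(method, url, params):
    """Join method, URL and sorted, encoded parameters with '&'."""
    pairs = sorted((_enc(k), _enc(v)) for k, v in params.items())
    normalized = "&".join(f"{k}={v}" for k, v in pairs)
    return "&".join([method.upper(), _enc(url), _enc(normalized)])


def hmac_sha1_signature(base_string, consumer_secret, token_secret=""):
    """Sign with key 'consumer_secret&token_secret' (token part empty at first)."""
    key = f"{_enc(consumer_secret)}&{_enc(token_secret)}"
    digest = hmac.new(key.encode(), base_string.encode(), hashlib.sha1).digest()
    return base64.b64encode(digest).decode()


# Illustrative first request of the handshake (other oauth_* params omitted).
base = signature_base_string(
    "GET", "https://meta.wikimedia.org/w/index.php",
    {"title": "Special:OAuth/initiate", "oauth_callback": "oob"},
)
print(base)
print(hmac_sha1_signature(base, "consumer-secret"))
```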

mwreverts -- Revert detection
This library provides a set of utilities for detecting reverts (see [//pythonhosted.org/mwreverts/detection.html#mwreverts.Detector mwreverts.Detector] and [//pythonhosted.org/mwreverts/detection.html#mwreverts.detect mwreverts.detect]) and identifying the reverted status of edits to a MediaWiki wiki.
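The core idea behind identity revert detection can be sketched in a few lines: checksum each revision’s text, and when a revision’s checksum matches that of an earlier revision within a window of recent history, treat the revisions in between as reverted. `detect_identity_reverts`, its tuple output and the `radius=15` window are assumptions for illustration; mwreverts’ own `detect` and `Detector` return structured revert objects instead.

```python
import hashlib
from collections import deque


def text_checksum(text):
    """SHA-1 of the revision text; identical text gives an identical checksum."""
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def detect_identity_reverts(revisions, radius=15):
    """Yield (reverting, reverteds, reverted_to) for each identity revert.

    `revisions` is an iterable of (checksum, revision) pairs in
    chronological order; only the last `radius` revisions are searched.
    """
    history = deque(maxlen=radius)
    for checksum, revision in revisions:
        past = list(history)
        # Search backwards for the most recent revision with the same content.
        for i in range(len(past) - 1, -1, -1):
            if past[i][0] == checksum:
                reverteds = [rev for _, rev in past[i + 1:]]
                if reverteds:  # matching the previous revision is a null edit
                    yield revision, reverteds, past[i][1]
                break
        history.append((checksum, revision))


texts = ["Hello", "Hello vandal", "Hello", "Hello world"]
revs = [(text_checksum(t), rev_id) for rev_id, t in enumerate(texts, start=1)]
print(list(detect_identity_reverts(revs)))  # [(3, [2], 1)]
```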

See also m:R:Revert detection.

mwsessions -- Edit session processing
This library provides a set of utilities for grouping MediaWiki user actions into sessions. [//pythonhosted.org/mwsessions/sessionization.html#mwsessions.Sessionizer mwsessions.Sessionizer] and [//pythonhosted.org/mwsessions/sessionization.html#mwsessions.sessionize mwsessions.sessionize] can be used from Python scripts to group activities into sessions, or the command-line utilities can be used to operate directly on data files. Such methods have been used to measure editor labor hours.
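The basic sessionization rule can be sketched with an inactivity cutoff: a user’s actions belong to the same session until the gap between two consecutive actions exceeds the cutoff. The `sessionize` function here is a hypothetical stand-in and does not reproduce the signature of mwsessions.sessionize; the one-hour default cutoff is an assumption for illustration.

```python
from collections import defaultdict


def sessionize(events, cutoff=3600):
    """Group (user, timestamp_in_seconds) events into per-user sessions.

    A gap of more than `cutoff` seconds between a user's consecutive
    actions starts a new session. Returns a list of (user, [timestamps]).
    """
    by_user = defaultdict(list)
    for user, ts in events:
        by_user[user].append(ts)

    sessions = []
    for user, stamps in by_user.items():
        stamps.sort()
        current = [stamps[0]]
        for ts in stamps[1:]:
            if ts - current[-1] > cutoff:
                sessions.append((user, current))  # inactivity gap: close session
                current = [ts]
            else:
                current.append(ts)
        sessions.append((user, current))
    return sessions


events = [("A", 0), ("A", 600), ("B", 100), ("A", 10000)]
print(sessionize(events))  # [('A', [0, 600]), ('A', [10000]), ('B', [100])]
```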

See m:R:Activity session.

mwpersistence -- Content persistence processing
This library provides a set of utilities for measuring content persistence and tracking authorship in MediaWiki revisions.
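A rough sketch of token-level persistence tracking: split each revision into whitespace tokens, diff it against the previous revision with `difflib.SequenceMatcher`, and carry matched tokens forward while counting how many later revisions each has survived. This is not mwpersistence’s algorithm; the names and the whitespace tokenizer are assumptions, and this naive diff may treat moved text as a deletion plus a fresh insertion.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher


@dataclass
class Token:
    text: str
    origin: int         # id of the revision that added the token
    persisted: int = 0  # number of later revisions the token survived


def track_persistence(revisions):
    """Return the tokens of the final revision with authorship and persistence.

    `revisions` is an iterable of (rev_id, text) pairs in chronological order.
    """
    current = []
    for rev_id, text in revisions:
        new_texts = text.split()
        matcher = SequenceMatcher(None, [t.text for t in current], new_texts,
                                  autojunk=False)
        updated = []
        for op, i1, i2, j1, j2 in matcher.get_opcodes():
            if op == "equal":
                # Unchanged tokens keep their origin and survive one more revision.
                for tok in current[i1:i2]:
                    tok.persisted += 1
                    updated.append(tok)
            else:
                # Inserted or replacing tokens are attributed to this revision;
                # deleted tokens (empty j-range) are simply dropped.
                updated.extend(Token(t, rev_id) for t in new_texts[j1:j2])
        current = updated
    return current


for tok in track_persistence([(1, "a b c"), (2, "a b d"), (3, "a b d e")]):
    print(tok.text, tok.origin, tok.persisted)
```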

See also m:R:Content persistence.

mwparserfromhell -- Easy-to-use parser for wikitext
This library provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.
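Hand-rolled regular expressions are the usual alternative to a real parser, and the sketch below shows where they break down. With mwparserfromhell, one would instead call something like `mwparserfromhell.parse(text).filter_wikilinks()` and get link nodes even when they are nested. The regex-based `wikilinks` helper is hypothetical and deliberately naive: in the second example it loses the image link entirely because another link is nested inside its caption.

```python
import re

# [[target]] or [[target|label]]; cannot cope with nested brackets.
WIKILINK = re.compile(r"\[\[([^\[\]|]+)(?:\|([^\[\]]*))?\]\]")


def wikilinks(text):
    """Return (target, label) pairs for simple, non-nested wikilinks."""
    return [
        (m.group(1).strip(), (m.group(2) or m.group(1)).strip())
        for m in WIKILINK.finditer(text)
    ]


print(wikilinks("See [[Python (programming language)|Python]] and [[Guido van Rossum]]."))
# [('Python (programming language)', 'Python'), ('Guido van Rossum', 'Guido van Rossum')]
print(wikilinks("[[File:X.png|thumb|See [[Y]]]]"))
# [('Y', 'Y')]  (the File: link itself is missed)
```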