Mediawiki-utilities/List

mwxml -- XML dump processing

 * [//pythonhosted.org/mwxml docs] • [//github.com/mediawiki-utilities/python-mwxml source]

This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. It is designed to address two concerns with streaming XML parsing: complexity and performance.


 * Complexity
 * Streaming XML parsing is gross. XML dumps consist of (1) some site metadata and (2) a collection of pages, each of which contains (3) a collection of revisions. The module allows you to think about dump files in this way and ignore the fact that you’re streaming XML. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Dump mwxml.Dump] contains a [//pythonhosted.org/mwxml/iteration.html#mwxml.SiteInfo mwxml.SiteInfo] and an iterator of [//pythonhosted.org/mwxml/iteration.html#mwxml.Page mwxml.Page] objects. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Page mwxml.Page] contains page metadata and an iterator of [//pythonhosted.org/mwxml/iteration.html#mwxml.Revision mwxml.Revision] objects. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Revision mwxml.Revision] contains revision metadata and text.


 * Performance
 * Performance is a serious concern when processing large database XML dumps. Regrettably, Python’s Global Interpreter Lock prevents threads from executing Python code on multiple CPUs at once. This library provides [//pythonhosted.org/mwxml/map.html mwxml.map], a function that maps a dump-processing function over a set of dump files, using multiprocessing to distribute the work over multiple CPUs.
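To make the Dump, Page and Revision nesting described above concrete, here is a minimal stdlib-only sketch of the underlying idea: stream the XML with `xml.etree.ElementTree.iterparse` and hand back one page at a time, discarding each page once it has been consumed. It is not mwxml’s implementation or API. The `Page`/`Revision` tuples and `iter_pages` are hypothetical names, the sketch ignores `<siteinfo>`, and unlike mwxml it buffers one page’s revisions in a list instead of exposing them as an iterator.

```python
import io
import xml.etree.ElementTree as ET
from typing import Iterator, List, NamedTuple


class Revision(NamedTuple):  # hypothetical stand-in for a revision record
    id: int
    text: str


class Page(NamedTuple):  # hypothetical stand-in for a page record
    id: int
    title: str
    revisions: List[Revision]


def _local(tag: str) -> str:
    """Strip an XML namespace such as '{http://...}page' down to 'page'."""
    return tag.rsplit("}", 1)[-1]


def _to_page(elem: ET.Element) -> Page:
    """Copy the fields we need out of a fully parsed <page> element."""
    page_id, title, revisions = 0, "", []
    for child in elem:
        name = _local(child.tag)
        if name == "id":
            page_id = int(child.text)
        elif name == "title":
            title = child.text or ""
        elif name == "revision":
            rev_id, text = 0, ""
            for field in child:  # only direct children, so <contributor><id> is skipped
                if _local(field.tag) == "id":
                    rev_id = int(field.text)
                elif _local(field.tag) == "text":
                    text = field.text or ""
            revisions.append(Revision(rev_id, text))
    return Page(page_id, title, revisions)


def iter_pages(source) -> Iterator[Page]:
    """Yield pages one at a time without building the whole document tree."""
    context = ET.iterparse(source, events=("start", "end"))
    _, root = next(context)  # first event: start of the root <mediawiki> element
    for event, elem in context:
        if event == "end" and _local(elem.tag) == "page":
            yield _to_page(elem)
            root.clear()  # drop finished pages so memory use stays flat


SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <siteinfo><sitename>Example</sitename></siteinfo>
  <page>
    <title>Foo</title><ns>0</ns><id>1</id>
    <revision><id>10</id><text>Hello</text></revision>
    <revision><id>11</id><text>Hello, world</text></revision>
  </page>
  <page>
    <title>Bar</title><ns>0</ns><id>2</id>
    <revision><id>20</id><text></text></revision>
  </page>
</mediawiki>"""

for page in iter_pages(io.BytesIO(SAMPLE)):
    for rev in page.revisions:
        print(page.title, rev.id, len(rev.text))
```

Clearing the root after each page is what keeps memory bounded on multi-gigabyte dumps; the caller only ever sees plain Python values.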

See also [//dumps.wikimedia.org dumps.wikimedia.org], Special:Export, and Manual:DumpBackup.php.
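The idea behind mwxml.map (one dump file per worker process, so the GIL is not a bottleneck) can be sketched with the standard library’s `concurrent.futures.ProcessPoolExecutor`. This is a hedged illustration, not mwxml.map itself: `map_files` and the per-file `count_lines` job are hypothetical, `process` here receives only a file path (mwxml.map instead hands your function a parsed dump together with its path), and results come back in input order.

```python
import os
import tempfile
from concurrent.futures import ProcessPoolExecutor
from typing import Callable, Iterable, Iterator, List, Tuple, TypeVar

T = TypeVar("T")


def _run_one(process: Callable[[str], Iterable[T]], path: str) -> List[T]:
    # Runs inside a worker process; results are materialized so they can be pickled back.
    return list(process(path))


def map_files(process: Callable[[str], Iterable[T]], paths: List[str],
              processes: int = 0) -> Iterator[T]:
    """Apply `process` to each path, one file per task, and yield every result.

    processes=0 uses one worker per CPU; processes=1 (or a single path) runs
    in the current process, skipping process start-up and pickling costs.
    """
    workers = processes or os.cpu_count() or 1
    if workers == 1 or len(paths) <= 1:
        for path in paths:
            yield from process(path)
        return
    with ProcessPoolExecutor(max_workers=min(workers, len(paths))) as pool:
        # `process` must be a module-level function so it can be pickled to the workers.
        # pool.map yields each file's batch of results in input order.
        for batch in pool.map(_run_one, [process] * len(paths), paths):
            yield from batch


def count_lines(path: str) -> Iterator[Tuple[str, int]]:
    """Hypothetical per-file job: report how many lines a file has."""
    with open(path, encoding="utf-8") as f:
        yield os.path.basename(path), sum(1 for _ in f)


if __name__ == "__main__":
    # The __main__ guard is required on platforms that start workers by re-importing this module.
    with tempfile.TemporaryDirectory() as tmp:
        paths = []
        for name, n in [("a.xml", 3), ("b.xml", 5)]:
            path = os.path.join(tmp, name)
            with open(path, "w", encoding="utf-8") as f:
                f.write("line\n" * n)
            paths.append(path)
        for name, n_lines in map_files(count_lines, paths):
            print(name, n_lines)
```

Splitting work by file rather than by page keeps inter-process traffic small: each worker streams its own file and only sends back compact results.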

mwapi -- API querying and session management

 * [//pythonhosted.org/mwapi docs] • [//github.com/mediawiki-utilities/python-mwapi source]

This library provides a set of basic utilities for interacting with MediaWiki’s “action” API, usually available at /w/api.php. Its most salient feature is the [//pythonhosted.org/mwapi/session.html#mwapi.Session mwapi.Session] class, which maintains a connection session that keeps a user logged in and provides convenience methods for calling the MediaWiki API; see its get and post methods.
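Every call to the action API is ultimately an HTTP request to api.php whose query parameters select the module and its options. The sketch below builds such a GET URL with the standard library only; `build_api_url` is a hypothetical helper, not part of mwapi, which performs the request for you (for example `mwapi.Session("https://en.wikipedia.org", user_agent="...").get(action="query", prop="revisions", revids=32423425)`). Joining list values with `|` follows the action API’s convention for multi-valued parameters.

```python
from urllib.parse import urlencode


def build_api_url(host: str, api_path: str = "/w/api.php", **params) -> str:
    """Build a GET URL for MediaWiki's action API (hypothetical helper).

    format=json is added unless the caller overrides it, and list/tuple values
    are joined with '|', the separator the action API uses for multiple values.
    """
    query = {"format": "json", **params}
    flat = {key: "|".join(map(str, value)) if isinstance(value, (list, tuple)) else value
            for key, value in query.items()}
    return host.rstrip("/") + api_path + "?" + urlencode(flat)


print(build_api_url("https://en.wikipedia.org", action="query",
                    prop="revisions", titles=["Foo", "Bar"]))
```

`urlencode` percent-encodes the `|` separator, which the server decodes back before splitting the values.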


 * Authentication
 * [//pythonhosted.org/mwapi/session.html#mwapi.Session mwapi.Session] provides convenient login and logout methods.
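Staying logged in matters because MediaWiki tracks the login through cookies, which must be sent back on every later request. The sketch below lists the two-step `action=login` sequence the action API documents for bot passwords: fetch a login token, then POST it with the credentials. The helper names and the canned response are hypothetical and no network calls are made; mwapi’s own login method may use a different module (such as `action=clientlogin`), so treat this as an illustration of the protocol rather than of mwapi.

```python
from typing import Any, Dict


def login_token_request() -> Dict[str, str]:
    """Step 1: parameters that ask the API for a login token."""
    return {"action": "query", "meta": "tokens", "type": "login", "format": "json"}


def extract_login_token(response: Dict[str, Any]) -> str:
    """Pull the token out of the step-1 JSON response."""
    return response["query"]["tokens"]["logintoken"]


def login_request(username: str, password: str, token: str) -> Dict[str, str]:
    """Step 2: parameters to POST; they must be sent with the cookies from step 1."""
    return {"action": "login", "lgname": username, "lgpassword": password,
            "lgtoken": token, "format": "json"}


def login_succeeded(response: Dict[str, Any]) -> bool:
    """The login module reports result == "Success" when the credentials were accepted."""
    return response.get("login", {}).get("result") == "Success"


# Usage with a made-up step-1 response instead of a live server:
step1 = {"query": {"tokens": {"logintoken": "0123abcd+\\"}}}
token = extract_login_token(step1)
params = login_request("ExampleUser@mybot", "hypothetical-bot-password", token)
print(params["action"], params["lgname"])
```

A session object that stores cookies between the two requests (and afterwards) is exactly what a persistent login needs, which is why the Session class is the centre of the library.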

See also API and [//mediawiki.org/w/api.php w/api.php].

mwdb -- TODO
To do

mwtypes -- A basic type system for MediaWiki data

 * [//pythonhosted.org/mwtypes docs] • [//github.com/mediawiki-utilities/python-mwtypes source]

mwoauth -- OAuth connection handler for MediaWiki

 * [//pythonhosted.org/mwoauth docs] • [//github.com/mediawiki-utilities/python-mwoauth source]

mwsessions -- Edit session processing

 * [//github.com/mediawiki-utilities/python-mwsessions source]

mwmetrics -- A collection of statistics and measurements for MediaWiki

 * [//github.com/mediawiki-utilities/python-mwmetrics source]

mwparserfromhell -- Easy-to-use parser for wikitext

 * [//mwparserfromhell.readthedocs.org/en/latest/ docs] • [//github.com/earwig/mwparserfromhell source]

This library provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.

mwevents -- A generalized event extraction and processing framework

 * [//github.com/mediawiki-utilities/python-mwevents source]