Mediawiki-utilities/List

mwxml -- XML dump processing

 * -- [//pythonhosted.org/mwxml docs]

This library contains a collection of utilities for efficiently processing MediaWiki’s XML database dumps. There are two important concerns that this module intends to address: complexity and performance of streaming XML parsing.


 * Complexity
 * Streaming XML parsing is gross. XML dumps consist of (1) some site metadata and (2) a collection of pages that contain (3) collections of revisions. The module allows you to think about dump files in this way and ignore the fact that you’re streaming XML. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Dump mwxml.Dump] contains a [//pythonhosted.org/mwxml/iteration.html#mwxml.SiteInfo mwxml.SiteInfo] and an iterator of [//pythonhosted.org/mwxml/iteration.html#mwxml.Page mwxml.Page]s. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Page mwxml.Page] contains page metadata and an iterator of [//pythonhosted.org/mwxml/iteration.html#mwxml.Revision mwxml.Revision]s. A [//pythonhosted.org/mwxml/iteration.html#mwxml.Revision mwxml.Revision] contains revision metadata and text.
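The dump, page, and revision nesting described above can be sketched roughly as follows. This is a minimal illustration, not a reference implementation: it assumes mwxml is installed and that `mwxml.Dump.from_file`, `dump.site_info`, and `page.title` behave as in the linked docs. The names `revisions_per_page` and `print_summary` are hypothetical helpers introduced only for this example.

```python
from collections import Counter


def revisions_per_page(dump):
    """Count revisions per page title in a Dump-like object.

    Iterating `dump` is assumed to yield pages; each page is assumed to
    have a `title` and to yield its revisions when iterated.
    """
    counts = Counter()
    for page in dump:
        for _revision in page:
            counts[page.title] += 1
    return counts


def print_summary(path):
    """Hypothetical wrapper: open one XML dump file and print a summary."""
    import mwxml  # third-party: pip install mwxml

    with open(path) as f:
        dump = mwxml.Dump.from_file(f)
        print(dump.site_info.name, dump.site_info.dbname)
        for title, count in revisions_per_page(dump).items():
            print(title, count)
```

Calling `print_summary("dump.xml")` on an uncompressed dump file would print the site name and one line per page. The inner loops never touch XML elements directly, which is the point of the abstraction.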


 * Performance
 * Performance is a serious concern when processing large XML database dumps. Regrettably, Python’s Global Interpreter Lock prevents threads from running on multiple CPUs at once. This library provides [//pythonhosted.org/mwxml/map.html mwxml.map], a function that maps a dump-processing function over a set of dump files, using multiprocessing to distribute the work over multiple CPUs.
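A rough sketch of how mwxml.map might be used, following the pattern in the linked docs: the processing function receives a dump and its path and yields values, and mwxml.map collects those values from all files. It assumes mwxml is installed. The wrapper `print_page_info` is a hypothetical name used only here.

```python
def page_info(dump, path):
    """Yield (id, namespace, title) for every page in one dump file."""
    for page in dump:
        yield page.id, page.namespace, page.title


def print_page_info(paths):
    """Hypothetical wrapper: process several dump files in parallel."""
    import mwxml  # third-party: pip install mwxml

    # mwxml.map runs page_info on each path in a separate process and
    # yields the results as they become available.
    for page_id, namespace, title in mwxml.map(page_info, paths):
        print(page_id, namespace, title, sep="\t")
```

Because the work is split per file, the speedup depends on having several dump files of comparable size. The order of results across files should not be relied on.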

See also [//dumps.wikimedia.org dumps.wikimedia.org], Special:Export, and Manual:DumpBackup.php.

mwapi -- API querying and session management

 * -- [//pythonhosted.org/mwapi docs]

This library provides a set of basic utilities for interacting with MediaWiki’s “action” API – usually available at /w/api.php. The most salient feature of this library is the [//pythonhosted.org/mwapi/session.html#mwapi.Session mwapi.Session] class that provides a connection session that sustains a logged-in user status and provides convenience functions for calling the MediaWiki API. See get and post.


 * Authentication
 * [//pythonhosted.org/mwapi/session.html#mwapi.Session mwapi.Session] provides convenient login and logout methods.
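The session workflow might look roughly like the sketch below. It assumes mwapi is installed and that `Session.get`, `Session.login`, and `Session.logout` behave as in the linked docs. The host, the user agent string, and the names `current_user` and `demo` are illustrative assumptions, not part of the library.

```python
def current_user(session):
    """Return the user name the API reports for this session."""
    doc = session.get(action="query", meta="userinfo")
    return doc["query"]["userinfo"]["name"]


def demo(username, password):
    """Hypothetical walkthrough: query before and after logging in."""
    import mwapi  # third-party: pip install mwapi

    session = mwapi.Session(
        "https://en.wikipedia.org",  # assumed host, any MediaWiki site works
        user_agent="mwapi example (replace with your contact info)",
    )
    print(current_user(session))  # before logging in
    session.login(username, password)
    print(current_user(session))  # after logging in
    session.logout()
```

Keyword arguments to `get` are passed through as API parameters, so the same session object can serve any "action" API module.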

See also API and [//mediawiki.org/w/api.php w/api.php].

mwdb -- TODO
To do

mwparserfromhell -- Easy-to-use parser for wikitext

 * -- [//mwparserfromhell.readthedocs.org/en/latest/ docs]

This library provides an easy-to-use and outrageously powerful parser for MediaWiki wikicode.
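As a small, hedged illustration of what parsing wikicode looks like, the sketch below lists the templates in a snippet and strips its markup. It assumes mwparserfromhell is installed and uses its `parse`, `filter_templates`, and `strip_code` calls. The sample wikitext and the names `template_names` and `demo` are made up for this example.

```python
def template_names(wikicode):
    """Return the names of all templates found in parsed wikicode."""
    return [str(t.name).strip() for t in wikicode.filter_templates()]


def demo():
    """Hypothetical walkthrough on a made-up piece of wikitext."""
    import mwparserfromhell  # third-party: pip install mwparserfromhell

    text = "{{Infobox|name=Ada}} '''Ada''' wrote [[notes]]."
    wikicode = mwparserfromhell.parse(text)
    print(template_names(wikicode))  # template names as plain strings
    print(wikicode.strip_code())     # text with markup removed
```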

mwtypes -- A basic type system for MediaWiki data

 * -- [//pythonhosted.org/mwtypes docs]

mwoauth -- OAuth connection handler for MediaWiki

 * -- [//pythonhosted.org/mwoauth docs]