MediaWiki-utilities

MediaWiki-utilities is a collection of simple, sharp tools for extracting and processing MediaWiki data using Python. These libraries are inspired by the Unix philosophy: each library is designed to do one thing and do it well, and the libraries are designed to work together. Where applicable, they also include Unix-style command-line utilities that handle text streams, because that is a universal interface.


 * mwapi (source • docs) -- MediaWiki API
 * mwdb (source • docs) -- MediaWiki database
 * mwsql (source • docs) -- MediaWiki SQL dumps
 * mwxml (source • docs) -- MediaWiki XML dumps
 * mwreverts (source • docs) -- MediaWiki reverts
 * mwsessions (source • docs) -- MediaWiki activity sessions
 * mwdiffs (source • docs) -- MediaWiki revision diffs
 * mwoauth (source • docs) -- MediaWiki OAuth helpers
 * mwtypes (source • docs) -- MediaWiki datastructures
 * mwpersistence (source • docs) -- MediaWiki content persistence tracking
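The text-stream convention described above can be illustrated with a minimal sketch. It uses only the Python standard library and does not reproduce any utility shipped with the libraries listed here. The one-JSON-document-per-line format, the `len` field, and the names `filter_revisions` and `main` are all assumptions made for illustration.

```python
import json
import sys


def filter_revisions(lines, min_bytes=0):
    """Yield JSON lines whose (hypothetical) 'len' field is >= min_bytes.

    `lines` is any iterable of strings, e.g. sys.stdin, so the function
    can be chained with other line-oriented tools.
    """
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines in the stream
        doc = json.loads(line)
        # 'len' is an assumed field name, not a documented schema.
        if doc.get("len", 0) >= min_bytes:
            yield json.dumps(doc, sort_keys=True)


def main(min_bytes=0):
    """Read JSON lines from stdin and write the kept ones to stdout."""
    for out_line in filter_revisions(sys.stdin, min_bytes=min_bytes):
        sys.stdout.write(out_line + "\n")
```

Calling `main()` from a script lets the filter sit in a shell pipeline between other line-oriented tools, which is the design choice the Unix philosophy encourages: each stage reads text, does one job, and writes text.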

Related libraries

 * mwparserfromhell (source: github.com/earwig/mwparserfromhell • docs: mwparserfromhell.readthedocs.org/en/latest/) -- an easy-to-use and outrageously powerful parser for MediaWiki wikicode.
 * mwparserfromhtml (source • docs) -- a parser for Wikimedia Enterprise (Parsoid) HTML dumps inspired by mwparserfromhell.
 * mwedittypes (source • docs) -- a diff engine that generates structured details about changes between two revisions of wikitext.
 * mwtokenizer (source • docs) -- a tokenizer for splitting plain text into sentences and words that works for (almost) all languages in Wikipedia.
 * pywikibase (source: github.com/wikimedia/pywikibot-wikibase) -- a set of types for handling the Wikibase data model (item, property, claim, etc.)

Resources

 * Unofficial GitHub group: github.com/mediawiki-utilities