WP-MIRROR is a free utility for building mirrors of any desired set of Wikimedia Foundation (WMF) wikis.
Why Build a Mirror
The main use cases for a mirror are these:
- Development. If you are technically minded and need a mirror with which you may conduct experiments;
- Infrastructure. If you need redundancy, or need to serve pages locally to minimize telecommunications traffic;
- Offline browsing. If you need offline access, perhaps for reasons of mobility, availability, or privacy; and
- Research. If you need a mirror as a tool to assist your research on the contents of any given wiki.
WP-MIRROR builds a set of mirrors with the following properties:
- Appearance. A wiki page rendered by a mirror looks very similar to the same page rendered by the WMF servers;
- Behavior. A wiki page rendered by a mirror behaves almost the same as on the WMF servers (e.g. editing, search, user account creation, beta features); and
- Completeness. Each mirror is complete, with original-size images.
WP-MIRROR is easy:
- Easy to install. Distributed as a DEB package, available from a Debian package repository;
- Easy to configure. The user may select any desired set of wikis by editing just one line in a configuration file;
- Easy to use. Sets up virtual hosts such as https://simple.wikipedia.site/, https://simple.wiktionary.site/, and https://www.wikidata.site/, one for each wiki in the set, which the user may access with a web browser; and
- Robust. Stable even in the face of: corrupt dump files, corrupt media files, incomplete downloads, Internet access interruptions, and low disk space; and uses check-pointing to resume after process interruption.
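The check-pointing idea mentioned under "Robust" can be illustrated with a minimal Python sketch. This is not WP-MIRROR's actual implementation; the function names, the JSON checkpoint file, and the notion of named steps are all assumptions made for illustration. Each finished step is recorded on disk, so a run that is interrupted can be restarted and will skip the work already done.

```python
import json
import os
import tempfile


def load_checkpoint(path):
    """Return the set of finished step names, or an empty set if none is saved."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return set(json.load(f))
    except (FileNotFoundError, json.JSONDecodeError):
        # Missing or corrupt checkpoint: start from scratch rather than crash.
        return set()


def save_checkpoint(path, done):
    """Write the checkpoint to a temporary file, then rename it into place."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        json.dump(sorted(done), f)
    # os.replace swaps the file in one step, so a reader never sees a half-written file.
    os.replace(tmp, path)


def run_steps(steps, path):
    """Run (name, function) pairs in order, skipping steps already recorded."""
    done = load_checkpoint(path)
    for name, func in steps:
        if name in done:
            continue  # finished in an earlier run
        func()
        done.add(name)
        save_checkpoint(path, done)
    return done
```

If a step raises an exception or the process is killed, the steps completed before it remain recorded, and calling run_steps again with the same path resumes from the first unfinished step.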
WP-MIRROR automatically configures other software:
- Apache2. Enables the URL rewrite module, and enables virtual hosts;
- Cron. Sets up a cron job that updates the mirrors weekly;
- MediaWiki. Configures MediaWiki 1.24 and several dozen extensions; and
- MySQL. Configures MySQL to achieve an order-of-magnitude improvement in database performance.
WP-MIRROR is free:
- Free software. Software is released under the GNU General Public License (GPLv3); and
- Free documentation. Documentation is released under the GNU Free Documentation License, version 1.3.
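By default, WP-MIRROR builds the following set of mirrors:
- Simple English Wikipedia (simple.wikipedia);
- Simple English Wiktionary (simple.wiktionary); and
- Wikidata (www.wikidata).
Here Simple English means shorter sentences, and Wikidata is a centralized collection of facts usable by all other wikis (e.g. to populate infoboxes).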
The default works out-of-the-box with no user configuration. It should build in about 200 ks (two days), occupy 150 GB of disk space, and update automatically every week. The mirrors are served locally by the virtual hosts https://simple.wikipedia.site/, https://simple.wiktionary.site/, and https://www.wikidata.site/.
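The virtual host names quoted above follow a visible pattern: each looks like the wiki's WMF host name with the .org suffix replaced by .site. The Python sketch below encodes that pattern. The rule is inferred from the examples in this text, not taken from WP-MIRROR itself, and the function name mirror_url is hypothetical.

```python
def mirror_url(wmf_host):
    """Map a WMF host name such as 'simple.wikipedia.org' to its local mirror URL.

    Assumption: the mirror host is the WMF host with '.org' replaced by '.site',
    as suggested by the example URLs in the text.
    """
    suffix = ".org"
    if not wmf_host.endswith(suffix):
        raise ValueError(f"expected a host name ending in {suffix!r}: {wmf_host!r}")
    return "https://" + wmf_host[: -len(suffix)] + ".site/"


# The three wikis of the default set.
DEFAULT_HOSTS = ["simple.wikipedia.org", "simple.wiktionary.org", "www.wikidata.org"]

for host in DEFAULT_HOSTS:
    print(mirror_url(host))
```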
The default should be suitable for anyone who learned English as a second language (ESL).
The largest wikis (with over a million articles each) are: en, sv, de, nl, fr, war, ru, it, ceb, es, vi, and pl. Because WP-MIRROR uses original-size media files, these wikis are too large to fit on a laptop with a single 500 GB disk unless the user chooses not to mirror the images (this is configurable). The en wikipedia is the most demanding case: it should build in about 1 Ms (twelve days), occupy 3 TB of disk space, be served locally by the virtual host https://en.wikipedia.site/, and update automatically every month.
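The build times above are quoted in kiloseconds and megaseconds, so a short conversion may help when planning. The Python sketch below converts the quoted figures to days and compares the quoted sizes with a 500 GB disk. It reads the text's G and T as decimal gigabytes and terabytes, which is an assumption, and the simple size comparison ignores the working space a build may need.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def seconds_to_days(seconds):
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def fits_on_disk(mirror_gb, disk_gb):
    """Rough check: does the finished mirror fit on the disk at all?"""
    return mirror_gb <= disk_gb


# (build time in seconds, size in GB), as quoted in the text.
BUILDS = {
    "default set": (200e3, 150),   # 200 ks, 150 GB
    "en wikipedia": (1e6, 3000),   # 1 Ms, 3 TB
}

for name, (seconds, size_gb) in BUILDS.items():
    days = seconds_to_days(seconds)
    print(f"{name}: {days:.1f} days, fits on a 500 GB disk: {fits_on_disk(size_gb, 500)}")
```

In exact terms, 200 ks is about 2.3 days and 1 Ms is about 11.6 days, which the text rounds to two and twelve days.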
WP-MIRROR 0.7.4 is known to install out-of-the-box on the following platforms:
- Debian 7.4 (wheezy) with backports. Tested both on a host, and on a virtual machine.
- Ubuntu 14.04 (trusty). Tested on a virtual machine.
The author is reachable by e-mail at user name wpmirrordev, domain name gmail dot com.