Wp-mirror

WP-MIRROR is a free utility for building mirrors of any desired set of Wikimedia Foundation wikis.

Which Wikis
The Wikimedia Foundation (WMF) offers wikipedias in nearly 300 languages. The WMF also runs several other projects (e.g. wikibooks and wiktionary), for a total of around 1000 wikis.

WP-MIRROR can build mirrors of any desired set of these wikis.

Why Build a Mirror
The main use cases for a mirror are these:


 * Development. If you are technically minded and need a mirror with which you may conduct experiments;
 * Infrastructure. If you need redundancy, or need to serve pages locally to minimize telecommunications traffic;
 * Offline browsing. If you need offline access, perhaps for reasons of mobility, availability, or privacy; and
 * Research. If you need a mirror as a tool to assist your research on the contents of any given wiki.

Key Features
WP-MIRROR builds a set of mirrors:


 * Appearance. A wiki page rendered by a mirror looks very similar to the same page rendered by the WMF servers;
 * Behavior. A wiki page rendered by a mirror behaves almost the same as on the WMF servers (e.g. editing, search, user account creation, and beta features); and
 * Completeness. A mirror is complete and includes original-size images.

WP-MIRROR is easy:


 * Easy to install. Available as a DEB package from a Debian package repository;
 * Easy to configure. The user may select any desired set of wikis by editing just one line in a configuration file;
 * Easy to use. Sets up virtual hosts such as http://simple.wikipedia.site/, http://simple.wiktionary.site/, and http://www.wikidata.site/, one for each wiki in the set, which the user may access with a web browser; and
 * Robust. Stable even in the face of corrupt dump files, corrupt media files, incomplete downloads, Internet access interruptions, and low disk space; uses checkpointing to resume after process interruption.

WP-MIRROR automatically configures other software:


 * Apache2. Enables the URL rewrite module, and enables virtual hosts;
 * Cron. Sets up a cron job that updates the mirrors weekly;
 * MediaWiki. Configures MediaWiki 1.24 and several dozen extensions; and
 * MySQL. Configures MySQL to achieve an order-of-magnitude improvement in database performance.

WP-MIRROR is free:


 * Free software. Software is released under the GNU General Public License (GPLv3); and
 * Free documentation. Documentation is released under the GNU Free Documentation License, version 1.3.

Out-of-the-box Experience
By default, WP-MIRROR builds the following set of mirrors:


 * simple wikipedia,
 * simple wiktionary, and
 * wikidata wiki,

where "simple" refers to Simple English, which uses shorter sentences, and Wikidata is a centralized collection of facts usable by all other wikis (e.g. to populate infoboxes).

The default works out of the box with no user configuration. It should build in about 200ks (roughly two days), occupy about 150G of disk space, be served locally by the virtual hosts http://simple.wikipedia.site/, http://simple.wiktionary.site/, and http://www.wikidata.site/, and update automatically every week.
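The virtual host names above appear to follow a simple pattern: the WMF host name with its ".org" ending replaced by ".site". The following Python sketch illustrates that pattern only. The function name mirror_url is hypothetical, and the rule is inferred from the examples in this article, not taken from WP-MIRROR itself.

```python
from urllib.parse import urlsplit


def mirror_url(wmf_url: str) -> str:
    """Map a WMF wiki URL to the local virtual-host URL it would mirror to.

    Assumption: the ".org" -> ".site" rule is inferred from the example
    host names in the article; it is not WP-MIRROR's documented behavior.
    """
    host = urlsplit(wmf_url).hostname
    if host is None or not host.endswith(".org"):
        raise ValueError(f"not a recognizable WMF wiki URL: {wmf_url!r}")
    # Drop the trailing ".org" and append ".site" instead.
    return f"http://{host[:-len('.org')]}.site/"


if __name__ == "__main__":
    for url in ("https://simple.wikipedia.org/",
                "https://simple.wiktionary.org/",
                "https://www.wikidata.org/"):
        print(url, "->", mirror_url(url))
```

Under this assumption, https://simple.wikipedia.org/ maps to http://simple.wikipedia.site/, which matches the first virtual host listed above.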

The default should be suitable for anyone who learned English as a second language (ESL).

Top Ten Wikipedias
The top ten wikipedias are: [//en.wikipedia.org/ en], [//de.wikipedia.org/ de], [//nl.wikipedia.org/ nl], [//fr.wikipedia.org/ fr], [//it.wikipedia.org/ it], [//ru.wikipedia.org/ ru], [//es.wikipedia.org/ es], [//sv.wikipedia.org/ sv], [//pl.wikipedia.org/ pl], and [//ja.wikipedia.org/ ja]. Because WP-MIRROR uses original-size media files, the top ten are too large to fit on a laptop with a single 500G disk unless the user chooses to omit the images (this is configurable). The [//en.wikipedia.org/ en wikipedia] is the most demanding case. It should build in about 1Ms (roughly twelve days), occupy about 3T of disk space, be served locally by the virtual host http://en.wikipedia.site/, and update automatically every month.
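The build times in this article are given in kiloseconds and megaseconds, so a short conversion may help. The Python sketch below converts the quoted figures to days and checks the quoted sizes against a 500G disk. The helper names are hypothetical, and "G" and "T" are assumed to mean decimal gigabytes and terabytes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds

# Assumption: "G" and "T" in the article are decimal units.
GB = 10**9
TB = 10**12


def seconds_to_days(seconds: float) -> float:
    """Convert a duration in seconds to days."""
    return seconds / SECONDS_PER_DAY


def fits_on_disk(mirror_bytes: int, disk_bytes: int) -> bool:
    """Return True if a mirror of the given size fits on the disk."""
    return mirror_bytes <= disk_bytes


if __name__ == "__main__":
    # Figures quoted in the article: 200 ks and 150G for the default set,
    # 1 Ms and 3T for the en wikipedia.
    print(f"default build: {seconds_to_days(200e3):.1f} days")
    print(f"en wikipedia build: {seconds_to_days(1e6):.1f} days")
    print("default fits on 500G:", fits_on_disk(150 * GB, 500 * GB))
    print("en wikipedia fits on 500G:", fits_on_disk(3 * TB, 500 * GB))
```

The conversion gives about 2.3 days for 200ks and about 11.6 days for 1Ms, consistent with the rounded "two days" and "twelve days" in the text.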

Platforms
WP-MIRROR 0.7.4 is known to install out-of-the-box on the following platforms:


 * Debian 7.4 (wheezy) with backports. Tested both on a host and on a virtual machine.
 * Ubuntu 14.04 (trusty). Tested on a virtual machine.

Contact
The author is reachable by e-mail at user name wpmirrordev, domain name gmail dot com.

Other Websites

 * Home page of the WP-MIRROR project