Manual:Backing up a wiki

From MediaWiki.org

It is important to make regular backups of the data in your wiki. This page provides an overview of the backup process for a typical MediaWiki wiki; you will probably want to devise your own backup scripts or schedule to suit the size of your wiki and your individual needs.

Overview

MediaWiki stores important data in two places:

Database
Pages and their contents, users and their preferences, metadata, search index, etc.
File system
Software configuration files, custom skins, extensions, images (including deleted images), etc.

Consider making the wiki read-only before creating the backup (see Manual:$wgReadOnly). This helps keep all parts of your backup consistent, although some installed extensions may write data nonetheless.

Database

For a very well written tutorial, see Siteground: MySQL Export: How to backup phpMyAdmin database

Most of the critical data in the wiki is stored in the database, which is typically straightforward to back up. When using the MySQL backend (default), various utilities are available to assist with "dumping" the database into a file, that is, generating a script file which can be used to recreate the database and all data in it from scratch if needed.

For example, the MySQL dump tool is a command-line application which can produce a dump file given the name of the database(s) to back up. Behaviour can be altered using standard parameters which will customise the output file format, for example, setting the character encoding.


A sample command that you may run from a crontab may look like this:

nice -n 19 /usr/bin/mysqldump -u $USER --password=$PASSWORD $DATABASE -c | nice -n 19 /bin/gzip -9 > ~/backup/wiki-$DATABASE-$(date '+%Y%m%d').sql.gz

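Replace $USER, $PASSWORD and $DATABASE with valid values. This writes a backup file with the date (YYYYMMDD) in the filename, so every run on a new day creates a new file; with date '+%a' instead, the filename contains the weekday and you get a rolling set of seven backups. When the command is placed directly in a crontab entry, each % must be escaped as \%, because cron otherwise treats it as a newline. If you want to save the files and extensions as well, you might want to use this one.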
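
As a rough sketch, the sample command could be wrapped in a small script so that the file name and the dump are built in one place. The function names backup_name and dump_wiki and the default ~/backup directory are illustrative assumptions, not part of MediaWiki or MySQL; only mysqldump, gzip, nice and date are real tools.

```shell
#!/bin/sh
# Hypothetical helper: print the dump file name for a database.
# $2 is a date(1) format; the default %Y%m%d gives one file per day,
# while %a (weekday) gives a rolling set of seven files.
backup_name() {
    printf 'wiki-%s-%s.sql.gz\n' "$1" "$(date "+${2:-%Y%m%d}")"
}

# Hypothetical helper: dump one database at low priority and compress it.
# Arguments: database, user, password, target directory (default ~/backup).
# Note: plain sh reports only gzip's exit status, so a failed mysqldump
# can go unnoticed; check the size of the resulting file.
dump_wiki() {
    target_dir=${4:-$HOME/backup}
    mkdir -p "$target_dir" || return 1
    nice -n 19 mysqldump -u "$2" --password="$3" -c "$1" \
        | nice -n 19 gzip -9 > "$target_dir/$(backup_name "$1")"
}
```

Calling dump_wiki wikidb wikiuser secret would then write ~/backup/wiki-wikidb-YYYYMMDD.sql.gz, where wikidb, wikiuser and secret stand for your own values.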

Tables

On close examination, some of the dumped tables turn out to be temporary to varying degrees. To save disk space (beyond just gzipping), the data in those tables can be left out of the dump, although the tables themselves must still be present in a proper dump. However, under certain circumstances the disadvantage of having to rebuild all this data may outweigh the saving in disk space (for example, on a large wiki where restoration speed is paramount).

See a mailing list thread about the topic.
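
One way to act on this, sketched below, is to dump the structure of all tables first and then the rows of everything except the cache-like tables. The helper names are made up for this example, and which tables are safe to skip depends on your MediaWiki version; objectcache and querycache in the usage line are an assumption, not a recommendation from this page. The mysqldump options --no-data, --no-create-info and --ignore-table are standard.

```shell
#!/bin/sh
# Hypothetical helper: print one --ignore-table option per table name.
# Usage: ignore_args DATABASE TABLE...
ignore_args() {
    db=$1
    shift
    for table in "$@"; do
        printf '%s ' "--ignore-table=$db.$table"
    done
}

# Hypothetical helper: write schema.sql (all tables, no rows) and
# data.sql (rows only, skipping the listed tables) to the current directory.
# Usage: dump_without_caches DATABASE USER TABLE...
dump_without_caches() {
    db_name=$1
    db_user=$2
    shift 2
    mysqldump -u "$db_user" -p --no-data "$db_name" > schema.sql || return 1
    # The unquoted substitution is deliberate: each option must become
    # its own argument (table names must not contain spaces).
    mysqldump -u "$db_user" -p --no-create-info \
        $(ignore_args "$db_name" "$@") "$db_name" > data.sql
}
```

A call such as dump_without_caches wikidb wikiuser objectcache querycache would produce both files; to restore, import schema.sql first and then data.sql.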

Character set

Warning: In some common configurations of MySQL 4.1 and later, mysqldump can corrupt MediaWiki's stored text. If your database's character set is set to "latin1" rather than "UTF-8", mysqldump in 4.1+ will apply a character set conversion step which can corrupt text containing non-English characters as well as punctuation like "smart quotes" and long dashes used in English text.

You can see which character set your tables are using with a mysql statement like SHOW CREATE TABLE text; (including the semicolon). The last line will include a DEFAULT CHARSET clause.

If the last line does not include a DEFAULT CHARSET clause, there is another way, provided you know that nobody has changed the character set of the database server since it was installed and that the wiki's database was created using the server's default character set. The STATUS command displays the database server's default character set next to Server characterset:. Here is an example output:

mysql> status
- - - - - - - - -
mysql  Ver 12.22 Distrib 4.0.20a, for Win95/Win98 (i32)

Connection id:          13601
Current database:
Current user:           root@localhost
SSL:                    Not in use
Server version:         4.0.20a-nt
Protocol version:       10
Connection:             localhost via TCP/IP
Client characterset:    latin1
Server characterset:    latin1
TCP port:               3306
Uptime:                 27 days 4 hours 58 min 26 sec
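
If you want to check the character set from a script, the relevant value can be picked out of either output with sed. This is only a sketch; the function names are made up for this example.

```shell
#!/bin/sh
# Read the output of the mysql STATUS command on stdin and print the
# value shown next to "Server characterset:".
server_charset() {
    sed -n 's/^Server characterset:[[:space:]]*//p'
}

# Read the output of SHOW CREATE TABLE on stdin and print the value of
# the DEFAULT CHARSET clause, if there is one.
table_charset() {
    sed -n 's/.*DEFAULT CHARSET=\([A-Za-z0-9_]*\).*/\1/p'
}
```

For instance, piping the example STATUS output above into server_charset prints latin1. Prints nothing if the expected line or clause is missing.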

Use the option --default-character-set=latin1 on the mysqldump command line to avoid the conversion if you find it set to "latin1".

Like this:

/usr/bin/nice -n 19 /usr/bin/mysqldump -u $USER -p$PASSWORD --default-character-set=$CHARSET $DATABASE -c | /usr/bin/nice -n 19 /bin/gzip -9 > ~/backup/wiki-sql-$(date '+%a').sql.gz

You can also try --default-character-set=binary. “Convert latin1 to UTF-8 in MySQL” on Gentoo Linux Wiki has more information.

Latin-1 to UTF-8 conversion

Note: In the following I deliberately use different input and output file names for the commands using sed, because the -i (in-place) option of sed causes problems on very big dumps. The described procedure has been used several times and worked reliably each time. Section contributed and updated by --Wikinaut 15:36, 1 February 2010 (UTC)


Upgrading a rather old MediaWiki installation from Latin-1 to UTF-8 can be tricky, depending on your operating system and MySQL settings. In my example, the upgrade was from MediaWiki 1.5 (2004) to 1.15.1 (2009). Perform the following steps, taken from the article Convert a MySQL DB from latin1 to UTF8 and adapted to MediaWiki specifics (DBNAME is the name of your wiki database):

mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset DBNAME > DBNAME.sql

Then use sed to change character settings latin1 to utf8:

sed  -e 's/character set latin1 collate latin1_bin/character set utf8 collate utf8_bin/gi' -e 's/CHARSET=latin1/CHARSET=utf8/g' DBNAME.sql > DBNAME2.sql
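
Before running the substitution on a large dump, it can be tried on a sample line. The sketch below wraps the same two expressions in a filter (the name to_utf8 is made up here) and feeds it a typical table-options line; the i flag for case-insensitive matching assumes GNU sed.

```shell
#!/bin/sh
# Same two substitutions as above, reading stdin and writing stdout.
to_utf8() {
    sed -e 's/character set latin1 collate latin1_bin/character set utf8 collate utf8_bin/gi' \
        -e 's/CHARSET=latin1/CHARSET=utf8/g'
}

# Try it on a sample line as it might appear in a dump.
printf '%s\n' 'ENGINE=MyISAM DEFAULT CHARSET=latin1;' | to_utf8
# prints: ENGINE=MyISAM DEFAULT CHARSET=utf8;
```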

Every character in MySQL's utf8 character set needs up to 3 bytes, so the cl_sortkey column and its key must be shortened so that the key fits in 1000 bytes (MyISAM, MySQL 5, with server_character_set=utf8). A further import problem with the math table (ERROR line 389: Duplicate entry '' for key 1 when trying to import the mysqldump) was solved by simply deleting the content of that table; the table is needed for caching purposes only and need not be imported when upgrading. Both changes are made with the following command:

sed -e 's/`cl_sortkey` varchar(255)/`cl_sortkey` varchar(70)/gi' -e 's/`cl_sortkey`(128)/`cl_sortkey`(70)/gi' -e '/^INSERT INTO `math/d' DBNAME2.sql > DBNAME3.sql

I then created a new database DBNEW and imported the dump file:

mysql -p -e "create database DBNEW"
mysql -p --default-character-set=utf8 DBNEW < DBNAME3.sql

Now start a fresh MediaWiki installation and use your new wiki database name DBNEW (actually the UTF-8 converted copy of your untouched old DBNAME wiki); the database copy will be automatically upgraded to the current MediaWiki database schema. Several successful conversions from MediaWiki 1.5 to MediaWiki 1.15.1 under PHP 5.2.12 (apache2handler) and MySQL 4.1.13 have been made.

Latin-1 to UTF-8 conversion under Windows

--LouisCyphre 10:38, 1 February 2010 (UTC)

  1. Dump your DB as usual.
  2. Convert your DB using Chargo.
  3. Replace all occurrences of latin1 with utf8 inside the dump.
  4. Import the dump into a new DB or overwrite the old one.
  5. Done.

Tested under Windows XP. MediaWiki 1.13.2 dumped under EasyPHP 1.8.0.1. Converted with Chargo 0.8.1. Imported to XAMPP 1.7.3. Updated to MediaWiki 1.15.1.

File system

MediaWiki stores other components of the wiki in the file system where this is more appropriate than insertion into the database, for example, site configuration files (LocalSettings.php, AdminSettings.php), image files (including deleted images, thumbnails and rendered math and SVG images, if applicable), skin customisations, extension files, etc.

The best method to back these up is to place them into an archive file, such as a .tar file, which can then be compressed if desired. On Windows, applications such as WinZip or 7-zip can be used if preferred.
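
A minimal sketch of the archive step on a Unix-like system, assuming the wiki lives in a single directory. The function name archive_wiki and the paths in the usage line are placeholders.

```shell
#!/bin/sh
# Hypothetical helper: pack a wiki directory into a gzip-compressed tar file.
# Usage: archive_wiki WIKI_DIR OUTPUT_FILE
archive_wiki() {
    # -C switches to the parent directory first, so the archive holds
    # relative paths such as wiki/LocalSettings.php.
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}
```

For example, archive_wiki /var/www/wiki ~/backup/wiki-files.tar.gz; keep the output file outside the directory being archived.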

It should be possible to back up the entire "wiki" folder in "htdocs" if using XAMPP.

XML dump

It is also a good idea to create an XML dump in addition to the database dump. XML dumps contain the content of the wiki (wiki pages with all their revisions), without the site-related data (they do not contain user accounts, image metadata, logs, etc.). XML dumps are independent of the database structure, and can be imported into future (and even past) versions of MediaWiki. They are also less likely to cause problems with character encoding, and can readily be processed by third-party tools, which makes them a good fallback should your main database dump become unusable, and also a means of redistributing content en masse.

To create an XML dump, use the command-line tool dumpBackup.php, located in the maintenance directory of your MediaWiki installation. Run the command as php dumpBackup.php without any arguments to display a brief description of the syntax. You need to specify whether you want a full dump of the complete history of every page, or just the current contents of each page.
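
For example, the two kinds of dump could be produced as sketched below. The wrapper name xml_dump and the paths are hypothetical; --full and --current are the dumpBackup.php options for the complete history and the current revisions respectively.

```shell
#!/bin/sh
# Hypothetical wrapper around dumpBackup.php.
# Usage: xml_dump WIKI_DIR full|current OUTPUT_FILE
xml_dump() {
    case $2 in
        full|current) ;;
        *) echo "mode must be 'full' or 'current'" >&2; return 2 ;;
    esac
    # Compress on the fly; XML dumps of large wikis can be very big.
    php "$1/maintenance/dumpBackup.php" "--$2" | gzip -9 > "$3"
}
```

A call such as xml_dump /var/www/wiki full ~/backup/wiki-pages.xml.gz would write a compressed full-history dump.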

If an attempt to use dumpBackup.php fails, see if creating an AdminSettings.php file solves the problem.

You can also create an XML dump for a specific set of pages online, using Special:Export, although attempting to dump large quantities of pages through this interface will usually time out.

To import an XML dump into a wiki, use the command-line tool importDump.php. For a small set of pages, you can also use the Special:Import page via your browser (by default, this is restricted to the sysop group). As an alternative to dumpBackup.php and importDump.php, you can use MWDumper, which is faster, but requires a Java runtime environment. See Manual:Importing XML dumps for more information.

Scripts

Warning: Use these at your own risk.
