Manual talk:Backing up a wiki

=2007=

First Note
Note: Some of the information in this page was adapted from Manual:Moving a wiki. robchurch | talk 20:59, 11 April 2007 (UTC)

Information needed on how to verify and restore a backup
Great overview, but so much more needed:

- how to verify our backup or export

- how to restore or import our data again

And we need specific, unambiguous, differentiating definitions of these words. Let me try:

"backup / restore" file copy, done from outside the program using the operating system utilities

"export / import" data copy, done inside the program using MediaWiki program interface utilities

... or something like that. Then we can clearly write specific steps (the "by doing what") for each way.

I agree that backing up everything is probably best, but how does someone KNOW what's been customized and belongs to them, and what's standard and can be replaced from a fresh reinstall of the master software? Does any restore or reinstall routine intelligently preserve existing data? Or do these processes merely overwrite anything in their way? So, if I back up today, then crash tomorrow, and then restore yesterday's backup, and something is missing or it doesn't function properly, what should I do? I'd probably then reinstall from scratch to rebuild an empty MediaWiki structure. Then I'd try restoring from my backup again to fill in the supposedly completely rebuilt but empty structure. What if even that fails? What gets clobbered? How do I "know"?

Should I reinstall an empty MediaWiki and try to "import" data? What if I don't have a data "export", but I only have a file "backup" copy? How do I then reconfigure my customized choices? Do all my users come back with a restore or import?

So, it probably makes sense to preserve a mirror image offline and copy everything, or one file at a time, from there during troubleshooting if something's not working in the main system. My MediaWiki is small at the moment - ~500,000 words in ~4,500 sections, ALL files = ~120 MB, so making multiple copies to CD daily is acceptable; even ~5 backups will fit on one CD, for a total materials cost per year of ~$60 or less.

I suggest people try renaming directory structures to pull their main MediaWiki off line and TRY restoring their backup to a fresh directory structure to verify if their chosen backup routine works or not. If not, figure out why not before trusting our backups!

-- Peter Blaise peterblaise 11:21, 19 April 2007 (UTC)
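
A note on the "restore to a fresh directory and verify" suggestion above: one way to check the file part of a backup is to compare a checksum manifest of the live tree with one of the restored tree. The following is only a sketch under assumptions. The function names and the two directory arguments are made up for illustration, and it covers files only, not the database.

```shell
#!/bin/sh
# Print a sorted "checksum size ./relative/path" line for every file under $1.
# Uses the POSIX cksum utility; sha256sum could be swapped in where available.
manifest() {
    ( cd "$1" && find . -type f -exec cksum {} + | sort -k 3 )
}

# Succeed only if both trees contain the same files with the same contents.
# $1 and $2 are hypothetical paths, e.g. the live wiki and a test restore.
same_tree() {
    a=$(manifest "$1") || return 2
    b=$(manifest "$2") || return 2
    [ "$a" = "$b" ]
}

# Example (paths are placeholders):
#   same_tree /var/www/wiki /tmp/wiki-restored && echo "file backup matches"
```

A mismatch only says that something differs. Running `manifest` on each tree and comparing the two listings with `diff` would show which files.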


 * I have moved the paragraph you added for now, as it's not quite correct:
 * See also Verifying a wiki backup and Restoring a wiki from a backup and Combining wikis from multiple backups. Note also that this page addresses backing up DATA only.  Your MediaWiki installation also probably includes much custom configuration in the form of changes to various supporting files that are as yet NOT incorporated into any database table, including CSS (cascading style sheet) files, PHP script files and JS (JavaScript) files.  You must back up and restore / re-integrate these as separate steps.  (Does someone want to write an extension to import and export all support files into supplemental database tables so everything is all in one place?)


 * The statement that the page is about data backup only is plain wrong - there's an extra section about backing up files. It could probably be more detailed, though. The only files that are "custom" by default are LocalSettings.php and, of course, the uploaded files in the images directory.
 * Also, while red links are generally a good thing, they should "invite" people to write pages that actually make sense. First of all, those pages should be in the Manual namespace. "Verifying a wiki backup" isn't really possible, or rather, it's the same as Restoring a wiki from a backup (which is a page we should probably write soon). And Combining wikis from multiple backups probably doesn't make sense as a page of its own, and should be addressed in Restoring a wiki from a backup which should also explain what is overwritten when, or not.
 * There's also Manual:Moving a wiki, with which the contents of future pages should be coordinated.
 * Sticking everything into the database isn't really possible - the configuration must be outside, because it has to tell MediaWiki how to access the database in the first place. Customized JS and CSS can go into the database (as MediaWiki:Common.js and MediaWiki:Common.css respectively), and that's the preferred way. Skins that need custom PHP code cannot reside in the database; program code needs to be in files (for technical as well as security reasons). Same for extensions. -- Duesentrieb ⇌ 11:32, 19 April 2007 (UTC)

Import config files, backup, verify, restore, automate
Peter Blaise says:

Why not auto-import text config files into the database?

Thanks, Duesentrieb. I see your points. Yes, there's mention of some support files, but I have dozens not mentioned! In my pushing for the maturation of MediaWiki, including its support universe, I suggest that it could use a feature to automatically import copies of our custom config files into the main database even though it needs master copies of those files outside, in the operating system, in order to run properly. Your suggestion of manually creating copies as articles is interesting. How about automation, anyone?

Why not create an auto verify of backup?

I also see that you agree with me that there is no MediaWiki "verify" or "compare" option for backup or export, and, as you say, all we can do is try to restore or import and check it manually (against what, our memory of how the wiki behaved before?). Again, I'm pointing out a difference between "mature" applications we may have experienced before MediaWiki, and MediaWiki. My operating system backup application has a verify / compare feature after backup, MediaWiki has ... what?

Let's call items by their names, not shorthand.

Also, to reduce confusion and invite and enhance the quick success of newcomers, may I suggest sticking to a specific and complete nomenclature? You say you moved my comments to the "talk" pages, which I could not find. But, thankfully, I eventually found your link that brought me to the "discussion" page, which I'd already been looking at, and was already using. Why not call it the "discussion/talk" page? Thanks.

-- Peter Blaise peterblaise 18:02, 20 April 2007 (UTC)


 * The phrases "talk page" and "discussion page" are synonymous and interchangeable in MediaWiki wiki culture. robchurch | talk 00:54, 29 April 2007 (UTC)


 * Peter Blaise says:
 * So ... newbies, non-wiki culture people, are not invited to make the wiki their home? Rob, I'm suggesting that we recognize how offputting and success-inhibiting "jargon" is to newcomers.  I suggest that we all foster growth by welcoming newcomers, and welcoming and recognizing criticism, not chiding them for their "not getting it, not fitting in".  I'm suggesting that any Wiki is not owned by first comers, but is owned by anyone at any moment who is reading and offering their edits, their input at any time.  I think this coincides with the intended Wiki "culture" more so than elitist jargon and belligerent exclusivity of first-comers against newbies.
 * "Why can't we all just get along?"
 * -- Rodney King
 * "Talk" and "discussion" do not function as synonyms in that they are not interchangeable. Note this page's URL says "talk" yet this page's tab says "discussion". Try typing "discussion" into the URL and try looking for a "talk" tab.  No can do.  Is it so hard to say, "I moved your comment to the discussion/talk page" AND give a link, rather than presume others can figure out the ambiguity on their own, and then dismiss them when they suggest a way around getting lost?
 * -- Peter Blaise peterblaise 10:39, 1 May 2007 (UTC)


 * The problem is that internally, these pages are "talk" pages, and the namespace's English name has always been "talk" (and user_talk, etc). But at some point it was decided to label the (English) tab at the top of each page "discussion" instead. This label can be changed by editing MediaWiki:Talk. Most other languages seem to use the equivalent of "Diskussion" on the label, and also as the namespace name. Some languages may use something else entirely.
 * This is inconsistent and confusing, yes. But it's a fact that in the context of MediaWiki, "talk" and "discussion" are interchangeable. Making "Discussion" an alias for the "Talk" namespace may be nice, but it would break backwards compatibility. This confusion isn't easy to resolve. It's better to explain it and live with it. -- Duesentrieb ⇌ 12:57, 1 May 2007 (UTC)

"php dumpbackup.php --full" returns "DB connection error: Unknown error"
Peter Blaise asks:

I run:
 * C:\www\apache2\htdocs\mediawiki\maintenance>php dumpbackup.php --full
 * DB connection error: Unknown error

... and when I search the drive for new files, I see nothing's been created. How should an XML export / backup work? HELP, please!

-- Peter Blaise peterblaise 13:34, 20 June 2007 (UTC)


 * Learn to use a shell. You will notice that the dumpBackup.php script spits out XML. This XML should be saved into a file. The standard means of doing this is to redirect standard output to a file. On Windows (and also on POSIX-compliant shells), this is done using the > operator, e.g.
 * php dumpBackup.php --full > backup.xml
 * We expect our users to have at least a basic working knowledge of their computers. robchurch | talk 14:11, 21 June 2007 (UTC)


 * ...and geeks are expected to read carefully before bashing other users: Peter's problem is not the output file, his problem is the DB connection error: Unknown error. I got the same under Kubuntu Feisty; it seems something is wrong with the out-of-the-box installation. Cheers, 88.73.85.21 13:45, 21 July 2007 (UTC)

I'm getting the same error on an FC5 installation. Does anyone know the cause of this error? So far my Google searches haven't turned up any relevant information. Zeekec 20:32, 1 August 2007 (UTC)

I tried creating an AdminSettings.php file as suggested, but still get the same error. Zeekec 15:27, 2 August 2007 (UTC)


 * Me too...
 * I have exactly the same error, but I need dumpBackup.php for Lucene integration.
 * Can nobody explain the "DB connection error: Unknown error" problem? --11 September 2007

OK Guys,

After a lot of trouble and study of this script, I've found a solution for this problem. The problem exists because of the file "includes/backup.inc", which dumpBackup.php requires. This file does the main backup work and uses some MediaWiki variables ($wg...). This is no problem if dumpBackup.php runs within MediaWiki, but as a standalone console script it is missing these $wg... parameters. So dumpBackup.php uses empty strings for $wgDBtype, $wgDBadminuser, $wgDBadminpassword, $wgDBname and $wgDebugDumpSql, and this causes the "DB connection error: Unknown error" at run time. I've solved this problem with a self-written PHP wrapper script, which only initializes these variables and then simply includes dumpBackup.php. Now it works fine.

This is my php-wrapper-script:

 <?php
 # dumpBackupInit - wrapper script to run the MediaWiki XML dump "dumpBackup.php" correctly
 # @author: Stefan Furcht
 # @version: 1.0
 # @require: /srv/www/htdocs/wiki/maintenance/dumpBackup.php
 #
 # The following variables must be set to get dumpBackup.php to work.
 # You'll find these values in the DB section of your MediaWiki config, LocalSettings.php.
 $wgDBtype = 'mysql';
 $wgDBadminuser = "[MySQL-Username]";
 $wgDBadminpassword = "[MySQL-Usernames-Password]";
 $wgDBname = '[mediaWiki-Database-scheme]';
 $wgDebugDumpSql = 'true';
 #
 # The XML dumper 'dumpBackup.php' requires the variables set above to run,
 # so simply include the original dumpBackup script.
 require_once("/srv/www/htdocs/wiki/maintenance/dumpBackup.php");
 ?>

Now you can use this script just like dumpBackup.php, except that it will (hopefully) now run correctly. Example:  php dumpBackupInit.php --current > WikiDatabaseDump.xml 

I hope this will help you. Please excuse my probably bad English.

Regards -Stefan- 12 September 2007


 * Another (simpler) solution.
 * Simply add the above-mentioned variables to your LocalSettings.php
 * You will notice that most of them are already there. The ones that need to be added are:

$wgDBadminuser="[MySQL-Username]"; $wgDBadminpassword ="[MySQL-Usernames-Password]";
 * Tested to work with MediaWiki 1.11.0
 * -Rammer- 9 November 2007

Similar problem with dumpBackup
I have been using MediaWiki for about a year, and have about 20 MB of pages in 1.6.6. I have been backing it up regularly with mysqldump. This computer is still running fine, but I would like to be able to move the data to another computer. I installed 1.10.1 on the new computer, restored the data to mysql, and found that Table 'wikidb.objectcache' doesn't exist (localhost).

Rooting around some more, I found out about dumpBackup. On the new computer, php dumpBackup.php --full > wikidump works. On 1.6.6 it does not, and reports (Can't contact the database server: Unknown error). Looking inside dumpBackup, I find, near the beginning:

 require_once( 'command_line.inc' );
 require_once( 'SpecialExport.php' );
 require_once( 'maintenance/backup.inc' );

What does this do? Well, the PHP Manual says that it includes the files, and the link to "require" says that if it does not find them, it will fail. However, looking around for these files, 'command_line.inc' and 'backup.inc' are in 'maintenance', but 'SpecialExport.php' is in 'includes'. But how does php find the files? 'SpecialExport.php' is in a different folder, and the 'maintenance/' prefix on the third file suggests the program should be run from the root folder for the wiki. In any case, the first and third are inconsistent.

It would be very helpful to readers if someone who has actually used dumpBackup could explain how it is supposed to be used, and what sets the environment so that files such as these can be found. The environment is very likely relevant, since the "keys" to the database come from LocalDefaults.

--Docduke 2 August 2007 0147 GMT.

P.S. The gremlins are working overtime tonight. I logged into MediaWiki about a half hour ago, now it won't let me in. I even had it send me a new password, and it won't let me in with that either!

--Docduke 01:16, 2 August 2007 (UTC) [It let me in on a different computer]


 * Digging deeper ... In version 1.6.1, the "Can't contact the database server" message comes from line 1829 of includes/Database.php. My guess is that the "real" people adapt a copy of "index.php" to initialize the environment, then call dumpBackup from that environment in an automated backup script.  Either a copy of that script, or some indication of what is in it, would be greatly appreciated!
 * Docduke 02:48, 2 August 2007 (UTC)
 * I still have no idea how dumpBackup sets its environment, but I have found the solution to my problem, and probably that of the other folks who have reported backup failures. It is necessary to create an AdminSettings.php file at the wiki root from a copy of AdminSettings.sample, inserting a valid username, password pair for a DBuser with full privileges.  Then dumpBackup runs when started in either the root folder or the "maintenance" folder, in version 1.6.6.  It runs in 1.10.1 without an AdminSettings file because the LocalDefaults file has a username, password pair with administrator privileges.  [Hint: I tried update.php.  It failed, but the resulting error message said to check the AdminSettings file.]
 * Docduke 03:18, 2 August 2007 (UTC)

back up with phpmyadmin
If your host will not allow you to access such tools and you can only use phpMyAdmin, or if this does not work for you, you might want to:
 * 1) Export your full MySQL wiki database with phpMyAdmin's export functionality. Save the exported file locally.
 * 2) Edit the exported file, and change the following if it applies to you:
 * 3) Search and replace to change your table prefixes (e.g., because prefixing is no longer required on your new host).
 * 4) To work around the "latin1 in mysql > 4.1" character set problem, search and replace the latin1 character set with utf8. This might cause some strange behavior afterwards, because I'm not sure that MediaWiki won't be disturbed by the column encoding changing without warning. But apparently, for me, it works (and I found no other way to do it).
 * 5) Please note: as utf8 takes more space in index keys (up to three bytes per character) than latin1 (one byte per character), some keys might become too large (my MySQL installation does not allow keys > 1000 bytes). For these fields I didn't change the encoding (luckily these tables were empty at migration time). You can just do this by trial and error: phpMyAdmin will warn you at import time if some key is too big.
 * 6) You might want to convert utf8 back to latin1 with ALTER TABLE statements (phpMyAdmin can do that for you). This will not revert the changes you just made, as this time the contents will be re-encoded as well.

Maybe this should be checked by some MediaWiki expert and, if judged good advice, integrated into the manual? I spent a full day searching for this workaround! --OlivierMiR 7 July 2007
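
To make the key-size note concrete: as far as I know, MySQL reserves up to three bytes per character for index keys on columns in its "utf8" character set, and one byte per character for latin1. The arithmetic can be sketched as below. The function names are invented for this example, and the 1000-byte limit is simply the figure quoted in the comment above, not a universal constant.

```shell
#!/bin/sh
# Bytes an index needs for a column of $1 characters at $2 bytes per character.
key_bytes() {
    echo $(( $1 * $2 ))
}

# Succeed if such a key fits under a limit of $3 bytes.
key_fits() {
    [ "$(key_bytes "$1" "$2")" -le "$3" ]
}

key_bytes 255 1   # a 255-character latin1 key
key_bytes 255 3   # the same key after switching the column to utf8
```

With a 1000-byte limit, a single key of up to 333 utf8 characters still fits, while a combined key over two 255-character columns (1530 bytes) does not.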


 * Thanks for the idea : worth trying but not sufficient for me though :(
 * NewMorning 20:10, 16 June 2008 (UTC)

latin1
The following line contradicts itself, no? Use the option --default-character-set=latin1 on the mysqldump command line to avoid the conversion if you find it set to "latin1". Jidanni 04:46, 13 December 2007 (UTC)


 * The latin warning has me confused as well. I think that section should be rewritten/clarified by someone who understands it.
 * Where do I enter 'SHOW CREATE TABLE text'? It doesn't look like a valid SQL command to me and gives me an error when I run it.
 * You can see which character set your tables are using with a statement like SHOW CREATE TABLE text. The last line will include a DEFAULT CHARSET clause.
 * --71.107.96.222 17:40, 22 March 2008 (UTC)


 * No answer still.


 * I agree with the above:
 * SHOW CREATE TABLE text; is not a working command


 * Neither replacing "text" nor leaving it out works.


 * --Livingtale 09:06, 20 September 2008 (UTC)
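
A possible explanation for the errors reported in this thread, offered as a guess: SHOW CREATE TABLE is an SQL statement, so it has to be run in a MySQL client (the mysql command-line tool, or the SQL tab in phpMyAdmin) rather than in the operating system shell, and if the wiki uses a table prefix the table is named with that prefix in front of "text". The sketch below pulls the DEFAULT CHARSET value out of that statement's output. The user, database and prefix names in the comments are placeholders.

```shell
#!/bin/sh
# Read the output of SHOW CREATE TABLE on stdin and print the table's
# default character set (the value after "DEFAULT CHARSET=").
default_charset() {
    sed -n 's/.*DEFAULT CHARSET=\([A-Za-z0-9_]*\).*/\1/p'
}

# Example invocations (wikiuser, wikidb and the mw_ prefix are hypothetical):
#   mysql -u wikiuser -p wikidb -e 'SHOW CREATE TABLE text' | default_charset
#   mysql -u wikiuser -p wikidb -e 'SHOW CREATE TABLE mw_text' | default_charset
```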

Rewrite this page
This page is written very badly. It jumps all over the place and I can barely figure out what to do. Please break it into shorter and simpler steps. PatPeter 20:51, 10 December 2007 (UTC)

=2008=

Corruption Section for MySQL 4.1 is unclear
The section that discusses possible corruption due to nonstandard character encoding is unclear. I cannot seem to make out if it is saying that the dump may be corrupted or if my actual database may be corrupted.

It discusses doing a conversion prior to dumping, but does not say if this conversion is reflected back to the DB.

I thank you for the documentation that is here, but could you please clarify this issue. --Vaccano 17:50, 30 January 2008 (UTC)


 * I agree, this section is not clear at all!
 * I'm setting up a new wiki, my host has mysql 5.
 * In phpMyAdmin, here are the server variables I have :
 * character set client = utf8
 * character set connection = utf8
 * character set database = latin1
 * character set filesystem = binary
 * character set results = utf8
 * character set server = latin1
 * character set system = utf8
 * What should I do? The connection, client, results and system are utf8, but the server is latin1. Do I need a conversion?
 * --Iubito 17:24, 1 March 2008 (UTC)


 * You can check, for instance, your "categorylinks" table: mine is full of accents transformed into strange characters. Impossible to figure out what to do with phpMyAdmin though...


 * NewMorning 20:10, 16 June 2008 (UTC)

I'm a WikiNewb....help!!
Can I just back up my wamp file and everything in it on an external hard disk?

I mean, the backup that I'm used to consists of moving files to a different disk. I know that there must be more to it than this when it comes to a wiki, but, frankly, I don't know what the heck I'm doing.

I downloaded MediaWiki two days ago to use as a database for personal journals, etc. and know how to edit and create new "articles" within the database. I plan on creating new "articles" every day for the rest of my life and suspect that my computer's hardware will not last that long. So, I not only want to back up all of my database information, in case of some tragedy, but eventually will want to move it to a different computer altogether.

What am I to do? --13 June 2008

My table categorylinks is corrupted !
I can't figure out how this is possible: I first thought that the dump had corrupted the table. But when I checked inside phpMyAdmin, I realised that the French accents, correctly written in the wiki, were corrupted in the table! This is annoying since my wiki was supposed to be a test version, and I need to back it up for a new server where strange characters give strange characters on screen! I only have access to phpMyAdmin, and wonder what I can do with that: any suggestions? NewMorning 20:18, 16 June 2008 (UTC)


 * In the end it's not that bad: I exported my corrupted database and imported it as it was, bad characters included, and they appear correctly in the other wiki! Strange that I can't have them corrected in the DB though... I could also read the database export using a text converter to UTF8, but if the corrected text is inserted into the other DB, the wiki shows strange characters again!
 * --NewMorning 04:32, 22 June 2008 (UTC)


 * phpMyAdmin probably got it wrong. Actually, it doesn't have a way to know how the characters in the tables are encoded, since MediaWiki (by default) stores them as binary.
 * Generally, be very careful about character encoding: no matter what client you use (phpMyAdmin, the PHP CLI client, whatever), MySQL nearly always performs some conversion on the characters, which is supposed to make them "look right" for you, but quite often screws things up. -- Duesentrieb ⇌ 11:37, 22 June 2008 (UTC)


 * Thanks for answering, I got it through anyway: the export was weird (strange accents) but I imported it the same way and it worked! I used phpMyAdmin both times, and both times had set everything to UTF8. The export itself was readable with a text editor transforming it to UTF8, but I could not import it afterwards: it was nice in the table, but awful in the wiki! --NewMorning 22 June 2008

dumpBackup.php seems to be generating invalid xml
I'm trying to export my MediaWiki content to TWiki and the conversion program fails with "junk after document element at line 9626, column 2, byte 907183 at ... (very long error message)" The problem seems to be that the xml produced by dumpBackup.php is invalid. I tested this by pointing Firefox to the xml dump and it stops at the same place. I've looked at the raw xml code and I don't see anything obviously wrong. Any ideas what might be the problem? I'm using MediaWiki 1.12 and FreeBSD 6.2 --Ldillon 6 August 2008
 * please run xmllint --noout
 * xmllint is standard on most linux distributions, don't know about bsd. if you can't find it, please find some other xml checker and run it over the file. the hope is that it will produce a more meaningful error message. -- Duesentrieb ⇌ 19:42, 6 August 2008 (UTC)
 * Also, please provide the xml code around the given location (use head and tail, or something similar) -- Duesentrieb ⇌ 19:44, 6 August 2008 (UTC)

Here's the output of xmllint --noout (from a Linux box). Basically, there is a <page> at line 1, a corresponding </page> at line 9808, and a new <page> at line 9809, where the parser errors. I did a quick grep -c \<page && grep -c \<\/page and I can at least say there are the same number of open and close tags. I'd include more but, even with the code and nowiki tags, this page tries to parse the text. --Ldillon 15:55, 7 August 2008 (UTC)


 * If the first tag is <page>, then something is very wrong - <page> should not be the top level tag (and there must only be one top level tag in an XML document, hence the error). The first tag in the file should be a <mediawiki> tag, which wraps everything - compare the output of Special:Export/Test. What's the exact command used to generate these dumps? Do you perhaps use --skip-header? That would generate such an incomplete XML snippet. -- Duesentrieb ⇌ 18:55, 7 August 2008 (UTC)

I've tried a bunch of things to get this working, including the --skip-header and --skip-footer flags, because I was getting header and footer "junk" that I didn't need when I tried to import to TWiki. Sorry if I do not completely understand what the flags are supposed to do; I didn't see any documentation that said otherwise so I took it at face value. I'm still getting errors on import when I omit the flags, but the XML passes "xmllint --noout" so I guess the problem lies elsewhere. Thank you for your feedback.--Ldillon 21:19, 7 August 2008 (UTC)
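
For anyone hitting the same symptom: a quick way to tell whether a dump still has its wrapping element is to look at its first and last non-empty lines. The sketch below is a plain text check written for this thread, not an XML parser and not part of MediaWiki. The function name is made up, and it assumes the opening and closing mediawiki tags each sit on their own line, as in the Special:Export output mentioned above.

```shell
#!/bin/sh
# Succeed if the first element in file $1 is <mediawiki ...> and the last
# non-empty line contains </mediawiki>. An optional <?xml ...?> line is skipped.
dump_looks_complete() {
    first=$(grep -v '^[[:space:]]*$' "$1" | grep -v '^[[:space:]]*<?xml' |
            sed 's/^[[:space:]]*//' | head -n 1)
    last=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    case $first in
        '<mediawiki'*) ;;
        *) return 1 ;;
    esac
    case $last in
        *'</mediawiki>'*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example (the file name is a placeholder):
#   dump_looks_complete backup.xml || echo "header or footer is missing"
```

A dump that passes this check can still be malformed inside, so a real validator such as xmllint remains the better test where it is available.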

Bringing the wiki offline?
Is it possible to bring a MediaWiki site offline or to put it in a "read only" mode so no changes are made during a database dump? --Kaotic 12:50, 20 August 2008 (UTC)


 * Manual:$wgReadOnly -- Duesentrieb ⇌ 13:55, 20 August 2008 (UTC)


 * I've taken your script and added the ability to place the wiki into read/write mode. let me know what you think. User:Kaotic/WikiBackup --Kaotic 09:51, 21 August 2008 (UTC)
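
For readers who cannot see the linked script, here is one way the read-only idea could be sketched. It is not Kaotic's script. It assumes that LocalSettings.php ends with a newline and has no closing ?> tag, so that a line appended to it is still PHP, and the marker text and function names are invented for this example.

```shell
#!/bin/sh
# Marker used to find the appended line again (an arbitrary choice).
MARKER='# added-by-backup-script'

# Append a $wgReadOnly line to the settings file given as $1.
set_read_only() {
    printf '%s %s\n' "\$wgReadOnly = 'Backup in progress, editing is disabled.';" "$MARKER" >> "$1"
}

# Remove only the line(s) carrying the marker. The file is rewritten in place
# with cat so that its owner and permissions are kept.
clear_read_only() {
    [ -f "$1" ] || return 1
    tmp=$(mktemp) || return 1
    grep -v -F -e "$MARKER" "$1" > "$tmp" || true
    cat "$tmp" > "$1"
    rm -f "$tmp"
}

# Intended use (path and dump command are placeholders):
#   set_read_only /path/to/LocalSettings.php
#   mysqldump ... > dump.sql
#   clear_read_only /path/to/LocalSettings.php
```

If the dump step can fail, it is worth making sure clear_read_only still runs afterwards, for example from a trap, so the wiki does not stay locked.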

alternative cronjob command
Apparently I don't have rights to make changes on the page .... Strange: you can post, and after that everything has disappeared....

If you don't have the nice program, you can use the following alternative cron job:

/usr/bin/mysqldump -u [username] --password=[password] [databasename] | /bin/gzip > [databasename].gz


 * [username] - this is your database username
 * [password] - this is the password for your database
 * [databasename] - the name of your database

don't forget to remove the []

You can use other names for the file: [database].gz and/or put a number before it.

If you use different numbers for each cronjob, you can schedule it (for instance: a cronjob for each day of the week, named 1[databasename], 2[databasename] etc.). For uploading the .gz file with the database manager (in my situation with DirectAdmin) the name of the file is not critical, but it has to be a .gz file. --Livingtale 13:33, 22 September 2008 (UTC)
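
The numbered-file idea above can also be done with a single cron job that derives the number from the current weekday. The following is a sketch only. The function name is invented, and the user, password and database names in the comment are placeholders.

```shell
#!/bin/sh
# Build the dump file name for database $1. $2 is the ISO weekday number
# (1 = Monday ... 7 = Sunday) and defaults to today's, taken from date +%u.
backup_name() {
    day=${2:-$(date +%u)}
    printf '%s%s.gz\n' "$day" "$1"
}

# Intended use inside a script called from cron:
#   /usr/bin/mysqldump -u USER --password=PASS DBNAME | /bin/gzip > "$(backup_name DBNAME)"
```

Keeping this in a script rather than directly in the crontab line avoids a common trap: most cron implementations treat an unescaped % in the command as a line break, so date +%u would have to be written as date +\%u there.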

Files list
The section about file system should contain a comprehensive list of files and directories to back up, or link to such a list. --Florent Georges 00:13, 17 November 2008 (UTC)

=2009=

N00BY DUDES, HERE IS WHAT YOU WANT
These links tell what noobs like me need to backup in PhpMyAdmin:

http://mambo-manual.org/pages/viewpage.action?pageId=393703

http://mambo-manual.org/display/conf/Backing+up+the+Database

http://mambo-manual.org/display/conf/Restoring+the+Database

Good luck. SpiralOfYarn 21:42, 11 April 2009 (UTC)

Charset problem after switching to wgDBmysql5 = true
Hello,

After years, we finally solved our charset issue and have been able to switch back to "wgDBmysql5 = false".

One of our symptoms was some special characters (like "φ") were converted silently (corrupt) into literal question marks (?).

How we resolved it is documented here in English and in French.

I hope this will help!

Jean-Luc

xml backup/restore + database backup/restore or one of the other?
I'm confused: should I back up (and restore) both the XML dump and the SQL database (MySQL), or just one or the other?


 * As per the article, an XML dump does not include site-related data (user accounts, image metadata, logs, etc), so you'd be better off performing an SQL Dump. In a discussion about MediaWiki backups, I'd describe an XML dump as more of an 'Export' and in this sense it can be used as a fallback to save off the rendition of wiki content (with or without historical revisions), but it is not capable of restoring your MediaWiki installation to a running state on a new server, as it existed previously. -- Gth-au 03:44, 3 November 2009 (UTC)

crontab?
Since the manual is meant for non-experts too, a word like crontab maybe shouldn't be used, or should be defined clearly. Personally I have no idea what a crontab is.


 * The word is now linked to the "cron" Wikipedia article. —Emufarmers(T 05:00, 16 May 2009 (UTC)


 * The explanation of cron is useful; however, while a simple backup may be attempted by non-experts, investigations should be carried out by someone who is at least familiar with your environment's operating system, database engine, webserver and MediaWiki installation. -- Gth-au 03:44, 3 November 2009 (UTC)

Maintenance / Optimisation / or normal part of Backup?
Can the reference to data that need not reside in the wiki backup content be clarified? Specifically, this list discussion linked to from the article mentions content that could be rebuilt upon restoration, without explaining how that rebuild would be performed, nor how long that may typically take in a small/large wiki implementation. Comments are also made that during the rebuild period the user experience would be curtailed in some areas (search, whatlinkshere, category views; any other areas?) - would users see a warning that the wiki is undergoing a rebuild and to come back later? What happens to edits made during the rebuild process? I see the reduction of backup data as a useful method of achieving faster backups and restoration (thus faster verification of backups, too), but I'm curious for more detail. Conversely, if the suggestion has no merit and is purely an advanced topic for a possible future feature, perhaps it shouldn't be presented in the article. --Gth-au 04:30, 3 November 2009 (UTC)

Empirical Backup Procedure Needed
The backup procedure described in this article is full of vague statements and suggestions that 'might' be necessary or desirable. Front and center there should be a process for hosted installations to back up files (FTP) and database (SQL). Then, separately, expand on other options as differing methods to achieve the same goal, such as: if the wiki owner has filesystem access, then that enables more than just FTP to back up the files; and using a specific database engine toolset enables other database backup options, and so on. I'm interested in assisting in improving this area of the documentation, but is this the appropriate place to brainstorm it out? Is there a beta discussion area to present test backup results etc. that would be more appropriate? -- Gth-au 03:44, 3 November 2009 (UTC)
 * Well said, I 2nd this request --SomaticJourney 11:43, 19 February 2010 (UTC)
 * I agree as well. This page is all over the place.  Just give me two examples, one for windows and one for other systems.  JB

Empirical Backup Verification Needed
I saw an earlier suggestion for this (see top), but it degenerated into the inherited 'talk vs. discussion' issue. The need for users/admins to be confident their backup has worked is still a clear requirement and a very valuable feature to aim for, IMHO. As the various backup methods resolve down to files, I'd posit that there are two levels of verification desired: (1) confirmation that none of the backup steps ended in error; and (2) an expanded test to prove the content from the backup repository can be restored to a working system, possibly with some kind of verification tool between the original site and the other. Item (1) can be confirmed with cross-checks such as file counts (per directory & grand total), database row counts (per table & grand total), and log files that record the tasks executed, their actions and their return codes. Item (2) is more involved, but it would be a desirable goal - perhaps more suitable for a page like Manual:Restoring a wiki though? -- Gth-au 03:44, 3 November 2009 (UTC)
 * Yes, very important need. I agree Gth-au --SomaticJourney 11:46, 19 February 2010 (UTC)
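
Item (1) above could be sketched roughly as follows: wrap every backup step so that its return code lands in a log, and print per-directory file counts that can be compared between the source and the backup copy. Everything here is illustrative. The function names, the log file name and the step names are assumptions, and -maxdepth is a find extension that GNU and BSD find support but POSIX does not require.

```shell
#!/bin/sh
LOG=${LOG:-backup.log}   # log file name is an arbitrary default
FAILED=0

# Run one backup step ($2...) under the name $1, record its return code in
# the log, and remember whether any step failed.
run_step() {
    name=$1; shift
    rc=0
    "$@" || rc=$?
    printf '%s rc=%d\n' "$name" "$rc" >> "$LOG"
    [ "$rc" -eq 0 ] || FAILED=1
    return 0
}

# Print "count<TAB>directory" for every directory under $1, counting only
# the regular files directly inside each one.
file_counts() {
    ( cd "$1" && find . -type d | sort | while IFS= read -r d; do
        n=$(( $(find "$d" -maxdepth 1 -type f | wc -l) ))
        printf '%d\t%s\n' "$n" "$d"
    done )
}

# Intended use (commands and paths are placeholders):
#   run_step files cp -R /var/www/wiki /backup/wiki
#   [ "$(file_counts /var/www/wiki)" = "$(file_counts /backup/wiki)" ] || FAILED=1
#   [ "$FAILED" -eq 0 ] || echo "backup incomplete, see $LOG"
```

Matching counts and zero return codes only show that the steps ran and copied the same number of files. They say nothing about whether the copy can actually be restored, which is item (2).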

Tips for Shared Hosting wikis
For those of us who use Godaddy and other shared hosted sites (issues with permissions). What is the best possible method to backup the wiki? --SomaticJourney 13:04, 19 February 2010 (UTC)

about Backing up the wiki from public caches...
Someone posted about Backing up the wiki from public caches... (added by --Diego Grez return fire 02:51, 26 May 2010 (UTC)) (content restored to Manual:Restoring wiki code from cached HTML)
 * You haven't explained why you removed this content, so I'm going to restore it. Then, you can add your comments on the discussion page. Jdpipe 08:09, 26 May 2010 (UTC)
 * Because it should be merged there, not on another page anyway. --Diego Grez return fire 18:00, 26 May 2010 (UTC)
 * My content is on a different topic to 'proper' backups... IMHO instead of deleting the content, you should have issued a 'merge request'. 150.203.43.41 02:06, 27 May 2010 (UTC)
 * New discussion is at Manual talk:Restoring wiki code from cached HTML Jdpipe 02:33, 27 May 2010 (UTC)

XML dump... images?
Not clear from the main page whether or not the XML dump includes the image files...? Maybe it should be explicitly stated if the images need to be handled separately? Jdpipe 08:22, 26 May 2010 (UTC)

= 2010 =

Background on Latin-1 to UTF-8 Conversion and Character Set Problems
Problems with the character set (i.e., special characters like umlauts not working) appear to be widespread when converting from latin-1 to utf-8. The tips on the main page sure helped me a lot. However, to solve my problems I needed more background information. Here I report what I found out.

An encoding (also inaccurately called a character set) describes how a character is represented as one or more bytes. The latin-1 encoding has 256 characters, thus it always uses exactly one byte per character. This does not support the characters of many languages existing on Earth. The utf-8 encoding supports the characters of practically all languages in use. It uses one to four bytes per character (MySQL's older utf8 character set stores at most three). Utf-8 encodes the ASCII characters (i.e., A to Z, a to z and the most common punctuation characters) in one byte, and it uses the same value for this as in the latin-1 encoding.
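
To make the byte counts concrete, here is a small sketch assuming a POSIX shell. The helper name byte_count is made up, and the bytes are written as octal escapes so that the result does not depend on the terminal's own encoding.

```shell
#!/bin/sh
# Hypothetical helper: count the bytes produced by a printf format string.
byte_count() {
    printf "$1" | wc -c | tr -d ' '
}

# "A" is the single byte 0x41 (octal 101) in both latin-1 and utf-8.
byte_count '\101'            # prints 1
# "ä" in latin-1 is the single byte 0xE4 (octal 344).
byte_count '\344'            # prints 1
# "ä" in utf-8 is the two bytes 0xC3 0xA4 (octal 303 244).
byte_count '\303\244'        # prints 2
# The euro sign does not exist in latin-1; in utf-8 it is 0xE2 0x82 0xAC.
byte_count '\342\202\254'    # prints 3
```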

Mediawiki uses the utf-8 encoding, thus allowing all special characters to be used in a wiki page. Internally, Mediawiki stores a page as a string of bytes in its data base. Mediawiki (at least as of version 1.16.0) does not convert the encoding in any way when storing a page in its data base.

The data base must accept a string of bytes from Mediawiki when storing a page, and the data base must return the exact same string when retrieving the page. It does not matter what the data base thinks the encoding of the page is, as long as it returns the same string that was stored. Therefore, Mediawiki can have a page in utf-8 encoding, and it can store it in a data base which thinks the string consists of characters encoded in latin-1. Historically, exactly this was necessary when no data base was available that supported utf-8.

The problems started when an updated MySQL data base did not return the same string of bytes on retrieval. In particular, this can happen after backing up and restoring a MySQL data base. The reasons for this behaviour are as follows.

One popular way of backing up a MySQL data base is to use the mysqldump program. It generates SQL code that describes the contents of the data base. If you feed this SQL code back into a data base, for example by using the code as the standard input for the mysql program, then the data base contents will be recreated.

One problem with the mysqldump/mysql approach is that the SQL code of course contains the special characters from the wiki pages, but that the SQL code does not specify the encoding used for wiki page table content. (To be more precise, this is true only for tables using the InnoDB engine. Tables using the MyISAM engine get a pure ASCII representation immune to encoding problems.) Therefore, backup/restore works only if both sides use the same encoding.

A harmless side effect is that the generated SQL code can contain broken special characters even if everything works. This happens when the data base internally stores the wiki pages in latin-1 encoding, but the default encoding for talking to data base clients like mysqldump is utf-8. Then the MySQL server "converts" the bytes in the database from latin-1 to utf-8 when dumping and converts them back when restoring. For example, an umlaut character, which is represented in utf-8 by two bytes, gets a four-byte representation in the SQL code, which no editor can display correctly. This works because every byte can be interpreted as a latin-1 character (even if it really is one of the several bytes of a utf-8 character).
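
The four-byte effect can be reproduced outside MySQL. The following sketch assumes a POSIX shell with iconv and od available; the helper names hex, dump_convert and restore_convert are made up, and iconv only stands in for the conversion the server performs, it is not what mysqldump runs internally.

```shell
#!/bin/sh
# Hypothetical helper: hex-dump stdin as one continuous lowercase string.
hex() {
    od -An -tx1 | tr -d ' \n'
    echo
}

# Stand-in for the dump step: bytes labelled latin-1 are converted to utf-8.
dump_convert() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Stand-in for the restore step: the reverse conversion.
restore_convert() {
    iconv -f UTF-8 -t ISO-8859-1
}

# "ä" as stored by Mediawiki: the two utf-8 bytes 0xC3 0xA4.
printf '\303\244' | hex                                     # c3a4
# After the dump conversion each byte is treated as its own latin-1 character.
printf '\303\244' | dump_convert | hex                      # c383c2a4
# The restore conversion brings back the original two bytes.
printf '\303\244' | dump_convert | restore_convert | hex    # c3a4
```

The round trip is lossless only because both directions use the same pair of encodings; the damage described in this section appears when the two sides disagree.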

The change from MySQL version 5.0 to version 5.1 included a change of the default encoding from latin-1 to utf-8.

You can specify the encoding explicitly when you need to move your data base contents to a data base with a different default encoding:
 mysqldump --user=root --password --default-character-set=latin1 --skip-set-charset wikidb > mywiki.sql
(This assumes that your data base is named "wikidb", and that the internal representation is set to latin-1. As a consequence, the MySQL server returns the data as-is, i.e. without any conversion.) You can read mywiki.sql into another data base, which uses utf-8 by default, by typing:
 mysql --user=root --password --default-character-set=latin1 wikidb < mywiki.sql
(Again, this reads the data as-is, because MySQL thinks that no conversion is necessary, since the SQL code specifies that the individual fields shall be stored in latin-1 representation.)

However, having stored the data unmodified in the utf-8 data base is not sufficient. When the data base server is asked to retrieve a wiki page, it will notice that it is stored in latin-1 encoding, while it talks to its clients in utf-8. Therefore, the data base server will "convert" the data, thus breaking it on delivery.

You can fix this problem by changing the specifications for the internal encoding of the data that are written into the SQL code of a data base dump. Editing the SQL code manually would be tedious and error-prone. A better solution is to use an automated stream editor like sed, which comes with all Linux/Unix distributions (and with Cygwin on Windows).

The stream editor must find all occurrences of latin-1 data base field definitions and replace them. You could choose a utf-8 encoding, but I chose to mark the fields as "binary", i.e. without a specific encoding. The reason is that this is what Mediawiki really puts into the data base. The command line for this is:
 sed < mywiki.sql > mywiki-patched.sql \
   -e 's/character set latin1 collate latin1_bin/binary/gi'

Additionally, you should change the default encoding for each table from latin-1 to utf-8. Therefore, you extend the above command line like this:
 sed < mywiki.sql > mywiki-patched.sql \
   -e 's/character set latin1 collate latin1_bin/binary/gi' \
   -e 's/CHARSET=latin1/CHARSET=utf8/gi'

But you should make some more modifications. As explained on the main page, there is a restriction on the length of sort keys that might be violated when a wiki page character is converted from latin-1 to utf-8. (I did not really understand this particular aspect, since there should not be any actual conversion when things are done as described above.) If you don't experience the problem, you might skip the fix, but I suppose it does not hurt to shorten the sort key limit in any case. You can do all substitutions using the following command line:
 sed < mywiki.sql > mywiki-patched.sql \
   -e 's/character set latin1 collate latin1_bin/binary/gi' \
   -e 's/CHARSET=latin1/CHARSET=utf8/gi' \
   -e 's/`cl_sortkey` varchar([0-9]*)/`cl_sortkey` varchar(70)/gi'
(Note that the regular expression [0-9]* matches a string of digits of any length. This subsumes the three separate substitutions given on the main page.)

Finally, the main page says that the content of the table named math could cause problems and should be deleted, since it is a cache only and not needed when upgrading. The complete command line including this deletion of the math content is:
 sed < mywiki.sql > mywiki-patched.sql \
   -e 's/character set latin1 collate latin1_bin/binary/gi' \
   -e 's/CHARSET=latin1/CHARSET=utf8/gi' \
   -e 's/`cl_sortkey` varchar([0-9]*)/`cl_sortkey` varchar(70)/gi' \
   -e '/^INSERT INTO `math/d'
Please note that the example given there (and here) assumes that you have defined an empty table name prefix for your Mediawiki data base tables. If not, you have to prepend that prefix to the table name in the last expression, i.e. match your prefix followed by math instead of just math.
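
Before patching a real dump, it may be worth trying the expressions on a tiny sample. The sketch below assumes GNU sed (the I flag in "gi" is a GNU extension) and an empty table prefix. The wrapper name patch_dump and the dump fragment are made up for illustration and are not real mysqldump output.

```shell
#!/bin/sh
# Hypothetical wrapper around the substitutions above; reads a dump on stdin.
patch_dump() {
    sed -e 's/character set latin1 collate latin1_bin/binary/gi' \
        -e 's/CHARSET=latin1/CHARSET=utf8/gi' \
        -e 's/`cl_sortkey` varchar([0-9]*)/`cl_sortkey` varchar(70)/gi' \
        -e '/^INSERT INTO `math/d'
}

# Made-up dump fragment that exercises each of the four expressions.
sample_dump() {
    cat <<'EOF'
CREATE TABLE `categorylinks` (
  `cl_sortkey` varchar(255) character set latin1 collate latin1_bin NOT NULL
) ENGINE=InnoDB DEFAULT CHARSET=latin1;
INSERT INTO `math` VALUES ('x');
INSERT INTO `page` VALUES (1);
EOF
}

# Show the patched fragment: the field becomes varchar(70) binary, the table
# default becomes utf8, and the INSERT for the math table is dropped.
sample_dump | patch_dump
```

Running diff on the original and the patched dump afterwards is a cheap way to check that only the intended lines changed.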

I hope this background information is helpful. Please correct any mistakes or omissions. If it helps, someone could link this material from the main page. (It is probably too long to be put there directly.)

Bigoak 09:39, 27 August 2010 (UTC)