Manual talk:Backing up a wiki

From MediaWiki.org

2007

First Note

Note: Some of the information in this page was adapted from Manual:Moving a wiki. robchurch | talk 20:59, 11 April 2007 (UTC)

Information needed on how to verify and restore a backup

Great overview, but so much more needed:
- how to verify our backup or export
- how to restore or import our data again

And we need specific, unambiguous, differentiating definitions of these words. Let me try:

"backup / restore"
file copy, done from outside the program using the operating system utilities
"export / import"
data copy, done inside the program using MediaWiki program interface utilities

... or something like that. Then we can clearly write specific steps (the "by doing what") for each way.

I agree that backing up everything is probably best, but how does someone KNOW what's been customized and belongs to them, and what's standard and can be replaced from a fresh reinstall of the master software? Does any restore or reinstall routine intelligently preserve existing data? Or do these processes merely overwrite anything in their way? So, if I back up today, then crash tomorrow, and then restore yesterday's backup, and something is missing or it doesn't function properly, what should I do? I'd probably then reinstall from scratch to rebuild an empty MediaWiki structure. Then I'd try restoring from my backup again to fill in the supposedly completely rebuilt but empty structure. What if even that fails? What gets clobbered? How do I "know"?

Should I reinstall an empty MediaWiki and try to "import" data? What if I don't have a data "export", but I only have a file "backup" copy? How do I then reconfigure my customized choices? Do all my users come back with a restore or import?

So, it probably makes sense to preserve a mirror image offline and copy everything, or one file at a time, from there during troubleshooting if something's not working in the main system. My MediaWiki is small at the moment - ~500,000 words in ~4,500 sections, ALL files = ~120 MB, so making multiple copies to CD daily is acceptable; even ~5 backups will fit on one CD, for a total materials cost of ~$60 per year or less.

I suggest people try renaming directory structures to pull their main MediaWiki off line and TRY restoring their backup to a fresh directory structure to verify if their chosen backup routine works or not. If not, figure out why not before trusting our backups!

-- Peter Blaise peterblaise 11:21, 19 April 2007 (UTC)

I have moved the paragraph you added for now, as it's not quite correct:
See also Verifying a wiki backup and Restoring a wiki from a backup and Combining wikis from multiple backups. Note also that this page addresses backing up DATA only. Your MediaWiki installation also probably includes much custom configuration in the form of changes to various supporting files that are as yet NOT incorporated into any database table, including CSS cascading style sheet files and PHP script files and JS JavaScript files. You must back up and restore / re-integrate these as separate steps. (Does someone want to write an extension to import and export all support files into supplemental database tables so everything is all in one place?)
The statement that the page is about data backup only is plain wrong - there's an extra section about backing up files; it could probably be more detailed, though. By default, the only "custom" files are LocalSettings.php and, of course, the uploaded files in the images directory.
Also, while red links are generally a good thing, they should "invite" people to write pages that actually make sense. First of all, those pages should be in the Manual namespace. "Verifying a wiki backup" isn't really possible, or rather, it's the same as Restoring a wiki from a backup (which is a page we should probably write soon). And Combining wikis from multiple backups probably doesn't make sense as a page of its own, and should be addressed in Restoring a wiki from a backup which should also explain what is overwritten when, or not.
There's also Manual:Moving a wiki, with which the contents of future pages should be coordinated.
Sticking everything into the database isn't really possible - the configuration must be outside, because it has to tell MediaWiki how to access the database in the first place. Customized JS and CSS can go into the database (as MediaWiki:Common.js and MediaWiki:Common.css respectively), and that's the preferred way. Skins that need custom PHP code cannot reside in the database; program code needs to be in files (for technical as well as security reasons). The same goes for extensions. -- Duesentrieb 11:32, 19 April 2007 (UTC)

Import config files, backup, verify, restore, automate

Peter Blaise says:

Why not auto-import text config files into the database?

Thanks, Duesentrieb. I see your points. Yes, there's mention of some support files, but I have dozens not mentioned! In my pushing for the maturation of MediaWiki, including its support universe, I suggest that it could use a feature to automatically import copies of our custom config files into the main database even though it needs master copies of those files outside, in the operating system, in order to run properly. Your suggestion of manually creating copies as articles is interesting. How about automation, anyone?

Why not create an auto verify of backup?

I also see that you agree with me that there is no MediaWiki "verify" or "compare" option for backup or export and that, as you say, all we can do is try to restore or import and check it manually (against what, our memory of how the wiki behaved before?). Again, I'm pointing out a difference between "mature" applications we may have experienced before MediaWiki, and MediaWiki. My operating system backup application has a verify / compare feature after backup, MediaWiki has ... what?

Let's call items by their names, not shorthand.

Also, to reduce confusion and invite and enhance the quick success of newcomers, may I suggest sticking to a specific and complete nomenclature? You say you moved my comments to the "talk" pages, which I could not find. But, thankfully, I eventually found your link that brought me to the "discussion" page, which I'd already been looking at, and was already using. Why not call it the "discussion/talk" page? Thanks.

-- Peter Blaise peterblaise 18:02, 20 April 2007 (UTC)

The phrases "talk page" and "discussion page" are synonymous and interchangeable in MediaWiki wiki culture. robchurch | talk 00:54, 29 April 2007 (UTC)
Peter Blaise says:
So ... newbies, non-wiki culture people, are not invited to make the wiki their home? Rob, I'm suggesting that we recognize how offputting and success-inhibiting "jargon" is to newcomers. I suggest that we all foster growth by welcoming newcomers, and welcoming and recognizing criticism, not chiding them for their "not getting it, not fitting in". I'm suggesting that any Wiki is not owned by first comers, but is owned by anyone at any moment who is reading and offering their edits, their input at any time. I think this coincides with the intended Wiki "culture" more so than elitist jargon and belligerent exclusivity of first-comers against newbies.
"Why can't we all just get along?"
-- Rodney King
"Talk" and "discussion" do not function as synonyms in that they are not interchangeable. Note this page's URL says "talk" yet this page's tab says "discussion". Try typing "discussion" into the URL and try looking for a "talk" tab. No can do. Is it so hard to say, "I moved your comment to the discussion/talk page" AND give a link, rather than presume others can figure out the ambiguity on their own, and then dismiss them when they suggest a way around getting lost?
-- Peter Blaise peterblaise 10:39, 1 May 2007 (UTC)
The problem is that internally, these pages are "talk" pages, and the namespace's English name has always been "talk" (and user_talk, etc). But at some point it was decided to label the (English) tab at the top of each page "discussion" instead. This label can be changed by editing MediaWiki:Talk. Most other languages seem to use the equivalent of "Diskussion" on the label, and also as the namespace name. Some languages may use something else entirely.
This is inconsistent and confusing, yes. But it's a fact that in the context of MediaWiki, "talk" and "discussion" are interchangeable. Making "Discussion" an alias for the "Talk" namespace may be nice, but it would break backwards compatibility. This confusion isn't easy to resolve. It's better to explain it and live with it. -- Duesentrieb 12:57, 1 May 2007 (UTC)

"php dumpbackup.php --full" returns "DB connection error: Unknown error"

Peter Blaise asks:

I run:

C:\www\apache2\htdocs\mediawiki\maintenance>php dumpbackup.php --full
DB connection error: Unknown error

... and when I search the drive for new files, I see nothing's been created. How should an XML export / backup work? HELP, please!
-- Peter Blaise peterblaise 13:34, 20 June 2007 (UTC)

Learn to use a shell. You will notice that the dumpBackup.php script spits out XML. This XML should be saved into a file. The standard means of doing this is to redirect standard output to a file. On Windows (and also on POSIX-compliant shells), this is done using the > operator, e.g.
php dumpBackup.php --full > backup.xml
We expect our users to have at least a basic working knowledge of their computers. robchurch | talk 14:11, 21 June 2007 (UTC)
...and geeks are expected to read carefully before bashing other users: Peter's problem is not the output file, his problem is the DB connection error: Unknown error. I got the same under Kubuntu Feisty; it seems something is wrong with the out-of-the-box installation. Cheers, 88.73.85.21 13:45, 21 July 2007 (UTC)

I'm getting the same error on an FC5 installation. Does anyone know the cause of this error? So far my Google searches haven't turned up any relevant information. Zeekec 20:32, 1 August 2007 (UTC)

I tried creating an AdminSettings.php file as suggested, but still get the same error. Zeekec 15:27, 2 August 2007 (UTC)

Me too...
I get exactly the same error, but I need dumpBackup.php for Lucene integration.
Can nobody explain the DB connection error: Unknown error problem? --11 September 2007

OK guys,
After a lot of trouble and study of this script I've found a solution for this problem. The problem comes from the file "includes/backup.inc", which dumpBackup.php requires. This file does the main backup work and uses some MediaWiki variables ($wg...). That is no problem when dumpBackup.php runs within MediaWiki, but when it runs as a standalone console script these $wg... variables are missing. dumpBackup.php then uses empty strings for $wgDBtype, $wgDBadminuser, $wgDBadminpassword, $wgDBname and $wgDebugDumpSql, and this causes the DB connection error: Unknown error at run time. I solved the problem with a self-written PHP wrapper script, which only initializes these variables and then simply includes dumpBackup.php. Now it works fine.

This is my PHP wrapper script:

 <?php
 ## dumpBackupInit - wrapper script to run the MediaWiki XML dump "dumpBackup.php" correctly
 ## @author: Stefan Furcht
 ## @version: 1.0
 ## @require: /srv/www/htdocs/wiki/maintenance/dumpBackup.php

 # The following variables must be set for dumpBackup.php to work.
 # You'll find these values in the database section of your MediaWiki config, LocalSettings.php.
 $wgDBtype = 'mysql';
 $wgDBadminuser = "[MySQL-Username]";
 $wgDBadminpassword = "[MySQL-Usernames-Password]";
 $wgDBname = '[mediaWiki-Database-scheme]';
 $wgDebugDumpSql = true;

 # The XML dumper dumpBackup.php needs the variables set above,
 # so simply include the original dumpBackup script.
 require_once("/srv/www/htdocs/wiki/maintenance/dumpBackup.php");
 ?>

Now you can use this script just like dumpBackup.php, except that it will (hopefully) now run correctly. Example: php dumpBackupInit.php --current > WikiDatabaseDump.xml

I hope this will help you. Please excuse my probably bad English.

Regards -Stefan- 12 September 2007

Another (simpler) solution.
Simply add the above-mentioned variables to your LocalSettings.php.
You will notice that most of them are already there. The ones that need to be added are:
   $wgDBadminuser="[MySQL-Username]";
   $wgDBadminpassword ="[MySQL-Usernames-Password]";
Tested to work with MediaWiki 1.11.0
-Rammer- 9 November 2007

Similar problem with dumpBackup

I have been using MediaWiki for about a year, and have about 20 MB of pages in 1.6.6. I have been backing it up regularly with mysqldump. This computer is still running fine, but I would like to be able to move the data to another computer. I installed 1.10.1 on the new computer, restored the data to mysql, and found that Table 'wikidb.objectcache' doesn't exist (localhost). Rooting around some more, I found out about dumpBackup. On the new computer, php dumpBackup.php --full > wikidump works. On 1.6.6 it does not, and reports (Can't contact the database server: Unknown error). Looking inside dumpBackup, I find, near the beginning:

 require_once( 'command_line.inc' );
 require_once( 'SpecialExport.php' );
 require_once( 'maintenance/backup.inc' );

What does this do? Well, the PHP Manual says that it includes the files, and the link to "require" says that if it does not find them, it will fail. However, looking around for these files, 'command_line.inc' and 'backup.inc' are in 'maintenance', but 'SpecialExport.php' is in 'includes'. But how does PHP find the files? 'SpecialExport.php' is in a different folder, and the 'maintenance' prefix on the third file suggests the program should be run from the root folder of the wiki. In any case, the first and third are inconsistent. It would be very helpful to readers if someone who has actually used dumpBackup could explain how it is supposed to be used, and what sets the environment so that files such as these can be found. The environment is very likely relevant, since the "keys" to the database come from LocalDefaults.

--Docduke 2 August 2007 0147 GMT.
P.S. The gremlins are working overtime tonight. I logged into MediaWiki about a half hour ago, now it won't let me in. I even had it send me a new password, and it won't let me in with that either!
--Docduke 01:16, 2 August 2007 (UTC) [It let me in on a different computer]

Digging deeper ... In version 1.6.1, the "Can't contact the database server" message comes from line 1829 of includes/Database.php. My guess is that the "real" people adapt a copy of "index.php" to initialize the environment, then call dumpBackup from that environment in an automated backup script. Either a copy of that script, or some indication of what is in it, would be greatly appreciated!
Docduke 02:48, 2 August 2007 (UTC)
I still have no idea how dumpBackup sets its environment, but I have found the solution to my problem, and probably that of the other folks who have reported backup failures. It is necessary to create an AdminSettings.php file at the wiki root from a copy of AdminSettings.sample, inserting a valid username, password pair for a DBuser with full privileges. Then dumpBackup runs when started in either the root folder or the "maintenance" folder, in version 1.6.6. It runs in 1.10.1 without an AdminSettings file because the LocalDefaults file has a username, password pair with administrator privileges. [Hint: I tried update.php. It failed, but the resulting error message said to check the AdminSettings file.]
Docduke 03:18, 2 August 2007 (UTC)

back up with phpmyadmin

If your host will not allow you to access such tools and you can only use phpmyadmin, or if this does not work for you, you might want to:

  1. export your full MySQL wiki database with phpMyAdmin's export functionality. Save the exported file locally.
  2. edit the exported file, and change the following if it applies to you:
    1. search and replace to change your table prefixes (e.g., because prefixing is no longer required on your new host)
    2. to work around the "latin1 in mysql > 4.1" character set problem, search and replace the latin1 character set with utf8. This might cause some strange behavior afterwards, because I'm not sure that MediaWiki won't be disturbed by the column encoding changing without warning. But apparently, for me, it works (and I found no other way to do it).
      • please note: as MySQL's utf8 encoding takes more space (up to three bytes per character) than latin1 (one byte), some keys might become too large (my MySQL installation does not allow keys > 1000 bytes). For these fields I didn't change the encoding (luckily these tables were empty at migration time). You can just do this by trial and error: phpMyAdmin will warn you at import time if some key is too big.
    3. you might want to transform the utf8 back to latin1 with ALTER TABLE statements (phpMyAdmin can do that for you). This will not revert the changes you just made, as this time the contents will be re-encoded as well.
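
The key-size limit mentioned in the note above comes down to simple arithmetic: an index is sized for the maximum number of bytes each character may need. A rough sketch, assuming three bytes per character for MySQL's utf8, one for latin1, and the 1000-byte limit reported above; the column widths are hypothetical, `key_bytes` is only an illustrative helper, and per-column overhead is ignored:

```shell
# Worst-case size in bytes of an index over the given column widths.
# Usage: key_bytes BYTES_PER_CHAR WIDTH [WIDTH ...]
key_bytes() {
    bytes_per_char="$1"
    shift
    total=0
    for width in "$@"; do
        total=$(( total + width * bytes_per_char ))
    done
    echo "$total"
}

key_bytes 1 255 255   # latin1, two hypothetical VARCHAR(255) columns: prints 510
key_bytes 3 255 255   # utf8, same columns: prints 1530, over a 1000-byte limit
```

This would explain why an index that fits comfortably under latin1 can be rejected after the search and replace to utf8.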

Maybe this should be checked by some MediaWiki expert and, if judged good advice, integrated into the manual? I spent a full day searching for this workaround! --OlivierMiR 7 July 2007

Thanks for the idea: worth trying, but not sufficient for me though :(
NewMorning 20:10, 16 June 2008 (UTC)

latin1

The following line contradicts itself, no? Use the option --default-character-set=latin1 on the mysqldump command line to avoid the conversion if you find it set to "latin1". Jidanni 04:46, 13 December 2007 (UTC)

The latin warning has me confused as well. I think that section should be rewritten/clarified by someone who understands it.
Where do I enter 'SHOW CREATE TABLE text'? It doesn't look like a valid SQL command to me, and it gives me an error when I run it.
You can see which character set your tables are using with a statement like SHOW CREATE TABLE text. The last line will include a DEFAULT CHARSET clause.
--71.107.96.222 17:40, 22 March 2008 (UTC)
No answer still.
I agree with the above:
SHOW CREATE TABLE text; is not a working command,
and neither replacing "text" nor omitting "text" helps.
--Livingtale 09:06, 20 September 2008 (UTC)
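
On where to enter the statement: it is SQL, so it belongs in a MySQL client (the `mysql` command-line tool, or the SQL tab of phpMyAdmin), not in the operating system shell. The errors reported above are not explained in this thread; possible causes, offered only as guesses, include no database being selected or the wiki using a table prefix. A minimal sketch, in which the helper `show_create_sql`, the database name `wikidb` and the user `wikiuser` are hypothetical placeholders:

```shell
# Build the SQL statement for a given table prefix (the prefix may be empty).
# The backticks quote the table name defensively.
show_create_sql() {
    prefix="$1"
    printf 'SHOW CREATE TABLE `%stext`;\n' "$prefix"
}

# Example invocation (commented out because it needs a reachable MySQL server):
# show_create_sql "" | mysql -u wikiuser -p wikidb
```

The last line of the server's reply should then contain the DEFAULT CHARSET clause that the manual refers to.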

Rewrite this page

This page is written very badly. It jumps all over the place and I can barely figure out what to do. Please break it into shorter, simpler steps. PatPeter 20:51, 10 December 2007 (UTC)

2008

Corruption Section for MySQL 4.1 is unclear

The section that discusses possible corruption due to nonstandard character encoding is unclear. I cannot seem to make out if it is saying that the dump may be corrupted or if my actual database may be corrupted.

It discusses doing a conversion prior to dumping, but does not say if this conversion is reflected back to the DB.

I thank you for the documentation that is here, but could you please clarify this issue? --Vaccano 17:50, 30 January 2008 (UTC)

I agree, this section is not clear at all!
I'm setting up a new wiki, my host has mysql 5.
In phpMyAdmin, here are the server variables I have :
  • character set client = utf8
  • character set connection = utf8
  • character set database = latin1
  • character set filesystem = binary
  • character set results = utf8
  • character set server = latin1
  • character set system = utf8
What should I do? The connection, client, results and system are utf8 but the server is latin1. Do I need a conversion?
--Iubito 17:24, 1 March 2008 (UTC)
You can check, for instance, your "categorylinks" table: mine is full of accents transformed into strange characters. Impossible to figure out what to do with phpMyAdmin though...
NewMorning 20:10, 16 June 2008 (UTC)

I'm a WikiNewb....help!!

Can I just back up my wamp file and everything in it on an external hard disk?

I mean, the backup that I'm used to consists of moving files to a different disk. I know that there must be more to it than this when it comes to a wiki, but, frankly, I don't know what the heck I'm doing.

I downloaded MediaWiki two days ago to use as a database for personal journals, etc. and know how to edit and create new "articles" within the database. I plan on creating new "articles" every day for the rest of my life and suspect that my computer's hardware will not last that long. So, I not only want to back up all of my database information, in case of some tragedy, but eventually will want to move it to a different computer altogether.

What am I to do? --13 June 2008

My table categorylinks is corrupted!

I can't figure out how this is possible: I first thought that the dump had corrupted the table. But when I checked inside phpMyAdmin, I realised that the French accents, correctly written in the wiki, were corrupted in the table! This is annoying since my wiki was supposed to be a test version, and I need to back it up for a new server where strange characters give strange characters on screen! I only have access to phpMyAdmin, and wonder what I can do with that: any suggestions? NewMorning 20:18, 16 June 2008 (UTC)

In the end it's not that bad: I exported my corrupted database and imported it as it was, bad characters included, and they appear correctly in the other wiki! Strange that I can't have them corrected in the DB though... I could also read the database export using a text converter to UTF8, but if the corrected text is inserted into the other DB, the wiki shows strange characters again!
--NewMorning 04:32, 22 June 2008 (UTC)
phpMyAdmin probably got it wrong. Actually, it doesn't have a way to know how the characters in the tables are encoded, since MediaWiki (by default) stores them as binary.
Generally, be very careful about character encoding: no matter what client you use (phpMyAdmin, the PHP CLI client, whatever), MySQL nearly always performs some conversion on the characters, which is supposed to make them "look right" for you, but quite often screws things up. -- Duesentrieb 11:37, 22 June 2008 (UTC)
Thanks for answering, I got it through anyway: the export was weird (strange accents) but I imported it the same way and it worked! I used phpMyAdmin both times, and both times had set everything to UTF8. The export itself was readable with a text editor transforming it to UTF8, but I could not import it afterwards: it was nice in the table, but awful in the wiki! --NewMorning 22 June 2008

dumpBackup.php seems to be generating invalid xml

I'm trying to export my MediaWiki content to TWiki and the conversion program fails with "junk after document element at line 9626, column 2, byte 907183 at ... (very long error message)" The problem seems to be that the xml produced by dumpBackup.php is invalid. I tested this by pointing Firefox to the xml dump and it stops at the same place. I've looked at the raw xml code and I don't see anything obviously wrong. Any ideas what might be the problem? I'm using MediaWiki 1.12 and FreeBSD 6.2 --Ldillon 6 August 2008

Please run xmllint --noout <filename>
xmllint is standard on most Linux distributions; I don't know about BSD. If you can't find it, please find some other XML checker and run it over the file. The hope is that it will produce a more meaningful error message. -- Duesentrieb 19:42, 6 August 2008 (UTC)
Also, please provide the xml code around the given location (use head and tail, or something similar) -- Duesentrieb 19:44, 6 August 2008 (UTC)

Here's the output of xmllint --noout (from a Linux box)

MediaWiki_dump.xml:9809: parser error : Extra content at the end of the document
<page>
^

Basically, there is a <page> at line 1, a corresponding </page> at line 9808, and a new <page> at line 9809, where the parser errors. I did a quick grep -c \<page && grep -c \<\/page and I can at least say there are the same number of open and close <page> tags.

    </revision>
  </page>
<page>
    <title>Cacti Information</title>
    <id>3</id>
   <revision>

I'd include more but, even with the code and nowiki tags, this page tries to parse the text. --Ldillon 15:55, 7 August 2008 (UTC)

If the first tag is <page>, then something is very wrong - <page> should not be the top level tag (and there must only be one top level tag in an XML document, hence the error). The first tag in the file should be a <mediawiki> tag, which wraps everything - compare the output of Special:Export/Test. What's the exact command used to generate these dumps? Do you perhaps use --skip-header? That would generate such an incomplete XML snippet. -- Duesentrieb 18:55, 7 August 2008 (UTC)

I've tried a bunch of things to get this working, including the --skip-header and --skip-footer flags, because I was getting header and footer "junk" that I didn't need when I tried to import to TWiki. Sorry if I do not completely understand what the flags are supposed to do; I didn't see any documentation that said otherwise so I took it at face value. I'm still getting errors on import when I omit the flags, but the XML passes "xmllint --noout" so I guess the problem lies elsewhere. Thank you for your feedback. --Ldillon 21:19, 7 August 2008 (UTC)
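
For anyone hitting the same symptom, the two checks used in this thread (is the top level tag <mediawiki>, and do the <page> tags balance) can be scripted. A rough sketch only, not a substitute for xmllint; the function name `check_dump` is made up for illustration, and it assumes a grep that supports -o:

```shell
# Report whether a dump looks like a complete MediaWiki XML export:
# the first element must be <mediawiki>, and the number of lines with
# <page> must equal the number of lines with </page>.
check_dump() {
    dump="$1"
    # First opening tag in the file; the <?xml ...?> declaration is skipped.
    first_tag=$(grep -o '<[A-Za-z][A-Za-z0-9]*' "$dump" | head -n 1)
    # grep -c counts matching lines, not individual occurrences.
    opens=$(grep -c '<page>' "$dump" || true)
    closes=$(grep -c '</page>' "$dump" || true)
    if [ "$first_tag" != "<mediawiki" ]; then
        echo "incomplete: first element is '${first_tag:-none}', expected '<mediawiki'"
        return 1
    fi
    if [ "$opens" -ne "$closes" ]; then
        echo "unbalanced: $opens <page> lines, $closes </page> lines"
        return 1
    fi
    echo "ok: $opens pages"
}

# Example: check_dump MediaWiki_dump.xml
```

A dump produced with --skip-header would fail the first check, which matches the diagnosis above.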

Bringing the wiki offline?

Is it possible to bring a MediaWiki site offline or to put it in a "read only" mode so no changes are made during a database dump? --Kaotic 12:50, 20 August 2008 (UTC)

Manual:$wgReadOnly -- Duesentrieb 13:55, 20 August 2008 (UTC)
I've taken your script [1] and added the ability to place the wiki into read/write mode. let me know what you think. User:Kaotic/WikiBackup --Kaotic 09:51, 21 August 2008 (UTC)
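
For readers who cannot follow the links, here is a minimal sketch of the idea, not the script referred to above: append a $wgReadOnly line to LocalSettings.php before the dump and remove it afterwards. The function names and the marker comments are invented for this example, and it assumes LocalSettings.php does not end with a closing ?> tag (otherwise the appended lines would not be read as PHP):

```shell
# Append a clearly marked read-only block to LocalSettings.php.
set_read_only() {
    settings="$1"
    printf '%s\n' \
        '# BEGIN backup-readonly' \
        "\$wgReadOnly = 'Backup in progress; editing is temporarily disabled.';" \
        '# END backup-readonly' >> "$settings"
}

# Remove the marked block again, leaving the rest of the file untouched.
clear_read_only() {
    settings="$1"
    sed '/^# BEGIN backup-readonly$/,/^# END backup-readonly$/d' "$settings" > "$settings.tmp" &&
        cat "$settings.tmp" > "$settings" &&
        rm -f "$settings.tmp"
}

# Typical use around a dump (commented out; the paths are placeholders):
# set_read_only /path/to/wiki/LocalSettings.php
# ... run the database dump here ...
# clear_read_only /path/to/wiki/LocalSettings.php
```

Writing the cleaned file back with cat rather than mv keeps the original file's owner and permissions.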

alternative cronjob command

Apparently I don't have rights to make changes on the page .... Strange: you can post, and afterwards everything has disappeared....

If you don't have the nice program, here is an alternative cronjob:

/usr/bin/mysqldump -u [username] --password=[password] [databasename] | /bin/gzip > [databasename].gz
  • [username] - this is your database username
  • [password] - this is the password for your database
  • [databasename] - the name of your database

don't forget to remove the []

You can use other names for the file instead of [databasename].gz and/or put a number before it.

If you use a different number for each cronjob, you can schedule them (for instance, one cronjob for each day of the week, named 1[databasename], 2[databasename], etc.). For uploading the .gz file with the database manager (in my situation DirectAdmin) the name of the file is not critical, but it has to be a .gz file. --Livingtale 13:33, 22 September 2008 (UTC)
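
Instead of seven separate cronjobs, the weekday number can be computed when the job runs. A small sketch of the naming scheme described above; the helper `dump_name` and the database name `wikidb` are placeholders:

```shell
# Build the dump file name for a weekday number (1 = Monday ... 7 = Sunday).
dump_name() {
    day="$1"
    db="$2"
    printf '%s%s.gz\n' "$day" "$db"
}

# `date +%u` prints today's ISO weekday number, so the name for today is:
# dump_name "$(date +%u)" wikidb

# Combined with the command above (commented out; needs a MySQL server):
# /usr/bin/mysqldump -u [username] --password=[password] wikidb \
#     | /bin/gzip > "$(dump_name "$(date +%u)" wikidb)"
```

Each weekday's file overwrites the one from the previous week, so seven backups are kept at any time. If the command is placed directly in a crontab line rather than in a script, note that cron treats an unescaped % as a line break, so it has to be written as \%.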

Files list

The section about file system should contain a comprehensive list of files and directories to back up, or link to such a list. --Florent Georges 00:13, 17 November 2008 (UTC)

2009

Charset problem after switching to wgDBmysql5 = true

Hello,

After years, we finally solved our charset issue and have been able to switch back to "wgDBmysql5 = false".

One of our symptoms was that some special characters (like "φ") were silently corrupted into literal question marks (?).

How we resolved it is documented here in English and in French.

I hope this will help!

Jean-Luc

XML backup/restore + database backup/restore, or one or the other?

I'm confused: should I back up (and restore) both the XML dump and the SQL database (MySQL), or just one or the other?

As per the article, an XML dump does not include site-related data (user accounts, image metadata, logs, etc), so you'd be better off performing an SQL Dump. In a discussion about MediaWiki backups, I'd describe an XML dump as more of an 'Export' and in this sense it can be used as a fallback to save off the rendition of wiki content (with or without historical revisions), but it is not capable of restoring your MediaWiki installation to a running state on a new server, as it existed previously. -- Gth-au 03:44, 3 November 2009 (UTC)

crontab?

Since the manual is meant for non-experts too, a word like crontab maybe shouldn't be used, or should be defined clearly. Personally I have no idea what a crontab is.

The word is now linked to the "cron" Wikipedia article. —Emufarmers(T|C) 05:00, 16 May 2009 (UTC)
The explanation of cron is useful. However, while a simple backup may be attempted by non-experts, investigations should be done by someone who is at least familiar with your environment's operating system, database engine, webserver and MediaWiki installation. -- Gth-au 03:44, 3 November 2009 (UTC)

Maintenance / Optimisation / or normal part of Backup?

Can the reference to data that need not reside in the wiki backup content be clarified? Specifically, this list discussion linked to from the article mentions content that could be rebuilt upon restoration, without explaining how that rebuild would be performed, nor how long that may typically take in a small/large wiki implementation. Comments are also made that during the rebuild period the user experience would be curtailed in some areas (search, whatlinkshere, category views; any other areas?) - would users see a warning that the wiki is undergoing a rebuild and to come back later? What happens to edits made during the rebuild process? I see the reduction of backup data as a useful method of achieving faster backups and restoration (thus faster verification of backups, too), but I'm curious for more detail. Conversely, if the suggestion has no merit and is purely an advanced topic for a possible future feature, perhaps it shouldn't be presented in the article. --Gth-au 04:30, 3 November 2009 (UTC)

Empirical Backup Procedure Needed

The backup procedure described in this article is full of vague statements and suggestions that 'might' be necessary or desirable. Front and center there should be a process for hosted installations to back up files (FTP) and database (SQL). Then, separately, expand on other options as differing methods to achieve the same goal, such as: if the wiki owner has filesystem access, then that enables more than just FTP to back up the files; and using a specific database engine toolset enables other database backup options, and so on. I'm interested in assisting in improving this area of the documentation, but is this the appropriate place to brainstorm it out? Is there a beta discussion area to present test backup results etc. that would be more appropriate? -- Gth-au 03:44, 3 November 2009 (UTC)

Well said, I 2nd this request --SomaticJourney 11:43, 19 February 2010 (UTC)
I agree as well. This page is all over the place. Just give me two examples, one for Windows and one for other systems. JB
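
As a starting point for such a procedure on systems with shell access, here is a sketch of the file half of a backup. It archives only the two items named earlier on this page as custom by default (LocalSettings.php and the images directory); extensions and skins would have to be added where they are customized. The function name `backup_files` and all paths are placeholders:

```shell
# Archive the customized files of a wiki into a gzip-compressed tar file.
# Usage: backup_files WIKI_DIR ARCHIVE
backup_files() {
    wiki_dir="$1"
    archive="$2"
    # -C switches into the wiki directory so the archive holds relative paths.
    tar -czf "$archive" -C "$wiki_dir" LocalSettings.php images
}

# Example (commented out; the paths are placeholders):
# backup_files /path/to/wiki /path/to/backups/wiki-files.tar.gz

# The database half, using the mysqldump form quoted elsewhere on this page:
# /usr/bin/mysqldump -u [username] --password=[password] [databasename] \
#     | /bin/gzip > [databasename].gz
```

Listing the archive afterwards with tar -tzf is a quick way to confirm what was captured.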

Empirical Backup Verification Needed

I saw an earlier suggestion for this (see top), but it degenerated into the inherited 'talk vs. discussion' issue. The need for users/admins to be confident their backup has worked is still a clear requirement and a very valuable feature to aim for, IMHO. As the various backup methods resolve down to files, I'd posit that there are two levels of verification desired: (1) confirmation that none of the backup steps ended in error; and (2) an expanded test to prove the content from the backup repository can be restored to a working system, possibly with some kind of verification tool between the original site and the other. Item (1) can be confirmed with cross-checks such as file counts (per directory & grand total), database row counts (per table & grand total), and log files that record the tasks executed, their actions and their return codes. Item (2) is more involved, but it would be a desirable goal - perhaps more suitable for a page like Manual:Restoring a wiki though? -- Gth-au 03:44, 3 November 2009 (UTC)

Yes, very important need. I agree Gth-au --SomaticJourney 11:46, 19 February 2010 (UTC)
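
The file-count cross-check from item (1) can be sketched in a few lines of shell. This only illustrates the idea: equal counts do not prove equal contents, directory names containing newlines are not handled, and the function names are invented here. It assumes a find that supports -maxdepth (GNU and BSD find do):

```shell
# Print "<count> <directory>" for every directory below a root,
# followed by a grand total.
count_files() {
    root="$1"
    (
        cd "$root" || exit 1
        find . -type d | sort | while read -r dir; do
            n=$(( $(find "$dir" -maxdepth 1 -type f | wc -l) ))
            printf '%d %s\n' "$n" "$dir"
        done
        printf '%d TOTAL\n' "$(( $(find . -type f | wc -l) ))"
    )
}

# Compare the per-directory counts of an original tree and a restored copy.
verify_copy() {
    if [ "$(count_files "$1")" = "$(count_files "$2")" ]; then
        echo "file counts match"
    else
        echo "file counts differ"
        return 1
    fi
}

# Example: verify_copy /path/to/wiki /path/to/restored-wiki
```

A stricter variant would compare checksums instead of counts; the structure of the comparison stays the same.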

Tips for Shared Hosting wikis

For those of us who use GoDaddy and other shared hosting sites (where there are issues with permissions), what is the best possible method to back up the wiki? --SomaticJourney 13:04, 19 February 2010 (UTC)

about Backing up the wiki from public caches...

Someone posted about Backing up the wiki from public caches... (added by --Diego Grez return fire 02:51, 26 May 2010 (UTC)) (content restored to Manual:Restoring wiki code from cached HTML)

You haven't explained why you removed this content, so I'm going to restore it. Then, you can add your comments on the discussion page. Jdpipe 08:09, 26 May 2010 (UTC)
Because it should be merged there, not on another page anyway. --Diego Grez return fire 18:00, 26 May 2010 (UTC)
My content is on a different topic to 'proper' backups... IMHO instead of deleting the content, you should have issued a 'merge request'. 150.203.43.41 02:06, 27 May 2010 (UTC)
New discussion is at Manual talk:Restoring wiki code from cached HTML Jdpipe 02:33, 27 May 2010 (UTC)

XML dump... images?[edit]

It is not clear from the main page whether the XML dump includes the image files. Maybe it should be explicitly stated if the images need to be handled separately? Jdpipe 08:22, 26 May 2010 (UTC)

2010[edit]

Background on Latin-1 to UTF-8 Conversion and Character Set Problems[edit]

Problems with the character set (i.e., special characters like umlauts not working) appear to be widespread when converting from latin-1 to utf-8. The tips on the main page sure helped me a lot. However, to solve my problems I needed more background information. Here I report what I found out.

An encoding (also inaccurately called a character set) describes how a character is represented as one or more bytes. The latin-1 encoding has 256 characters, thus it always uses exactly one byte per character. This does not support the characters of many languages existing on Earth. The utf-8 encoding supports all characters of all languages practically used. It uses one to four bytes per character (MySQL's legacy "utf8" character set stores only the one- to three-byte subset). Utf-8 encodes the ASCII characters (i.e., A to Z, a to z and the most common punctuation characters) in one byte, and it uses the same value for this as in the latin-1 encoding.
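To make the byte counts concrete, here is a small shell sketch (my own illustration, not from the main page). The helper name count_bytes is made up; the characters are given as octal byte escapes so nothing depends on the terminal's encoding.

```shell
#!/bin/sh
# Count the bytes arriving on standard input.
count_bytes() {
    wc -c | tr -d ' '
}

# Characters are written as octal byte escapes so the result does not
# depend on the encoding of the terminal or of this script file.
printf 'A'        | count_bytes   # ASCII "A": 1 byte in both encodings
printf '\374'     | count_bytes   # "ü" in latin-1: 1 byte (0xFC)
printf '\303\274' | count_bytes   # "ü" in utf-8: 2 bytes (0xC3 0xBC)
```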

Mediawiki uses the utf-8 encoding, thus allowing all special characters to be used in a wiki page. Internally, Mediawiki stores a page as a string of bytes in its data base. Mediawiki (at least as of version 1.16.0) does not convert the encoding in any way when storing a page in its data base.

The data base must accept a string of bytes from Mediawiki when storing a page, and the data base must return the exact same string when retrieving the page. It does not matter what the data base thinks the encoding of the page is, as long as it returns the same string that was stored. Therefore, Mediawiki can have a page in utf-8 encoding, and it can store it in a data base which thinks the string are characters encoded in latin-1. And historically, exactly this was necessary to do when no data base was available that supported utf-8.

The problems started when an updated MySQL data base did not return the same string of bytes on retrieval. In particular, this can happen after backing up and restoring a MySQL data base. The reasons for this behaviour are as follows.

One popular way of backing up a MySQL data base is to use the program mysqldump. It generates SQL code that describes the contents of the data base. If you feed this SQL code back into a data base, for example by using the code as the standard input for the program mysql, then the data base contents will be recreated.

One problem with the mysqldump/mysql approach is that the SQL code of course contains the special characters from the wiki pages, but that the SQL code does not specify the encoding used for wiki page table content. (To be more precise, this is true only for tables using the InnoDB engine. Tables using the MyISAM engine get a pure ASCII representation immune to encoding problems.) Therefore, backup/restore work only if both sides use the same encoding.

A harmless side effect is that the generated SQL code can contain broken special characters even if everything works. This happens when the data base internally stores the wiki pages in latin-1 encoding, but the default encoding for talking to data base clients like mysqldump is utf-8. Then the MySQL server "converts" the bytes in the database from latin-1 to utf-8 when dumping and converts them back when restoring. For example, an umlaut character, which is represented in utf-8 by two bytes, gets a four-byte representation in the SQL code, which no editor can display correctly. This works because every byte can be interpreted as a latin-1 character (even if it really is one of the several bytes of an utf-8 character).
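The four-byte effect can be reproduced outside MySQL with iconv. This is only a sketch under the assumption that ISO-8859-1 is a fair stand-in for what MySQL calls latin1 (MySQL's latin1 is closer to Windows-1252 for some byte values); the function names are made up for illustration.

```shell
#!/bin/sh
# Treat utf-8 bytes on stdin as latin-1 and convert them "to utf-8" again.
misread_as_latin1() {
    iconv -f ISO-8859-1 -t UTF-8
}

# Reverse step: convert back so the original bytes reappear.
undo_misread() {
    iconv -f UTF-8 -t ISO-8859-1
}

# Print stdin as hex bytes without spaces or newlines.
hex() {
    od -An -tx1 | tr -d ' \n'
}

printf '\303\274' | misread_as_latin1 | hex; echo                 # c383c2bc
printf '\303\274' | misread_as_latin1 | undo_misread | hex; echo  # c3bc
```

The first line shows the two utf-8 bytes of "ü" growing to four bytes; the second shows that the round trip gives the original two bytes back, which is why the dump still restores correctly.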

The change from MySQL version 5.0 to version 5.1 included a change of the default encoding from latin-1 to utf-8.

You can specify the encoding explicitly when you need to move your data base contents to a data base with a different default encoding:

mysqldump --user=root --password --default-character-set=latin1 --skip-set-charset wikidb > mywiki.sql

(This assumes that your data base is named "wikidb", and that the internal representation is set to latin-1. As a consequence, the MySQL server returns the data as-is, i.e. without any conversion.) You can read mywiki.sql into another data base, which uses utf-8 by default, by typing:

mysql --user=root --password --default-character-set=latin1 wikidb < mywiki.sql

(Again, this reads the data as-is, because MySQL thinks that no conversion is necessary, since the MySQL code specifies that the individual fields shall be stored in latin-1 representation.)

However, having stored the data unmodified in the utf-8 data base is not sufficient. When the data base server is asked to retrieve a wiki page, it will notice that it is stored in latin-1 encoding, while it talks to its clients in utf-8. Therefore, the data base server will "convert" the data, thus breaking it on delivery.

You can fix this problem by changing the specifications for the internal encoding of the data that are written into the SQL code of a data base dump. Editing the SQL code manually would be tedious and error-prone. A better solution is to use an automated stream editor like sed, which comes with all Linux/Unix distributions (and with Cygwin on Windows).

The stream editor must find all occurrences of latin-1 data base field definitions and replace them. You could choose a utf-8 encoding, but I chose to mark the fields as "binary", i.e. without a specific encoding. The reason is that this is what Mediawiki really puts into the data base. The command line for this is:

sed < mywiki.sql > mywiki-patched.sql \
    -e 's/character set latin1 collate latin1_bin/binary/gi'

Additionally, you should change the default encoding for each table from latin-1 to utf-8. Therefore, you extend the above command line like this:

sed < mywiki.sql > mywiki-patched.sql \
    -e 's/character set latin1 collate latin1_bin/binary/gi' \
    -e 's/CHARSET=latin1/CHARSET=utf8/gi'

But you should make some more modifications. As explained on the main page, there is a restriction on the length of sort keys that might be violated when a wiki page character is converted from latin-1 to utf-8. (I did not really understand this particular aspect, since there should not be any actual conversion when things are done as described by me above.) If you don't experience the problem, you might skip the fix, but I suppose it does not hurt to shorten the sort key limit in any case. You can do all substitutions using the following command line:

sed < mywiki.sql > mywiki-patched.sql \
    -e 's/character set latin1 collate latin1_bin/binary/gi' \
    -e 's/CHARSET=latin1/CHARSET=utf8/gi' \
    -e 's/`cl_sortkey` varchar([0-9]*)/`cl_sortkey` varchar(70)/gi'

(Note that the regular expression [0-9]* matches a string of digits of any length. This subsumes the three separate substitutions given on the main page.)
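To see what the substitutions do without touching a real dump, they can be tried on a single sample line. The sample column definition below is made up for illustration, and patch_dump is just a name I chose for the wrapper; the I flag (case-insensitive matching) is a GNU sed extension.

```shell
#!/bin/sh
# Hypothetical one-line sample of a mysqldump column definition.
sample='  `cl_sortkey` varchar(255) character set latin1 collate latin1_bin NOT NULL,'

# Apply the three substitutions to whatever arrives on stdin.
patch_dump() {
    sed -e 's/character set latin1 collate latin1_bin/binary/gi' \
        -e 's/CHARSET=latin1/CHARSET=utf8/gi' \
        -e 's/`cl_sortkey` varchar([0-9]*)/`cl_sortkey` varchar(70)/gi'
}

printf '%s\n' "$sample" | patch_dump
# prints:   `cl_sortkey` varchar(70) binary NOT NULL,
```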

Finally, the main page says that the content of the table named math could cause problems and can be deleted, since it is a cache only and not needed when upgrading. The complete command line including this deletion of math is:

sed < mywiki.sql > mywiki-patched.sql \
    -e 's/character set latin1 collate latin1_bin/binary/gi' \
    -e 's/CHARSET=latin1/CHARSET=utf8/gi' \
    -e 's/`cl_sortkey` varchar([0-9]*)/`cl_sortkey` varchar(70)/gi' \
    -e '/^INSERT INTO `math/d'

Please note that the example given there (and here) assumes that you have defined an empty table name prefix for your Mediawiki data base tables. If not, you have to prepend that prefix. For example, if your prefix is mw_, you have to write mw_math.
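For a wiki with a table prefix, the deletion step might be parametrized like this. It is a sketch only: the PREFIX variable and the function name drop_math_rows are my own, and the prefix is assumed to contain nothing but letters, digits and underscores. Only the table name needs the prefix; column names such as cl_sortkey are not prefixed.

```shell
#!/bin/sh
# PREFIX is the MediaWiki table prefix; "mw_" is only an example value.
PREFIX=${PREFIX:-mw_}

# Delete the INSERT lines for the (prefixed) math cache table from stdin.
drop_math_rows() {
    sed -e "/^INSERT INTO \`${PREFIX}math\`/d"
}

# Example:
#   drop_math_rows < mywiki.sql > mywiki-nomath.sql
```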

I hope this background information is helpful. Please correct any mistakes or omissions. If it helps, someone could link this material from the main page. (It is probably too long to be put there directly.)

Bigoak 09:39, 27 August 2010 (UTC)

Ubuntu 10.10 - Step by Step Instructions[edit]

Details[edit]

Howdy folks. Ubuntu has some funky restrictions with it (like cPanel) but it was the distro of choice at the time of launch. I thought I'd drop some step-by-step instructions here for the next Ubuntu user. Hope it helps someone; this is the procedure I've followed so far (my Wiki is only used by about 20 people, so it's comfortable for me to work with it manually)

Assumptions:

- You have installed PhpMyAdmin

- Your wiki is running as expected

- You are using Ubuntu 10.10 (should be fine on earlier versions)

- You are using the default directories

- You are comfortable with PHP and Ubuntu


Notes:

- I tested this myself

- Backed up and deleted my SQL tables

- Loaded the backup to make sure it works


Preparing the Wiki[edit]

Step 1 - Turn Wiki to Read-Only

1a - Launch Terminal
1b - Enter: gksu nautilus
1c - Go to /var/www/LocalSettings.php
1d - Add the flag: $wgReadOnly = 'Site Maintenance';
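For anyone who prefers the terminal over nautilus, the same change could be sketched like this. The function name set_read_only is hypothetical; the sketch assumes LocalSettings.php does not end with a closing ?> tag and that the reason text contains no single quotes. It keeps a .bak copy first.

```shell
#!/bin/sh
# Append a $wgReadOnly line to a LocalSettings.php file, keeping a backup copy.
# Usage (run with sufficient permissions):
#   set_read_only /var/www/LocalSettings.php 'Site Maintenance'
set_read_only() {
    settings_file=$1
    reason=$2
    cp "$settings_file" "$settings_file.bak" &&
    printf "\$wgReadOnly = '%s';\n" "$reason" >> "$settings_file"
}
```

Removing the added line (or restoring the .bak copy) makes the wiki writable again.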


Backup MySQL[edit]

See also: Accessing PhpMyAdmin - [2]

Step 1 - Browse to http://hostname/phpmyadmin

Step 2 - User name and password are needed for the PHP My Admin logon

Step 3 - Use the database you created for the Wiki

Note: This logon information can be found in your LocalSettings.php

See also: Backing up the Database - [3]

Step 4 - Select Export

Step 5 - Make sure all items under Export are highlighted

Step 6 - Make sure Structure is highlighted (important to maintain the table structure)

Step 7 - Make sure that Add DROP TABLE is checked (ease of import - tells MySQL to delete existing references when importing)

Step 8 - Make sure Data is checked

Step 9 - Select "zipped"

Step 10 - On the bottom right select GO

Step 11 - Save backup on the backup machine


Restore MySQL[edit]

See also: Restoring the Database - [4]

Step 1 - Browse to http://hostname/phpmyadmin

Step 2 - User name and password are needed for the PHP My Admin logon

Step 3 - Use the database you created for the Wiki

Note: This logon information can be found in your LocalSettings.php

Step 4 - Select Structure > localhost > Your_Table

Step 5 - Select CheckAll

Step 6 - From the drop down select Drop > Ok (the old/corrupt table is now gone)

Note: If you try to access the Wiki, it should say: MediaWiki internal error. Exception caught inside exception handler.

Step 7 - Click on Import, select Browse, pick your SQL file and import it.

Note: You will need to uncompress the SQL file from the .ZIP

Step 8 - Press Go

Step 9 - Verify your Wiki is restored


Backup File System[edit]

Step 1 - Launch: gksu nautilus

Step 2 - Browse to /var/www/

Step 3 - Right click on folder mediawiki and select Compress

Step 4 - Save the file in the web server's root directory. The backup can then be downloaded to the backup machine through a browser by appending the filename to the URL

Example: Hostname = mediawiki

Download: http://mediawiki/mediawiki_backup_30-12-2010.tar.gz
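A terminal alternative to the Compress step could look like the sketch below. The function name backup_wiki_files and the example paths are hypothetical. Unlike the steps above, it writes the archive to a directory of your choice, which can be outside the web root so that LocalSettings.php (with the database password) is not downloadable by everyone.

```shell
#!/bin/sh
# Archive a wiki directory into a dated .tar.gz and list it back as a sanity check.
# Prints the archive path on success.
# Usage: backup_wiki_files /var/www mediawiki /home/user/backups
backup_wiki_files() {
    parent_dir=$1
    wiki_dir=$2
    dest_dir=$3
    archive="$dest_dir/${wiki_dir}_backup_$(date +%d-%m-%Y).tar.gz"
    tar -czf "$archive" -C "$parent_dir" "$wiki_dir" &&
    tar -tzf "$archive" > /dev/null &&
    echo "$archive"
}
```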


Restore File System[edit]

Step 1 - Launch: gksu nautilus

Step 2 - Move the faulty mediawiki folder out of /var/www

Step 3 - Extract the backup to /var/www/mediawiki

Thanks to SpiralOfYarn for the handy links.

Cheers, KermEd 15:28, 30 December 2010 (UTC)

Avoid phpmyadmin 'gzipped'[edit]

As I commented here, I found phpMyAdmin would fail silently halfway through exporting if I set compression to 'gzipped'. I notice the above advice mentions 'zipped' format, which may suffer from the same problem. The problem seemed to crop up during export of the binary BLOB fields of the text table. In my case this was several gigabytes of data, but it seemed to fail fairly near the beginning of that table. Exporting without compression worked fine. -- Harry Wood (talk) 02:46, 8 October 2012 (UTC)
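One cheap sanity check for a gzipped export is gzip's built-in integrity test. This is a sketch with a made-up function name (check_gzip); it catches a file that was cut off mid-stream, but it would not notice an export that stopped early and still closed the gzip stream properly.

```shell
#!/bin/sh
# Return 0 if the gzip file decompresses cleanly, 1 otherwise.
check_gzip() {
    if gzip -t "$1" 2>/dev/null; then
        echo "OK: $1"
    else
        echo "CORRUPT OR TRUNCATED: $1"
        return 1
    fi
}

# Example (hypothetical file name):
#   check_gzip wikidb.sql.gz
```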

Clarification on using Charco needed[edit]

Step 2 of the conversion under Windows says to use Charco to convert the database. That's great, but Charco doesn't have a single 'latin1' option. It has 'DOSLatin1', 'ISOLatin1', and 'WindowsLatin1'. I suppose I could take a guess and go with 'WindowsLatin1' since the wiki was created on a Windows system, but I think I'd rather be safe than sorry, so I decided to ask first which of the 3 would be the one to use. --Korby (talk) 11:22, 17 October 2012 (UTC)

Tables[edit]

Is section Tables still relevant? Maybe it would be useful to have an explanation of which character sets to look out for, by MediaWiki version. When installing 1.19.2 -- 1.20.2 there is no option for latin1, it's utf8 or a default of binary. --Robkam (talk) 22:16, 22 December 2012 (UTC)

Character sets[edit]

I've moved some overly detailed information here that used to be on the main page. Graham87 (talk) 14:05, 19 June 2013 (UTC)

Character set[edit]

Warning: In some common configurations of MySQL 4.1 and later, mysqldump can corrupt MediaWiki's stored text. If your database's character set is set to "latin1" rather than "UTF-8", mysqldump in 4.1+ will apply a character set conversion step which can corrupt text containing non-English characters as well as punctuation like "smart quotes" and long dashes used in English text.

You can see which character set your tables are using with a mysql statement like SHOW CREATE TABLE text; (including the semicolon). The last line will include a DEFAULT CHARSET clause.
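Pulling the charset out of that last line can be scripted. The sketch below uses a made-up helper name (table_charset) and a sed pattern that simply looks for the DEFAULT CHARSET clause in whatever text it is given; the mysql invocation in the comment assumes a database named wikidb.

```shell
#!/bin/sh
# Read SHOW CREATE TABLE output on stdin and print the table's default charset.
# Prints nothing if no DEFAULT CHARSET clause is present.
table_charset() {
    sed -n 's/.*DEFAULT CHARSET=\([A-Za-z0-9_]*\).*/\1/p'
}

# Example with live data (requires a running server and credentials):
#   mysql --user=root --password -e 'SHOW CREATE TABLE text;' wikidb | table_charset
```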

If the last line does not include a DEFAULT CHARSET clause then there is another way if you know that nobody has changed the character set of the database server since it was installed and the wiki's database was created using the default character set of the database. The STATUS command displays the database server's default character set next to Server characterset:. Here is an example output:

mysql> status
- - - - - - - - -
mysql  Ver 12.22 Distrib 4.0.20a, for Win95/Win98 (i32)

Connection id:          13601
Current database:
Current user:           root@localhost
SSL:                    Not in use
Server version:         4.0.20a-nt
Protocol version:       10
Connection:             localhost via TCP/IP
Client characterset:    latin1
Server characterset:    latin1
TCP port:               3306
Uptime:                 27 days 4 hours 58 min 26 sec

Use the option --default-character-set=latin1 on the mysqldump command line to avoid the conversion if you find it set to "latin1".

Like this:

nice -n 19 mysqldump -u $USER -p$PASSWORD --default-character-set=$CHARSET $DATABASE -c | nice -n 19 gzip -9 > ~/backup/wiki-sql-$(date '+%a').sql.gz

Also one can try --default-character-set=binary . “Convert latin1 to UTF-8 in MySQL” on Gentoo Linux Wiki has more information.

Latin-1 to UTF-8 conversion[edit]

In the following I intentionally use different input and output file names for commands using sed because the -i (inplace) option of sed throws problems on very big dumps. The described procedure was used several times and works 100% reliably. The steps do not change your existing database. You can use the old wiki with the old database until your new wiki runs with the new database, the UTF-8 copy clone of the old one. This section is contributed and updated by --Wikinaut 10:31, 18 February 2010 (UTC). Feedback is welcome.


If you want to upgrade a rather old MediaWiki installation from Latin-1 to UTF-8 (in my example from MediaWiki 1.5 (2004) to 1.15.1 (2009)), which might be tricky depending on your operating system and MySQL settings, perform the following steps. They were found in the article Convert a MySQL DB from latin1 to UTF8 and further adapted to MediaWiki specifics (DBNAME is the name of your wiki database):

mysqldump -u root -p --opt --default-character-set=latin1 --skip-set-charset DBNAME > DBNAME.sql

Then use sed to change character settings latin1 to utf8:

 sed  -e 's/character set latin1 collate latin1_bin/character set utf8 collate utf8_bin/g' -e 's/CHARSET=latin1/CHARSET=utf8/g' DBNAME.sql > DBNAME2.sql

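Every character in MySQL's utf8 character set needs up to 3 bytes, so key lengths that were fine under latin1 (for example on cl_sortkey) may have to be shortened.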
Relevant sources:

A further problem which prevented reimporting the database was the math table (ERROR line 389: Duplicate entry '' for key 1 when trying to import the mysqldump). I solved it by simply deleting the math table content, as this is only a cache and need not be imported when upgrading.

There may also be issues with custom tables introduced by extensions which may prevent the sed command from changing the information as required for them too. As a result an error like e.g. ERROR 1253 (42000) at line 1198: COLLATION 'latin1_general_ci' is not valid for CHARACTER SET 'utf8' may occur on importing the database dump.

sed -e 's/`cl_sortkey` varchar(255)/`cl_sortkey` varchar(70)/gi' DBNAME2.sql > DBNAME21.sql
sed -e 's/`cl_sortkey` varchar(86)/`cl_sortkey` varchar(70)/gi' DBNAME21.sql > DBNAME22.sql
sed -e 's/`cl_sortkey`(128)/`cl_sortkey`(70)/gi' DBNAME22.sql > DBNAME23.sql
sed -e '/^INSERT INTO `math/d' DBNAME23.sql > DBNAME3.sql
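Before importing, a quick scan for leftover latin1 references in the table definitions can catch the extension-table case early. This is only a sketch; check_no_latin1 is a made-up name, and INSERT lines are skipped so that page text mentioning "latin1" does not trigger a false alarm.

```shell
#!/bin/sh
# Print (with line numbers) any remaining latin1 references outside INSERT lines.
# Returns 0 when none are left, 1 when at least one is found.
check_no_latin1() {
    if grep -n -i 'latin1' "$1" | grep -v '^[0-9]*:INSERT INTO'; then
        return 1
    fi
    return 0
}

# Example:
#   check_no_latin1 DBNAME3.sql && echo "no latin1 definitions left"
```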

From here I then created a new database DBNEW and then imported the dumpfile

mysql -u root -p -e "create database DBNEW"
mysql -u root -p --default-character-set=utf8 DBNEW < DBNAME3.sql

Now start a fresh MediaWiki installation and use your new wiki database name DBNEW - actually the UTF-8 converted copy of your untouched old DBNAME wiki - and the database copy will be automatically upgraded to the recent MediaWiki database scheme. Several successful conversions from MediaWiki 1.5 to MediaWiki 1.15.1 under PHP 5.2.12 (apache2handler) and MySQL 4.1.13 have been made.

Latin-1 to UTF-8 conversion under Windows[edit]

  1. Dump your Database as usual.
  2. Convert your Database using the character set conversion utility Charco
  3. Replace all occurrences of latin1 with utf8 inside the dump.
  4. Import the dump into a new DB or overwrite the old.
  5. Ready

Tested under Windows XP. Mediawiki 1.13.2 dumped under EasyPHP 1.8.0.1. Converted with Charco 0.8.1. Imported to XAMPP 1.7.3. Updated to Mediawiki 1.15.1.

Latin-1 to UTF-8 conversion under Mac[edit]

  1. First export your Database as usual, separated into schema and data. You can use the terminal command mysqldump, which the official installer places in /usr/local/mysql/. Note that in the following lines, the lack of spaces between -u and username and -p and password is deliberate:
    • ./mysqldump --default-character-set=latin1 --skip-set-charset -d -uuser -ppassword DBNAME > ~/db_schema.sql
    • ./mysqldump --default-character-set=latin1 --skip-set-charset -t -uuser -ppassword DBNAME > ~/db_data.sql
  2. The database exports are now in your personal folder. Convert both exports with Charco from ISOlatin1 to UTF-8. Append "_utf8" to the output file names and fix the .txt extension that Charco enforces back to .sql.
  3. Open the file ~/db_schema_utf8.sql with Text Editor and replace each "DEFAULT CHARSET=latin1" phrase with "DEFAULT CHARSET=utf8"
  4. Make a new database using Sequel Pro, with encoding "UTF-8 Unicode (utf8)".
    • Import the ~/db_schema_utf8.sql file into your new database
    • Import the ~/db_data_utf8.sql file into your new database
    • ensure that your wiki user has access to the new database by adding a relevant row to the db table of the mysql system database
  5. Change the variable $wgDBname in your LocalSettings.php to reflect the name of the new database. Then test if everything works. If not, flip back to the old database and try a different method.
  6. (Optional) Delete the old database, and (also optional) rename the new database to the old database and revert the change in $wgDBname.

This sequence was adapted from Khelll's Blog, and used for Mediawiki 1.19.2 and MySQL 5.1.57. It will fix encodings that already show up as garbled under an updated wiki installation as well.

Repairing corrupted character sets[edit]

In case your database's character set got corrupted (see warning above), an easy way to fix the corrupted characters and remedy the situation for future backups has been posted in this source

Directly changing all latin1-encoded columns to UTF-8 won't help, as MySQL will just transform the erroneous characters directly. The remedy is to change the wrongly encoded latin1 string type (char/varchar/TEXT) into a binary type (binary/varbinary/BLOB). A conversion into a UTF8-encoded string type (char/varchar/TEXT) will then fix all your previously erroneous characters to their proper representation.

In short: latin1 char/varchar/TEXT -> binary/varbinary/BLOB -> UTF8 char/varchar/TEXT

Also don't forget to change the default charset for your database and the single tables to UTF-8, so your character sets won't get corrupted again.
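As a rough illustration of the two-step type change, the following shell sketch just prints the pair of ALTER TABLE statements for one column. The function name repair_sql and the table and column names are placeholders, the column is assumed to be of type TEXT, and any NOT NULL or DEFAULT attributes would have to be repeated by hand in both statements. Try it on a copy of the database first.

```shell
#!/bin/sh
# Print the two ALTER TABLE statements for one TEXT column:
# first to a binary type, then to a utf8 string type.
# Arguments: table name, column name (placeholders in the example below).
repair_sql() {
    printf 'ALTER TABLE `%s` MODIFY `%s` BLOB;\n' "$1" "$2"
    printf 'ALTER TABLE `%s` MODIFY `%s` TEXT CHARACTER SET utf8;\n' "$1" "$2"
}

repair_sql mytable mycolumn
```

The printed statements can be reviewed and then fed to the mysql client.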


Unclear instructions[edit]

"Other parameters might be useful such as ..." It would be far better if someone would explain exactly why these parameters would be useful.

How do I download a wiki from Special:Export?[edit]

I do a lot of work on a wiki. The guy who owns it hasn't been around for the past few years, and the website expires this November. How can I use Special:Export in October to download the entire wiki, so that I can recreate it if he turns out to have died or something? Edit: Or more regular backups -- I don't know when the hosting will run out. Banaticus (talk) 00:42, 5 August 2017 (UTC)

Whoops... well, using Special:Export you can export pages given a list of pages or categories. Get a list of pages first (looking at Special:AllPages) and paste them in the input textarea. Note, however, that it won't export images (only the file description pages, but not the files themselves), nor logs. There's a different approach, which will recreate all the possible details of the wiki on a new host, including logs, files and even deleted pages if you have a user with the required permissions: This is Grabbers. I'm now involved in moving an entire wiki from wikia outside of it, and I'm fixing some of those scripts. I'll update the repository when I finish (probably before the end of this month). Feel free to ping me if you need assistance with this. --Ciencia Al Poder (talk) 09:14, 7 August 2017 (UTC)