Manual:ImportDump.php/zh


 * Recommended method for general use, but slow for very big data sets. See "Importing English Wikipedia or other large wikis" below.

importDump.php is one of MediaWiki's maintenance scripts, located in the maintenance folder of your MediaWiki installation. It reads pages from an XML dump file, such as one produced by Special:Export, and saves them into the current wiki.

If you have shell access, you can call importDump.php from within the maintenance folder like this (add paths as necessary):

php importDump.php --conf ../LocalSettings.php /path_to/dumpfile.xml.gz --username-prefix=""

Here, /path_to/dumpfile.xml.gz is the name of the XML dump file. If the file is compressed and has a recognized file extension (such as .gz in the command above), it is decompressed automatically.

Afterwards, use importImages.php to import the images:

php importImages.php ../path_to/images

After running importDump.php, you may want to run rebuildrecentchanges.php in order to update the content of your Special:Recentchanges page.

If you imported a dump with the  parameter, you'll need to run rebuildall.php to populate all the links, templates and categories.

Description of operation
The script reports progress in 100-page increments (by default), along with the number of pages imported per second for each increment, so you can monitor its activity and confirm that it has not hung. It can take 30 seconds or more between increments.

The script is robust: it skips previously loaded pages rather than overwriting them, so it can pick up where it left off fairly quickly after being interrupted and restarted. It still displays progress increments while skipping, and these go by quickly.

Pages are imported with the timestamp of each edit preserved. As a result, if a page being imported is older than the existing page, the import only populates the page history; it does not replace the most recent revision with an older one. If that behavior is not desired, either delete the existing pages before the import, or edit them afterwards to revert to the last imported revision found in the page history.

The wiki is usable during the import.

At first the wiki looks odd, with most templates missing and many red links, but this improves as the import proceeds.

Examples
or

How to set up debug mode?
Use command line option.

How to make a dry run (no data added to the database)?
Use command line option

Failed to open stream
If you get the error "failed to open stream: No such file or directory", make sure that the specified file exists and that PHP has access to it.
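The check described above can be done before starting the import. The following is a minimal sketch, not part of MediaWiki: check_dump is a hypothetical helper name, and it tests access for the user running the shell, which is assumed to be the same user that runs PHP.

```shell
# check_dump FILE
# Hypothetical helper: reports whether FILE exists and is readable
# by the current user (assumed to be the user that runs php).
check_dump() {
    if [ ! -e "$1" ]; then
        echo "missing: $1"
        return 1
    elif [ ! -r "$1" ]; then
        echo "not readable: $1"
        return 1
    fi
    echo "ok: $1"
}
```

For example, `check_dump /path_to/dumpfile.xml.gz` prints a line starting with "ok:" only when the file can be opened for reading.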

Typed
roots@hello:~# php importImages.php /maps gif bmp PNG JPG GIF BMP

Error
> PHP Deprecated: Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/mcrypt.ini on line 1 in Unknown on line 0
> Could not open input file: importImages.php

Cause
Before running importImages.php, you first need to change directories to the maintenance folder, which contains the importImages.php maintenance script.
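As a rough illustration of this fix, the sketch below checks that a script is present in a given folder and prints the command to run from there. The function name and the example path are hypothetical; adjust the path to your own installation.

```shell
# command_from_maintenance DIR SCRIPT
# Hypothetical helper: prints the command that changes into DIR and runs
# SCRIPT there, or fails if SCRIPT is not in DIR.
command_from_maintenance() {
    if [ ! -f "$1/$2" ]; then
        echo "not found: $1/$2" >&2
        return 1
    fi
    echo "cd $1 && php $2"
}
```

Calling `command_from_maintenance /path_to/mediawiki/maintenance importImages.php` only prints the command; it does not run it.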

Error while running MAMP
DB connection error: No such file or directory (localhost)

Solution
Using specific database credentials

Importing English Wikipedia or other large wikis
For very large data sets, importDump.php may take a long time (days or weeks). Alternative methods exist that can be much faster for full site restoration.

If you can't get the other methods to work, here are some pointers for using importDump.php for importing large wikis, to reduce import time as much as possible...

Parallelizing the import
You could try running importDump.php multiple times simultaneously on the same dump, using the option ...

In an experiment on Ubuntu, the script was run on a decompressed dump multiple times simultaneously, in separate windows, using the --skip-to option. On a quad-core laptop, running the script in 4 windows sped up the import by a factor of 4. Each instance was given a --skip-to value a set number of pages apart from the next, and the import was checked from time to time so that each instance could be stopped before it caught up with another.

Note: Running multiple instances without the --skip-to parameter was not tried in this experiment, to avoid potential clashes. If you try this without --skip-to, or you let the instances catch up to each other, please post your findings here. In this experiment, 2 of the windows caught up, and no error messages resulted. The instances of the script appeared to be jumping past each other.

Using --skip-to differs from normal operation in that progress increments are not displayed during the skip; only the (blinking) cursor is shown. After a few minutes, the increment reports begin to display.
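The setup in the experiment above can be sketched as a small helper that prints one command per window. This is an illustration under assumptions: plan_instances is a hypothetical name, the gap between instances is a value you choose yourself (the figure used in the experiment is not given here), and the `--skip-to=N` spelling follows the `--username-prefix=""` style shown earlier. The helper only prints commands; it does not start any import.

```shell
# plan_instances DUMP INSTANCES GAP
# Hypothetical helper: prints one importDump.php command per instance,
# with each --skip-to offset GAP pages after the previous one.
plan_instances() {
    dump=$1
    instances=$2
    gap=$3
    i=0
    while [ "$i" -lt "$instances" ]; do
        offset=$((i * gap))
        if [ "$offset" -eq 0 ]; then
            # The first instance starts at the beginning of the dump.
            echo "php importDump.php $dump"
        else
            echo "php importDump.php --skip-to=$offset $dump"
        fi
        i=$((i + 1))
    done
}
```

Each printed command would be run in its own window. As noted above, every instance needs to be stopped by hand before it reaches the starting point of the next one.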

Data segmentation
It may be a good idea to segment the data first, with an XML splitter, before importing it in parallel. Then run importDump.php on each segment in a separate window, which would avoid potential clashes. (If you successfully split the dump so that it works in this process, please post how you did it here.)

Import the most useful namespaces first
To speed up import of the most useful parts of the wiki, use the   parameter. Import templates first, because articles without working templates look awful. Then import articles. Or, do both at the same time, in multiple windows, as described above, starting templates first, as they import faster and the articles window(s) won't catch up. Note: The main namespace doesn't have a prefix, and so it must be specified using a  . "Main" and "Article" fail to run and return errors.

Once complete, this will necessitate using  again to get the pages in all the other namespaces.

Estimating how long it will take
Before you can estimate how long an import will take, you've got to find out how many total pages are in the wiki you are importing. That is displayed at Special:Statistics in each wiki. As of March 2022, the English Wikipedia had over 55,000,000 pages, including all page types such as talk pages, redirects, etc, but not including pictures ("files").

To see how fast the import is going, go to the page Special:Statistics in the wiki you are importing into. Note the time and jot down the total pages. Then come back later and see by how much that number has changed. Convert that to pages per day, and then divide the total pages for the wiki you are importing by that figure, to see how many days the import will take.

For example, in the experiment mentioned above, which used parallelization, the total pages shown in Special:Statistics grew by about 1,000,000 pages per day. At that rate, importing the 55,000,000 pages of the English Wikipedia (as of March 2022, not including pictures) would take around 55 days.
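The estimate above is simple arithmetic, and can be sketched as a shell function. The function name and argument order are hypothetical; the two page counts are the readings taken from Special:Statistics, and the elapsed time between them is given in seconds.

```shell
# estimate_days FIRST_COUNT SECOND_COUNT ELAPSED_SECONDS TOTAL_PAGES
# Hypothetical helper: converts two Special:Statistics readings into
# pages per day, then divides the total page count by that rate.
estimate_days() {
    awk -v first="$1" -v second="$2" -v secs="$3" -v total="$4" 'BEGIN {
        if (secs <= 0 || second <= first) {
            print "n/a"   # no measurable progress between readings
            exit
        }
        per_day = (second - first) * 86400 / secs
        printf "%.1f\n", total / per_day
    }'
}
```

With the figures from the example, `estimate_days 1000000 2000000 86400 55000000` corresponds to 1,000,000 pages imported in one day and a total of 55,000,000 pages, which gives 55.0 days.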

Troubleshooting
See Also: Data dumps/ImportDump.php

If errors occur when importing files, it may be necessary to use the  option.