Manual:importDump.php
| MediaWiki file: importDump.php | |
|---|---|
| Location: | maintenance/ |
| Source code: | master • 1.45.3 • 1.44.5 • 1.43.8 |
| Classes: | BackupReader |
- Recommended method for general use, but slow for very large data sets. See Importing from the English Wikipedia or other large wikis below.
importDump.php is a maintenance script for importing XML dump files into an existing wiki. It reads the pages from an XML file produced by Special:Export or dumpBackup.php and saves them into the current wiki. It is one of MediaWiki's maintenance scripts and is located in the maintenance folder of the MediaWiki installation.
Usage
The script reports its ongoing progress in increments of 100 pages (by default), showing the number of pages and revisions imported per second for each increment, so you can monitor its activity and see that it has not hung. It can take 30 or more seconds between increments.
The script is robust: it skips past previously loaded pages rather than overwriting them, so it can pick up where it left off fairly quickly after being interrupted and restarted. It still displays progress increments while doing this, which go by quite fast.
Pages are imported preserving the timestamp of each edit. Because of this, if a page being imported is older than the existing page, it will only populate the page history; it won't replace the most recent revision with an older one. If that behavior is not desired, existing pages should be deleted prior to the import, or they will need to be edited afterwards, reverting to the last imported revision found in the page history.
The wiki is usable during the import.
The wiki looks odd with most of the templates missing and many red links, but it gets better as the import proceeds.
Examples
If you have shell access, you can call importDump.php from the command line (adjust paths as necessary):
| MediaWiki Version: | ≥ 1.40 |
# Run from the wiki's root directory (the one containing LocalSettings.php) ...
cd /path/to/wiki
php maintenance/run.php importDump --conf ./LocalSettings.php \
/path/to/dumpfile.xml.gz
| MediaWiki Version: | ≤ 1.39 |
# Run from the maintenance/ directory inside the wiki root ...
cd /path/to/wiki/maintenance
php importDump.php --conf ../LocalSettings.php /path/to/dumpfile.xml.gz
where dumpfile.xml.gz is the name of the XML dump file.
If the file is compressed and has a .gz or .bz2 file extension (but not .tar.gz or .tar.bz2), it is decompressed automatically.
Afterwards, use importImages.php to import images:
| MediaWiki Version: | ≥ 1.40 |
cd /path/to/wiki
php maintenance/run.php importImages --conf ./LocalSettings.php \
/path/to/images/
| MediaWiki Version: | ≤ 1.39 |
cd /path/to/wiki/maintenance
php importImages.php /path/to/images/
Use --no-updates for a faster import. Also note that the information in Import about merging histories, etc. also applies. After running importDump.php, you may want to run rebuildrecentchanges.php in order to update the content of your Special:RecentChanges page.
If you imported a dump with the --no-updates parameter, you'll need to run rebuildall.php to populate all the links, templates and categories.
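A typical post-import sequence might therefore look like this (a sketch, using the run.php entry point for MediaWiki ≥ 1.40; on older versions call the scripts in maintenance/ directly):
# Refresh Special:RecentChanges so it reflects the imported revisions
php maintenance/run.php rebuildrecentchanges
# Only needed if the dump was imported with --no-updates:
# repopulate the link, template and category tables
php maintenance/run.php rebuildall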
Options
| Option/Parameter | Description |
|---|---|
| --report | Reports position and speed after every n pages. |
| --namespaces | Import only the pages from namespaces in the given pipe-separated list of namespace names or namespace indexes. |
| --dry-run | Parses the input file without importing the pages. |
| --debug | Outputs verbose debug information. |
| --uploads | Process file upload data if included (experimental). |
| --no-updates | Disable link table updates. This is faster, but leaves the wiki in an inconsistent state; run rebuildall.php after the import to correct the link tables. |
| --image-base-path | Import files from a specified path. |
| --skip-to | Start from the given page number, skipping the first n-1 pages. |
| --username-prefix | Adds a prefix to imported usernames. |
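Several of these options can be combined in a single invocation. For illustration (a sketch using the MediaWiki ≥ 1.40 entry point; the dump path, report interval and username prefix are placeholder examples):
# Import only templates and main-namespace pages, report every 1000 pages,
# skip link table updates and tag imported usernames with a prefix
php maintenance/run.php importDump --namespaces="Template|0" --report=1000 \
    --no-updates --username-prefix="enwiki" /path/to/dumpfile.xml.gz
Remember to run rebuildall.php afterwards, since --no-updates skips the link tables.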
FAQ
How do I set up debug mode?
Use the command-line option --debug.
How do I make a dry run (no data added to the database)?
Use the command-line option --dry-run.
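For example, the two can be combined to parse a dump verbosely without writing anything to the database (a sketch; the dump path is a placeholder, and on MediaWiki ≤ 1.39 call maintenance/importDump.php directly):
php maintenance/run.php importDump --dry-run --debug /path/to/dumpfile.xml.gz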
Error messages
Failed to open stream
If the error "failed to open stream: No such file or directory" appears, check that the specified file exists and that PHP has access to that path.
Error while running importImages
Typed
roots@hello:~# php importImages.php /maps gif bmp PNG JPG GIF BMP
Error
> PHP Deprecated: Comments starting with '#' are deprecated in /etc/php5/cli/conf.d/mcrypt.ini on line 1 in Unknown on line 0
> Could not open input file: importImages.php
Cause
Before running importImages.php, you first need to change to the maintenance directory, which contains the importImages.php maintenance script.
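In other words, something like the following should work (a sketch based on the command above; the wiki path is a placeholder):
cd /path/to/wiki/maintenance
php importImages.php /maps gif bmp PNG JPG GIF BMP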
Error while running MAMP
DB connection error: No such file or directory (localhost)
Solution
Using specific database credentials
$wgDBserver = "localhost:/Applications/MAMP/tmp/mysql/mysql.sock";
$wgDBadminuser = "XXXX";
$wgDBadminpassword = "XXXX";
Importing from the English Wikipedia or other large wikis
For very large data sets, importDump.php may take a long time (days or weeks); there are alternative methods that can be much faster for a full site restoration, see Manual:Importing XML dumps.
If you can't get the other methods to work, here are some pointers for using importDump.php for importing large wikis, to reduce import time as much as possible...
Parallelizing the import
You could try running importDump.php multiple times simultaneously on the same dump, using the option --skip-to...
In an experiment on Ubuntu, the script was run (on a decompressed dump) multiple times in separate windows simultaneously using the --skip-to option.
On a quad-core laptop computer, running the script in 4 windows sped up import by a factor of 4.
In the experiment, the --skip-to parameter was set 250,000 to 1,000,000 pages apart per instance, and the import was monitored (checked on from time to time), so that each instance could be stopped before catching up to another.
Note: Running multiple instances without the --skip-to parameter was not tried in this experiment, to avoid potential clashing -- if you try this without --skip-to, or you let the instances catch up to each other, please post your findings here.
In this experiment, 2 of the windows caught up, and no error messages resulted.
The instances of the script appeared to be jumping past each other.
Using --skip-to differs from normal operation in that progress increments are not displayed during the skip; instead, only the (blinking) cursor is shown.
After a few minutes, the increment reports begin to display.
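A rough sketch of such a parallel run (four terminal windows; the offsets are arbitrary examples and the decompressed dump path is a placeholder):
# Window 1: start at the beginning of the dump
php maintenance/run.php importDump /path/to/dump.xml
# Window 2: skip the first 250,000 pages
php maintenance/run.php importDump --skip-to=250000 /path/to/dump.xml
# Window 3: skip the first 500,000 pages
php maintenance/run.php importDump --skip-to=500000 /path/to/dump.xml
# Window 4: skip the first 750,000 pages
php maintenance/run.php importDump --skip-to=750000 /path/to/dump.xml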
Data segmentation
It may be a good idea to segment the data first, with an XML splitter, before importing it in parallel. Then run importDump.php on each segment in a separate window, which would avoid potential clashes. (If you successfully split the dump so that it works in this process, please post how to do it here.)
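One possible sketch, assuming the xml_split tool from the Perl XML::Twig package is installed (the segment size and paths are placeholders, and the resulting pieces may still need a valid <mediawiki>/<siteinfo> wrapper before importDump.php will accept them):
# Split the decompressed dump into pieces of roughly 500 MB each
xml_split -s 500Mb -b /path/to/dump_part /path/to/dump.xml
# Then import each piece in its own window, for example:
php maintenance/run.php importDump /path/to/dump_part-01.xml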
Import the most useful namespaces first
To speed up import of the most useful parts of the wiki, use the --namespaces parameter.
Import templates first, because articles without working templates look awful.
Then import articles.
Or, do both at the same time, in multiple windows, as described above, starting templates first, as they import faster and the articles window(s) won't catch up.
Note: The main namespace doesn't have a prefix, so it must be specified using a 0.
Specifying "Main" or "Article" fails to run and returns errors.
Once this is complete, you will need to run importDump.php again to get the pages in all the other namespaces.
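For example, templates could be imported in one window and articles in another (a sketch using the MediaWiki ≥ 1.40 entry point; the dump path is a placeholder):
# Window 1: templates first
php maintenance/run.php importDump --namespaces="Template" /path/to/dump.xml
# Window 2: articles (the main namespace has no prefix, so use its index 0)
php maintenance/run.php importDump --namespaces="0" /path/to/dump.xml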
Estimating how long it will take
Before you can estimate how long an import will take, you've got to find out how many total pages are in the wiki you are importing.
That is displayed at Special:Statistics in each wiki.
As of October 2023, the English Wikipedia had over 59,000,000 pages, including all page types such as talk pages, redirects, etc., but not including pictures ("files").
To see how fast the import is going, go to the page Special:Statistics in the wiki you are importing into.
Note the time and jot down the total pages.
Then come back later and see by how much that number has changed.
Convert that to pages per day, and then divide that figure into the total pages for the wiki you are importing, to see how many days the import will take.
For example, in the experiment mentioned above, importing using parallelization and looking at the total pages in Special:Statistics, the wiki is growing at about 1,000,000 pages per day.
Therefore, it will take around 59 days at that rate to import the 59,000,000 pages (as of October 2023) in the English Wikipedia (not including pictures).
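The same back-of-the-envelope arithmetic as a one-liner (a sketch; the page counts are the example figures above):
# total pages to import / pages imported per day = days remaining
echo $(( 59000000 / 1000000 ))   # prints 59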
Notes
Since MediaWiki 1.29 (T144600), importDump.php no longer updates the site statistics. You should run initSiteStats.php manually after the import to update the page and revision counts.
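A minimal sketch of that final step (MediaWiki ≥ 1.40 entry point; the --update flag is what actually writes the recalculated totals):
# Recount pages, articles, edits, etc. and write the totals to the site statistics table
php maintenance/run.php initSiteStats --update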
See also
- Manual:Importing XML dumps - for more import options.
- Manual:dumpBackup.php - for instructions on creating a dump.