Topic on Project:Support desk

importDump.php Performance

203.185.197.46

Hi everyone,

I installed MediaWiki on a DigitalOcean CPU-optimized droplet (vCPUs (Intel Xeon CPU E5-2697A v4 @ 2.60GHz), 4 GB of RAM, and a 25 GB SSD) running Apache and MySQL.

When I ran importDump.php, it added around 110 pages per minute. I decided to move to Google Cloud to see if I could speed this process up.

I have two GCE instances. One, for the MariaDB database, is an n1-highmem-2 (2 Haswell vCPUs, 13 GB memory) with a 50 GB SSD boot disk and a separate 250 GB mounted disk.

MediaWiki (and importDump.php) sits on an n1-standard-4 (4 Haswell vCPUs, 15 GB memory) with a 50 GB SSD boot disk plus a 50 GB mounted disk for MediaWiki. It is running Nginx.

However, when I run importDump.php there, it is even slower, at 0.12 pages per second (about 7.2 pages per minute). This surprised me, as I think this setup is better than the first. The only difference is that on the GCE server I installed a bunch of Lua modules and Wikipedia templates.

It appears to be using only one of the CPU cores, and only around 30% of it. Is there a way to get it to run on all 4 cores? Is it possible that PHP is limiting the resources available to it?

I have read a lot about importDump.php, and most people seem to be getting at least 100 pages per minute. This study could do 1000 per minute (https://pdfs.semanticscholar.org/7074/cd79c572fb6bbba7ca7e6c98bfb661a08573.pdf).

I am going to install Redis and see if that helps. Do you have any other recommendations on my setup or settings?


Bawolff

The best way to optimize anything is to collect profiling data and see where most of the time is spent (see How to debug for info on how to collect profiling data).

I guess the best way to use multiple CPUs would be to split the dump up into multiple parts and import them in parallel.
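As a rough illustration of the splitting idea, here is a minimal Python sketch. It is not part of MediaWiki; the function name `split_dump` is made up for this example, and it assumes that `<page>` and `</page>` each sit on their own line, as they normally do in XML export dumps. It copies the dump header into every part, deals whole pages out round-robin (so all revisions of a page stay in the same part), and closes each part with `</mediawiki>`.

```python
def split_dump(src, outs):
    """Distribute the pages of a MediaWiki XML dump across several outputs.

    src  -- iterable of text lines (e.g. an open dump file)
    outs -- list of writable text file objects, one per part
    Returns the number of pages copied.

    Assumption: <page> and </page> each appear alone on a line.
    """
    in_page = False     # currently inside a <page> ... </page> block
    seen_page = False   # the first <page> has been reached
    target = 0          # index of the part receiving the current page
    pages = 0

    for line in src:
        tag = line.strip()
        if tag == "<page>":
            in_page = True
            seen_page = True
            target = pages % len(outs)   # round-robin assignment
        if in_page:
            outs[target].write(line)
            if tag == "</page>":
                in_page = False
                pages += 1
        elif not seen_page:
            # Header (<mediawiki ...> and <siteinfo>) goes into every part.
            for out in outs:
                out.write(line)
        # Anything after the last page (the original closing tag) is
        # skipped here and written back explicitly below.

    for out in outs:
        out.write("</mediawiki>\n")
    return pages
```

Each resulting part could then be passed to its own importDump.php process, with the processes started side by side. Whether that scales well depends on the database keeping up, so it is worth profiling a small sample first.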


Historically, the fastest way to import revisions was to use MWDumper plus the SQL dumps of link tables that Wikimedia provides. This can skip the parse step. However, MWDumper isn't really well maintained anymore.

203.185.197.46

Hi Bawolff, thanks for your advice. I did look into MWDumper, but it seemed quite buggy, so I decided to stick with importDump.php.

I will look into collecting some profiling data and report back on my findings.

Installing Redis has made a SIGNIFICANT impact. CPU usage is sitting around 80% and it is now importing roughly 90 pages per minute (much better than the previous 7.2 per minute!).

Splitting the dump and importing the parts in parallel is great advice too.
