Topic on Project:Support desk

Unusually Large database: 8GB

24.182.18.127 (talkcontribs)

The database in question is more than 8 GB in size, which is an unusually large amount of data. Don't even larger MediaWiki installations generally have databases no bigger than 2-3 GB? Of the 8 GB of data, 5 GB sits in the mz_text table alone.
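
For what it's worth, a per-table breakdown can be obtained with a query along these lines (assuming a MySQL/MariaDB backend; "wikidb" below is just a placeholder for the actual database name):

-- Per-table size report; table_schema must match the wiki's database name.
SELECT table_name,
       ROUND(data_length  / 1024 / 1024) AS data_mb,
       ROUND(index_length / 1024 / 1024) AS index_mb
FROM   information_schema.TABLES
WHERE  table_schema = 'wikidb'
ORDER  BY data_length + index_length DESC;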

What I would like to find out is: how did this come about?

Below are the statistics of the wiki:

Page statistics
  • Content pages: 180,951
  • Pages (all pages in the wiki, including talk pages, redirects, etc.): 267,194
  • Uploaded files: 14,476

Edit statistics
  • Page edits since Portal to The Philippines was set up: 667,012
  • Average edits per page: 2.50
88.130.101.238 (talkcontribs)

For each text change that is saved, MediaWiki creates exactly one new revision. Each revision points to exactly one record in the text table; most likely this will not be an old record that can simply be re-used, but a record that is newly created when the user saves the page. Each entry in the text table holds the complete text of the corresponding page (not only the parts that changed).
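
If you want to see that relation directly, a query like the following sketches it (this assumes the pre-1.35 schema, where revision.rev_text_id still points into the text table, and your mz_ table prefix):

-- Each revision references exactly one text row via rev_text_id.
SELECT r.rev_id,
       r.rev_text_id,
       LENGTH(t.old_text) AS stored_bytes
FROM   mz_revision r
JOIN   mz_text     t ON t.old_id = r.rev_text_id
ORDER  BY r.rev_id DESC
LIMIT  10;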

So if you have 667,012 edits, you will have around that number of rows in the text table. The size of each of these rows obviously depends on the length of the text. If the text table is around 5 GB = 5,000,000 KB in size, this means that your average text is around 7.5 KB. I don't know your texts, but that sounds like a plausible length to me.
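
You can cross-check that estimate directly on the text table with something like the following (it scans every row, so it may take a moment on a table of this size):

-- Row count, average stored size per row and total stored size.
SELECT COUNT(*)                                              AS text_rows,
       ROUND(AVG(LENGTH(old_text)) / 1024, 1)                AS avg_kb,
       ROUND(SUM(LENGTH(old_text)) / 1024 / 1024 / 1024, 2)  AS total_gb
FROM   mz_text;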

Do you in fact want to reduce the size of the database? If so, this is most likely possible - both with and without removing data from it.

24.182.18.127 (talkcontribs)

Yes, I really would like to reduce the size of the database without deleting records.

88.130.122.168 (talkcontribs)

As detailed on Manual:Reduce size of the database, you can, for example, compress the contents of the text table and also compress future revisions. You can additionally delete archived revisions; this permanently removes those revisions, which are not visible to the public anyway.

Additionally, you could even remove the page histories - however, that would permanently remove all old revisions and their texts.

To refresh the searchindex (and a few other tables), you can run the maintenance script rebuildall.php. See Manual:Rebuildall.php! Besides ending up with a smaller DB, this might also be useful to get deleted pages removed from your search results.

24.182.18.127 (talkcontribs)

I tried "compressOld.php", specifying all edits prior to 2015 (2009 to 2014). But that only reduced my database by 500 MB. I must be doing something wrong.

I would say that only about 3% of the pages in the wiki were created in 2015, so the compress command should have compressed over 90% of the edits. Yet...
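
One way to sanity-check that assumption would be to count how many revisions actually predate the cut-off, with something like this (assuming the mz_ prefix and MediaWiki's 14-digit timestamp format):

-- Revisions from before January 1st 2015
-- (rev_timestamp is a 14-digit YYYYMMDDHHMMSS string).
SELECT COUNT(*) AS old_revisions
FROM   mz_revision
WHERE  rev_timestamp < '20150101000000';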

Kuhitkuhit (talkcontribs)

This is the command line that was used:

php compressOld.php -e 20141231235959

Did this command in effect compress all the edits prior to Dec. 31, 2014?

88.130.126.70 (talkcontribs)

This should concatenate all revisions of the same page (except the current revision), as long as they date from before January 1st 2015, into one field inside the text table. The actual compression happens by gzip-compressing the concatenated revisions and storing them as a serialized history blob object.

You can check whether that worked by looking at a few records of the text table: compressed records should no longer have the text inside the old_text field; instead, they should contain a pointer to another row of the text table. That other row should then contain the concatenated and compressed blob of all old revisions of the page. If that is the case, the script worked correctly.
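
A quick way to do that spot check from the database side might be the following (again assuming the mz_ prefix; the exact flag values can vary, but compressed rows should carry an "object" flag):

-- How many text rows carry which storage flags.
SELECT old_flags, COUNT(*) AS row_count
FROM   mz_text
GROUP  BY old_flags;

-- A few recent rows: compressed stub rows have only a short
-- old_text that references the row holding the concatenated blob.
SELECT old_id, old_flags, LENGTH(old_text) AS stored_bytes
FROM   mz_text
ORDER  BY old_id DESC
LIMIT  20;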

88.130.126.70 (talkcontribs)

Another aspect might be: after running the script, did you OPTIMIZE the text table? The script will change close to every record in the table and make many records smaller. This fragments the table heavily and might also make it slower than necessary. Running an OPTIMIZE TABLE query should improve performance; if the table uses InnoDB, the query will recreate the table, making the actual size difference visible. It effectively frees the fragmented space within the table back to the tablespace. Note that, since a copy of the table is created during the execution of the query, you will need up to 5 GB of additional disk space while it is running.
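
In practice that could look like this (assuming the mz_ prefix and, again, "wikidb" as a placeholder for the actual database name):

-- How much reclaimable (fragmented) space the table currently reports.
SELECT table_name,
       ROUND(data_free / 1024 / 1024) AS reclaimable_mb
FROM   information_schema.TABLES
WHERE  table_schema = 'wikidb'
  AND  table_name   = 'mz_text';

-- Rebuilds the table; on InnoDB this creates a copy of it, so make
-- sure enough free disk space is available first.
OPTIMIZE TABLE mz_text;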
