Compression

This page is about data compression as it relates to MediaWiki.

Database dumps
The Wikimedia database is quite large, so Wikimedia compresses its data dumps using bzip2.
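Because the dumps are large, it usually makes sense to read them as a stream rather than decompressing the whole file first. The following is a minimal sketch using Python's standard bz2 module; the helper name count_lines_bz2 and the tiny in-memory sample are illustrative assumptions, not part of any Wikimedia tooling.

```python
import bz2
import io


def count_lines_bz2(source):
    """Count text lines in a bzip2 stream, decompressing incrementally.

    `source` may be a file name or an open binary file object.
    """
    count = 0
    with bz2.open(source, mode="rt", encoding="utf-8") as text:
        for _ in text:
            count += 1
    return count


# Stand-in for a real dump file: a tiny in-memory bzip2 stream (made-up content).
sample = bz2.compress("<page>A</page>\n<page>B</page>\n".encode("utf-8"))
line_count = count_lines_bz2(io.BytesIO(sample))
```

With a real dump, the file name would be passed in place of the in-memory buffer.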

Output
With HTTP, individual pages can be compressed as they are served. Both the browser and the server must support this; it is normally negotiated, so an uncompressed version remains available to clients that cannot handle compression. This is on by default if PHP has zlib support enabled (no Apache modules are required). The CPU time spent compressing on the server is negligible next to costs such as loading the PHP scripts, and the bandwidth savings are considerable.
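The negotiation can be sketched roughly as follows. This is a simplified illustration in Python, not MediaWiki's PHP code: the function name negotiate_body is hypothetical, and the sketch ignores quality values (such as "gzip;q=0") that a complete implementation would have to honour.

```python
import gzip


def negotiate_body(body: bytes, accept_encoding: str):
    """Return (headers, payload), compressing only if the client offers gzip.

    Simplification: q-values in Accept-Encoding are stripped and ignored.
    """
    offered = [
        token.split(";")[0].strip().lower()
        for token in accept_encoding.split(",")
    ]
    if "gzip" in offered:
        # Client advertised gzip support: send the compressed version.
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return headers, gzip.compress(body)
    # Otherwise fall back to the uncompressed version.
    return {"Vary": "Accept-Encoding"}, body


page = b"<p>Hello, wiki!</p>\n" * 200  # repetitive markup compresses well

gz_headers, gz_payload = negotiate_body(page, "gzip, deflate")
plain_headers, plain_payload = negotiate_body(page, "")
```

The Vary header tells intermediate caches that the response depends on the request's Accept-Encoding value, so a compressed copy is not handed to a client that did not ask for one.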

Articles
On or about 2004-02-20 the old table and archive table were changed to allow some article revisions in the history table to be compressed. Old entries marked with old_flags="gzip" have their old_text compressed with zlib's deflate algorithm, with no header bytes. PHP's gzinflate() decodes this text directly; in Perl and other languages, set the window bits to -MAX_WBITS to disable the header bytes.
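As an illustration of the headerless format, here is a minimal Python sketch. Python's zlib module follows the same convention: a negative window-bits value selects raw deflate with no header or checksum. The function names and the UTF-8 encoding are assumptions made for the example; the actual character encoding of a given wiki's old_text may differ.

```python
import zlib


def deflate_raw(text: str) -> bytes:
    """Compress text as a raw deflate stream (no zlib header or trailer)."""
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
    )
    # Assumption: the text is stored as UTF-8.
    return compressor.compress(text.encode("utf-8")) + compressor.flush()


def inflate_raw(blob: bytes) -> str:
    """Decompress a raw deflate stream such as a 'gzip'-flagged old_text."""
    # Negative wbits tells zlib not to expect header bytes.
    return zlib.decompress(blob, -zlib.MAX_WBITS).decode("utf-8")


original = "Some old revision text. " * 20  # made-up sample content
stored = deflate_raw(original)
restored = inflate_raw(stored)
```

Calling zlib.decompress without the negative window-bits argument expects a zlib header and will generally reject this data.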

Page histories
It is also possible to compress the history table in a way that exploits the similarity between different versions of a page, for example with reverse-diff version control. See History compression for some actual numbers.
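The idea behind a reverse-diff scheme is to keep the newest revision in full and store each older revision only as the changes needed to rebuild it from the next newer one. The following Python sketch shows the principle with the standard difflib module; it is a toy illustration under that assumption, not the storage format MediaWiki uses, and all function names and sample revisions are invented for the example.

```python
from difflib import SequenceMatcher


def make_reverse_delta(newer: str, older: str):
    """Build a list of operations that rebuilds `older` from `newer`."""
    ops = []
    matcher = SequenceMatcher(None, newer, older, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            ops.append(("copy", i1, i2))  # reuse newer[i1:i2]
        elif j2 > j1:
            # 'replace' or 'insert': text that only the older revision has
            ops.append(("insert", older[j1:j2]))
        # a pure 'delete' contributes nothing to the older revision
    return ops


def apply_reverse_delta(newer: str, ops) -> str:
    """Rebuild the older revision from the newer text and a delta."""
    parts = []
    for op in ops:
        if op[0] == "copy":
            parts.append(newer[op[1]:op[2]])
        else:
            parts.append(op[1])
    return "".join(parts)


# Made-up revisions, oldest first.
revisions = [
    "The cat sat.",
    "The cat sat on the mat.",
    "The black cat sat on the mat.",
]

# Store only the latest text in full, plus one delta per older revision.
latest = revisions[-1]
deltas = [
    make_reverse_delta(revisions[i + 1], revisions[i])
    for i in range(len(revisions) - 1)
]


def restore(index: int) -> str:
    """Walk backwards from the latest revision to revision `index`."""
    text = latest
    for i in range(len(deltas) - 1, index - 1, -1):
        text = apply_reverse_delta(text, deltas[i])
    return text
```

Reading the current revision needs no reconstruction at all, which suits a wiki where the latest version is requested far more often than old ones; the cost is that very old revisions require applying a chain of deltas.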

Cache compression
File cache discusses compression of the cached copies of pages. Now that the Wikimedia projects use Squid caches, it is unclear how much of this is obsolete.