Compression

Data compression as it relates to MediaWiki.

What is compression
See Data compression.

Compression of dumps
The Wikimedia database is quite large, so Wikimedia compresses the database dumps using bzip2.
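
Because a dump is large, it is usually more practical to decompress it as a stream than to unpack it to disk first. The following is a minimal sketch using Python's standard bz2 module; the function name count_lines_in_bz2 and the tiny in-memory sample are illustrative assumptions, not part of any Wikimedia tool.

```python
import bz2
import io

def count_lines_in_bz2(fileobj):
    """Stream-decompress a bzip2 file object and count its lines.

    Only a small buffer is held in memory at any time, so the same
    approach works for files far larger than RAM.
    """
    count = 0
    with bz2.open(fileobj, mode="rt", encoding="utf-8") as text:
        for _ in text:
            count += 1
    return count

# Stand-in for a real dump file: a tiny in-memory bzip2 stream (hypothetical data).
sample = bz2.compress("<page>one</page>\n<page>two</page>\n".encode("utf-8"))
line_count = count_lines_in_bz2(io.BytesIO(sample))
```

With a real dump, a filename can be passed to bz2.open in place of the in-memory object.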

Output compression
It is possible, with HTTP, to compress individual pages as they are served. Both the browser and the server must support it, and it is normally negotiated, with an uncompressed version still available to clients that cannot handle compression. This is on by default if PHP has zlib support enabled (no Apache modules are required). The CPU time spent compressing on the server is negligible compared with things like loading the PHP scripts, and the bandwidth savings are considerable.
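
The negotiation can be sketched as follows. This is a simplified illustration in Python, not the code PHP or MediaWiki actually runs: the function name negotiate_body is hypothetical, and the parsing of the Accept-Encoding header only handles the common cases (a list of codings, with q=0 meaning "refused").

```python
import gzip

def negotiate_body(body: bytes, accept_encoding: str):
    """Return (headers, payload), gzip-compressing only if the client offered gzip."""
    offered = set()
    for item in accept_encoding.split(","):
        parts = item.strip().split(";")
        name = parts[0].strip().lower()
        # A q-value of zero means the client explicitly refuses this coding.
        refused = any(p.strip().replace(" ", "") in ("q=0", "q=0.0") for p in parts[1:])
        if name and not refused:
            offered.add(name)
    if "gzip" in offered:
        headers = {"Content-Encoding": "gzip", "Vary": "Accept-Encoding"}
        return headers, gzip.compress(body)
    # Fallback: the uncompressed version is always available.
    return {"Vary": "Accept-Encoding"}, body

page = b"<p>Hello, wiki!</p>" * 200
gz_headers, gz_payload = negotiate_body(page, "gzip, deflate")
plain_headers, plain_payload = negotiate_body(page, "")
```

The Vary header is included in both cases so that an intermediate cache does not hand a compressed copy to a client that did not ask for one.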

Compression of articles
Anthony DiPierro is looking into the feasibility of using Huffman coding. A preliminary character count of the article space produced Huffman codes that would give 35% compression.
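
An estimate of this kind can be derived from character counts alone. The sketch below is a generic Huffman construction in Python, not the method used for the figure above; the function names are hypothetical, and it assumes each character would otherwise occupy 8 bits and ignores the space needed to store the code table.

```python
import heapq
from collections import Counter

def huffman_code_lengths(counts):
    """Return {symbol: code length in bits} for a Huffman code over the given counts."""
    if not counts:
        return {}
    if len(counts) == 1:
        # A lone symbol still needs one bit per occurrence.
        return {sym: 1 for sym in counts}
    # Heap entries: (total weight, unique tie-breaker, {symbol: depth so far}).
    heap = [(w, i, {sym: 0}) for i, (sym, w) in enumerate(counts.items())]
    heapq.heapify(heap)
    tie = len(heap)
    while len(heap) > 1:
        # Merge the two lightest subtrees; every symbol in them gets one bit deeper.
        w1, _, a = heapq.heappop(heap)
        w2, _, b = heapq.heappop(heap)
        merged = {s: d + 1 for s, d in a.items()}
        merged.update({s: d + 1 for s, d in b.items()})
        heapq.heappush(heap, (w1 + w2, tie, merged))
        tie += 1
    return heap[0][2]

def estimated_saving(text):
    """Fraction saved versus 8 bits per character, for non-empty text."""
    counts = Counter(text)
    lengths = huffman_code_lengths(counts)
    coded_bits = sum(counts[s] * lengths[s] for s in counts)
    return 1 - coded_bits / (8 * len(text))

lengths = huffman_code_lengths({"a": 5, "b": 2, "c": 1})
saving = estimated_saving("aaaaabbc")
```

In the small example, the most frequent character receives a 1-bit code and the other two receive 2-bit codes, so 8 characters need 11 bits instead of 64.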

On or about 2004-02-20, the old table and the archive table were changed to allow some articles in the history table to be compressed. Old entries marked with old_flags="gzip" have their old_text compressed with zlib's deflate algorithm, with no header bytes. PHP's gzinflate will accept this text as is; in Perl and similar, set the window bits to -MAX_WBITS to disable the header bytes.
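
As an illustration of the headerless format, the sketch below reads and writes raw deflate data with Python's zlib module, where a negative window-bits value (-zlib.MAX_WBITS) selects a stream without header or checksum. The function names are hypothetical, and the UTF-8 encoding is an assumption: the character encoding of a given wiki's old_text is not stated here.

```python
import zlib

def compress_old_text(text: str) -> bytes:
    """Deflate text as a raw stream (no zlib header, no trailing checksum)."""
    compressor = zlib.compressobj(
        zlib.Z_DEFAULT_COMPRESSION, zlib.DEFLATED, -zlib.MAX_WBITS
    )
    # UTF-8 is an assumption made for this sketch.
    return compressor.compress(text.encode("utf-8")) + compressor.flush()

def decompress_old_text(blob: bytes) -> str:
    """Inflate a raw deflate stream; negative window bits mean 'expect no header'."""
    return zlib.decompress(blob, -zlib.MAX_WBITS).decode("utf-8")

original = "'''Compression''' is discussed on this page. " * 10
blob = compress_old_text(original)
restored = decompress_old_text(blob)
```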

History compression
It is also possible to compress the history table in a way that exploits the similar data in the different versions, for example reverse-diff version control. See History compression for some actual numbers.
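
The effect of shared text between versions can be seen with a small experiment. Note that this sketch does not implement reverse diffs; it uses a simpler stand-in technique, compressing several revisions as one stream, purely to show how much redundancy neighbouring versions contain. The sample revisions and function names are made up for the illustration.

```python
import zlib

def separate_size(revisions):
    """Total bytes when each revision is compressed independently."""
    return sum(len(zlib.compress(r.encode("utf-8"))) for r in revisions)

def combined_size(revisions):
    """Bytes when all revisions are joined and compressed as one stream."""
    joined = "\x00".join(revisions).encode("utf-8")
    return len(zlib.compress(joined))

# Hypothetical page history: each edit appends one short sentence to the same base text.
base = ("MediaWiki stores every saved revision of a page. "
        "Most edits change only a few words, so neighbouring "
        "revisions share nearly all of their text. ")
revisions = [base + f"Edit number {n} added this sentence." for n in range(20)]

apart = separate_size(revisions)
together = combined_size(revisions)
```

Compressed together, later revisions are encoded mostly as references back to earlier ones, so the combined size comes out well below the sum of the separately compressed sizes.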

Cache compression
File cache discusses compression of the cached copies of pages. Now that the Wikimedia projects use Squid caches, it is unclear how much of this is obsolete.