Extension:DumpHTML

dumpHTML is an extension for generating a simple HTML dump, including images and media files, of a MediaWiki installation. MediaWiki versions before 1.12.0 used the maintenance script dumpHTML.php instead.

Beware, cowboy!
DumpHTML requires a lot of work and has been broken since August 2008. It stopped working shortly after it was split from core in 2008; the situation was complicated further in 2009 and worsened by ResourceLoader in 2010, and again in 2011 and later.

The only person known to have used dumpHTML successfully is Kelson, who produced Kiwix ZIM files with it (with a lot of hacks). There were plans to fix dumpHTML, but they were abandoned in 2013.

'''A simple, functioning solution for producing static HTML from MediaWiki doesn't currently exist! Modern developers use Parsoid and mwoffliner.''' PHP developers brave enough to fix dumpHTML should plan on several weeks of work; sysadmins may instead try using the file cache and checking the HTML files produced in the cache directory.

Parameters
dumpHTML does not function like a normal extension; you must run it from the command line.

Example (Linux): create a complete snapshot, including image and media files and image thumbnail files, in the directory wikidump:

 /usr/bin/php /srv/www/mediawiki/extensions/DumpHTML/dumpHTML.php -d /srv/www/mediawiki/wikidump -k monobook --image-snapshot
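For repeated or scheduled runs, the invocation can be assembled by a small wrapper. The following is a minimal Python sketch, not part of DumpHTML: the helper names build_dump_command and run_dump are hypothetical, and only the options shown in the example (-d, -k, --image-snapshot) are used.

```python
import subprocess


def build_dump_command(php, script, dest, skin, image_snapshot=True):
    """Return the argument list for one dumpHTML.php run.

    Hypothetical helper: it only reassembles the options shown in the
    example (-d, -k, --image-snapshot) and knows no others.
    """
    cmd = [php, script, "-d", dest, "-k", skin]
    if image_snapshot:
        cmd.append("--image-snapshot")
    return cmd


def run_dump(cmd):
    """Run the dump; raises CalledProcessError on a non-zero exit code."""
    subprocess.run(cmd, check=True)
```

Passing the arguments as a list rather than as one string avoids shell quoting problems if a path contains spaces. A run would then look like run_dump(build_dump_command("/usr/bin/php", "/srv/www/mediawiki/extensions/DumpHTML/dumpHTML.php", "/srv/www/mediawiki/wikidump", "monobook")).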

Known issues
Warning! This extension is not properly maintained at the moment! You may encounter a number of issues. Any help fixing these (especially by sending patches to Gerrit) is greatly appreciated!

Filename problems solved by a modified version of DumpHTML
If you intend to use the wikidump on a CD/DVD or on a Windows filesystem, and the wiki's page or file names contain non-ASCII characters (which is likely), you probably need to convert the link references, directory names, and filenames from UTF-8 to your Windows character encoding (for example, code page 1252 on Western European systems). Even then, browsers may still have difficulty accessing the files.
 * Fixed in r115629 via the --munge-title option (available munging algorithms: none, md5, windows).

Bugzilla 8147, "Filenames in the HTML static dump", has a patch for DumpHTML.inc that converts article, image, thumbnail image, and media filenames to their MD5-hashed versions, which avoids character encoding problems on different operating systems.
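To illustrate why MD5 hashing sidesteps the encoding problem, here is a minimal Python sketch. It is not the code from the patch: the function name md5_filename and the choice to hash the base name while keeping the extension are assumptions made for illustration only.

```python
import hashlib
import os


def md5_filename(name):
    """Map a (possibly non-ASCII) filename to a hex-digest name.

    Assumption: hash the UTF-8 bytes of the base name and keep the
    extension unchanged; the actual patch may build names differently.
    """
    base, ext = os.path.splitext(name)
    digest = hashlib.md5(base.encode("utf-8")).hexdigest()
    # The digest is 32 hexadecimal characters, so the result is pure
    # ASCII whenever the extension itself is ASCII.
    return digest + ext
```

Because the digest contains only the characters 0-9 and a-f, the resulting name is valid on any filesystem regardless of its character encoding. The cost is that filenames are no longer human-readable, and every link in the dump must be rewritten with the same function.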

Skin hacking
If you have modified your skin (e.g. monobook), this script will likely fail. Update your MediaWiki installation, replace any "hacked" skins, and then retry.

Extensions compatibility
For the same reason, some extensions that modify the output, such as Extension:SyntaxHighlight_GeSHi, aren't compatible with DumpHTML.

If you use InstantCommons
If you run the dump on a custom MediaWiki installation that uses InstantCommons, the script assumes that your image files are in the images/wikimediacommons folder of the target directory.

Thus, if you encounter a message such as:

 Warning: file_put_contents(/tmp/wiki/images/wikimediacommons/7/75/Live_studio_op_de_Mi_Amigo_kleine.jpg): failed to open stream: No such file or directory in [...]/w/extensions/DumpHTML/dumpHTML.inc on line 1377

you have to download http://upload.wikimedia.org/wikipedia/commons/7/75/Live_studio_op_de_Mi_Amigo_kleine.jpg to /tmp/yourdump/images/wikimediacommons/7/75/Live_studio_op_de_Mi_Amigo_kleine.jpg (where /tmp/yourdump is your dump's target directory, /tmp/wiki in the warning above) and restart the dump operation.
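When many files are missing, the manual download can be scripted. The following is a minimal Python sketch under stated assumptions: the helper names commons_url_for and fetch_missing and the User-Agent string are hypothetical, and the URL mapping is simply the one described here (everything after images/wikimediacommons/ is appended to the Commons upload URL).

```python
import os
import shutil
import urllib.parse
import urllib.request

# Base URL as given in the text; the marker is the folder DumpHTML expects.
COMMONS_BASE = "http://upload.wikimedia.org/wikipedia/commons/"
MARKER = "/images/wikimediacommons/"


def commons_url_for(missing_path):
    """Derive the Commons URL from the path named in the warning."""
    _, found, relative = missing_path.partition(MARKER)
    if not found:
        raise ValueError("not an InstantCommons path: " + missing_path)
    # Percent-encode non-ASCII characters; "/" is left untouched.
    return COMMONS_BASE + urllib.parse.quote(relative)


def fetch_missing(missing_path):
    """Download the file to the exact path the dump failed to write."""
    os.makedirs(os.path.dirname(missing_path), exist_ok=True)
    request = urllib.request.Request(
        commons_url_for(missing_path),
        headers={"User-Agent": "dumphtml-fetch-sketch/0.1"},  # hypothetical
    )
    with urllib.request.urlopen(request) as response:
        with open(missing_path, "wb") as out:
            shutil.copyfileobj(response, out)
```

Calling fetch_missing with the path copied from the warning creates the missing directories and stores the file where the script expects it; the dump still has to be restarted afterwards.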

Static Wikipedia
See http://dumps.wikimedia.org/ and for example http://dumps.wikimedia.org/other/static_html_dumps/ for static snapshot examples. The last HTML dumps there were generated in 2008.