Parsoid/DumpGrepper

The dumpgrepper utility is useful for searching XML dumps for specific regexp patterns. With a simple regexp, an enwiki dump can be grepped in ~20 minutes.

The grepper operates on actual wikitext (with XML encoding removed), so there is no need to complicate regexps with entities. It supports JavaScript RegExps.
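To illustrate why matching against decoded wikitext keeps patterns simple, here is a minimal sketch. It is not dumpgrepper's actual code: the helper names `decodeXmlEntities` and `grepWikitext` and the sample string are assumptions made up for this example. It decodes the five predefined XML entities and then applies an ordinary JavaScript RegExp.

```typescript
// Hypothetical helper (not part of dumpgrepper): undo the five predefined
// XML entities so that patterns can be written against plain wikitext.
function decodeXmlEntities(text: string): string {
  return text
    .replace(/&lt;/g, "<")
    .replace(/&gt;/g, ">")
    .replace(/&quot;/g, '"')
    .replace(/&apos;/g, "'")
    .replace(/&amp;/g, "&"); // decoded last, so "&amp;lt;" yields "&lt;", not "<"
}

// Hypothetical helper: return every match of `pattern` in the decoded text.
function grepWikitext(encoded: string, pattern: RegExp): string[] {
  const wikitext = decodeXmlEntities(encoded);
  // Force the global flag so exec() advances through all matches.
  const flags = pattern.flags.indexOf("g") >= 0 ? pattern.flags : pattern.flags + "g";
  const re = new RegExp(pattern.source, flags);
  const matches: string[] = [];
  let m: RegExpExecArray | null;
  while ((m = re.exec(wikitext)) !== null) {
    matches.push(m[0]);
    if (m[0].length === 0) re.lastIndex++; // avoid looping forever on empty matches
  }
  return matches;
}

// Made-up sample of page text as it would appear inside an XML dump.
const encodedText =
  "Intro &lt;ref name=&quot;a&quot;&gt;Source&lt;/ref&gt; and &lt;ref&gt;Other&lt;/ref&gt;";

// The pattern is written against real wikitext: no &lt; or &quot; needed.
const refTags = grepWikitext(encodedText, /<ref[^>]*>/);
console.log(refTags);
```

Without the decoding step, the same search would need a pattern along the lines of `/&lt;ref.*?&gt;/`, which is harder to read and easier to get wrong.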