The dumpgrepper utility is useful for searching XML dumps for specific regexp patterns. With a simple regexp, an enwiki dump can be grepped in about 20 minutes. To install it and run it against a decompressed dump:
npm install -g dumpgrepper
bzcat /path/to/enwiki-latest-pages-articles.xml.bz2 | dumpgrepper '\| *link *='
- New 'insource' regexp search on wikitext of WMF wikis: Example query, Bug.
- User:cscott made a hacked variant that lets you chain conditions, so you can say "pages with this but not that (optionally, on the same line)". See https://github.com/cscott/dumpgrepper. This was just a one-off for a particular wikitext migration; if it is more generally useful it could be cleaned up and merged.
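
For readers who want to experiment with the "pages with this but not that" idea without either tool, here is a minimal Python sketch. It is not dumpgrepper's or cscott's implementation; the function name `grep_pages` and its parameters are hypothetical, and it assumes the usual MediaWiki export layout in which each `<page>` has a `<title>` followed by a `<text>` element. It applies the exclusion to the whole page only, not per line.

```python
import re
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace that MediaWiki export files put on every tag."""
    return tag.rsplit("}", 1)[-1]


def grep_pages(stream, include, exclude=None):
    """Yield (title, line) for each line matching `include`.

    Pages whose wikitext matches `exclude` anywhere are skipped entirely.
    `stream` is a binary file-like object holding decompressed dump XML.
    (Hypothetical helper, written for illustration only.)
    """
    include_re = re.compile(include)
    exclude_re = re.compile(exclude) if exclude else None
    title = None
    # iterparse reads the dump incrementally instead of loading it whole.
    for _event, elem in ET.iterparse(stream, events=("end",)):
        name = local_name(elem.tag)
        if name == "title":
            title = elem.text
        elif name == "text":
            text = elem.text or ""
            if exclude_re is None or not exclude_re.search(text):
                for line in text.splitlines():
                    if include_re.search(line):
                        yield title, line
        elif name == "page":
            # Drop the finished page's contents to keep memory use low.
            elem.clear()
```

Fed from a pipe, this could be called as `grep_pages(sys.stdin.buffer, r'\| *link *=', exclude=...)` after `bzcat`, printing each returned title and line. Expect it to be slower than dumpgrepper's quoted ~20 minutes on a full enwiki dump; no timing has been measured for this sketch.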