Requests for comment/Text extraction

Currently, Wikimedia sites offer the API module action=query&prop=extracts, which can be used when someone wants a text-only extract of page content.
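As a quick illustration of how a client would call this module, here is a minimal sketch that builds such a request URL with the Python standard library. The exintro and explaintext parameters are TextExtracts options; the endpoint and page title are just examples.

```python
# Sketch: building a request URL for action=query&prop=extracts.
from urllib.parse import urlencode

def build_extract_url(api_base, titles, intro_only=True, plain_text=True):
    """Return a query URL for the extracts API module."""
    params = {
        "action": "query",
        "prop": "extracts",
        "titles": "|".join(titles),  # the API accepts |-separated titles
        "format": "json",
    }
    if intro_only:
        params["exintro"] = 1      # only the text before the first section
    if plain_text:
        params["explaintext"] = 1  # plain text instead of limited HTML
    return api_base + "?" + urlencode(params)

url = build_extract_url("https://en.wikipedia.org/w/api.php", ["MediaWiki"])
```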

Core integration vs. separate extension
Initially, the extract functionality was located in MobileFrontend for practical reasons - it already had an HTML manipulation framework. However, now that this framework has been cleaned up and integrated into core (includes/HtmlFormatter.php), there is no reason the extract functionality should not be moved to a more appropriate location.

Arguments for integration into core:
 * This is a very basic functionality useful for almost every wiki.
 * Core already has HtmlFormatter.
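To make the kind of work involved concrete, here is an illustrative analogue (not core's actual HtmlFormatter API) of the HTML-to-text transformation an extract requires, using only Python's standard library:

```python
# Illustrative analogue of HtmlFormatter-style processing: strip markup
# from rendered HTML, dropping unwanted elements wholesale.
from html.parser import HTMLParser

class TextOnly(HTMLParser):
    SKIP = {"script", "style", "table"}  # elements removed entirely

    def __init__(self):
        super().__init__()
        self.parts = []
        self._depth = 0  # nesting depth inside skipped elements

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._depth:
            self._depth -= 1

    def handle_data(self, data):
        if self._depth == 0:  # keep text only outside skipped elements
            self.parts.append(data)

    def text(self):
        return "".join(self.parts).strip()

p = TextOnly()
p.feed("<p>Hello <b>world</b><style>b{color:red}</style></p>")
extract = p.text()  # -> "Hello world"
```

The real HtmlFormatter works on a DOM and supports configurable element removal, but the cost profile is the same: parsing and walking the whole rendered page for every extract.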

Arguments for creating a separate extension:
 * Keep everything modular.
 * Easier to develop: no need to depend on the pace of core changes.
 * Can easily contain code for WMF-specific extraction process (see below).

Extract storage
Currently, extracts are generated on demand and cached in memcached. However, this results in bad worst-case behaviour when many extracts are needed at once, e.g. for queries spanning several pages or for action=opensearch, which returns 10 results by default. Text extraction involves DOM manipulation and text processing (tens of milliseconds) and, on a cache miss, potentially a wikitext parse (which can easily take seconds or even tens of seconds). Such timing is less than optimal, so I propose extracting text during LinksUpdate and storing it in page_props. This would allow efficient batch retrieval and 100% immediate availability.
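The proposed scheme can be sketched as follows. Column names mirror MediaWiki's page_props table, but the "extract" property name and the helper functions are assumptions for illustration; sqlite3 stands in for the real database layer to keep the example self-contained.

```python
# Sketch of extract storage in a page_props-style table, with batch
# retrieval. Property name "extract" is an assumption, not a real schema.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE page_props ("
    " pp_page INTEGER, pp_propname TEXT, pp_value TEXT,"
    " PRIMARY KEY (pp_page, pp_propname))"
)

def store_extract(page_id, text):
    """What a LinksUpdate-time step would do: persist the extract."""
    conn.execute(
        "INSERT OR REPLACE INTO page_props VALUES (?, 'extract', ?)",
        (page_id, text),
    )

def get_extracts(page_ids):
    """Batch retrieval: one query for N pages, no parse on the hot path."""
    marks = ",".join("?" * len(page_ids))
    rows = conn.execute(
        "SELECT pp_page, pp_value FROM page_props"
        " WHERE pp_propname = 'extract' AND pp_page IN (%s)" % marks,
        list(page_ids),
    )
    return dict(rows.fetchall())

store_extract(1, "First page intro.")
store_extract(2, "Second page intro.")
extracts = get_extracts([1, 2, 3])  # page 3 never stored: simply absent
```

The point of the design is visible in get_extracts: an opensearch-style request for 10 pages becomes a single indexed lookup instead of up to 10 cache misses, each potentially triggering a full wikitext parse.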