Thread:Talk:Requests for comment/Text extraction/Some notes

Personally, even if this were implemented in a TextExtraction extension instead of core (though I think it should be implemented in core), I wouldn't want Wikimedia-specific stuff in the generic MediaWiki extension. That is, in both situations I'd prefer that WMF have a separate WikimediaTextExtraction extension.

page_props is for storing indexed, queryable data that results from the canonical parse run. That is, something should only ever be stored there when there is also an equivalent parser cache entry.

page_props is for data you want to be able to query, not for general storage. Since you're not going to be making SQL queries that try to match extraction results, the extraction data should instead be stored in the parser cache, using either ParserOutput::setExtensionData or a new property plus accessor methods on ParserOutput.
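
To make the setExtensionData option concrete, here is a rough sketch of what I mean. It is not a proposed implementation: the key name 'textextraction-plain', the helper wfTextExtractionGetPlain, and the strip_tags() "extraction" step are all made up for illustration, and ParserAfterParse is just one hook where this could plausibly be attached.

```php
<?php
// Sketch only: key name, helper name, and extraction logic are hypothetical.

$wgHooks['ParserAfterParse'][] = function ( Parser $parser, &$text, StripState $stripState ) {
	// Placeholder extraction: strip tags from the text at this stage of the parse.
	// A real extractor would need to do considerably more than this.
	$plain = trim( strip_tags( $text ) );

	// Attach the result to the ParserOutput, so it is saved in the
	// parser cache with the rest of the output instead of in page_props.
	$parser->getOutput()->setExtensionData( 'textextraction-plain', $plain );

	return true;
};

// Later, given a ParserOutput (for example one fetched from the parser cache):
function wfTextExtractionGetPlain( ParserOutput $output ) {
	// getExtensionData() returns null when nothing was stored under the key.
	return $output->getExtensionData( 'textextraction-plain' );
}
```

The point of doing it this way is that the extraction result lives and expires together with the cached parse, and nothing ends up in page_props that nobody will ever query on.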

Alternatively, if you want to do this completely separately from the parser cache, the proposed DataStore would probably be the best method of storage.