Wikimedia Product/Machine vision middleware

In Q1 FY 2019-20, the Wikimedia Reading Infrastructure team is designing and building a generalized solution for communicating with machine vision (MV) service providers and storing the results for use on the wikis.

Specifically, we need support for:


 * Requesting MV-generated image metadata from machine vision providers
 * Providing temporary storage for MV results pending human editor verification
 * Serving MV data to Commons users for verification and promotion to Structured Data on Commons
 * Providing the results of human editor verification back to third-party MV providers for model refinement

Use Cases

 * NSFW scoring of Commons images to improve vandalism detection (T214201)
 * MV-generated suggestions for structured "depicts" statements for newly uploaded Commons images (T226119)

Architecture
This functionality will be implemented as a new MediaWiki extension. Provider interaction can be handled in a supporting Node.js service, in a manner similar to cxserver for machine translation, if doing so would meet the standards for external services and provide a substantial advantage in functionality or implementation effort.

Metadata requests
As in cxserver, the middleware will define a Provider abstraction, with one implementation per internal or external machine vision provider supplying the provider-specific details. All provider interaction will go through these per-provider Provider classes.
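
A minimal sketch of what such a Provider abstraction could look like is given here in Python for brevity. The names LabelSuggestion, request_labels, StaticProvider and collect_labels are hypothetical and are not part of the planned extension. A real implementation would live in the extension or the supporting service and would call a provider's remote API rather than return canned data.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass(frozen=True)
class LabelSuggestion:
    """One MV-generated label for an image (hypothetical shape)."""
    label: str
    confidence: float  # provider-reported score, assumed to be in [0, 1]


class Provider(ABC):
    """Base class that each provider-specific implementation extends."""

    name: str

    @abstractmethod
    def request_labels(self, image_url: str) -> List[LabelSuggestion]:
        """Return label suggestions for the image at image_url."""


class StaticProvider(Provider):
    """Stand-in provider that returns canned data instead of calling an API."""

    name = "static-demo"

    def __init__(self, canned: Dict[str, List[LabelSuggestion]]) -> None:
        self._canned = canned

    def request_labels(self, image_url: str) -> List[LabelSuggestion]:
        return self._canned.get(image_url, [])


def collect_labels(
    providers: Iterable[Provider], image_url: str
) -> Dict[str, List[LabelSuggestion]]:
    """Query every registered provider and key the results by provider name."""
    return {p.name: p.request_labels(image_url) for p in providers}
```

Callers depend only on the Provider interface, so adding a provider means adding one subclass without touching the calling code.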

Model verification feedback
The middleware will also record the results of human editor verification of MV-generated data and submit them back to providers for model refinement. (TODO: Decide on the requirements for considering data "verified," which may vary by specific kind of metadata.)
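
One way the record-then-submit flow could be sketched is shown here. The names Verdict and FeedbackLog are hypothetical. Since the criteria for "verified" are still undecided, the sketch simply assumes that a single editor verdict per suggestion is recorded as-is. The send callable stands in for whatever provider-specific submission mechanism is eventually chosen.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Verdict:
    """A human editor's decision on one MV suggestion (hypothetical shape)."""
    image: str
    label: str
    accepted: bool  # True if the editor confirmed the suggestion


class FeedbackLog:
    """Collects editor verdicts per provider until they are submitted."""

    def __init__(self) -> None:
        self._pending: Dict[str, List[Verdict]] = {}

    def record(self, provider: str, verdict: Verdict) -> None:
        """Queue a verdict for later submission to the named provider."""
        self._pending.setdefault(provider, []).append(verdict)

    def flush(self, provider: str, send: Callable[[List[Verdict]], None]) -> int:
        """Pass all pending verdicts for one provider to send; return the count."""
        batch = self._pending.get(provider, [])
        if batch:
            send(batch)  # if this raises, the batch stays queued for a retry
            del self._pending[provider]
        return len(batch)
```

Batching per provider keeps submission decoupled from the editing workflow, so a slow or unavailable provider does not affect editors.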

Storage and lookup
Retrieved metadata will be stored primarily in a dedicated MariaDB table per metadata type. The architecture will be a generalization (and, as needed, an extension) of the one currently being designed for NSFW image ratings.
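
To illustrate the table-per-metadata-type idea, the sketch below creates one table for "depicts" suggestions and one for NSFW scores. It uses Python's built-in sqlite3 module with an in-memory database purely as a stand-in for MariaDB. All table names, column names and helper functions are assumptions made for illustration, not the schema under design.

```python
import sqlite3
from typing import List

# Hypothetical schema: one table per metadata type.
SCHEMA = """
CREATE TABLE mv_depicts_suggestion (
    image_name TEXT NOT NULL,
    provider   TEXT NOT NULL,
    label      TEXT NOT NULL,
    confidence REAL NOT NULL,
    reviewed   INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (image_name, provider, label)
);
CREATE TABLE mv_nsfw_score (
    image_name TEXT NOT NULL,
    provider   TEXT NOT NULL,
    score      REAL NOT NULL,
    PRIMARY KEY (image_name, provider)
);
"""


def open_store() -> sqlite3.Connection:
    """Create an in-memory database holding the sketch schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    return conn


def unreviewed_labels(conn: sqlite3.Connection, image_name: str) -> List[str]:
    """Return labels still awaiting editor review, most confident first."""
    rows = conn.execute(
        "SELECT label FROM mv_depicts_suggestion "
        "WHERE image_name = ? AND reviewed = 0 "
        "ORDER BY confidence DESC",
        (image_name,),
    )
    return [row[0] for row in rows]
```

Separate tables let each metadata type carry its own columns (a single score versus many labels with review state) while sharing the same image and provider keys.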

In contrast to Content Translation/cxserver, MV provider interaction is not intended to occur synchronously upon user request. Instead, user-facing features will read data pre-populated in these tables for quick lookup. For example, for "depicts" suggestions on upload, the image upload flow will not block on image label evaluation. The MV provider request will happen asynchronously; once the response has been received and stored, the user will receive a notification, and the suggestions will be retrieved from a table for use on a special page. Similarly, for NSFW classification, images will be scored asynchronously on upload and the results stored. The results will not be used until the scored image is used in a page revision, at which point AbuseFilter will retrieve the stored score from the dedicated table.
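
The separation between the upload path, the asynchronous provider call, and the later lookup can be sketched as follows. ScoringPipeline and its method names are hypothetical. An in-process deque stands in for a job queue and a dict stands in for the dedicated table, so this shows only the control flow, not the real infrastructure.

```python
from collections import deque
from typing import Callable, Deque, Dict, Optional


class ScoringPipeline:
    """Toy model of the upload, asynchronous scoring, and lookup paths."""

    def __init__(self, score_image: Callable[[str], float]) -> None:
        self._score_image = score_image       # stands in for the MV provider call
        self._jobs: Deque[str] = deque()      # stands in for a job queue
        self._scores: Dict[str, float] = {}   # stands in for the dedicated table

    def on_upload(self, image_name: str) -> None:
        """Upload path: enqueue a job only; the provider is never called here."""
        self._jobs.append(image_name)

    def run_pending_jobs(self) -> int:
        """Worker path: call the provider for each queued image and store the result."""
        done = 0
        while self._jobs:
            image_name = self._jobs.popleft()
            self._scores[image_name] = self._score_image(image_name)
            done += 1
        return done

    def lookup(self, image_name: str) -> Optional[float]:
        """Read path: return the stored score, or None if not yet scored."""
        return self._scores.get(image_name)
```

Because lookup only reads stored results, a consumer such as an abuse filter check never waits on a provider; it either finds a score or proceeds without one.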

Timeline

 * June 2019: Gather requirements, solicit advice, evaluate potential MV providers for depicts suggestions.
 * July 2019: Confirm approach and choose MV providers for depicts suggestions. Finalize design and begin coding work. Extension on beta by end of month.
 * August 2019: Finish coding, deploy extension to production.
 * September-October 2019: QA and release MV-generated depicts suggestions.