Wikimedia Product/Machine vision middleware

In Q1 FY 2019-20, the Wikimedia Product Infrastructure team is designing and building a generalized solution for communicating with machine vision (MV) service providers and storing their results for use on the wikis.

Specifically, we need support for:

Requesting MV-generated image metadata from machine vision providers
Providing temporary storage for MV results pending human editor verification
Serving MV data to Commons users for verification and promotion to Structured Data on Commons
Providing the results of human editor verification back to third-party MV providers for model refinement

Use Cases

MV-generated suggestions for structured "depicts" statements for newly uploaded Commons images (phab:T226119)
Storage for arbitrary numeric scores from MV image classifiers

Architecture

This functionality will be implemented as a new MediaWiki extension (MachineVision). Provider interaction may instead be handled in a supporting Node.js service, in a manner similar to cxserver for machine translation, provided that doing so would meet the standards for external services and offer a substantial advantage in functionality or implementation effort.

Provider interaction

Metadata requests

The middleware will provide a Provider abstraction, to be implemented once for each internal or external machine vision provider with that provider's specific details. All provider interaction will occur through these per-provider Provider classes.
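The Provider abstraction described above could look roughly like the following. This is a minimal sketch in Python for illustration only (the extension itself would be written for MediaWiki); the names `LabelSuggestion`, `request_labels`, `StaticProvider`, and `collect_suggestions` are hypothetical and are not taken from the MachineVision extension.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class LabelSuggestion:
    """One MV-generated label for an image (hypothetical shape)."""
    label: str
    confidence: float


class Provider(ABC):
    """Hypothetical base class: one subclass per MV provider."""

    @property
    @abstractmethod
    def name(self) -> str:
        """Short identifier used to attribute results to this provider."""

    @abstractmethod
    def request_labels(self, image_url: str) -> List[LabelSuggestion]:
        """Ask the provider for label suggestions for one image."""


class StaticProvider(Provider):
    """Stand-in provider that returns canned labels instead of calling a service."""

    def __init__(self, canned: Dict[str, List[Tuple[str, float]]]) -> None:
        self._canned = canned

    @property
    def name(self) -> str:
        return "static"

    def request_labels(self, image_url: str) -> List[LabelSuggestion]:
        return [LabelSuggestion(l, c) for l, c in self._canned.get(image_url, [])]


def collect_suggestions(
    providers: List[Provider], image_url: str
) -> Dict[str, List[LabelSuggestion]]:
    """Query every configured provider through the common interface."""
    return {p.name: p.request_labels(image_url) for p in providers}
```

In this sketch the calling code depends only on the `Provider` interface, so adding a provider means adding one subclass and leaving the callers unchanged.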

Model verification feedback

The middleware will also record the results of human editor verification of MV-generated data and submit them back to providers for model refinement. (TODO: Decide on the requirements for considering data "verified," which may vary by specific kind of metadata. phab:T227341)
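As a rough illustration of the feedback step, the sketch below groups editor decisions by provider so that each provider receives only the reviews of its own suggestions. It assumes the simplest possible notion of "verified" (a single accept or reject decision per suggestion), which is precisely the open question in the TODO above. The names `Review`, `ReviewedSuggestion`, and `build_feedback`, and the payload fields, are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class Review(Enum):
    """Assumed review outcomes; the real criteria are still undecided."""
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class ReviewedSuggestion:
    provider: str   # which provider produced the suggestion
    image: str      # image title
    label: str      # suggested label
    review: Review  # the editor's decision


def build_feedback(reviews: List[ReviewedSuggestion]) -> Dict[str, List[dict]]:
    """Group reviewed suggestions into one feedback batch per provider."""
    feedback: Dict[str, List[dict]] = {}
    for r in reviews:
        feedback.setdefault(r.provider, []).append(
            {"image": r.image, "label": r.label, "review": r.review.value}
        )
    return feedback
```

How each batch is actually delivered would be provider-specific and is left out here.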

Storage and lookup

Retrieved metadata will be stored primarily in a dedicated MariaDB table for each metadata type.
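To make the per-type table idea concrete, here is a small sketch of what a table for label suggestions might look like. It uses Python's built-in `sqlite3` module as a stand-in for MariaDB so that it runs without a server; the table name `mv_label_suggestion`, its columns, and the `review_state` values are assumptions for illustration, not the extension's actual schema.

```python
import sqlite3
from typing import List, Tuple


def create_label_table(conn: sqlite3.Connection) -> None:
    """Create a hypothetical table holding one row per suggested label."""
    conn.execute(
        """
        CREATE TABLE IF NOT EXISTS mv_label_suggestion (
            image_title  TEXT NOT NULL,
            provider     TEXT NOT NULL,
            label        TEXT NOT NULL,
            confidence   REAL NOT NULL,
            review_state TEXT NOT NULL DEFAULT 'unreviewed',
            PRIMARY KEY (image_title, provider, label)
        )
        """
    )


def store_labels(
    conn: sqlite3.Connection,
    image_title: str,
    provider: str,
    labels: List[Tuple[str, float]],
) -> None:
    """Store provider results; re-storing a label resets it to 'unreviewed'."""
    conn.executemany(
        "INSERT OR REPLACE INTO mv_label_suggestion "
        "(image_title, provider, label, confidence) VALUES (?, ?, ?, ?)",
        [(image_title, provider, label, conf) for label, conf in labels],
    )
    conn.commit()


def lookup_unreviewed(
    conn: sqlite3.Connection, image_title: str
) -> List[Tuple[str, float]]:
    """Return unreviewed labels for an image, highest confidence first."""
    rows = conn.execute(
        "SELECT label, confidence FROM mv_label_suggestion "
        "WHERE image_title = ? AND review_state = 'unreviewed' "
        "ORDER BY confidence DESC",
        (image_title,),
    )
    return rows.fetchall()
```

A second metadata type, such as arbitrary numeric classifier scores, would get its own table with columns suited to that type.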

In contrast to Content Translation/cxserver, MV provider interaction is not intended to occur synchronously upon user request. Instead, any user interaction will rely on data pre-populated in these tables for quick lookup. For example, in the case of "depicts" suggestions on upload, the image upload flow will not block on image label evaluation. The MV provider request will happen asynchronously, and the user will receive a notification after the response has been received and stored. The suggestions will then be retrieved from a table for use on a special page where they can be confirmed or rejected.
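The asynchronous flow described above can be sketched as follows. This is a simplified, single-process illustration: an in-memory queue stands in for a real job queue, a dictionary stands in for the database tables, and `SuggestionPipeline` and its method names are hypothetical.

```python
from collections import deque
from typing import Callable, Deque, Dict, List, Tuple


class SuggestionPipeline:
    """Decouples the upload flow from the (slow) MV provider request."""

    def __init__(self, fetch_labels: Callable[[str], List[str]]) -> None:
        self._fetch_labels = fetch_labels                 # provider call, run later
        self._jobs: Deque[Tuple[str, str]] = deque()      # stand-in for a job queue
        self.stored: Dict[str, List[str]] = {}            # stand-in for the tables
        self.notifications: List[Tuple[str, str]] = []    # (user, image) pairs

    def on_upload(self, image_title: str, uploader: str) -> None:
        """Called from the upload flow: only enqueues, never calls the provider."""
        self._jobs.append((image_title, uploader))

    def run_pending_jobs(self) -> int:
        """Run outside the upload request: fetch, store, then notify the uploader."""
        processed = 0
        while self._jobs:
            image_title, uploader = self._jobs.popleft()
            self.stored[image_title] = self._fetch_labels(image_title)
            self.notifications.append((uploader, image_title))
            processed += 1
        return processed

    def suggestions_for(self, image_title: str) -> List[str]:
        """Lookup used by the review page: reads stored data only."""
        return self.stored.get(image_title, [])
```

The point of the design is that `on_upload` and `suggestions_for` never wait on a provider; only the background job does.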

Timeline

June 2019: Gather requirements, solicit advice, evaluate potential MV providers for depicts suggestions.
July 2019: Confirm approach and choose MV providers for depicts suggestions. Finalize design and begin coding work. Extension on beta by end of month.
August 2019: Finish coding, deploy extension to production.
September-October 2019: QA and release MV-generated depicts suggestions.