Extension:MachineVision/Developers

This page documents the internals of the MachineVision extension, with a focus on the backend (PHP) logic.

Overview
When a new image is uploaded to Wikimedia Commons, the MachineVision extension triggers a delayed job request to ensure that the image is still present (i.e., not deleted), and if so, request and store image label suggestions generated by one or more machine vision labeling providers. These label suggestions are then filtered and served to reviewers on the Special:SuggestedTags page on Commons. Accepted label suggestions are saved to the image's structured data as depicts (P180) statements.

When label suggestions are received for an image, an Echo event is fired to notify the uploader that image labels suggestions are available for review, according to the uploader's notification preferences.

Concepts

 * Image
 * Label
 * Suggestion
 * Waiting period
 * Filters
 * Review states
 * Concept mapping
 * Priority

Data storage
The extension tables (see Extension:MachineVision/Schema) are hosted in production in the extension1 cluster and wikishared database.

Image and label filtering
Label suggestions have multiple filters applied before storage, and each operates differently.

The first filtering pass, based on, is intended to withhold images completely from being shown on Special:SuggestedTags. If a label in  is among the suggested labels returned for an image, the initial review state for all suggested labels is set to WITHHOLD_ALL, which has the effect of excluding it completely. The image is not shown in either the "popular" or "personal uploads" tab on Special:SuggestedTags. The labels are, however, retained in the database.

The second filtering pass, based on, conditionally withholds images from the "popular" tab. If an image receives a SafeSearch rating that exceeds the allowed value on any of the configured dimensions, it is withheld from the "popular" tab but still available to the uploader in the "personal uploads" tab on Special:SuggestedTags. All suggested labels are retained in the database.

The third and final pass, based on, is intended to discard specific label suggestions judged not to be useful to the projects. Suggestions corresponding to labels in  are simply discarded before the remaining suggested labels are stored.