Structured Data Across Wikimedia/Image Suggestions

This page describes the work underway to design and build image suggestion features for experienced users. The tool is currently in development by the Structured Data Across Wikimedia team.

This project will build on the work already begun as part of the “Add an image” structured task project. However, its focus will shift towards improving the process for experienced contributors. In particular, we will target users who have edited or watched a particular article or set of articles, since they are likely to be experts in the topic and to have an interest in seeing those articles improve.

After collecting initial feedback from several communities, the project is now moving to a first test stage, which experiments with using notifications to alert users to potentially useful images for Wikipedia articles.

Background
The image suggestion UI is a key component of the SDAW project, aimed at developing systems for structured data across all Wikimedia projects.

Images are key for illustrating concepts and helping people understand subjects. Considering that Wikimedia Commons contains 65 million images, we believe that it is possible to make Wikipedias substantially more illustrated with Commons images. We believe that Structured Data can open an enduring pipeline for enriching content between Commons and Wikipedia. This will help us, in turn, to grow and diversify contributors, improve content for readers, and narrow gaps in content.

Despite that, in many Wikipedias more than half of the articles have no images. This is mainly due to the complexity of the current workflow of adding media and making connections between content and images. We want to make this process easier.

Where are we starting from
As mentioned above, the tool will build on the work already done for the “Add an image” structured task project. The Image Suggestion API, built by the Platform Engineering team, combines the results of the Image Suggestions Algorithm and MediaSearch to suggest image matches for unillustrated articles, using the following approach:
 * 1) Look at the Wikidata item for the article.
 * 2) If it has an image (Wikidata property P18), suggest that image.
 * 3) If it has an associated Commons category (Wikidata property P373), suggest an image from the category.
 * 4) Look at the articles about the same topic in other language Wikipedias. Suggest a lead image from those articles.
 * 5) Search for the title of the article in MediaSearch on Commons, which combines traditional text-based search with structured data from Commons and Wikidata. If an image ranks high enough in the results, suggest that image.
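
The steps above can be read as a cascade of lookups. The following is a minimal sketch of that idea, not the Image Suggestion API itself. It assumes the first source that yields an image wins; the function names, source labels, toy data, and the `min_score` threshold of 0.5 are all hypothetical and chosen only for illustration.

```python
from typing import Callable, Optional, Sequence

# Each source is a hypothetical lookup: article title -> image file name, or None.
Source = Callable[[str], Optional[str]]


def suggest_image(
    title: str, sources: Sequence[tuple[str, Source]]
) -> Optional[tuple[str, str]]:
    """Return (source_name, image) from the first source that yields an image."""
    for name, lookup in sources:
        image = lookup(title)
        if image is not None:
            return name, image
    return None


def mediasearch_source(
    results: dict[str, list[tuple[str, float]]], min_score: float
) -> Source:
    """Build a source that accepts the top search hit only if it scores high enough."""
    def lookup(title: str) -> Optional[str]:
        hits = results.get(title, [])
        if hits and hits[0][1] >= min_score:
            return hits[0][0]
        return None
    return lookup


# Toy data standing in for real lookups (all values are made up).
wikidata_image = {"Alpha": "Alpha.jpg"}
category_image = {"Beta": "Beta in category.jpg"}
interwiki_lead_image = {"Gamma": "Gamma lead.jpg"}
search_results = {"Delta": [("Delta.png", 0.9)], "Epsilon": [("Weak.png", 0.2)]}

SOURCES = [
    ("wikidata-image", wikidata_image.get),
    ("commons-category", category_image.get),
    ("other-language-lead-image", interwiki_lead_image.get),
    ("mediasearch", mediasearch_source(search_results, min_score=0.5)),
]

print(suggest_image("Alpha", SOURCES))
print(suggest_image("Epsilon", SOURCES))
```

Keeping the sources in an ordered list makes the priority explicit and lets the result record which source produced the suggestion, which may help when measuring match accuracy per source.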

In initial tests, the combined algorithms can suggest images for up to 40% of all unillustrated articles on a given Wikipedia. We are currently doing further testing of the accuracy of the matches. Also, the Android and Growth teams are testing ideas that use the Image Suggestion API to allow newcomers to add images to articles via the suggested tasks interface.

We are also experimenting with adding the results from the Image Suggestions Algorithm directly to MediaSearch. We hope that will simplify the process technically and improve MediaSearch results. See the related task on Phabricator for more information.

What we want to do
The project is currently experimenting with an approach based on notifications. The goal is to embed the suggestions in the user’s existing Wikipedia activities through weekly notifications, thus increasing the likelihood they will review such suggestions and add selected images as part of their current editing workflow. Contributors can choose to edit via Wikitext or Visual Editor, and can review the image and the article information in the notification.

Tentative workflow
The following is the current tentative workflow we defined for this stage of testing. Wherever appropriate, there is a link to the related task on Phabricator.
 * 1) Notifications are sent weekly, at a predefined day and time (e.g. every Monday at 08:00 UTC), to users across the globe who have at least 500 edits → see the related task on Phabricator
 * 2) Each notification includes a link to the user’s preferences so that they can opt out of the notifications → about opting out, see the related task on Phabricator
    * An “Image Suggestions” option would be added to the bottom of the user’s list of opt-in/opt-out notifications
 * 3) Suggestions are selected randomly from the list of recent images matched to unillustrated articles, using the algorithms explained above
 * 4) Users are selected randomly from a group of users that:
    * have at least 500 edits on the project
    * did not opt out of the notification
    * have received up to 2 image notifications in a given week
 * 5) The tool will check the user’s watchlist for articles edited in the last 30 days
    * If the user has already received a notification for the article ID, the tool skips to the next article
    * Otherwise, it matches one suggestion and sends the notification
 * 6) If image matches remain, the tool checks for other articles on the user’s watchlist
 * 7) Notifications will be generated weekly, until image matches or applicable users are exhausted
    * A notification for a particular article-image match will only be shown once to a particular user
    * The same match can be sent to multiple users to review (unless the image has already been inserted)
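
As a rough illustration of the workflow above, the sketch below plans one weekly run. It is only a sketch under stated assumptions: the `User` fields and the `plan_week` function are hypothetical, the random selection of users and suggestions is left out so that the result is reproducible, and the weekly cap is read as "at most 2 notifications per user per run".

```python
from dataclasses import dataclass, field

MIN_EDITS = 500    # eligibility threshold taken from the workflow above
MAX_PER_WEEK = 2   # assumed reading of the weekly cap in the workflow above


@dataclass
class User:
    name: str
    edit_count: int
    opted_out: bool = False
    # Watched articles edited in the last 30 days (assumed to be precomputed).
    watchlist: list[str] = field(default_factory=list)
    # Articles this user was already notified about in earlier runs.
    already_notified: set[str] = field(default_factory=set)


def plan_week(users: list[User], matches: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (user, article, image) notifications for one weekly run.

    `matches` maps an unillustrated article to its suggested image.
    """
    planned = []
    for user in users:
        if user.edit_count < MIN_EDITS or user.opted_out:
            continue
        sent = 0
        for article in user.watchlist:
            if sent >= MAX_PER_WEEK:
                break
            if article not in matches or article in user.already_notified:
                continue  # no suggestion, or this user already saw this article
            planned.append((user.name, article, matches[article]))
            user.already_notified.add(article)  # never notify twice for the same article
            sent += 1
    return planned


users = [
    User("A", 600, watchlist=["X", "Y", "Z"]),
    User("B", 100, watchlist=["X"]),                  # too few edits
    User("C", 900, opted_out=True, watchlist=["X"]),  # opted out
]
matches = {"X": "x.jpg", "Y": "y.jpg", "Z": "z.jpg"}
print(plan_week(users, matches))
```

Note that the same match can still reach several users, because the "already notified" record is kept per user rather than per match.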

User experience
Based on the information provided in the notification, the user can:
 * 1) go through their normal image addition workflow (e.g. insert the image with wikitext or the Visual Editor insert flow)
    * in this case, the user only gets the opportunity to review the match; no additional help or feature is provided
 * 2) click on “Review image” in the notification
    * this will redirect the user to the image on Commons
 * 3) click on “Review article” in the notification
    * this will redirect the user to the article on Wikipedia

Ideas for the future
The following ideas are out of scope for the current test stage, but might be worked on in the future:
 * Suggestions to users who have uploaded images on Commons that match articles
 * Suggestions given in other ways besides via notifications (e.g. suggestions in the image search dialogue in VisualEditor)
 * Suggestions for articles that are already illustrated
 * Section level image suggestions
 * A tool to help users add images to the article
 * A landing page that lets users review multiple suggestions at once
 * Limiting notifications only to users who have a history of adding images to articles in the last 30 days

Metrics and analytics
We are planning on measuring the following metrics, to analyse the performance of the current testing and determine whether the tool is successful:
 * 1) Number of notifications sent
 * 2) Number of image suggestions notifications opened (measuring engagement with notifications)
 * 3) Number of opt-outs (low number of opt-outs = notifications are useful)
 * 4) Number of images suggested that are added to the matched article within a month of receiving the notification
 * 5) Number of suggested images not reverted from their matched article (low revert rate = good quality of suggested matches)
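
To make the metrics above concrete, here is a minimal sketch of how they could be tallied from per-notification records. The record fields and the `summarize` function are hypothetical, "within a month" is assumed to mean 30 days, and the "not reverted" count is assumed to cover only images added within that window.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NotificationRecord:
    opened: bool
    opted_out_after: bool
    days_until_added: Optional[int]  # None if the suggested image was never added
    reverted: bool = False


def summarize(records: list[NotificationRecord], window_days: int = 30) -> dict[str, int]:
    """Count the five metrics listed above for a batch of notifications."""
    added = [
        r for r in records
        if r.days_until_added is not None and r.days_until_added <= window_days
    ]
    return {
        "sent": len(records),
        "opened": sum(r.opened for r in records),
        "opt_outs": sum(r.opted_out_after for r in records),
        "added_within_window": len(added),
        "not_reverted": sum(not r.reverted for r in added),
    }


sample = [
    NotificationRecord(opened=True, opted_out_after=False, days_until_added=3),
    NotificationRecord(opened=True, opted_out_after=False, days_until_added=45),
    NotificationRecord(opened=False, opted_out_after=True, days_until_added=None),
    NotificationRecord(opened=True, opted_out_after=False, days_until_added=10, reverted=True),
]
print(summarize(sample))
```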

What we don’t want to do

 * Create a new tool that will go unused due to lack of incentive for use
 * Annoy users with too many notifications
 * Encourage edits that will overwhelmingly be reverted
 * Encourage edits that go against existing policies and/or practices (NPOV, original research...)
 * Encourage edits that introduce additional bias in the article

Feedback
Project feedback is and will always be welcome. We are especially interested in your ideas, and we are looking forward to hearing from you on the talk page about the following open questions:
 * 1) What is your opinion about the approaches outlined above?
 * 2) Should we be helping editors with image placement location?
 * 3) How can we help users make sure they are following the conventions of a particular wiki when choosing and placing an image?
 * 4) How can we help users add appropriate captions?
 * 5) How can we help users add appropriate alt-text?

Consultations

 * August 2021: First round of feedback (original RfC for Wikipedia, original RfC for Wikimedia Commons)

What is the Image suggestion tool about?
The image suggestion tool is a key component of the Structured Data Across Wikimedia project, and it aims to make it easier for users to find potential images and media for currently unillustrated articles.

Does the Image suggestion tool somehow intersect with the “Add an image” tool from the Growth Team?
Technically yes. The two tools share the same algorithm, but they also serve different purposes:
 * the “Add an image” tool is intended for newcomers and less experienced users, who have little to no experience with adding images;
 * the Image suggestion tool is intended for more established users, who already have experience with adding images and other media to articles (i.e. users with more than 500 edits).

What is the relation between the Image suggestion tool and Wikidata/Structured Data on Commons?
This tool will leverage the data coming from Wikidata and Structured Data on Commons to find potential media to add to unillustrated Wikipedia articles.

More specifically, the tool will look at the related Wikidata item and check whether it has an image (through Wikidata property P18) or an associated Commons category (through Wikidata property P373). If neither yields a potential image, it will look at Wikipedia articles in other languages to see if there is a lead image to be found. Finally, the tool will search MediaSearch on Commons for the title of the article, and if an image ranks high enough in the results, the tool will choose that image.
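
As an illustration of the Wikidata check, the sketch below reads P18 and P373 from an entity in the JSON layout that Wikidata uses for items (claims grouped by property, each statement carrying a `mainsnak`). The helper names are hypothetical and the sample entity is hand-written and heavily abbreviated; this is not the tool's actual code.

```python
from typing import Optional


def first_string_value(entity: dict, prop: str) -> Optional[str]:
    """Return the first string value of a property from a Wikidata entity dict, if any."""
    for statement in entity.get("claims", {}).get(prop, []):
        snak = statement.get("mainsnak", {})
        # Statements marked "novalue" or "somevalue" carry no datavalue.
        if snak.get("snaktype") == "value":
            value = snak["datavalue"]["value"]
            if isinstance(value, str):
                return value
    return None


def wikidata_hints(entity: dict) -> dict[str, Optional[str]]:
    """Collect the image (P18) and Commons category (P373) of an item, if present."""
    return {
        "image": first_string_value(entity, "P18"),
        "commons_category": first_string_value(entity, "P373"),
    }


# Hand-written, abbreviated sample entity (not real data).
sample_entity = {
    "claims": {
        "P373": [
            {"mainsnak": {"snaktype": "value",
                          "datavalue": {"value": "Example category", "type": "string"}}}
        ]
    }
}
print(wikidata_hints(sample_entity))
```

An item with neither property returns `None` for both keys, which corresponds to the case where the tool falls back to other language Wikipedias and then to MediaSearch.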