Wikimedia Apps/Team/Android/Add an image MVP



The Android, Structured Data, and Growth teams aim to offer "Add an Image" as a “structured task”. More about the motivations for pursuing this project can be found on the main page created by the Growth team. In order to roll out Add an Image and have the output of the task show up on wiki, a "minimum viable product" (MVP) for the Wikipedia Android app will be created. The MVP will help improve the algorithm provided by the research team and answer questions about usage behavior, to further explore the concerns raised by the community.

The most important thing about this MVP is that it will not save any edits to Wikipedia. Rather, it will only be used to gather data, improve our algorithm, and improve our design.

The Android app is where "suggested edits" originated, and our team has a framework for building new task types easily. The main pieces include:

  • The app will have a new task type that users know is only for helping us improve our algorithms and designs.
  • It will show users image matches, and they will select "Yes", "No", or "Skip".
  • We'll record the data on their selections to improve the algorithm, determine how to improve the interface, and think about what might be appropriate for the Growth team to build for the web platform later on.
  • No edits will happen to Wikipedia, making this a very low-risk project.

The Android team will be working on this in February and March 2021. Our hope is the Growth team will learn enough to deploy the feature on mobile web. Based on the success and lessons of the Growth team's deployment, the Android team will refine the MVP and turn it into a feature that produces edits to Wikipedia.

Product Requirements

As a first step in the implementation of this project, the Android team will develop an MVP with the purpose of:

  1. Improving the Image Matching Algorithm developed by the research team by answering "How accurate is the algorithm?" We want to set confidence levels for the sources in the algorithm, to be able to say that suggestions from Wikidata are X% accurate, from Commons categories are Y% accurate, and from other Wikipedias are Z% accurate.
  2. Learning about our users by evaluating:
    • The stickiness of Add an Image across editing tenure, Commons familiarity, and language
    • The difficulty of Add an Image as a task, and whether certain matches are harder than others
    • The implications of language preference for the ability to complete the task
    • The accuracy of users judging the matches. Because we’re not sure how accurate users are, we want to receive multiple ratings on each image match (i.e., “voting”).
    • The optimal design and user workflow to encourage accurate matches and task retention
    • What, if any, measures need to be in place to discourage bad matches
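The per-source confidence levels in item 1 amount to grouping judged matches by the algorithm source that suggested them and computing the share judged correct. A minimal sketch, assuming each judgment is a hypothetical `(source, is_correct)` pair; the source labels and record shape are illustrative, not the app's actual event schema.

```python
from collections import defaultdict


def accuracy_by_source(judgments):
    """Return {source: share of matches judged correct}.

    `judgments` is an iterable of hypothetical (source, is_correct) pairs,
    e.g. ("wikidata", True). This record shape is an assumption.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for source, is_correct in judgments:
        total[source] += 1
        if is_correct:
            correct[source] += 1
    # Iterate over `total` so every source seen gets an entry, even with 0 correct.
    return {source: correct[source] / total[source] for source in total}


# Hypothetical judgments for illustration only (not experiment data).
sample = [
    ("wikidata", True), ("wikidata", True), ("wikidata", False),
    ("commons_category", True), ("commons_category", False),
    ("cross_wiki", True),
]
print(accuracy_by_source(sample))
```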

How to Follow Along

We have created T272872 as our Phabricator Epic to track the work of the MVP. We encourage your collaboration there or on our Talk Page.

There will also be periodic updates to this page as we make progress on the MVP.


2021 Jun 25 - Final Report and Next Steps

The Android team completed the Train Image Algorithm experiment. The findings can be found below. There were enough favorable insights from the experiment that the Growth team decided to proceed with the next phase of this work. You can read more about the Growth team building a Mobile Web feature to place images in articles on their project page. In the interim, the Android team will sunset the Train Image Algorithm task and will add an Image Recommendations task to Suggested Edits based on the work from the Growth team.

The two most important questions to answer in making a decision to proceed with image recommendations work for newcomers are around engagement and efficacy. Each of those has more detailed questions underneath.

Engagement: do users like this task and want to do it?

  • Edits per session: do users do many of these edits in a row?
  • Retention: do users return on multiple days to do the task again?
  • Algorithm experience: is the algorithm accurate enough that users feel productive, but not so accurate that they feel superfluous?
  • Qualitative: is there anything we can see about the task in Play Store comments?

Efficacy: will resulting edits be of sufficient quality?

  • Accuracy of algorithm: what is the baseline accuracy before users are involved?
  • Algorithm improvement: what did we learn about the algorithm’s weak points?
  • Judgment: can newcomers identify the good matches from the bad, thereby improving the overall accuracy of the feature placing images on articles?
  • Effort: do newcomers seem to spend adequate time and care evaluating each match?

Engagement: do users like this task and want to do it?

Edits per session: we want to see users do many of these edits in a row, indicating that they like the task enough to keep on going.

  • On average, they do about 9 annotations per user and 10 annotations per session
  • We want to compare to the other Android tasks, using a 30 day sample of data, only Logged In, Suggested Edit editors.
  • We want to look at these numbers for English and Non-English users, if possible.
  • Note on positive reinforcement: the task experience recommends that users do 10 per day as their “daily goal”. Perhaps the fact that this number is close to 10 indicates that the daily goal is influencing users.
Average Edits per Unique User
Task | All users | English | Non-English
Image rec | 11 | 11 | 11
Desc add | 20 | 9 | 20
Desc change | 8 | 3 | 8
Img caption add | 6 | 6 | 6
Img tag add* | 7 | NA | NA
Desc translate | 18 | 11 | 19
Img caption translate | 7 | 4 | 7

*Image tag edits are on Commonswiki, so we don’t track language for those edits.
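The per-user and per-session averages above can be computed from a flat log of completed edits. A minimal sketch, assuming hypothetical `(task, user_id, session_id)` tuples that have already been filtered to the 30-day window of logged-in Suggested Edits editors described above; the field names are illustrative, not the app's actual schema.

```python
from collections import defaultdict


def average_edits(events):
    """Return {task: (edits per unique user, edits per session)}.

    `events` holds hypothetical (task, user_id, session_id) tuples, one per edit.
    Session ids are assumed to be unique only within a user, so sessions are
    keyed by (user_id, session_id).
    """
    edits = defaultdict(int)
    users = defaultdict(set)
    sessions = defaultdict(set)
    for task, user_id, session_id in events:
        edits[task] += 1
        users[task].add(user_id)
        sessions[task].add((user_id, session_id))
    return {
        task: (edits[task] / len(users[task]), edits[task] / len(sessions[task]))
        for task in edits
    }


# Hypothetical log: u1 edits in two sessions, u2 in one.
events = (
    [("image_rec", "u1", "s1")] * 3
    + [("image_rec", "u1", "s2")]
    + [("image_rec", "u2", "s1")] * 2
)
print(average_edits(events))
```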

Retention: we want to see users return on multiple days to do the task again.

  • The most recent discussion of how to make an apples-to-apples comparison between the various Android tasks is in this Phab comment.
  • Using a 30 day sample of data from only Logged In, Suggested Edit editors.
  • We want to compare to the other Android tasks.
  • We want to look at these numbers for English and Non-English users, if possible.
All users
Task | 1 day | 3 days | 7 days | 14 days
Image rec | 8.7% | 6% | 3.8% | 1.7%
Desc add | 39.2% | 34.2% | 26.7% | 18.3%
Desc change | 32.6% | 28.0% | 22.8% | 15.9%
Img caption add | 20.7% | 16.5% | 11.8% | 7%
Img tag add | 17.8% | 13.2% | 8.8% | 4.3%
Desc translate | 30% | 23% | 16.1% | 6.1%
Img caption translate | 20.8% | 14.6% | 9.7% | 2.8%


English

Task | 1 day | 3 days | 7 days | 14 days
Image rec | 8.7% | 5.9% | 3.7% | 1.6%
Desc add | 30% | 23.3% | 18.9% | 11.1%
Desc change | 26.9% | 19.2% | 15.4% | 7.7%
Img caption add | 19.1% | 15% | 10.8% | 5.9%
Desc translate | 17.65% | 11.8% | 5.9% | 0%
Img caption translate | 7.7% | 3.8% | 3.8% | 0%


Non-English

Task | 1 day | 3 days | 7 days | 14 days
Image rec | 8.1% | 5.7% | 3.6% | 2%
Desc add | 40.1% | 33.9% | 27% | 18.3%
Desc change | 34.5% | 28.7% | 23.2% | 16.7%
Img caption add | 19.4% | 16.2% | 11.6% | 7.7%
Desc translate | 27.7% | 20.9% | 13.1% | 3.9%
Img caption translate | 21.6% | 15.1% | 10.1% | 2.9%
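N-day retention can be defined in several ways, and this page does not spell out the exact definition behind the tables. A minimal sketch of one common definition, which is an assumption here: a user counts as retained at N days if they completed the task again on a date at least N days after their first date with it. The user ids and dates are hypothetical.

```python
from datetime import date


def retention(activity, n_days):
    """Share of users who did the task again at least `n_days` after their first date.

    `activity` maps a hypothetical user id to the set of dates on which the user
    completed the task. This definition of N-day retention is an assumption.
    """
    if not activity:
        return 0.0
    retained = 0
    for dates in activity.values():
        first = min(dates)
        if any((d - first).days >= n_days for d in dates):
            retained += 1
    return retained / len(activity)


activity = {
    "u1": {date(2021, 5, 7), date(2021, 5, 8)},
    "u2": {date(2021, 5, 7), date(2021, 5, 15)},
    "u3": {date(2021, 5, 9)},
    "u4": {date(2021, 5, 10), date(2021, 5, 25)},
}
for n in (1, 3, 7, 14):
    print(f"{n}-day retention: {retention(activity, n):.0%}")
```

Because each threshold only counts returns at least N days out, the shares can only stay flat or fall as N grows, which matches the shape of the tables.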

Algorithm experience: is the algorithm accurate enough that users feel productive, but not so accurate that they feel unnecessary?

  • If users were saying “yes” or “no” over 90% of the time, we might worry that they’re bored.  If they say “unsure” more than a third of the time, we might worry that they’re frustrated.
  • Users say “yes” 65% of the time, “no” 20% of the time, and “not sure” 15% of the time. In other words, they perceive the algorithm to be correct about two-thirds of the time, and they are rarely unsure.
  • It would be helpful to find research from industry or academia on how to think about and tune this ratio.
Response | All users | English | Non-English
Yes | 65% | 65% | 64%
No | 20% | 19% | 22%
Not sure | 15% | 16% | 14%
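The rule of thumb in the first two bullets can be written as a small check on the response mix. A minimal sketch; reading "saying yes or no over 90% of the time" as either single answer exceeding 90% is an assumption, as are the returned labels.

```python
def response_health(yes, no, not_sure):
    """Apply the rule of thumb above to a yes / no / not-sure response mix.

    Inputs can be counts or percentages; they are normalized to shares.
    Treating "yes or no over 90%" as either answer alone exceeding 90%
    is an assumption.
    """
    total = yes + no + not_sure
    if total <= 0:
        raise ValueError("no responses")
    yes, no, not_sure = yes / total, no / total, not_sure / total
    if not_sure > 1 / 3:
        return "possibly frustrated"
    if yes > 0.9 or no > 0.9:
        return "possibly bored"
    return "within range"


print(response_health(65, 20, 15))  # all-users mix from the table → within range
```

The English (65/19/16) and Non-English (64/22/14) mixes fall in the same range.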

Efficacy: will resulting edits be of sufficient quality?

Accuracy of algorithm: what is the baseline accuracy before users are involved?

  • Our best estimate comes from the SDAW test, which tested in six languages and found the algorithm to be 65-80% accurate, depending on whether you count “Good” or “Good+Okay” and on the wiki/evaluator (source).
  • The three sources in the algorithm have substantially different accuracy (source) and make up different shares of the coverage (source):
Source | Accuracy (good) | Accuracy (good+okay) | Share of coverage
Wikidata | 85% | 93% | 7%
Cross-wiki | 56% | 76% | 80%
Commons category | 51% | 76% | 13%
All | 63% | 80% | 100%
  • Through the Android MVP, experts evaluated 2,397 matches. On average, experts assessed 76% of the matches to be correct. This is in line with the results above.
  • WMF staff also manually evaluated 230 image matches which were marked as “correct” by newcomer editors (<50 edits). We found that 80% of these matches are actually correct, which is in line with the numbers above.
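The gap between the “Good” and “Good+Okay” figures comes down to which evaluator labels are counted as correct. A minimal sketch, assuming each evaluation is a hypothetical label string ("good", "okay" or "bad"); this label set is an assumption, not the SDAW test's actual data format.

```python
def accuracy(ratings, accepted=("good",)):
    """Share of evaluator ratings counted as correct.

    `ratings`: hypothetical labels such as "good", "okay" or "bad".
    The default counts only "good"; pass ("good", "okay") for the lenient measure.
    """
    if not ratings:
        raise ValueError("no ratings to score")
    return sum(rating in accepted for rating in ratings) / len(ratings)


ratings = ["good", "good", "okay", "bad", "good", "okay", "bad", "good"]
print(accuracy(ratings))                    # strict ("Good"): 0.5
print(accuracy(ratings, ("good", "okay")))  # lenient ("Good+Okay"): 0.75
```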

Algorithm improvement: what did we learn about the algorithm’s weak points?

  • What is the distribution of responses for the follow-up questions for “no” and “not sure”?
  • We want to look at these numbers for English and Non-English users, if possible.

“No” responses

Response | All users | English | Non-English
Not relevant | 5,094 (43%) | 4,034 (45%) | 1,060 (37%)
Not enough information | 2,014 (17%) | 1,465 (16%) | 549 (19%)
Offensive | 159 (1%) | 108 (1%) | 51 (2%)
Low quality | 969 (8%) | 734 (8%) | 235 (8%)
Don’t know this subject | 1,132 (10%) | 807 (9%) | 325 (11%)
Cannot read the language | 752 (6%) | 554 (6%) | 198 (7%)
Other | 1,674 (14%) | 1,210 (14%) | 464 (16%)

“Not sure” responses

Response | All users | English | Non-English
Not enough information | 2,147 (24%) | 1,742 (24%) | 405 (23%)
Can’t see image | 267 (3%) | 213 (3%) | 54 (3%)
Don’t know this subject | 4,178 (47%) | 3,325 (46%) | 853 (48%)
Don’t understand the task | 284 (3%) | 215 (3%) | 69 (4%)
Cannot read the language | 1,095 (12%) | 895 (12%) | 200 (11%)
Other | 996 (11%) | 785 (11%) | 211 (12%)
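The percentages in the two tables above are each count divided by its column total. A minimal sketch that reproduces the all-users “No” column from its counts; if users can select several reasons per response (as the user-testing notes on this page indicate), the denominator counts selected reasons rather than responses.

```python
def reason_shares(counts):
    """Convert reason counts into whole-number percentages of the column total."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no reasons recorded")
    return {reason: round(100 * n / total) for reason, n in counts.items()}


# All-users "No" counts from the table above.
no_all_users = {
    "Not relevant": 5094,
    "Not enough information": 2014,
    "Offensive": 159,
    "Low quality": 969,
    "Don't know this subject": 1132,
    "Cannot read the language": 752,
    "Other": 1674,
}
print(reason_shares(no_all_users))
```

Because each share is rounded separately, the rounded column need not sum to exactly 100%.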

Judgment: can newcomers identify the good matches from the bad, thereby improving the overall accuracy of the feature placing images on articles?

  • Comparison with WMF staff annotations
    • 80% of the matches for which newcomers said "yes" are actually good matches
    • This number goes up to 82-83% when we remove newcomers who have very low median time for evaluations.
    • The algorithm alone is 65-80% accurate, and algorithm plus newcomers is 80% accurate. We think we can boost that by screening out the weakest newcomers (those who go too fast and those who say yes too often), so perhaps newcomers plus algorithm could reach 85% or higher.
    • 85% of the matches for which Avg/Expert users said "yes" are actually good matches
  • Comparison with expert users (users with 1000+ Wikipedia edits)
    • For images labeled as "good matches" by newcomers, experts agree 74.9% of the time.
    • For images labeled as "bad matches" by newcomers, experts agree 51.8% of the time.
Label | Description | % positive responses | % users with “all yes” | % users with “all yes” and 5+ annotations
new | <50 edits | 76.90% | 40.40% | 30.06%
expert | >=1000 edits | 76.15% | 22.45% | 12.82%
avg | >=50 and <1000 edits | 73.63% | 17.12% | 14.21%
  • There was agreement amongst users
  • Newcomers are more likely to select yes than experienced users.
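Comparing newcomer “yes” answers with staff annotations amounts to measuring the precision of those answers, and the screening idea above amounts to dropping some users before measuring. A minimal sketch, assuming hypothetical `(user_id, match_id, answer, seconds)` records and a reference dict of staff verdicts; the 5-second median cutoff is borrowed from the Effort section and is only an assumption about what “very low median time” means.

```python
from statistics import median


def yes_precision(annotations, reference, min_median_seconds=None):
    """Share of "yes" answers whose match the reference judged to be good.

    `annotations`: hypothetical (user_id, match_id, answer, seconds) tuples.
    `reference`: hypothetical dict of match_id -> True if staff judged it good.
    If `min_median_seconds` is set, users whose median response time falls
    below it are screened out first (the cutoff value is an assumption).
    """
    if min_median_seconds is not None:
        times = {}
        for user, _, _, seconds in annotations:
            times.setdefault(user, []).append(seconds)
        kept = {u for u, ts in times.items() if median(ts) >= min_median_seconds}
        annotations = [a for a in annotations if a[0] in kept]
    # Only "yes" answers on matches that staff also reviewed are scored.
    verdicts = [
        reference[match]
        for _, match, answer, _ in annotations
        if answer == "yes" and match in reference
    ]
    return sum(verdicts) / len(verdicts) if verdicts else None


annotations = [
    ("fast", "m1", "yes", 2), ("fast", "m2", "yes", 3),
    ("careful", "m1", "yes", 9), ("careful", "m3", "yes", 12),
    ("careful", "m2", "no", 10),
]
reference = {"m1": True, "m2": False, "m3": True}
print(yes_precision(annotations, reference))                        # no screening
print(yes_precision(annotations, reference, min_median_seconds=5))  # fast user dropped
```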

Effort: do newcomers seem to spend adequate time and care evaluating each match?

  • What percent of users have a mean response time of less than five seconds?

All users

User | Mean (s) | Median (s) | % of users with <5s response time
Newcomer (<50 edits) | 9.6 | 7.6 | 31.7%
Medium (>=50 and <1000 edits) | 10.2 | 8.5 | 11.2%
Expert (>=1000 edits) | 11.6 | 9.6 | 13.2%

This table is at the task level (not the user level).

  • The more experienced someone is, the more time they spend evaluating


English

User | Mean (s) | Median (s)
Newcomer (<50 edits) | 9.5 | 7.4
Medium (>=50 and <1000 edits) | 10.1 | 8.2
Expert (>=1000 edits) | 11.2 | 9.3

This table is at the task level (not the user level).


Non-English

User | Mean (s) | Median (s)
Newcomer (<50 edits) | 9.8 | 8.2
Medium (>=50 and <1000 edits) | 10.8 | 9.2
Expert (>=1000 edits) | 13.1 | 12.0

This table is at the task level (not the user level).
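The response-time figures above combine a task-level mean and median per experience band with the per-user question of who averages under five seconds. A minimal sketch, assuming hypothetical `(user_id, seconds)` records and a dict of edit counts; the bands follow the tables, everything else is illustrative.

```python
from statistics import mean, median


def cohort(edit_count):
    """Bucket a user by edit count, using the bands in the tables above."""
    if edit_count < 50:
        return "Newcomer"
    if edit_count < 1000:
        return "Medium"
    return "Expert"


def response_time_summary(responses, edit_counts, fast_cutoff=5.0):
    """Return {cohort: (mean s, median s, % of users whose own mean < cutoff)}.

    `responses`: hypothetical (user_id, seconds) pairs, one per evaluated match.
    `edit_counts`: hypothetical dict of user_id -> total edit count.
    Mean and median are over evaluations (task level); the last value is per user.
    """
    task_times = {}
    user_times = {}
    for user, seconds in responses:
        task_times.setdefault(cohort(edit_counts[user]), []).append(seconds)
        user_times.setdefault(user, []).append(seconds)
    summary = {}
    for name, times in task_times.items():
        users = [u for u in user_times if cohort(edit_counts[u]) == name]
        fast = sum(mean(user_times[u]) < fast_cutoff for u in users)
        summary[name] = (mean(times), median(times), 100 * fast / len(users))
    return summary


responses = [("a", 3), ("a", 4), ("b", 10), ("b", 12), ("c", 15)]
edit_counts = {"a": 10, "b": 20, "c": 1500}  # hypothetical users
print(response_time_summary(responses, edit_counts))
```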

  • How often do users open the article to read more, and open the image to see details?

All users

User | Percent open article | Percent open image
Newcomer (<50 edits) | 1.4% | 16.4%
Medium (>=50 and <1000 edits) | 1.2% | 20.5%
Expert (>=1000 edits) | 0.9% | 15.7%

This table is at the task level (not the user level).


English

User | Percent open article | Percent open image
Newcomer (<50 edits) | 1.3% | 16.2%
Medium (>=50 and <1000 edits) | 1.1% | 21.4%
Expert (>=1000 edits) | 1.0% | 17.4%

This table is at the task level (not the user level).


Non-English

User | Percent open article | Percent open image
Newcomer (<50 edits) | 1.7% | 17.3%
Medium (>=50 and <1000 edits) | 1.9% | 16.5%
Expert (>=1000 edits) | 0.5% | 6.4%

This table is at the task level (not the user level).
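The open-article and open-image rates are task-level shares: of all evaluated matches in a band, how many involved opening the article or the image. A minimal sketch, assuming each evaluated match is a hypothetical dict with a cohort label and two boolean flags; the key names are illustrative.

```python
def open_rates(tasks):
    """Return {cohort: (% tasks with article opened, % tasks with image opened)}.

    `tasks`: hypothetical dicts, one per evaluated match, with keys
    "cohort", "opened_article" and "opened_image" (booleans).
    """
    totals = {}
    for task in tasks:
        n, article, image = totals.get(task["cohort"], (0, 0, 0))
        # Booleans add as 0/1, so these are running counts of opens.
        totals[task["cohort"]] = (
            n + 1,
            article + task["opened_article"],
            image + task["opened_image"],
        )
    return {
        name: (100 * article / n, 100 * image / n)
        for name, (n, article, image) in totals.items()
    }


tasks = [
    {"cohort": "Newcomer", "opened_article": False, "opened_image": True},
    {"cohort": "Newcomer", "opened_article": False, "opened_image": False},
    {"cohort": "Newcomer", "opened_article": True, "opened_image": True},
    {"cohort": "Newcomer", "opened_article": False, "opened_image": False},
    {"cohort": "Expert", "opened_article": False, "opened_image": True},
]
print(open_rates(tasks))
```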

2021 May 25 - Initial Data Insights

The Android team met with members of the Growth, Platform Engineering and Research teams to hold a high-level review of our data thus far and determine what adjustments we should make now for the MVP, as opposed to in later phases of this project.

With the experiment officially running for two weeks, the Train Image Algorithm task has received engagement from over 2,800 unique users on over 20,000 image titles across several language wikis. The language wikis with at least 200 completed tasks are listed below, in descending order of tasks completed:

  • English
  • German
  • Turkish
  • French
  • Portuguese
  • Spanish
  • Persian
  • Arabic
  • Russian
  • Italian
  • Hebrew
  • Ukrainian
  • Czech
  • Vietnamese

The average number of Train Image Algorithm tasks completed per day by a user is 10, which is consistent with the daily goal the team set in the feature. This suggests that participants in this task are motivated by the daily goal, a positive reinforcement element unique to Suggested Edits.

The Train Image Algorithm feature appears to be popular with both new users and power editors.

47.85% of contributors of this task downloaded the app 30 days or less ago, while 20.86% of users completing the Train Image Algorithm task have more than 50 edits across platforms.

2021 May 7 - Production Release

The team incorporated minor tweaks to the Beta version and released the Train Image Algorithm task to the production version of the Wikipedia app. In two weeks we will check the data to ensure it is coming in the way it should and share a few initial insights. We will also monitor our email, the Play Store and our Phabricator board for any bugs that may arise.

2021 April 27 - Release to Beta and FAQ page

The team incorporated user testing feedback and released the feature to Beta. Our QA Analyst will review the feature in Beta for the rest of the week, and if there are no major blockers, the feature will become available in the production version of the app. We also created an FAQ page which is accessible in the app. We encourage feedback on this project's talk page.

Train Image Algorithm task on Beta

2021 April 5 - User Testing Prioritization

Based on our analysis of the user testing feedback, the team is making updates to the prototype ahead of the release of the MVP at the end of the month. The tweaks we are making, which are captured in T272872, include:


  • T278455 The bottom sheet for image suggestions needs to be draggable in order to reveal the article contents below it. Also, participants tried to interact with the handle bar at the top of the bottom sheet.
    • If draggable sheet is not feasible: Consider a max height of the bottom sheet in order to not cover the article completely.
  • T278490 Optimize tooltip positioning and handling, as tooltips are cut off on smaller screens.
  • T278493 Ensure words are not cut off and overflow gracefully.
  • T278526 Create more suitable 'Train image algorithm' onboarding illustrations for all different themes.
  • T278527 The checkbox items in the 'No' and 'Not sure' dialogs have issues in the dark/black theme and need to be optimized.
  • T278528 The positive reinforcement element/counter has display issues in the dark/black theme and needs to be optimized.
  • T278529 Provide an easy way to access the entire article from the feed, e.g. by incorporating a 'Read more' link, tappable article title or showing the entire article right from the beginning.
  • T278494 Optimize copy 'Suggestion reason' meta information as the current copy ('Found in the following Wiki: trwiki') is not clear enough.
  • T278530 It might be worth exploring making the 'Suggestion reason' more prominent, as participants rated its usefulness the lowest (likely due to low discoverability).
  • T278532 Optimize the 'No' and 'Not sure' dialog copy to reflect that multiple options can be selected. Some participants weren’t aware that multiple reasons can be selected.
  • T278496 Optimize copy of the 'opt-in' onboarding screen, as there’s an unnecessary word at the moment ('We would you like (...)').
  • T278497 Suppress “Sync reading list” dialog within Suggested edits as it’s distracting from the task at hand.
  • T278501 Incorporate gesture to swipe back and forth between image suggestions in the feed, as participants were intuitively applying the gestures.
  • T278533 Optimize design of positive reinforcement element/counter on the Suggested edits home screen, as it was positioned too close to the task’s title.
  • T275613 Write FAQ page
  • T278534 Make it clear that reviewing the image metadata is a core part of the task. We could do that by increasing the visual prominence and/or the affordance, to encourage users to always open the metadata screen.
  • T278535 Optimize the discoverability of 'info' button at the top right as 2/5 participants had issues finding it.
  • T278555 Save previous answer state: Given users are able to go back, the selection made in the previous image or images should be retained
  • T278556 Reduce the font-size of the fields of the More details screen
  • T278545 Change the goal count to 10/10

Nice to Have

  • T278546 Add "Cannot read the language" as a reason for rejection and unsure
  • T278557 Show the full image contained instead of a cropped image
  • T278548 Include the same metadata in the card - notably the suggestion reason (in addition to filename, image description and caption) on the more details screen as well.
  • T278549 Show success screen (see designs on Zeplin) when users complete daily goal (10/10 image suggestions)
  • T278550 Explore tooltip "Got it" button
  • T278552 Incorporate pinch to zoom functionality, as participants tried to zoom the image directly from the image suggestions feed.
  • T278558 Remove full screen overlay when transitioning to next image suggestion. This allows users to orient better and keep context after submitting an answer.
  • T278561 Provide clear information that images come from Commons, or some more overt message about the image source and access to more metadata

2021 March 25 - User Testing Analysis

The team released an update to production that included minor bug fixes for TalkPage and Watchlist. The update also shows non-main namespace pages in the app through a mobile web treatment.

The Android team leveraged user testing to gain a better understanding of which aspects of the Image Recommendations MVP worked well and what should be improved prior to release in English, German, French, Portuguese, Russian, Persian, Turkish, Ukrainian, Arabic, Vietnamese, Cebuano, Hebrew, Hungarian, Swedish, Polish, Czech, Basque, Korean, Serbian, Armenian, Bangla and Spanish.

We completed the analysis in partnership with the Growth team. Below is the Android team analysis.

Analysis of tasks T277861

🥰 = Good: participant had no issues
😡 = Bad: participant had issues
🤔 = Not sure if good or bad: participant might have had difficulties understanding the question, did not explicitly interact with it, or ignored the task completely
Participant Tasks
#6 #9 #10 #11 #13 #15 #16 #17 #18 #19 #20 #21 #22 #23 #25 #26 #27 #28 #30
Battybrit 🤔 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🤔 🥰 🥰 🥰 🥰 🥰 🤔 🥰 🥰 🥰
brad.s 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰
147qb 🤔 🥰 🥰 😡 🥰 🤔 🤔 🥰 😡 🥰 🤔 🥰 🥰 🥰 🥰 🥰 😡 🥰 🥰
TestMaster888 🤔 🥰 🥰 🥰 🤔 🥰 🤔 🥰 🥰 🥰 🤔 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰
Cherry 928 😡 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🥰 🤔 🥰 🥰
Overall evaluation 😡 🥰 🥰 🥰 🥰 🥰 🤔 🥰 🥰 🤔 🥰 🥰 🥰 🥰 🥰 🥰 🤔 🥰 🥰

Onboarding and understanding of Suggested edits

Task #6 Do participants understand the tooltip? 😡

  • 2/5 discovered the tooltip but had issues understanding it.
  • 2/5 did not see the tooltip since it disappeared too quickly.
  • 1/5 discovered and understood the tooltip completely.

Task #9 Can participants explain the difference between tasks? 🥰

  • 5/5 were able to explain their understanding of the tasks in a sufficient way.

Task #10 Do participants understand what the 'Train image algorithm' task is all about? 🥰

  • 5/5 were able to describe the task in their own words well.

Task #11 What do participants associate with the robot icon? 🥰

  • 4/5 associated the robot icon with an algorithm, artificial intelligence (AI) or computer program
  • 1/5 didn’t know what it means

Train AI task - Onboarding and understanding

Task #13 Do participants understand the two onboarding screens? 🥰

  • 4/5 understand both onboarding screens.
  • 1/5 wasn’t reacting to the second onboarding screen (opt-in).

Task #15 How do participants interact with onboarding tooltips? 🥰

  • 3/5 understand the task due to the tooltips.
    • 1/5 mentioned that the tooltips are very helpful to understand the task.
  • 1/5 understands the task but did not pay attention to the tooltips.
  • 1/5 probably did not see or understand the tooltips.

Task #16 Is the tooltip copy clear enough? How’s the timing and positioning of the tooltips on various devices / screen sizes? 🤔

  • 3/5 read and understand the tooltip copy.
  • 2/5 did not interact with the tooltips.
  • 2/5 had tooltip display issues on a smaller phone.
  • 1/5 likes that the tooltip mentions the impact (help readers understand a topic)

Task #17 Do participants know what to do after all these onboarding measures? 🥰

  • 5/5 understand what to do now.

Train images task

Task #18 Do participants interact with the prototype naturally? 🥰

  • 4/5 are mostly comfortable interacting with the UI and make educated decisions.
  • 3/5 do not navigate to the file page without being prompted.
  • 2/5 navigate between the article and file page intuitively and without issues.
  • 1/5 is intimidated by making decisions that affect Wikipedia articles, doesn’t know how to interact with the article (RS: possibly due to small screen size), and doesn’t use the file detail page intuitively.

Task #19 Do participants know how to navigate to the file detail page? 🥰

  • 5/5 successfully navigated to the file detail page after being prompted.
  • 1/5 tapped the 'info i' icon in the feed view first.

Task #20 How helpful is the meta information on the file detail page? 🥰

  • 3/5 consider the information on the file page as helpful.
  • 2/5 mention that the author is helpful.
  • 2/5 mention that the date is helpful.
  • 1/5 mentions that licensing info is helpful.
  • 1/5 mentions that the image description is helpful.

Task #21 Do participants know how to enlarge / zoom an image? 🥰

  • 5/5 tapped the image and used a pinch to zoom gesture to zoom the image.
  • 2/5 tried to zoom the image directly from the feed experience.

Task #22 Do participants know how to go back and forth between image suggestions? 🥰

  • 5/5 use swipe gestures to navigate back and forth between image suggestions.
  • 2/5 tapped the back button at the top left before using the swipe gesture.
  • 1/5 tapped the 'info i' button at the top right before using the swipe gesture.

Task #23 Do participants understand the 'Not sure' options? 🥰

  • 5/5 understand the 'Not sure' options.
  • 3/5 were selecting multiple reasons at once.

Task #25 Do participants understand the 'No' options? 🥰

  • 5/5 understand the 'No' options.

Task #26 Do participants scroll or know how to reveal more of the article contents? 🥰

  • 4/5 were successful in scrolling the article to reveal more information
  • 2/5 wanted to use the pull indicator at the top of the image suggestion to reveal the article below before they scrolled the article
  • 2/5 tried to tap the article title (1/5 scrolled afterwards)
  • 1/5 looked for a 'More' button to reveal more of the article’s content, then tapped the 'info i' button at the top right

Task #27 Do participants know how to access the FAQ? 🤔

  • 3/5 tap the 'info i' button at the top right to reveal the FAQ.
  • 1/5 explained that she would tap the back button and look for an FAQ there (RS: a possible way to success as there’s an FAQ section in the SE home screen)
  • 1/5 did not notice the 'info i' button at the top right

Task #28 How do participants interpret the element of positive reinforcement? 🥰

  • 5/5 understand what it is and identified the element as motivational, encouraging and/or daily goal
  • 1/5 wasn’t 100% sure about it but then identified it as a motivational element.

Task #30 Do participants notice the element of positive reinforcement that has been added to the card? 🥰

  • 5/5 participants identified the added progress indication in the card

3. Analysis of rating scale

1 = Not at all useful information 5 = Very useful information

Participant | First paragraph (#36) | Description (#33) | Filename (#32) | Caption (#34) | Suggestion reason (#35)
Battybrit | 5 | 5 | 5 | 5 | 4
brad.s | 5 | 5 | 4 | 5 | 3
147qb | 5 | 4 | 3 | 4 | 3
TestMaster888 | 5 | 4 | 3 | 4 | 2
Cherry 928 | 5 | 5 | 5 | 2 | 2
Overall rating | 5 | 4.6 | 4 | 4 | 2.8

4. Analysis of follow-up questions

1. How do you think the suggested images for articles are being found? And how would you rate the overall quality of the suggestions?

  • 5/5 mentioned that the images presented were relevant.
  • 4/5 associated the image suggestions with an algorithm or computer program.
  • 2/5 mentioned that the suggestions are associated with keywords.
  • 1/5 mentioned these are random suggestions.

2. Was there anything that you found frustrating or confusing, that you would like to change about the way this tool works?

  • 3/5 replied that it’s easy to use.
  • 1/5 said that it’s tedious and cumbersome.
  • 1/5 suggested showing more than 1 image choice per article.

3. How easy or hard did you find this task of reviewing whether images suggested were a good match for articles?

  • 4/5 find it very easy to evaluate if it’s a good match for the article.
  • 1/5 thinks it’s hard and time consuming but well worth it.

4. Would you be interested in adding images to Wikipedia articles this way? Please explain why or why not.

  • 4/5 are interested in such a feature
  • 1/5 mentions he would not be interested
  • 1/5 mentions that she wants to know how accurate she is when reviewing images

2021 February 23 - Finalizing Designs ahead of Usability Testing

The Android team has created designs that are currently being turned into a prototype for usability testing prior to deployment.

Once the prototype is created for user testing we will update this page with a link that anyone following along with this project can use and provide us feedback on our talk page.

2021 February 1 - Designs, Product Decisions and APIs

This week the Platform Engineering Team began building the API needed for this project, with completion projected for early March, which is when we hope to deploy the MVP.

The team's new Product Manager answered open product questions in T273055.

Initial Product Decisions

  • We will have one suggested image per article instead of multiple images
  • This iteration of the MVP will not include Image Captions
  • There are no language constraints for this task. As long as there is an article available in the language, we will surface it. We want to be deliberate in ensuring this task is completed in a variety of languages. For this MVP to be considered a success, we want the task completed in at least five different languages, including English, an Indic language, and a Latin language.
  • We will have a check point two weeks after the launch of the feature to check if the feature is working properly and if modifications need to be made in order to ensure we are getting the answers to our core questions. The check point is not intended to introduce scope creep.
  • We aren't able to filter by article categories in this iteration of the MVP, but it could be a possibility in the future through the PET API
  • We will surface a survey each time a user says no to a match and sparingly surface a survey when a user clicks Not Sure or Skip
  • We need three annotations from 3,000 different users on 3,000 different matches. With three annotations per match, the tasks will self-grade.
  • We will know people like the task if they return to complete it on three distinct dates. We will compare frequency of return by date across user types to understand whether the task was stickier for more experienced users.
  • Once we pull the data, we will be able to compare the habits of English and non-English users. We cannot (and do not need to) show the same image to both non-English and English users; non-English users will have different articles and images. If users click No or Not sure, their survey responses will tell us whether a task was hard because of language. We will check task retention to see how popular the task is by language.
  • To know whether the task is easy or hard, we would like to see how long users take to complete it. Note: this only works if we can detect when someone backgrounds the app. Of the people who got it right, how long did it take them?
  • In order to know if the task is easy or hard we should also track if they click to see more information about the task, in order to make a decision
  • We determined that it is not worth adding extra clicks to learn which metadata users find helpful. Perhaps we could allow people to swipe up for more information, which would generally show the metadata; we will need to see designs to compare this.
  • It is too hard, at least for this MVP, to track if experienced users use this tool to add images to articles manually without using the tool, so we aren't going to track that.
  • In the designs, we want to track whether someone skips or presses No on an image because it is offensive, in order to learn how often NSFW or offensive material appears.
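Two of the decisions above reduce to simple rules: grading a match from its three annotations, and treating a return on three distinct dates as a sign that users like the task. A minimal sketch, assuming strict-majority voting is the grading rule (the decision says only that the tasks will self-grade) and using hypothetical function names throughout.

```python
from collections import Counter
from datetime import date


def self_grade(answers):
    """Return the strict-majority answer for one match, or None if there is none.

    `answers` is a list such as ["yes", "no", "yes"]. "not sure" counts as an
    ordinary answer here, which is an assumption.
    """
    if not answers:
        return None
    answer, count = Counter(answers).most_common(1)[0]
    return answer if count > len(answers) / 2 else None


def liked_task(completion_dates, min_distinct_days=3):
    """True if the user completed the task on at least `min_distinct_days` dates."""
    return len(set(completion_dates)) >= min_distinct_days


print(self_grade(["yes", "no", "yes"]))       # majority: "yes"
print(self_grade(["yes", "no", "not sure"]))  # no majority: None
print(liked_task([date(2021, 5, 7), date(2021, 5, 7), date(2021, 5, 9), date(2021, 5, 12)]))
```

With three annotations, any answer given twice is a strict majority; a three-way split leaves the match ungraded rather than guessing.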

The Android Designer began work on mockups for the MVP and has started to receive feedback at T269594. The user stories the designer is creating mockups in response to include:

2.1. Discovery

When I am using the Wikipedia Android app, am logged in,

and discover a tooltip about a new edit feature,

I want to be educated about the task,

so I can consider trying it out.

2.2. Education

When I want to try out the image recommendations feature,

I want to be educated about the task,

so my expectations are set correctly.

2.3. Adding images

When I use the image recommendations feature,

I want to see articles without an image,

I want to be presented with a suitable image,

so I can select images to add to multiple articles in a row.

2.4. Positive reinforcement

When I use the image recommendations feature,

I want feedback/encouragement that what I am doing is right/helping,

so that I am motivated to do more.