Reading/Multimedia/Media Viewer/Retrospective

This is a summary of the multimedia team's retrospective on Media Viewer, held on December 11th, 2014, to identify lessons learned from this project.

Overview
Media Viewer is a new multimedia browser aimed at Wikimedia readers and casual editors. This tool was developed by the Wikimedia Foundation's multimedia team from July 2013 to December 2014 (see project timeline).

Media Viewer aims to:
 * improve the viewing experience for readers and casual editors.
 * make it easier to preview and browse images, right where you are.
 * display images in larger sizes, with basic information -- and links to more details.
 * help enlarge, share, download or embed images on other web pages.

Research
Here are key findings from our research about this product (for more details, see the full Media Viewer Research 2014 Report and companion slides):
 * Media Viewer serves a lot more images than before (~17M intentional image views a day plus 3M traditional file page views, versus ~12M file page views a day before deployment)
 * Most users keep Media Viewer enabled (99.5% enabled)
 * Media Viewer key features were found easy to use (users completed all target tasks successfully in the last rounds of usability tests)
 * Media Viewer is more useful for readers than for active editors (opt-out rate: 0.5% of logged-in users who visit the site at least once a month have disabled Media Viewer, versus 4.5% of active editors and 14% of very active editors)
 * Media Viewer loads images as fast as the file page (1.5 seconds to show the first image for the median user, 8-9 seconds for the 95th percentile)

Project Review

 * Media Viewer was intended to be a medium-size project, but ended up being a large project, and a lot more challenging than expected.
 * The project ran for about a year and a half, with a team that grew from two to six people at peak, representing roughly 4 person-years (see project timeline).
 * Community feedback was generally positive or neutral on most wikis, but the project was met with strong negative reactions on the English and German Wikipedias, and on Wikimedia Commons. On each of these three sites, RfCs asked that the project be opt-in or disabled.
 * This required the team to work longer than planned, to improve features based on user feedback.
 * Deliverables included 17 KLOC (physical lines) of code modules (PHP+JS+LESS) and 11 KLOC of tests.

Lessons learned
What worked well:
 * Detailed activity and performance metrics (though tracking user actions with EventLogging was time-consuming).
 * Design research, both before and after implementing a feature (would have worked even better if we had started doing it earlier).
 * Working with community champions for getting feedback during the beta phase and preparing the deployments.
 * The agile process we used (Scrum-ish with weekly sprints and Mingle cards), although we had to trim it down a bit towards the end as the team got smaller.
 * Setting apart a few story points per sprint for tech debt and other non-feature tasks. (We regressed after the enwiki deployment when we worked in permanent emergency mode, though.)
 * Having a good unit test suite. We did a fair amount of refactoring and last-minute changes, and had close to zero live regressions.

What did not work well:
 * Many community discussions did not effectively inform product development. Most channels did not work very well: talk pages were OK for bug reports but became counter-productive after the controversy on enwiki and dewiki; roundtables were helpful, but resource-intensive; mailing lists and IRC didn’t provide a lot of feedback.
 * Surveys. They were not representative, because they were optional (we just showed them to everyone and then a self-selected fraction of users filled them out), so the approval rates could not be used as a quantitative metric. The feedback was useful for identifying reader priorities, however.
 * Too much micromanagement after the community controversy started, with sometimes 4 different layers of management involved in decision-making about the user experience. This had a disenfranchising effect.
 * Trying to juggle feature and platform development at the same time. Feature development tended to eat up all the time reserved for platform improvements. Separate teams, or the same team working on them in different quarters, might have worked better.

What could be improved:
 * Tracking opt-out rates. We did not have a baseline for them at first, and more research is needed to see how they correlate with community reactions.
 * The community consultation represented a big time investment for our team (and an even larger one for the Community Engagement team). Though evaluating its 130 responses helped validate and prioritize which improvements to focus on, few new product insights came out of this labor-intensive activity.
 * The consultation led to further deterioration in relations with the community. Four RfCs across three wikis had concluded that Media Viewer should be opt-in; however, the community consultation declared that subject to be "out of scope" for discussion. In a fifth RfC immediately after the consultation, the community voted more than 2-to-1 that Media Viewer should still be opt-in even with the consultation's improvements. In that discussion many community members referred to the consultation as a "sham", expressed a complete breach of trust with the WMF, objected to the WMF ignoring the community, objected to WMF staff censoring responses to the consultation, or referred to the "toxic" atmosphere that had been created. Some community discussions resulted in nearly unanimous agreement that the situation had not meaningfully improved. The subsequent election for the WMF board of trustees saw unprecedented participation. All of the incumbent elected board members were voted down, and every newly elected board member had run on platforms firmly critical of the WMF's handling of the Media Viewer situation. Since then the WMF made great efforts to improve community relations; however, nearly a year and a half later the toxic atmosphere of distrust continued to linger, making progress slow and painful.

Biggest mistakes

 * Scope creep. Media Viewer started as a medium-size project, which we thought would not require a lot of effort. As we got deeper into the project, the workload grew larger and we did not stop to check whether the expected gains grew as well. We ended up with a feature list and expected level of quality that was unrealistic for the resources we had allocated to this project.
 * No clear success metric. We had no objective way to determine when the product was ready to go live. It was hard to demonstrate its benefits to critics of the project; and it was also hard to evaluate whether new changes we made were helpful or not (although in the later phases, design research helped validate its usability).
 * Launching on enwiki+dewiki+commons in two weeks. We were overconfident after the success of all the previous launches; while we did not have gradual rollout tools, stretching out the launch period more would have helped (or at least limited the controversy to one of those communities at a time).
 * We lacked the tools to get productive feedback from different user groups. While we received a lot of feedback from advanced users in talk pages and RfCs, it was harder to get quantifiable feedback from readers, our primary target users.

Planned changes

 * Do more research and do it earlier:
    * start product planning with design research, so that it can influence what kind of ideas we come up with
    * do rapid prototyping and concept testing as soon as possible
    * do usability tests to validate finished features
    * try to involve the widest group of users possible; do not limit validation to volunteer beta testers
 * Be more disciplined about starting with an MVP that has a minimal set of features which work well and have been validated via design research. Prefer polishing the most important features over adding more features.
 * Experiment with a workflow in which community members have a chance to discuss plans and priorities for the next sprint before the choice of tasks, design decisions and acceptance criteria are finalized.
 * Collect input from key community groups on their priorities and wishlists. Make it open-ended so it is not limited to the ideas we are aware of.
 * Get key metrics on the status quo before making any changes.
 * When planning a project, set clear, measurable goals that have been discussed with, and ideally agreed on by, the community before development starts. That includes what to achieve and also what to avoid (which metrics should not get worse as a result of the project).
 * When planning a project, quantify the benefits and the effort needed in order to estimate impact. When a project runs over budget or its scope changes significantly, reassess the impact and decide whether it still makes sense to go forward, or whether the project should be stripped down or abandoned.
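
The idea of recording a baseline and agreeing up front on which metrics must not get worse can be sketched in a few lines of Python. This is only an illustration, not a tool we used: the `Guardrail` class, the `regressions` function, the metric names and the tolerances are all hypothetical, and the "current" load time in the usage note is made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Guardrail:
    """A metric that must not get worse by more than `tolerance` (hypothetical)."""
    metric: str
    higher_is_better: bool
    tolerance: float  # allowed relative worsening, e.g. 0.10 means 10%


def regressions(baseline, current, guardrails):
    """Return the names of guardrail metrics that got worse than allowed.

    `baseline` holds the status quo measured before any change and `current`
    the same metrics after the change. Baseline values are assumed non-zero.
    """
    failed = []
    for g in guardrails:
        before, after = baseline[g.metric], current[g.metric]
        change = (after - before) / before
        # Flip the sign where needed so that a positive value always means "worse".
        worsening = -change if g.higher_is_better else change
        if worsening > g.tolerance:
            failed.append(g.metric)
    return failed
```

For example, with a baseline of `{"median_load_seconds": 1.5, "daily_image_views": 12_000_000}`, an illustrative current state of `{"median_load_seconds": 1.8, "daily_image_views": 17_000_000}`, and guardrails allowing 10% slower loads and no drop in views, `regressions` reports only `median_load_seconds`.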

Suggestions/needs

 * The WMF needs gradual deployment tools to test a product or feature with a small but not self-selected fraction of the user base, and to roll out step-by-step to a large community like enwiki. This would let developers validate the software with progressively larger user groups -- and roll back without serious impact on user experience if it does not perform as expected. This is an industry standard which we need to catch up to. (In hindsight, we could have hand-coded such functionality for MediaViewer, but it is a lot of wasted effort to redo that for every project. MediaWiki needs proper feature switches.)
 * We also need better tools for community engagement, so we can get actionable, constructive and representative feedback from specific target groups. Talk pages are biased towards more active contributors, have limited effectiveness in the beta phase, and break down when there is a controversy.
 * There needs to be a frank discussion with the communities about RfCs. Whatever one thinks about the division of power between the Foundation and the community, an RfC right after deployment is an ineffective way to make software decisions. Editors need to understand this and get involved earlier and more thoroughly with the development process.
 * There should be a standard for user acceptance metrics, so that teams do not have to invent their own methodologies with insufficient resources -- and to make comparison across projects possible.
 * Better software support for community-driven brainstorming and idea prioritization would be great.
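
To make the gradual deployment suggestion above more concrete, here is a minimal sketch of one common way to pick a small, non-self-selected fraction of users: hash each user into a stable bucket and enable the feature for buckets below the current rollout fraction. It is not an existing MediaWiki facility; the function names, the `"media-viewer"` feature key and the rollout schedule are assumptions made for illustration.

```python
import hashlib


def rollout_bucket(user_id: str, feature: str) -> float:
    """Map a (feature, user) pair to a stable pseudo-random value in [0, 1)."""
    digest = hashlib.sha256(f"{feature}:{user_id}".encode("utf-8")).digest()
    # Keep 53 bits so the division below is exact and always below 1.0.
    return (int.from_bytes(digest[:8], "big") >> 11) / 2**53


def is_enabled(user_id: str, feature: str, rollout_fraction: float) -> bool:
    """Return True if the feature is on for this user at the given fraction."""
    return rollout_bucket(user_id, feature) < rollout_fraction


# Hypothetical rollout schedule for one wiki: 1% -> 10% -> 50% -> 100%.
# Setting the fraction back to 0.0 rolls the feature back for everyone.
if __name__ == "__main__":
    users = [f"user-{n}" for n in range(10_000)]
    for fraction in (0.01, 0.10, 0.50, 1.00):
        enabled = sum(is_enabled(u, "media-viewer", fraction) for u in users)
        print(f"{fraction:.0%} target -> {enabled} of {len(users)} users enabled")
```

Because the bucket depends only on the user and the feature, a user who gets the feature at 10% keeps it at 50%, and users cannot opt themselves into the sample, which avoids the self-selection problem we had with beta testers and surveys.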

Thanks to

 * Everyone on the multimedia team for their fine work under difficult conditions
 * Aaron for the gracious offer of his time and for his healthy early influence on the team’s practices. Media Viewer wouldn’t have the test coverage it does now without him.
 * Our many community champions, who helped socialize this release in their home wikis: se4598, TMg, Denis, Birgit, Jan, Trizek, Binabik, Gryllida, Romaine, Justine, Oona, Henrique, Elitre, Tar, Stryn, Geraki, Amir, Orel, Tgr, Miya, Hym411, Revi, Vishnu, to name but a few -- thank you!
 * All the beta testers and all the subscribers to our mailing list who provided insightful feedback when we needed it
 * Ori for his help with analytics and performance
 * Dan Andreescu for his help with limn
 * Bryan for helping with Vagrant and dropping the merge hammer on Mark’s early patches
 * Antoine, Chris McMahon and Dan Duvall for their help with CI and browser tests
 * Timo and James for their help on OOJS/OOUI and JS best practices in general
 * Faidon for helping make the click-to-download feature possible
 * Filippo for helping with the thumbnail prerendering
 * The analytics team for their help in regards to the analytics DB and EventLogging
 * Erik Möller for his guidance in shaping the user experience and engaging our community
 * May and Kaity for their help on the early designs
 * Bawolff for writing the first version of CommonsMetadata