Wikimedia Engineering/2014-15 Goals/Q3

Early notes for consideration in December.

We distinguish between:

 * Above waterline - top 5 priorities
 * Below waterline - high priority projects/needs, close competitors for top spots
 * Below lava line - not a contender for a top priority

We're not yet at the sorting stage, but we can already collect candidates identified during Q2 planning. As we get closer to Q3, we'll gather input from the various team leads, the architecture team, and other stakeholders.

Strong Candidates

 * Editing performance, continued from Q2 as anticipated, unless the project proves unworkable.
   * Why: We're only kicking this off in November, so we need some time to dig in.
 * A/B and multivariate testing infrastructure. Create better foundations for testing, comparing, and validating user-experience changes.
   * Why: A must-have for increasing product development velocity (though it should be driven by concrete product needs for Q3).
 * Fundraising tech refactor. Make fr-tech less of an island internally and ensure the team can add and rotate team members.
   * Why: Fr-tech needs to staff up to support new integrations (e.g. mobile); to enable that staffing and create more team sustainability, this refactor has long been identified as a must-have.
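As a concrete illustration of the A/B testing foundations mentioned above: experiment bucketing is commonly done with a deterministic hash, so a user's assignment is stable across sessions without storing per-user state. The sketch below is illustrative only; the experiment name and bucket labels are hypothetical, not anything in an existing codebase.

```python
import hashlib

def assign_bucket(user_id: str, experiment: str,
                  buckets=("control", "treatment")) -> str:
    """Deterministically assign a user to an experiment bucket.

    Hashing the user ID together with the experiment name keeps each
    user's assignment stable, while decorrelating assignments across
    different experiments.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return buckets[int(digest, 16) % len(buckets)]
```

Because the assignment is a pure function of (user, experiment), any server can compute it independently, which matters for a multi-datacenter setup.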

Why

 * A must-have to improve product quality.
 * Make deployments less painful.
 * Improve our QA.
 * Deploying is still hard, and Beta Cluster is part of that.

What

 * Reconcile HHVM and Beta Cluster
   * See: HHVM fcgi restart during scap runs cause 503s (and failed tests)
   * Required crossover: MW Core, Ops
 * Dev code pipeline
   * A "nightly" build tested on Beta Cluster would give us a higher degree of certainty (and a firmer commitment to test against) before deploying
   * See also: True code pipeline
   * Required crossover: MW Core (minimal, hopefully)
 * Beta Cluster/Prod reconciliation
   * Identify the rough edges between Prod and Beta Cluster (we already know of many) and address them
   * This is deliberately open-ended in its definition of "done" at this point, but we would timebox the investigation portion to make it deterministic and doable
   * Required crossover: Ops, MW Core
 * Puppet code pipeline
   * We need a method that allows Ops and others to test their code changes on Beta Cluster before prod deployment
   * Why:
     * Will ensure "upstream" puppet changes don't kill Beta Cluster accidentally (and unknowingly)
     * Will allow fuller testing of puppet changes
   * Required crossover: Ops

Why

 * Biggest organic-growth and green-field opportunity.
 * The work in Q2 was preparatory for a longer-term plan for new mobile contributor and reader engagement.

We are working toward a single, integrated mobile system that feeds into and off of structured data. Our goal is to give mobile readers and contributors more engaging, device- and context-appropriate ways to consume and grow the sum of all human knowledge, both on the mobile devices that exist today and on those that are coming in the near future. This is a complex, multi-stage project, not a one-off experiment.

Apps mid-term (Q3)

 * If conservative goals are met for both teams:
   * Heavy focus on demonstrating more engagement via one new reader feature
 * If stretch goals are met for both teams:
   * Expand on the Q2 MVP
   * Notifications to draw users into the app
   * Pilot one more reader-facing, engagement-oriented feature
   * Begin generating mobile infoboxes from Wikidata
   * A successful micro-contribution workflow from mobile web to apps

Mobile web mid-term (Q3)

 * If conservative goals are met for both teams:
   * Test WikiGrok with logged-in users in stable
   * Pilot an additional micro-contribution feature
 * If stretch goals are met for both teams:
   * Expand the question set (if the Wikidata query service is operational)
   * Test the aggregation framework for WikiGrok responses
   * Release WikiGrok to readers in stable
   * Graduate another micro-contribution to stable
   * Begin generating mobile infoboxes from Wikidata

Apps and web long-term (Q4 and beyond)

 * Drive traffic from mobile web to the app
 * Create self-sustained contribution and curation workflows on mobile that lessen the burden of quality control for the existing community (e.g., aggregating, patrol queues, mobile-only flows)
 * Replace more unstructured elements (e.g., infoboxes, tables, references) on apps and web with structured data that can be restyled natively and used in novel reader and contributor features on apps
 * Continue experimenting to stay ahead of mobile trends
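The "mobile infoboxes from Wikidata" items above depend on extracting claim values from structured entity data. As a rough sketch of what that extraction looks like, the helper below walks the JSON shape returned by Wikidata's wbgetentities API action; the entity fragment is an illustrative hand-built sample, not real API output.

```python
# Minimal sketch of pulling an infobox value out of Wikidata entity
# JSON (the shape returned by the wbgetentities API action).

def claim_value(entity: dict, property_id: str):
    """Return the first mainsnak datavalue for a property, or None."""
    for claim in entity.get("claims", {}).get(property_id, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") == "value":
            return snak["datavalue"]["value"]
    return None

# Illustrative entity fragment, shaped like wbgetentities output.
# P1082 is Wikidata's "population" property; the amount is made up.
entity = {
    "claims": {
        "P1082": [
            {"mainsnak": {"snaktype": "value",
                          "datavalue": {"value": {"amount": "+3600000"},
                                        "type": "quantity"}}}
        ]
    }
}
```

An infobox renderer would map a template of property IDs through a helper like this and restyle the results natively per device.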

Candidate

 * Phabricator for code review; phase out Gerrit.
   * Why maybe not: This may be premature, as we'll still be in the early days of using Phabricator as a PM tool and may not yet fully understand the requirements. It may also contend for resources with critical test infrastructure work.
   * Also, the team has been working full-time at full speed under high pressure to deliver, and now it's time to take it a bit easier, work on other things, and let Phabricator consolidate. The Code Review project may start, but slowly, without being a top priority.--Qgil-WMF (talk) 07:33, 4 November 2014 (UTC)
 * Front-end standardization / UX standardization continued as a top priority.
   * Why maybe not: We're now establishing a lot of technical foundations and working parameters for the team; we may not need to keep it a top priority to maintain momentum.
 * Library infrastructure work continued.
   * Why: So it doesn't immediately fall by the wayside after the initial efforts.
 * UX Testing Environment (State of the User)
   * Why: Our production environment is not set up for running 10/100/1000 users through a battery of tasks without one user invalidating the tasks of subsequent users. A sandboxed environment that reflects the current production state of the sites, with or without modifications and with usability-tracking software added, is necessary for quantifiably measured qualitative analysis of our prototypes and production features. Ops, Platform, Analytics, and Design Research would need to work together on this. Technical details
 * MicroSurveys
   * Why: The ability to quickly identify problems or positive sentiment around existing systems (whether static or in development). These would be brief, few-question surveys to gain quantifiable insight into the usability of current systems for contributors. Testing a mobile version has also been discussed. Good for getting information from both readers and contributors. The two main approaches are:
     * Overall satisfaction (aka net promoter scores): a one-click method for users to tell you whether they are generally satisfied. Good for gathering baseline data, for comparing different products (e.g., people reading on mobile apps vs. mobile web), and for identifying the existence of "skunk in the doorway" problems: problems you didn't know to ask about because you couldn't see the software from the user's perspective, i.e., a serious problem in the user's path.
     * Structured feedback: click here to make a suggestion, click here to file a bug report, click here to say it's working okay for you. This is still simple from the user's perspective while being more informative from the product manager's perspective.
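For the net-promoter-score approach above, the arithmetic is simple under the conventional 0-10 scale: promoters rate 9-10, detractors rate 0-6, and the score is the percentage of promoters minus the percentage of detractors. A minimal sketch:

```python
def net_promoter_score(ratings):
    """Compute a Net Promoter Score from 0-10 satisfaction ratings.

    Promoters rate 9-10, detractors 0-6 (7-8 are passives); the score
    is the percentage of promoters minus the percentage of detractors,
    so it ranges from -100 to 100.
    """
    if not ratings:
        raise ValueError("no ratings to score")
    promoters = sum(1 for r in ratings if r >= 9)
    detractors = sum(1 for r in ratings if r <= 6)
    return 100.0 * (promoters - detractors) / len(ratings)
```

The same tally could be computed per product (e.g., mobile apps vs. mobile web) to support the comparison use case described above.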

General thoughts

 * Are some of these still too specific, too project-focused? E.g. could fr-tech refactor and library infrastructure work be collapsed into cross-organizational efforts to reduce technical debt, to break up monolithic code, to increase test coverage, etc.?--Erik Moeller (WMF) (talk) 08:02, 17 October 2014 (UTC)
 * To me, fr-tech is one of the few projects that absolutely needs to happen and always gets shuffled to the background. Let's give them the resources they need to succeed, given the critical need for those systems, which support everything else that we do. Tfinc (talk) 18:45, 17 October 2014 (UTC)
 * We should be careful about repaying technical debt purely for the sake of repaying technical debt. IMO it is preferable to tie new work to the paydown of debt, or at least to figure out how to tie the paydown of technical debt to some concrete deliverable. This (a) helps us make sure that we're paying off debt in some kind of priority order (e.g., focusing on the stuff that matters first), while (b) not totally sacrificing the creation of things of value, and (c) helps ensure that we're paying off the debt in the right way (e.g., not doing a massive refactor, only to discover when the next new feature comes up that we should have refactored a little differently). Awjrichards (WMF) (talk) 18:50, 31 October 2014 (UTC)