VisualEditor/2015 Process Review

3rd DRAFT, 5 June 2015, Neil P. Quinn and Joel Aufrecht

About this report
We (Joel Aufrecht and Neil P. Quinn) joined the Wikimedia Foundation and the VisualEditor team during April 2015. During our first weeks, we embarked on a project with several goals:
 * 1) document the VE team’s strengths to help spread best practices in the WMF,
 * 2) identify challenges to the workflows of the team and its stakeholders, and
 * 3) lead specific process changes to help address those challenges.

This report draft is the output of steps 1 and 2 in that process. Future drafts of the report will include a menu of possible solutions for each challenge, and specific recommendations for moving forward, but for now, we’re just trying to get consensus about the challenges.

We’ve based this report on more than 20 interviews with team members and stakeholders, review of multiple documents, and direct observation of the team’s work during April and May 2015. Several aspects of the team have changed since we started work (in particular, its membership shrank significantly during the April 2015 reorganization); in this report, we generally describe the team in its initial state, and incorporate any recent changes into our recommendations.

Additional notes

 * 1) The VE team’s process for working with its huge number of stakeholders in the Wikimedia community is an extremely important issue, but is outside the scope of this report.
 * 2) Anything in this document is subject to change, so please shout at us if you think we got something wrong. However, most of our recommendations are based on personal conversations and observations, so we will deviate from the wiki norm and ask that you not make any substantial changes directly.

Key findings
The VisualEditor (VE) team was particularly productive in FY2015Q3 (Q3), January to March 2015. The high-functioning, high-morale core completed many changes to VE very rapidly, clearing many of the major preconditions for proposing a relaunch on the English Wikipedia.

Several process issues decreased the VE team’s effectiveness. Priority balancing and work planning with other stakeholders such as the Design, Research, Communications, and Analytics groups were not always effective, which affected the schedule and the quality of the product. A lack of process for identifying work at mid-scale (smaller than quarterly, bigger than daily) limited visibility and predictability. The process and overall responsibility for launch, as opposed to development, were not clear or consistent.

The VE team is likely to show a decrease in productivity from the Q3 peak, due to a reduction in its size, the end of its period as the top Foundation priority with commensurate support, and a removal of the do-or-die condition which motivated the team to work at an unsustainable level.

The VE team
In the context of this report, these are the WMF employees who spend the bulk of their time working on VisualEditor and usually attend daily standup meetings.

The VE team included, during Q3, seven full-time engineers, four part-time engineers, two QA testers, and one product manager. It keeps in close contact via IRC, Google Hangouts chat, two daily stand-up meetings, Phabricator and Gerrit comments, and occasional emails and video chats. The team has been directed collaboratively by James Forrester, the product owner, and Trevor Parscal and Roan Kattouw, tech leads, for several years. Ed Sanders recently took over the role of tech lead.

Two community liaisons are embedded in the team. They participate mainly through Phabricator, a weekly meeting with the product manager, and observation of daily standups.

The core VE team performs engineering on a weekly release cycle within a quarterly planning cycle. Day to day, the Product Manager identifies and assigns the majority of tasks for the team, working from a dozen or more sources of potential tasks. The team tends to see his assignments as suggestions, which they accept because they trust his technical expertise and his knowledge of their strengths; engineers also frequently take bugs for themselves.

QA performs a mix of automated and manual testing. Committed code approved by engineers is automatically launched unless a tester raises objections (as opposed to a process where changes must wait until approved by QA); the testers reported that they were able to keep up with the flow of changes. The Product Manager is deeply involved in the engineering process: he often does hands-on merging and reviewing of code, takes charge of closing tasks when a fix is merged, and coordinates QA and release cycles.

Consulting stakeholders
These are the stakeholders who offer specific expertise and services to VisualEditor, but spend much of their time working on other projects. These include Research, Analytics, Communications, Advancement, senior management, and (before its break-up) Design.

The Design team had weekly scheduled meetings and some participation in Phabricator discussions. The Parsing team keeps in close contact with VE using Phabricator tickets, as well as a weekly sync-up meeting and ad-hoc conversations over IRC, email, and the wikitext-l mailing list. Other teams tend to have more sporadic, ad-hoc communication with the team.

Unit of work
The VE team’s standard unit of work is a task in Phabricator, which prompts a code change, which is ultimately released to the production servers. Work from consulting stakeholders in the Foundation generally arrives in quarterly and weekly meetings. Work from the community tends to arrive in a continuous flow of individual feedback and requests spread across dozens of possible locations; the community liaisons mediate and triage much of this flow. All these processes either lead directly to Phabricator or to the product manager, who ultimately adds accepted work to Phabricator directly.

Workflow
As of April 2015:





Q3 Scope of work
The work completed in Q3 can be quantified, with the caveat that estimating software work is very inexact. The team used the following Burndown chart during Q3:

However, in that chart the blue Start Points and green Ideal Points are updated retroactively, so historical information about changes in scope of work are lost. In the next chart, the scope of work ("Total") and the amount of work completed ("Resolved") are shown evolving over time:

In this chart, tasks are shown as a sum of story points:
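
The difference between the two charts can be illustrated in code. The following is a minimal sketch, not the script the team used: it assumes a hypothetical event log of story points added and resolved per day, and derives running "Total" and "Resolved" series so that scope changes stay visible in the history. The event data and the name `burnup_series` are invented for illustration.

```python
from collections import defaultdict
from datetime import date
from itertools import accumulate

# Hypothetical event log: (day, kind, story points).
# "added" grows the scope; "resolved" grows the completed work.
events = [
    (date(2015, 1, 5), "added", 40),
    (date(2015, 1, 12), "resolved", 8),
    (date(2015, 1, 19), "added", 10),
    (date(2015, 1, 19), "resolved", 12),
    (date(2015, 1, 26), "resolved", 9),
]


def burnup_series(events):
    """Return [(day, total_points, resolved_points)] as running sums by day."""
    added = defaultdict(int)
    resolved = defaultdict(int)
    for day, kind, points in events:
        if kind == "added":
            added[day] += points
        elif kind == "resolved":
            resolved[day] += points
        else:
            raise ValueError(f"unknown event kind: {kind}")
    days = sorted(set(added) | set(resolved))
    # Running sums keep every historical value instead of overwriting it.
    totals = accumulate(added[d] for d in days)
    done = accumulate(resolved[d] for d in days)
    return list(zip(days, totals, done))


series = burnup_series(events)
for day, total, done in series:
    print(day.isoformat(), total, done)
```

Because each row records the scope as it stood on that day, a later scope increase (here, on 19 January) shows up as a step in "Total" rather than being folded into the starting value.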

Recurring Processes
As of April 2015

Productivity
The team is productive by several measures. In Q3, it met its quarterly goals, successfully burned down a large number of key tasks, and delivered many high-value features including a 50% decrease in load time, major usability improvements, and the delivery of Citoid.

Prior to the April 2015 reorg, most of the team’s members had worked together for years with very little turnover, and had developed deep expertise in multiple areas of the codebase.

The team generally feels that the leads take a great deal of process off their shoulders, and appreciates that.

Morale
The VE team’s morale is very high. In the most recent Team Health Check survey, the VE team self-reported the highest possible health in eight of eleven areas, and second-highest in the other three. (Since the ratings are self-reported, this is a good indicator of morale but not necessarily of objective success.)

During interviews, core VE team members, even those who were remote or shared between teams, unanimously said that they enjoyed the team’s atmosphere. Specific strong points that they mentioned included:
 * The team leads treat mentorship and building camaraderie as core responsibilities
 * There is a very strong culture of strict but friendly code review
 * All members of the team, including its leaders, feel comfortable taking responsibility for mistakes
 * All team members feel invested in the quality and success of the product
 * Compared to other open source projects, the team was very welcoming towards women.

The fact that the team had had only one departure of any kind before the reorganization, and that most of the team’s members had been hired directly onto the team, may have contributed to this positive atmosphere.

Coordination between remote and co-located people
Almost without exception, remote members of the VE team said they felt satisfied with their inclusion on the team.

Likely causes include:
 * The team has always contained multiple remote members, not just one.
 * The team takes time to socialize during offsites and daily standups (one team member believes this is the main benefit of the standups).

Definition of success criteria
During Q3, the team successfully created specific, measurable goals which allowed them to more confidently assess their progress and better prioritize potential tasks, although not all stakeholders made ongoing use of the goal list. Several team members reported that the involvement of senior management in this process was helpful.

Technical condition
VisualEditor is in production and mature. Current “launch” activities mainly have to do with building sufficient consensus to make it the default for more people; the challenges of releasing and scaling a completely new product are largely in the past.

Major challenges
By gathering our opinions together with those of the team leads and of all stakeholders we interviewed, we identified three top challenges. Because many challenges are complex and overlapping, this is necessarily not a perfect list.

(1) Consulting stakeholders have difficulty engaging with VE’s development.
The core VE team and some consulting stakeholders sometimes reach oppositional viewpoints, with the core VE team seeing other participants as blockers, unable to keep up with the rapid pace, and other participants seeing the core VE team as exclusionary and unduly hasty. This was observed both at higher scales, such as quarterly planning, product definition, and conceptual design, and at lower scales, such as day-to-day task completion and testing. The team’s rapid pace conflicts with research and exploratory work, which may take months of planning and execution. This can lead to mutual frustration: for example, Design has expressed concerns that implementations are often not done to agreed-upon design specifications, which impedes their ability to test and refine their ideas. The core VE team, in turn, tends to make these on-the-fly implementation changes to address technical considerations, and doesn’t see how to consult the design team in greater detail without unacceptable delays.

Several standing weekly meetings have broad participation, but are not always effective due to time limitations and a lack of effective process (e.g. agendas, separating information-sharing from problem solving). In between these meetings, the core VE team tends to work effectively but in isolation from other stakeholders. Periods of cooperative iteration between core VE team members and others are uncommon, and in some cases have been interrupted from within the core VE team by other priorities.

External participation via Phabricator, which is intended to be a single centralized forum for discussions about software, is particularly difficult for stakeholders outside the VE team. (This applies to community stakeholders as well as consulting stakeholders within the Foundation.)

First, the norms about what parts of tasks non-team stakeholders can change are not clear or documented. For example, if the VE team changes UI language at the request of Communications and closes the bug, but the Communications team notices that the implemented language differs from their draft in important ways, Communications is likely to bring up the matter by email or in person because they are unsure whether they are "allowed" to reopen the task to discuss those differences.

Second, holding discussions on Phabricator tends to be difficult because of the sheer volume of information that Phabricator contains and its lack of good notification features. Bugmail tends to be a firehose, drowning out useful information, and even Phabricator’s inbuilt notifications widget lacks important features like auto mark-as-read and intelligent prioritization. This makes it difficult for both Phab-centric teams like VE and Phab-light teams like Communications to rely on Phabricator to notify them when a discussion needs their input.

This means stakeholders can often engage with VE only through ad-hoc discussions and information requests, which can be particularly difficult because they are reluctant to bother or are unable to reach overstretched leaders like James.

The stakeholders who felt they lacked full voice included Design, Research, and Analytics. Other teams, like Parsing, generally expressed satisfaction with their communication with VE.

(2) The process for early-stage requirements and design decision-making is informal and incomplete.
VE engineering has tended to focus on releasing a continuous flow of small changes, often urgent bug fixes, on a daily or weekly time frame. This model has been very efficient at writing, testing, and releasing new code for VE. However, it has been less effective at drawing in the broad range of viewpoints necessary to help define and prioritize problems and brainstorm solutions—especially at scales smaller than quarterly goals and larger than individual tasks.

For this reason, consulting stakeholders, including non-Foundation community members, have often felt limited in their input and visibility into the process of brainstorming and iterating new features. This includes problems of visibility: it is difficult to learn, understand, or communicate the state of the roadmap/backlog on a month-to-month basis. This also includes problems of input: The process for deciding engineering priorities at the scale of weeks comprises the Product Owner and lead developer making tactical decisions. Because this limits the amount and quality of information available to the decision makers, it isolates other stakeholders from this decision and its consequences.

The VE development model is organized around Phabricator tasks, which tend to comprise either well-defined new features of limited size or bug fixes. However, this model has not been effectively extended into earlier stages of product work, such as requirements and design, which are managed in less structured ways and are frequently not tracked in Phabricator. Some high-level tasks exist, but they aren’t used consistently for all roadmap items, and there is no way to see just the broad tasks. This not only limits external participation but also constrains attempts by the VE team to plan effectively. In addition, the sheer number of Phab tickets, and the inability to break them out by scope, make it very difficult for a consulting stakeholder to find the roadmap information that Phabricator does contain.

(3) The team has a high reporting load which may no longer be justified.
During FY2015Q3, the Foundation-wide focus on VE meant the team received both additional support and greater demands for reporting to senior management. The team leadership began doing detailed estimation via story pointing and burnup charting, writing weekly status reports, and leading large weekly status meetings. After the end of the quarter, the support ended and half of the engineers and QA were reorganized out of the team, but the reporting burden has not decreased.

The team leadership feels that both the granularity and the frequency of this reporting need to be reassessed, both because it has a significant time penalty and because much of it is duplicative, which invites inconsistent input. In addition, they worry that making one team the primary focus of the entire Foundation, even if that focus is more a burden than a privilege, can lead to culture problems.

Additional challenges
We identified additional areas where improvements can be made.

Rough workflow within the core VE Team
While non-engineer members of the core VE team, including QA, Documentation, and (since the re-org) Design, largely feel included, their workflow is not as smooth. This has been identified as a problem by many people, including engineering members. In part this is because Phabricator and Gerrit do not track the complete workflow process; while much of it directly follows automated paths, some parts of development are instead maintained and executed through manual processes, shared understanding, and workarounds. For example, on the team, QA testers track what needs to be tested by a mix of manually-updated Phabricator tags (projects), comments on individual tasks, and direct conversations with team members.

Consequences include time-consuming workarounds to track work, and extra communication and delays as people have to ask each other about unclear task status.

Estimation and forecasting
The VE team (like most software development teams) has a limited ability to estimate and forecast the precise amount of engineering output required to produce certain outcomes.

Recently, the work required to meet VE’s launch objectives has been higher than estimated in several areas, most notably work required to support large-scale A/B testing and to pass the usability test criteria. In particular, the work required to get VE to reach preliminary usability targets took many more iterations than expected. This was exacerbated by communication and trust issues between VE and some of its consulting stakeholders. The consequences include unplanned schedule delays, a lack of visibility of status, and friction between stakeholders.

Executives express a general preference for better forecasting tools, although the priority of this need relative to other visibility issues and to cost is not clear.

Leadership
The high-level decision-making process for VE can be diffuse. In consequence, some decisions can be made in one group, get separated from their origin by, for example, being documented or repeated without a source, until they become fixed assumptions that the broader team is bound by but cannot reexamine. In addition, this diffuse decision-making may become a greater challenge if the changes made so far to VE, contrary to expectations, still fail to secure sufficient community acceptance. It is not clear who is in charge of planning for and building understanding of such contingencies, or communicating these changes to other stakeholders.

Mismatch of roles and team needs. In the current VE development process the Product Manager assumes a broad range of duties. Some are part of traditional Product Management, such as triaging requests, defining requirements, and designing solutions. Other required duties, such as release management, group facilitation, managing developer work allocation, forecasting, reporting, planning, and scheduling, extend the position beyond what any person can focus on simultaneously. In addition, some of these responsibilities tend to be contradictory (for example, it’s difficult to simultaneously maintain the decisiveness required for product decision-making and the neutrality required for group facilitation) and so are better performed by different people.

Responsibility for the VE launch as a distinct, one-off project fell to the VE Product Manager by default. Due to time and availability constraints, not all VE launch activities were effectively coordinated across all parties, which may have delayed the launch and increased cost.

Productivity
The FY2015Q3 productivity, in terms of completing large numbers of code changes to VE while maintaining quality standards, is not sustainable. The Foundation designated VE its top priority and made more people and attention available, which will not continue indefinitely. The April reorganization, which roughly halved the number of engineers and QA on the team, came on the heels of Q3, when core VE team members worked unsustainable hours.

= Changes already in progress or completed =
 * Joel and Neil and Nirzar added to VE team
 * Use Milestone Criteria and Go/No-Go to clarify requirements and coordinate actions
 * More accurate FY2015Q3 burnup

= Recommendations =

Recommendation: Maintain Goals, Epics, and Tasks in alignment in Phabricator.

 * 1) Clarify data model
 * 2) Move current (Q4) OKRs into Phabricator.
 * 3) Move future OKRs into Phabricator
 * 4) Define near-future milestones and add to Phabricator
 * 5) Align all Tasks and Epics to Goals
 * 6) Identify and deal with edge cases

Expected Cost

 * One-time cost of several people to make transition.
 * Ongoing cost to VE Product Owner to maintain information in Phabricator.
 * Cost to either stakeholders or VE Product Owner/PM to convert incoming requests into Phabricator.
 * Cost to stakeholders to learn how to access desired data in Phabricator.
 * Cost to VE Product Owner/PM to create/improve/document Phabricator queries.
 * Possible cost to Foundation to add Phabricator functionality.

Expected Benefits

 * Stakeholders can view VE Scope of work via Phabricator-based queries that are at the appropriate level of data.
 * Stakeholders can use higher-level queries as starting point to negotiate scope and priority.
 * VE Team can use to prioritize backlog.
 * VE Product manager can consolidate all non-Phabricator requests into Phabricator.

Questions and barriers
What exactly is “goals”? Objectives, Key Results, Milestones? What is the relationship between milestones and KRs (same thing, or one is child of other, or peers?)

What is the best fitting data structure for everything? Separate levels of Goal, Epic, Task, so that each Task has 1 Epic parent and each Epic has 1 Goal parent? Or put many assorted tasks into grab-bag Epics and Goals? What are the useful queries for different stakeholder groups, and does this support them?
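
To make the first option concrete, here is a minimal sketch of a strict three-level hierarchy, in which each Task sits under exactly one Epic and each Epic under exactly one Goal. It is an illustration only and does not describe how Phabricator stores tasks; the class names, fields, and the `roadmap` helper are assumptions made for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Task:
    title: str
    points: int = 0
    resolved: bool = False


@dataclass
class Epic:
    title: str
    # Containment models the "one Epic parent per Task" rule,
    # provided a Task is only ever added to one Epic.
    tasks: list = field(default_factory=list)


@dataclass
class Goal:
    title: str
    epics: list = field(default_factory=list)


def roadmap(goals):
    """Summarize each Epic as (goal, epic, resolved points, total points)."""
    rows = []
    for goal in goals:
        for epic in goal.epics:
            total = sum(t.points for t in epic.tasks)
            done = sum(t.points for t in epic.tasks if t.resolved)
            rows.append((goal.title, epic.title, done, total))
    return rows


# Hypothetical data, for illustration only.
goals = [
    Goal("Example goal", [
        Epic("Example epic", [
            Task("Example task 1", points=3, resolved=True),
            Task("Example task 2", points=5),
        ]),
    ]),
]
rows = roadmap(goals)
print(rows)
```

With a strict hierarchy, an Epic-level query such as `roadmap` is a simple roll-up. The grab-bag alternative would need a many-to-many relation instead, which makes such summaries harder to interpret.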

What are the edge cases? E.g., Tasks which are technically small but have complex implications or dependencies - treat as Task and pass-through Epic, or Task is Epic, or Task rolls up into catch-all Epic?

How exactly do we get the roadmap back from this? Is it the list of Epics?

Is it useful to try and identify all queries and use cases before designing a data model, or is it more useful to design a logically consistent data model that works in Phab, then produce real query results and test them with stakeholders?

How do the results of pitch meetings integrate into this data model?

Recommendation: Regularly groom the VE backlog using Card Mapping, Milestone Criteria-driven breakdown, and re-prioritization.

 * 1) Use Milestone Criteria to identify Epics and Tasks.
 * 2) Use Card Mapping and Goals to group tasks in bulk into smaller numbers of Epics, split up and prioritized by planned release.
 * 3) Review the backlog in whole or part on a regular basis to confirm priorities.

Questions
Since VE releases continuously and reactively, is the concept of a themed release (e.g., “Zero Bug Bounce”) across several weeks to a month or longer useful? If not, can anything substitute?

What logical feature areas or other groups are appropriate for Card Mapping?

Expected Cost

 * Recurring cost to VE Product Manager and stakeholders to perform breakdown and triage.

Expected Benefits

 * The VE Product Manager can flag proposed work that does not match planned Goals and Milestones.
 * The VE Product Manager, by working with a single integrated list of VE priorities, can flag conflicts.
 * VE Stakeholders can use regular, planned work breakdown as a channel to communicate and participate in VE planning.
 * Through regular work breakdown activities, VE Stakeholders can work with the detailed backlog and contribute as appropriate.

Recommendation: Use simple estimation and historical data to maintain rough forecasts.
Expert estimation refers to having one person (typically a lead developer) do all estimates individually. If calibrated and validated, this can be as accurate as much more time-consuming team estimation methods.
 * 1) Develop estimates for all items in the backlog via expert estimation
 * 2) As new items are created, assign estimates via expert estimation
 * 3) Maintain a Burnup forecast with quarterly or monthly precision.
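
The steps above can be sketched as a rough calculation: average the story points resolved in recent weeks, then divide the remaining estimated points by that average. This is a minimal sketch under stated assumptions, not the team's method; the function names, the four-week window, and the sample numbers are all hypothetical.

```python
import math


def average_velocity(weekly_points, window=4):
    """Mean story points resolved per week over the most recent weeks."""
    recent = weekly_points[-window:]
    if not recent:
        raise ValueError("need at least one week of history")
    return sum(recent) / len(recent)


def weeks_to_finish(remaining_points, weekly_points, window=4):
    """Rough number of weeks needed to resolve the remaining points.

    Returns None when recent velocity is zero, since no forecast is possible.
    """
    velocity = average_velocity(weekly_points, window)
    if velocity <= 0:
        return None
    return math.ceil(remaining_points / velocity)


# Hypothetical history of points resolved per week, and a hypothetical backlog.
history = [18, 22, 20, 24, 16, 20]
remaining = 130

velocity = average_velocity(history)
weeks = weeks_to_finish(remaining, history)
print(f"velocity: {velocity:.1f} points/week, forecast: about {weeks} weeks")
```

Given the uncertainty in the underlying estimates, a result like this is better reported at monthly or quarterly precision than as a specific date.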

Expected Cost

 * Ongoing cost to expert estimator
 * Cost to Ops/RelEng or Foundation for Phabricator improvements

Expected Benefits

 * Stakeholders will have relatively realistic (or at least, not too unrealistically optimistic) dates for planning
 * VE Team can prioritize work breakdown on near-term items and avoid overly detailed work on far-future items

Questions

 * What amount of estimation validation is appropriate? (re-calibrate expert estimators to each other?  to the whole group?)
 * If a bunch of existing tasks are grouped into an Epic, how should the Epic be pointed?  A sum?  Re-pointed top-down?
 * Since a big part of estimation is in work breakdown, what validation or check should be performed on the estimation work implicit in work breakdown?  Possibly, but informally.
 * Is it worthwhile to do any validation by measuring actual effort (hours spent on a task, other measures?) and reporting back against estimates?
 * How should the burnup be created - via the experimental script, via Kevin’s Phabricator effort, or other?
 * How should velocity be measured, if not via Scrum standards?  Count weekly points and average/trend over time?

Recommendation: Negotiate Service Level Understandings between stakeholders, use this discussion to uncover conflicting commitments, and facilitate trade-off decisions.
 * 1) Identify key working pairs of stakeholder groups
 * 2) Under mediation of Agile Coach, identify mutual expectations in each pair
 * 3) For each expectation, identify counter-expectations.
 * 4) Negotiate expectations and counter-expectations.
 * 5) Identify barriers, such as conflicting commitments, to fulfilling negotiated obligations.
 * 6) Address and resolve underlying barriers.

Example: “As HIPAA officer, expect to have 3 months to evaluate any proposed change.”  “3 months’ notice would mean our launches are all delayed, increase our engineering costs 20%, and prevent us from participating in hackathons.” “Okay, 1 month if you can always prioritize the changes I recommend and do them in 1 week.”  “We can only do them in 1 week if you are 100% on call during that week.” “Can’t be 100% on call because I have six other projects.” Fundamental conflicting commitment identified: address as resource conflict, or otherwise escalate or bring in other, indirect, stakeholders to find a creative solution.

Expected Cost

 * Cost to all stakeholders to participate in meetings

Expected Benefit

 * All stakeholders reduce fundamental conflicts of commitments
 * VE Planning becomes more predictable

Questions
What pairs are significant other than the star-shaped stakeholder<->Core team pairs?

Are there any relationships that have to be investigated as multi-party groups, not just pairs of people/groups?

Recommendation: Clarify what additional reporting is appropriate under what circumstances.

 * 1) Use the Service Level Understanding process to identify expectations, and costs, for reporting under launch circumstances
 * 2) Define expectations for reporting under more typical circumstances
 * 3) Identify dates and criteria for changing to normal reporting regime

Expected Cost

 * Cost to all relevant stakeholders to participate in meetings

Expected Benefit

 * Reporting load for VE made commensurate with situation
 * VE Team provided resources appropriate to reporting load

Improve Phabricator’s ability to mechanize our process
Vocabulary: mechanize, as opposed to automate. A one-click build is mechanized; a zero-click build is automated. Processes that require human input/judgment can by definition not be fully automated, only mechanized.

(Note that Kevin is working on a smaller version of this, focused specifically on burnup charts.)

Expected Benefits

 * Would benefit project management throughout the Wikimedia movement
 * Would likely save staff time overall once the initial investment is made

Expected Costs

 * Significant expense
 * Phacility might not be ready to do this quickly

Account for backlog grooming as tasks in Phabricator
Use Phabricator tasks and story pointing to account for backlog grooming (work breakdown, task definition, design, etc) as forecastable work.

Redouble efforts to centralize discussion within Phabricator.
Take active steps (education, training, bugging people, favoring training/process development over expediency) to make it clear to all stakeholders to what extent they can comment on, edit, and reopen tasks, with these being taken as requests rather than demands, and having the core VE team make and use tasks earlier in the ideation process.

Expected Benefits

 * Would increase transparency of the process, both to other Foundation teams and to the community at large.
 * Would reduce wasted effort trying to remember whether information is found in Phabricator, in email, on a wiki page, or somewhere else.

Expected Costs

 * May require additional work for the core VE team to spend time to
 * Using Phabricator for discussion is generally hampered by its lack of good notification tools.

Group pending stories by threshold
Where possible, group items according to “threshold”. Example: Suppose there are 50 RTL bugs, many of which overlap or have shared causes, and RTL is unacceptable to 90% of users unless the 30 most important ones are done. If so, it may not make sense to work on ANY RTL bugs until time is available to work on all 30 critical ones, since there is no benefit to users until all are done, and since fixing them one at a time may be very inefficient. Then the 30 critical ones can be grouped as one unit, and the remaining 20 pushed down until after the unit of 30 is done.
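
The RTL example can be sketched in code. This is a minimal illustration, assuming each bug carries a hypothetical priority rank and point estimate: the critical items form one all-or-nothing unit, which is only started when capacity covers the whole unit, and the rest are deferred. The names `group_by_threshold` and `can_start_unit`, and all numbers beyond the 50/30/20 split, are invented for this example.

```python
def group_by_threshold(items, is_critical):
    """Split items into one all-or-nothing unit plus a deferred remainder."""
    unit = [item for item in items if is_critical(item)]
    deferred = [item for item in items if not is_critical(item)]
    return unit, deferred


def can_start_unit(unit, capacity_points):
    """Start the unit only if the whole unit fits in the available capacity."""
    return sum(item["points"] for item in unit) <= capacity_points


# Hypothetical backlog: 50 RTL bugs ranked 1 (most important) to 50,
# each with an assumed estimate of 2 story points.
rtl_bugs = [{"rank": rank, "points": 2} for rank in range(1, 51)]

# Assumed threshold: RTL becomes acceptable once the top 30 bugs are fixed.
unit, deferred = group_by_threshold(rtl_bugs, lambda bug: bug["rank"] <= 30)

print(len(unit), "bugs in the critical unit;", len(deferred), "deferred")
print("start with 45 points of capacity?", can_start_unit(unit, 45))
print("start with 60 points of capacity?", can_start_unit(unit, 60))
```

Treating the unit as indivisible reflects the reasoning above: partial progress below the threshold delivers no benefit to users, so the unit is scheduled as a whole or not at all.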

Move some of the functions currently performed by the Product Manager to other people

 * For distinct, unique projects (like VE launch as opposed to VE ongoing development), assign a specific project manager to coordinate with stakeholders
 * Release Management (coordinating tasks, patches, gerrit, builds, releases, etc)
 * Coordinators.  Hire additional employees, either within VE or within its stakeholders, who at least partly function as coordinators. For example, the Communications team is considering whether it would be worthwhile to hire product marketing experts who could communicate with VE and other engineering teams at a higher, more consistent level.

Expected Benefits

 * Does not require significant cultural or process change.
 * Fresh voices can be catalysts for additional change.

Expected Costs

 * Does not address more fundamental inefficiencies in communication patterns. It is possible that there are process changes which, if implemented, would make it possible for existing roles to “work smarter, not harder.”

Embed non-core stakeholders within the core VE team.
This has already been done for Design and Community Relations. Other groups for which this approach might be relevant: Design Research?

Expected Benefits

 * provides a stakeholder voice within the core VE team in a way which is proven to work

Expected Costs

 * expands the size of the core VE team beyond efficient communication limits
 * limited fix: Puts, e.g., one designer within the core VE team but still leaves the core VE team cut off from the broader Design stakeholder group.
 * Conflict between 100% participation in VE and participation in role-centered groups

Match engineer and QA staffing levels to scope of work.
VE’s engineering staff was cut 50% during the re-org, so expected productivity should decrease a commensurate amount. Therefore, expectations for team velocity and quarterly scope of work should be adjusted.

Address physical factors that affect productivity, including noise, space, and meeting space.
VE’s engineers are all remote. Staff in the bullpen environment (Product Owner, consulting stakeholders) may benefit from coordination enough to partially compensate for productivity losses caused by the open floor plan.

Measure VE Launch work versus ongoing maintenance
James’ guess is 80%+ launch. Decide whether this balance needs to change before launch, weighed against available resources.

Next Steps
See VisualEditor Team Process#Plans for the overall process around this report.

Appendix: Resources

 * Interview List
 * Wikimedia Engineering/2014-15 Goals
 * Preliminary A/B test timeline
 * Wikimedia Product Development/Product Development Process/Draft