VisualEditor/2015 Process Review

3rd DRAFT, 5 June 2015, Neil P. Quinn and Joel Aufrecht

About this report
We (Joel Aufrecht and Neil P. Quinn) joined the Wikimedia Foundation and the VisualEditor team during April 2015. During our first weeks, we embarked on a project with several goals:
 * 1) document the VE team’s strengths to help spread best practices in the WMF,
 * 2) identify challenges to the workflows of the team and its stakeholders, and
 * 3) lead specific process changes to help address those challenges.

This report draft is the output of steps 1 and 2 in that process. Future drafts of the report will include a menu of possible solutions for each challenge, and specific recommendations for moving forward, but for now, we’re just trying to get consensus about the challenges.

We’ve based this report on more than 20 interviews with team members and stakeholders, review of multiple documents, and direct observation of the team’s work during April and May 2015. Several aspects of the team have changed since we started work (in particular, its membership shrank significantly during the April 2015 reorganization); in this report, we generally describe the team in its initial state, and incorporate any recent changes into our recommendations.

Additional notes

 * 1) The VE team’s process for working with its huge number of stakeholders in the Wikimedia community is an extremely important issue, but is outside the scope of this report.
 * 2) Anything in this document is subject to change, so please shout at us if you think we got something wrong. However, most of our recommendations are based on personal conversations and observations, so we will deviate from the wiki norm and ask that you not make any substantial changes directly.

Key findings
The VisualEditor (VE) team was particularly productive in FY2015Q3 (Q3), January to March 2015. The high-functioning, high-morale core completed many changes to VE very rapidly, clearing many of the major preconditions for proposing a relaunch on the English Wikipedia.

Several process issues decreased the VE team’s effectiveness. Priority balancing and work planning with other stakeholders such as the Design, Research, Communications, and Analytics groups were not always effective, which affected the schedule and the quality of the product. A lack of process for identifying work at mid-scale (smaller than quarterly, bigger than daily) limited visibility and predictability. The process and overall responsibility for launch, as opposed to development, were not clear or consistent.

The VE team is likely to show a decrease in productivity from the Q3 peak, due to a reduction in its size, the end of its period as the top Foundation priority with commensurate support, and a removal of the do-or-die condition which motivated the team to work at an unsustainable level.

The VE team
In the context of this report, these are the WMF employees who spend the bulk of their time working on VisualEditor and usually attend daily standup meetings.

The VE team included, during Q3, seven full-time engineers, four part-time engineers, two QA testers, and one product manager. It keeps in close contact via IRC, Google Hangouts chat, two daily stand-up meetings, Phabricator and Gerrit comments, and occasional emails and video chats. The team has been directed collaboratively by James Forrester, the product owner, and Trevor Parscal and Roan Kattouw, tech leads, for several years. Ed Sanders recently took over the role of tech lead.

Two community liaisons are embedded in the team. They participate mainly by Phabricator, a weekly meeting with the product manager, and by observing daily standups.

The core VE team performs engineering on a weekly release cycle within a quarterly planning cycle. Day to day, the Product Manager identifies and assigns the majority of tasks for the team, working from a dozen or more sources of potential tasks. The team tends to see his assignments as suggestions, which they accept because of their trust in his technical expertise and knowledge of their strengths, and engineers frequently take bugs for themselves.

QA performs a mix of automated and manual testing. Committed code approved by engineers is automatically launched unless a tester raises objections (as opposed to a process where changes must wait until approved by QA); the testers reported that they were able to keep up with the flow of changes. The Product Manager is deeply involved in the engineering process: he often does hands-on merging and reviewing of code, takes charge of closing tasks when a fix is merged, and coordinates QA and release cycles.

Consulting stakeholders
These are the stakeholders who offer specific expertise and services to VisualEditor, but spend much of their time working on other projects. These include Research, Analytics, Communications, Advancement, senior management, and (before its break-up) Design.

The Design team had weekly scheduled meetings and some participation in Phabricator discussions. The Parsing team keeps in close contact with VE using Phabricator tickets, as well as a weekly sync-up meeting and ad-hoc conversations over IRC, email, and the wikitext-l mailing list. Other teams tend to have more sporadic, ad-hoc communication with the team.

Unit of work
The VE team’s standard unit of work is a task in Phabricator, which prompts a code change, which is ultimately released to the production servers. Work from consulting stakeholders in the Foundation generally arrives in quarterly and weekly meetings. Work from the community tends to arrive in a continuous flow of individual feedback and requests spread across dozens of possible locations; the community liaisons mediate and triage much of this flow. All these processes either lead directly to Phabricator or to the product manager, who ultimately adds accepted work to Phabricator directly.

Workflow
As of April 2015:





Recurring Processes
As of April 2015

Productivity
The team is productive by several measures. In Q3, it met its quarterly goals, successfully burned down a large number of key tasks, and delivered many high-value features including a 50% decrease in load time, major usability improvements, and the delivery of Citoid.

Prior to the April 2015 reorg, most of the team’s members had worked together for years with very little turnover, and had developed deep expertise in multiple areas of the codebase.

The team generally feels that the leads take a great deal of process off their shoulders, and appreciates that.

Morale
The VE team’s morale is very high. In the most recent Team Health Check survey, the VE team self-reported the highest possible health in eight of eleven areas, and second-highest in the other three. (Since the ratings are self-reported, this is a good indicator of morale but not necessarily of objective success.)

During interviews, core VE team members, even those who were remote or shared between teams, unanimously said that they enjoyed the team’s atmosphere. Specific strong points that they mentioned included:
 * The team leads treat mentorship and building camaraderie as core responsibilities
 * There is a very strong culture of strict but friendly code review
 * All members of the team, including its leaders, feel comfortable taking responsibility for mistakes
 * All team members feel invested in the quality and success of the product
 * Compared to other open source projects, the team was very welcoming towards women

The fact that the team had had only one departure of any kind before the reorganization, and that most of the team’s members had been hired directly onto the team, may have contributed to this positive atmosphere.

Coordination between remote and co-located people
Almost without exception, remote members of the VE team said they felt satisfied with their inclusion on the team.

Likely causes include:
 * The team has always contained multiple remote members, not just one.
 * The team takes time to socialize during offsites and daily standups (one team member believes this is the main benefit of the standups).

Definition of success criteria
During Q3, the team successfully created specific, measurable goals which allowed them to more confidently assess their progress and better prioritize potential tasks, although not all stakeholders made ongoing use of the goal list. Several team members reported that the involvement of senior management in this process was helpful.

Technical condition
VisualEditor is in production and mature. Current “launch” activities mainly have to do with building sufficient consensus to make it the default for more people; the challenges of releasing and scaling a completely new product are largely in the past.

Major challenges
By gathering our opinions together with those of the team leads and of all stakeholders we interviewed, we identified three top challenges. Because many challenges are complex and overlapping, this list is necessarily imperfect.

(1) Consulting stakeholders have difficulty engaging with VE’s development.
The core VE team and some consulting stakeholders sometimes reach oppositional viewpoints, with the core VE team seeing other participants as blockers who are unable to keep up with the rapid pace, and other participants seeing the core VE team as exclusionary and unduly hasty. This was observed both at higher scales, such as quarterly planning, product definition, and conceptual design, and at lower scales, such as day-to-day task completion and testing. The team’s rapid release pace conflicts with research and exploratory work, which may take months of planning and execution. This can lead to mutual frustration: for example, Design has expressed concerns that implementations are often not done to agreed-upon design specifications, which impedes their ability to test and refine their ideas. The core VE team, in turn, tends to make these on-the-fly implementation changes to address technical considerations, and doesn’t see how to consult the design team in greater detail without unacceptable delays.

Several standing weekly meetings have broad participation, but are not always effective due to time limitations and a lack of effective process (e.g., agendas, separating information-sharing from problem solving). In between these meetings, the core VE team tends to work effectively but in isolation from other stakeholders. Periods of cooperative iteration between core VE team members and others are uncommon, and in some cases have been interrupted from within the core VE team by other priorities.

External participation via Phabricator, which is intended to be a single centralized forum for discussions about software, is particularly difficult for stakeholders outside the VE team. (This applies to community stakeholders as well as consulting stakeholders within the Foundation.)

First, the norms about what parts of tasks non-team stakeholders can change are not clear or documented. For example, if the VE team changes UI language at the request of Communications and closes the bug, but the Communications team notices that the implemented language differs from their draft in important ways, Communications is likely to bring up the matter by email or in person because they are unsure whether they are "allowed" to reopen the task to discuss those differences.

Second, holding discussions on Phabricator tends to be difficult because of the sheer volume of information that Phabricator contains and its lack of good notification features. Bugmail tends to be a firehose, drowning out useful information, and even Phabricator’s inbuilt notifications widget lacks important features like auto mark-as-read and intelligent prioritization. This makes it difficult for both Phab-centric teams like VE and Phab-light teams like Communications to rely on Phabricator to notify them when a discussion needs their input.

This means stakeholders can often engage with VE only through ad-hoc discussions and information requests, which can be particularly difficult because they are reluctant to bother or are unable to reach overstretched leaders like James.

The stakeholders who felt they lacked a full voice included Design, Research, and Analytics. Other teams, like Parsing, generally expressed satisfaction with their communication with VE.

(2) The process for early-stage requirements and design decision-making is informal and incomplete.
VE engineering has tended to focus on releasing a continuous flow of small changes, often urgent bug fixes, on a daily or weekly time frame. This model has been very efficient at writing, testing, and releasing new code for VE. However, it has been less effective at drawing in the broad range of viewpoints necessary to help define and prioritize problems and brainstorm solutions—especially at scales smaller than quarterly goals and larger than individual tasks.

For this reason, consulting stakeholders, including non-Foundation community members, have often felt limited in their input and visibility into the process of brainstorming and iterating new features. This includes problems of visibility: it is difficult to learn, understand, or communicate the state of the roadmap/backlog on a month-to-month basis. This also includes problems of input: the process for deciding engineering priorities at the scale of weeks consists of the Product Owner and lead developer making tactical decisions. This limits the amount and quality of information available to the decision makers, and it isolates other stakeholders from these decisions and their consequences.

The VE development model is organized around Phabricator tasks which tend to comprise either well-defined new features of limited size or bug fixes. However, this model has not been effectively extended into earlier stages of product work, such as requirements and design, which are managed in less structured ways and are frequently not tracked in Phabricator. Some high-level tasks exist, but they aren’t used consistently for all roadmap items, and there’s no way to see just the broad tasks. This not only limits external participation but also constrains attempts by the VE team to plan effectively. In addition, the sheer number of Phab tickets and the inability to break them out by scope make it very difficult for a consulting stakeholder to find the roadmap information that Phabricator does contain.

(3) The team has a high reporting load which may no longer be justified.
During FY2015Q3, the Foundation-wide focus on VE meant the team received both additional support and greater demands for reporting to senior management. The team leadership began doing detailed estimation via story pointing and burnup charting, writing weekly status reports, and leading large weekly status meetings. After the end of the quarter, the support ended and half of the engineers and QA were reorganized out of the team, but the reporting burden has not decreased.

The team leadership feels that both the granularity and the frequency of this reporting need to be reassessed, both because it has a significant time penalty and because much of it is duplicative, which invites inconsistent input. In addition, they worry that making one team the primary focus of the entire Foundation, even if that focus is more a burden than a privilege, can lead to culture problems.

Additional challenges
We identified additional areas where improvements can be made.

Rough workflow within the core VE Team
While non-engineer members of the core VE team, including QA, Documentation, and (since the reorg) Design, largely feel included, their workflow is not as smooth. This has been identified as a problem by many people, including engineering members. In part this is because Phabricator and Gerrit do not track the complete workflow process; while much of it directly follows automated paths, some parts of development are instead maintained and executed through manual processes, shared understanding, and workarounds. For example, QA testers on the team track what needs to be tested by a mix of manually-updated Phabricator tags (projects), comments on individual tasks, and direct conversations with team members.

Consequences include time-consuming workarounds to track work, and extra communication and delays as people have to ask each other about unclear task status.

Estimation and forecasting
The VE team (like most software development teams) has a limited ability to estimate and forecast the precise amount of engineering output required to produce certain outcomes.

Recently, the work required to meet VE’s launch objectives has been higher than estimated in several areas, most notably work required to support large-scale A/B testing and to pass the usability test criteria. In particular, the work required to get VE to reach preliminary usability targets took many more iterations than expected. This was exacerbated by communication and trust issues between VE and some of its consulting stakeholders. The consequences include unplanned schedule delays, a lack of visibility of status, and friction between stakeholders.

Executives express a general preference for better forecasting tools, although the priority of this need relative to other visibility issues and to cost is not clear.

Leadership
The high-level decision-making process for VE can be diffuse. In consequence, some decisions are made in one group and then get separated from their origin (for example, by being documented or repeated without a source) until they become fixed assumptions that the broader team is bound by but cannot reexamine. In addition, this diffuse decision-making may become a greater challenge if the changes made so far to VE, contrary to expectations, still fail to secure sufficient community acceptance. It is not clear who is in charge of planning for and building understanding of such contingencies, or communicating these changes to other stakeholders.

Mismatch of roles and team needs. In the current VE development process, the Product Manager assumes a broad range of duties. Some are part of traditional Product Management, such as triaging requests, defining requirements, and designing solutions. Other required duties, such as release management, group facilitation, managing developer work allocation, forecasting, reporting, planning, and scheduling, extend the position beyond what any one person can focus on simultaneously. In addition, some of these responsibilities tend to be contradictory (for example, it’s difficult to simultaneously maintain the decisiveness required for product decision-making and the neutrality required for group facilitation) and so are better performed by different people.

Responsibility for the VE launch as a distinct, one-off project fell to the VE Product Manager by default. Due to time and availability constraints, not all VE launch activities were effectively coordinated across all parties, which may have delayed the launch and increased cost.

Productivity
The FY2015Q3 productivity, in terms of completing large numbers of code changes to VE while maintaining quality standards, is not sustainable. The Foundation designated VE its top priority and made more people and attention available, which will not continue indefinitely. The April reorganization, which roughly halved the number of engineers and QA on the team, came on the heels of Q3, when core VE team members worked unsustainable hours.

Next Steps
See VisualEditor Team Process#Plans for the overall process around this report.

Appendix: Resources

 * Interview List
 * https://www.mediawiki.org/wiki/Wikimedia_Engineering/2014-15_Goals
 * Preliminary A/B test timeline
 * https://www.mediawiki.org/wiki/Wikimedia_Product_Development/Product_Development_Process/Draft