WMF product development process/Proposal

=Concepts=
We can consider Foundation development work within several complementary models:

Input, Output, Outcome
With planning and oversight, Inputs can be controlled fairly directly. Outputs can be prioritized and forecast. Outcomes are the changes to the world that result from the Outputs but are also affected by factors beyond our control, so they can only be influenced indirectly. This model is somewhat fractal in its application: it can give some understanding of a multi-year process, and also of a one-hour task. '''We define success by Outcomes, but Outcomes are only indirectly caused by Inputs and Outputs.'''

People, Process, Priority
In this model, People (and money and time and other resources) are the primary input to the Process, and Product is the primary output of the Process. Prioritization affects all of these. All of these activities are interdependent and overlapping, so this is more of a high-level conceptual model that doesn't necessarily translate to specific daily activities. Changing all areas of WMF's product development process at once may be a self-defeating approach, so we may want to pick one area to focus on. However, we can't fully isolate any area from the others.

Fixed Teams vs Variable Teams
In the fixed-team model on the left, each team has a fixed composition, and work is parceled out to different teams. In the ad-hoc team model on the right, teams are tied directly to projects, so as a project expands, the team expands. A benefit of fixed teams is that, since teams develop trust and common language and understanding over time, stable long-term teams may be more productive than volatile teams. However, they may be slower to adapt to changing product needs.

In Engineering, WMF uses fixed teams in "verticals", each with their own area of focus. So a team may take on several projects in series or parallel, and most of these projects cannot be moved between teams. Many other, "horizontal" teams in the Foundation support Product work, for example Design Research, Communications, and Community Liaisons; this happens both by horizontal people being embedded in Engineering teams, and by horizontal teams doing project work for multiple verticals.

Assigning a person to a project in a variable-team model is equivalent, in a fixed-team model, to getting a team to accept a new project.

Tomasz: this means that products will be based on teams rather than product.

One List


The diagram above shows how many different stakeholders want changes to VisualEditor; the Product Manager collects and prioritizes all of this input. This model is used to some degree by most Foundation Product teams.

Katie: That's great if there's already a Product Manager. For the Education program, with no product manager, the work that’s been done has been through begging.

Trevor: Vertical managers (Toby, Wes, and Trevor) play that role.

Luis: the product box can be incredibly opaque if you’re outside of engineering.

Teams can have many different sources for work, such as high-level goals, product requirements, functional specifications, emails, community feedback, in-person requests, and Scrum story backlogs. A team with multiple contradictory lists cannot prioritize effectively, and may waste extra time on context switching. Consolidating to one stack-ranked list forces tradeoffs to be explicit, which will disappoint some stakeholders.


Most Engineering teams maintain a backlog of tasks in Phabricator. However, this may not capture all of the requests from diverse sources that a team is considering, anticipating, or surprised by.
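As a rough illustration of the One List idea, the sketch below merges requests from several sources into a single stack-ranked list. The `Request` fields, the source labels, the example titles, and the numeric `priority` are all hypothetical; in practice the ranking is a judgment made by the Product Manager or team lead, not a formula.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Request:
    title: str
    source: str    # hypothetical label, e.g. "community feedback"
    priority: int  # hypothetical rank input; lower means more important


def one_list(*sources: list[Request]) -> list[Request]:
    """Merge every source into one list so each request has exactly one rank."""
    merged = [req for src in sources for req in src]
    # sorted() is stable, so requests with equal priority keep arrival order.
    return sorted(merged, key=lambda r: r.priority)


# Made-up example inputs from three of the sources named in the text.
goals = [Request("Reduce save failures", "quarterly goal", 1)]
feedback = [
    Request("Fix table editing", "community feedback", 3),
    Request("Improve citation dialog", "community feedback", 2),
]
email = [Request("Add chart support", "email", 4)]

ranked = one_list(goals, feedback, email)
for rank, req in enumerate(ranked, start=1):
    print(rank, req.title, f"({req.source})")
```

The point of the single sorted list is that a request from any source has to displace something else to move up, which is what makes the tradeoff explicit.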

Some Definitions of Team
Tomasz: How does the Foundation create new teams? Trevor: case by case. Maybe defining teams is appropriate; people change managers every quarter based on team makeup, but stable management structure.
 * Set of people with the same manager.
 * Set of people who expect to share most of their daily task information with each other, e.g., have the same informational meetings.
 * Set of people who are working together toward the same goal.
 * Set of people who are sharing a single work queue.

Since a team, or even a single person, can do more than one thing at once, a team can work on more than one item on a list at once. It is still useful to maintain a stack-ranked list to help the team stick to lower, more efficient levels of work-in-progress.

Maintenance vs New Feature
The Foundation can categorize work as either maintaining existing products and services for the movement or creating new products and services. See Team_Practices_Group/Measuring_Types_of_Work. Each Engineering team devotes a varying proportion of its effort to maintenance versus new functionality.

The landscape of possible work
The Maintenance vs New Feature dichotomy is complicated by the fact that it's not always obvious what will and will not succeed. The graphic below very roughly illustrates this concept; in particular, imagine that the landscape is mostly invisible so that new projects don't initially know if they are on a hill or in a valley. The Foundation can categorize projects as:
 * Capitalizing on existing success. For example, adopting a proven technology to reduce downtime, or investing $1 million to reduce an annual bill from $2 million to $1.5 million.
 * Exploring new possibilities. For example, researching and experimenting with three different new technologies to see which might reduce downtime.
 * Expanding on a new feature to see if it's as effective as it seems like it should be.

Timeboxing vs feature boxing
Timeboxing is a prioritization technique in which a project is given a fixed amount of time. Whatever is complete by the end of the time period is released if it meets minimum release criteria. A decision is then made whether to proceed for another fixed time period, and if so, with what priorities.

Feature boxing is the opposite, in which a project is continued until all necessary features are complete; thus, the time to complete a project can only be predicted, not specified.

It is not generally possible to guarantee both a feature set and a time, even if resources are unconstrained. This is like guaranteeing an outcome; we can make a prediction that supplying a certain input is very likely to produce an output leading to the desired outcome, and we can become arbitrarily certain (60%, 99%, 99.9%) by selecting inputs that forecast to a high level of confidence, but the cost increases dramatically for higher certainty.
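The difference between the two techniques can be sketched in a few lines. This is a simplified illustration, not a planning tool: the feature names and day counts are made up, and the day counts are treated as known in advance, whereas in real work they are only estimates.

```python
def timebox(tasks, budget_days):
    """Work down the ranked list until the time runs out.

    Time is fixed; the set of completed features is whatever fits.
    """
    done, used = [], 0
    for name, days in tasks:
        if used + days > budget_days:
            break
        done.append(name)
        used += days
    return done, used


def feature_box(tasks, required):
    """Keep working until every required feature is complete.

    The feature set is fixed; the elapsed time is whatever it takes.
    """
    remaining = set(required)
    used = 0
    for name, days in tasks:
        if not remaining:
            break
        used += days
        remaining.discard(name)
    return used


# Hypothetical ranked work list: (feature, estimated days).
tasks = [("login", 5), ("search", 8), ("export", 4), ("themes", 6)]

print(timebox(tasks, budget_days=15))
print(feature_box(tasks, required={"login", "search", "export"}))
```

With a 15-day timebox, the third feature does not fit and is left for a later decision; feature boxing the same three features takes 17 days. Neither function can promise both the 15 days and all three features.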

Product Lifecycle


Software development work comprises fairly consistent and distinct phases, each with a different character: Initiation, Development, Release, and Maintenance. Closing is also often considered a standard phase.



Each of these phases has specific typical work and involves different people with different skills and experience in different proportions.

Tomasz: The product lifecycle makes sense for developing, but what about staffing for maintaining?

The number of people that can contribute efficiently to a product varies consistently through the phases. For example, only a few people can productively work on a product in initial phases, doing work such as product definition, demand research, prototyping, and exploratory design research. The mix of engineering teams, non-engineering teams, and Movement participation also varies by phase. If a new product or service is released and succeeds, this will likely permanently increase the staffing needs of the Foundation.

This model remains useful even as product sizes change dramatically.

Definition
A milestone can be used to organize work. A typical software milestone has Success Criteria, which are the conditions under which the milestone is achieved. Each success criterion indicates who (person and team) is responsible for judging it.

Terry: will it be possible to clarify the milestone definition? Is a Milestone an output or an outcome? Joel: Option 1: Maybe both, depending on what level and what it's used for. Option 2: Milestones could be just Outputs, and OKRs may be better suited to represent desired Outcomes.

As a nexus of functionality, priority, and people
The people who decide the definition of a milestone (the success criteria and their judges) may be different from the people who judge the success criteria. The people who complete the work to achieve a milestone may differ from each of the other sets of people.
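One way to picture a milestone as this nexus is as a small record that keeps the three sets of people separate. The sketch below is an assumption about how such a record could look; the class names, field names, and example values are hypothetical and do not describe any existing WMF tool.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class SuccessCriterion:
    description: str
    judge: str                  # person responsible for judging this criterion
    judge_team: str             # team that person belongs to
    met: Optional[bool] = None  # None means "not judged yet"


@dataclass
class Milestone:
    name: str
    definers: list[str]  # people who decided the criteria and their judges
    workers: list[str]   # people who do the work toward the milestone
    criteria: list[SuccessCriterion] = field(default_factory=list)

    def achieved(self) -> bool:
        """A milestone is achieved only when every criterion is judged met."""
        return bool(self.criteria) and all(c.met is True for c in self.criteria)


# Hypothetical example; the names are placeholders, not real assignments.
beta = Milestone(
    name="Beta Released",
    definers=["product manager"],
    workers=["engineer A", "engineer B"],
    criteria=[
        SuccessCriterion("No open blocker bugs", "QA lead", "Quality"),
        SuccessCriterion("Design review passed", "design researcher", "Design Research"),
    ],
)

print(beta.achieved())  # nothing judged yet
for criterion in beta.criteria:
    criterion.met = True
print(beta.achieved())
```

Keeping `definers`, `workers`, and each criterion's `judge` as separate fields reflects the point that these can be three different groups of people.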

Bootstrapping with Milestones


A Milestone can be used to bootstrap into a process that otherwise seems circular or contradictory. Example 1: Initiate milestone that proves next 2-4 milestones. Example 2: Pull a medium-sized project bundle off the shelf and see resource assignments for next 5 months.



Bundles
Milestones can be re-used for work of similar scope. They can be sequenced and grouped.

Compatibility with common development processes
Milestones are used in most common software development processes.

Waterfall
Each waterfall project is defined completely by a set of fairly standard milestones (such as Requirements Complete, Design Complete, Code Complete, Testing Complete, Alpha Released, Beta Released, Final Released).

Scrum
In Scrum, milestones are used in the product backlog, one level higher than Scrum backlogs, to group and prioritize stories.

Kanban and Continuous work
Milestones are not a natural fit, but can be stretched somewhat to define maintenance work, e.g., "Milestone: The servers stayed up with 99.9% uptime in Q3."

Libraries of Milestones
A library of reusable milestones would use some metadata to identify and group milestones and match them to new needs.

WMF may not have consistent enough processes for a Milestone Library to work well. For example, the Foundation initiates very large projects (like VE) only every 2-5 years, with a different process each time.

Size and Depth of Processes
The size and effort of a process should be commensurate with the scope of the decisions being made.

Alignment
How do we align the different parts? How do we make sure that people are working on the stuff in Phab, and that the stuff in Phab lines up with the Goals, and the Goals line up with the mission?



Should we even have everything in Phab? Tomasz: we track development team tasks. Joel: we don’t track 100% of the work. This meeting isn’t in Phab. We don’t track everything. May be overkill to try that level of tracking. How do we account for engineering management work? Recursive problem.

Follow-up
The main tool to align different levels is a review meeting, where we examine everything at one level against a list of categories at a higher level. How often should we do that?

=Current state of WMF Engineering=

Task tracking and planning


Engineering teams have processes (mostly Scrum/Kanban), and these work at the micro level. But at the level of Foundation-wide processes and resource decisions, the processes don't align well. Teams set their own backlogs, filtered by product people. The range of inputs spans bottom-up (community tickets) to top-down (quarterly goals). Adding work to a backlog is the de facto way of assigning WMF resources to a task, so that’s a bottom-up, semi-organic process. For most teams, that goes through a Product Manager or team lead, who synthesizes many inputs from many levels of hierarchy.

A team accepting work into the backlog is the bottom-up equivalent of WMF assigning the people on the team to tackle the work. But teams petition for permanent and temporary people, which is a top-down process. Neither process is perceived as consistent, accessible, or transparent.

Teams are juggling new work and maintenance.

Work assignment doesn’t align very well with resources.

Goals
Team Goals are set at the manager level in a quarterly process.

Example: FY2015/2016 Q1 Goals process

 * Week ending 6/7: Lila’s direct reports (Level 1)
 * Week ending 6/14: Level 2
 * Week ending 6/21: Level 3 and Individual Contributors
 * Week ending 6/28: Staff Goals
 * Sep 29 to Oct 1: Presentation Clinics
 * Oct 5 to Oct 8: Quarterly Goal Review Meetings

Alignment of Goals
Those goal lists do not include all of the work that everybody is doing; i.e., they don't map precisely to Phabricator. They also don't necessarily synchronize with the Foundation's overall goals, which in turn don't perfectly reflect the Movement's needs (and can't, since the Movement is too big and diverse to have a single, consistent set of requirements). As we get away from maintenance, it’s not clear that we get to “what the community needs”. We’re good on the “lights on” stuff, and less accurate at broader scopes of work.

Discussion: "We don’t know what the community’s priorities are." "We could optimize for different questions." "We need to build a consensus on what we’re trying to fix."

How important is it for the high-level goals to align with the bottom-up work? Terry: I would question whether tasks and goals connect in a tightly coupled way. Trevor: the goals that we set only represent a part of what we do.

Joel: We don’t have a process for teams to measure what percentage of their work isn’t covered in our accounting.

Terry: We need to get goals that relate to OKRs. Goals and task tracking are typically separated rather than tightly coupled.

Joel: Agreed that we shouldn't try to track 100% of everybody's time in Phabricator/goals/etc., but how tight should it be? Do we want a warning when the amount of work a team is doing that doesn't track to any of their goals exceeds x%? A sign that either the goals or the work is off track?

Trevor: Managers handle this by leaving room in the workload for engineers to have flexibility.

Katie: I have a massive backlog of stuff I want to do to give us more visibility, but the pace moves quickly, and we haven’t had the chance to make our own stuff nice for ourselves. We can’t just decide we just want a new payment method.

Terry: I think you can overregulate. We have to build slack into the system. Scale the overhead to the decisions.

Trevor: Precision needs to be calibrated. You can get so obsessed with precision.

Katie: I’ve seen arguments whether we should do X last longer than doing X.

Two models of how tightly goals should tie to daily work:
 * Slack model: don’t book every minute.
 * Prioritization framework: tie goals at least in bulk. E.g., be able to report that 50% of work last quarter was on Goal 1, 20% on Goal 2, 10% didn't match to any goal, and 20% was explicitly overhead.
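A bulk report of this kind is simple arithmetic once each piece of work carries a goal label. The sketch below is a minimal illustration under several assumptions: the hours are invented so that the totals reproduce the 50/20/10/20 example, the `(hours, goal)` task format is hypothetical, and the 15% threshold stands in for the unspecified "x%" raised in the discussion.

```python
from collections import defaultdict

NO_GOAL = "no goal"


def effort_by_goal(tasks):
    """Return each goal's share of total effort as a fraction between 0 and 1.

    `tasks` is a list of (hours, goal) pairs; goal is None when the work
    does not map to any goal.
    """
    totals = defaultdict(float)
    for hours, goal in tasks:
        totals[goal if goal is not None else NO_GOAL] += hours
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {goal: hours / grand_total for goal, hours in totals.items()}


def untracked_warning(shares, threshold):
    """True when work that matches no goal exceeds the chosen threshold."""
    return shares.get(NO_GOAL, 0.0) > threshold


# Invented hours for one quarter.
last_quarter = [
    (250, "Goal 1"),
    (100, "Goal 2"),
    (50, None),
    (100, "overhead"),
]

shares = effort_by_goal(last_quarter)
for goal, share in shares.items():
    print(f"{goal}: {share:.0%}")
print("warn:", untracked_warning(shares, threshold=0.15))
```

This only reports in bulk; it does not require booking every minute, so it is compatible with leaving slack in the workload.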

Output vs Outcome


Currently we don't differentiate clearly between output and outcome in setting goals and OKRs. Trevor: just look at our quarterly goals. Some teams tend to do more of one or the other.

=Modular Milestone Planning Model=

Key points:
 * Use Milestones to define Foundation’s scope of work (the “X Board”).
 * Adjust Quarterly Planning so that changing scope is inseparable from changing staffing.
 * Use both Outcome Dashboard and Output Dashboard in evaluation.

Coming soon: Slides 44 to 61 from SPDPP Proposal