Wikimedia Technology/Annual Plans/ERF OKR: Culture, Equity and Team Practices

= Culture, Equity and Team Practices =
FY21/22 Organization Efficacy & Resilience OKR for Wikimedia Technology Department

Accountable: Corey Floyd 

OKR Overview
Culture, Equity and Team Practices: Modern and Inclusive Practices. Objective: Modern software practices and inclusive communication practices will remove or significantly reduce barriers to entry and collaboration, and streamline deployments for all technical contributors in the wiki ecosystem, enabling a faster path to innovation.

Key Result 1 At least one user-facing service or feature, and all its supporting infrastructure services, are covered by SLOs, and those SLOs are used to drive engineering decisions including when to roll back deployments. For all supporting services within this slice developed at the Foundation, including MediaWiki, change failure percentage is reduced by 50% while keeping the deployment frequency steady.

Key Result 2 100% of non-inclusive language is removed from our documentation and code repositories according to https://www.mediawiki.org/wiki/Inclusive_language list.

Key Result 3 Deliver 4 milestones towards one recommendation resulting from the Effective and Responsible Communication discovery phase (by Q4).

Key Result 4 100% of development teams across Technology and Product departments build using the DEI framework



Objective Rationale
By improving our software practices, developing inclusive communication practices, and implementing the product DEI framework, we will be able to respond more quickly to user needs and build technology that represents, and is built for, the entire world. New developers will be able to onboard more quickly, and everyone will be able to feel that they can do their best work at the Wikimedia Foundation.



Key Result 1: SLOs and Change Fail Rate
''At least one user-journey, and all its supporting infrastructure services, are covered by SLOs, and those SLOs are used to drive engineering decisions including when to roll back deployments. For all supporting services within this slice developed at the Foundation, including MediaWiki, we measure and track lead-time as an indicator of the objective of maximizing the speed of innovation.''

Accountability for this Key Result is assigned to Mark Bergsma

Intent and Desired Outcomes
Optimizing speed of delivery while ensuring good enough reliability
We’d love it if all our systems were perfect, responded instantly, and worked 100% of the time, but we also know that’s unrealistic, and chasing that target of perfection becomes exponentially expensive as well. By choosing specific objectives based on what is important to our users, we can aim to keep our users happy, and still be able to prioritize and accelerate other work as long as we’re meeting those objectives. If the performance starts to dip down toward the threshold, objectively we know it’s time to refocus on short-term reliability work and deprioritize other work. And by breaking up our complex production landscape into individual services, each with their own reliability objectives, we know specifically where to focus that work.

We are working on introducing the modern SRE concept of Service Level Objectives (SLOs) and Error Budgets at the Wikimedia Foundation. This will allow us to define and objectively agree on reliability and performance targets between teams and continuously maintain the best balance between velocity and reliability at any point in time to optimize our impact and make best use of our limited resources.

With SLOs in place we can identify and track meaningful indicators of the speed of innovation while maintaining desired reliability. One such indicator is “lead-time”, one of the 4 key metrics of Accelerate (deploy frequency, mean-time-to-recover, change fail rate, and lead-time). By tracking lead-time alongside our quarterly SLOs, we can more accurately determine if the supporting improvements we’re making are having the intended effect.
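As a rough illustration of how the four Accelerate metrics named above could be derived from deployment records, here is a minimal Python sketch. The `Deployment` record shape, the field names, and the choice to measure recovery time from the moment of deployment are all assumptions made for this example; they do not describe any existing Wikimedia tooling.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from statistics import median
from typing import List, Optional


@dataclass
class Deployment:
    """One production deployment (hypothetical record shape)."""
    committed_at: datetime                   # when the change was committed
    deployed_at: datetime                    # when it reached production
    failed: bool = False                     # did it cause a failure in production?
    recovered_at: Optional[datetime] = None  # when service was restored, if it failed


def _hours(delta: timedelta) -> float:
    """Convert a time difference to hours."""
    return delta.total_seconds() / 3600.0


def accelerate_metrics(deployments: List[Deployment], period_days: int) -> dict:
    """Compute the four metrics for one reporting period."""
    lead_times = [_hours(d.deployed_at - d.committed_at) for d in deployments]
    failures = [d for d in deployments if d.failed]
    # Assumption: a failure starts at deployment time and ends at recovered_at.
    recoveries = [_hours(d.recovered_at - d.deployed_at)
                  for d in failures if d.recovered_at is not None]
    total = len(deployments)
    return {
        "median_lead_time_hours": median(lead_times) if lead_times else None,
        "mean_time_to_recover_hours":
            sum(recoveries) / len(recoveries) if recoveries else None,
        "change_failure_rate_pct": 100.0 * len(failures) / total if total else None,
        "deployments_per_day": total / period_days,
    }
```

Computed once per quarter, the same dictionary could be compared across periods, for example to check whether the change failure rate is falling while deployment frequency stays steady.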

Definitions
 * SLO: A Service Level Objective (SLO) is an understanding between teams about expectations for reliability and performance, for what is considered “good enough” from the perspective of the user. An SLO is a target value or range of values for a service level that is measured by an SLI. A natural structure for SLOs is thus SLI ≤ target, or lower bound ≤ SLI ≤ upper bound (e.g. more than 99% of all requests are successful).
 * SLI: An SLI is a service level indicator—a carefully defined quantitative measure (metric) representative of some aspect of the level of service that is provided. The measurements are often aggregated: i.e., raw data is collected over a measurement window and then turned into a rate, average, or percentile.
 * Error Budget: The error budget for a service is the available “headroom” left for service unreliability for a particular time period (a month or a quarter); at the start of the period the error budget is equal to 1 minus the SLO, i.e. the space between perfection and the reliability target we have committed to. A service with a 99.99% availability SLO starts the period with a 0.01% error budget for unavailability, which can be spent during the period to move faster, by taking a bit more risk.
 * 4 Key “Accelerate” Metrics
   * Lead-time: The amount of time it takes a commit to get into production.
   * Mean-time-to-recover: How long it takes to recover from a failure in production.
   * Change failure rate: The percentage of deployments causing a failure in production.
   * Deployment Frequency: How often we deploy to production.
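The SLI, SLO, and error budget definitions above reduce to a small amount of arithmetic. The sketch below works through it for a request-based availability SLI. The function names are hypothetical, and it makes the simplifying assumption that the budget is counted in failed requests relative to the traffic seen so far in the period.

```python
def availability_sli(good_requests: int, total_requests: int) -> float:
    """SLI: fraction of requests in the measurement window that succeeded."""
    if total_requests == 0:
        return 1.0  # assumption: with no traffic, nothing has failed
    return good_requests / total_requests


def slo_met(slo_target: float, good_requests: int, total_requests: int) -> bool:
    """An availability SLO is a lower bound on the SLI."""
    return availability_sli(good_requests, total_requests) >= slo_target


def error_budget_remaining(slo_target: float, good_requests: int,
                           total_requests: int) -> float:
    """Fraction of the error budget still unspent (negative when overspent)."""
    # The budget is 1 minus the SLO, expressed here as a number of requests.
    allowed_failures = (1.0 - slo_target) * total_requests
    actual_failures = total_requests - good_requests
    if allowed_failures == 0:
        # A 100% target (or no traffic) leaves no budget to spend.
        return 0.0 if actual_failures == 0 else float("-inf")
    return 1.0 - actual_failures / allowed_failures
```

For a 99.99% availability SLO and 1,000,000 requests, the budget is about 100 failed requests; if 40 requests have failed so far, `error_budget_remaining(0.9999, 999_960, 1_000_000)` returns roughly 0.6, meaning about 60% of the budget is still available to spend on moving faster.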

Activities & Deliverables
After introducing the concepts of SLOs and Error Budgets to the organization last year, and experimenting with and developing the process and tooling for it through the definition of SLOs for some backend/infrastructure services, we are moving to the important next step: bringing SLOs to one user interaction or service used directly by our worldwide user base. In a collaboration between Product and Technology, we will select the user journey to focus on, and establish SLOs representing the most important reliability and performance aspects that users care about, in order to drive engineering decisions going forward. This process will allow us to maximize our development velocity while ensuring reliable service.

We expect that we will undertake the following activities in fulfillment of our objective:
 * Establish SLOs for a user-journey
   * Identify the user journey and define what is in scope
     * Between relevant Product and Technology teams, we select a single common, important workflow (sneak peek: user saves a wiki page edit), and carefully define which functional aspects and parts of the user experience we consider in scope for the definition of service levels.
   * Identify, establish and measure SLIs for the user-facing service
     * For the selected user journey and within the defined scope, led by Product insights we analyze the reliability and performance related aspects of the service having the biggest impact on the user experience, and identify a representative set of metrics (Service Level Indicators) that we will be capable of measuring, in the following quarters, going forward.
   * Reach agreement on targets for the user-facing SLIs, and publish the resulting SLO
     * With the representative set of SLIs defined and actively being measured, and some initial data in hand, we will use Product insights and available data to establish shared targets in the form of Service Level Objectives. These SLOs will be monitored and reported on a continuous basis going forward, and be used to drive engineering decisions such as the prioritization of related work.
   * Identify and scope a set of supporting infrastructure services, and publish SLOs for each
     * While the reliability and performance of user-journeys - as perceived by our users - are the key metrics to measure and optimize for, it’s also important to realize that in most cases they are supported by and dependent on a large and complex layered stack of infrastructure and services underneath. We also need Service Level Objectives defined for some of these individual infrastructure services to be able to investigate and pinpoint problems when we are (in danger of) not meeting our global targets, so we know in which areas we need to prioritize reliability work, or move slower and take less risk.
 * Establish SLO culture and practices (stretch goal)
   * Document process for quarterly reporting on SLO performance
     * At the end of each SLO monitoring quarter, we evaluate our services’ performance against the targets. The results are presented at the Tech Dept. tuning session, providing transparency and accountability for taking our published SLOs seriously. Presently this is fairly manual and ad-hoc; formalizing this in a more mature process will ensure the entire SLO life cycle, including end-of-quarter reporting, is sustainable and dependable.
   * Document process for iterating on published SLO definitions
     * Our published SLOs represent formal, enduring commitments -- but over time, some SLOs need to be tightened, reformulated, or (rarely) relaxed to reflect changes in our production landscape. We’ll develop a process for reviewing changes to SLOs, ensuring that we iterate responsibly in an environment where users, product teams, and other service owners, are depending on the commitments we’ve made.
   * Document process for making engineering decisions based on SLO outcomes
     * Measuring SLO outcomes allows us to prioritize reliability work when necessary, and accelerate development when we have error budget to spare. In order to take full advantage of this ability to change stance, we’ll incorporate SLO outcomes in our documented decision-making process for rollbacks, for incident response (see the Resilience OKR), for quarterly planning, and others.
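
The end-of-quarter evaluation and the "change of stance" described in the activities above could be sketched along the following lines. This is only an illustration: the `QuarterResult` shape, the 75% caution threshold, and the wording of the three stances are assumptions for this example, not a documented Wikimedia process.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class QuarterResult:
    """Measured outcome for one service over one SLO quarter (hypothetical shape)."""
    service: str
    slo_target: float    # e.g. 0.999 for a 99.9% target
    measured_sli: float  # e.g. 0.9995 measured over the quarter


def budget_spent(result: QuarterResult) -> float:
    """Share of the quarter's error budget consumed (1.0 means fully spent)."""
    budget = 1.0 - result.slo_target
    if budget <= 0:
        raise ValueError("a 100% target leaves no error budget")
    return (1.0 - result.measured_sli) / budget


def stance(result: QuarterResult, caution_at: float = 0.75) -> str:
    """Map budget consumption to a suggested stance (threshold is an assumption)."""
    spent = budget_spent(result)
    if spent >= 1.0:
        return "prioritize reliability work"
    if spent >= caution_at:
        return "slow down and take less risk"
    return "accelerate other work"


def quarterly_report(results: List[QuarterResult]) -> List[str]:
    """One line per service, most-consumed budget first, to show where to focus."""
    ranked = sorted(results, key=budget_spent, reverse=True)
    return [f"{r.service}: {budget_spent(r):.0%} of error budget spent -> {stance(r)}"
            for r in ranked]
```

Sorting by consumed budget puts the services most in need of reliability work at the top, which mirrors the goal of knowing in which areas to prioritize that work.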

Resourcing



Key Result 2: Removing Non-inclusive Language
100% of non-inclusive language is removed from our documentation and code repositories according to https://www.mediawiki.org/wiki/Inclusive_language list.

Accountability for this Key Result is assigned to Kate Chapman

Intent and Desired Outcomes
Work has already been done to remove some of the non-inclusive and problematic language from our documentation and code repositories, but we are not all the way there. The main goal of this activity is to remove such language. This work can be performed in a relatively short period of time; the key is getting the attention needed to finish it.

Definitions
 * Linter: A script that runs on code check-in
 * Non-inclusive language: Any language with racist, sexist, ableist or other non-inclusive connotations

Related Quarterly OKRs
 * Q2: Plan for sprint to remove non-inclusive language. Presentation to Tech and Product leadership to get agreement for proposal winners to have the time available to work.
 * Q3: Proposals for sprint
 * Q4: Sprint completed
 * Q4: Retrospective and outline of further work that is required (if any)

Activities & Deliverables
Senior management buy-in prior to the solicitation for proposals is crucial for this sprint to be a success. If technologists don’t have time to work on their proposals, then this work will fall flat. In Q2, agreement and support for this work will be sought from Product and Technology Leadership. A punch list of all the areas to cover will be developed during this time through consultation across the Product and Technology Departments.

One of the difficulties of this activity is that using inclusive language is everyone’s responsibility, so it is also nobody’s. To finish removing problematic language from our documentation and code base, we will need dedicated focus. To do that, the Wikimedia Foundation is going to run a focused sprint. This will begin with an opportunity for technologists within the WMF to propose projects to work on this issue. People are encouraged to work together, and after proposals are submitted the proposal review team will introduce similar projects to each other. Those that are selected will be given 2 weeks within Q4 to work solely on their project.

The key areas of work proposals should come from are:


 * Removal of language from documentation
 * Removal of language from code
 * Prevention of language coming back into documentation
 * Prevention of language coming back into code
 * Other ideas that contribute to inclusive language

Any areas of the punch list not covered by proposals will need to be resourced in another way. Depending on what remains on the list, we will determine whether to use contracting or set aside further team resources.
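
To illustrate the "prevention" areas of work, a linter of the kind defined above (a script that runs on code check-in) might look roughly like the following Python sketch. The term list here is an illustrative placeholder and an assumption of this example; a real check would load its terms and suggested replacements from the Inclusive language list linked in this Key Result.

```python
import re
from pathlib import Path
from typing import Dict, Iterable, List, Tuple

# Placeholder terms for illustration only (assumption). The suggested
# replacements are likewise illustrative, not taken from the linked list.
FLAGGED_TERMS: Dict[str, str] = {
    "blacklist": "denylist",
    "whitelist": "allowlist",
}


def find_flagged_terms(text: str,
                       terms: Dict[str, str] = FLAGGED_TERMS) -> List[Tuple[int, str]]:
    """Return (line number, matched text) for every flagged term in the text."""
    if not terms:
        return []
    pattern = re.compile("|".join(re.escape(t) for t in terms), re.IGNORECASE)
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for match in pattern.finditer(line):
            findings.append((line_no, match.group(0)))
    return findings


def check_files(paths: Iterable[str]) -> int:
    """Print one line per finding; return 1 if anything was flagged, else 0."""
    exit_code = 0
    for path in paths:
        text = Path(path).read_text(encoding="utf-8", errors="replace")
        for line_no, term in find_flagged_terms(text):
            suggestion = FLAGGED_TERMS.get(term.lower(), "see the inclusive language list")
            print(f"{path}:{line_no}: found '{term}', consider '{suggestion}'")
            exit_code = 1
    return exit_code
```

Wired into a check-in hook or CI job, a non-zero return value from `check_files` would block the change until the flagged wording is replaced, which keeps removed language from coming back.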

Resourcing



Key Result 3: Effective and Responsible Communication
Deliver 4 milestones towards one recommendation resulting from the Effective and Responsible Communication discovery phase (by Q4).

Accountability for this Key Result is assigned to Leila Zia

Intent and Desired Outcomes
To build an environment where everyone can do their best work, we must improve our communication culture and practices. After an extensive discovery phase (FY21) with the Technology department, in partnership with consultants, we now have specific recommendations to build towards effective and responsible communication. While improving communications and changing culture will require work from everyone in the department every day that we come to work, through this key result we intend to mobilize department-wide resources towards implementing 1 actionable recommendation from the list developed during the discovery phase.

Definitions and Scoping
 * Definitions
   * ERC -- Effective and Responsible Communication or Effective and Responsible Communication project, as relevant.
 * Scoping
   * The focus of the ERC project is the Technology department within the Wikimedia Foundation. We intend to keep this focus during FY22. It is also important to emphasize that while the recommendations from the discovery stage are numerous (12 Foundational commitments which will involve work to commit to as well as 21 actionable recommendations), our commitment for the first year of implementation is the implementation of 4 milestones towards 1 recommendation. Any additional implementations should be considered in light of the availability of resources and time within the department, the changes in resourcing and capacity in Talent and Culture, and only after we make significant progress towards the first implementation and we feel confident that we will conclude it.
   * The scoping of this work is highly related to the issue of trust that has been brought up by some during the ERC discovery stage. There are folks in the department who are concerned that we will not take action on the recommendations. We can gain trust by making sure that our actions match our words. This includes not overburdening the department by picking up too many efforts at the same time that we don’t have the collective bandwidth to contribute to.

Related Quarterly OKRs
 * Disseminate the ERC Discovery Report in the Technology department, engage leaders and teams with the report, determine the top department priorities [Q1-Q2]
 * Build the ERC team, identify the vendor to work with, secure contracts [Q2]
 * Establish a portfolio of metrics and baselines (as they relate to FY23 annual plan) to measure the impact of the communication culture change [Q3]
 * Implement 1 recommendation [Q2-Q4]

Activities & Deliverables
Activities:
 * All department staff read and engage with the report through team level discussions.
 * VPs create alignment within the Director level to move from the Discovery stage to Implementation.
 * Managers organize team level reflection sessions and facilitate conversations for their teams to choose their top 3 priority areas.
 * The ERC team develops a process for recruiting more staff from across the department.
 * Managers nominate members to join the ERC team.
 * The ERC team conducts feasibility assessment on the recommendations and develops a process to choose the top 3 focus areas for the department.
 * The ERC team develops norms and processes for collaboration towards the implementation of 1 recommendation during FY22.
 * The ERC team develops a roadmap for the rest of the year including the FY23 annual planning process.
 * The ERC team (in collaboration with Talent and Culture and/or vendors) develops a portfolio of metrics to track for the project informed by the metrics developed through the discovery stage.
 * The ERC team sources external vendors if relevant to assist with the implementation of 1 recommendation.
 * The ERC team actively coordinates with other bodies within the organization active in the culture space, particularly Talent and Culture.

Deliverables:
 * A fully staffed ERC team with clear and sustainable norms and practices.
 * A clear roadmap for the remainder of the year and the implementation of 1 recommendation.
 * A portfolio of metrics to track the progress towards effective and responsible communication.
 * 4 milestones towards 1 implementation (which may mean fully implementing 1 recommendation or making significant progress towards 1, depending on the complexity of the recommendation chosen).
 * A plan for FY23 activities developed as part of the annual planning process.

Resourcing



Key Result 4: Product Framework
100% of development teams across Technology and Product departments build using the DEI framework.

Accountability for this Key Result is currently TBD (working with Carol Dunn and the Product DEI Group to establish accountability)

Intent and Desired Outcomes
Without a more codified and intentional approach to product development we would be in danger of not being inclusive of historically overlooked/marginalized groups. A DEI-centered product development process is critical to achieving our product strategy.
 * The growth, or atrophy, of the Movement depends on our successful engagement with emerging communities
 * A crucial tenet of this strategy is empowering others by building ‘with’ and not ‘for’

This Framework codifies an approach to product development that will ensure our products are as inclusive as possible to the widest range of audiences.
Community-centric: enabling welcoming, vibrant communities where new and experienced people come together to create, share, and discover knowledge through collaboration.
Usable for all: promoting equity through usable, useful, and inclusive tools and services that meet the needs of a wide range of people and machines across user experiences.
Intentionally transparent: demystifying the knowledge-creation process and encouraging participation by giving everyone visibility into how information is created, verified, and improved over time.
Extensible and sustainable: creating the conditions for people and machines to use, reuse, and build on top of our platform, extending free knowledge and supporting a sustainable future for Wikimedia.
 * A set of shared principles that will guide decision making
 * A shared process that extends what we do already to account for more intentional phases and checkpoints along the way
 * A set of shared tools that every team can use to activate this process in their own context

Definitions and Scoping
 * Product Principles reflect our goals for the product itself: the what and the why.
 * Product Development Principles are values-infused guidance about how to build products that can be applied to every function, level, context and language - bringing the team into alignment on the change we want to see in the world.

Related Quarterly OKRs
TBD

Activities & Deliverables
 * Pilot the framework with a small number of teams
 * Gather feedback
 * Iterate and refine
 * Roll out to the entire team