Code stewardship reviews

From mediawiki.org

Goal: This page outlines a process for deciding what to do with undermaintained, unmaintained, or unclearly maintained code and services in use in Wikimedia "production."

Why: New code (features, services, infrastructure) is deployed to Wikimedia production frequently. This is great! Our users get new features, our developers get new services, and generally everyone is aware of the tradeoffs of maintaining something new. Sadly, the world changes over time: the underlying platform may evolve, security patches need to be applied, hardware may need to be decommissioned or refreshed, institutional knowledge can get lost, new services make others redundant, and so on.

What: Below is a method to identify, prioritize by severity, review, and potentially undeploy or re-invest in inadequately maintained code, with a focus on the decision-making process.

If it is decided to undeploy the code (that is, to remove it from Wikimedia servers; also called "sunsetting"), the mechanics of undeploying will vary based on what is being undeployed and when. The owners of the undeployment will be jointly determined by the Wikimedia Foundation's CTO and CPO.

Process

tl;dr: Components are proposed and a rubric is filled out. Based on that rubric, the Technical Debt Program Manager adds the component to a prioritized list. That list is submitted to the CTO/CPO once per quarter, as needed.

  1. Propose a component, extension, or service to be reviewed.
    1. How: Check that there is no task yet in the #code-stewardship-reviews Phabricator project. If there is no task, then create a task and include, at minimum, the name of the software, links to repository/documentation pages, and any relevant initial/free form discussion of undeploy worthiness.
    2. For example: you use a service that has no clear active maintainer, and that service is starting to degrade.
    3. Projects can be proposed by anyone at any time.
  2. Fill out the rubric for the project.
    1. The rubric (see below) will be filled out in the public Phabricator task.
    2. This can be done by anyone with an interest in doing so.
    3. Who: All strongly related stakeholders must be notified. This includes (where identifiable) current and past maintainers or major code contributors, and current highly active users of the project (at the primary relevant talk page and/or mailing list).
  3. Upon completion of the rubric, the Technical Debt Program Manager will add it to a prioritized list.
    1. Note: The Technical Debt Program Manager can decide to not put something on the prioritized list and instead work with any relevant teams/potential owners directly.
  4. Submission to the CTO/CPO for review and consideration
    1. List of projects and their review status
    2. This happens once per quarter on an as-needed basis. Any projects that need to be addressed in time for the Annual Plan must be submitted at least 6 weeks prior to the deadline for Annual Plan program draft proposals.
    3. A 3-week community consultation period will begin at the same time (to be completed at least 1 month prior to the Annual Plan program drafts deadline) for components that could potentially be undeployed.
      1. The consultation is to increase awareness of the discussion and to get any remaining feedback. This is neither a vote nor a consensus-building exercise.
    4. The CPO and CTO, directly or through delegates, will jointly decide what to do with the components on the prioritized list (e.g., fund the ongoing maintenance of the component or undeploy it, and incorporate that decision into the Annual Plan).
      1. Delegation of the details of this will go to a relevant Product Manager and/or Code Steward.
    5. This is the point at which considerations of fit to the strategic direction and core mission are assessed.
    6. NOTE: If circumstances arise where something needs to be reviewed outside of the Annual Planning process, it is still submitted to the CTO/CPO, and they will be responsible for any changes to priorities.
  5. All completed reviews are archived on wiki.
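The timing rules in step 4 can be worked through as date arithmetic. The sketch below is illustrative only: it assumes "1 month" means 30 days, and the deadline used is a hypothetical date, not one taken from this page. Because the consultation begins at the same time as the submission, the later of the two starting points cannot exceed the earlier of the two limits.

```python
from datetime import date, timedelta

SUBMISSION_LEAD = timedelta(weeks=6)      # "at least 6 weeks prior" (from step 4)
CONSULTATION_LENGTH = timedelta(weeks=3)  # "3-week community consultation" (from step 4)
CONSULTATION_BUFFER = timedelta(days=30)  # assumption: "1 month" read as 30 days


def latest_start(ap_deadline: date) -> date:
    """Latest day on which submission and consultation can both begin."""
    # Limit imposed by the submission rule alone.
    by_submission_rule = ap_deadline - SUBMISSION_LEAD
    # Limit imposed by finishing the consultation before the buffer starts.
    by_consultation_rule = ap_deadline - CONSULTATION_BUFFER - CONSULTATION_LENGTH
    # Both start together, so the earlier limit applies.
    return min(by_submission_rule, by_consultation_rule)


if __name__ == "__main__":
    deadline = date(2030, 3, 1)  # hypothetical Annual Plan draft deadline
    print(latest_start(deadline))
```

Under these assumptions the consultation rule is the binding one for components that might be undeployed: 30 days plus 3 weeks is 51 days, which is earlier than the 6-week (42-day) submission limit.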

Rubric

All entries in the rubric can be augmented with commentary that clarifies or explains them, especially the purely numerical entries.

Note: The decision will not be made purely on the numbers in the items below. They are one part of a holistic decision-making process.

  • A succinct problem statement to give context for why the review was initiated.
  • Entry in Developers/Maintainers with:
    • Code Steward
    • Maintainer (non-WMF team)
    • In-training
  • Number, severity, and age of known and confirmed security issues
  • Was it a cause of production outages or incidents? List them.
  • Does it have sufficient hardware resources for now and the near future (to take into account expected usage growth)?
  • Is it a frequent cause of monitoring alerts that need action, and are they addressed in a timely and appropriate manner?
  • When it was first deployed to Wikimedia production
  • Usage statistics based on audience(s) served
  • Changes committed in last 1, 3, 6, and 12 months
  • Reliance on outdated platforms (e.g. operating systems)
  • Number of developers who committed code in the last 1, 3, 6, and 12 months
  • Number and age of open patches
  • Number and age of open bugs
  • Number of known dependencies
  • Is there a replacement/alternative for the feature? Is there a plan for a replacement?
  • Submitter's recommendation (what do you propose be done?)
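The two activity entries in the rubric (changes committed, and number of developers who committed, in the last 1, 3, 6, and 12 months) can be tallied from a commit history. The following is a minimal sketch, not a tool prescribed by this process. It assumes a month is 30 days and that commits are already available as (author, timestamp) pairs; one possible source is the output of `git log --format='%ae|%aI'`, parsed separately. The function name and sample data are hypothetical.

```python
from datetime import datetime, timedelta, timezone

WINDOWS_IN_MONTHS = (1, 3, 6, 12)  # the windows named in the rubric
DAYS_PER_MONTH = 30                # assumption: a month is counted as 30 days


def activity_summary(commits, now):
    """Return {months: (commit_count, distinct_author_count)} per window.

    commits: iterable of (author, timezone-aware datetime) pairs.
    """
    commits = list(commits)  # allow several passes over the data
    summary = {}
    for months in WINDOWS_IN_MONTHS:
        cutoff = now - timedelta(days=months * DAYS_PER_MONTH)
        recent = [(author, when) for author, when in commits if cutoff <= when <= now]
        authors = {author for author, _ in recent}
        summary[months] = (len(recent), len(authors))
    return summary


if __name__ == "__main__":
    now = datetime(2030, 1, 1, tzinfo=timezone.utc)  # hypothetical review date
    sample = [  # hypothetical commit history
        ("alice", now - timedelta(days=10)),
        ("bob", now - timedelta(days=60)),
        ("alice", now - timedelta(days=100)),
        ("carol", now - timedelta(days=300)),
        ("dave", now - timedelta(days=400)),
    ]
    for months, (n_commits, n_authors) in activity_summary(sample, now).items():
        print(f"last {months} months: {n_commits} commits by {n_authors} developers")
```

As the rubric notes, these counts are context for a holistic decision, so they are best posted alongside commentary rather than on their own.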

Sources of data for the rubric:

Special Roles

Technical Debt Program Manager

  • Is the owner of the process, the prioritized list, and submission to the CTO and CPO.

Senior Leadership of Product and Technology

  • Reviews the prioritized list in time for the Annual Plan draft programs submission deadline
  • Reviews exceptional cases as needed

FAQ

  • Is undeploying the primary/expected outcome of this process?
    • No. Although undeploying may be one of the outcomes, it is not the only or even the default outcome. The objective of this process is to discuss and decide upon a course of action for underfunded or unfunded components, extensions, and services currently in "production".
  • Is this process the only way that things should be undeployed at the Foundation?
    • No. For instance, a product team trying out experiments should not be forced into this process to stop an experiment. Additionally, migrating to a new tool or product (and deprecating the older) is not inherently covered by this.
    • The purpose of this process is to address the forgotten products and services.
  • Should there be an "owner by default" (e.g., all "user facing" things are to be decided by Product)?
    • With regard to the decision making: That would be the Technical Debt Program Manager.
    • With regard to the actual maintenance/undeploying actions: That would be decided by the CTO/CPO.
  • What about components owned by the community (both funded and non-funded teams/individuals)?
    • A component being owned by, for instance, a completely volunteer developer is not inherently a negative trait. All ownership qualities will be assessed alongside other parts of the rubric.
  • At what point is the community involved and asked for input?
    • Note: All of the process is public (on Phabricator) by design, except for the CTO/CPO deliberation.
    • The explicit wider community consultation happens during the CTO/CPO review (see step 4 in the process).
  • What should/can we do differently as an organization to reduce the number of future undeploying decisions or accrual of new technical debt?