Core Platform Team/PET Work Processes/Clinic Duty

Current Clinic Duty Rotation

 * Core Platform Team

2020-04-15 to 2020-05-06:


 * Holger Knust
 * Petr Pchelko
 * Hugh Nowlan
 * Daniel Kinzler
 * Visiting Engineer: Ariel

2020-03-30 to 2020-04-15:


 * Brad Jorsch
 * Holger Knust
 * Petr Pchelko
 * Hugh Nowlan
 * Daniel Kinzler
 * Visiting Engineer: Mukunda

As of 2020-02-28:
 * Brad Jorsch
 * Holger Knust
 * Petr Pchelko
 * Hugh Nowlan

Purpose
The goal of the Clinic Duty team is to provide space to work on important maintenance and tech debt tasks that are not covered by existing projects, to ensure that inbound Unbreak Now (UBN) and other reactive tasks are handled in a timely manner, and to provide urgent support needed by other teams.

Responsibilities
The Clinic Duty Team is tasked primarily with handling inbound reactive work, including:


 * Handling of Unbreak Now (UBN) tasks within the team's scope.
 * Triaging incoming issues on the team workboard.
 * External and internal requests for code review.
 * Fixing regressions and simple, high-priority bugs.
 * Making progress on ongoing maintenance work.

Team Formation
The Clinic Duty team works on a continuous basis, with members transitioning in and out as needed. This is coordinated by an Engineering Manager, with 2 to 3 team members working on Clinic Duty at any time.

Daily check-ins are conducted asynchronously, and a meeting is held once a week for check-ins and task triage.

Members should be told at least a week in advance that they will rotate onto Clinic Duty, particularly so they can plan around holidays and vacations.

Processes
Engineers should perform initial task triage daily, and should devote much of their remaining time to external review and internal work.

Initial task triage
Team members are responsible for initial triage of all tasks in the Inbox columns of the Core Platform Team workboard and the Clinic Duty Team workboard. This is done daily, if not more frequently.

While it's impossible to account for every possibility, the outcomes of this triage may include:


 * Moving code review requests to the CPT External Code Reviews workboard.
 * Moving the task to the Triage Meeting Inbox column of the Core Platform Team workboard, for discussion during the weekly triage meeting.
 * Moving the task to other columns of the Core Platform Team workboard, if they clearly belong there.
 * Moving the task to other CPT workboards, e.g. Green Team or one of the Initiative workboards, if they clearly belong there.
 * Claiming the task for Clinic Duty. Avoid cookie-licking! Team members should not claim more work for Clinic Duty than they can complete in a reasonable period of time.
 * Untagging CPT, if the task is mistagged. This may also be done for old tasks with no real activity that were tagged automatically by Herald due to an unrelated action (e.g. someone subscribing or unsubscribing).
 * Asking another team member (on Clinic Duty or off) for help in triaging.

Team task triage
The team will have a weekly meeting to process tasks in the Triage Meeting Inbox column of the Core Platform Team workboard. This meeting should be attended by engineers on Clinic Duty, at least one Engineering Manager, and at least one Product Manager.

The purpose of this meeting is to triage tasks where questions exist around scope, resourcing, and priority. Again, it's impossible to account for every possibility, but outcomes of this triage may include:


 * Assignment of the task (to an engineer, PM, or EM) for investigation.
 * Assignment of the task to an engineer for implementation within Clinic Duty.
 * Scheduling of the task for one of the team planning meetings.
 * Moving the task to Feature Requests to Review or Future Initiatives.
 * Moving the task to other columns, including Volunteer Needed, Tracking/Watching, or Icebox.
 * Untagging CPT from the task.

If, after the meeting, the Triage Meeting Inbox is not empty, another meeting should be scheduled later in the week to finish the triage.

External review
Tasks with patches from volunteers or other teams are tracked on the CPT External Code Reviews workboard.

Patches needing review progress through the board as follows:


 * Tasks start in Review Needed.
 * Once a review is given, the task moves according to the review:
 * If the patch is merged, or given +1 with no further CPT review expected, move the task to Review Completed.
 * If the patch is given -1 or otherwise needs more work, move the task to In Progress.
 * If you want additional review from another team member, you should ask them (directly or on the task or patch).
 * A task in In Progress can move back to Review Needed once the -1 has been addressed, via comments or new patchsets.

Tasks needing input or advice (but not implementation) from Core Platform are also tracked on the Clinic Duty workboard in the Discussing column. They are removed from the board once the needed input or advice has been provided.
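The Review Needed / In Progress / Review Completed flow for patches can be modelled as a small state machine. The sketch below is illustrative only: the Outcome labels and the next_column function are hypothetical names, not part of Phabricator or Gerrit, and it encodes only the transitions spelled out in this section.

```python
from enum import Enum


class Column(Enum):
    # Column names as listed on the CPT External Code Reviews workboard.
    REVIEW_NEEDED = "Review Needed"
    IN_PROGRESS = "In Progress"
    REVIEW_COMPLETED = "Review Completed"


class Outcome(Enum):
    # Hypothetical labels for the review events described in this section.
    MERGED = "merged"
    PLUS_ONE_FINAL = "+1, no further CPT review expected"
    MINUS_ONE = "-1 or otherwise needs more work"
    ADDRESSED = "-1 addressed via comments or new patchsets"


def next_column(current: Column, outcome: Outcome) -> Column:
    """Return the column a task should move to after a review event."""
    if current is Column.REVIEW_NEEDED:
        if outcome in (Outcome.MERGED, Outcome.PLUS_ONE_FINAL):
            return Column.REVIEW_COMPLETED
        if outcome is Outcome.MINUS_ONE:
            return Column.IN_PROGRESS
    elif current is Column.IN_PROGRESS and outcome is Outcome.ADDRESSED:
        return Column.REVIEW_NEEDED
    # Anything else is not covered by the process and needs a human decision.
    raise ValueError(f"no rule for {outcome.value!r} in {current.value!r}")
```

For example, a patch that gets a -1 moves from Review Needed to In Progress, and returns to Review Needed once the -1 is addressed. Raising an error for unlisted combinations keeps the sketch from silently inventing rules the process does not define.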

Internal work
Work internal to the Clinic Duty team is tracked on the Clinic Duty workboard.

Tasks are added to this board when a team member claims or is assigned the task. All tasks on the board should be assigned to a team member (in the Phabricator sense).


 * The next task a team member intends to pick up may be placed in Ready.
 * Any tasks a team member is actively working on should be in Doing.
 * Tasks where further progress depends on external factors (other than waiting for a deployment) should be in Blocked Externally.
 * Tasks needing review by other team members should be in Waiting for Review.
 * If the review results in a -1, the task should be moved back to Doing.
 * If the review results in a +2, the task should be moved to the appropriate later column.
 * Tasks waiting for deployment, either via the deployment train or a manual deployment, should be in Waiting for Deployment.
 * Tasks where all Clinic Duty work is done should be moved to Done, or if appropriate may be untagged.
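The column rules above, together with the requirement that every task on the board is assigned, can be sketched as a transition table. This is a hypothetical local model, not a Phabricator API: the Task record and move function are illustrative names, and edges the text does not spell out (Ready to Doing, Blocked Externally back to Doing, Doing straight to Waiting for Deployment or Done) are assumptions. Untagging a finished task is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Set

# Allowed moves between Clinic Duty workboard columns. Some edges are
# assumptions (see the lead-in); review outcomes follow the listed rules:
# -1 goes back to Doing, +2 goes to a later column.
ALLOWED_MOVES: Dict[str, Set[str]] = {
    "Ready": {"Doing"},
    "Doing": {"Blocked Externally", "Waiting for Review",
              "Waiting for Deployment", "Done"},
    "Blocked Externally": {"Doing"},
    "Waiting for Review": {"Doing", "Waiting for Deployment", "Done"},
    "Waiting for Deployment": {"Done"},
    "Done": set(),
}


@dataclass
class Task:
    # Hypothetical local record of a workboard task.
    title: str
    column: str
    assignee: Optional[str] = None


def move(task: Task, target: str) -> None:
    """Move a task, enforcing the assignment rule and the allowed transitions."""
    if task.assignee is None:
        raise ValueError(f"{task.title!r} must be assigned to a team member")
    if target not in ALLOWED_MOVES[task.column]:
        raise ValueError(
            f"cannot move {task.title!r} from {task.column!r} to {target!r}")
    task.column = target
```

A task that receives a -1 goes Waiting for Review, then Doing, then back to Waiting for Review; after a +2 it can go on to Waiting for Deployment and finally Done.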

Internal review
Team members on Clinic Duty are expected to check the Waiting for Review column on the Clinic Duty workboard and provide each other with the reviews needed for tasks to progress.

Other

 * The Engineering Manager is responsible for updating the Current Clinic Duty Rotation on this page as assignments change, unless we agree on some other way to handle it.
 * Should this section give more guidance to EMs/PMs pushing work on choosing between CD and other things not to be named yet? Or is  enough?
 * Another source of reactive work is monitoring for production errors (see Performance/Runbook/Kibana_monitoring).

Metrics
To measure the impact of the Clinic Duty team's work, we want to take a snapshot of the current state of

Potential Metrics

 * Number of backlog tasks
 * Number of unsized tasks
 * Number of unprioritised tasks
 * Number of tasks created vs resolved in last X days
 * Number of "untouched" tasks
 * Average age of tasks
 * Average response time on UBN/Reactive tasks
 * Average response time on code review (CR) requests from outside CPT
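A few of these metrics could be computed from an export of workboard tasks. The sketch below assumes a hypothetical TaskRecord with created, resolved and last_activity timestamps; these field names are not a Phabricator API. It also assumes that "average age" means the age of still-open tasks and that "untouched" means open with no recorded activity since creation; both definitions would need to be agreed by the team.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional, Tuple


@dataclass
class TaskRecord:
    # Hypothetical export of a workboard task; field names are assumptions.
    created: datetime
    resolved: Optional[datetime] = None
    last_activity: Optional[datetime] = None


def created_vs_resolved(tasks: List[TaskRecord], now: datetime,
                        days: int) -> Tuple[int, int]:
    """Count tasks created and tasks resolved in the last `days` days."""
    since = now - timedelta(days=days)
    created = sum(1 for t in tasks if t.created >= since)
    resolved = sum(1 for t in tasks
                   if t.resolved is not None and t.resolved >= since)
    return created, resolved


def average_open_age_days(tasks: List[TaskRecord], now: datetime) -> float:
    """Average age, in days, of tasks that are still open."""
    ages = [(now - t.created).total_seconds() / 86400
            for t in tasks if t.resolved is None]
    return sum(ages) / len(ages) if ages else 0.0


def untouched_count(tasks: List[TaskRecord]) -> int:
    """Number of open tasks with no recorded activity since creation."""
    return sum(1 for t in tasks
               if t.resolved is None and t.last_activity is None)
```

Taking the same measurements at the start and end of a rotation would give the before/after snapshot the Metrics section describes.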