Phlogiston

Phlogiston is a reporting tool for Phabricator. For installation instructions, see https://github.com/wikimedia/phlogiston/blob/master/README.md.

Forecasting Model
This model does not attempt to predict changes in size of backlog.
 * 1) Calculate velocity per category
 * 2) Count number of resolved tasks each week, and compare week to week
 * 3) Today = base day, so last week = today - 7 days
 * 4) Make optimistic, nominal, and pessimistic forecasts of velocity
 * 5) Optimistic is the highest 3 weeks in the last 3 months
 * 6) Nominal is the average per week over the last 3 months
 * 7) Pessimistic is the lowest 3 weeks in the last 3 months
 * 8) For teams that resolve tasks every 2 weeks, and thus have many weeks with 0 resolved tasks, this will produce a falsely low forecast
 * 9) If any forecast velocity is below 1 (point or count per week), increase to 1
 * 10) Divide the total points remaining (per category) by each of the three velocities to determine # of weeks remaining
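
The steps above can be sketched roughly as follows. This is a minimal illustration, not Phlogiston's actual code: the function names are hypothetical, and it assumes that "the highest 3 weeks" and "the lowest 3 weeks" are each averaged into a single weekly velocity, which the list above does not state explicitly.

```python
from statistics import mean


def forecast_velocities(weekly_resolved, n_extreme=3):
    """Return optimistic, nominal, and pessimistic weekly velocities.

    weekly_resolved: points (or task counts) resolved in each week of the
    last 3 months, for one category. Must not be empty.
    """
    ordered = sorted(weekly_resolved)
    raw = {
        # Assumption: the 3 extreme weeks are averaged into one velocity.
        "optimistic": mean(ordered[-n_extreme:]),
        "nominal": mean(weekly_resolved),
        "pessimistic": mean(ordered[:n_extreme]),
    }
    # Step 9: any forecast velocity below 1 per week is raised to 1.
    return {name: max(1.0, value) for name, value in raw.items()}


def weeks_remaining(points_remaining, velocities):
    """Step 10: divide the remaining points by each forecast velocity."""
    return {name: points_remaining / v for name, v in velocities.items()}


# A team that resolves tasks every other week (hypothetical numbers).
weekly = [0, 8, 0, 6, 0, 10, 0, 4, 0, 12, 0, 8, 0]
velocities = forecast_velocities(weekly)
remaining = weeks_remaining(60, velocities)
```

With these made-up numbers the three lowest weeks are all zero, so the pessimistic velocity is raised to the floor of 1 and the pessimistic forecast becomes 60 weeks, which illustrates the falsely low forecast described in step 8.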

Data Loading Model

 * 1) For each day since the start date specified in the   file, for each of the Phabricator projects listed in the source file, find all tasks belonging to that project on that day and add each task∙day to the list of task∙days for this team.
 * 2) For each day, find each task in the list that is tagged "Milestone" and construct its complete descendant tree.
 * 3) The tree includes only tasks present in the list of task∙days, so if a child is not present in the data (because it doesn't belong to any projects in the source project list), but the grandchild is, the grandchild will not be included in the tree.
 * 4) A child is any task that blocks the parent in Phabricator.
 * 5) If a Milestone task is not in the source project list, it will not be included in this process.
 * 6) For each task∙day, build a raw category text string from:
 * 7) The name of the Phabricator project the task belongs to, counting only the first project in the list in the source file.
 * 8) So, a task that belongs to several included projects will only be counted once.
 * 9) The name of the relevant projectcolumn in that project.
 * 10) The title of all ancestor Milestone tasks.
 * 11) Go through the   file in   order and, for each line,
 * 12) Look for the matchstring anywhere in the raw category text string
 * 13) If a match is found, set the category of the task to the pretty category title.
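
The tree-building rules in steps 2 through 5 can be illustrated with a short sketch. The function name and the data structures (a dict of blocking relationships and a set of task ids present on a given day) are assumptions made for illustration; they are not taken from the Phlogiston source.

```python
def milestone_descendants(milestone_id, blockers, present):
    """Collect the descendants of one Milestone task for one day.

    blockers: dict mapping a task id to the ids of the tasks that block it
              (its children, per step 4).
    present:  set of task ids in that day's list of task-days.
    """
    # Step 5: a Milestone outside the source project list is skipped.
    if milestone_id not in present:
        return set()
    found = set()
    stack = [milestone_id]
    while stack:
        parent = stack.pop()
        for child in blockers.get(parent, ()):
            # Step 3: only tasks present in the data join the tree, so the
            # walk never continues below a missing child.
            if child in present and child not in found and child != milestone_id:
                found.add(child)
                stack.append(child)
    return found


# Task 3 is missing from the data, so its child 5 is not reached.
blockers = {1: [2, 3], 2: [4], 3: [5]}
present = {1, 2, 4, 5}
tree = milestone_descendants(1, blockers, present)
```

Here task 5 is present in the data but is left out of the tree, because its only path to the Milestone runs through the missing task 3.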

Monitor backlog per goal
When we look at our backlog in aggregate, we can only see the overall growth of our planned work. If we divide our backlog by team goals, we can differentiate planned (i.e., quarterly goal) work from unplanned work, which means we can measure how much work remains before we reach a goal.

We can also measure the relative proportion of our work by goal and see scope creep per goal.

Benefit
Improved chances of completing our quarterly goals. Easier to say no, and to see when we aren't saying no enough.

Prerequisites

 * Have defined end-points
 * Divide work by category

Monitor Maintenance Fraction
We can measure the amount of our work that is not part of our quarterly goals.

Benefit
Able to balance work within the team. See how our work matches our goals.

Forecast Velocity
The Velocity forecast shows actual data as bars and forecast data as lines, based on one-week (Sunday to Sunday) snapshots. The lines represent a plausible range of values. The pessimistic forecast is the lowest three weeks of the previous three months; the optimistic forecast is the highest three; and the nominal forecast is the average of all weeks in the previous three months. (For teams using two-week Sprints or other processes in which most tasks are marked Resolved at a different cadence than weekly, this should produce weird-looking bars but accurate lines. Probably.)

In the first example, the team has high variability in weekly output, so the range of forecasts for the following week is very broad. In the second example, the range is still very broad but a trend is emerging. The degree to which the bars remain within the boundaries of the forecasts provides some measure of the reliability of the forecasting.

Note that Phlogiston currently calculates this every week (Sunday to Sunday). For teams that close tasks at a bi-weekly Review meeting, this misleadingly causes the pessimistic forecast to remain close to zero. Example:

Forecast completion dates and track forecasts over time
We can forecast when we are likely to complete a given piece of work. Or, more realistically, we can identify work that is slipping indefinitely.

Velocity forecasting. Phlogiston now does simple forecasting of best, worst, and nominal velocity (best 3 weeks in the last 3 months, worst 3 weeks, and the average over the whole 3 months): http://phlogiston.wmflabs.org/ve_tranche1_velocity_points.png

Completion forecasting. Based on velocity forecasts, this shows not only the current forecast, but a history of forecasts by week, which can give a lot of information about the reliability of the forecast (e.g., a forecast of "2 more weeks" that remains "2 more weeks" for 2 months is not a reliable forecast, whereas a forecast of "8 weeks" that becomes "4 weeks" the following month and "1 week" the month after is probably more accurate).

Notes on how to get higher-quality forecasts:
 * Do progressive chunking: put in large epics immediately and break them down over time.
 * Re-calibrate by looking at backlog growth in past periods to better pre-set backlog size in new periods.
 * Use smaller tasks and close them more frequently (more than 1 task per dev per week? what heuristic?).
 * Do a short period of time tracking to let estimators recalibrate themselves.

Benefit
More likely to complete defined work. Limit goal-setting and other commitments based on evidence.

Identify Task data quality issues
Regularly review reports that highlight potentially incorrect or problematic data.

Work actually completed
http://phlogiston.wmflabs.org/ve_done_count.png (TODO: replace with stable image)

Work of unknown Maintenance type
Forthcoming

Benefit
Improve the quality of tracking data.

Identify discrepancies between intentions and beliefs and reality.

Spot missed, dropped, forgotten, and otherwise unintended outcomes for tasks.

Cycle-Time Reports
These are reports that show how long work spends in different stages of progress, such as "in testing" or "in deployment". Phabricator's built-in status field has a very limited range of statuses, so a full cycle-time report depends on a sequence of statuses typically built with Phabricator's projectcolumn field. These reports are not currently supported in Phlogiston but have been prototyped and could be added on demand.

Benefit
Identify bottlenecks.

Measure the levels of Work in Progress to compare to optimal levels. (too much WIP = wasting time on context switching; too little WIP = running dry).

Configuring Phlogiston
Phlogiston has a stable labs server for WMF use (see Links). The server downloads data from Phabricator every night, reconstructs all data, and generates reports.

Each team should use a separate Phlogiston Project (not related to a Phabricator Project) to configure their reports. Each project must have a control file, in the main phlogiston directory, named using the pattern:. The  should be a short (2-3 character) unique identifier for this project.

The control file must contain these variables:

Source_prefix is used internally, but should generally match the. The source_title is used to title reports. Default_points are assigned to all unpointed stories. The project_list is a list of all Phabricator projects that the reconstruction will include; names must exactly match, there should not be spaces around the commas or within names, and tasks that are in multiple projects will count only in the first project in the order listed here. Reconstruction and report generation will begin with the specified start_date.
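
The project_list rules above (exact name matches, no spaces around commas, and first-listed project wins) can be illustrated with a small sketch. It assumes project_list is available as a single comma-separated string. The function names and the project names are hypothetical, and this is not how Phlogiston itself reads the control file.

```python
def parse_project_list(project_list):
    """Split a comma-separated project_list without trimming whitespace."""
    return project_list.split(",")


def counted_project(task_projects, project_list):
    """Return the one project a task counts in, or None if it matches none.

    task_projects: set of Phabricator project names the task belongs to.
    """
    # The first project in project_list order wins, so a task that belongs
    # to several listed projects is only counted once.
    for name in parse_project_list(project_list):
        if name in task_projects:
            return name
    return None


first = counted_project({"Team-B", "Team-A"}, "Team-A,Team-B,Team-C")
# A stray space after the comma makes the name " Team-B", which no longer
# matches exactly.
missed = counted_project({"Team-B"}, "Team-A, Team-B")
```

Because names are compared exactly, the stray space in the second call means the task matches no listed project at all. That is one reason to avoid spaces around the commas.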

Defining projects in Phlogiston
With the default configuration, Phlogiston will create one category for each combination of Project and Projectcolumn. This list of categories can be manually retitled, re-ordered, and consolidated with the optional file, which can be edited in a spreadsheet. See rel_recategorization.csv for an example. In this file, zoom_list determines whether the category will be included in the master burnup and per-category charts (all categories are included in the remaining charts), title is what will be displayed in the report, and matchstring is a SQL-formatted wildcard that will be matched against Project + Projectcolumn. PhlogOther is a magic word that will match everything else.
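
As a rough illustration of how matchstring and PhlogOther might behave, the sketch below translates SQL LIKE wildcards (% and _) into a regular expression and returns the title of the first matching rule. Several things here are assumptions rather than documented behavior: that the first matching rule wins, that the matchstring may match anywhere in the text, and that matching is case-sensitive. The function names and the example project and category titles are hypothetical.

```python
import re


def like_to_regex(matchstring):
    """Translate SQL LIKE wildcards into an equivalent regular expression."""
    parts = []
    for ch in matchstring:
        if ch == "%":
            parts.append(".*")   # % matches any run of characters
        elif ch == "_":
            parts.append(".")    # _ matches exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def categorize(raw_category, rules):
    """Return the title of the first rule whose matchstring matches.

    rules: list of (title, matchstring) pairs in file order.
    """
    for title, matchstring in rules:
        # PhlogOther is the magic word that matches everything else.
        if matchstring == "PhlogOther":
            return title
        if like_to_regex(matchstring).search(raw_category):
            return title
    return None


rules = [
    ("Maintenance", "%Bugs%"),
    ("Quarterly goal", "Example-Project Doing"),
    ("Everything else", "PhlogOther"),
]
category = categorize("Example-Project Bugs", rules)
```

Under these assumptions, placing the PhlogOther rule last makes it act as a catch-all for anything the earlier rules did not match.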

Custom data processing
If your Phabricator projects do not follow these conventions, or have data by multiple configurations over time which you wish to preserve, you can write custom SQL code to replace the default conversion of reconstructed data into Phlogiston reporting data. A file called, if present, will be run instead of generic_make_history.sql. See also ve_make_history.sql for an example.

Adding new projects
Contact the Phlogiston administrator Joel A to get your project added to the batch script so that it runs every night.

To set up Phlogiston independently, see README.md.

Track Quarterly Goals
A quarterly goal is basically a project, but the term Category is used here to differentiate it from a Phabricator Project.

Option 3: Each quarterly goal is a Milestone task
Either change Team-Practices so that each column is a category, or add a third Team-Practices board, e.g., Team-Practices-Projects, with one column per category. Kevin has also suggested using a separate Phabricator Project for each category, which makes it easier to bulk-tag tasks into Projects but harder to move tasks between categories.

Categories comprise Quarterly Goals, plus we would have some number of well-defined projects (like SPDPP, or THC) that fall under Permanent Goals, plus we would have a grab-bag project to track everything else. Total # of projects should be under 10 for the quarter.

Cost
Up to 30 minutes added to the weekly triage meeting for ~½ of TPG. Should not add time to working with individual tasks. 1 hour initial setup and debate over categories.


 * Open Question: How would we handle Epics, i.e., work that is smaller than projects but bigger than tasks? Proposed: Adopt an explode-or-shave rule; Epics are just big, high-point tasks in the backlog. Everything in the Epics column of Team-Practices would become a Project/projectcolumn.


 * Open Question: How would we handle intermediate milestones within each project? Proposed: Don't. Each project should be something that ends in less than 3 months, and if we absolutely need intermediate endpoints, break those out into more projects.

Option 1: Use explicit WorkType tags for each task
This can be done by tagging all stories either #WorkType-Maintenance or #WorkType-NewFunctionality.

This requires all tasks to be tagged one or the other.

In theory a team could use only one tag and let all other stories default to the other tag; this would require custom SQL to apply the default, and would limit the reliability of this data by removing the ability to differentiate between inadvertently untagged stories and intentionally untagged stories.

Option 2: Designate existing categories as Maintenance vs New Work
VE currently does not use the tags but instead designates one category as Maintenance; the custom SQL (lines 108 to 116 in ) converts this information into the same form within Phlogiston as the tags.

Option 3: Default to one type, and override when the other tag is present
Community Tech is experimenting with defaulting all work to New Work (via custom SQL), and overriding this to Maintenance for any task explicitly tagged as #WorkType-Maintenance. However, this may introduce a bias to undercount Maintenance.
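
The difference between explicit tagging (Option 1) and defaulting to New Work (Option 3) can be sketched as below. Phlogiston applies these rules in custom SQL. This Python version is only an illustration: the function names, the representation of a task as a (tags, points) pair, and the numbers are all assumptions.

```python
MAINTENANCE_TAG = "#WorkType-Maintenance"
NEW_TAG = "#WorkType-NewFunctionality"


def classify_explicit(tags):
    """Explicit tagging: untagged tasks stay unclassified (None)."""
    if MAINTENANCE_TAG in tags:
        return "Maintenance"
    if NEW_TAG in tags:
        return "New Work"
    return None


def classify_default_new(tags):
    """Default to New Work; override only when tagged Maintenance."""
    return "Maintenance" if MAINTENANCE_TAG in tags else "New Work"


def maintenance_fraction(tasks, classify):
    """Fraction of classified points that are Maintenance.

    tasks: iterable of (tags, points) pairs.
    """
    maintenance = 0
    total = 0
    for tags, points in tasks:
        work_type = classify(tags)
        if work_type is None:
            continue  # unclassified work is left out of the fraction
        total += points
        if work_type == "Maintenance":
            maintenance += points
    return maintenance / total if total else 0.0


tasks = [
    ({MAINTENANCE_TAG}, 3),
    ({NEW_TAG}, 5),
    (set(), 2),  # untagged task
]
explicit = maintenance_fraction(tasks, classify_explicit)
defaulted = maintenance_fraction(tasks, classify_default_new)
```

With these made-up tasks, the explicit approach reports 3 of 8 classified points as Maintenance, while the defaulting approach reports 3 of 10, because the untagged task is silently counted as New Work. If that task were really maintenance, the second figure would undercount it, which is the bias noted above.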

Estimate Stories by Points
Improved accuracy for metrics and forecasts.

Cost
To do it properly, probably some planning poker or bulk estimation to get the team calibrated. 1-2 hours of everybody's time to get everything pointed, and then an extra 10 minutes per week to estimate going forward.

Links

 * Source Code
 * Live instance on wmflabs