Team Practices Group/Improving burndown charts

Introduction
This page is a work in progress, reflecting current thinking. It is not authoritative in any way.

Terminology

 * Burndown Chart
 * https://en.wikipedia.org/wiki/Burn_down_chart
 * Burnup Chart
 * http://brodzinski.com/2012/10/burn-up-better-burn-down.html
 * Phabricator
 * The issue-tracking system used by the WMF
 * Phabricator Sprint Extension
 * Adds "Sprint" type projects, which add a "Story Points" field to any task in that project
 * Written by the WMF (?)
 * Phragile
 * A tool that generates graphs from data pulled from Phabricator's API
 * Release
 * A "marketing" feature-driven release, OR a timeboxed release
 * For feature-driven, the question we want to answer is "When?"
 * For timeboxed, the question is "How much can we get done?"
 * Release Burnup Chart (or Product Burnup Chart)
 * A burnup chart whose scope spans multiple sprints
 * Release Cumulative Flow Chart
 * http://www.agilesherpa.org/agile_coach/metrics/cumulative_flow/
 * Sprint Burnup Chart
 * A burnup chart whose scope is limited to a single sprint
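
The two release questions above ("When?" and "How much can we get done?") can be sketched as simple arithmetic on a burnup chart's two lines. This is a minimal sketch, assuming a constant velocity measured in points per sprint; the function names and the sample numbers are hypothetical and are not taken from any existing tool.

```python
import math


def sprints_until_done(scope_points, done_points, velocity):
    """Feature-driven release: answer "When?" as a number of sprints.

    Assumes velocity (points per sprint) stays constant and scope
    does not change, which a real burnup chart would not guarantee.
    """
    if velocity <= 0:
        raise ValueError("velocity must be positive")
    remaining = max(scope_points - done_points, 0)
    return math.ceil(remaining / velocity)


def points_by_deadline(done_points, velocity, sprints_left):
    """Timeboxed release: answer "How much can we get done?" in points."""
    return done_points + velocity * sprints_left


# Hypothetical numbers: 120 points in scope, 45 done, 20 points per sprint.
print(sprints_until_done(scope_points=120, done_points=45, velocity=20))  # 4
print(points_by_deadline(done_points=45, velocity=20, sprints_left=3))    # 105
```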

Assumptions

 * Burnup charts will be accepted by management (in lieu of Burndown charts)
 * Product/Release charts are the focus of this initiative (as opposed to Sprint charts)
 * Forward-looking prediction is the focus of this initiative (as opposed to retrospective)
 * An external chart generation tool is acceptable (it doesn't have to be built into phab)
 * At least for now, it would be acceptable for phab to export raw data, to allow fancy charts to be generated by a spreadsheet
 * We have some (unknown $) budget to spend on upstream Phabricator coding, as needed
 * We have the ability to configure our phab instance however we wish
 * We have limited WMF human hours available for phab coding
 * Any phab coding we do should be in the form of patches pushed upstream
 * Is backward-looking (retrospective) charting important, given that past projects would not have followed the necessary conventions?

Current State

 * Phabricator's built-in burndown charts:
 * Work OK within a single Sprint, but require a ton of work to span multiple sprints
 * Trigger on a task being in a "Done" column, which not all teams use
 * But note that "resolved" is also problematic
 * Generate burndown charts, not burnup charts
 * Do not show the scope line moving up or down as scope changes; the line is always flat, at the *current* scope level
 * Handle weekends in ways that some people dislike, but that's really a Sprint chart issue more than a Release chart issue
 * Phragile:
 * Is being developed by WMDE
 * Provides ??? charts (I haven't seen examples yet)
 * Is an external tool, which would be acceptable
 * Would need to interface with phab authentication/authorization system?
 * To be able to handle security issues, yes, but for normal projects, is this true?
 * Currently provides BOTH graph generation AND sprint creation
 * Should these be separated?
 * Robla's "wbstatus" script:
 * Is written in Python
 * Scrapes phab HTML to generate a state model of issues on a workboard
 * Could be used as a template to create a similar tool for burnup purposes
 * Example output: https://mw-core-wbstatus.wmflabs.org/?r=2015-03-16_to_2015-03-20
 * Source code: https://github.com/robla/phab-wbstatus

Issues to Remember

 * Backlog stories may not have estimates
 * Assign arbitrary value (e.g. average story size)?
 * "Release" might mean fixed-date/train, or feature-driven
 * Not all teams will have "sprints"; some use Kanban
 * Need to avoid double-counting when task and subtasks were both estimated
 * Could be an issue both for "work completed" and for "target scope" calculations
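
The two estimation issues above (unestimated stories and task/subtask double-counting) can be sketched together. This is only one possible policy, assumed for illustration: a parent task is skipped whenever at least one of its subtasks has an estimate, and unestimated tasks get the average of the known estimates as a placeholder. The task records and field names are hypothetical, not Phabricator's data model.

```python
from statistics import mean

# Hypothetical task records: "points" is None when unestimated,
# "parent" is the id of the parent task, or None for top-level tasks.
tasks = [
    {"id": "T1", "points": 8, "parent": None},     # estimated as a whole
    {"id": "T2", "points": 3, "parent": "T1"},     # subtask of T1
    {"id": "T3", "points": 5, "parent": "T1"},     # subtask of T1
    {"id": "T4", "points": None, "parent": None},  # unestimated
    {"id": "T5", "points": 4, "parent": None},
]


def total_points(tasks):
    """Sum story points without counting a parent and its subtasks twice."""
    # Parents that have at least one estimated subtask are skipped,
    # so only the subtask estimates are counted (assumed policy).
    skipped_parents = {
        t["parent"]
        for t in tasks
        if t["parent"] is not None and t["points"] is not None
    }
    counted = [t for t in tasks if t["id"] not in skipped_parents]

    # Placeholder for unestimated tasks: average of the known estimates.
    known = [t["points"] for t in counted if t["points"] is not None]
    placeholder = mean(known) if known else 0

    return sum(
        t["points"] if t["points"] is not None else placeholder
        for t in counted
    )


print(total_points(tasks))  # 16: T2 + T3 + T5 + placeholder of 4 for T4
```

A naive sum of every estimate here would give 20 and ignore T4 entirely. The same function could be applied to both the "work completed" and the "target scope" task sets.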

Data Model

 * Helpful phab queries, if they are/were possible:
 * List all tasks that were in Column C of Project P on Date D (NO?)
 * List all projects matching wildcard name search (YES?)
 * List all tasks which were EVER in Project P (NO?)
 * List all transactions for Task T, with timestamps (YES, but not sure about timestamps)
 * List all transactions for Project P, with timestamps (NO?)
 * If we did a nightly snapshot of all tasks in Projects P, Q, R, ..., where would we store that data?
 * Can a Release project also be a Sprint project? (NO?)
 * (It would be nice for estimated future epics to be only in the Release project)

Desired Datasets/Charts

 * Points "done" per sprint over time (velocity)
 * Note: Should scale by sprint length, to accommodate Kanban teams
 * Note: Is this based on a "done" column, or marked "resolved"?
 * Current points of all tasks in all columns of all sprints PLUS non-sprint backlog
 * Historical points of all tasks in all columns of all sprints PLUS non-sprint backlog
 * Average task estimate across multiple sprints (or maybe just within each sprint?)
 * To provide a placeholder size for unestimated tasks
 * Average Lead Time per story (from entering a sprint backlog until being "done")
 * Especially helpful for Kanban projects
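
Two of the datasets above, velocity scaled by sprint length and average lead time, can be sketched as follows. This is a minimal sketch over hypothetical, already-collected records; the field names ("start", "end", "done_points", "entered", "done") are assumptions for illustration, and whether "done" means a "Done" column or "resolved" status is left open, as in the notes above.

```python
from datetime import date

# Hypothetical sprint records with differing lengths.
sprints = [
    {"name": "Sprint 1", "start": date(2015, 3, 2),
     "end": date(2015, 3, 16), "done_points": 20},
    {"name": "Sprint 2", "start": date(2015, 3, 16),
     "end": date(2015, 3, 23), "done_points": 12},
]

# Hypothetical stories: the date each entered a sprint backlog,
# and the date it was "done" (None if still open).
stories = [
    {"id": "T1", "entered": date(2015, 3, 2), "done": date(2015, 3, 6)},
    {"id": "T2", "entered": date(2015, 3, 3), "done": date(2015, 3, 11)},
    {"id": "T3", "entered": date(2015, 3, 9), "done": None},
]


def points_per_week(sprint):
    """Velocity normalized to a 7-day week, so sprints of different
    lengths (or arbitrary Kanban reporting periods) are comparable."""
    days = (sprint["end"] - sprint["start"]).days
    return sprint["done_points"] * 7 / days


def average_lead_time_days(stories):
    """Mean days from entering a backlog to "done"; open stories are ignored."""
    lead_times = [
        (s["done"] - s["entered"]).days
        for s in stories
        if s["done"] is not None
    ]
    return sum(lead_times) / len(lead_times) if lead_times else None


for s in sprints:
    print(s["name"], points_per_week(s))   # Sprint 1 10.0, Sprint 2 12.0
print(average_lead_time_days(stories))     # 6.0
```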

Possible Solution(s)

 * Completed points are calculated from a set of wildcard-filtered Sprint projects
 * CONVENTION: Tasks are considered completed if they are in a workboard column named "Done"
 * Graph-requesting user could specify the column name when they give the wildcard spec
 * Brute-force algorithm:
 * For each matching Sprint project,
 * Sum story points of all tasks in the "Done" column
 * Note the timestamp at which that Sprint ended (assuming it is in the past)
 * Scope of release is calculated from a Release project
 * FEATURE REQUEST: New Phab API call: List all transactions that refer to a specific project
 * CONVENTION: The Release project must contain *every* task that is part of that release
 * Any task with subtasks must exclude any subtask estimates from its own current estimate
 * Brute-force algorithm:
 * Given a Release project, find all transactions affecting it
 * Create event history of tasks being added/removed on that project
 * Replay the history to know which tasks were in that project at any timestamp
 * Record the story point sum at desired timestamps
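
The two brute-force algorithms above can be sketched together to produce the data points of a release burnup chart. This is a minimal sketch under several assumptions: the event log of tasks being added to or removed from the Release project has already been obtained somehow (the API call that would supply it is only a feature request above, so nothing here calls Phabricator); the "Done" column contents of each Sprint project are already known; and story points are current estimates rather than historical ones. All names and sample data are hypothetical.

```python
from datetime import datetime

# Hypothetical current story point estimates per task.
points = {"T1": 5, "T2": 3, "T3": 8, "T4": 2}

# Hypothetical event history for one Release project:
# (timestamp, "add" or "remove", task id).
events = [
    (datetime(2015, 3, 2), "add", "T1"),
    (datetime(2015, 3, 2), "add", "T2"),
    (datetime(2015, 3, 10), "add", "T3"),
    (datetime(2015, 3, 18), "remove", "T2"),
    (datetime(2015, 3, 20), "add", "T4"),
]

# Hypothetical Sprint projects: (end timestamp, tasks in the "Done" column).
sprints = [
    (datetime(2015, 3, 16), ["T1"]),
    (datetime(2015, 3, 30), ["T3"]),
]


def scope_at(events, points, when):
    """Replay add/remove events up to `when` and sum the members' points."""
    members = set()
    for timestamp, action, task in sorted(events):
        if timestamp > when:
            break
        if action == "add":
            members.add(task)
        elif action == "remove":
            members.discard(task)
    return sum(points.get(task, 0) for task in members)


def burnup_series(events, points, sprints):
    """One (sprint end, release scope, cumulative done points) row per sprint."""
    series = []
    done_total = 0
    for end, done_tasks in sorted(sprints, key=lambda s: s[0]):
        done_total += sum(points.get(task, 0) for task in done_tasks)
        series.append((end, scope_at(events, points, end), done_total))
    return series


for end, scope, done in burnup_series(events, points, sprints):
    print(end.date(), scope, done)
# 2015-03-16 16 5
# 2015-03-30 15 13
```

Because the scope is recomputed at each timestamp, the scope line moves as tasks are added or removed, unlike the flat scope line of the built-in charts. Tracking changes to estimates over time would need an additional event type, which this sketch omits.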