Phlogiston/Data Model

From mediawiki.org

Vocabulary

Ancestor: A task that is tagged in Phabricator with one of the ancestor-qualifying tags.

Category: A grouping of tasks within Phlogiston.

Phlogiston Scope: A set of tasks that are analyzed as a group, under the assumption that they are the same body of tasks that one team of people works on. A Phlogiston scope usually contains all tasks from multiple Phabricator projects. The word "scope" was chosen instead of "project" to avoid confusion with Phabricator projects.

Phabricator Project: A tag for grouping and labeling tasks in Phabricator.

Source: Synonym for Phlogiston Scope.

Status: The Phabricator status field.

Data Model

All tables, and how they are logically grouped


Load

These tables hold the Phabricator data imported from a dump file. They are wiped and replaced with each load and are used as the input to reconstruction; they are probably not referenced after reconstruction is complete. They do not reflect the concept of a "scope".

Table | Definition
maniphest_blocked | Each row is one blocking relationship between a task and its immediate blocker, identified by task IDs.
maniphest_blocked_phid | Each row is one blocking relationship between a task and its immediate blocker, identified by task PHIDs.
maniphest_edge_transaction | Each row is one transaction affecting a relationship between a task and a project, identified by task IDs.
maniphest_task | Each row is one task in Phabricator.
maniphest_transaction | Each row is one transaction in Phabricator, containing raw JSON data about transactions.
phabricator_column | Each row is one column in Phabricator (any column, from any board).
phabricator_project | Each row is one project in Phabricator.
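
Because each row of maniphest_blocked records only an immediate blocker, finding every task underneath a given task means following those rows transitively. The sketch below illustrates that traversal in Python. It is not Phlogiston's own code: the (blocked_id, blocker_id) tuple layout and the function name are assumptions made for illustration, since the actual column names are not listed on this page.

```python
from collections import defaultdict


def transitive_blockers(blocked_rows, task_id):
    """Return the IDs of every task that blocks task_id, directly or indirectly.

    blocked_rows is an iterable of (blocked_id, blocker_id) pairs. This layout
    is hypothetical; it mirrors the description of maniphest_blocked, not its
    real column names.
    """
    # Index the immediate blockers of each task.
    blockers_of = defaultdict(set)
    for blocked_id, blocker_id in blocked_rows:
        blockers_of[blocked_id].add(blocker_id)

    # Depth-first walk; the seen set also protects against cycles.
    seen = set()
    stack = [task_id]
    while stack:
        current = stack.pop()
        for blocker_id in blockers_of[current]:
            if blocker_id != task_id and blocker_id not in seen:
                seen.add(blocker_id)
                stack.append(blocker_id)
    return seen
```

For example, with the rows (100, 101), (101, 102), and (100, 103), task 100 has three transitive blockers (101, 102, and 103) even though only two of them appear as its immediate blockers.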

Reconstruct

These tables hold a historical reconstruction of Phabricator data back to the beginning of available data; they represent a denormalization of the transaction data into an easier-to-query data set. They are partitioned by scope, so that a wipe of one scope does not affect any other scope. These tables are very time-consuming to create, and so are typically updated incrementally with each nightly dump.

Table | Definition
category | Each row is one category.
maniphest_edge | Each row is the membership of one task in one project. This table is not partitioned by scope (because this data is invariant across scopes, and for optimization reasons).
phab_parent_category_edge | Each row is the inclusion of one task in one ancestry.
task_on_date | Each row is one task for one day within one scope. This is the reconstruction of what the state of that task was on that day. The range of dates included depends on the configuration of the scope that includes the task. A task in two scopes will have two rows in this table per day.
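
The denormalization behind task_on_date can be illustrated by replaying a task's transactions day by day. The following is a minimal sketch under stated assumptions, not Phlogiston's implementation: it tracks only the status field, and the transaction tuple layout, the output row layout, and the function name are all hypothetical.

```python
from datetime import date, timedelta


def reconstruct_task_on_date(scope, task_ids, transactions, start, end):
    """Return one (scope, day, task_id, status) row per task per day.

    transactions is an iterable of (task_id, day, new_status) tuples; this
    layout is an assumption for illustration only.
    """
    # Group each task's status changes in chronological order.
    history_by_task = {}
    for task_id, day, status in sorted(transactions, key=lambda t: t[1]):
        history_by_task.setdefault(task_id, []).append((day, status))

    rows = []
    for task_id in task_ids:
        history = history_by_task.get(task_id, [])
        status = None
        next_change = 0
        day = start
        while day <= end:
            # Apply every change that happened on or before this day.
            while next_change < len(history) and history[next_change][0] <= day:
                status = history[next_change][1]
                next_change += 1
            # Emit a row only once the task has a known state.
            if status is not None:
                rows.append((scope, day, task_id, status))
            day += timedelta(days=1)
    return rows
```

With the transactions (1, 2016-01-01, "open") and (1, 2016-01-03, "resolved") and a date range of 1 to 4 January 2016, the sketch yields four rows for task 1: "open" on the first two days and "resolved" on the last two. Calling it once per scope for a task that belongs to two scopes gives two rows per day, matching the table description.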

Report

These tables hold data necessary to generate reports. They are partitioned by scope, and wiped by partition at the beginning of each report.

Table | Definition
maintenance_delta | One row per (seven-day period (identified by its final day), scope, and maintenance type), holding the change in aggregate information about all relevant tasks over that period.
maintenance_week | One row per (seven-day period (identified by its final day), scope, and maintenance type), with aggregate information about all relevant tasks.
recently_closed | One row per (time period, category, date, and scope), with aggregate count and points over that time period. The time period may be a week, month, or quarter.
recently_closed_task | Each row is one recently closed task within one scope.
task_on_date_agg | One row per (status, category, maint_type, date, and scope), with aggregate count and points for that combination.
task_on_date_recategorized | Each row is one task for one day within one scope, copied from task_on_date at the beginning of each scope report, with different task metadata fields.
velocity | One row per seven-day period for one category in one scope, with many fields for historical and forecast velocity data.

The scope partition of category is also wiped and reloaded with each report, to enable users to change the report settings and re-run the report without having to reconstruct again.
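
The wipe-by-scope pattern used for the report tables can be sketched as follows, using task_on_date_agg as the example. This is an illustration only: it uses an in-memory SQLite database so that it is self-contained, and the column names (day, task_count, points, and so on), the demo rows, and the function names are assumptions rather than Phlogiston's actual schema.

```python
import sqlite3


def make_demo_db():
    """Create hypothetical report tables with a few demo rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE task_on_date_recategorized (
            scope TEXT, day TEXT, task_id INTEGER, status TEXT,
            category TEXT, maint_type TEXT, points INTEGER);
        CREATE TABLE task_on_date_agg (
            scope TEXT, day TEXT, status TEXT, category TEXT,
            maint_type TEXT, task_count INTEGER, points INTEGER);
        """
    )
    conn.executemany(
        "INSERT INTO task_on_date_recategorized VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("and", "2016-01-01", 1, "open", "Search", "New", 3),
            ("and", "2016-01-01", 2, "open", "Search", "New", 5),
            ("and", "2016-01-01", 3, "resolved", "Search", "Maintenance", 2),
            ("ios", "2016-01-01", 1, "open", "Misc", "New", 3),
        ],
    )
    # Leftover rows from an earlier report run, one per scope.
    conn.executemany(
        "INSERT INTO task_on_date_agg VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            ("and", "2015-12-31", "open", "Old", "New", 9, 9),
            ("ios", "2016-01-01", "open", "Misc", "New", 1, 3),
        ],
    )
    return conn


def build_task_on_date_agg(conn, scope):
    """Wipe one scope's partition of task_on_date_agg and rebuild it."""
    # Only this scope's rows are deleted; other scopes are untouched.
    conn.execute("DELETE FROM task_on_date_agg WHERE scope = ?", (scope,))
    # One row per (status, category, maint_type, day, scope) combination.
    conn.execute(
        """
        INSERT INTO task_on_date_agg
        SELECT scope, day, status, category, maint_type,
               COUNT(*), SUM(points)
          FROM task_on_date_recategorized
         WHERE scope = ?
         GROUP BY scope, day, status, category, maint_type
        """,
        (scope,),
    )
```

Running build_task_on_date_agg(conn, "and") against the demo database replaces the leftover "and" row with two fresh aggregate rows and leaves the "ios" row in place, which is the property that lets one scope's report be re-run without disturbing any other scope.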