Phlogiston/Data Model

From MediaWiki.org

Updated 27 Sep 2017. Incomplete but probably not wrong or out of date.

Vocabulary

Ancestor: A task that is tagged in Phabricator with one of the ancestor-qualifying tags. Tagging is a performance optimization: it avoids having to calculate parent-child relationships for all tasks.

Category: A grouping of tasks within Phlogiston.

Phlogiston Scope: A set of tasks that are analyzed as a group, under the assumption that they are the same body of tasks that one team of people works on. A Phlogiston scope usually contains all tasks from multiple Phabricator projects. The word "scope" was chosen instead of "project" to avoid confusion with Phabricator projects.

Phabricator Project: A tag for grouping and labeling tasks in Phabricator.

Source: A synonym for Phlogiston Project, that is, for what this page calls a Phlogiston Scope.

Status: The Phabricator status field.

Data Model during Phlogiston execution

The tables fall into three groups, which are updated in this order: Load, Reconstruct, Report.

Load

These tables hold the Phabricator data imported from a dump file. They are wiped and replaced with each load, and they serve as the input for reconstruction. They are probably not referenced after reconstruction is complete (this is unconfirmed). They do not reflect the concept of a "scope".

  1. phabricator_project
  2. phabricator_column
  3. maniphest_task
  4. maniphest_blocked_phid
  5. maniphest_transaction
  6. maniphest_blocked

Reconstruct

These tables hold a historical reconstruction of Phabricator data back to the beginning of available data; they represent a denormalization of the transaction data into an easier-to-query data set. They are partitioned by scope, so that a wipe of one scope does not affect any other scope. These tables are very time-consuming to create, and so are typically updated incrementally with each nightly dump.

  1. task_on_date. Each row is one task for one day.
  2. category. Each row is one category.
  3. maniphest_edge. Each row is the membership of one task in one project. This table is not partitioned by scope (for optimization reasons).
  4. phab_parent_category_edge. Each row is the inclusion of one task in one ancestry.
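
The denormalization behind task_on_date can be illustrated with a minimal sketch. This is not Phlogiston's actual code: the function name, the tuple layout, and the choice to track only the status field are assumptions made for illustration. It expands a list of dated status changes into one row per task per day, carrying each task's latest known status forward.

```python
from datetime import date, timedelta


def reconstruct_task_on_date(transactions, start, end):
    """Expand status changes into one (day, task_id, status) row per task per day.

    transactions: iterable of (task_id, change_date, new_status) tuples.
    This tuple layout is hypothetical, not the maniphest_transaction schema.
    """
    # Process changes in date order so later changes overwrite earlier ones.
    ordered = sorted(transactions, key=lambda t: t[1])
    rows = []
    current = {}  # task_id -> latest known status
    i = 0
    day = start
    while day <= end:
        # Apply every change dated on or before this day.
        while i < len(ordered) and ordered[i][1] <= day:
            task_id, _, status = ordered[i]
            current[task_id] = status
            i += 1
        # Emit one row for every task that exists by this day.
        for task_id in sorted(current):
            rows.append((day, task_id, current[task_id]))
        day += timedelta(days=1)
    return rows


# Illustrative data only.
changes = [
    (1, date(2017, 9, 1), "open"),
    (2, date(2017, 9, 2), "open"),
    (1, date(2017, 9, 3), "resolved"),
]
rows = reconstruct_task_on_date(changes, date(2017, 9, 1), date(2017, 9, 3))
```

Once the rows exist, a question such as "which tasks were open on a given day" becomes a simple filter on day and status, with no need to replay the transaction history. That is the sense in which the reconstructed data is easier to query.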

Report

These tables hold data necessary to generate reports. They are partitioned by scope, and wiped by partition at the beginning of each report.

  1. task_on_date_agg
  2. task_on_date_recategorized
  3. recently_closed
  4. recently_closed_task
  5. maintenance_week
  6. maintenance_delta
  7. velocity
  8. open_backlog_size
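
Wiping by scope can be sketched as follows. This is an illustration under stated assumptions, not Phlogiston's implementation: it uses an in-memory SQLite database as a stand-in for the real database, and it assumes that each report table carries a column named scope. The column names, the scope names, and the wipe_scope helper are hypothetical.

```python
import sqlite3

# Report table names as listed above.
REPORT_TABLES = (
    "task_on_date_agg",
    "task_on_date_recategorized",
    "recently_closed",
    "recently_closed_task",
    "maintenance_week",
    "maintenance_delta",
    "velocity",
    "open_backlog_size",
)


def wipe_scope(conn, table, scope):
    """Delete one scope's rows from one report table (hypothetical helper)."""
    # Table names cannot be bound as SQL parameters, so check an allow-list.
    if table not in REPORT_TABLES:
        raise ValueError(f"not a report table: {table}")
    conn.execute(f"DELETE FROM {table} WHERE scope = ?", (scope,))


conn = sqlite3.connect(":memory:")
# Assumed columns, for illustration only.
conn.execute("CREATE TABLE velocity (scope TEXT, week TEXT, points INTEGER)")
conn.executemany(
    "INSERT INTO velocity VALUES (?, ?, ?)",
    [
        ("scope_a", "2017-09-18", 12),
        ("scope_a", "2017-09-25", 9),
        ("scope_b", "2017-09-25", 20),
    ],
)

# Start of a report run for scope_a: only that scope's rows are removed.
wipe_scope(conn, "velocity", "scope_a")
remaining = conn.execute("SELECT scope, week, points FROM velocity").fetchall()
```

After the wipe, the scope_b row is still present, which is the property described above: regenerating the report for one scope leaves every other scope's report data untouched.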