Phlogiston/Data Loading Model

From MediaWiki.org

Data Loading

  1. Download the latest dump from http://dumps.wikimedia.org/other/misc/phabricator_public.dump (updated around 04:00 UTC).
  2. Discard previously loaded data and load this file into the database.
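The download step could be sketched as below. This is only an illustration, not the project's own loader: the function name download_dump and its opener parameter are hypothetical, and the database load is omitted because the steps above do not specify the database or its schema.

```python
import shutil
import urllib.request

# URL taken from step 1 above.
DUMP_URL = "http://dumps.wikimedia.org/other/misc/phabricator_public.dump"


def download_dump(dest_path, url=DUMP_URL, opener=urllib.request.urlopen):
    """Stream the dump at `url` into `dest_path` and return the path.

    `opener` is a hypothetical hook so the function can be exercised
    without network access; by default it is urllib.request.urlopen.
    """
    with opener(url) as response, open(dest_path, "wb") as out:
        # Copy in chunks so the whole dump is never held in memory.
        shutil.copyfileobj(response, out)
    return dest_path
```

Calling download_dump("phabricator_public.dump") overwrites any earlier copy of the file, which fits the "discard previously loaded data" approach of step 2.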

Data Reconstruction

Once for each scope:

[THIS SECTION IS OUT OF DATE AND WRONG]

  1. For each day since the start date specified in the <prefix>_scope.py file, and for each of the Phabricator projects listed in the source file, find all tasks belonging to that project on that day. Add each task∙day to the list of task∙days for this team.
  2. For each day, find each task in the list that is tagged "Category" and construct its complete descendant tree.
    1. The tree includes only tasks present in the list of task∙days. If a child is not present in the data (because it doesn't belong to any project in the source project list) but the grandchild is, the grandchild will not be included in the tree.
    2. A child is any task that blocks the parent in Phabricator.
    3. If a Milestone task is not in the source project list, it will not be included in this process.
  3. Run the file XXX_make_history.sql if present, or generic_make_history.sql if not. This script does the following:
    1. For each task∙day, build a raw category text string from:
      1. The name of the Phabricator project the task belongs to, counting only the first project in the list in the source file.
        1. So, a task that belongs to several included projects will only be counted once.
      2. The name of the relevant projectcolumn in that project.
      3. The titles of all ancestor "Category" tasks.
    2. Reduce the possible statuses of all tasks to only open or resolved.
  4. Go through the <prefix>_recategorization.csv file in sort_order order and, for each line:
    1. Look for the matchstring anywhere in the raw category text string.
    2. If a match is found, set the category of the task to the pretty category title.
    3. If this file is not present, use the raw list of categories as the final list of categories, using alphabetical order for priority.
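
Steps 2 and 4 above can be illustrated with a minimal Python sketch. It is not the actual Phlogiston code, which the steps describe as running in SQL. The function names, the in-memory data shapes, and the "title" key are hypothetical. The sketch also assumes that the first matching line in sort_order order wins, which the text does not state.

```python
def descendant_tree(root, blockers, present):
    """Return the ids of all descendants of `root` reachable through
    tasks that are in `present`.

    `blockers` maps a parent task id to the ids of the tasks that block
    it (its children). A child missing from `present` is skipped, and so
    is everything below it, matching step 2.1.
    """
    found = set()
    stack = [root]
    while stack:
        parent = stack.pop()
        for child in blockers.get(parent, ()):
            if child in present and child not in found and child != root:
                found.add(child)
                stack.append(child)
    return found


def recategorize(raw_category, rules):
    """Return the pretty title of the first rule, in sort_order order,
    whose matchstring occurs anywhere in `raw_category`, else None.

    Assumption: first match wins. Each rule is a dict with the keys
    "sort_order", "matchstring" and "title" (the last name is invented).
    """
    for rule in sorted(rules, key=lambda r: r["sort_order"]):
        if rule["matchstring"] in raw_category:
            return rule["title"]
    return None


# Task 2 is absent from the data, so its child 4 is dropped as well.
blockers = {1: [2, 3], 2: [4], 3: [5]}
tree = descendant_tree(1, blockers, present={1, 3, 4, 5})

rules = [
    {"sort_order": 2, "matchstring": "Bug", "title": "Bugs"},
    {"sort_order": 1, "matchstring": "Search", "title": "Search work"},
]
category = recategorize("Search Bug backlog", rules)
```

Here tree contains tasks 3 and 5 only, and category is "Search work" because that rule has the lower sort_order.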

Data Reporting

Copy task_on_date to reporting

  1. a