Phlogiston/Data Loading Model

Data Loading

 * 1) Download the latest dump from http://dumps.wikimedia.org/other/misc/phabricator_public.dump (updated around 04:00 UTC).
 * 2) Discard all previously loaded Phabricator data (the "Load" tables).
 * 3) Load the dump file into the database.
 * 4) All project, column, task, and unparsed transaction information is loaded.
 * 5) As an optimization, all edge transactions are parsed out of the transaction log into a separate list of edge transactions.
 * 6) Everything keyed by a PHID is re-keyed to ID.
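The re-keying in the last step can be pictured with a minimal sketch. The record layout here (dictionaries with `phid`, `id`, `task_phid`, and `project_phids` fields) and the function names are assumptions made for illustration; they are not taken from the Phlogiston code or the dump format.

```python
def build_phid_index(records):
    """Map each record's PHID string to its integer ID (hypothetical field names)."""
    return {record["phid"]: record["id"] for record in records}


def rekey_edge_transactions(edge_transactions, task_ids, project_ids):
    """Return edge transactions keyed by integer IDs instead of PHIDs.

    task_ids and project_ids are PHID -> ID mappings from build_phid_index.
    Transactions for unknown tasks are skipped, and unknown project PHIDs
    are dropped from the project list; both choices are assumptions.
    """
    rekeyed = []
    for tx in edge_transactions:
        task_id = task_ids.get(tx["task_phid"])
        if task_id is None:
            continue
        rekeyed.append({
            "date": tx["date"],
            "task_id": task_id,
            "project_ids": [
                project_ids[phid]
                for phid in tx["project_phids"]
                if phid in project_ids
            ],
        })
    return rekeyed
```

After this pass, later steps can join on small integer IDs rather than on PHID strings.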

Data Reconstruction
Once for each scope:


 * 1) Generate a list of project IDs relevant to the scope.
 * 2) All projects listed in the recategorization file are relevant.
 * 3) The Status Report project in  is relevant.
 * 4) The hard-coded IDs for certain keyword tags, e.g., 'category', are relevant.
 * 5) Determine the range of dates to be processed:
 * 6) For an incremental run, start on the day after the last day already in the data.
 * 7) For a complete run:
 * 8) Wipe the reconstruction tables of any data for this scope.
 * 9) Set the start date to the date in.
 * 10) For each day since the start date,
 * 11) For each task in the complete list of tasks in Phabricator,
 * 12) Get the list of edges from the most recent edge transaction (not later than the working day) associated with the task. For each project in the list of edges,
 * 13) If the project is also relevant to this scope,
 * 14) Make a record in maniphest_edge for this combination of date, task, and project.
 * 15) Example: The edge transaction data contains a single record, "on 2018-04-01, Project 300 was added to Task 142". After reconstruction, there is one record linking Project 300 and Task 142 for each day from 2018-04-01 to today.
 * 16) For each day since the start date,
 * 17) For each task associated with any of the relevant project IDs,
 * 18) Reconstruct the state of the task for that day.
 * 19) For each of the Phabricator projects listed in the source file, find all tasks belonging to that project on that day and add each task∙day to the list of task∙days for this team.
 * 20) For each day, find each task in the list that is tagged "Category" and construct its complete descendant tree.
 * 21) The tree includes only tasks present in the list of task∙days, so if a child is not present in the data (because it doesn't belong to any projects in the source project list), but the grandchild is, the grandchild will not be included in the tree.
 * 22) A child is any task that blocks the parent in Phabricator.
 * 23) If a Milestone task is not in the source project list, it will not be included in this process.
 * 24) Using the file   if present, or  if not, which does:
 * 25) For each task∙day, build a raw category text string from:
 * 26) The name of the Phabricator project the task belongs to, counting only the first project in the list in the source file.
 * 27) So, a task that belongs to several included projects will only be counted once.
 * 28) The name of the relevant project column in that project.
 * 29) The titles of all ancestor "Category" tasks.
 * 30) Reduce the possible statuses of all tasks to only open or resolved.
 * 31) Go through the   file in   order and, for each line,
 * 32) Look for the matchstring anywhere in the raw category text string.
 * 33) If a match is found, set the category of the task to the pretty category title.
 * 34) If this file is not present, use the raw list of categories as the final list of categories, using alphabetical order for priority.
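The per-day edge reconstruction described in the steps above (the loop that writes to maniphest_edge) can be sketched as follows. This is an illustration under assumptions, not the actual implementation: each edge transaction is modeled as a `(date, task_id, project_ids)` tuple holding the task's full project set as of that date, rows are returned as a list instead of being written to a table, and the function name is hypothetical.

```python
from datetime import date, timedelta


def reconstruct_edges(edge_transactions, relevant_project_ids, start, end):
    """Expand sparse edge transactions into one row per day, task, and project.

    For every day from start to end inclusive, each task takes the project
    set from its most recent transaction not later than that day, and only
    projects relevant to the scope are kept.
    """
    # Group each task's transactions in date order.
    history_by_task = {}
    for tx_date, task_id, project_ids in sorted(edge_transactions, key=lambda tx: tx[0]):
        history_by_task.setdefault(task_id, []).append((tx_date, project_ids))

    rows = []
    day = start
    while day <= end:
        for task_id, history in history_by_task.items():
            current = None
            for tx_date, project_ids in history:
                if tx_date > day:
                    break
                current = project_ids
            if current is None:
                continue  # the task has no edge data yet on this day
            for project_id in sorted(current):
                if project_id in relevant_project_ids:
                    rows.append((day, task_id, project_id))
        day += timedelta(days=1)
    return rows
```

With the single record from the example (Project 300 added to Task 142 on 2018-04-01) and a date range ending 2018-04-03, this yields three rows, one for each day from 2018-04-01 through 2018-04-03.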
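The category matching steps above can be illustrated with a small sketch. Several details are assumptions rather than documented behavior: the raw category string is built by joining its parts with spaces, matching is a case-sensitive substring test, the rules arrive already sorted as `(matchstring, pretty_title)` pairs, and the first matching rule wins. The function names are hypothetical.

```python
def build_raw_category(project_name, column_name, ancestor_titles):
    """Join project name, column name, and ancestor "Category" task titles
    into one raw category text string (joining with spaces is an assumption)."""
    parts = [project_name, column_name, *ancestor_titles]
    return " ".join(part for part in parts if part)


def categorize(raw_category, rules):
    """Return the pretty title of the first rule whose matchstring occurs
    anywhere in raw_category, or None if no rule matches.

    rules is an ordered list of (matchstring, pretty_title) pairs;
    treating the first match as final is an assumption.
    """
    for matchstring, pretty_title in rules:
        if matchstring in raw_category:
            return pretty_title
    return None
```

For instance, given the rules `[("Bug", "Bug fixes"), ("Team", "General work")]`, a task in project "Team-Board", column "Doing", under a Category task titled "Bug bash" would be categorized as "Bug fixes", because that rule comes first.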

Data Reporting
Copy task_on_date to reporting

