Phlogiston/Running

Phlogiston has a stable labs server for WMF use, here.

In normal operation, Phlogiston should run automatically every day on both the production (phlogiston-1) and development (phlogiston-2) servers. It runs right after the Phabricator dump is made available. The normal sequence of operation is:
 * 1) Download the new dump.
 * 2) Load the dump into the database.
 * 3) For each specified scope:
    * a) Reconstruct data. Normally, this runs incrementally from the last date processed, so it only processes one day of data.
    * b) Regenerate the report completely.
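
The sequence can be sketched in a few lines of Python. This is only an illustration of the ordering, not Phlogiston's actual code: the function `daily_run`, the scope names, and the step strings are all hypothetical, and nothing is really downloaded or loaded.

```python
from datetime import date

def daily_run(scopes, last_processed, today):
    """Return the ordered steps one daily run would perform (nothing is executed)."""
    # The dump is downloaded and loaded once, independent of any scope.
    steps = ["download dump", "load dump"]
    for scope in scopes:
        # Incremental reconstruction resumes from the last date already processed.
        start = last_processed[scope]
        steps.append(f"reconstruct {scope} from {start} to {today}")
        last_processed[scope] = today
        # The report for the scope is always regenerated completely.
        steps.append(f"report {scope}")
    return steps

# Hypothetical scopes "alpha" and "beta", both last processed the day before.
steps = daily_run(
    ["alpha", "beta"],
    {"alpha": date(2024, 1, 1), "beta": date(2024, 1, 1)},
    date(2024, 1, 2),
)
```

Each scope gets its own reconstruct and report steps, while the dump steps appear only once at the start.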

Manual Control
The current practice is for only the Phlogiston developer to work on production, and for the Phlogiston developer to be the primary user of the development server. One use case for shared development is supported: users of reports may reconfigure their reports and then re-run the reports on the development server to see results immediately instead of the next day.

Phlogiston has no run control or locking; multiple Phlogiston reports running at the same time will produce bad results. We therefore have a manual convention on the development server that all Phlogiston runs should happen in a shared console session, to prevent two runs from happening at once. This convention is not followed on phlogiston-1, which should not have multiple users running reports.

To re-run a report on the development server
 * 1) Change the configuration files to make the desired changes, and commit to GitHub.
 * 2) Log in to phlogiston-2.
    * You must already have a wikitech labs (not Tool Labs) shell account.  See Getting Started.
    * Your account must be set up to access Phlogiston.
 * 3) Change to be the phlogiston user:.
    * Your phlogiston shell account on phlogiston-2 must be in the   group.
 * 4) Join the shared console:.
    * If this fails with a message about "no sessions", then there is not already a mission_control session. Create it with.
 * 5) Re-run Phlogiston.
    * This will automatically update files from git, and then rerun the report.  It will not reprocess any of the data.

Automation
Phlogiston is run on both servers automatically every day with the crontab entry

Data integrity and idempotency
The data dump includes all historical data from Phabricator, so only the most current dump is required for operation. Each data dump load will provide Phlogiston with complete information, and the data dump does not need to be reloaded until a new dump is available. Loading the dump is independent of any specific scopes.

Reconstruction and reporting are partitioned by scope. Changes to one scope will not affect any other.

An incremental reconstruction will operate from the most recent date available in the already-processed data, so if it is run a second time on the same day, it will not corrupt data. A complete reconstruction will begin by wiping all data for that scope.
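
The following Python sketch shows why a repeated incremental run is harmless. It is an assumption-laden model, not Phlogiston's implementation: `reconstruct`, `FIRST_DAY`, and the snapshot strings are hypothetical, and the processed data for one scope is modelled as a dictionary keyed by date.

```python
from datetime import date, timedelta

FIRST_DAY = date(2024, 1, 1)  # hypothetical earliest date in the dump

def reconstruct(processed, today, complete=False):
    """Reconstruct one scope; `processed` maps date -> snapshot for that scope."""
    if complete or not processed:
        processed.clear()          # a complete run wipes the scope's data first
        day = FIRST_DAY
    else:
        day = max(processed)       # resume from the most recent processed date
    while day <= today:
        # Each day is overwritten rather than appended, so reruns do not duplicate data.
        processed[day] = f"snapshot for {day}"
        day += timedelta(days=1)
    return processed

data = {}
reconstruct(data, date(2024, 1, 3))          # first run processes three days
after_first = dict(data)
reconstruct(data, date(2024, 1, 3))          # second run on the same day
after_second = dict(data)
reconstruct(data, date(2024, 1, 3), complete=True)
```

In this model the second run only rewrites the most recent day with the same value, so the data after the first and second runs is identical.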

A report will wipe the existing report on the website prior to generating a new report, so it is possible to end up with a broken report if the new report fails.
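
The failure mode can be reproduced with a small Python sketch. The names `publish_report`, `good`, and `failing` are hypothetical and the report is reduced to a single `index.html` in a temporary directory; the sketch only models the wipe-then-generate ordering described above.

```python
import shutil
import tempfile
from pathlib import Path

def publish_report(report_dir, generate):
    """Wipe the published report, then generate the new one in place."""
    shutil.rmtree(report_dir, ignore_errors=True)  # the old report is gone from here on
    report_dir.mkdir(parents=True)
    generate(report_dir)  # if this raises, report_dir is left empty or partial

def good(report_dir):
    (report_dir / "index.html").write_text("report")

def failing(report_dir):
    raise RuntimeError("simulated failure during report generation")

site = Path(tempfile.mkdtemp()) / "report"
publish_report(site, good)        # a working report is now published
try:
    publish_report(site, failing) # the rerun wipes it, then fails
except RuntimeError:
    pass
broken = not (site / "index.html").exists()
```

After the failed rerun the directory still exists but the previously published `index.html` is gone, which is the broken state the paragraph above warns about.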