Phlogiston/Running

From mediawiki.org

The Servers in the Cloud

Server       ssh location                URL
Development  phlogiston-2.eqiad.wmflabs  http://phlogiston-dev.wmflabs.org
Production   phlogiston-3.eqiad.wmflabs  http://phlogiston.wmflabs.org

In normal operation, Phlogiston runs automatically every day on both the production and development servers. A cron job owned by the phlogiston user starts shortly after the time the Phabricator dump usually becomes available. The normal sequence of operation is:

  1. Download new dump
  2. Load the dump into the database
  3. For each specified scope:
    1. Reconstruct data
      1. Normally, this runs incrementally from the last date processed, so it handles only one day of data.
    2. Regenerate the report completely.
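
The sequence above can be sketched as a small shell driver. This is only an illustration of the control flow: the function names (download_dump, load_dump, reconstruct_incremental, regenerate_report, run_daily) are hypothetical placeholders, not Phlogiston's actual internals, and each step merely prints what it stands for.

```shell
# Illustrative sketch only: every function below is a hypothetical placeholder
# that prints the step it represents. None of these are real Phlogiston commands.

download_dump()           { echo "download new dump"; }
load_dump()               { echo "load dump into database"; }
reconstruct_incremental() { echo "reconstruct scope $1 (incremental from last processed date)"; }
regenerate_report()       { echo "regenerate report for scope $1 (complete)"; }

# Run the daily sequence for the scope prefixes given as arguments:
# the dump is downloaded and loaded once, then each scope is processed in turn.
run_daily() {
    download_dump
    load_dump
    for scope in "$@"; do
        reconstruct_incremental "$scope"
        regenerate_report "$scope"
    done
}
```

Calling run_daily with two scope prefixes prints the download and load steps once, followed by a reconstruct and a report step for each scope.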

Access

  • You must already have a wikitech VPS shell account (formerly known as labs, not Tool Labs). See Getting Started.
  • Your account must be set up to access Phlogiston.
    1. Admin note: user must be in group project-phlogiston; not sure if this is set at server or at labs level.
    2. Your phlogiston shell account on phlogiston-2 must be in the project-phlogiston group.
    3. The OpenStack page for the project is: https://tools.wmflabs.org/openstack-browser/project/phlogiston
    4. A typical ssh command is: ssh phlogiston-2.eqiad.wmflabs

Adding a new report

  1. Create the configuration files.
    1. Configuration files are stored in the GitHub repository https://github.com/wikimedia/phlogiston.
    2. Configuration files for a given your_scope_prefix include:
      1. your_scope_prefix_recategorization.csv
      2. your_scope_prefix_scope.py
  2. Test the new configuration on the dev site.
    1. In the shared mission_control virtual screen console, run bash ~/phlogiston/batch_phlog.bash -m rerecon -s your_scope_prefix
  3. If the new report looks good, add it to the cron job on development and production.
  4. Add links to the new report in the appropriate files in https://github.com/wikimedia/phlogiston/html and copy those files to the live html folders on the servers.
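
Before testing a new configuration, it can help to confirm that both files for the scope prefix are present. The following is a minimal sketch, not part of Phlogiston: the function name check_scope_files is hypothetical, and the directory is passed as an argument because the location of the configuration files inside the checkout is not assumed here.

```shell
# Check that both configuration files for a scope prefix exist in a directory.
# Usage: check_scope_files DIRECTORY SCOPE_PREFIX
# Prints the name of each missing file and returns non-zero if any is missing.
# (check_scope_files is a hypothetical helper, not a Phlogiston command.)
check_scope_files() {
    dir=$1
    prefix=$2
    missing=0
    for suffix in _recategorization.csv _scope.py; do
        if [ ! -f "$dir/$prefix$suffix" ]; then
            echo "missing: $prefix$suffix"
            missing=1
        fi
    done
    return "$missing"
}
```

For example, check_scope_files . ve reports whether ve_recategorization.csv and ve_scope.py both exist in the current directory.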

Manual Control

The current practice is for only the Phlogiston developer to access production, and for the Phlogiston developer to be the primary user of the development server. One use case for shared development is supported: users of reports may reconfigure their reports and then re-run them on the development server to see results immediately instead of waiting until the next day.

Phlogiston has no run control or locking; running multiple Phlogiston reports at the same time will produce bad results. We therefore follow a manual convention on the development server: all Phlogiston runs happen in a shared tmux console session, by convention called mission_control, so that two conflicting runs cannot happen at once. This convention is not followed on production, which should not have multiple users running Phlogiston.

To re-run a report on the development server

  1. Make the desired changes to the configuration files and commit them to GitHub.
  2. Log in to phlogiston-2.
  3. Change to be the phlogiston user: sudo su - phlogiston.
    1. Your phlogiston shell account on phlogiston-2 must be in the project-phlogiston group.
  4. Join the shared console: tmux a -t mission_control.
    1. If this fails with a message about "no sessions", then there is not already a mission_control session. Create it with tmux new -s mission_control.
  5. Re-run Phlogiston.
    1. cd ~/phlogiston
    2. bash batch_phlog.bash -m reports -l false -s your_scope_prefix
      1. Replace your_scope_prefix with the code for your scope, for example "and" for Android or "ve" for VisualEditor. The code is determined when the files for this scope report are originally created.
    3. This will automatically update files from git, and then rerun the report. It will not reprocess any of the data.
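
The attach-or-create step for the shared console can be folded into one helper. This is a minimal sketch that assumes tmux is installed; the function name join_console is hypothetical, and the check uses tmux has-session rather than relying on the "no sessions" error message.

```shell
# Attach to the shared tmux session if it exists, otherwise create it.
# Usage: join_console [SESSION_NAME]   (defaults to mission_control)
# join_console is a hypothetical helper name; the attach and new commands
# are the same ones used in the steps above.
join_console() {
    session=${1:-mission_control}
    if tmux has-session -t "$session" 2>/dev/null; then
        tmux a -t "$session"
    else
        tmux new -s "$session"
    fi
}
```

Run join_console as the phlogiston user after sudo su - phlogiston, then continue with the re-run commands inside the session.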

To create a new scope

  1. Create new configuration files and add them to GitHub.
  2. Log in to phlogiston-2.
  3. Change to be the phlogiston user: sudo su - phlogiston.
  4. Join the shared console: tmux a -t mission_control.
    1. If this fails with a message about "no sessions", then there is not already a mission_control session. Create it with tmux new -s mission_control.
  5. In the ~/phlogiston directory, get the new files from GitHub:
    1. git pull
  6. Build the new scope data reconstruction and report.
    1. ./batch_phlog.bash -m rerecon -l false -s your_scope_prefix
    2. Rerecon will generate or regenerate the scope reconstruction completely, and generate the report, but will not download and load fresh dump data.
  7. After the report is complete, verify it through a browser. If it looks good, add the scope to the phlogiston crontab command on both development and production. It may also be helpful to add a link to the report to the file html/index.html and to deploy that file to development and production.

Debugging

To test whether the dump is fresh:

$ openssl s_client -connect dumps.wikimedia.org:443
[...]
HEAD /other/misc/phabricator_public.dump HTTP/1.1
host: dumps.wikimedia.org

Once the connection is open, type the two request lines and then press Enter twice to send the request.
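
The same HEAD request can also be issued non-interactively. The sketch below is an alternative to the manual openssl session, not the documented procedure: it assumes curl is installed and that the server returns a Last-Modified header, neither of which is stated above. The helper name last_modified is hypothetical.

```shell
# Print the value of the Last-Modified header from HTTP response headers on stdin.
# The header name is matched case-insensitively and carriage returns are stripped.
# (last_modified is a hypothetical helper name.)
last_modified() {
    tr -d '\r' | awk 'tolower($0) ~ /^last-modified:/ { sub(/^[^:]*: */, ""); print; exit }'
}

# Example (requires network access and curl; -s silences progress, -I sends HEAD):
#   curl -sI https://dumps.wikimedia.org/other/misc/phabricator_public.dump | last_modified
```

Comparing the printed timestamp with the current date shows whether a new dump has been published since the last run.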

Automation

Phlogiston is run on both servers automatically every day with the following crontab entry:

# m h  dom mon dow   command

15 4    *   *   *    bash ~/phlogiston/batch_phlog.bash -m incremental -l true -s ana -s and -s col -s cot -s discir -s dismap -s dis -s diswik -s fr -s ja -s ios -s lan -s phl -s red -s rel -s tpg -s ve >>~/phlog.log 2>&1
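
Because each scope needs its own -s flag, the command portion of the entry grows with every new report. As a minimal sketch, the helper below prints that command for a list of scope prefixes; the function name build_phlog_command is hypothetical, and only the -m, -l, and -s flags already shown on this page are used.

```shell
# Print the batch_phlog.bash command line for the given scope prefixes.
# Usage: build_phlog_command MODE LOAD SCOPE...
# (build_phlog_command is a hypothetical helper; it only prints text.)
build_phlog_command() {
    mode=$1
    load=$2
    shift 2
    # The tilde is kept literal here so that cron's shell expands it at run time.
    cmd="bash ~/phlogiston/batch_phlog.bash -m $mode -l $load"
    for scope in "$@"; do
        cmd="$cmd -s $scope"
    done
    echo "$cmd"
}
```

For example, build_phlog_command incremental true ana and ve prints the command for three scopes; the schedule fields and the >>~/phlog.log 2>&1 redirection still have to be added when the line is placed in the crontab.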

Data integrity and idempotency

The data dump includes all historical data from Phabricator, so only the most current dump is required for operation. Each data dump load will provide Phlogiston with complete information, and the data dump does not need to be reloaded until a new dump is available. Loading the dump is independent of any specific scopes.

Reconstruction and reporting are partitioned by scope. Changes to one scope will not affect any other.

An incremental reconstruction will operate from the most recent date available in the already-processed data, so if it is run a second time on the same day, it will not corrupt data. A complete reconstruction will begin by wiping all data for that scope.

A report will wipe the existing report on the website prior to generating a new report, so it is possible to end up with a broken report if the new report fails.