Phlogiston/Installation

From mediawiki.org

Install Prerequisites

Operating System

These instructions assume installation of Phlogiston on a Debian GNU/Linux Stretch (9.5) system.

In Labs, to enable the larger disk, go to https://horizon.wikimedia.org/project/puppet/ and activate the Puppet class "profile::labs::lvm::srv".

Then move PostgreSQL's working directories onto the new volume.

System-wide Installation

Nginx

sudo apt install nginx

PostgreSQL

sudo apt install postgresql postgresql-contrib

Python Modules

sudo apt install python3-venv

R

Add the CRAN repository to get a newer version of R than Debian ships (adapted from DigitalOcean's instructions):

sudo apt install software-properties-common

sudo apt-key adv --keyserver keys.gnupg.net --recv-key 'E19F5F87128899B192B1A2C2AD5F960A256A04AF'

sudo add-apt-repository 'deb https://cloud.r-project.org/bin/linux/debian stretch-cran35/'

sudo apt update

sudo apt install r-base

sudo apt install build-essential
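
After the packages above are installed, a quick check that their commands are reachable can catch a failed install early. The following is a minimal sketch, not part of Phlogiston. The command names (psql, python3, R, gcc, git) are assumptions about what the packages provide. git is included because the clone step later needs it. nginx is left out because its binary lives in /usr/sbin, which is often not on a regular user's PATH.

```shell
#!/bin/sh
# Print every command from the argument list that is not found on PATH.
missing_commands() {
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || printf '%s\n' "$cmd"
    done
}

# Assumed command names for the prerequisites; adjust as needed.
missing=$(missing_commands psql python3 R gcc git)
if [ -n "$missing" ]; then
    printf 'Missing commands:\n%s\n' "$missing"
else
    echo "All prerequisite commands found."
fi
```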

Set up Accounts

A shell account for Phlogiston

This account is used to run Phlogiston, store data, and publish output for the web server. By convention it is called phlogiston. Create it and apply whatever login rules, SSH configuration, and security settings are appropriate.

Set up Python

As user phlogiston:

python3 -m venv phlog_env

source phlog_env/bin/activate

pip install python-dateutil psycopg2-binary pytz jinja2
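
The pip package names differ from the names Python imports (python-dateutil is imported as dateutil, psycopg2-binary as psycopg2). A minimal sketch for confirming the modules are importable follows, assuming it is run with the virtual environment still activated. The helper name missing_python_modules is hypothetical.

```shell
#!/bin/sh
# Print every module from the argument list that python3 cannot import.
missing_python_modules() {
    for m in "$@"; do
        python3 -c "import $m" 2>/dev/null || printf '%s\n' "$m"
    done
}

# Import names corresponding to the pip packages installed above.
missing=$(missing_python_modules dateutil psycopg2 pytz jinja2)
if [ -n "$missing" ]; then
    printf 'Modules not importable:\n%s\n' "$missing"
else
    echo "All Python modules importable."
fi
```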

Set up R

As the phlogiston user, type R to start the R command line. In R, run:

install.packages('RColorBrewer', dep=TRUE)

install.packages('ggplot2', dep=TRUE)

install.packages('ggthemes', dep=TRUE)

install.packages('argparse', dep=TRUE)

install.packages('reshape', dep=TRUE)

install.packages('fivethirtyeight', dep=TRUE)

If prompted for an install location, choose a local (per-user) library.
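
For scripted setups, the same packages can be requested in a single call from the shell. The sketch below only builds the R expression. The helper name r_install_expr is hypothetical, and the repos value is an assumption that reuses the CRAN mirror named earlier. dependencies is the full name of the argument abbreviated as dep above.

```shell
#!/bin/sh
# Build one install.packages() call covering all package names given.
r_install_expr() {
    pkgs=""
    for p in "$@"; do
        pkgs="${pkgs:+$pkgs, }'$p'"
    done
    printf "install.packages(c(%s), dependencies = TRUE, repos = 'https://cloud.r-project.org')\n" "$pkgs"
}

r_cmd=$(r_install_expr RColorBrewer ggplot2 ggthemes argparse reshape fivethirtyeight)
echo "$r_cmd"
# To execute it (requires R on PATH): Rscript -e "$r_cmd"
```

A non-interactive R session cannot prompt for a personal library, so running the expression through Rscript assumes a writable library directory already exists for the phlogiston user.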

A Postgres account for Phlogiston

The local account must have access to a PostgreSQL database for data storage and reporting. As root:

sudo su - postgres

createuser -s phlogiston

createdb -O phlogiston phlogiston

Superuser access is required because load_tables.sql installs the intarray PostgreSQL extension. It also allows the script to create or reset its own data tables. Avoid this setup on a shared database server, since a superuser role can read and modify every database on it.

Access to Phlogiston directories for PostgreSQL

The Phlogiston scripts run some commands on the PostgreSQL server, which runs as the postgres user. That user therefore needs access to the Phlogiston directories, which is granted through the phlogiston group.

sudo usermod -a -G phlogiston postgres

sudo service postgresql restart
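
To confirm the group change, the membership can be checked with id. This is a minimal sketch; the helper name user_in_group is hypothetical. Note that id reads the group database, so it reports the new membership even before the restart, while the running server only picks it up once restarted.

```shell
#!/bin/sh
# Succeed if the named user belongs to the named group.
user_in_group() {
    id -nG "$1" 2>/dev/null | tr ' ' '\n' | grep -qxF -- "$2"
}

if user_in_group postgres phlogiston; then
    echo "postgres is in the phlogiston group"
else
    echo "postgres is NOT in the phlogiston group"
fi
```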

Install Phlogiston

Get the Phlogiston code by cloning it from GitHub. As the phlogiston user:

sudo su - phlogiston

git clone https://github.com/wikimedia/phlogiston.git

exit

Set up web publishing of results

Configure Nginx

Configure Nginx to publish from the phlogiston html output directory. Create the following file as /etc/nginx/sites-available/phlogiston:

server {
        server_name localhost;
        listen 80 default_server;
        listen [::]:80 default_server ipv6only=on;
        root /home/phlogiston/html;
        index index.html index.htm;
        ssi on;
        location / {
                autoindex on;
                # First attempt to serve request as file, then
                # as directory, then fall back to displaying a 404.
                try_files $uri $uri/ =404;
                # Uncomment to enable naxsi on this location
                # include /etc/nginx/naxsi.rules
        }
}

Then run these commands to make Nginx use it:

sudo rm /etc/nginx/sites-enabled/default

sudo ln -s /etc/nginx/sites-available/phlogiston /etc/nginx/sites-enabled/

sudo service nginx restart
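
A small check can confirm that the site is actually enabled. The sketch below is an illustration, not part of Phlogiston. The helper name site_enabled and its optional second argument (an alternate Nginx directory, useful for testing) are hypothetical. The -ef comparison holds whether the entry under sites-enabled is a hard link or a symbolic link.

```shell
#!/bin/sh
# Succeed if sites-enabled/NAME refers to the same file as sites-available/NAME.
# $1 = site name, $2 = Nginx config directory (assumed default: /etc/nginx).
site_enabled() {
    nginx_dir=${2:-/etc/nginx}
    [ "$nginx_dir/sites-enabled/$1" -ef "$nginx_dir/sites-available/$1" ]
}

if site_enabled phlogiston; then
    echo "phlogiston site is enabled"
else
    echo "phlogiston site is not enabled"
fi
```

Running sudo nginx -t before the restart is also worthwhile, since it tests the configuration files for syntax errors without affecting the running server.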

Set up the reports home page

mkdir /home/phlogiston/html

cp /home/phlogiston/phlogiston/html/index.html /home/phlogiston/html/

cp /home/phlogiston/phlogiston/html/style.css /home/phlogiston/html/

Then edit /home/phlogiston/html/index.html to reflect the scopes being reported.

First run

sudo su - phlogiston

cd phlogiston

createdb phlogiston

python3 phlogiston.py --initialize

bash batch_phlog.bash -m reconstruct -l true -s xxx

where xxx is a correctly configured scope. Validate the results.

Automate daily reports

  1. Run a complete reconstruction for all scopes:
    1. bash ~/phlogiston/batch_phlog.bash -m reconstruct -l true -s xxx -s yyy
      1. where xxx and yyy are reporting scopes. Append as many -s zzz as needed.
  2. Create a cron job for the phlogiston user, of the form
    1. 15 4 * * * bash ~/phlogiston/batch_phlog.bash -m incremental -l true -s xxx -s yyy >>~/phlog.log 2>&1
    2. This runs daily at 4:15 am UTC. Schedule it to run right after the dump is generated from Phabricator, and as often as the dump is updated. (Phlogiston can take hours to run, so running more often than daily may not be practical without optimization.)
    3. The file ~/phlog.log can be inspected for status.
      1. In particular, grep Done ~/phlog.log will show one line per scope per reconstruction and/or report.
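
The steps above can be sketched as two small shell helpers: one that assembles the cron entry with one -s flag per scope, and one that counts the Done lines in the log. Both helper names (phlog_cron_line, count_done) are hypothetical, and xxx and yyy remain placeholders for real scopes.

```shell
#!/bin/sh
# Build the daily cron entry, adding one "-s SCOPE" per argument.
phlog_cron_line() {
    scopes=""
    for s in "$@"; do
        scopes="$scopes -s $s"
    done
    printf '15 4 * * * bash ~/phlogiston/batch_phlog.bash -m incremental -l true%s >>~/phlog.log 2>&1\n' "$scopes"
}

# Count the lines containing "Done" in the given log file.
count_done() {
    grep -c Done "$1"
}

phlog_cron_line xxx yyy
```

One way to install the generated entry without opening an editor is (crontab -l 2>/dev/null; phlog_cron_line xxx yyy) | crontab - run as the phlogiston user. After a run, count_done ~/phlog.log gives the number of completed reconstruction or report steps.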

How to use on Phabricator instances other than the Wikimedia Foundation's

Untested:

1) Set up a dump script on Phabricator, like this one, to generate dumps like this one.

2) Customize batch_phlog.bash to point to the new dump file.