User:KartikMistry/TPA

Initial setup
Log in to stat1007:

Create a virtualenv:
Activate the virtual environment:

Now, install jupyter notebook:

Next, add the following lines to your .profile file:

You can additionally add these two lines to make your life easier:

Close the session, and you will have everything configured.

Starting notebook
Check how long your Kerberos ticket remains valid first. The default lifetime is currently 48 hours.

Extend it by running kinit:
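A minimal sketch of that check, assuming the standard Kerberos client tools (`klist`, `kinit`) are installed. `klist -s` prints nothing and signals through its exit status whether the credential cache holds a valid ticket. The function name `ticket_status` is hypothetical and used only for illustration.

```bash
# Hypothetical helper: report whether a valid Kerberos ticket exists.
ticket_status() {
    if ! command -v klist >/dev/null 2>&1; then
        echo "klist not found"
    elif klist -s; then
        # -s is silent; the exit status alone says the ticket is still valid.
        echo "ticket valid"
    else
        echo "no valid ticket: run kinit"
    fi
}

ticket_status
```

Running plain `klist` prints the expiry time of the current ticket, and `kinit` prompts for your password and obtains a fresh one.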

Now you can log in again, and from then on you only need to run this:

Press ESC, and check which port the Jupyter notebook is running on (usually 8888 or 8889; in this example it is 8889).

Then, on your local machine, create a tunnel by running:

You can then open the notebook in your browser at:
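As a hedged sketch of the tunnel and the resulting URL: the helpers below only print the command and address for a given port, so they are safe to run anywhere. The full host name `stat1007.eqiad.wmnet` and the function names are assumptions, not taken from this page. Adjust them to your own SSH setup.

```bash
# Assumption: stat1007 is reachable as stat1007.eqiad.wmnet through your
# SSH configuration. Override REMOTE_HOST if your setup differs.
REMOTE_HOST="${REMOTE_HOST:-stat1007.eqiad.wmnet}"

# Print the tunnel command for a given notebook port (default 8889).
tunnel_command() {
    port="${1:-8889}"
    # -N: run no remote command; -L: forward the local port to the remote port.
    printf 'ssh -N -L %s:127.0.0.1:%s %s\n' "$port" "$port" "$REMOTE_HOST"
}

# Print the local URL that the tunnel exposes.
notebook_url() {
    printf 'http://localhost:%s\n' "${1:-8889}"
}

tunnel_command 8889
notebook_url 8889
```

Run the printed `ssh` command in a separate local terminal and leave it open while you work. The notebook is then served at the printed URL.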

Running scripts
1. Run all notebooks in order.

2. 00ExtractNamedTempates.ipynb overwrites its existing output if it is run again, so it is better to save the JSON files it produces somewhere else to save time.
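One way to keep those JSON files safe before a re-run is to copy them into a timestamped directory. This is a sketch only: the helper name `backup_json` and the backup location are hypothetical, and it assumes the output files end in `.json` and sit in a single directory.

```bash
# Hypothetical helper: copy every *.json file from an output directory into a
# timestamped backup directory, so a re-run cannot overwrite the only copy.
backup_json() {
    src="$1"
    dest="$2/json-backup-$(date +%Y%m%d-%H%M%S)"
    mkdir -p "$dest" || return 1
    for f in "$src"/*.json; do
        # Skip the unexpanded pattern when no JSON files exist.
        [ -e "$f" ] || continue
        cp -p "$f" "$dest/"
    done
    # Print the backup directory so the caller can record it.
    echo "$dest"
}
```

For example, `backup_json . "$HOME/tpa-backups"` copies the JSON files in the current directory and prints where they were saved.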

Using Python
1. Convert ipynb to Python files:

```bash
jupyter nbconvert --to python nb.ipynb
```

2. Update config.json and remove unneeded pairs.

3. Put Wikipedia dumps under: `templatesAlignment/../../dumps/%swiki/latest/` only.

4. Rename each dump so that its dump date reads `latest`, which simplifies running the scripts.

5. Run all scripts in order.
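The steps above can be sketched as two small helpers. The first expands the `%s` placeholder in the dump path for one language code. The second runs the converted scripts in sorted order, which matches the intended order only if the files carry numeric prefixes such as `00`, `01`, and so on (an assumption based on the one notebook name given here). The function names, the `RUNNER` variable, and the language code `en` are illustrative, not part of the project.

```bash
# Expand the %s placeholder in the dump path for one language code.
dump_dir() {
    printf 'templatesAlignment/../../dumps/%swiki/latest/\n' "$1"
}

# Run every converted script in a directory in sorted order.
# RUNNER defaults to python3; set RUNNER=echo for a dry run.
run_in_order() {
    dir="${1:-.}"
    for script in "$dir"/*.py; do
        # Skip the unexpanded pattern when no scripts exist.
        [ -e "$script" ] || continue
        # Stop at the first failing script so later ones do not run on bad input.
        "${RUNNER:-python3}" "$script" || return 1
    done
}

dump_dir en
```

Calling `run_in_order .` with `RUNNER=echo` set first lists the scripts in the order they would run, without executing them.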

See also

 * Issues related to Kerberos access: https://wikitech.wikimedia.org/wiki/SWAP#Access_and_infrastructure