Manual:Pywikibot/Installation

Initial setup
First, choose which branch of Pywikibot (PWB) you want to use:
 * Core (formerly "rewrite"): better suited to running on WMF projects (like Wikipedia), but note that not all scripts have been migrated to this branch yet, so some scripts may not be available.
 * Compat (formerly "trunk"): old and slow, but much easier to use on non-WMF projects (for example, wikis without API support), and all of the scripts exist in this branch.

Depending on which branch you want to install and which OS you are using, there are three things to do:
 * 1) Download Python (for Windows users) or update it, if needed (for Mac users)
 * 2) Install httplib2 (core only, not compat) or BeautifulSoup (compat only, not core)
 * 3) Download pywikibot

After that, you just need to create a file named "user-config.py". You do NOT need to "install" pywikibot in order to use it; you can run it directly from the directory you downloaded it to (via Git or a nightly release).
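As a rough illustration of what such a file contains, the sketch below writes a minimal user-config.py. The settings family, mylang and usernames are the usual Pywikibot configuration variables; the helper name write_user_config, the example wiki (English Wikipedia) and the bot name ExampleBot are placeholders invented for this example, so adapt them to your own wiki and account.

```python
"""Sketch: create a minimal user-config.py (all values are placeholders)."""
import os
import tempfile

# Text of the generated file. "usernames" is provided by Pywikibot when it
# loads user-config.py, so the generated file is not meant to run standalone.
TEMPLATE = (
    "# -*- coding: utf-8 -*-\n"
    "family = '{family}'\n"
    "mylang = '{lang}'\n"
    "usernames['{family}']['{lang}'] = u'{username}'\n"
)


def write_user_config(directory, family, lang, username):
    """Write user-config.py into `directory` and return its path.

    Refuses to overwrite an existing file so a working setup is not lost.
    """
    path = os.path.join(directory, "user-config.py")
    if os.path.exists(path):
        raise IOError("%s already exists" % path)
    with open(path, "w") as handle:
        handle.write(TEMPLATE.format(family=family, lang=lang,
                                     username=username))
    return path


if __name__ == "__main__":
    # Demo run into a scratch directory; pass your pywikibot directory
    # and your own (hypothetical here) wiki and bot account instead.
    scratch = tempfile.mkdtemp()
    print(write_user_config(scratch, "wikipedia", "en", "ExampleBot"))
```

Running a bot script in a fresh copy also asks the same questions interactively, so writing the file by hand is only one of the possible routes.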

Download Python
Requirement: Pywikibot needs Python v2.7.2 or higher to run; Python v3.x is not currently supported.


 * For Windows, download the latest Python v2.x release (not a 3.x version!) from the Python website and install the program.
 * On Mac OS X and most Unix systems, Python is already present, although it might be necessary or recommended to update it if you have a very old version.
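To confirm that an existing interpreter meets the requirement stated above, a small check along the following lines can help. It is only a sketch: the function name is_supported is made up for this example, and the version bounds simply restate the requirement from this page.

```python
"""Sketch: check the running interpreter against the stated requirement."""
import sys

MINIMUM = (2, 7, 2)      # lowest supported release, per the requirement above
UNSUPPORTED = (3, 0, 0)  # Python 3.x is not supported


def is_supported(version):
    """Return True if `version` (a 3-tuple) satisfies 2.7.2 <= version < 3."""
    return MINIMUM <= tuple(version[:3]) < UNSUPPORTED


if __name__ == "__main__":
    current = tuple(sys.version_info[:3])
    verdict = "supported" if is_supported(current) else "not supported"
    print("Python %d.%d.%d is %s" % (current + (verdict,)))
```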

Download Pywikibot
The easiest way to download Pywikibot is to use the latest nightly release. Just download the pywikibot zip file to your computer and decompress the file - there is no further installation required.

Download Pywikibot with Git
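To install with Git, clone the repository for your branch:
 * For compat (formerly "trunk"): git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/compat.git
 * For core (formerly "rewrite"): git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/core.git

The --recursive option automatically installs the required submodules (like i18n messages or spelling). If you did not use this option, you must install the externals after cloning. There are two (i18n is required even for English-language bots).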

Download Pywikibot with SVN
If you don't want to use Git, you can still use SVN. To install with SVN, run the checkout command for your branch:
 * For core (formerly rewrite)


 * For compat (formerly trunk)

Updating the code

Download Pywikibot with TortoiseSVN for Windows users
If you like using TortoiseSVN you may use it as follows:


 * For core release (formerly rewrite)
 * 1) Right-click on your preferred directory and execute the TortoiseSVN checkout command
 * 2) In the repository URL field, paste the URL
 * 3) In the checkout directory field, change the default directory if you like
 * 4) Confirm the dialog


 * For compat release (formerly trunk)
 * 1) Right-click on your preferred directory and execute the TortoiseSVN checkout command
 * 2) In the repository URL field, paste the URL
 * 3) In the checkout directory field, change the default directory if you like
 * 4) Confirm the dialog

 * Updating the working copy: right-click on your working copy and choose the TortoiseSVN update command.

Shortcut in command line
To develop your own source code outside of the pywikibot source directory, add an export PYTHONPATH line that points at your pywikibot directory to a file that is run on login, usually ~/.bashrc. This avoids typing the export PYTHONPATH part each time you log in. Naturally, change the paths to match your installation.

Similarly, you can set the PYWIKIBOT_DIR environment variable to specify the directory in which user-specific information is stored (in particular, user-config.py which contains login data for the bot).
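The effect of the two variables can be sketched in Python. This is only an illustration, not the framework's own lookup code: the helper names add_to_path and config_dir are invented here, and falling back to the current directory when PYWIKIBOT_DIR is unset is an assumption made for the example.

```python
"""Sketch: what PYTHONPATH and PYWIKIBOT_DIR achieve, in plain Python."""
import os
import sys


def add_to_path(source_dir, path=None):
    """Prepend source_dir to the import path unless it is already there.

    This mimics, at run time, what an exported PYTHONPATH does at startup.
    """
    path = sys.path if path is None else path
    source_dir = os.path.abspath(os.path.expanduser(source_dir))
    if source_dir not in path:
        path.insert(0, source_dir)
    return source_dir


def config_dir(environ=None):
    """Return the directory expected to hold user-config.py.

    Assumption for this sketch: use PYWIKIBOT_DIR when it is set,
    otherwise fall back to the current working directory.
    """
    environ = os.environ if environ is None else environ
    return os.path.abspath(environ.get("PYWIKIBOT_DIR", os.getcwd()))


if __name__ == "__main__":
    print("user-config.py is expected in: %s" % config_dir())
```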

Windows users: create a shortcut
How to make a quick shortcut to run commands (Windows users):

If you're installing Pywikibot in a folder such as "My Documents" it may be troublesome to repeatedly use the "chdir" command to go into the folder to run the bots.

On Windows you can create a shortcut that opens a command prompt in the pywikibot folder, so that you can run bots easily. Just follow these steps to create one:
 * 1) Right-click the folder pywikibot is installed in.
 * 2) Click "Create shortcut". A new shortcut icon marked with an arrow will be created.
 * 3) Right-click on the new shortcut, and click "Properties".
 * 4) In the properties window, type cmd.exe in the "Target" box.
 * 5) In the "Start in" box, enter the full path of the pywikibot folder.
 * 6) Click "OK".
 * 7) Click the shortcut, and cmd.exe opens with the full path listed.
 * If you open the properties of this shortcut again, you will notice that the shortcut icon has changed.

Updating

 * If you used Git to download Pywikibot, go to your pywikibot directory and run the Git update command (typically git pull).
 * If you are using a nightly version, the process is a bit more complicated. You have to re-download a full copy from the nightly download page. Before installing it, back up your configuration files and scripts (user-config.py, any family file or custom script that you might have created, and any current XML dump file you are using for a wiki). Replace your pywikibot directory with the new version you just downloaded, then restore your configuration files. If you are not sure what you are doing, do not erase your old pywikibot directory but keep a complete backup of it, to avoid losing any important files.
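The backup step for nightly users can be scripted. The following is a minimal sketch, not an official tool: the function name backup_files and the file patterns are assumptions about a typical setup, and custom scripts are not matched by the default patterns, so extend the list to cover whatever you have created yourself.

```python
"""Sketch: back up personal files before replacing a nightly copy."""
import glob
import os
import shutil

# Patterns for files worth keeping. The family-file and dump patterns are
# assumptions about common naming; adjust them to your own setup.
PATTERNS = ("user-config.py", "*_family.py", "families/*_family.py", "*.xml")


def backup_files(source_dir, backup_dir, patterns=PATTERNS):
    """Copy files matching `patterns` from source_dir into backup_dir.

    Relative paths are preserved; returns the sorted relative paths copied.
    """
    copied = []
    for pattern in patterns:
        for path in glob.glob(os.path.join(source_dir, pattern)):
            if not os.path.isfile(path):
                continue
            relative = os.path.relpath(path, source_dir)
            target = os.path.join(backup_dir, relative)
            target_dir = os.path.dirname(target)
            if not os.path.isdir(target_dir):
                os.makedirs(target_dir)
            shutil.copy2(path, target)
            copied.append(relative)
    return sorted(set(copied))
```

After unpacking the new nightly, the same function can be called with the two directories swapped to restore the files.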

Automatic Updating on Tool Server
For automatic updating, you can create an update bash file, put it in your root directory, and fill it with the update commands. Then run crontab -e and add an entry that runs it every day at 00:00 (midnight). Note: in these commands, username stands for your own username.

Dependencies
The pywikibot framework is a large and complex code base and therefore needs external Python modules (libraries) from other sources in order to work properly. The dependencies can be installed manually or automatically (automatic installation is not supported by core yet).

If any issues arise during the installation of dependencies, please file a bug report or write to the pywikipedia-l mailing list.

Automatic (recommended)
If available, this is the recommended way, because it results in an identical setup on all machines. All you have to do is run your favorite script after installation, and pywikibot will ask you whether you want to install missing packages.

Packages will be installed from the OS package management if possible (all Linux distributions, not on Windows). If they cannot be found there, they will be downloaded as an archive from the original source, extracted and installed. In the course of this process a few packages have to be slightly modified in order to work seamlessly with pywikibot. This modification needs an additional binary tool called patch (patch.exe on Windows). Unfortunately this is not available from Microsoft, so we use a Windows port of the original Linux code called gnuwin32 patch.exe:

Patch for Windows
 * Patch: apply a diff file to an original
 * Version: 2.5.9
 * Homepage: http://www.gnu.org/software/patch/patch.html (sources freely available)
 * Description: `patch' takes a patch file containing a difference listing produced by diff and applies those differences to one or more original files, producing patched versions.
 * Win32, i.e. MS-Windows 95 / 98 / ME / NT / 2000 / XP / 2003 / Vista with msvcrt.dll and msvcp60.dll. If msvcrt.dll or msvcp60.dll is not in your Windows/System folder, get them from Microsoft, or (msvcrt.dll only) by installing Internet Explorer 4.0 or higher.

It is worth mentioning here that - despite the OS package management "install mode" - all files are installed locally into the externals/ directory of pywikibot. This is a very useful feature for users that do not have permission to install software to their system, e.g. non-admins.

Manual (for experts)
In order to install the needed packages manually, you first need to know which ones. Which ones you really need depends strongly on the script you intend to run; if you are unsure, use the automatic mode above. To check that the installation is correct, just run a bot script. If the dependencies are satisfied, everything will be OK; otherwise the framework will complain and ask whether it should install the missing packages automatically. A full list of all needed modules can be found in externals/__init__.py and contains:
 * framework core code:
 * i18n [git submodule]
 * spelling [git submodule]
 * httplib2 [git submodule]
 * BeautifulSoup.py
 * patch.exe
 * depending on which script will be used:
 * opencv, opencv/haarcascades [git submodule]
 * pycolorname [git submodule]
 * irclib
 * mwparserfromhell
 * parse-crontab
 * odfpy
 * openpyxl
 * python-colormath
 * jseg, jseg/jpeg-6b
 * mlpy
 * music21
 * ocropus
 * pydmtx
 * py_w3c
 * zbar
 * (slic)
 * (bob, xbob_flandmark)
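Before running a script, you can get a quick overview of which modules are importable at all. The sketch below is an assumption-laden helper, not part of the framework: the function name missing_modules is invented, and the import names in CANDIDATES are guesses, since a package's import name can differ from the name used in the list above.

```python
"""Sketch: report which optional modules cannot be imported."""
import importlib

# Import names are assumptions; check each package's documentation.
CANDIDATES = ["httplib2", "BeautifulSoup", "mwparserfromhell", "irclib"]


def missing_modules(names):
    """Return the names from `names` that fail to import, in order."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except Exception:
            # Any failure to import (absent or broken) counts as missing.
            missing.append(name)
    return missing


if __name__ == "__main__":
    for name in missing_modules(CANDIDATES):
        print("missing: %s" % name)
```

This only checks importability from the current Python path; modules that pywikibot keeps locally in externals/ may not be visible to a script run from another directory.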

Setup on Wikimedia Labs/Tool Labs server
In order to install your bot onto the Wikimedia servers and run it from there, make sure first to become familiar with Wikimedia Labs/Tool Labs environment.

In the next step you have to request several accounts (for labs, for the tools project, your tool), provide an ssh key and so on. How to do this and then proceed, is described in full detail in Setup pywikibot on Labs.

The Pywikibot source repository has moved from SVN to Git; please see Manual:Pywikipediabot/Gerrit first.

The bots project has become obsolete; use the tools project now. To do so, follow Tools/Help to get an account, then create your tool (service group).

If you used the Toolserver in the past and know how everything used to work there, see migrating from toolserver for more info.

Now you are ready to start. Log in to the Labs tools project:

$ ssh USERNAME@tools-login.wmflabs.org

Switch to the tool account with:

maintainer@tools-login:~$ become toolname
local-toolname@tools-login:~$

Now install/clone the pywikibot code to your tool account as described below.

core
Similar to the instructions given in this mail do:

$ git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/core.git pywikibot-core
$ cd pywikibot-core

Then you might want to compress the code down to the necessary parts (this is what you definitely wanted to do on the Toolserver (TS), but on Labs it is not needed):

$ git gc --aggressive --prune
$ cd scripts/i18n/
$ git gc --aggressive --prune
$ cd ../../externals/httplib2/
$ git gc --aggressive --prune

This results in a repository of about 9 MB. Now you have to set up pywikibot by running your favourite bot script, e.g.:

$ python pwb.py clean_sandbox.py -simulate

Since you are doing this in a fresh clone, it will trigger a number of questions about how you want to configure your local copy; answer them carefully in order to proceed. Alternatively, if you already have config files from a previous version, you can copy those existing config files into the right places (e.g. pywikibot-core/) instead.

Depending on which bot scripts you want to run, you might also have to set up all externals properly, which still has to be done manually in core:

$ cd externals
$ cat README

and follow the instructions there.

You will also have to enter the password for your bot eventually.

Now you have finished the configuration of core and can continue setting up the jobs to execute.

compat
Follow the instructions given in this mail and do:

$ git clone --recursive https://gerrit.wikimedia.org/r/pywikibot/compat.git pywikibot-compat

Then you might want to compress the code down to the necessary parts (this is what you definitely wanted to do on the Toolserver (TS), but on Labs it is not needed):

$ cd pywikibot-compat
$ cd i18n/
$ git gc --aggressive --prune
$ cd ../externals/opencv/
$ git gc --aggressive --prune
$ cd ../pycolorname/
$ git gc --aggressive --prune
$ cd ../spelling/
$ git gc --aggressive --prune

(A first 'git gc --aggressive --prune' in the pywikibot-compat directory is not needed anymore.)

This results in a repository of about 25 MB. Now you have to set up pywikibot by running your favourite bot script, e.g.:

$ python pwb.py clean_sandbox.py -simulate

as described in the core section above.

You may set up all externals manually if you want, but this is not needed in compat; see Manual:Pywikipediabot/Installation for further info.

You will also have to enter the password for your bot eventually.

Now you have finished the configuration of compat and can continue setting up the webspace and jobs to execute.

setup web-space
By default, the directory listing on http://tools.wmflabs.org/TOOLNAME is disabled. If you want to allow it for all users, log in to your tool account (as already described) and run:

$ cd ~/public_html
$ echo Options +Indexes > .htaccess

If you run a bot with the -log option, you will find the log files within the logs/ directory. If you want to allow users to access them from the web, run:

$ cd ~/public_html
$ mkdir logs
$ cd logs
$ ln -s ~/pywikibot-core/logs core

If you want a specific file type to be handled differently by your browser, e.g. .log files treated like text files, use:

$ echo AddType text/plain .log > .htaccess

and do not forget to clear your browser's cache afterwards.

Next you might want to consider your cgi-bin directory:

$ cd ~/cgi-bin

Follow the hints given at wikitech:Nova Resource:Tools/Help exactly. For example, the two commands

$ /usr/bin/python     # valid
$ /usr/bin/env python # invalid

work and do the same thing in a shell, but only the first one is valid and works here; the second is invalid! Another point to mention is that PHP scripts go into public_html, not cgi-bin. Python scripts, on the other hand, can be placed in public_html or cgi-bin as you wish. I would recommend using public_html for documents and keeping it listable, whereas cgi-bin should be used for CGI scripts and be protected (not listable).

setup job submission
In order to set up the submission of the jobs you want to execute and to use the grid engine, you should first consult wikitech:Nova Resource:Tools/Help and, if you are familiar with the Toolserver and its architecture, Migrating from toolserver as well.

In general, Labs uses SGE and its commands such as qsub. These are explained in this document, which you should use to get an idea of which command and which parameters you want to use.

An infinitely running job (e.g. an IRC bot) like this (cronie entry from the TS submit host):

06 0 * * * qcronsub -l h_rt=INFINITY -l virtual_free=200M -l arch=lx -N script_wui $HOME/rewrite/pwb.py script_wui.py -log

becomes

$ jsub -once -continuous -l h_vmem=256M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log

or shorter

$ jstart -l h_vmem=256M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log

The first expression is good for debugging. Memory values smaller than 256 MB do not seem to work here, since that is the minimum. If you experience problems with your jobs, such as

Fatal Python error: Couldn't create autoTLSkey mapping

you can try increasing the memory value. This is also needed here, because this script uses a second thread for timing and this thread needs memory too. Therefore, finally use:

$ jstart -l h_vmem=512M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log

Now, in order to create a crontab, follow Scheduling jobs at regular intervals with cron and set up your crontab file like this:

$ crontab -e

and enter

PATH=/usr/local/bin:/usr/bin:/bin
06 0 * * * jstart -l h_vmem=512M -N script_wui python $HOME/pywikibot-core/pwb.py script_wui.py -log

additional configuration
Furthermore additional tools to support you and your bot at work are available:
 * wikitech:Nova Resource:Tools/Help, basically out-of-the-box but just for a short time period
 * wikitech:Nova Resource:Tools/Help
 * Gerrit/New repositories
 * Git/New repositories/Requests