Manual:Pywikibot/Mac

From MediaWiki.org

If you need more help setting up your pywikipediabot, visit the IRC channel #pywikipediabot on the freenode server or the pywikipediabot mailing list.

This is a complete step-by-step guide on how to install and run your own pywikipedia bot on Wikimedia projects from a Mac.

Revision (3 July 2013) Content revised to reflect OSX Mountain Lion 10.8.4.

All of the Bot software can be run on your local Mac against the particular Wikimedia server -- a "Shell Account" on that Wikimedia server is not necessary. However, a "Bot Account" is!

  • You can also run against a MediaWiki installation running on your own Mac under OSX Server!
(OSX Server includes a standard distribution wiki. The MediaWiki software needs to be installed separately.)
  • While a number of the actions can be performed directly from the Finder, most require that you are familiar with the use of the Terminal window. This provides what Unix and Linux users call "Shell access." The OSX Terminal application defaults to bash (the Bourne-Again shell).
  • It is also recommended that you are conversant with a "Text Editor," not a "Word Processor."
While "Text Edit" is an application supplied by Apple, it is NOT well suited to most of the work which needs to be done.
One solution is to use one of the "traditional" command-line editors supplied with OSX, pico (nano) or emacs; another is to use a full GUI-based text editor such as TextWrangler (free) or BBEdit (commercial) from Bare Bones Software.
Experienced users can also use the facilities of Xcode.

Downloading

Python

Apple provides an installation of Python with every Mac OS X installation, by default. You don’t need to install Python.

  • OSX 10.8.4 (Mountain Lion) ships with both Python and Pythonw 2.6 and Python and Pythonw 2.7, all provided by Apple. The default is Python 2.6; run "man python" in Terminal for details on switching between the two if necessary.
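To see which interpreter the "python" command actually starts on your Mac, a short check such as the following may help. It is a minimal sketch; the helper name describe_interpreter is made up for this example and is not part of the framework.

```python
import sys


def describe_interpreter():
    """Return the running interpreter's version as a 'major.minor' string."""
    return "%d.%d" % (sys.version_info[0], sys.version_info[1])


if __name__ == "__main__":
    # The result depends on which interpreter runs this file.
    print("Running Python " + describe_interpreter())
```

Save it under any name you like and run it with "python" followed by that file name.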

SVN

Many Wikimedia projects require that you run the latest version of the Pywikipedia framework. The easiest option is to use SVN.

  • OSX 10.8.4 (Mountain Lion) provides (i.e. provided by Apple) SVN: Subversion command-line client, version 1.6.18
Note that if you are familiar with Xcode, you can also use it for many of these actions. However, using Xcode is beyond the scope of this article.

Downloading the Pywikipediabot Scripts

Open Terminal.

(In Lion (10.7) and Mountain Lion (10.8) Terminal is found under "/Applications/Utilities" in the Finder. It does not appear in the Launch Pad)

Copy-paste the following into the Terminal window, and press enter/return:

$ svn checkout http://svn.wikimedia.org/svnroot/pywikipedia/trunk/pywikipedia/ pywikipedia

Using Apple's SVN under Mountain Lion 10.8.4, the following Error will be generated by Apple's security interface (Gatekeeper):

Error validating server certificate for 'https://svn.toolserver.org:443':
 - The certificate is not issued by a trusted authority. Use the
   fingerprint to validate the certificate manually!
Certificate information:
 - Hostname: *.toolserver.org
 - Valid: from Sun, 26 May 2013 09:34:24 GMT until Wed, 28 May 2014 17:08:01 GMT
 - Issuer: GeoTrust, Inc., US
 - Fingerprint: bf:35:7f:3e:62:4b:89:6c:bc:39:c9:c3:38:81:9e:53:26:43:be:f4
(R)eject, accept (t)emporarily or accept (p)ermanently? 

Responding with "p" or "t" may then result in:

svn: warning: Error handling externals definition for 'pywikipedia/externals/pycolorname':
svn: warning: OPTIONS of 'https://svn.toolserver.org/svnroot/drtrigon/externals/pycolorname': 
Could not read status line: connection was closed by server (https://svn.toolserver.org)
(Note: This is assumed to be a transient SVN error and will likely go away with the next SVN update.)

The length of time for the download will depend upon the speed of your Internet connection. It will take a few minutes.

A folder will be created in your “home” folder (look in the left column in the Finder for the house icon and your username), with the name “pywikipedia”, in which all the scripts will be when the downloading is finished. You can see when it’s finished by looking for the following in the last line in Terminal:

NAME_OF_COMPUTER:~ USER_NAME$

This is the same text that appears when you open Terminal (it is called the command line or terminal prompt). In the text below, $ is used to indicate the prompt.

Before you begin

There are a number of files in the SVN download to which one needs to pay attention. While they can be read from the Finder, you cannot perform the necessary actions on them from there.

  1. README
  2. CONTENTS
  3. docs/README
Note that when viewed from the Finder, the directory (folder) is displayed in CASE INSENSITIVE alphabetical order (upper and lower case are intermixed), while when viewed from a "ls" in Terminal, they appear in CASE SENSITIVE order -- i.e. the Capitalized names will appear at the top.
  • The top-level README file is a standard inclusion in such software distributions; in this case, however, its contents are purely cosmetic and provide no useful information.
  • The CONTENTS file contains important information describing the individual files in the distribution.

The first step in installing pywikipedia is to run the Python script generate_user_files.py. This cannot be done from the Finder.

Note that if you have Xcode installed, double clicking on this script from the Finder window will launch Xcode.
  • In a Terminal window:
$ cd pywikipedia
$ python generate_user_files.py

This produces the dialog shown after the following notes:

Note 1: For your first attempt, generate both files.
Note 2: The wikis listed are those advertised to MediaWiki.org; for a private wiki, select "27 - test".
Note 3: The default language is English.
Note 4: For "Username", enter the username associated with your bot; typically <yourWikiUserid-bot>.
Note 5: Choose the "Small" (S) version of the config file for your first attempt. The Extended version contains many options beyond the scope of this article.
1: Create user_config.py file
2: Create user_fixes.py file
3: The two files
What do you do? 3 <-- Note 1: generate both files 
1: anarchopedia
2: battlestarwiki
3: botwiki
4: celtic
5: commons
6: fon
7: gentoo
8: i18n
9: incubator
10: krefeldwiki
11: lockwiki
12: loveto
13: lyricwiki
14: mediawiki
15: memoryalpha
16: meta
17: mozilla
18: oldwikivoyage
19: omegawiki
20: openttd
21: osm
22: piratenwiki
23: southernapproach
24: species
25: strategy
26: supertux
27: test
28: twcareer
29: ubuntutw
30: uncyclopedia
31: vikidia
32: wekey
33: wesolve
34: wikia
35: wikibond
36: wikibooks
37: wikidata
38: wikimediachapter
39: wikinews
40: wikipedia
41: wikiquote
42: wikisource
43: wikitech
44: wikitravel
45: wikitravel_shared
46: wikiversity
47: wikivoyage
48: wiktionary
49: wowwiki
Select family of sites we are working on (default: wikipedia): 27 <-- Note 2:  use Test to start
The language code of the site we're working on (default: 'en'):  <-- Note 3: (hit return to accept the default)
Username (en test): <wiki-bot username> <-- Note 4: your bot's username -- typically <yourWikiUserid-bot>
Which variant of user_config.py:
[S]mall or [E]xtended (with further information)? S <-- Note 5: Choose the Small version
'user-config.py' written.
'user-fixes.py' written.

Now you can begin configuring your bot.

Configuring

The user-config.py created by the "Small" option consists of 4 lines:

# -*- coding: utf-8 -*-
family = 'test'
mylang = 'en'
usernames['test']['en'] = u'username-bot'
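
Note that the file assigns to usernames without ever defining it: the framework appears to prepare that dictionary before it executes user-config.py. The following is a simplified sketch of the idea, not the framework's actual loader; load_user_config and the pre-filled family list are assumptions made for this example.

```python
def load_user_config(source, families=("test", "wikipedia")):
    """Execute user-config style text in a prepared namespace (illustration only)."""
    # Pre-create one empty dict per family so that
    # usernames['test']['en'] = ... works without any setup in the file.
    namespace = {"usernames": {name: {} for name in families}}
    exec(source, namespace)
    return namespace


# Hypothetical file contents, matching the "Small" variant shown above.
EXAMPLE = """
family = 'test'
mylang = 'en'
usernames['test']['en'] = u'username-bot'
"""

config = load_user_config(EXAMPLE)
print(config["family"], config["mylang"], config["usernames"]["test"]["en"])
```

This also suggests why a username line for a family the framework does not know about would fail: there would be no inner dictionary to assign into.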

You need to configure (customize) your bot before you can use it.

This can be done using the applications Text Edit or TextWrangler, or the command-line tools emacs or pico (nano) in a Terminal window.

If you choose to use Text Edit or TextWrangler, simply use the Finder to navigate to user-config.py in the pywikipedia folder, and drag and drop that file on top of the appropriate icon.

If your language uses non-ASCII characters, make certain to retain the first line in the file:

# -*- coding: utf-8 -*-

Edit or add the following lines to user-config.py as appropriate for your Bot:

Parameter Explanation
family = 'wikipedia'
mylang = 'en'
(Required)

The value of mylang is the main language you want your bot to work on: the language code of the Wikimedia project on which the bot is going to run.

We selected "en" or English (the default) in the generate_user_files.py script.[1]

Family is the project name.[2]

We selected "27 test" as the Family of wikis when we ran the setup script generate_user_files.py. If the bot’s main wiki is for instance Wikibooks, edit this line in user-config.py:

family = 'wikibooks'
  • NOTE: If you’re running your bot on Commons, both mylang and family are to be set to “commons”.
  • ALSO NOTE: To run a bot on your local wiki, you will likely need to create a "family file" named "families/xxx_family.py", where xxx is the name of your local wiki: the same name you would enter into this variable, replacing "test". See: families/README-family.txt
usernames['wikipedia']['en'] = u'ExampleBot'
(Required)

Your user-config.py file needs to specify the bot's username.

In this example, the user is working on English Wikipedia, and has created a bot account with the username "ExampleBot".[3] [4]

usernames['FAMILY']['MYLANG'] = u'ExampleBot'
  • “FAMILY” and “MYLANG” are the same as you defined in the mylang and family options above.
We selected the “FAMILY” to be "test" when we ran the first setup script.
If you’re running the bot on Wikipedia, the family option should be “wikipedia” in the above line (see examples below).
  • This line can be defined as many times as you like (but only once per wiki Family), for each of your bots on different WM wikis.
  • The default for “MYLANG” will be what’s defined in mylang, except when running interwiki; the bot will edit on all the wikis defined in user-config.py.

usernames['wikipedia']['de'] = u'BeispielBot'
usernames['wikipedia']['en'] = u'ExampleBot'
usernames['wiktionary']['de'] = u'BeispielBot'

(Optional)

If you are working on more than one Wikimedia project, you can also add several usernames.

console_encoding = 'utf-8'

(Optional but recommended)

It is recommended that you change the console_encoding to UTF-8. Especially if you are editing a wiki with non-ASCII characters, this is important. If you don’t change the encoding, you’ll end up with “?”s all over the place. For example, the Spanish name María will render like Mar?a.
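
The “?” effect is easy to reproduce. The sketch below encodes the same name for a console that can only show ASCII and for one that uses UTF-8; it only illustrates the encoding step and does not use the framework.

```python
# -*- coding: utf-8 -*-
name = u"Mar\u00eda"  # "María", written with an escape so this file stays ASCII

# A console limited to ASCII cannot represent the accented letter,
# so the encoder substitutes a question mark.
ascii_bytes = name.encode("ascii", "replace")

# With UTF-8 the accented letter survives a round trip unchanged.
utf8_bytes = name.encode("utf-8")

print(ascii_bytes.decode("ascii"))
print(utf8_bytes.decode("utf-8") == name)
```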

textfile_encoding = 'unicode_escape'

(Optional but rarely needed)

Set "textfile_encoding" only if this is the encoding used by your system.

sort_ignore_case = True

(Optional)

Some scripts may use this for sorting, e.g. solve_disambiguation.py. The default is False. Capitalized titles will precede uncapitalized ones if this key is False or omitted; if it is True, capitalization is disregarded when sorting.
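
The difference between the two orderings can be seen with plain Python sorting. This is only an illustration of the two behaviours with made-up titles; how an individual script applies the setting is not shown here.

```python
titles = ["apple", "Banana", "cherry", "Apple"]

# Plain sorting compares character codes, so every capitalized
# title comes before every uncapitalized one.
case_sensitive = sorted(titles)

# Comparing lowercased keys disregards capitalization.
case_insensitive = sorted(titles, key=lambda title: title.lower())

print(case_sensitive)
print(case_insensitive)
```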

use_api_login = True

(Optional)

Because of some problems with SUL bots, the bot must log in via API. See: #Registering the username on the wanted wikis below

log = ['*']

(Optional)

Enable logging for all scripts. Logs will be stored in the logs folder.

  • Remember to save your changes to user-config.py.

Notes

  1. If you want to work with more than one language, choose the most common one. You can override this on the command line by using the -lang:zh parameter.
  2. Meta uses 'meta' for both language code and wiki family, Commons uses 'commons' for both, and Testwiki uses 'test' for both, the multilingual wikisource uses '-' for the language. You can override this on the command line by using -family:wikibooks.
  3. The 'u' in front of the username stands for Unicode. The 'u' is required if your username contains non-ASCII characters.
  4. Note that on Linux/Unix hosts username capitalization matters! While logging in may not be an issue, testing the login or attempting to use a bot with the wrong capitalization will not use the correct cookie file and may result in anonymous access to the API. This can cause problems for private wikis that do not allow anonymous access or that use third-party authentication. Default usernames for MediaWiki, and those pulled via LDAP or other third-party authentication schemes, have an uppercase first letter; thus 'user' becomes 'User'.
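
The capitalization rule in note 4 can be sketched as follows. normalize_username is a hypothetical helper for checking the name you put into user-config.py; it is not a function of the framework.

```python
def normalize_username(name):
    """Uppercase only the first character and leave the rest untouched."""
    # str.capitalize() is deliberately avoided: it would also lowercase
    # the remaining characters and turn 'exampleBot' into 'Examplebot'.
    return name[:1].upper() + name[1:]


print(normalize_username("user"))
print(normalize_username("exampleBot"))
```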

user-config.py examples

EksempelBot on no.wikipedia

mylang = 'no'
usernames['wikipedia']['no'] = u'EksempelBot'

console_encoding = 'utf-8'
use_api_login = True

ExampleBot on Commons

mylang = 'commons'
family = 'commons'
usernames['commons']['commons'] = u'ExampleBot'

console_encoding = 'utf-8'
use_api_login = True

BeispielBot on de.wikipedia and de.wikibooks, with de.wikipedia as main wiki

mylang = 'de'
usernames['wikipedia']['de'] = u'BeispielBot'
usernames['wikibooks']['de'] = u'BeispielBot'

console_encoding = 'utf-8'
use_api_login = True

Registering the username on the wanted wikis

It is recommended that you are registered on the wikis where you want to run your bot.

  • Log into your own user account and enter “Special:Userlogin” in the search box.
  • Then, choose to register an account.
  • Now fill out the required fields on the page and continue. If the account creation is successful, you should now be able to run the bot.
  • With SUL it’s easy to run bots on several wikis. If you’ve created the bot account recently, the account is already merged.
If you created your account(s) some time ago and it is not merged, or if it was created some time ago on other wikis, you will have to merge it in Special:MergeAccount.
Now, when logging in, the other accounts will automatically be registered as you enter the same password for all wikis.
  • NOTE: If you are working with your own wiki, or one independent of the Wikipedia "groups", you first need to create an appropriate "family" file; see #Generating a Family File below.

Bot flag and test edits

Don’t forget to apply for bot flag/permission to run the bot! On en.wikipedia and he.wikipedia (there may be more), you have to apply before you make test edits. On other wikis you can apply and make test edits right away. When you make test edits, use the argument -pt:15, which tells the bot to wait 15 seconds between each edit (4 edits/min), so the recent changes list doesn’t get flooded with the bot’s edits.
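
The arithmetic behind -pt:15 is simply 60 / 15 = 4 edits per minute. The sketch below shows one way such a delay could be enforced; the class PutThrottle and the function edits_per_minute are hypothetical and are not the framework's own throttle.

```python
import time


class PutThrottle(object):
    """Wait so that saves are at least `delay` seconds apart (illustration only)."""

    def __init__(self, delay, clock=time.time, sleep=time.sleep):
        self.delay = delay
        self.clock = clock
        self.sleep = sleep
        self.last_save = None

    def wait(self):
        """Block until `delay` seconds have passed since the previous save."""
        now = self.clock()
        if self.last_save is not None:
            remaining = self.delay - (now - self.last_save)
            if remaining > 0:
                self.sleep(remaining)
                now = self.clock()
        self.last_save = now


def edits_per_minute(delay):
    """Upper bound on edits per minute for a given delay in seconds."""
    return 60.0 / delay


print(edits_per_minute(15))
```

Calling wait() before each save would keep a loop of edits at or below that rate. The clock and sleep parameters exist only so the class can be tried out without really waiting.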

Running your Bot

Navigate to the correct folder

To run the scripts, you first need to navigate to the folder in which the scripts are located. To do this you need to use the Terminal window (application) - (Applications/Utilities/Terminal)

A "fresh" Terminal window will normally open in your "Home" directory.
  • “USER_NAME” is the name shown to the right of the house icon in Finder; the Home directory is also referred to by the character "~".
  • Typing "cd" by itself will normally return you to your Home directory.
  • Assuming you have used the default SVN install, enter the following in Terminal:
$ cd pywikipedia

or

$ cd /Users/USER_NAME/pywikipedia

or

$ cd ~/pywikipedia

Tip

Typing the above each time you want to use the bot can be tedious. Terminal saves every command you use, and you can recall old commands by pressing the up arrow. One press shows the last command used; if that is not the one you want, keep pressing the up arrow until the right one appears, then press enter/return. Now you can start running the bot. When you run one script after another, you do not have to repeat the navigation command; it is only needed again after you close Terminal.

Logging in

Now, you need to log into the bot accounts on the wiki(s), by typing the following into Terminal:

$ python login.py

Terminal will ask for the password. Type the password you used when you registered the bot’s username.

If you have accounts on multiple wikis, use

$ python login.py -all

Type in the passwords, as above.

Logging in is only necessary once; the bot stays logged in.

Examples

$ python login.py
Password for user user-bot on mywiki:en:          <-- password is not echoed to the terminal window
No handlers could be found for logger "pywiki"  <-- no log file was configured; use "python login.py -log" to enable the logfile with the default filename "login.log". Logs will be stored in the logs folder.
Logging in to mywiki:en as user-bot via API.
Should be logged in now

To verify your login status, use the "-test" parameter:

$ python login.py -test
No handlers could be found for logger "pywiki"        <-- no log file was configured, see comment in previous example.
You are logged in on mywiki:en as user-bot.

Logging Out

To log your Bot out of your wikis use:

$ python login.py -clean

Running the scripts

After navigating to the right folder, you can run a script by typing:

$ python SCRIPT_NAME.py

To see an overview of some scripts with description pages, look here. Most of these scripts are part of the standard SVN download.

Arguments

An argument is written as a dash (-) followed by the name of the argument, -pt:15 for instance. Many scripts require arguments, for example interwiki.py:

$ python interwiki.py -start:! -autonomous
By default, the bot runs interwiki from the main language and project and corrects the links on all registered wikis. The -start and -autonomous arguments tell it to check all pages on the wiki and add or modify interwikis. It will not remove anything, and it will give up when it discovers conflicts; see interwiki.py.

Global arguments

All scripts recognize the following arguments:

-help
Shows the script’s help text
-lang:xx
Selects the language you want to work on. Overrides the mylang option in user-config.
-family:xyz
Selects the family you want to work on. Overrides the family option in user-config.
-log
Enable the logfile. Logs will be stored in the logs subdirectory.
-log:xyz
Enable the logfile, using “xyz” as the filename.
-nolog
Disable the logfile (if it's enabled by default).
-putthrottle:nn or short -pt:nn
Set the minimum time (in seconds) the bot will wait between saving pages. The default value is zero.
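
Every global argument listed above has the shape -name or -name:value. A minimal sketch of how such arguments can be split is shown below; parse_argument is a hypothetical helper written for this page, not the parser the framework uses.

```python
def parse_argument(arg):
    """Split '-name:value' into (name, value); value is None for bare flags."""
    if not arg.startswith("-"):
        raise ValueError("arguments start with a dash: %r" % arg)
    # partition() splits at the first colon only, so a value may itself
    # contain further colons.
    name, sep, value = arg[1:].partition(":")
    return name, (value if sep else None)


for example in ["-help", "-lang:zh", "-family:wikibooks", "-pt:15", "-start:!"]:
    print(parse_argument(example))
```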

Maintenance

Updating

To update your scripts, simply write the following into Terminal:

$ svn update pywikipedia
When it’s done, it will say “At revision XXXX” (“XXXX” depending on the latest revision). If you have modified some of the scripts, don’t worry, your modifications will be merged with the changes made by updating.
  • If you have experience using Automator, you can do the Shell process in a "workflow" and save it to your desktop or on your dock, so you can reach it more easily. It’s a good idea to right click on it, then, while pressing Alt/Option, hold the cursor over Always open in, then choose Automator runner.

Modifying

You might want to go through the script text before you run it, because for some languages the script may be bot specific. Right-click on it, hold the cursor on Open With, choose Other, then scroll down and double-click on TextEdit. This is not mandatory, and it is not recommended for beginners; if you do have to edit a script, be careful!

Reverting

If you want to revert your changes to a script, simply run the following:

$ svn revert NAME_OF_SCRIPT.py

If a script suddenly stops working and produces the output SyntaxError: invalid syntax, you have probably encountered an SVN conflict.

Revert the script as described above. Then try to run it. If it works, run the following in Terminal:

$ svn resolved NAME_OF_SCRIPT.py

You can now try to make the same edits again, if desired.

Generating a Family File

Family is the project name.[1]

  • We selected "27 test" as the Family of wikis when we ran the setup script generate_user_files.py.

However, if the bot’s main wiki is "mywiki", edit this line in user-config.py:

family = 'mywiki'
  • To run a bot on your local wiki (mywiki), you will need to create a "family file" named "families/xxx_family.py", where xxx is the name of your local wiki: the same name you would enter into the family variable, replacing "test". For mywiki this is "families/mywiki_family.py". See: families/README-family.txt
Usage: generate_family_file.py <url> <short name>
Example: generate_family_file.py http://www.mywiki.bogus/wiki/Main_Page mywiki
This will create the file families/mywiki_family.py
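
The naming rule can be written down in a few lines. This is only a sketch of the convention described above; family_file_path is a hypothetical helper, and the script itself may build the path differently.

```python
import os


def family_file_path(short_name, directory="families"):
    """Return the conventional location of the family file for a wiki."""
    return os.path.join(directory, short_name + "_family.py")


print(family_file_path("mywiki"))
print(family_file_path("lotro-wiki"))
```

The short name passed here is the same string that goes into the family setting in user-config.py.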

Example: lotro-wiki.com

$ python generate_family_file.py http://lotro-wiki.com/ lotro-wiki
Generating family file from http://lotro-wiki.com/

==================================
api url: http://lotro-wiki.com/api.php
MediaWiki version: 1.20.6
==================================

Determining other languages... fr

There are 2 languages available.
Do you want to generate interwiki links? This might take a long time. ([y]es/[N]o/[e]dit)N
Loading wikis... 
  * en...  in cache
Retrieving namespaces...  en 
Writing families/lotro-wiki_family.py... 

This creates the basic file: pywikipedia/families/lotro-wiki_family.py

  • If you set your user-config.py aside as user-config-hold.py before generating the family file, you can now rename it back to user-config.py and make certain that the "family =" parameter is correct.
  • Consult README-family.txt again; you can then edit the family file to include any necessary additions or corrections.
  1. Meta uses 'meta' for both language code and wiki family, Commons uses 'commons' for both, and Testwiki uses 'test' for both, the multilingual wikisource uses '-' for the language. You can override this on the command line by using -family:wikibooks.