Manual:Pywikibot/Use on third-party wikis

From MediaWiki.org

Pywikibot can be used to do all kinds of things that are important for the maintenance of a MediaWiki project. When the software is used outside of the Wikimedia projects, some configuration is needed.

Some non-Wikimedia projects, or families, are already supported. These can be found in the pywikibot/families folder.

Using the existing files as examples, it should be easy to adapt the bot to your own project. For a shorter set of instructions, see the Quick Start Guide, which may get your bot up and running quickly. If you run into problems, come back to this page.

user-config.py file

The user-config.py file must define a username for the wiki, especially if the bot will make edits there. The following three lines must be added to user-config.py:

mylang = 'xx'

xx is the language code of the wiki you are working on; "en" is English. If you want to work with more than one language, choose the most common one; you can override the configured value on the command line with the -lang: parameter.

family = 'sitename'

"Sitename" is the name of the site you're working on. Like the language, this can be changed by using the -family: parameter. It is the prefix of the <sitename>_family.py file without the "_family.py".

usernames['sitename']['en'] = u'ExampleBot'

Your user-config.py file needs to specify the bot's username for a specific family and language. The sitename must be the name of a valid family, and at least one username must be specified for the configured default language/sitename combination. This does not create an account on the wiki, so the account must already exist. The language (en in the example) can be replaced by *; the username is then used for any language of the site for which no language-specific username is set.

In this example, the user is working on the English version of sitename and logs in as a bot with the username "ExampleBot".
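Putting the three lines together, a minimal user-config.py might look like this (a sketch; 'ksp', 'en' and 'ExampleBot' are placeholder values):

```python
# user-config.py -- minimal sketch; family name, language and username are placeholders
mylang = 'en'                            # default language code
family = 'ksp'                           # corresponds to ksp_family.py
usernames['ksp']['en'] = u'ExampleBot'   # the account must already exist on the wiki
```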

Now save user-config.py again.

Family file

If there is no family file for the wiki that you are going to use the bot on, you need to create one. The username must still be added manually (in user-config.py) after the family has been created.

AutoFamily

An AutoFamily can be defined in user-config.py. This does not require an additional family file, but it allows less configuration freedom.

To add a family with the name 'mhwp' or 'w3c' the following lines can be added:

family_files['mhwp'] = 'https://mh.wikipedia.org/'
family_files['w3c'] = 'https://www.w3.org/wiki/api.php'

Pywikibot appends w/api.php to the URL if it does not already end with api.php. If your wiki does something different (like the W3C example), simply use the complete URL of api.php. The value must begin with http:// or https:// to be detected as an AutoFamily. Usernames can be defined as usual (usernames['mhwp']['*'] = u'ExampleBot'), but there is no such thing as languages.
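For example, a user-config.py relying only on an AutoFamily might look like this (a sketch based on the W3C example above; 'ExampleBot' is a placeholder, and the language code used for an AutoFamily may depend on your Pywikibot version):

```python
# user-config.py -- AutoFamily sketch; no separate family file is needed
family_files['w3c'] = 'https://www.w3.org/wiki/api.php'  # full URL, ends in api.php
family = 'w3c'
mylang = 'w3c'   # an AutoFamily typically uses the family name as its only code
# AutoFamily has no real language codes; define the username with '*'
usernames['w3c']['*'] = u'ExampleBot'
```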

Script to generate family file

The generate_family_file.py script in the root directory automatically generates a family file and stores it in the families folder. Given a URL, it queries the page and extracts the location of api.php from it. If it succeeds, you can log in to your wiki using python pwb.py login (make sure you have already created the account on the wiki!). If it fails, you will need to modify one of the existing family files, or create a new one, in a text editor.

Creating the family file manually

Save the file in the pywikibot/families folder, with a name such as <sitename>_family.py where <sitename> is the name of site you have chosen (for example, ksp_family.py).

The basis looks something like this:

from pywikibot import family

class Family(family.Family):
    name = 'ksp'
    langs = {
        'en': 'wiki.kerbalspaceprogram.com',
    }

If the wiki is configured like the official WMF wikis, this is the least amount of code required. It is important that the class is named Family and that the constructor (__init__) does not require any additional parameters (apart from self). If the wiki supports multiple languages, they can be defined in the langs dictionary.

This assumes that api.php is in the directory /w/ (in the example the full URL would be http://wiki.kerbalspaceprogram.com/w/api.php). If this is not the case, for example if api.php is at http://example.com/mediawiki/api.php, the scriptpath(self, code) method of Family has to be overridden. It returns the path without the domain and may depend on the language/code:

def scriptpath(self, code):
    return '/mediawiki'

To support HTTPS in a family (not an AutoFamily), it must be enabled explicitly:

def protocol(self, code):
    return 'HTTPS'

This enables the HTTPS protocol for all languages/codes in that family. The method can also return 'HTTP' to avoid a secure connection, or contain conditions to use HTTP/HTTPS only in certain cases. If the server's certificate is rejected, the certificate error can be ignored via:

def ignore_certificate_error(self, code):
    return True
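For example, a protocol() method with a per-code condition might look like this (a sketch; the 'test' code is an invented placeholder):

```python
# Inside the Family subclass in <sitename>_family.py
def protocol(self, code):
    # Hypothetical: the 'test' wiki has no certificate; all others use HTTPS
    if code == 'test':
        return 'HTTP'
    return 'HTTPS'
```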

Configuring a custom families folder

If you create your own family files, you may not want to mix them with those provided by Pywikibot. You can put your custom family files into their own folder and configure your script to look for them there. The easiest way is to edit user-config.py and register each directory containing family files via register_families_folder(directory). All families defined in files ending with _family.py are then available.

Via register_family_file(family_name, file_name) it is also possible to add a single family file that does not follow the filename convention.
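A user-config.py using both mechanisms might look like this (a sketch; the paths and names are placeholders):

```python
# user-config.py -- registering custom family files (paths are placeholders)
# Make every <name>_family.py in this folder available:
register_families_folder('/home/bot/families')
# Register a single file that does not follow the <name>_family.py convention:
register_family_file('ksp', '/home/bot/kerbal.py')
```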

Running Pywikibot

Refer to Manual:Pywikibot/Basic use on how to run the bot.

Wikibase

Once the family module exists for the Wikibase repository, it needs to be modified so that the Family subclass tells Pywikibot that it supports Wikibase.

3.0-dev

Edit the family module so that the Family subclass implements the interface method, similar to wikidata_family.py.
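In wikidata_family.py this is done by overriding the interface method to return 'DataSite'; a sketch for your own repository's family file (assuming your Pywikibot version uses the same mechanism):

```python
# Inside the Family subclass of the Wikibase repository's family file
def interface(self, code):
    # Instantiate a DataSite (Wikibase repository) instead of a plain APISite
    return 'DataSite'
```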


Family file examples

Example: Mozilla wiki

The Mozilla Foundation's wiki, wiki.mozilla.org, is a very simple example because it is only available in one language.

This is the contents of families/mozilla_family.py. Hints for writing your own family specification are given in the comments.

 # -*- coding: utf-8  -*-

from pywikibot import family

# The official Mozilla Wiki. Put a short project description here.

class Family(family.Family):

    name = 'mozilla' # Set the family name; this should be the same as in the filename.
    langs = {
        'en': 'wiki.mozilla.org', # Put the hostname here.
    }

    def version(self, code):
        return "1.4.2"  # The MediaWiki version used. Not very important in most cases.

    def scriptpath(self, code):
        # The relative path to index.php / api.php; look at your wiki's address.
        # This may need to be '/wiki' or '/w', depending on the folder where
        # your MediaWiki installation is located. Note: do not include index.php.
        return ''

Example: Starwars

This is the family file for the Star Wars wiki at Wikia, located at families/starwars_family.py.

It shows how (as of 2015) to configure Pywikibot to work with this site. As with other Wikia-related family configurations, this is likely to have been broken by the same series of domain name changes which put the Cancer Help Wiki on "cancer.fandom.com".

from pywikibot import family

class Family(family.Family):
    name = 'starwars'

    langs = {
        'en': None,
    }

    # A few selected big languages for things that we do not want to loop over
    # all languages. This is only needed by the titletranslate.py module, so
    # if you carefully avoid the options, you could get away without these
    # for another wiki family.
    languages_by_size = ['en']

    def hostname(self, code):
        return 'starwars.wikia.com'

    def path(self, code):
        return '/index.php'

    def version(self, code):
        return "1.9"  # Which version of MediaWiki is used?

Example: Memory Alpha

memoryalpha_family.py is the "family" definition of Memory Alpha, www.memory-alpha.org, a Star Trek wiki. This specification is a little bit more difficult because it has several languages as subdomains of one common main domain.

The domain name scheme shown is the "old" version, before Wikia forced a change of domain name to "memory-alpha.wikia.com" and later "memory-alpha.fandom.com"; it likely will no longer work for Pywikibot use without configuration changes.

# -*- coding: utf-8  -*-
from pywikibot import family

# The Memory Alpha family, a set of StarTrek wikis.

class Family(family.Family):
    name = 'memoryalpha'

    langs = {  # All available languages are listed here.
        'de': None, # Because the hostname is the same for all languages,
        'en': None, # we don't specify it here, but below in the hostname()
        'nl': None, # function.
        'sv': None,
    }

    # A few selected big languages for things that we do not want to loop over
    # all languages. This is only needed by the titletranslate.py module, so
    # if you carefully avoid the options, you could get away without these
    # for another wiki family.
    biglangs = ['en', 'de'] # Not very important

    def hostname(self, code):
        return 'www.memory-alpha.org' # The same for all languages

    def scriptpath(self, code):
        return '/%s' % code # The language code is included in the path

    def version(self, code):
        return "1.4"

Example: Uncyclopedia

The various Uncyclopedias are slightly more awkward as not all are hosted at the same domain or under the same name. Domain names and paths must be specified individually as many individual languages have their own registered domain names and many use custom namespaces.

Uncyclopedia in language xx: is most likely not to be found on xx.uncyclopedia.org but on some other independent name, possibly on some other server with completely different path names and configurations. This limits the ability to share the same URL patterns across the multiple language projects.

Uncyclopedia was largely hosted by Wikia at one point, and the approaches which worked for an Uncyclopædia or a Memory Alpha project can typically be adapted to other Wikia-hosted wikis.

There was a families/uncyclopedia_family.py in some versions of the Pywikibot distribution (it was present in 2011 but gone by 2019). This file, if available, is likely too outdated to be useful; the example below is even more outdated, as is the limited on-wiki documentation like uncyclopedia:es:Usuario:Chixpy/uncyclopedia_family.py.

Uncyclopedia is a difficult case because it's not a single website but a loose coalition of multiple projects across multiple sites in various languages... in other words, a herd of cats. The individual projects are often on different domains or different names, they routinely use different paths to key files (was that /api.php or /w/api.php - it depends which language version) and in some cases the Special:Interwiki page will list different link destinations for the same prefixes depending on which Uncyclopedia you're looking at. One doesn't have to be mentally ill to run Pywikibot on this wiki set, but it most emphatically helps.

Many of the issues are rooted in the project's long history:

  • Uncyclopedia was founded as an independent wiki in English on Jan 5, 2005. Many of the early efforts to create an "Uncyclopedia Babble" project with multiple languages (fr: it: es: pl: and a few others) were created on Wikia in late 2005 or early 2006. There is no way for anyone other than Wikia staff to update the interwiki table on any Wikia-hosted project. Most of these projects used Wikia's userlist instead of Uncyclopedia's, used an incompatible free licence, intrusive advertising and an outdated, incomplete or broken interwiki table and were configured to look more like a random collection of unrelated wikis than a single, monolithic Wikipedia-style bloc.
  • By the end of 2006, just over half were Wikia-hosted; the next batch of wikis (pt: ja: zh-tw: nl: and others) were created in mid-2006 on a non-Wikia server as Wikia's goals did not always coincide with Uncyclopedia's. This made Extension:Interwiki available to individual project admins on the new projects, but there were still instances where certain link pairs (such as pt: to gl:) were not available. A few wikis (such as fi: ko: sv: at various times) were hosted independently; this created additional differences where one language may be using a different MediaWiki version or different path names.
  • In 2008, Wikia broke the Special:Export code by modifying it to spam "From SITENAME, a Wikia wiki" on every exported page. This broke Pywikibot installations, as this self-promotional text was being sucked into the 'bot script and then posted back to the original wiki as part of routine page updates like Manual:Pywikibot/interwiki.py runs. Most of the pywikibot installations were either shut down voluntarily or blocked by local admins.
  • A number of existing language projects forked to independent webservers (such as Russia in 2010 or USA in 2013). Wikia kept the old versions online as direct competitors to the new community, in an attempt to kill the new community through search engine duplicate content penalties. In some cases, the same language prefixes (such as en: or ru:) pointed to different places from different Uncyclopedia editions. In some cases, squatters would turn up at the old, abandoned wiki and - at worst - there would eventually be two rival communities each claiming to be Uncyclopedia in the same language.
  • A few projects (such as Stupidedia and Kamelopedia) were actually independent in origin; they never were part of Uncyclopedia, but had a similar theme and format. Because there is only one prefix (such as de: for German) for each individual language, Uncyclopedias which wanted to link to two or three overlapping projects would use language prefixes for non-standard dialects (such as Bavarian or Lower German) to link the extra projects into the family. These prefixes were very likely to work on non-Wikia editions of Uncyclopedia, but almost certain to redlink if used from Wikia.
  • Uncyclopedia did not follow Wikimedia's pattern of deploying Wikibase across every project in 2013, because there is no single database server accessible to all projects. This leaves interlanguage links to be created using the "old" (pre-Wikidata) methods as Wikibase is limited in its usability across multiple independent hosts.
  • In some cases, the URL of a wiki has changed because the project in that language had been renamed (for instance, Latin and Greek changed project names) or because an inactive project was archived to a subdomain (such as xx.uncyclopedia.info for language code xx:) instead of remaining on its own, language-specific second-level custom domain. A few upgraded to .org from other TLDs such as .ws, .net or .info for reasons which had more to do with the availability of the domain than project objectives.
  • A few interwiki sidebar links rely on Manual:$wgExtraLanguageNames to contain values for non-standard prefixes like Klingon ('tlh' => 'tlhIngan Hol') or Olbanian ('olb' => 'Олбанский') which are not set by the standard MediaWiki installed configuration. If these aren't set on one of the individual wikis in the set, any attempt to use them will cause a link to display at the end of the page body instead of being moved to the sidebar as an inter-language link.
  • A forced ad-heavy "Oasis" reskin in May 2018 broke one key requirement: as an encyclopaedia parody, Uncyclopedia needs to look like the encyclopaedia and therefore needs to use the standard skins (such as Vector or Monobook) used by Wikipedia. The backlash to this change caused more communities to abandon Wikia for independent hosting; a few went to Miraheze, a relatively new wiki farm with limited resources. Any uncyclopedia_family.py file which wasn't being constantly updated wasn't going to properly reflect these changes... and, even if it were updated, the existence of two duplicative wikis in the same language is causing technical problems for 'bot operators who have no idea which version is being linked from any given project.
  • As a final parting shot, Wikia heightened its level of content censorship in Feb 2019 (even though this issue had already caused en: to fork in 2013) and began shutting down every Uncyclopedia it was still hosting in Mar-Apr 2019. Instead of explaining where the missing projects had gone, Wikia now returns 410 GONE and "This wiki is now closed".

This example is outdated; it was based on the 2006 configuration, with minor updates to reflect that en: and ru: have forked. Do not expect this (or any other configuration which mentions Wikia in any context) to be current or usable on Uncyclopedia today. It is retained only to illustrate that various things which would be standard on a more monolithic project (such as base domain names and path structures) may need to be coded individually to handle a loose confederation of independent wikis hosted separately:

# -*- coding: utf-8  -*-
from pywikibot import family

# The Uncyclopaedia family, a satirical set of encyclopaedia wikis. (May 2006)
#
# Save this file to families/uncyclopedia_family.py in your pywikibot installation
# The pywikibot itself is available for free download from github
#

class Family(family.Family):
    name = 'uncyclopedia'

    langs = {
        'ar': 'beidipedia.wikia.com',
        'ca': 'valenciclopedia.wikia.com',
        'da': 'da.uncyclopedia.wikia.com',
        'de': 'de.uncyclopedia.wikia.com',
        'el': 'anegkyklopaideia.wikia.com',
        'en': 'uncyclopedia.co',
        'es': 'inciclopedia.wikia.com',
        'fi': 'peelonet.zapto.org',
        'fr': 'desencyclopedie.com',
        'he': 'eincyclopedia.wikia.com',
        'hu': 'hu.uncyclopedia.info',
        'it': 'nonciclopedia.wikia.com',
        'ja': 'ja.uncyclopedia.info',
        'la': 'uncapaedia.wikia.com',
        'no': 'ikkepedia.net',
        'pl': 'nonsensopedia.wikia.com',
        'pt': 'pt.uncyclopedia.info',
        'ru': 'absurdopedia.net',
        'sv': 'psyklopedin.hehu.se',
        'zh': 'zh.uncyclopedia.wikia.com',
        'zh-tw': 'zh.uncyclopedia.info',
    }

    # A few selected big languages for things that we do not want to loop over
    # all languages. This is only needed by the titletranslate.py module, so
    # if you carefully avoid the options, you could get away without these
    # for another wiki family.
    languages_by_size = ['en', 'pl', 'de', 'es', 'ru', 'fr']

    def hostname(self, code):
        return self.langs[code]

    def scriptpath(self, code):
        if code == 'fi':
            return '/hikipedia'
        if code in ['hu', 'ja', 'pt', 'sv', 'zh-tw']:
            return '/w'
        if code == 'no':
            return ''
        return '/wiki'

    def version(self, code):
        return "1.7"

Notes

Language

For a single-language site, the language specified does not matter, as long as it is consistent between user-config.py and families/foo_family.py.

Login failed. Wrong password?

Pywikibot does not report anything more useful than success, failure, or host connection failure. If possible, try accessing the web server logs (Apache uses access_log by default) and take a look at the URL strings.

You could also try running login.py in 'very verbose' mode, i.e. python login.py -v -v. This will dump a lot of information, possibly including the HTML returned by the server, so you can see exactly what is going on. (However, this option runs the risk of revealing security-sensitive information, so be careful.)

Make sure your scriptpath, the relative path to your api.php and index.php files, is defined appropriately for your wiki in your families file:

def scriptpath(self, code):
    return '/wiki'

If this does not help, add a line like

authenticate['www.mywiki.com'] = ('botName','botPassword')

to your user-config.py file.

See the mozilla configuration for clues.

Bot doesn't want to stay logged in

If you are able to log in with login.py, but the bot does not seem to remember your credentials no matter what, you should consider using a password file.

Add the line

password_file = "secretsfile"

to your user-config.py. The secretsfile should be readable only by the user who executes the bot (on Unix, usually mode 600). The file has to be UTF-8 encoded (or ASCII if it only contains ASCII characters). Each line contains a tuple with 2 to 4 values:

("username", "password")
("family", "username", "password")
("language", "family", "username", "password")

Pywikibot selects the last applicable entry, so the values should be ordered from most generic to least generic. If no language is given, the password is used for all languages of that family; if the family is also omitted, the password is used wherever the username matches.

This is not the same as the tip above, which is for HTTP AUTH BASIC-protected wikis.
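For example, a secretsfile ordered from most generic to least generic might read as follows (all names and passwords are placeholders):

```python
# secretsfile -- the last applicable entry wins
("ExampleBot", "fallback-password")             # any site where this username is used
("ksp", "ExampleBot", "ksp-password")           # all languages of the 'ksp' family
("en", "ksp", "ExampleBot", "en-ksp-password")  # only the English 'ksp' wiki
```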

Possible issue with SVN pywikipedia, custom URL schemes, and/or MediaWiki in XAMPP

My particular difficulty with login.py (from SVN rev 9509, with Python 2.6.something and MW 1.17) is that it initially logs in just fine, but login.py -test immediately after says:

You are not logged in on <wiki>:<lang>.

when clearly I just did. There is nothing unusual in the output of python login.py -v -v, yet none of the other bot scripts are able to use the credentials which should have been cached by login.py. Since I'm using it with a non-Wikimedia wiki, with a custom (short) URL scheme, and with the wiki running on Windows (XAMPP), with all those strikes against it I'm not about to call this a bug. But login.py certainly isn't remembering the login cookie as advertised. Annoying, because I was trying to loop through a thousand-plus images with upload.py and being asked for a password for each one. The workaround above works.

Mismatched interwiki configuration

In some projects (such as Uncyclopedia), each language operates as an independent wiki. This may mean that interwiki tables differ from one individual wiki to another within the same project. Interwiki.py is built on the assumption that, if outbound interlanguage links are available at all from a language, the list of available link-destination languages and the destination URL for each will match perfectly across all wikis in the project.

This leads to some potential pitfalls:

  • If one language is missing outbound language interwiki support entirely, one must avoid giving pywikibot an account on that wiki (in user-config.py) in order to ensure that interwiki.py leaves that one language wiki untouched.
  • If one language is using a valid but incomplete interwiki table, running interwiki.py on that language wiki will create broken links. Unlike the case where one language is missing project-wide, there is no clean and easy workaround.
  • If a language in a project has been forked (not just mirrored), the interwiki for each individual language pair will point to only one of the multiple forks. Verify the wiki your bot is looking at is the same one that is being linked from the wiki you're editing - otherwise the bot will delete some valid links as "page does not exist".

Customisation of namespaces

Some projects use non-standard extensions to provide Special:Interwiki and Special:Namespaces lists; where available, these lists should be checked against the configuration files to detect any additional namespace customisations.

Short URL rewrites

If your site uses short URL rewrites, you may have to add "/api.php" to the rewrite blacklist; otherwise, your bot scripts will not be able to access api.php.

Check your rewrite conditions in your Apache conf file, and make an appropriate addition.

Bot & private wikis

Some wikis require logging in to MediaWiki before any wiki page can be viewed. If you have such a site, add the following to your custom family file:

def isPublic(self):
    return False

Fixing Permission Denied problems

Creating page [[Category:Help]] via API
Unknown Error. API Error code:permissiondenied
Information:Permission denied

Your wiki may require users to be part of a particular group in order to edit pages. If so, log in to your wiki as an administrator and use Special:UserRights to put your bot into the proper group(s) to avoid API permission problems.

Bot & HTTP auth

Some sites require HTTP authentication (a password) to access the HTML pages at the site. If you have such a site, add lines of the following form to your user-config.py:

authenticate['en.wikipedia.org'] = ('John','XXXXX') # where John is your login name, and XXXXX your password.

Colons in Article Titles Within Custom Namespaces

I had a problem with a colon in an article title; specifically, for the title The Third Wave: Democratization in the Late Twentieth Century in the namespace that I created called WikipediaExtracts. The script failed on that one on account of the colon in the article title.

When I looked at page.py (line 171) I saw that one can pass the namespace number into the function.

So when I changed it from:

page = pywikibot.Page(site, "WikipediaExtracts:The Third Wave: Democratization in the Late Twentieth Century")

to:

page = pywikibot.Page(site,"The Third Wave: Democratization in the Late Twentieth Century", 3000)

That worked; 3000 is the number I had picked for the custom namespace, following the example at Manual:Using_custom_namespaces#Creating_a_custom_namespace.


References

  1. The 'u' in front of the username marks a Unicode string literal. It is important if your username contains non-ASCII characters. If you are using only ASCII characters, or Python 3, you can remove the 'u'; otherwise you can leave it as is.


If you need more help setting up your Pywikibot, visit the #pywikibot IRC channel or the pywikibot@ mailing list.