Manual:Pywikibot/interwiki.py

There is a functional need for interlanguage links between the different language versions of a project; interwiki.py can help you create these.

Getting started
To see what you need to get started on the Python Wikipedia bot, see Using the python wikipediabot. We assume you have taken these steps (downloading Python and the bot, creating user_config.py, running login.py). Now to start, type "interwiki.py" (without the quotation marks), or if that does not work, "python interwiki.py".

The bot will ask for a page to check. Give a page on your home Wikipedia, preferably one that already has one or more interwiki links but could have more. The bot will read this page, and if it has any interwiki links, it will check those pages as well, and the interwiki links from those, and so on. After it has finished, what happens depends on what pages it found.
 * If the page has no interwiki links, or if the links found are identical to the ones on the page, the bot will stop silently.
 * If the bot finds interwiki links to new languages, or finds that an interwiki-link has to be changed, it will do so.
 * If the bot finds that an interwiki link is to be removed, it will ask your permission to do so.
 * If the bot finds more than one page for a language, it will go into an interactive mode. It will list the pages found together with the pages that link to each of them. For each language with more than one link, it asks which page, if any, should be linked to; then, for each language with one link, it asks whether that page should be linked.

Note that these behaviours can be changed using options, see below.
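
The four cases above can be summarised in a short Python sketch. This is only an illustration of the rules as described here, not the bot's actual code; the function name plan_action and the two dictionaries are hypothetical.

```python
def plan_action(current, found):
    """Classify what a run would do for one page (illustrative only).

    current: dict mapping language code -> title linked on the page now
    found:   dict mapping language code -> set of candidate titles found
    """
    # More than one candidate for some language: the user has to choose.
    if any(len(titles) > 1 for titles in found.values()):
        return "interactive"
    new = {lang: next(iter(titles)) for lang, titles in found.items() if titles}
    # No links at all, or exactly the links already on the page.
    if new == current:
        return "nothing"
    # A language that is linked now but was not confirmed would be removed.
    if set(current) - set(new):
        return "ask-removal"
    # Only additions or changed targets remain.
    return "update"
```

For example, a page that links to de:Haus, for which the bot additionally finds fr:Maison, falls into the "update" case.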

You can also specify the page to work on directly, using "interwiki.py pagename". But there are more possibilities, see below.

Working on more than one page
Using the XML Export, the pywikipediabot software can get several pages at once, up to 60 at a time. To take advantage of this, you can run the bot on a set of pages. The most common form is getting pages in alphabetical order from Special:Allpages, using the -start option.

-start
If you add the option -start, the bot will go through the pages alphabetically, starting at the word specified. If you want to start at the letter B, for example, you can use "interwiki.py -start:B". In particular, if you want to do the whole wiki, you can use "interwiki.py -start:!".

Restarting: -continue, -restore
Going through the whole wiki can, in large or even moderately large wikis, take a long time. Thus, it may well happen that you are forced to end the program before it has finished. In that case you can use "interwiki.py -continue" next time. The bot, when it crashes or is stopped (through control-c), will make a file specifying the pages it is working on. If you use the continue-option, it will continue with those pages, and after that continue alphabetically. If you want to restart a non-alphabetical run, you can use "interwiki.py -restore" instead. It will just restart the pages it was working on.

Be warned that only the last bot run that was stopped will be recoverable. The bot will save its information to a file interwiki.dump, and if another run is broken off, even if it is only on one page, it will overwrite the file.

Autonomous mode
When working on a lot of pages, you may want the bot to just continue, rather than asking you every time it sees a problem. This is done by adding the option "-autonomous". If this option is used, the bot will skip all problems and removals, and save a log of those in autonomous_problems.dat. If you want it to do removals as well, add the "-force" option too; in that case it is good to check the removals afterward (often a page is removed because of a correctable typo).

Using hints
Up to now, we have only worked on adding interwikis on pages that already have some. But the bot can also be used to add them on pages that have none yet. This is done by using hints. If for example you want to add interwikis to the page House, and think there might be a page at Maison on the French Wikipedia about the same subject, you can type (if your bot is set to run on English by default) "interwiki.py House -hint:fr:Maison".

If the link is to the same title, you can leave out the title, and even the second ':'. Also, if you want to link to the same word in several languages, you can combine them with commas. So instead of "interwiki.py Albert Einstein -hint:de:Albert_Einstein -hint:fr:Albert_Einstein -hint:id:Albert_Einstein" (those underscores are necessary, otherwise the bot will regard the 'Einstein' part as belonging to the pagename rather than to the hint), you can write "interwiki.py Albert Einstein -hint:de,fr,id".
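
As a rough illustration of how such a hint expands, here is a small Python sketch. The function expand_hint is hypothetical and only mirrors the rules described above; it does not handle the special hints.

```python
def expand_hint(hint, page_title):
    """Turn 'fr:Maison' or 'de,fr,id' into (language, title) pairs."""
    langs, _, title = hint.partition(":")
    # Underscores stand in for spaces on the command line; an omitted
    # title means "same title as the page being worked on".
    title = title.replace("_", " ") if title else page_title
    return [(lang, title) for lang in langs.split(",") if lang]
```

With this sketch, expand_hint("de,fr,id", "Albert Einstein") yields the same three pairs as the three separate hints in the long form of the command.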

Special hints
Some special hints have been defined to do a number of languages at once. You can use them instead of the language part of a hint. The same hints are defined for Wiktionary, but at the time of writing 30, 50 and all are identical there. It is intended to add more options. Currently the following special hints exist for Wikipedia:
 * 10: Ten of the largest Wikipedias
 * 20,30,50: Idem, for twenty, thirty and fifty languages
 * all: All Wikipedias with at least ~100 articles
 * cyril: All languages in Cyrillic script

Asking for hints
When working on multiple pages, such hints on the command line are rarely useful. In that case (or if you want to decide on the hints later), you can use the options "-askhints", "-untranslated" and "-untranslatedonly". If you choose the -askhints option, you will be asked for one or more hints for each page. They can be like the hints after -hint: on the command line, but the ':' may not be omitted, and spaces are allowed. Thus, valid hints would for example be "en:John Smith", "de,nds,af:" or "50:". "-untranslated" asks for hints only if there are no interwiki links yet; "-untranslatedonly" is like -untranslated, but other pages are not worked on at all.

Instead of giving a hint, you can give an empty line. This specifies that all hints for this page have been given (or that you have no hints for it). Note that if you have given a hint, the bot will keep asking for more hints until you press enter on an empty line. Another option is to input a question mark and nothing else; in that case you are shown the beginning of the text of the page. If after that you input the question mark again, it will show a larger part of the text, and so on.
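
The prompt behaviour described above can be sketched in Python as follows. This is a simplified model, not the bot's code: the name collect_hints, the step of 200 characters per question mark, and raising an error on a hint without ':' are all assumptions made for the example.

```python
def collect_hints(answers, page_text, step=200):
    """Process the answers typed at the hint prompt for one page.

    Returns (hints, shown): hints is a list of (languages, title) pairs,
    shown lists the text excerpts that were displayed for '?'.
    """
    hints, shown, chars = [], [], 0
    for answer in answers:
        answer = answer.strip()
        if answer == "":
            # Empty line: all hints for this page have been given.
            break
        if answer == "?":
            # Each further '?' shows a larger part of the page text.
            chars += step
            shown.append(page_text[:chars])
            continue
        if ":" not in answer:
            # At the prompt the ':' may not be omitted.
            raise ValueError("a hint needs a ':'")
        langs, _, title = answer.partition(":")
        # No title after the ':' means "same title as the page".
        hints.append((langs.split(","), title.strip() or None))
    return hints, shown
```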

It might in these cases be useful to have the "-confirm" option added, so the bot gets interactive before making a change. This can be used to check whether the links are correct and/or as an impetus to create a backlink.

Wiktionary
For Wiktionary there is the special "-wiktionary" option. It works like "-hint:all", but has some extras because on Wiktionary some languages use capitalisation and others don't, and links to another word are never correct.

On non-capitalising wiktionaries, links to capitalising wiktionaries are only added for capitalised words. Also, any link found to a word that differs more than just in capitalisation, is ignored completely.
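
A minimal Python sketch of these two rules is given below. The function and parameter names are hypothetical, and comparing the lower-cased words is a simplification chosen for the example; the bot's actual comparison may differ.

```python
def accept_wiktionary_link(word, target, source_capitalises, target_capitalises):
    """Decide whether a link from `word` to `target` would be kept."""
    # A link to a word that differs in more than capitalisation is ignored.
    if word.lower() != target.lower():
        return False
    # From a non-capitalising wiktionary to a capitalising one,
    # only capitalised words are linked.
    if not source_capitalises and target_capitalises:
        return word[:1].isupper()
    return True
```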

Automatic translation
For years (both AD and BC) and days of the year, the bot can automatically translate the page title into a large number of languages. If you do not want this automatic translation (for example because it takes a long time to go over such a large number of languages), it can be switched off with the "-noauto" option.

With the option "-years:" followed by a number (positive or negative), the bot goes through the years from the given year to 2050. If "-years" without any addition is given, the beginning year is taken to be the year 1.
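
The range of years covered by this option can be expressed as a short Python sketch. The function name is hypothetical, and how the bot treats BC years internally (including whether a year 0 is skipped) is not stated here, so the sketch simply counts upward from the starting number.

```python
def years_to_visit(option):
    """option is '-years' or '-years:N' with N a positive or negative number."""
    _, _, value = option.partition(":")
    # Without a number, the run starts at the year 1.
    start = int(value) if value else 1
    # The run always ends at 2050 (inclusive).
    return range(start, 2051)
```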

With the option "-days" the bot goes through the days of the year; however, this option only works correctly on nl:.

Avoiding unwanted links
If you want to run the bot, but know that for a given page it will reach links that it should not follow, you can use the -noredirect or -neverlink options.

-noredirect means that if a redirect page is found, the redirect is not followed, as is the normal behaviour, but the page is skipped.

-neverlink:xx, where xx is a language code, means that any links to the language xx: are ignored.
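
The effect of the two options on a list of candidate links can be illustrated with the following Python sketch. The data layout (a triple with an optional redirect target) and the name filter_links are assumptions made for the example only.

```python
def filter_links(candidates, neverlink=(), noredirect=False):
    """Filter candidate links.

    candidates: list of (lang, title, redirect_target), where
    redirect_target is None unless the page is a redirect.
    """
    kept = []
    for lang, title, redirect_target in candidates:
        if lang in neverlink:
            # -neverlink:xx: links to this language are ignored.
            continue
        if redirect_target is not None:
            if noredirect:
                # -noredirect: the redirect page is skipped.
                continue
            # Normal behaviour: follow the redirect.
            title = redirect_target
        kept.append((lang, title))
    return kept
```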

Working with the logfile
The bot makes a log of everything that happens in the file logs/interwiki.log

This can be used to create backlinks (from the languages you're linking to) to the pages the bot checked and edited. There are two possibilities:
 * Running the interwiki bot with the option "-warnfile:filename". This will give the bot the additions from the logfile as hints.
 * Running another bot, namely warnfile.py. This will directly make the changes advised by the logfile.

To make the long logfile more usable, you can run splitwarning.py. This will create separate files for backlinks from the various languages.

Overview of the options
Here is a list of the options, with an explanation of those that have not yet been discussed.
 * -array: (usage: "-array:nn" with nn a number) When working on several pages, make sure to have at least this number of pages the bot is working on, if possible. The default value is 100; when using -untranslatedonly or a similar option, you might want to set it lower.
 * -always: Always save the page, even if only one byte has changed (default: save the page only if at least one link has actually changed)
 * -askhints: Ask hints (see above)
 * -autonomous: Work in autonomous mode (see above)
 * -confirm: Always ask permission before changing a page.
 * -days: Work on the days
 * -file: (usage: "-file:filename") Specifies a file containing a list of pages to process. (Page names are specified as project:lang:pagename, and should encode non-ASCII characters as HTML numeric entities.)
 * -force: When an interwiki link is to be removed, just do it, don't ask for permission
 * -hint: Give a hint (see above)
 * -name: Old option; equivalent to "-hint:all", but capitalizes the last word when trying on eo:. Might get deprecated.
 * -neverlink: Do not link to a specific language (see above)
 * -noauto: Do not use automatic translation (see above)
 * -nobacklink: Do not give a list of missing links on pages linked to
 * -nobell: Give no audio sign when asking for input.
 * -noredirect: If the bot finds that a page linked to is a redirect, it is skipped (normal behaviour: it follows the redirect)
 * -noshownew: Do not show new links found
 * -number: (usage: "-number:nn" with nn a number) In combination with -start, checks only the first nn pages rather than the whole wiki.
 * -same: Old option; equivalent to "-hint:all"; might get deprecated
 * -showpage: When using -askhints or some such option, always show the page text, even if not prompted.
 * -skipfile: (usage: "-skipfile:filename") On a run using -start, do not process the pages listed in the file
 * -untranslated: Ask hints for untranslated pages (see above)
 * -untranslatedonly: Ask hints for untranslated pages and do not work on any other pages (see above)
 * -warnfile: Use the logfile for pages and hints (see above)
 * -wiktionary: Special wiktionary options (see above)
 * -years: Work on the years
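
All options above share the form "-name" or "-name:value", and everything else on the command line is taken as the page name. The Python sketch below shows one way such a command line could be split; it is an assumed simplification, not the bot's own argument handling, and parse_args is a hypothetical name.

```python
def parse_args(argv):
    """Split a command line into options and the page name."""
    options, page_words = {}, []
    for arg in argv:
        if arg.startswith("-"):
            # Split only on the first ':' so values such as
            # 'de:Albert_Einstein' stay intact.
            name, _, value = arg[1:].partition(":")
            options.setdefault(name, []).append(value or True)
        else:
            # Words that are not options together form the page name.
            page_words.append(arg)
    return options, " ".join(page_words)
```

This also shows why a hint title needs underscores: in "Albert Einstein -hint:de:Albert Einstein" the last word would end up in the page name instead of the hint.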