Manual:Pywikibot/weblinkchecker.py

Overview
weblinkchecker.py is a Pywikibot script that finds broken external links.

weblinkchecker.py can check the following:
 * All URLs found on a single article
 * All articles in a category
 * All articles in one or more namespaces
 * All articles on the wiki
 * And much more! Check the list of command-line arguments.

It will only check HTTP and HTTPS links, and it will skip URLs inside comments and nowiki tags. To speed things up, it checks up to 50 links at the same time using multithreading.
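
The exact details are internal to the script, but the idea of checking many links in parallel can be sketched roughly like this (a minimal illustration, not the script's actual code; the helper names and the use of the requests library are assumptions):

import concurrent.futures
import requests

def check_link(url):
    # Fetch the page; treat HTTP error codes and network failures as a dead link.
    try:
        response = requests.get(url, timeout=30)
        return url, response.status_code < 400
    except requests.RequestException:
        return url, False

def check_links(urls, max_threads=50):
    # Check up to max_threads URLs at the same time, similar to the bot's 50 parallel checks.
    with concurrent.futures.ThreadPoolExecutor(max_workers=max_threads) as pool:
        return dict(pool.map(check_link, urls))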

The bot will not remove external links by itself; it will only report them, since removal would require strong artificial intelligence. It will only report a dead link if it has been found unresponsive at least twice, with a waiting period (at least one week by default) between the first and the last check. This should help prevent users from removing links because of a temporary server failure. Please keep in mind that the bot cannot differentiate between local failures and server failures, so make sure you are on a stable Internet connection.

The bot will save a history of unavailable links to a .dat file in a subdirectory. This file is not intended to be read or modified by humans. The .dat file will be written when the bot terminates (either because it has finished or because the user pressed CTRL-C). After a second run (with an appropriate wait between the two), a human-readable list of broken links will be generated as a text file.
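
The actual on-disk format is not documented here, but the bookkeeping the bot performs can be sketched as follows (a simplified illustration under assumed names, not the real data structure):

import datetime
import pickle

ONE_WEEK = datetime.timedelta(days=7)

def record_failure(history, url, page_title):
    # Remember when a URL was first found dead and on which pages it appears.
    now = datetime.datetime.now()
    entry = history.setdefault(url, {'first_failure': now, 'pages': set()})
    entry['pages'].add(page_title)
    # Report only if the link already failed earlier and that first failure
    # is at least one week old.
    return now - entry['first_failure'] >= ONE_WEEK

def save_history(history, path):
    # Write the history to disk when the bot terminates, so the next run can compare.
    with open(path, 'wb') as f:
        pickle.dump(history, f)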

Usage
''Speculation. If someone is familiar with the technical details, please update this section.''

To check all pages on the wiki for dead links for the first time, run:
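python weblinkchecker.py -start:!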

This will add an entry with a date into the .dat file for each dead link found. If you run this command again, it will add any new dead links that are not already listed, and it will remove any existing entries that are now working.

After the bot has checked some pages, run it on those pages again at a later time. This can be done with the following command:
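python weblinkchecker.py -repeat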

If the bot finds a broken link that has been broken for at least one week, it will log it in a text file. The text is formatted so that it can be posted on the wiki, so that others can help you fix or remove the broken links from the wiki pages.

Additionally, it is possible to report broken links to the talk page of the article in which the URL was found (again, only once the linked page has been found unavailable at least twice, at least one week apart). To use this feature, set report_dead_links_on_talk = True in your user-config.py.
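
For example, your user-config.py would then contain the line:
report_dead_links_on_talk = True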

Reports will include a link to the Internet Archive Wayback Machine if available, so that important references can be kept.

Syntax examples
python weblinkchecker.py -start:!
 * Loads all wiki pages in alphabetical order using the Special:Allpages feature.

python weblinkchecker.py -start:Example_page
 * Loads all wiki pages using the Special:Allpages feature, starting at "Example page".

python weblinkchecker.py -weblink:www.example.org
 * Loads all wiki pages that link to www.example.org.

python weblinkchecker.py Example page
 * Only checks links found in the wiki page "Example page".

python weblinkchecker.py -repeat
 * Loads all wiki pages where dead links were found during a prior run.

Command-line arguments
The following list was extracted from the bot's built-in help. It is in addition to the global arguments used by most bots. Most of the arguments on the list have not been verified, and the list should probably be re-arranged in a more logical order.

All other arguments will be regarded as part of the title of a single page, and the bot will only work on that single page.

Configuration variables
The following config variables (to be declared in user-config.py) are supported by this script: