Manual:Pywikibot/weblinkchecker.py


Overview
weblinkchecker.py is a script from the Python Wikipedia Bot which finds broken external links.

weblinkchecker.py can check the URLs found in a single article, in all articles in a category, or in all articles on the wiki. It only checks HTTP and HTTPS links, and it skips URLs inside comments and nowiki tags. To speed things up, it uses multithreading to check up to 50 links at the same time.
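The filtering and concurrency described above can be illustrated with a small standalone sketch. It is not Pywikibot's code: the regular expressions and the names `extract_urls` and `check_all` are hypothetical. It strips comments and nowiki sections, collects HTTP(S) URLs, and runs a caller-supplied check function on up to 50 URLs at a time with a thread pool.

```python
import re
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

# Hypothetical, simplified patterns for illustration; Pywikibot's real ones differ.
COMMENT_RE = re.compile(r"<!--.*?-->", re.DOTALL)
NOWIKI_RE = re.compile(r"<nowiki>.*?</nowiki>", re.DOTALL | re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s\[\]<>\"|{}]+", re.IGNORECASE)


def extract_urls(wikitext: str) -> List[str]:
    """Return HTTP(S) URLs, ignoring those inside comments and nowiki tags."""
    text = COMMENT_RE.sub("", wikitext)
    text = NOWIKI_RE.sub("", text)
    # dict.fromkeys removes duplicates while keeping first-seen order.
    return list(dict.fromkeys(URL_RE.findall(text)))


def check_all(urls: List[str], check: Callable[[str], bool],
              max_workers: int = 50) -> Dict[str, bool]:
    """Run check(url) concurrently, with at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(urls, pool.map(check, urls)))
```

In a real checker, `check` would issue an HTTP request and return whether the server responded; passing it in keeps the threading logic separate from the network code.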

The bot does not remove external links by itself; it only reports them, because deciding whether a link should be removed would require strong artificial intelligence. It only reports a link as dead if it has been found unresponsive at least twice, with at least one week between the first and the last failure. This helps prevent users from removing links because of a temporary server failure. Keep in mind that the bot cannot distinguish between local failures and server failures, so make sure you are on a stable Internet connection.
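The reporting rule (at least two failures, with the first and last at least one week apart) can be expressed as a short predicate. This is a minimal sketch assuming the failure times of one URL are kept as a list of `datetime` values; the function name `should_report` is hypothetical.

```python
from datetime import datetime, timedelta
from typing import List

ONE_WEEK = timedelta(weeks=1)


def should_report(failures: List[datetime]) -> bool:
    """True if the link failed at least twice, with the first and the
    last failure at least one week apart."""
    if len(failures) < 2:
        return False
    return max(failures) - min(failures) >= ONE_WEEK
```

Two failures six days apart are therefore not reported, while two failures seven days apart are.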

The bot saves a history of broken links to a .dat file in the   subdirectory, e.g.  . This file is not intended to be read or modified by humans. It is written when the bot terminates, either because it has finished or because the user pressed Ctrl-C.
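The "save on exit, even after Ctrl-C" behavior can be sketched with `try`/`finally`. This example assumes the history is a dictionary serialized with `pickle`; the data layout and the names `load_history`, `save_history`, and `run` are hypothetical, and the real file format is an implementation detail of the script.

```python
import os
import pickle
from typing import Callable, Dict, List

# Hypothetical layout: URL -> list of failure timestamps.
History = Dict[str, List[float]]


def load_history(path: str) -> History:
    """Load a previously saved history, or start empty."""
    if not os.path.exists(path):
        return {}
    with open(path, "rb") as f:
        return pickle.load(f)


def save_history(path: str, history: History) -> None:
    """Write the history, creating the subdirectory if needed."""
    os.makedirs(os.path.dirname(path) or ".", exist_ok=True)
    with open(path, "wb") as f:
        pickle.dump(history, f)


def run(path: str, work: Callable[[History], None]) -> None:
    """Run the checking work and always save the history afterwards."""
    history = load_history(path)
    try:
        work(history)
    except KeyboardInterrupt:
        # Ctrl-C: stop checking but still persist what was collected.
        print("Interrupted; saving history before exit.")
    finally:
        save_history(path, history)
```

Writing the file only once at the end keeps disk I/O low, at the cost of losing the current run's results if the process is killed without a chance to run the `finally` block.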

Usage
''Speculation. If someone is familiar with the technical details, please update this section.''

To check for dead links for the first time for all pages on the wiki:

This adds an entry to the .dat file for each dead link found, together with the date. If you run the command again, it adds any new dead links that are not already listed and removes any existing entries whose links are working again.
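The add/remove behavior described above amounts to a simple update per checked URL. The following is a minimal sketch with a hypothetical function name and an assumed in-memory layout (URL mapped to a list of failure dates); it is not the script's actual code.

```python
from datetime import datetime
from typing import Dict, List


def update_history(history: Dict[str, List[datetime]], url: str,
                   alive: bool, now: datetime) -> None:
    """Record a failed check, or forget a link that works again."""
    if alive:
        history.pop(url, None)  # link recovered: drop any existing entry
    else:
        history.setdefault(url, []).append(now)  # dead: log today's date
```

Dropping recovered links entirely means a link must fail twice again, a week apart, before it can be reported.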

After the bot has checked some pages, run it on these pages again at a later time. This can be done with this command:

If the bot finds a link that has been broken for at least one week, it logs it in a text file, e.g. . The text is formatted so that it can be posted on the wiki, where others can help you fix or remove the broken links from the wiki pages.
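To show what "formatted for posting on the wiki" can mean, here is a sketch that renders one page's dead links as wikitext. The layout (a section heading per page, a bullet per URL) and the name `format_report` are hypothetical and do not reproduce the script's actual log format.

```python
from datetime import date
from typing import List, Tuple


def format_report(page_title: str,
                  dead: List[Tuple[str, date, date]]) -> str:
    """Render (url, first_failure, last_failure) entries as wikitext.

    Hypothetical layout, for illustration only."""
    lines = [f"== [[{page_title}]] =="]
    for url, first, last in dead:
        lines.append(f"* {url}")
        lines.append(f"** unreachable from {first.isoformat()} "
                     f"to {last.isoformat()}")
    return "\n".join(lines) + "\n"
```

Because the output is plain wikitext, it can be pasted into a project page where other editors can work through the list.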

It is also possible to report broken links on the talk page of the article in which the URL was found (again, only once the linked page has been unavailable at least twice, over a span of at least one week). To use this feature, set report_dead_links_on_talk = True in your user-config.py.

Reports will include a link to the Internet Archive Wayback Machine if available, so that important references can be kept.
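As an illustration of how an archived copy can be found, the sketch below uses the public Wayback Machine availability API (`https://archive.org/wayback/available?url=...`). Whether weblinkchecker.py uses this exact endpoint is an assumption; the helper names are hypothetical. The parsing is kept separate from the network call so it can be tried without Internet access.

```python
import json
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://archive.org/wayback/available"


def availability_url(url: str) -> str:
    """Build the availability query for a given URL."""
    return f"{API}?{urlencode({'url': url})}"


def closest_snapshot(payload: str) -> Optional[str]:
    """Return the closest archived snapshot URL, or None if there is none."""
    data = json.loads(payload)
    closest = data.get("archived_snapshots", {}).get("closest")
    if closest and closest.get("available"):
        return closest.get("url")
    return None


def lookup(url: str, timeout: float = 10.0) -> Optional[str]:
    """Query the API over the network and extract the snapshot URL."""
    with urlopen(availability_url(url), timeout=timeout) as resp:
        return closest_snapshot(resp.read().decode("utf-8"))
```

For example, `lookup("http://example.org/")` returns a `web.archive.org` URL when a snapshot exists, and `None` otherwise.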

Syntax examples
python weblinkchecker.py -start:!
 * Loads all wiki pages in alphabetical order using the Special:Allpages feature.

python weblinkchecker.py -start:Example_page
 * Loads all wiki pages using the Special:Allpages feature, starting at "Example page".

python weblinkchecker.py -weblink:www.example.org
 * Loads all wiki pages that link to www.example.org

python weblinkchecker.py Example page
 * Only checks links found in the wiki page "Example page"

python weblinkchecker.py -repeat
 * Loads all wiki pages where dead links were found during a prior run.