Extension:BrokenLinks

BrokenLinks

Release status: stable
Implementation: Special page
Description: Special page that checks all links in the externallinks table and reports those that return an HTTP 4xx or 5xx response
Author(s): Gary Thompson (sushiguru)
Latest version: 0.1.0 (2009-06-09)
MediaWiki: 1.14.0
License: GPL
Download: No link
Parameters: $wgUseAjax=true;
Added rights: restrict to sysops


This special page was written for a small wiki (150 articles) with a small editing team. We wanted a quicker way to check external links throughout the wiki without having to visit each article. It was written to help our own administration, but someone else somewhere might want the same thing.

The page collects URLs from the externallinks table and checks each URL in turn for a successful server response. It then reports the links that fail, along with the page(s) they appear on.
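The "successful server response" test can be illustrated in isolation. The following is a minimal sketch, not the extension's own code: the function name brokenLinksIsFailure is hypothetical, and it uses a simple range check on the status code, whereas the extension looks the code up in an explicit list of 4xx and 5xx responses. It also assumes that a response which cannot be parsed should count as broken.

```php
<?php
// Hypothetical helper, not part of the extension: decide whether the first
// line of an HTTP response indicates a broken link.
function brokenLinksIsFailure( $statusLine ) {
    // A status line looks like "HTTP/1.1 404 Not Found".
    if ( !preg_match( '#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $statusLine, $m ) ) {
        // Assumption: an unparseable response is treated as a failure.
        return true;
    }
    $code = (int)$m[1];
    // 4xx = client error, 5xx = server error; everything else passes.
    return $code >= 400 && $code <= 599;
}
```

With this sketch, a redirect such as "HTTP/1.1 301 Moved Permanently" is not reported, because only the 4xx and 5xx ranges count as failures.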

If you're looking for a tool to use on a large wiki, try: w:en:User:Dispenser/Link_checker

News

  • 2009-06-09: Released version 0.1

Compatibility

Tested only on MediaWiki 1.14.0.

On MediaWiki 1.18.1, a small change is needed in BrokenLinks.php (around line 59).

Instead of

 $html = new OutputPage(); #this is going to be our object to add html to; allows us to use the nice OutputPage functionality

you have to write

 $context = RequestContext::getMain();
 $html = $context->getOutput(); #this is going to be our object to add html to; allows us to use the nice OutputPage functionality

Otherwise you will get this error:

Catchable fatal error: Argument 1 passed to ContextSource::setContext() must implement interface IContextSource, null given, called in /path/to/your/wiki/includes/OutputPage.php on line 228 and defined in /path/to/your/wiki/includes/RequestContext.php on line 348
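Rather than editing the file differently for each MediaWiki version, the output object could be chosen at runtime. The following is a minimal sketch under stated assumptions: the helper name brokenLinksGetOutput is hypothetical and not part of the extension, and the check relies on the RequestContext class being present from MediaWiki 1.18 onwards and absent in 1.14. It only makes sense inside MediaWiki, where OutputPage is defined.

```php
<?php
// Hypothetical helper, not part of the extension: return an OutputPage
// object in a way that works on both older and newer MediaWiki releases.
function brokenLinksGetOutput() {
    if ( class_exists( 'RequestContext' ) ) {
        // Newer releases: OutputPage needs a context, so reuse the main one.
        return RequestContext::getMain()->getOutput();
    }
    // Older releases (such as 1.14): a bare OutputPage can be constructed.
    return new OutputPage();
}
```

In getBrokenLinks() the line that creates $html would then become $html = brokenLinksGetOutput();.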

Usage

Install as described below, then go to Special:BrokenLinks on your wiki. Select the maximum number of errors you wish to report on and click the button.

Because the script steps through each URL in turn, and every unresponsive URL has to wait out its own timeout, the response may take some time.
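The per-URL cost comes from two waits: opening the connection and reading the first line of the reply. The following is a minimal sketch of one such check, assuming a HEAD request over a raw socket as in the extension's code. Both function names are hypothetical, the ssl:// transport requires PHP's OpenSSL extension, and the read timeout is an addition that the extension itself does not set.

```php
<?php
// Hypothetical helper: work out where to connect and what to send for one URL.
// Returns null for URLs that cannot be parsed or are not http/https.
function brokenLinksBuildRequest( $url ) {
    $parts = parse_url( $url );
    if ( $parts === false || !isset( $parts['scheme'], $parts['host'] ) ) {
        return null;
    }
    $scheme = strtolower( $parts['scheme'] );
    if ( $scheme !== 'http' && $scheme !== 'https' ) {
        return null;
    }
    // Use the port from the URL if present, else the scheme default.
    $port = isset( $parts['port'] ) ? $parts['port'] : ( $scheme === 'https' ? 443 : 80 );
    $path = isset( $parts['path'] ) ? $parts['path'] : '/';
    if ( isset( $parts['query'] ) ) {
        $path .= '?' . $parts['query'];
    }
    return array(
        'target' => ( $scheme === 'https' ? 'ssl://' : '' ) . $parts['host'],
        'port' => $port,
        'request' => "HEAD $path HTTP/1.0\r\nHost: {$parts['host']}\r\n\r\n",
    );
}

// Hypothetical helper: fetch the status line, or false if nothing came back.
function brokenLinksFetchStatusLine( $url, $timeout = 1.0 ) {
    $req = brokenLinksBuildRequest( $url );
    if ( $req === null ) {
        return false;
    }
    $conn = @fsockopen( $req['target'], $req['port'], $errno, $errstr, $timeout );
    if ( !$conn ) {
        return false;
    }
    // Bound the read as well as the connect, so a silent server cannot stall the run.
    stream_set_timeout( $conn, (int)ceil( $timeout ) );
    fwrite( $conn, $req['request'] );
    $line = fgets( $conn, 128 );
    fclose( $conn );
    return $line;
}
```

With a one-second timeout for each wait, a wiki with a few hundred dead links can take several minutes in the worst case, which is why the limit selector on the special page is useful.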

Download instructions

Copy the code found below into the corresponding files in $IP/extensions/BrokenLinks/.

  • Note #1: $IP stands for the root directory of your MediaWiki installation, the same directory that holds LocalSettings.php.
  • Note #2: For this extension to work, you must set the path to an Ajax loading image in the .js file that you create (BrokenLinks.js). For example: var ajax_loader = '<img src="/images/ajax-loader.gif" alt="ajax loader image">';

Installation

To install this extension, add the following to LocalSettings.php:

require_once( "$IP/extensions/BrokenLinks/BrokenLinks.php" );
$wgUseAjax = true;

Code

extensions/BrokenLinks/BrokenLinks.php

<?php
/*
 * Main file for the BrokenLinks extension of MediaWiki.
 * This code is released under the GNU General Public License.
 *
 * Purpose:
 * Special Page which checks all links in table _externallinks and reports on those that return
 * an HTTP response 4xx or 5xx
 
 * Usage:
 * require_once( "$IP/extensions/BrokenLinks/BrokenLinks.php" ); in LocalSettings.php
 * 
 * @package MediaWiki
 * @link http://www.mediawiki.org/wiki/Extension:BrokenLinks   Documentation
 * @license http://opensource.org/licenses/gpl-license.php GNU Public License
 * @version 0.1.0
 * Initial release
*/
 
/*
 * Register the extension with MediaWiki
*/
 
# Alert the user that this is not a valid entry point to MediaWiki if they try to access the special pages file directly.
if (!defined('MEDIAWIKI')) {
   print"To install my extension, put the following line in LocalSettings.php:<br />\n";
   print'require_once( "$IP/extensions/BrokenLinks/BrokenLinks.php" );<br />';
   print'Also ensure that $wgUseAjax=true; is added to LocalSettings.php to enable AJAX support.';
   exit( 1 );
}
 
$wgAjaxExportList[] = 'getBrokenLinks';
 
$wgExtensionCredits['specialpage'][] = array(
 'name' => 'BrokenLinks',
 'author' => 'Gary Thompson, University of St Andrews',
 'url' => 'http://www.mediawiki.org/wiki/Extension:BrokenLinks',
 'description' => 'Create a Special Page which checks all links in table _externallinks and reports on those that return an HTTP response 4xx or 5xx',
 # 'descriptionmsg' expects a message key rather than literal text, so it is omitted here
 'version' => '0.1.0'
);
 
$wgAutoloadClasses['BrokenLinks'] = dirname(__FILE__) . '/BrokenLinks.body.php';	# Tell MediaWiki to load the extension body.
$wgExtensionMessagesFiles['BrokenLinks'] = dirname(__FILE__) . '/BrokenLinks.i18n.php';	# (non) international settings
$wgSpecialPages['BrokenLinks'] = 'BrokenLinks'; 								# Let MediaWiki know we exist

function getBrokenLinks($error_lim){
 
	$fails = array(
          400=>'Bad Request',401=>'Unauthorized',402=>'Payment Required',404=>'Not Found',405=>'Method Not Allowed',406=>'Not Acceptable',
          407=>'Proxy Authentication Required',408=>'Request Timeout',409=>'Conflict',410=>'Gone',411=>'Length Required',412=>'Precondition Failed',
          413=>'Request Entity Too Large',414=>'Request-URI Too Long',415=>'Unsupported Media Type',416=>'Requested Range Not Satisfiable',
          417=>'Expectation Failed',500=>'Internal Server Error',501=>'Not Implemented',502=>'Bad Gateway',503=>'Service Unavailable',
          504=>'Gateway Timeout', 505=>'HTTP Version Not Supported'
          ); #list of server responses likely to mean we can't get through at all, leading to upset users.
	
	$allowable_protocols = array('http','https'); #all we really care about

	$html = new OutputPage(); #this is going to be our object to add html to; allows us to use the nice OutputPage functionality

	$dbr = wfGetDB( DB_SLAVE ); # create an instance to the database - read only
	$page = $dbr->tableName( 'page' );
	$externallinks = $dbr->tableName( 'externallinks' );
 
	$sql = "SELECT count(*) AS max_links FROM $externallinks";
	$res = $dbr->query( $sql );
	$row = $dbr->fetchRow( $res );
	$max_links = $row['max_links'];
 
	$error_limit = (!$error_lim) ? $max_links : $error_lim; #set the theoretical upper limit of links to check
	
	$html->addHTML("<h3>Error limit set at: $error_limit</h3>");
 
	$sql = 	"SELECT page_namespace AS namespace, page_title AS title, el_to AS url
		FROM $page,	$externallinks
		WHERE page_id=el_from
		GROUP BY el_to";
 
	$res = $dbr->query( $sql ); # run the SQL query

	$error_count = 0;
 
	$html->addHTML('<ol>');
 
	while ( $row = $dbr->fetchObject( $res ) ) {
		if($error_count >= $error_limit){
			break;	
		}
		$url = $row->url;
		$title = $row->title;
		$t = Title::makeTitle($row->namespace, $title); #build the Title from namespace and DB key so pages outside the main namespace resolve correctly

		# check to see if we can access the file at this URL
		$URLInfo = array();
		$url_parsed = true; #setup for a fail...
		$this_protocol = explode('://',$url);
		if(in_array(strtolower($this_protocol[0]),$allowable_protocols)){
			$URLInfo = @parse_url($url) or $url_parsed = false;
			if($url_parsed==false){ #FAIL!
				$html->addHTML("<li>Can't parse URL - this is a serious FAIL.  Probably a very badly formed URL with a typo."); #don't die - just raise the message
				$html->addHTML($html->addWikiText("[" . $t->getFullURL( 'action=edit' ) . " $title] has the url $url") . "</li>");
				$error_count++;
			}else{
				$host = $URLInfo['host'];
				$DocumentPath = (isset($URLInfo['path'])) ? $URLInfo['path'] : "/";
				if (isset($URLInfo['query'])){
					$DocumentPath = $DocumentPath."?".$URLInfo['query'];
				}
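				# use the port given in the URL, else the scheme default; https needs the ssl:// transport (requires PHP's OpenSSL extension)
				$is_https = (strtolower($this_protocol[0]) == 'https');
				$port = (isset($URLInfo['port'])) ? $URLInfo['port'] : ($is_https ? 443 : 80);
				$conn = @fsockopen(($is_https ? 'ssl://' : '').$host, $port, $errno, $errstr, 1.0); 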
				if ($conn){ 
					fwrite ($conn, "HEAD ".$DocumentPath." HTTP/1.0\r\nHost: $host\r\n\r\n"); 
					$response= fgets($conn,13);
					$status = substr($response,-3);
					if (@array_key_exists($status,$fails)) { #FAIL!
						$html->addHTML("<li>ERROR::{$fails[$status]}");
						$html->addHTML($html->addWikiText("[" . $t->getFullURL( 'action=edit' ) . " $title] has the url $url") . "</li>");
						$error_count++;
					}
					fclose($conn); 
				}else{ #FAIL!
					$html->addHTML("<li>ERROR::Cannot Connect.  Not even a little bit.");
					$html->addHTML($html->addWikiText("[" . $t->getFullURL( 'action=edit' ) . " $title] has the url $url") . "</li>");
					$error_count++;
				}
			}
		}
	}
	$html->addHTML('</ol>');
	$dbr->freeResult( $res );
	# extract our lovely formatted HTML from the html object and send it back to the AJAX request.
	return $html->getHTML();
}

extensions/BrokenLinks/BrokenLinks.body.php

<?php
class BrokenLinks extends SpecialPage {
 
        function __construct() {
 
                parent::__construct( 'BrokenLinks', 'editinterface' );
                wfLoadExtensionMessages('BrokenLinks');
 
        }
 
        function execute( $par ) {
 
                global $wgRequest, $wgOut, $wgUseAjax, $wgJsMimeType, $wgScriptPath, $wgUser;
 
                # only users holding the right named in the constructor may run the page
                if ( !$this->userCanExecute($wgUser) ) {
                        $this->displayRestrictionError();
                        return;
                }
 
                # the results are fetched by an AJAX call, so AJAX support is mandatory
                if (!$wgUseAjax) {
                        $wgOut->addWikiText('BrokenLinks: $wgUseAjax is not enabled, aborting extension setup.');
                        return;
                }
                $wgOut->addScript("<script type=\"{$wgJsMimeType}\" src=\"{$wgScriptPath}/extensions/BrokenLinks/BrokenLinks.js\"></script>\n" );
 
                $this->setHeaders();
 
                # setup some limits for users 
                $opts = '';
                $limits = array(1=>'1',5=>'5',10=>'10',20=>'20',50=>'50',100=>'100',0=>'Show All');
                foreach($limits as $key=>$val){
                        $opts .= "<option value=\"$key\">$val</option>\n";
                }
                $select = "<select name=\"limit\">$opts</select>";
 
                #setup the form for them to use
                $wgOut->addHTML("<form method=\"post\" action=\"\">");
                $wgOut->addHTML("<p>Limit number of errors: $select<input type=\"button\" value=\"Get Results\" onclick=\"getLinks(limit.value);\"></p>");
                $wgOut->addHTML("<p>Be aware that any more than 10 and you'll need to go pop the kettle on.</p>");
                $wgOut->addHTML("</form>");
                $wgOut->addHTML("<div id=\"divBrokenLinks\"></div>");
        }
}

extensions/BrokenLinks/BrokenLinks.i18n.php

<?php
$messages = array();
 
$messages['en'] = array( 
	'brokenlinks' => 'Broken Links',
);

extensions/BrokenLinks/BrokenLinks.js

function getLinks(lim){
 
	// replace this example path with the path to your own ajax progress bar image
	var ajax_loader = '<img src="/images/ajax-loader.gif" alt="ajax loader image">';
 
	// show a nice ajax loader to show *something* is happening (due to server responses, this page can take a while to load)
	document.getElementById('divBrokenLinks').innerHTML = ajax_loader;
 
	//now initialise the ajax call, sending response to divBrokenLinks
	sajax_do_call( "getBrokenLinks", [lim] , document.getElementById('divBrokenLinks'));
 
	return;
 
}

That's the lot.

See also