Extension:PageCrossReference

MediaWiki extensions manual
PageCrossReference
Release status: unmaintained
Implementation: Parser extension, Link markup
Description: On ArticleSaveComplete, parses the saved page for the titles of other pages in the wiki and turns matching raw text into internal links.
Author(s): Jdooley
Latest version: 0.62
MediaWiki: 1.20+
Database changes: No
License: GNU General Public License 3.0
Download: github.com
Parameters: see the Configuration section
Hooks used: ArticleSaveComplete

When a major edit is saved, the PageCrossReference extension searches the article for words that match page_titles within the article's namespace. When a match is found, the words are changed into an internal link and the search moves on to the next page_title. Spaces (' ') and underscores ('_') between words are treated as equivalent.

Text is ignored in the following cases:

  • The article's own page_title
  • page_titles in other page_namespaces
  • Subpages
  • Link tags
  • Http tags
  • Text between "nowiki" tags
  • Text between "pre" tags
  • Text between angle brackets ("<" and ">")
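
One way to honor exclusions like these is to split the wikitext into searchable and protected segments before looking for titles. The following is a minimal sketch under that assumption, not the extension's actual code: the function name splitProtectedSegments and the regular expression are hypothetical, and the pattern covers only "nowiki" and "pre" blocks, existing internal links, bare URLs and other angle-bracket tags.

```php
<?php
// Hypothetical helper, not taken from the extension's source.
// Splits wikitext into segments and flags the ones that should not be searched.
function splitProtectedSegments( $wikitext ) {
    $pattern = '~('
        . '<nowiki>.*?</nowiki>'   // text between "nowiki" tags
        . '|<pre>.*?</pre>'        // text between "pre" tags
        . '|\[\[.*?\]\]'           // existing internal links
        . '|https?://\S+'          // bare URLs
        . '|<[^>]*>'               // anything else between angle brackets
        . ')~is';

    // With one capturing group around the whole pattern, preg_split returns
    // alternating pieces: even indexes are plain text, odd indexes are matches.
    $parts = preg_split( $pattern, $wikitext, -1, PREG_SPLIT_DELIM_CAPTURE );

    $segments = array();
    foreach ( $parts as $i => $part ) {
        $segments[] = array(
            'text' => $part,
            'protected' => ( $i % 2 === 1 ),
        );
    }
    return $segments;
}
```

Because every piece is kept, including empty ones, concatenating the 'text' values in order reproduces the original wikitext, so only the unprotected segments need to be rewritten.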

Installation

  • Download, extract and place the file(s) in a directory called PageCrossReference in your extensions/ folder.
  • Add the following code at the bottom of your LocalSettings.php:
    require_once "$IP/extensions/PageCrossReference/PageCrossReference.php";
    // This example adds the page "Black Page" to the BlackList.
    $wgPageCrossReferenceBlackList = array( 'Black_Page' );
    
  • Configure as required.
  • Done: navigate to Special:Version on your wiki to verify that the extension is successfully installed.

Configuration

PageCrossReference Parameters
Parameter                                Default  Description
$wgPageCrossReferenceMinimumWordLength   2        Only choose titles with this many words or more
$wgPageCrossReferenceSkipHeaders         false    Whether headers are skipped: true = ignore them, false = search them
$wgPageCrossReferenceBlackList           null     Titles that are never chosen, even if found; write them with underscores in place of spaces
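
As an illustration, the three parameters might be set in LocalSettings.php as sketched below. The values are examples only, 'Another_Page' is a hypothetical title, and the array form of the blacklist follows the installation example. Placing the lines after the require_once line is an assumption, made so that the extension's defaults cannot overwrite them.

```php
// LocalSettings.php, after the require_once line for PageCrossReference.
$wgPageCrossReferenceMinimumWordLength = 3;   // example value; the default is 2
$wgPageCrossReferenceSkipHeaders = true;      // ignore headers; the default (false) searches them
// Titles that should never be linked, written with underscores for spaces.
$wgPageCrossReferenceBlackList = array( 'Black_Page', 'Another_Page' );
```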

Operation

PageCrossReference runs from its onArticleSaveComplete handler. It proceeds to parseArticle only if $revision is true and $minoredit is false.
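
The guard can be pictured as a small predicate. This is a sketch only: shouldCrossReference is a hypothetical name, and the reading that $revision is empty when a save changes nothing comes from the ArticleSaveComplete hook documentation rather than from the extension's source.

```php
<?php
// Hypothetical standalone version of the check made in onArticleSaveComplete.
// $revision and $minoredit stand for the hook parameters of the same names.
function shouldCrossReference( $revision, $minoredit ) {
    // Proceed only when a revision was saved and the edit is not minor.
    return (bool)$revision && !$minoredit;
}
```

A handler built on this would return early, leaving the page untouched, whenever the predicate is false.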

parseArticle first checks $wgPageCrossReferenceLoop. Saving the result through doEdit invokes onArticleSaveComplete again, which would run PageCrossReference twice; the check ensures that PageCrossReference runs once per article. The article's wikitext is then parsed into an array, and a foreach loop runs once per page_title, invoking parseContent and passing the text_article_array by reference.
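
The re-entrancy check can be sketched as follows. The sketch makes several assumptions that the text does not confirm: $wgPageCrossReferenceLoop is treated as a boolean flag, the wikitext is split into an array by line, and parseContent and doEdit are replaced by the hypothetical callbacks $parseContent and $save.

```php
<?php
// Hypothetical stand-in for parseArticle; not the extension's source.
function parseArticleSketch( $text, array $titles, $parseContent, $save ) {
    // If the flag is already set, this call was triggered by our own save.
    if ( !empty( $GLOBALS['wgPageCrossReferenceLoop'] ) ) {
        return null;
    }
    // The flag stays set for the rest of the request in this sketch.
    $GLOBALS['wgPageCrossReferenceLoop'] = true;

    // Assumption: one array element per line of wikitext.
    $textArticleArray = explode( "\n", $text );

    // One pass per page_title; the callback declares its first parameter
    // by reference, so it can modify the array in place.
    foreach ( $titles as $pageTitle ) {
        $parseContent( $textArticleArray, $pageTitle );
    }

    $newText = implode( "\n", $textArticleArray );
    $save( $newText ); // plays the role of doEdit, which fires the hook again
    return $newText;
}
```

When $save re-enters parseArticleSketch, as doEdit would through the hook, the second call returns at once because the flag is already set.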

parseContent assesses each array element for an existing internal link to the page_title or for raw text that could become one. As soon as an internal link is found or made, the for loop ends.
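
A minimal sketch of that scan is shown below. It is not the extension's code: parseContentSketch and its regular expressions are hypothetical, matching is case-sensitive, and the sketch does not skip protected text such as "nowiki" blocks, which the real extension is described as ignoring.

```php
<?php
// Hypothetical stand-in for parseContent; not the extension's source.
// Returns true if a link was made, false if one already existed or no match was found.
function parseContentSketch( array &$elements, $pageTitle ) {
    // Treat ' ' and '_' between words as interchangeable.
    $words = preg_split( '/[ _]+/', $pageTitle, -1, PREG_SPLIT_NO_EMPTY );
    $quoted = array();
    foreach ( $words as $word ) {
        $quoted[] = preg_quote( $word, '/' );
    }
    $core = implode( '[ _]', $quoted );

    $existingLink = '/\[\[' . $core . '(\|[^\]]*)?\]\]/'; // [[Title]] or [[Title|label]]
    $rawTitle = '/\b' . $core . '\b/';                    // the title as plain words

    for ( $i = 0; $i < count( $elements ); $i++ ) {
        if ( preg_match( $existingLink, $elements[$i] ) ) {
            return false; // an internal link was found: stop
        }
        if ( preg_match( $rawTitle, $elements[$i] ) ) {
            // Link only the first raw occurrence, then stop.
            $elements[$i] = preg_replace( $rawTitle, '[[$0]]', $elements[$i], 1 );
            return true;
        }
    }
    return false;
}
```

Because the scan stops at the first hit in array order, a raw title that appears before an existing link to the same page is still converted.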

PageCrossReference cycles through every page_title that the page table holds for the page_namespace. With ten thousand page_titles, the for loop in parseContent runs 10,000 times, which makes PageCrossReference heavy on memory and CPU. The cost will vary per site. A test with 32 page_titles on a 1.5 MB article took an extra second to save.

Known Bugs

001 Raw page_title followed by internal link. For example, suppose "Main Page" is on line seven and
    "[[Main Page]]" is on line 23. PageCrossReference converts "Main Page" on line seven to
    "[[Main Page]]" and ends the search for "Main Page", leaving two internal links. This
    violates the OneHit principle, but the fix, searching through every article twice, would double
    the work for a rare circumstance. Bug untouched.

002 Subpage. The "/" in a subpage page_title fouls the parser and is confused with ratios in
    multi-word page_titles. The solution was to remove subpages from the PageCrossReference
    search and to skip words around "/". Bug fixed.

003 Article page_title intersection across namespaces. The Select query filters out the article's
    own page_title, so articles with the same page_title in other namespaces are ignored as well.
    Left for the author to edit. Bug untouched.

004 File Upload. File Upload failed in version 0.6. Added Namespace Validator. Bug fixed.

005 Adjacent page_titles. When two page_titles appeared in sequence, the space between them was lost. Bug fixed.