MediaWiki r45588 - Code Review

Jump to: navigation, search
Repository:MediaWiki
Revision:r45587‎ | r45588 (on ViewVC)‎ | r45589 >
Date:23:59, 8 January 2009
Author:simetrical
Status:reverted (Comments)
Tags:
Comment:
Reduce code duplication correctly this time, again

The test cases I thought up are at:

http://www.mediawiki.org/wiki/User:Simetrical/Id_tests

All of them pass with the patch, except for some that fail on current
code as well: the ones involving templates, multiply-occurring section
headers, or numeric id's (there seems to be a weird bug with those that
probably involves string and numeric id's being used in the same array).
This is true whether $wgEnforceHtmlIds is on or off. (Actually, the
problem with numeric keys doesn't happen with $wgEnforceHtmlIds off,
because of course numeric ids aren't allowed then.)
Modified paths:

Diff [purge]

Index: trunk/phase3/includes/parser/Parser.php
===================================================================
--- trunk/phase3/includes/parser/Parser.php	(revision 45587)
+++ trunk/phase3/includes/parser/Parser.php	(revision 45588)
@@ -3448,7 +3448,7 @@
 	 * @private
 	 */
 	function formatHeadings( $text, $isMain=true ) {
-		global $wgMaxTocLevel, $wgContLang, $wgEnforceHtmlIds;
+		global $wgMaxTocLevel, $wgContLang;
 
 		$doNumberHeadings = $this->mOptions->getNumberHeadings();
 		$showEditLink = $this->mOptions->getEditSection();
@@ -3593,71 +3593,17 @@
 				}
 			}
 
-			# The safe header is a version of the header text safe to use for links
-			# Avoid insertion of weird stuff like <math> by expanding the relevant sections
-			$safeHeadline = $this->mStripState->unstripBoth( $headline );
+			list( $anchor, $legacyAnchor, $tocline, $headlineHint ) =
+				$this->processHeadingText( $headline );
 
-			# Remove link placeholders by the link text.
-			#     <!--LINK number-->
-			# turns into
-			#     link text with suffix
-			$safeHeadline = $this->replaceLinkHoldersText( $safeHeadline );
-
-			# Strip out HTML (other than plain <sup> and <sub>: bug 8393)
-			$tocline = preg_replace(
-				array( '#<(?!/?(sup|sub)).*?'.'>#', '#<(/?(sup|sub)).*?'.'>#' ),
-				array( '',                          '<$1>'),
-				$safeHeadline
-			);
-			$tocline = trim( $tocline );
-
-			# For the anchor, strip out HTML-y stuff period
-			$safeHeadline = preg_replace( '/<.*?'.'>/', '', $safeHeadline );
-			$safeHeadline = trim( $safeHeadline );
-
-			# Save headline for section edit hint before it's escaped
-			$headlineHint = $safeHeadline;
-
-			if ( $wgEnforceHtmlIds ) {
-				$legacyHeadline = false;
-				$safeHeadline = Sanitizer::escapeId( $safeHeadline,
-					'noninitial' );
-			} else {
-				# For reverse compatibility, provide an id that's
-				# HTML4-compatible, like we used to.
-				#
-				# It may be worth noting, academically, that it's possible for
-				# the legacy anchor to conflict with a non-legacy headline
-				# anchor on the page.  In this case likely the "correct" thing
-				# would be to either drop the legacy anchors or make sure
-				# they're numbered first.  However, this would require people
-				# to type in section names like "abc_.D7.93.D7.90.D7.A4"
-				# manually, so let's not bother worrying about it.
-				$legacyHeadline = Sanitizer::escapeId( $safeHeadline,
-					'noninitial' );
-				$safeHeadline = Sanitizer::escapeId( $safeHeadline, 'xml' );
-
-				if ( $legacyHeadline == $safeHeadline ) {
-					# No reason to have both (in fact, we can't)
-					$legacyHeadline = false;
-				} elseif ( $legacyHeadline != Sanitizer::escapeId(
-				$legacyHeadline, 'xml' ) ) {
-					# The legacy id is invalid XML.  We used to allow this, but
-					# there's no reason to do so anymore.  Backward
-					# compatibility will fail slightly in this case, but it's
-					# no big deal.
-					$legacyHeadline = false;
-				}
-			}
-
 			# HTML names must be case-insensitively unique (bug 10721).  FIXME:
 			# Does this apply to Unicode characters?  Because we aren't
 			# handling those here.
-			$arrayKey = strtolower( $safeHeadline );
-			if ( $legacyHeadline === false ) {
+			$arrayKey = strtolower( $anchor );
+			if ( $legacyAnchor === false ) {
 				$legacyArrayKey = false;
 			} else {
-				$legacyArrayKey = strtolower( $legacyHeadline );
+				$legacyArrayKey = strtolower( $legacyAnchor );
 			}
 
 			# count how many in assoc. array so we can track dupes in anchors
@@ -3679,12 +3625,10 @@
 			}
 
 			# Create the anchor for linking from the TOC to the section
-			$anchor = $safeHeadline;
-			$legacyAnchor = $legacyHeadline;
 			if ( $refers[$arrayKey] > 1 ) {
 				$anchor .= '_' . $refers[$arrayKey];
 			}
-			if ( $legacyHeadline !== false && $refers[$legacyArrayKey] > 1 ) {
+			if ( $legacyAnchor !== false && $refers[$legacyArrayKey] > 1 ) {
 				$legacyAnchor .= '_' . $refers[$legacyArrayKey];
 			}
 			if( $enoughToc && ( !isset($wgMaxTocLevel) || $toclevel<$wgMaxTocLevel ) ) {
@@ -3756,6 +3700,70 @@
 		}
 	}
 
+	private function processHeadingText( $headline ) {
+		global $wgEnforceHtmlIds;
+
+		# The safe header is a version of the header text safe to use for links
+		# Avoid insertion of weird stuff like <math> by expanding the relevant sections
+		$safeHeadline = $this->mStripState->unstripBoth( $headline );
+
+		# Remove link placeholders by the link text.
+		#     <!--LINK number-->
+		# turns into
+		#     link text with suffix
+		$safeHeadline = $this->replaceLinkHoldersText( $safeHeadline );
+
+		# Strip out HTML (other than plain <sup> and <sub>: bug 8393)
+		$tocline = preg_replace(
+			array( '#<(?!/?(sup|sub)).*?'.'>#', '#<(/?(sup|sub)).*?'.'>#' ),
+			array( '',                          '<$1>'),
+			$safeHeadline
+		);
+		$tocline = trim( $tocline );
+
+		# For the anchor, strip out HTML-y stuff period
+		$safeHeadline = preg_replace( '/<.*?'.'>/', '', $safeHeadline );
+		$safeHeadline = trim( $safeHeadline );
+
+		# Save headline for section edit hint before it's escaped
+		$headlineHint = $safeHeadline;
+
+		if ( $wgEnforceHtmlIds ) {
+			$legacyHeadline = false;
+			$safeHeadline = Sanitizer::escapeId( $safeHeadline,
+				'noninitial' );
+		} else {
+			# For reverse compatibility, provide an id that's
+			# HTML4-compatible, like we used to.
+			#
+			# It may be worth noting, academically, that it's possible for
+			# the legacy anchor to conflict with a non-legacy headline
+			# anchor on the page.  In this case likely the "correct" thing
+			# would be to either drop the legacy anchors or make sure
+			# they're numbered first.  However, this would require people
+			# to type in section names like "abc_.D7.93.D7.90.D7.A4"
+			# manually, so let's not bother worrying about it.
+			$legacyHeadline = Sanitizer::escapeId( $safeHeadline,
+				'noninitial' );
+			$safeHeadline = Sanitizer::escapeId( $safeHeadline, 'xml' );
+
+			if ( $legacyHeadline == $safeHeadline ) {
+				# No reason to have both (in fact, we can't)
+				$legacyHeadline = false;
+			} elseif ( $legacyHeadline != Sanitizer::escapeId(
+			$legacyHeadline, 'xml' ) ) {
+				# The legacy id is invalid XML.  We used to allow this, but
+				# there's no reason to do so anymore.  Backward
+				# compatibility will fail slightly in this case, but it's
+				# no big deal.
+				$legacyHeadline = false;
+			}
+		}
+
+		return array( $safeHeadline, $legacyHeadline, $tocline,
+			$headlineHint );
+	}
+
 	/**
 	 * Transform wiki markup when saving a page by doing \r\n -> \n
 	 * conversion, substitting signatures, {{subst:}} templates, etc.
@@ -4736,21 +4744,9 @@
 	 * "== Header ==".
 	 */
 	public function guessSectionNameFromWikiText( $text ) {
-		# Strip out wikitext links(they break the anchor)
 		$text = $this->stripSectionName( $text );
-		$headline = Sanitizer::decodeCharReferences( $text );
-		# strip out HTML
-		$headline = StringUtils::delimiterReplace( '<', '>', '', $headline );
-		$headline = trim( $headline );
-		$sectionanchor = '#' . urlencode( str_replace( ' ', '_', $headline ) );
-		$replacearray = array(
-			'%3A' => ':',
-			'%' => '.'
-		);
-		return str_replace(
-			array_keys( $replacearray ),
-			array_values( $replacearray ),
-			$sectionanchor );
+		list( $text, /* unneeded here */ ) = $this->processHeadingText( $text );
+		return "#$text";
 	}
 
 	/**

Follow-up revisions

Rev.Commit summaryAuthorDate
r45646* Reverting r45588, causes fatal errors when saving new sectionsnikerabbit17:16, 10 January 2009

Comments

#Comment by Raymond (Talk | contribs)   14:34, 9 January 2009

PHP Fatal error: Call to a member function unstripBoth() on a non-object in /var/www/w/includes/parser/Parser.php on line 3708

Seen at Betawiki. Unsure, if this revision is the culprit.

#Comment by Nikerabbit (Talk | contribs)   21:22, 9 January 2009

FYI backtrace:

  • Parser.php line 3708 calls wfBacktrace()
  • Parser.php line 4749 calls Parser::processHeadingText()
  • - line - calls Parser::guessSectionNameFromWikiText()
  • StubObject.php line 58 calls call_user_func_array()
  • StubObject.php line 76 calls StubObject::_call()
  • - line - calls StubObject::__call()
  • EditPage.php line 975 calls StubObject::guessSectionNameFromWikiText()
  • EditPage.php line 2392 calls EditPage::internalAttemptSave()
  • EditPage.php line 453 calls EditPage::attemptSave()
  • EditPage.php line 346 calls EditPage::edit()
  • Wiki.php line 508 calls EditPage::submit()
  • Wiki.php line 63 calls MediaWiki::performAction()
  • index.php line 114 calls MediaWiki::initialize()
#Comment by Raymond (Talk | contribs)   19:07, 10 January 2009

Reverted by Nikerabbit with r45588.

#Comment by Simetrical (Talk | contribs)   23:07, 10 January 2009

That's r45646, actually.

Status & tagging log

  • 20:22, 30 August 2011 ^demon (Talk | contribs) changed the tags for r45588 [removed: parser]
  • 14:18, 13 January 2009 Aaron Schulz (Talk | contribs) changed the status of r45588 [removed: fixme added: reverted]
  • 23:07, 10 January 2009 Simetrical (Talk | contribs) changed the status of r45588 [removed: reverted added: fixme]
  • 19:07, 10 January 2009 Raymond (Talk | contribs) changed the status of r45588 [removed: fixme added: reverted]
  • 14:34, 9 January 2009 Raymond (Talk | contribs) changed the tags for r45588 [added: parser]
  • 14:34, 9 January 2009 Raymond (Talk | contribs) changed the status of r45588 [removed: new added: fixme]
Personal tools
Namespaces
Variants
Views
Actions
Site
Support
Download
Development
Communication
Toolbox