Requests for comment/New hook: ParserBeforePreprocess

Request for comment (RFC)
New hook: ParserBeforePreprocess
Component: General
Creation date:
Author(s): Van de Bugger
Document status: declined

Change abandoned. A ParserBeforePreprocess hook, with slightly different syntax, was introduced in 1.35.

Proposal

This RFC proposes a new hook, ParserBeforePreprocess, called before the text is preprocessed (Parser.php, line ~2803):

function preprocessToDom( $text, $flags = 0 ) {
    wfRunHooks( 'ParserBeforePreprocess', array( $this, &$text, $flags ) );
    $dom = $this->getPreprocessor()->preprocessToObj( $text, $flags );
    return $dom;
}
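How a handler could use the by-reference `$text` argument can be sketched as follows. This is a minimal, standalone illustration, not MediaWiki code: `runHooksSketch()` is a hypothetical stand-in for the hook runner, and `normalizeNewlinesHandler()` is a made-up handler whose parameters simply mirror the `wfRunHooks()` call above. In a real extension the handler would presumably be registered through `$wgHooks['ParserBeforePreprocess'][]` instead.

```php
<?php
// Hypothetical stand-in for the hook runner, for illustration only.
// Calls each handler in turn; a handler returning false stops the chain.
function runHooksSketch( array $handlers, array $args ) {
    foreach ( $handlers as $handler ) {
        if ( call_user_func_array( $handler, $args ) === false ) {
            return false;
        }
    }
    return true;
}

// Hypothetical handler. Parameters mirror the proposed call:
// the parser object, the text (by reference), and the flags.
function normalizeNewlinesHandler( $parser, &$text, $flags ) {
    // Any change made to $text here is seen by the caller,
    // and therefore by the preprocessor that runs afterwards.
    $text = str_replace( "\r\n", "\n", $text );
    return true;
}

$handlers = array( 'normalizeNewlinesHandler' );

$text = "line one\r\nline two";
// null stands in for the Parser instance in this standalone sketch.
runHooksSketch( $handlers, array( null, &$text, 0 ) );
```

Passing `&$text` inside the argument array is what lets the handler rewrite the source before `preprocessToObj()` sees it.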

Rationale

The existing hook ParserBeforeInternalParse is advertised as a way to implement custom preprocessors:

Replaces the normal processing of stripped wiki text with custom processing. Used primarily to support alternatives (rather than additions) to the core MediaWiki markup syntax.

However, it does not fully meet that goal, because it is not called on template source. For example, on a page containing:

{{ Some template }}

the ParserBeforeInternalParse hook is called three times:

  1. On the original page source.
  2. On the result of the template.
  3. On the message "This page was accessed x times."

But it is not called on the template source, so it cannot be used to implement custom preprocessing, or at least not preprocessing that takes effect in both page and template sources.

I failed to find a hook which allows custom preprocessing. This is the reason for proposing this one.

Background

I want the preprocessor to recognize a <dws/> tag and discard both the tag itself and the whitespace after it.

The primary purpose is better formatting of template code. Example:

*   Some introductory text <dws/>
    {{  template1 
        | param1=value1 
        | ... 
    }} <dws/>
    continuation <dws/>
    {{  template2 
        | param1=value1 
        | ... 
    }} <dws/>
    and, finally, finish.

Without the <dws/> tag it must be formatted as:

*   Some introductory text {{ template1 
        | param1=value1 
        | ... 
    }} continuation {{ template2 
        | param1=value1 
        | ... 
    }} and, finally, finish.

This is a simple example. In more complex templates with nested parser functions (#if, #loop, etc.), good formatting becomes even more important.

Obviously, such a tag cannot be implemented as an ordinary extension tag, because it affects not just the tag itself but also the text after it.

My original implementation, a patch to the parser (actually, the preprocessor), was rejected because touching the preprocessor is always dangerous. I am fine with that if there is another way to reach the goal. The only way left seems to be the proposed hook.
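With such a hook, the <dws/> behavior described above might be sketched roughly as follows. This is only an illustration under stated assumptions: `stripDws()` and `dwsBeforePreprocess()` are hypothetical names, the handler parameters mirror the proposed hook call, and the plain regular expression ignores contexts such as <nowiki> where the tag should probably be left alone.

```php
<?php
// Hypothetical helper: removes each <dws/> tag together with the
// whitespace (spaces, tabs, newlines) that follows it.
function stripDws( $text ) {
    return preg_replace( '/<dws\s*\/>\s*/', '', $text );
}

// Hypothetical handler for the proposed ParserBeforePreprocess hook.
function dwsBeforePreprocess( $parser, &$text, $flags ) {
    $text = stripDws( $text );
    return true; // let any other handlers run
}

$source = "* Some text <dws/>\n"
    . "    {{ template1\n"
    . "        | param1=value1\n"
    . "    }} <dws/>\n"
    . "    continuation.";

// The tag and the line break plus indentation after it disappear,
// so each template call ends up on the same line as the surrounding text.
echo stripDws( $source ), "\n";
```

Whitespace before the tag is kept, so a single space in front of <dws/> is what separates the joined pieces.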

See also