Manual talk:Hooks/ParserBeforeStrip

From mediawiki.org

Edit page

Note that this hook is called on the edit page (and possibly elsewhere) with the contents of the disclaimer under the edit box. If you use this hook to add content that shouldn't appear in these circumstances, check which action was used to view the page, and only add the extra content if the page is not being edited. I will add more info about this once I have figured it out. --HappyDog 04:24, 17 June 2006 (UTC)

One way to do this is to check the global $action variable, which holds the current action. Make sure your setup ignores pages where $action === 'edit' and you should be good in that respect. I've also ignored the history pages with $action === 'history' and the Special and Help pages with (strpos($parser->mTitle->mPrefixedText, "Special:") === false) and the like. I've still got some more tweaking to do, but otherwise it works out pretty nicely. -- Chad.burrus 00:34, 20 December 2006 (UTC)
You should not be accessing the private class variables (beginning with a lowercase 'm') directly - there should be wrapper functions to get these for you (I'm guessing $parser->getTitle() and $title->getPrefixedText()...). Also, you should check the namespace using the numeric namespace ID rather than the hard-coded name, otherwise your code will only work on English wikis. --HappyDog 01:24, 8 January 2007 (UTC)
To get the numeric ID of namespaces here, use $parser->getTitle()->getNamespace(). --Mlogic 23:11, 17 April 2008 (UTC)
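Putting the suggestions above together (check the action, and compare numeric namespace IDs instead of localized names), the skip test could look roughly like the sketch below. `shouldSkipPage` is a made-up helper name, and the fallback `define` calls are there only so the snippet runs outside MediaWiki, where `NS_SPECIAL` and `NS_HELP` are already defined.

```php
<?php
// Fallback definitions so the sketch runs standalone; MediaWiki defines these itself.
defined( 'NS_SPECIAL' ) || define( 'NS_SPECIAL', -1 );
defined( 'NS_HELP' ) || define( 'NS_HELP', 12 );

/**
 * Hypothetical helper: decide whether the hook should leave this page alone.
 *
 * @param string $action Current action, e.g. 'view', 'edit', 'history'
 * @param int $namespace Numeric namespace ID, e.g. from $parser->getTitle()->getNamespace()
 * @return bool True if no extra content should be added
 */
function shouldSkipPage( $action, $namespace ) {
	$skippedActions = array( 'edit', 'history' );
	$skippedNamespaces = array( NS_SPECIAL, NS_HELP );

	return in_array( $action, $skippedActions, true )
		|| in_array( $namespace, $skippedNamespaces, true );
}
```

Inside the hook this would be called as shouldSkipPage( $action, $parser->getTitle()->getNamespace() ). Because only numeric IDs are compared, the check does not depend on the wiki's content language.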
Multiple runs of the parser hook can happen at any time, not just during editing. For example, the gallery feature will run its own parsing within one page, and many extensions might do the same if they hook to certain <tags> (in which case they get raw wiki text that they need to parse on their own). I do not know yet of any way to distinguish those parses other than by checking their content. The parser objects look just the same, I think, though maybe the parser options could sometimes give some hint. But filing a proper bug might be more useful to really solve the issue. Idea: the parser or its parser options contain a property that states their purpose from a set of predefined constants, and parsing the main article text is just one value which is not the default. Two other relevant cases might be: parsing a message (e.g. the disclaimer) and parsing inline pieces of text (e.g. the gallery, and many extensions). --Markus Krötzsch 17:38, 23 March 2007 (UTC)
The following code seems to work:
$title = $parser->getTitle();
$article = new Article($title);
if ($article->getContent() != $text)
  return true; // return true so hook processing continues
--HappyDog 06:10, 8 November 2007 (UTC)
This doesn't work under all circumstances. After saving a page, and before the first page purge, you get the text of the previous version, which differs from the newly edited text. So the code is useless. Any better ideas? --Danwe 23:45, 23 October 2009 (UTC)
Checking for $parser->mOptions->getEnableLimitReport() works so far. It looks like this is only set when the main article text is in the parser. --Danwe 18:38, 27 February 2010 (UTC)
It acts similarly to the method above due to caching. I still need to purge the cache to make any use of getEnableLimitReport(). By disabling the page cache and reducing the parser cache time one can "solve" this, but it still doesn't work right after saving a page. --Piksi 19:07, 4 March 2010 (UTC)

Sanity Check

This hook runs twice (possibly more). To prevent this from happening, just add the following to fnMyHook:

static $hasRun = false;
if ($hasRun) return true; // already ran once in this request
$hasRun = true;

-- Egingell 13:29, 16 August 2007 (UTC)

I believe this happens because certain user interface messages go through the parser. Therefore it is run once for the page content, and then however many more times for various UI messages. I am not 100% sure about this however, so perhaps a bit more investigation is required. Assuming I am right, then the above code will be useful for extensions where it is important that they only run once per page view, but for most other situations the standard behaviour will cause no problems. --HappyDog 15:03, 27 August 2007 (UTC)
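To make the effect of the run-once guard concrete, here is a small standalone sketch. `fnMyHookSketch` is a hypothetical, simplified stand-in for a real hook function (it takes and returns plain text instead of the parser arguments), so treat it as an illustration of the static flag only.

```php
<?php
/**
 * Hypothetical hook body, reduced to the run-once guard.
 * Returns the text unchanged on every call after the first.
 */
function fnMyHookSketch( $text ) {
	static $hasRun = false;
	if ( $hasRun ) {
		return $text; // later parser runs are left alone
	}
	$hasRun = true;
	return $text . ' [modified]';
}

// Three parser runs within one request: only the first is modified.
$results = array(
	fnMyHookSketch( 'first parse' ),
	fnMyHookSketch( 'second parse' ),
	fnMyHookSketch( 'third parse' ),
);
```

Note that the guard lets through whichever parse happens first in the request. If UI messages really do go through the parser as well, nothing here guarantees that the first parse is the page content.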


ParserBeforeStrip hook is not always called before strip

I'm using the ParserBeforeStrip hook in a math extension (JsMath) to strip any <math> tags before the parser strips them. The problem I am having is that the parser sometimes calls the strip command without using this hook. For example, in the function Parser::braceSubstitution() there is

$text = $this->strip( $text, $this->mStripState );

instead of

wfRunHooks( 'ParserBeforeStrip', array( &$this, &$text, &$this->mStripState ) );
$text = $this->strip( $text, $this->mStripState );
wfRunHooks( 'ParserAfterStrip', array( &$this, &$text, &$this->mStripState ) );

This means that content that is transcluded to another page is not subjected to the ParserBeforeStrip hook before stripping occurs. Is there a reason for the parser not to use this hook in these cases? -- Tommy Ekola 09:29, 5 February 2008 (UTC)

How to get the hook function code to only apply to article content, and not to article title

Can anyone offer advice on how to get the hook function code to only apply to article content, and not to article title? I ran this code:

static $hasRun = 0;
$hasRun++;
$text=$hasRun;

You'll find that when you use that (at least with v1.16), it will make both the page title and the content equal to 2. If they were different numbers, I could write code to differentiate between the two. Thanks, Tisane 09:17, 16 April 2010 (UTC)

You definitely should read the sections above. This has already been discussed there, and there is no perfect final solution due to caching. I consider this hook buggy and a design flaw in its current state. --Danwe 19:50, 16 April 2010 (UTC)

Won't work with transcluded pages

I wrote a function to rewrite article content using the ParserBeforeStrip hook, but it doesn't work with transclusion, e.g. {{:Article Name}}. Otherwise ParserBeforeStrip seems to be the perfect hook for me, since I can change text to wiki markup and it works on preview. How can I fix that? It actually feels like a bug, because transcluded content should look 100% the same as on the page itself. --Subfader 16:14, 21 June 2010 (UTC)

Never mind, solved by adding a new hook. --Subfader 16:17, 23 June 2010 (UTC)
What hook is that?
I did this through the OutputPageBeforeHTML hook - not sure if it's the best way to do it, but it works for my basic parser. --16:20, 18 April 2011

Detect Parse of Article Content / Hack removal

The following example was removed from this page. It's here for reference now. It's really a hack, not a proper use of the hook, and there are edge cases where this practice may break for the extension.

Note that this hook is not run on cached page content. To test it on page content try an edit preview. To ensure you are only acting on page content and not other items, one method is the following (taken from Extension:AutoCategoryInclude):

global $wgRequest;
$action = $wgRequest->getVal('action','view');

// the templates will be included solely when the page is viewed, and only to the page wikitext, not during the other parserhook runs.
// the method of determining which run is handling the article wikitext is shaky but seems to work
// in_array() tests membership; chaining the names with || would collapse to boolean true
if (in_array($action, array('view', 'print', 'purge'), true) && $parser->getOptions()->getEnableLimitReport()) {
  // ... your code here ...
}
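A PHP pitfall worth spelling out for action checks of this kind: a parenthesised chain such as ('view' || 'print' || 'purge' || null) is not a list of allowed values. It is a boolean expression that evaluates to true, so a loose == comparison against it succeeds for almost any action string. The standalone sketch below (variable names are illustrative only) contrasts that with an in_array() membership test.

```php
<?php
// What the chained-|| form actually evaluates to: boolean true, not a list.
$allowed = ( 'view' || 'print' || 'purge' || null );

// Loose comparison against true succeeds for any non-empty string other than "0",
// so even 'edit' "matches".
$editMatchesChained = ( 'edit' == $allowed );

// Membership test that only accepts the listed actions.
$viewActions = array( 'view', 'print', 'purge' );
$editMatchesList = in_array( 'edit', $viewActions, true );
$viewMatchesList = in_array( 'view', $viewActions, true );
```

With the chained form, the action part of the condition is effectively always satisfied, leaving getEnableLimitReport() as the only real filter.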

Note that this hook is generally called more than one time during page rendering. Therefore, you may want to use code such as this to ensure that it only runs once:

static $hasRun = false;
if ($hasRun) return true;
$hasRun = true;
Dantman 05:37, 28 September 2011 (UTC)
Do you have any better suggestion than the above for how to determine whether the hook was called while rendering the main part of the current article? As long as there is none, we have to stick with hacks, I am afraid. I guess this lies in the nature of the MW parser, which looks like a hack itself in some respects. --Danwe 15:16, 28 September 2011 (UTC)
Dantman mentioned that better metadata should solve this issue. Maybe something in $parser->getOptions(). I'll open a feature request in Bugzilla for it. --Quadir 15:20, 28 September 2011 (UTC)
Sounds good, I think it's about time to get rid of these hacks in popular extensions. Could you please post the link to the feature request here? --Danwe 20:26, 28 September 2011 (UTC)
We do have a getInterfaceMessage on ParserOptions. Trunk now sets it on all wfMsgWikiHtml calls, but not on the wfMsg calls that get sent to transform instead of parse (previously wfMsgWikiHtml abused $wgOut->parse). We can track down someone who understands that and ask why not all system messages have interface set, and perhaps get that set consistently; then we can rely on it. If it turns out there's a reason that interface messages that aren't the main page body do have that off, then we can add some sort of "is this a primary article being parsed" metadata. Btw, now that I think about it, we already have a bit of metadata to tell if the current parse is a preview parse (i.e. even right now I think that wgRequest hack is unnecessary). Dantman 23:07, 28 September 2011 (UTC)
Not all system messages are interface messages. It has more to do with whether the user language is used and things like that than with whether it's "page" content or something else. Personally (in my mostly uninformed opinion off the top of my head, so I may be saying something stupid) I don't think the parser should really differentiate between page "content" and other parsed content. Instead we should perhaps have a new hook that lives outside of the Parser and gets called on the text before/after parsing it, just for the revision text (it could be named RevisionBeforeParse, RevisionAfterParse or something). Bawolff 23:45, 28 September 2011 (UTC)
Oh right, we do have Manual:Hooks/ArticleAfterFetchContent. Dantman 23:49, 28 September 2011 (UTC)
One problem with getInterfaceMessage is that it's set for the title IIRC (could be wrong today, but that's what I remember from the dumps yesterday). We have ArticleAfterFetchContent but nothing for before the parse... which is, for example, where I and the other extension (that this hack is from) need it. --Quadir 03:38, 29 September 2011 (UTC)
Uhm... ArticleAfterFetchContent IS before the parse... parsing hasn't even been started, that's the raw WikiText which then gets sent to the parser. Dantman 05:25, 29 September 2011 (UTC)
My problems with ArticleAfterFetchContent: It will modify the text ending up in the edit field! Also, it doesn't change the fact that I can't tell which call to the hook is the one for the page requested by the user, i.e. the one that will be displayed in the browser. I only want to add some wikitext to that page and not to all the templates and whatever else is fetched along the way. --Danwe 19:43, 29 September 2011 (UTC)
In ArticleAfterFetchContent you can use $article->getContext()->getRequest() and then check getVal('action','view') to see if it's an edit, purge, view, or whatnot. Please note that this is new; use the global $wgRequest if getContext() is not available. --Quadir 19:47, 29 September 2011 (UTC)