Topic on Help talk:Extension:ParserFunctions

#sub,{{#sub:<nowiki>This is a </nowiki>test|1}}

Istudymw (talkcontribs)

This should return est!

Verdy p (talkcontribs)

No, #sub: returns the substring starting at character 1, as specified here (no ending position is specified, so it extends to the end of the string). #sub is a parser function, so its parameters are NOT '''pre'''processed by MediaWiki: they can contain any syntax needed, not necessarily MediaWiki or HTML, and leading/trailing whitespace is not stripped either, because the parameter is not named. It is only processed by PHP.

So #sub will return "<nowiki>This is a </nowiki>test" unchanged!

(the same would be true if you used a parserfunction call to a Lua module using #invoke).

Parser functions can do whatever they want with their parameters and return a string in any format, which can then be used as an argument to call another parser function (or Lua module). Only the parser function itself can decide whether to strip leading/trailing whitespace, HTML comments, or "nowiki" tags. At that level, MediaWiki only processes pipes (|) to separate parameters, and "noinclude" or "includeonly" tags.

Then, after the call, the return value is processed by MediaWiki: at that point it processes "<nowiki>This is a </nowiki>test" for the rest of the page expansion. The "nowiki" tags are then taken into account by MediaWiki and result in "This is a test", which is displayed. The "nowiki" tag does not remove its content; it only indicates that the content it surrounds must not be parsed by MediaWiki if it contains wiki syntax (such as "~~~~", which would otherwise be replaced by the user's signature).

You are most probably confusing this with the "noinclude" tag.

RobinHood70 (talkcontribs)

On both wikis I tried it on, it does return "est". It won't work on Wikimedia wikis, though, as they've disabled the string functions on their wikis.

Verdy p (talkcontribs)

I don't know where you tested it; but clearly "<nowiki>This is a </nowiki>" should not be deleted at all.

However, as this "nowiki" tag looks like an HTML tag, it may be stripped by a function that drops HTML tags to produce a plain-text string. That implementation would be bogus as well, because it would transform "<span>This is a </span>test" into "test", which is not what HTML considers the plain text of the content (that would be "This is a test", e.g. when using the standard DOM API for HTML or XML), and certainly not "est", which makes no sense at all in HTML, XML, or MediaWiki: why would you want to drop an extra character AFTER the closing tag?

So this is likely a bug in the implementation of the "#sub" parser function (which is, as you said, part of the "string functions", which are not enabled on Wikimedia wikis). I tested "#sub" on other wikis where string functions are enabled, and the "t" after the closing tag is NOT removed. Wikis that show this behavior may not have been updated to a version of string functions that fixes this very undesirable bug (their internal HTML tag stripping code has a problem, such as a bogus regular expression).

On which wiki do you see that result? Which version of the "string functions" do they use (look at their Special:Version page)?

---

I found a wiki that has that bogus behavior in #sub: Translatewiki.net. It uses incorrect HTML-stripping code that strips too much, where it should return either "This is a test" (if it knows and assumes the semantics of MediaWiki "nowiki" tags) or "test" (if it strips the whole "nowiki" element with its content, the same way it would strip "ABC<script>...</script>DEF" into "ABCDEF" and not "ABCEF").

RobinHood70 (talkcontribs)

I tested it on my test bed wikis, which are mostly just past the setup point and nothing more. I specifically tested on both 1.29 and 1.35, since I figured Parsoid might make a difference (not that it should, since this is at the pre-processor level, but I figured it was a good idea to try both). I haven't updated it in a while, so I don't have anything more recent installed yet.

I'm not sure I follow your logic on what should be stripped, because you would think that stripping one character would either strip the < off of the <nowiki> or, if it had parsed that properly, it would strip the T from This, not the t from test. I'm assuming that's an artifact of the strip item process, though, so the nowiki section gets ignored entirely.

RobinHood70 (talkcontribs)

Oh and to answer your question about versions, both wikis have ParserFunctions 1.6.0.

Verdy p (talkcontribs)

Note that Parsoid has no effect on that. This is purely a bug inside the implementation of the "string functions" extension (which is not supported directly by Wikimedia wikis and core MediaWiki developers). Instead, Wikimedia uses the supported Scribunto extension and implements these functions in Lua (but note that Translatewiki.net still does not support Scribunto/Lua...).

The effect of "#sub" is very weird, avoid it as much as possible on your wikis! (Note that in Lua, string indexing starts at 1, whereas in string functions, string indexing starts at 0).

If we assume that "#sub" uses string indexing starting at 0, then "<nowiki>This is a </nowiki>test" is first "HTML-stripped" into "test", and then the substring starting at position 1 is returned, i.e. the first character "t" is dropped and the result is "est". If string functions did not use "HTML-stripping", the result would be "nowiki>This is a </nowiki>test", where only the first "<" is dropped.
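
The two hypotheses above can be compared with a minimal Python sketch. It is not the extension's code: `drop_tagged_elements` and `sub_from` are hypothetical helper names, and the regular expression only models the assumption that a whole "nowiki" element is removed together with its content.

```python
import re


def drop_tagged_elements(text: str) -> str:
    """Assumed model: remove each <nowiki>...</nowiki> element WITH its content."""
    return re.sub(r"<nowiki>.*?</nowiki>", "", text, flags=re.DOTALL)


def sub_from(text: str, start: int) -> str:
    """Substring from a 0-based start position to the end of the string."""
    return text[start:]


arg = "<nowiki>This is a </nowiki>test"

# Hypothesis 1: the element is stripped first, then the substring is taken.
with_stripping = sub_from(drop_tagged_elements(arg), 1)

# Hypothesis 2: no stripping at all, only the leading "<" is dropped.
without_stripping = sub_from(arg, 1)

print(with_stripping)     # est
print(without_stripping)  # nowiki>This is a </nowiki>test
```

Only the first hypothesis matches the "est" result reported earlier in this thread.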

I was able to test it in a sandbox page on Translatewiki.net, and #sub in string functions visibly uses string indexing starting at 0. It first trims its string parameter, then drops all HTML-like or XML-like elements (including "nowiki", even if it's not really HTML or XML) **with** their content, before computing and returning the substring. Because whitespace trimming is performed before the "HTML tag stripping", if you want to disable the whitespace trimming of the parameter, you can surround that value with "<nowiki/>", so:

  • "ABC{{#sub:<nowiki/> DEF <nowiki/>|1}}XYZ" returns "ABCDEF XYZ"
  • "ABC{{#sub:<nowiki/> <br>DEF <nowiki/>|1}}XYZ" returns "ABCbr>DEF XYZ" (so the "HTML stripping" is not real, apparently it just strips "nowiki" opening and closing tags, **after** the initial whitespace trimming of the argument string)
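
The order of operations described above (trim first, then remove the "nowiki" tags, then take the substring) can be modelled with a rough Python sketch. The name `model_sub` is hypothetical and the sketch only removes literal "nowiki" tags; it reproduces the first example and is not meant to explain every result observed on Translatewiki.net. The unprotected case is what this model predicts, not a result tested on a wiki.

```python
import re

# Matches <nowiki>, </nowiki> and the self-closing <nowiki/> form.
NOWIKI_TAG = re.compile(r"</?nowiki\s*/?>")


def model_sub(arg: str, start: int) -> str:
    """Assumed order: trim whitespace, drop nowiki tags, 0-based substring."""
    trimmed = arg.strip()                       # step 1: whitespace trimming
    without_tags = NOWIKI_TAG.sub("", trimmed)  # step 2: remove nowiki tags
    return without_tags[start:]                 # step 3: substring from `start`


# The <nowiki/> tags sit at both ends, so trimming finds no whitespace to remove.
protected = "ABC" + model_sub("<nowiki/> DEF <nowiki/>", 1) + "XYZ"

# Without the tags, the surrounding spaces are trimmed before the substring.
unprotected = "ABC" + model_sub(" DEF ", 1) + "XYZ"

print(protected)    # ABCDEF XYZ
print(unprotected)  # ABCEFXYZ
```

In this model the leading space protected by "<nowiki/>" is the character consumed by position 1, which is why "DEF" survives intact in the first case.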

RobinHood70 (talkcontribs)

As I said, I wouldn't have expected Parsoid to affect the results, since parsing the parameter itself is entirely at the preprocessor level. There was a lot that changed in 1.35 besides Parsoid, though, so I figured it made sense to check both.

As for what's supported by MediaWiki, it's been my experience that they don't seem to realize that not everybody is on the same update cycle they are or running all the same extensions as they are. Even so, at this point, Scribunto and ParserFunctions are both optional. Until they're a required part of the install process, I would expect WMF to support anything that they're distributing. I just checked and at least as of 1.38.1, both are being distributed as optional components.

RobinHood70 (talkcontribs)

You got me curious, so I looked at the version in 1.38 and now I see what's going on. Firstly, it's using the older Parser Function syntax where it parses all of the parameters first and then passes them along to the function that's handling that specific parser function, in this case runsub. So, if I recall correctly, that means the input to the function is converted to "<stripmarker>test". The very first thing runsub does is call killMarkers, so now it's left with just "test". From there, it's obvious why it produces "est".
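
That sequence can be illustrated with a small Python sketch. It is a simplified model, not the extension's PHP: the marker format, the `StripState` class, and the lowercase function names are stand-ins invented for illustration. Only the order of operations (nowiki content replaced by a marker, markers removed, substring taken at a 0-based offset) comes from the description above.

```python
import re


class StripState:
    """Toy store mapping placeholder markers to the text they replaced."""

    def __init__(self):
        self.items = {}

    def add(self, content: str) -> str:
        # Hypothetical marker format; the real one differs.
        marker = f"\x7fMARKER-{len(self.items)}\x7f"
        self.items[marker] = content
        return marker

    def unstrip(self, text: str) -> str:
        """Put the stored content back in place of each marker."""
        for marker, content in self.items.items():
            text = text.replace(marker, content)
        return text


def preprocess(text: str, state: StripState) -> str:
    """Replace each <nowiki>...</nowiki> section with a strip marker."""
    return re.sub(
        r"<nowiki>(.*?)</nowiki>",
        lambda m: state.add(m.group(1)),
        text,
        flags=re.DOTALL,
    )


def kill_markers(text: str) -> str:
    """Delete the markers outright, discarding the content they stood for."""
    return re.sub(r"\x7fMARKER-\d+\x7f", "", text)


def run_sub(text: str, start: int) -> str:
    """Model of the described behavior: kill markers, then 0-based substring."""
    return kill_markers(text)[start:]


state = StripState()
arg = preprocess("<nowiki>This is a </nowiki>test", state)
result = run_sub(arg, 1)

print(result)              # est
print(state.unstrip(arg))  # This is a test
```

The second line shows what the same argument would expand to if the markers were restored instead of killed, which is the text a reader would expect the substring to be taken from.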

Edit: I see you've updated your reply with similar info. At least now we understand. And I agree, for straight text, #sub is fine, but for anything out of the ordinary, avoid #sub at all costs.

RobinHood70 (talkcontribs)

You can see the same results at en.uesp.net (which is MW 1.29.3) and starfield.wiki.net (which is on 1.35.2).
