Help talk:Extension:ParserFunctions

About this board

Suggestion to state that {{#if: can check for multiple parameters

14
Snowyamur9889 (talkcontribs)

I discovered a while back through testing on another hosted wiki (MediaWiki 1.39.3, running PHP 8.1.19 (fpm-fcgi) ) that {{#if: can check for multiple parameters as the test string.


For example:

{{#if:{{{Param1|}}} {{{Param2|}}} {{{Param3|}}}|Param1, Param2, or Param3 exist|No parameter strings exist}}

{{{Param1|}}} is checked for as the test string, but {{{Param2|}}} and {{{Param3|}}} can also be checked. If at least one of the parameters exists and has a non-empty string argument, the string "Param1, Param2, or Param3 exist" will be displayed. Otherwise, "No parameter strings exist" will be displayed.


I think this note should be mentioned under the "if" section of this documentation. It's not explicitly mentioned you can do this, but the fact you can makes {{#if: significantly more useful for checking for multiple parameters in the test string.

RobinHood70 (talkcontribs)

Even faster for checking that sort of thing is: {{#if:{{{Param1|{{{Param2|{{{Param3|}}}}}}}}}...etc., though it has the disadvantage of being harder to read. The idea there is that if Param1 has a value, neither of the other two parameters needs to be evaluated.

Verdy p (talkcontribs)

That's not correct. If Param1 is explicitly given an empty value, the result of #if: will always be false, independently of values given or not given to the other parameters (which are only used to set a default value for Param1 when Param1 is not passed at all). It is also not faster at all: the expression given as the default value of a missing parameter is still evaluated before that parameter is checked, and the recursive nesting of triple braces is very unfriendly (avoid it completely, it is very error-prone).


Currently MediaWiki still does not perform "lazy evaluation" when expanding templates and parameters, i.e. deferring their expansion until their values are effectively used and need to be partially expanded (and finally fully expanded only at the end of parsing).

A true lazy expansion would mean that each node in the parser is in one of two states: unexpanded and unparsed text, or the expanded and cached value of its expansion (the cache would keep the original unexpanded form as the key: if that key is not found in the cache, then the string is still not expanded and still needs to be parsed; nodes would then point either to an unexpanded text, or to a cache entry containing the <unexpanded, expanded> pair of texts; using a cache would avoid unnecessary conditional expansions and allow reusing the results of prior evaluations of the same unexpanded text).

MediaWiki currently uses a cache only for transclusions of templates (after parsing and resolving the template name, with its parameters ordered and named in canonical form), but I'm not even sure it uses any cache for parser function calls, so they could return different values (e.g. an "enumerator" parser function, or a parser function returning different random numbers). This is already not the case for parser functions that return time parts, which are all based on the same (cached) local time on the server: once you extract any part of that local time, the stored cache expiration of the parsed page is adjusted/reduced to match the precision of the requested date part, but this does not affect how the current page is rendered. For "lazy parsing", I was speaking about a transient cache used exclusively in memory by the parser (not stored in the DB), which does not survive the full parsing and expansion of the current page.

RobinHood70 (talkcontribs)

I hadn't considered empty values. My mistake, and to be honest, that on its own probably makes the rest of this moot. But leaving that aside for the moment, though, as I recall, the parser will tokenize the values to PPNodes no matter what. So, all three get tokenized regardless of whether they're nested or sequential. During expansion, however, I thought the loop put the resultant value in the named/numbered expansion cache as soon as it found the relevant value and didn't fully expand the default values if it didn't need to (i.e., if one of the values was non-blank/non-whitespace). Am I wrong in that? It's been a while since I've looked at that code. I'll grant that even if it does so, the performance gain is minimal, but it's not nothing.

As far as recursive braces go, the only concern I'm aware of is brace expansion which, of course, is ambiguous if you have 5+ of them. Knowing that the preprocessor bases its decisions on the opening braces, unambiguous parsing is as simple as making sure that your braces use spaces as needed: {{{ {{, for example. Is there something I'm not thinking of that complicates triple-brace usage further? Or did I misunderstand your point?

I'm not sure if I've understood your last paragraph correctly, so I don't know if this is a helpful reply, but I believe parser functions are only cached in the sense that the entire page is cached. If you don't set the cache expiry in your PF, a so-called "random" number generator will happily return the same value on every view/refresh until edited or purged. If the PF does set it to a lower value, the page gets re-evaluated every X seconds. To the best of my knowledge, that means that something like a random number generator would cause the entire page to be re-parsed, most likely at every refresh unless they use a cache-friendlier value.

Verdy p (talkcontribs)

You did not understand: any parser function that is supposed to return a different random number each time it is called from the same rendered page will still not return different numbers: the invocation is made once and cached (in memory only, not stored). What is cached AND stored in the DB is the result of the full page parsing (that's where you need a "purge/refresh", but "purge/refresh" has no effect on the memory cache for multiple invocations of the same parser function call or module call, which cannot return different values between each call, but may return different values only when refreshing the whole page).

In summary, MediaWiki uses purely functional programming: invocations can be executed in any order, and there should be no "side effects" with hidden "state variables". This also allows MediaWiki to delegate part of the work to multiple parallel workers (if supported), running without any kind of synchronization, and to reuse their results (synchronization would occur only on the in-memory cache). If we have hidden state variables that are mutable and that influence the result, this gives a strong performance penalty, forcing the evaluation to be purely sequential and not allowing a completely "lazy" evaluation with all its advantages in terms of performance.

RobinHood70 (talkcontribs)

If you re-read, I actually did say the same thing as your first paragraph. That's what I was getting at when I said that a random number generator that doesn't set a cache-expiry would only be re-evaluated when the cache is invalidated. If the PF does set the cache-expiry, however, it's quite possible to create a random number generator that changes at every refresh, for example, this implementation of #rand. If you edit the page, you'll see a cachetime parameter which you can adjust. The corresponding value will be set in the page's NewPP limit report if you look at the page source, and you can also confirm the time the page was actually cached. (PS, feel free to edit that page if you want to confirm what I'm saying...that's a development wiki, so nobody will mind.)

Are you sure that invocations can be executed in any order? I was under the impression that MediaWiki's parsing was still linear and invocations come in a fixed order (except maybe if you're using the Visual Editor, since that uses Parsoid). That's been a point of some contention, since allowing invocations to come in any order breaks extensions that rely on preservation of state during parsing, like Extension:Variables and those that do other forms of variable assignment (e.g., loading variable values from a database). I know they talked about breaking that in 1.35, but I saw no mention of it in the change list for that or any version thereafter and I was under the impression that it was only Parsoid that did that, not the still-current preprocessor/parser. I honestly haven't had time to look into it thoroughly.

Verdy p (talkcontribs)

Execution order in MediaWiki was made so that it is not significant; this will be even more critical for Wikifunctions, which must be purely functional, and which will run in arbitrary order, possibly over lots of backend servers running asynchronously (the first one that replies fills the cache so that further invocations are avoided). Lazy evaluation is critical for Wikifunctions to succeed. Various things have been fixed in MediaWiki to make sure that it can behave as a purely functional language, allowing parallelization. There are still things to do, and there are some third-party MediaWiki extensions that still depend on sequential evaluation. For a full lazy evaluation it should not be needed to fully process arguments, except as needed from the head (optionally from the tail): this requires virtualization of string values, so that we don't need the full expansion of the wikicode, but instead can parse it lazily and partially from left to right, just as needed to take the correct decision in branches, and then eliminate unparsed parts so that we don't even need to evaluate them.

This is possible in PHP, as well as in Lua, or even in JavaScript, by using a string interface and avoiding functions like "strlen" as much as possible, since they require a full expansion of their parameter string (for example, if we use code that matches only prefixes or suffixes, we just need to expand as many initial or final nodes as needed). Internally the "string-like" object would actually be stored as a tree of nodes, as long as they are not expanded, or as a flat list of string nodes (not necessarily concatenated, to avoid costly copies and reallocations) if they are expanded. Even the TidyHTML step is not required to take a single physical string argument, since it can just as well read characters from a flat list of string nodes (many of them possibly identical and not taking extra memory, except for the node itself in the tree or list, represented in an integer-indexed table). This means that nowhere in the parsing, expansion and generation of the HTML would we really need to have the whole page in memory in large string buffers, and we could parallelize all steps of parsing/expansion/TidyHTML so that they behave as though they were operating linearly and incrementally. This would boost performance, even on a single server thread.

If we really used lazy evaluation, the number of nodes to evaluate would very frequently be reduced, notably for wiki pages that expand a lot of shared templates or parser functions with conditional results (#ifxxx, #switch) or that use only part of their parameter (such as "substring" functions from the start or end of the text). The in-memory cache for lazy evaluation can be the tree of nodes itself, whose evaluation and expansion is partial, and which can also be evaluated using delegates running asynchronously in parallel, possibly on multiple server/evaluator instances.

RobinHood70 (talkcontribs)

Thanks for all that. In what version of MW is linear parsing officially broken? I thought that was all being done in Parsoid and that we were safe from that kind of change until then, but from what you're saying, it sounds like even the legacy parser is being affected. That completely messes us up, as 75% of our wiki uses a custom extension that relies on linear parsing and state being maintained within any given frame (not to mention we have cross-frame data as well). The idea of it is that it can modify or create variables as well as returning them or inheriting them across frames, not to mention loading data from other pages. The dev team have promised us a linear parsing model as well, but there's been no information anywhere that I've found, so we're really in a holding pattern until we know what's going on.

Is this all documented anywhere? It would be nice to be able to keep abreast of these changes, but I haven't found anything that even remotely touches on these changes the way you just did.

RobinHood70 (talkcontribs)

Oh and apologies to Snowyamur9889 for this getting so far afield.

Till Kraemer (talkcontribs)

Thank you! I was looking exactly for this. Would be great to have it in the documentation. Cheers and all the best!

DocWatson42 (talkcontribs)
Verdy p (talkcontribs)

You're not even required to use space separators between parameter names in the #if: condition. This works only because, if all parameters are empty or just whitespace (SPACE, TAB, CR, LF), you get a string of whitespace in the 1st parameter, and "#if:" discards leading and trailing whitespace (but not whitespace in the middle) from all its parameters. Note that "#if:" also discards HTML comments everywhere in all its parameters (so you can freely insert whitespace or HTML comments just after "#if:", or around pipes, or before the closing double braces, and you get the same result).
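For instance (an illustrative sketch, using made-up parameter names): "{{#if: {{{Param1|}}}{{{Param2|}}} | something is set | nothing is set }}" behaves the same with or without a space between the two parameters, and "{{#if: <!-- just a comment --> | yes | no }}" returns "no", because the comment is discarded and only whitespace (here, nothing) remains in the condition.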

However "#if:" does not discard leading or trailing whitespaces if they are HTML-encoded, e.g. as &#32;, and for that last string, if it is used in the "#if:" condition, it will evaluate as as false condition. So "{{#if: &#32; | true | &#32;false&#32; }}" returns "#&32;false&32;": the expansion of HTML-encoded character entities into " false " is not performed during the evaluation of parser functions or template expansion, but only on the final phase that generates the HTML from the fully expanded wikitext, and cleans it with HTMLTidy (which may compress whitespaces and whick may then move leading and trailing whitespaces in the content of an element outside of that element, where they may be further compressed, except if the element is "preformated" (like HTML "<pre> </pre>" which is left intact; note also that this HTMLTidy step may reencode some characters using numerical character entities, in hexadecimal, decimal or using predefined named entities like "lt", "gt" or "quot", if this is needed to preserve a valid HTML syntax in the content of text elements or the value of element attributes; which reencoding is used exactly at this step does not matter, as they are fully equivalent in HTML and does not affect the generated HTML DOM, and this encoding is not detectable at all in templates or parserfuntions, or in any client-side javascript). Note that "pre" elements are treated much like "nowiki", so that its content in hidden in a uniq tag and not parsed, but regenerated from the "uniqtag" cache that stores its actual value after the TidyHTML step (so its inner whitespaces in the content are left intact, they are just "HTMLized" using HTML-encoding with character entities as needed).

Note also that if there's any "nowiki" pseudo-element in the condition string of "#if", it will always evaluate this condition to true, even if that "nowiki" pseudo-element is completely empty. E.g. "{{#if: <nowiki/> | true | false }}" returns "true". Effectively, "nowiki" elements are replaced during the early parsing by a "uniq" tag (starting with a special character forbidden in HTML and containing a numeric identifier for the content); they are replaced by the actual content at the end of template expansion and parser function calls, but just before the HTMLTidy step, which may strip part of the content if it starts or ends with whitespace.

RobinHood70 (talkcontribs)

I just tried {{#if: <nowiki/> | true | false }} on both my testing wiki and WP to be sure, and as I'd thought, it returns "true", not "false", though your reasoning is pretty much correct, otherwise. It sees <nowiki/> as a non-empty value because of the uniq tags that you mention.

Verdy p (talkcontribs)

You're correct (that's what I described, but I made a bad copy-paste from the former code just above). I just fixed my comment above.

Reply to "Suggestion to state that {{#if: can check for multiple parameters"

Replacing Consecutive Characters

4
70.160.223.43 (talkcontribs)

I'm using replace to remove certain characters like "&" from strings like "One & Two". I'm replacing the character with nothing but I'm left with two consecutive spaces. This is causing issues with file names. Is there a way to replace multiple consecutive characters with one?

This post was hidden by Verdy p (history)
Verdy p (talkcontribs)

If you try using just the "#replace:" function, use it a second time to replace two spaces by one space, after replacing the "&" (assuming that it may or may not have a single leading space and/or a single trailing space).

Note that "#replace:" only performs simple substitutions of literal substring, not replacements by matching regexps.

However I think it is a bad idea to silently drop the ampersand to generate filenames; it would be better to replace it with a hyphen and keep spaces where they are. Beware also that such a substitution may break strings containing numerical or named character references (frequently needed and present in the wikitext, or automatically added by the MediaWiki text editors): you should only replace "&amp;", not "&" alone.
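For example (a minimal illustration of that suggestion): "{{#replace: One & Two | & | - }}" produces "One - Two", keeping the surrounding spaces, whereas replacing "&" with nothing leaves the doubled space mentioned in the original question.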

As well, the ampersand character itself is authorized in MediaWiki filenames; but it must often be encoded as a character reference (in HTML text or wikilinks), or URL-encoded (when using it in a URL or in a query parameter). For URL-encoding, see the parser function "urlencode:" (and select the correct form depending on the syntactic type of target: PATH, QUERY...).

For stripping extra spaces (or underscores) in filenames, you can use the "#titleparts:" or "PAGENAME" functions (the latter also strips a leading ":" or recognized namespace prefix, but not a leading interwiki prefix).

Dinoguy1000 (talkcontribs)

The best method here would probably be to rewrite your template(s) as Scribunto modules, but if that isn't an option or practical for some reason, I'd probably approach it with multiple #replaces: remove the target character(s), replace all spaces with some character you're sure won't appear in the input, replace multiple space-replacement characters with a single one, and finally replace the replacement characters with spaces (these replaces can be reordered a little bit to your liking, e.g. removing the target characters can happen before or after replacing the spaces with a standin character). I might code this something like:

{{ #replace: {{ #replace: {{ #replace: {{ #replace: {{{1}}} | & }} | <!-- space --> | ¬ }} | ¬¬ | ¬ }} | ¬ | <nowiki/> <nowiki/> }}
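To illustrate (assuming an input of "One & Two" and that "¬" never appears in real input): the innermost replace gives "One  Two" (two spaces), the next gives "One¬¬Two", the third collapses that to "One¬Two", and the outermost restores a single space, giving "One Two".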

For filenames specifically, speaking from personal experience, I'd recommend not to remove characters if you don't have to (though you might not have much choice if you're already dealing with a large collection of files that are named that way, of course).

Reply to "Replacing Consecutive Characters"

how to process loops ?

3
Wladek92 (talkcontribs)

Hi all, is there an equivalent to 'for' or 'while' to process a list of items rather than applying a statement to a single item? Thanks. --Christian 🇫🇷 FR (talk) 10:31, 25 June 2023 (UTC)

Cavila (talkcontribs)
Tacsipacsi (talkcontribs)
Reply to "how to process loops ?"

ifexist a registered user

3
190.242.129.62 (talkcontribs)

The magic word #ifexist will always show the second parameter if the first parameter is "Media:" followed by an existing media file; if "File:" is used instead, it will show the third parameter when there is no description page. However, I don't know if something like this is possible with users. The user Google~mediawikiwiki is registered, but as you can see, the user page doesn't exist because it hasn't been created. I've tried with the special pages (logs, listusers and contributions) with non-existent users, but it always gives the second parameter. Is there a way it can show the third parameter if the user specified in the first parameter doesn't exist?
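For example (an illustrative sketch, assuming "Example.jpg" exists on the shared repository but has no local description page on your wiki): "{{#ifexist:Media:Example.jpg|found|not found}}" would return "found" because the file itself exists, while "{{#ifexist:File:Example.jpg|found|not found}}" would return "not found" because no local description page has been created.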

RobinHood70 (talkcontribs)

Currently, there's no way to determine whether a user is registered or not via a parser function, and I can't think of any tricky ways around that limitation, either.

Verdy p (talkcontribs)

Media files are a bit special because they may not exist on the local wiki but may exist in a shared wiki like Commons. #ifexist does not support testing on other wikis (it is much more costly than testing on the local wiki, as it requires using a REST API from another wiki). Usually, links to media files are just there to render an "external" link to the media file, which is then not tested. Using "Media:" in #ifexist to conditionally use "File:" to render it is not guaranteed to work, except if the file is hosted on the local wiki (e.g. a file in English Wikipedia that is not importable and replaceable by a file with the same name on Commons).

When using external wikis, the parser will not invoke the thumbnail renderer of the local wiki, but will just request the "File:" from the external wiki. However, there's a delay for that external renderer to provide a reply, whereas #ifexist can only perform synchronous requests (with a short completion time).

To support testing files on another wiki with #ifexist, it would require an optional parameter asking whether it should test another wiki by performing an asynchronous request; the problem is that MediaWiki does not support asynchronous requests, whose completion time is unpredictable. The same would be true if one wanted to use "shared" templates or modules.

So #ifexist is not tuned to allow asynchronous requests, which would block the rendering of the current page being parsed. I have no idea how this could work; #ifexist is supposed to run on the same database as the page being rendered. But if it worked, we could test external links to any page on any wiki, using wikilinks with interwikis. Worse, the result would not be cacheable in the parser cache; even if the external site has the shared resource, you don't really know how and when it will return the content (you don't even know what its metadata will be, notably the media type and size, which may change at any time; there's no mechanism for cache synchronization between different wikis). When you use a shared image, the parser assumes that the external file exists and that the external site will generate a thumbnail with the requested size and media type. The actual request to the external server is then not made by the parser and not cached, but made on the client side, by the web browser of the visitor making its own asynchronous requests, with the external site doing all the needed parsing and transformation to HTML or a thumbnail image, so the local wiki never parses the result.

Reply to "ifexist a registered user"

#replace multiple strings?

3
Summary last edited by Tacsipacsi 12:48, 17 June 2023

Use nested #replaces.

V G5001 (talkcontribs)

Is it possible to replace multiple different strings within one string?

For example, I would want to do {{#replace:The dog is jumping.|dog,jumping|cat,walking}} or something similar to receive the output "The cat is walking."

Is this possible in any way?

Dinoguy1000 (talkcontribs)

Yes, just nest #replaces: {{#replace:{{#replace:The dog is jumping.|dog|cat}}|jumping|walking}}

Just be aware of the expansion depth limit (to say nothing of code readability); if you need a lot of separate replaces on the same string, it will probably be better to write it in Lua, as a Scribunto module. (You could also use Extension:Variables, but that extension unfortunately has an uncertain future given the direction the parser is headed in.)

V G5001 (talkcontribs)

Thanks, this worked

What is ParserFunctions programming language?

5
Sokote zaman (talkcontribs)

Which programming language does the functions in ParserFunctions use?

Keyacom (talkcontribs)

The #time function uses PHP's datetime format, except that it also defines extra functionality through x-prefixed format codes.

The #expr function uses some custom language. Its operators are similar to the ones used in SQL (hence a single equals sign for equality).
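For instance (illustrative; the exact output depends on the wiki's language and configuration): "{{#time: j F Y | 2023-06-25 }}" returns "25 June 2023", and "{{#expr: 2 * (3 + 4) = 14 }}" returns "1" because the comparison is true.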

Sokote zaman (talkcontribs)

Thank you for your reply. What language do the other functions use? Thanks

Keyacom (talkcontribs)

Also:

  • #timel uses the same syntax as #time
  • #ifexpr expressions use the same syntax as #expr
  • all of these functions are coded in PHP.
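For example (illustrative): "{{#timel: Y-m-d H:i }}" formats the current time in the wiki's local time zone using the same format codes as #time, and "{{#ifexpr: 3 > 2 | yes | no }}" returns "yes".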
Sokote zaman (talkcontribs)

Thank you for your reply. Thank you

Reply to "What is ParserFunctions programming language?"

rendering prob w/parser fn in sidebar

6
GrayanOne (talkcontribs)

I would like to put a link in the sidebar to call purge:

[http:////index.php?title={{#replace:{{PAGENAME}}| |_}}&action=purge Purge page]

but it barfs; I end up with this fugly result:

"_}}&action-purge Purge page]" and the url == "MediaWiki:Sidebar".

I've tried n variations on escaping, but am stymied, so posting here in the hope that someone can help.

Aidan9382 (talkcontribs)

Your example isn't working because the parser sees the space in your #replace statement as the end of the URL. It seems like urlencode would do what you are attempting to accomplish (see the example below).

  • [http://mediawiki.org/w/index.php?title={{urlencode:{{FULLPAGENAME}}|PATH}}&action=purge Purge page] -> Purge page
GrayanOne (talkcontribs)

Excluding the space is one of the things I tried. The #replace docs say "if no char-to-search-for is supplied, then a space is assumed". So I removed the space and its pipe symbol, to give:

[http:////index.php?title={{#replace:{{PAGENAME}}|_}}&action=purge Purge page]

but alas, that gave the same fugly result.

I tried your code, and it parsed fine and works - thank you.

BUT it utterly fails to show up on the sidebar. The link is there in the saved edited page ... but flatly refuses to appear in the sidebar.

... a cuppa tea later ... I finally figured out that the real issue is the brackets. I don't think the sidebar is parsed in the same way as usual. At any rate, leaving them off worked. And thanks for the urlencode tip, I should've thought of that :(

http://mediawiki.org/w/index.php?title={{urlencode:{{FULLPAGENAME}}|PATH}}&action=purge Purge page

Verdy p (talkcontribs)

Actually, the correct option for urlencode is not PATH here, but QUERY, because this encoding is not made for inclusion in the path part of a URL, but in the query string. The encoding is a bit different for some characters, notably spaces, which are encoded as "+" in a query string (after "?" but before any "#"), as "%20" in a standard path, and as "_" in a MediaWiki page name or its namespace; underscores "_" are left intact in a query string or a path part; there are other characters with different encodings depending on where you place the encoded string in a URL.
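As an illustration of those differences (a sketch; other characters may be encoded differently as well): "{{urlencode:My page name|QUERY}}" gives "My+page+name", "{{urlencode:My page name|PATH}}" gives "My%20page%20name", and "{{urlencode:My page name|WIKI}}" gives "My_page_name".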

Note that QUERY is the default encoding form for "urlencode:" if you don't specify an option. So you can use either:

  • [https://mediawiki.org/w/index.php?title={{urlencode:{{FULLPAGENAME}}|QUERY}}&action=purge Purge page]
  • [https://mediawiki.org/w/index.php?title={{urlencode:{{FULLPAGENAME}}}}&action=purge Purge page]

or equivalently:

  • [https://mediawiki.org/w/index.php?title={{FULLPAGENAMEE}}&action=purge Purge page]

Note that "action=purge" is also part of the query string, where it is a second parameter; when you pass multiple parameters, you need to separate them by a "&", but this "&" (and the "=" sign between the optional name and the value of each parameter) must NOT be urlencoded, so you need to urlencode (with the QUERY option by default) separately each parameter name or parameter value.

Finally, when the resulting URL must be embedded into an HTML page, you may sometimes need to re-encode the full result so that each "&" becomes a character entity "&amp;" or "&#38;" or "&#x26;", because "&" is reserved in HTML and could be followed by something looking like a character entity; for example, if you want to pass the parameter "nbsp=2", the "&nbsp" part could be misread as a named character entity.


Note also that anchor parts (after the first hash "#") are encoded differently when targeting sections in MediaWiki pages: use "anchorencode:", which computes the relevant autogenerated id attribute in HTML (which has other restrictions). Use it if the section header is stable; otherwise prefer simple id values that you can place as a target in the article with a valid id attribute that does not need to be re-encoded and that should never contain any space. Autogenerated ids for section headers in MediaWiki pages use a custom format to bypass most restrictions, but they fail to be unique if there are two sections with the same header text (in which case you need a separate id for that section: see "Template:Anchor" on Wikipedia).
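For example (illustrative): "[[#{{anchorencode:Some section title}}]]" links to the section whose heading is "Some section title" on the same page; the function expands here to "Some_section_title".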

GrayanOne (talkcontribs)

Wish I'd known about FULLPAGENAMEE. At first I thought you'd made a typo, but looked it up ... and there it is, all packaged up and ready to go. Good to know, appreciated.

Verdy p (talkcontribs)

The appended "E" means "url-Encoded" (with the QUERY encoding form). There are several other variables that are defined the same way. Note that to purge a page on the local wiki, you don't need to know the protocol, base domain name, and path:

  1. First there are a few builtin variables for them (including the server name), see the MediaWiki doc about "Magic" keywords.
  2. There's also a simpler compact form to create valid URLs, using the "{{FULLURL: ... }}" parser function which also accepts a parameter for the query string:
    [{{FULLURL:{{FULLPAGENAME}}|action=purge}} Purge the page]
    • It works only for the "title" query parameter used in almost all calls to the MediaWiki API, where you just specify the full page name (with its namespace), and add an optional parameter for the rest of the query string ("action=purge") that must still be URL-encoded appropriately as a list of query parameters separated by "&", where each query parameter name and each of their given values (after the equal sign) must be URL-encoded separately. Generally, URL-encoding a query parameter name like "action" is rarely needed (except possibly with the API of some extensions), but parameter values need it if they are possibly arbitrary text (and not as simple as "purge" here); and here also you must not URL-encode the "&" and "=" separators in the syntax of the query string.
    • If you don't use any additional query parameter, "FULLURL:" will encode the URL using its short form, where the given full page name is just appended after a "/", using the URL-encoding form "PATH" instead of "QUERY": this gives the type of URLs you use on normal wiki pages (where spaces are replaced by underscores "_" instead of "+" and slashes "/" are left as is). So in summary, you get:
      1. [{{FULLURL:{{FULLPAGENAME:local page name}}}}#{{ANCHORENCODE:anchor}} Go to this page] as the near-equivalent of the wikilink:
      2. [[{{FULLPAGENAME:local page name}}#anchor|Go to this page]] or just
      3. [[local page name#anchor|Go to this page]]
      except that the first one is treated as if it were an external link and by default is rendered with a small trailing link icon (unless it is embedded in an element with the "plainlinks" class), so it is always a blue link (no test of existence of the target page is made, you won't get a red link to create the page if it does not exist and is creatable; it behaves as with "action=view" for a simple visit), whereas a red link with the wikilink syntax behaves as with "action=edit".
Reply to "rendering prob w/parser fn in sidebar"

Explode text change suggestion

4
MvGulik (talkcontribs)

Still think the used text: "The limit parameter ... allows you to limit the number of parts returned, with all remaining text included in the final part." is generally unclear. With "returned" being kinda ambiguous in nature.

Think the following would be more clearer/helpful: "The limit parameter ... allows you to limit the number of pieces the string is initially split into, with all remaining text included in the final part."


... Oh yeah (off topic), on another note: I was blocked from the SMW wiki site, including my personal page, by pissing off the SMW team at their GitHub channel (permanently blocked there too, of course). For more than 2 years now. No warning or reprieve options either, of course. Just kill-and-throw-away-the-key behavior (which is usually behavior found in not-so-nice countries ...). Guess the SMW team (or at least one of their members) thinks differently about some things that seem to be generally advertised by the general MW community. But ... I guess that's just the real way the world turns.

Verdy p (talkcontribs)
(Reply to the off-topic subcomment)

This happens when projects are not managed the correct way. There are good reasons why some bug fixes may need to be delayed: because developers have other priorities, need further research or independent confirmation, or need additional technical details that the submitter alone cannot confirm. Or because there's another related technical issue blocking the immediate resolution (the initial submission could cause conflicts and technical problems elsewhere).

As long as there is no other proposer providing some additional hints, or demonstrating a need for an actual use, or finding ways to avoid issues in other places, developers may estimate that this is not their immediate priority. But a good project manager (including on GitHub) has a working board that allows them to track and plan their future developments.

However, pressing developers to change their priorities and harassing them too often (or insulting their work) may cause such action. Still, blocking a user on one site should not have the effect of blocking them completely on another, unrelated site where there was no such abuse and where the user was cooperating actively and successfully on many unrelated projects (even if messages sent on one site are unilaterally forwarded by bots from one site to another into a dedicated channel for such forwards: the forwarding bot should block the forwarding at its source site instead of its target site, because the initial bug submitter on the initial site was not actively posting to the target; this is a fault caused by incorrect administration of the forwarding bot, using unfair rules over which the submitter has no control at all).

Note that Wikimedia sites do not allow such propagation of blocking, unless the user spammed too many related sites, abused them, and continues trying to do so on other related sites (there's a rule for that, called "global blocking", which uses stricter policies, so that it won't harm local projects where the user did not violate any local policy). But I bet that the SMW wiki and the SMW GitHub are just considered by them to be the "same" site. Their admins should clearly separate their goals and make more explicit what is related, instead of using side-channel decisions based on unfair rules and without any notice.

Blocking users indefinitely, even if there was no real harm (just temporary irritation) and no prior warning (or taking a decision without letting the user be informed of what is going on and being discussed by others), is a bad and damaging practice that compromises the openness of any open-source or cooperative project where there are different participants working on different sets of issues and with different technical backgrounds. If the irritation was temporary, such blocks should be temporary (just the time needed by admins to take some distance, regain their calm, and rethink their past decisions and their global effects on the supported projects, where they should not decide everything alone without consulting their contributing community).

Anyway, I don't understand why you think that the sentence "The limit parameter is available in ParserFunctions only, not the standalone StringFunctions version, and allows you to limit the number of parts returned, with all remaining text included in the final part" is ambiguous about the meaning of the term "returned", when it is immediately followed by a clarification about the last part returned (which may then include one or more occurrences of the delimiter if this part is at a rank matching the specified limit). If the limit is 1, no split will ever occur; if you want to limit the result to 3 parts with no delimiter in any of them, you just need to set the limit to 4 and ignore/discard the returned part 4, processing only parts 1 to 3.

However, I agree that if the limit parameter is used, then an additional boolean parameter could be specified after it to indicate whether the last part should contain all the rest, or just one part with no occurrence of the delimiter after the specified limit.

For now, the workaround for that restriction is to process the returned value again with "#explode:", with the same delimiter, but with only the position parameter. Note that without a limit specified, #explode just returns 1 field, and it does not contain any occurrence of the delimiter (this works when you expect to process items 1 by 1 and never in pairs, for example).

MvGulik (talkcontribs)

I feel that in

"... allows you to limit the number of parts returned. ..."

In my view the "returned" part can also be interpreted as being directly related to the final "returned(selected)" output. Which it is not.

But thinking that it is 1) potentially hinders understanding of how #explode really works. And 2) kinda renders the trailing "with all remaining text included in the final part." text in my view more or less nonsensical at that point.

This suggested variation:

"... allows you to limit the number of pieces the string is initially split into ..."

Just ditched the "returned" part to get rid of that possible wrong interpretation, and uses "pieces" instead of "parts" to linkup better to the next position paragraph.

And I think it's also a better/additional hint about how #explode really works. (Selectively split the string first (if limit is used), then pick (just one!) piece/part of that second.)

Think that's the best I can come up with.

Verdy p (talkcontribs)

I agree that it should work like #sub for substrings, except that it does not count characters, but fields (or pieces). So to return all fields (including their delimiters) after the limit in the last selected part, just pass the limit as -1 (which is already used with a single position parameter).

Otherwise the default limit should be the position+1 (which ensures that there will always be one field returned and no delimiter in it).

If we specify position+2 as the limit, there will be at most 2 fields (and a single delimiter between them; either of the two fields could be empty); there will be one delimiter returned unless the second field is missing (i.e. there's no delimiter at all in the source text after the given starting position, just one field not followed by any delimiter, or none because the source text does not contain at least position delimiters).

Using #explode for getting more than 1 field (by using the limit parameter) is unreliable and gives unpredictable results when we don't know how many fields are actually present in the source text (i.e. if the source text contains more fields than just those between position and position+limit).

Say you just want to extract the first 2 fields of the text by using "position=0" and "limit=2"; for now, you may get:

  • 2 fields with a single delimiter (only if the source text contains exactly two fields with a single delimiter between them), or
  • just one (the text is too short and does not contain any delimiter, the result may even be an empty string if the source text is empty), or
  • an arbitrary number of fields (with as many delimiters as the source text);

i.e. the returned text will be always the same as the input!

You don't have that problem when you just want 1 field (for that you just pass position and no limit at all).

The default limit should then be the same as if we passed limit=position+1; but for now this is not the case: with an explicit limit=position+1, we always get all fields present starting at the given position up to the end of the source text (i.e. what we would expect with limit=-1).

For now, the only workaround to detect that you got more fields than expected is to make a test with a 2nd call to #explode with the same source text, passing "position=limit" but NO "limit=" parameter: if what it returns is effectively empty, this means that the first call returned at most (limit-position) fields, separated by at most (limit-position) occurrences of the delimiter (but there may be fewer fields and one fewer delimiter between them, with each field possibly empty between these delimiters).
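As an illustrative sketch of that check (using a made-up comma-separated value): "{{#if: {{#explode:A,B,C|,|2}} | more than 2 fields | at most 2 fields }}" returns "more than 2 fields", while the same test applied to "A,B" returns "at most 2 fields", because "{{#explode:A,B|,|2}}" expands to an empty string.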


Option for counting the effectively detected fields in the specified range

Now, if you expect to find exactly the given number of fields and delimiters in the output, you need another way to count delimiters in the result, but "#explode:" does not offer one (there could be an optional parameter to "#explode:" saying what to return: a substring of the source text by default, or the count of fields detected between the two specified positions, position and position+limit, counted in fields):

E.g. with "{{#explode:text|delimiter|position|limit|R}}

Such use of the "R" parameter (for raw) could be an alternative to not just detect the starting position (counted in characters in the whole text) of the first occurence of a substring, but actually to detect and count the number of occurences of that substring (specified as a delimiter for "#explode:") of the source text (in the range specified by position and limit and counted in terms of occurences of the delimiter), e.g. "{{#explode:text|delimiter|R}} would count how many delimiters are present in the whole text and return it as a decimal integer. (The "R" option mimics what happens in other parser functions, e.g. to get the number of members in a specified category and return the value as a raw unformatted integer number).


Other possible extensions (may not be necessary in a first implementation)

Optionally, using "N" instead of "R" could format that number in user's locale (you could as well specify a locale code like "en" or "fr", or a code like the empty string "" to use the default language of the local wiki, or "." for the locale of the local page if it has been set in its metadata) instead of returning the raw unformatted number (suitable for use in "#expr:" or "#ifexpr: "), but this is not really necessary as there's a separate parser function for that, and we don't need to repeat other parameters given to "#explode:".

Another possibly useful (but optional) parameter would be a parameter specifying by what we want to replace the delimiters in the results, e.g. |by=,&#32; to replace delimiters by a comma and space. But this could still be done as well by a separate call to another external parser function to make these replacements; however it requires passing again the separator to this replacement call, and it may just be performed more efficiently and with little efforts within "#explode:" itself, because it actually explodes the source text into an array of fields, but then really reassembles them into a single string using (by default) the same delimiter as the one used to explode the text. This optional "by=" parameter would have no effect if you use the extra "N" or "R" parameters as they actually never return any substrings from the source text, but just a single number.

We could use |by= with an empty value to remove all delimiters between the extracted fields in the result; however, the default value when |by= is not explicitly passed must be the same as the specified delimiter (meaning that delimiters won't be replaced or deleted in the reassembled output), to preserve compatibility.


Currently unspecified behaviours

For the position, if it is negative it is clearly described as meaning a position counted negatively, backward from the end of the string at position -1, so if you want to get at most the last two fields of a given text, you can use "{{#explode:text|delimiter|-3}}". But when the source text has fewer fields than expected (two fields in this example), the behavior is not clearly specified: does it return as many fields as are actually present in the text starting at the given position (because you could not count backward past the start of the source text, but you return all those that were counted inclusively), or an empty string (not enough delimiters found), or two fields anyway (by appending an extra delimiter in the result)?

Another thing that is still not specified is what happens when this time the given limit is negative. In my opinion, this could return the empty string, but the backward counting rule used for the starting position could be applied too (meaning that it indicates fields counted in the opposite direction than the direction specified by the position). If the limit is zero, the result should always be an empty string.

Finally, it is not well described what happens when the position or limit is specified but as an empty parameter. In my opinion they should default respectively to 0 (counting from the start of the text) and -1 (as many as needed to scan all fields from the starting position, i.e. the limit is the end of the text because the starting position 0 at the start of the text is scanned in the forward direction), as if these two parameters were not specified and used the same defaults.

Now, what is also not clearly specified is what happens when the delimiter itself is empty:

  • is it invalid and returns an error text, or does it return an empty string (or maybe "0" with the "N" or "R" options), or
  • does it mean that it delimits and counts individual characters (as if each one were suffixed by an empty delimiter), so that we can explode a text character by character and return them in a list separated by the string specified in "by=" (and terminated if the range specified by position and "limit" scans past the end of the text; if the scan is performed backward because the position is negative, everything would occur as if each field were prefixed by an empty delimiter), meaning that if the "by=" parameter is omitted or empty, the result would be the same as "#len:" with the "N" or "R" options, or the same as the input text in the absence of a range specified by position and limit, or the same as "#sub:" otherwise?
Reply to "Explode text change suggestion"

Replace Function and Links

3
70.160.223.43 (talkcontribs)

I have a template variable called cast that I then link to the actor's page, like [[{{{cast|}}}]].

This works well unless there is more than one actor. In that situation they are separated with a semicolon.

I'd like to #replace the semicolon with the end of a link and the start of a new link, given the starting and ending link markup is already hardcoded. <nowiki>[[{{#replace:|;|]][[}}]]</nowiki>

The above example doesn't work and I think it's because of an order of operations issue of when the replace is done and links are made, but I'm not sure.

I've tried the ReplaceSet extension, and changed my code to <nowiki>[[{{#replace:|;=]][[}}]]</nowiki> but it also didn't work.

Dinoguy1000 (talkcontribs)

You can accomplish this by creating templates that contain just the characters ] and [ (or ]] and [[, to halve the template transclusions), then use those templates in place of the literal characters in your #replace. For example, creating "Template:))!" with the code ]] and "Template:!((" with [[, then using [[{{#replace:{{{cast}}}|;|{{))!}}{{!((}}}}]], will result in the list of links you expect.

However, this isn't a particularly readable method; the better option would be to write a Lua module (if you have that extension installed), or to use Extension:Arrays functions (if you have that one installed).

Verdy p (talkcontribs)

Note that the code suggested above by Dinoguy1000 removes all separators between links, so all actor names linked in the list would be glued together. You need to include separators (e.g. semicolon and space) with: [[{{#replace:{{{cast}}}|;|{{))!}}; {{!((}}}}]].

The same will be true with Extension:Arrays to process the semicolon-separated list (without necessarily needing any extra template transclusion like {{))!}} and {{!((}}), because each enumerated item can be formatted separately as a full link, and the code can supply the extra separators between each item when appending all the formatted links.

Note also that the third parameter for the replacement string for {{#replace: text | search | replacement }} is not separated from the second parameter for the search string by an equal sign, but by a vertical bar.

So maybe what you wanted to do, using only #replace and no extra templates, was actually [[{{#replace:{{{cast|}}}|;|]]; [[}}]]. It still works in MediaWiki because square brackets inside paired braces are left unchanged (not preprocessed as links) by MediaWiki before calling the #replace parser function, so the occurrence of ]] inside the #replace parameters does not close the [[ placed before the function call; the same is true for the occurrence of [[ inside the #replace parameters, which is not closed by the occurrence of ]] after the function call. With that syntax, it would transform the value Alice;Bob given to the cast variable into [[Alice]]; [[Bob]], as expected, which MediaWiki will then process as two wikilinks. With Dinoguy1000's suggestion, you'd get [[Alice]][[Bob]] (most probably not what you want, given the description in your question).

Reply to "Replace Function and Links"

too many #time calls error

8
Findsky (talkcontribs)

I have a page that uses too many #time calls, and it ends with an error. Can I modify something to increase the number of calls?

Matěj Suchánek (talkcontribs)

No, it's hard-coded: .

RobinHood70 (talkcontribs)

There's always the option of directly altering the extension file itself, though if you do that, you bear the responsibility for any issues it may cause and you'll have to re-make them any time you download an update. If you really need to do that, though, it's just a matter of changing the private const MAX_TIME_CHARS = 6000; near the beginning of the file Matěj Suchánek linked to a higher number.

Verdy p (talkcontribs)

I don't think this constant puts any limit on the number of calls as stated above; it sets a limit on the length of the format string to be parsed. So "#time" is not counted as a "costly parser function call" (like "#ifexist"): it is a memory limit, and I think it's reasonable that a format string should not be an arbitrarily long text; its purpose is just to format a datetime value with a few punctuation marks and short words that may need to be escaped in quotation marks (e.g. "on", "at", "to", "the" and so on in English). So 6000 characters should always be much more than sufficient for any datetime format string (in almost all common cases, this length does not exceed about a dozen characters, and extremely frequently it is less than 10 characters!).

Are you sure that the reported error is about "#time" ? As you don't provide any hint about which page is affected (and on which wiki, because this is not this one according to your contribution history here!), we can only guess some hints here about the possible common causes.

Isn't it about too many calls to "#ifexist" (possibly via some transcluded templates where such calls cannot be conditionally avoided for well-known and frequently used page names), or an error caused by too-expensive Lua expansions (be careful about what you place in infoboxes; maybe there are spurious data in Wikidata, or insufficient filters in the data queries)?

One way to isolate these cases is to edit the page or a section of it, comment out some parts, and make a preview (and look at the parser statistics displayed at the bottom of the preview or in HTML comments at the end of the "content" section). If a part generates too much, then it's a sign that it should go into a separate page or subpage (not to be transcluded, but linked to).

Other tricks can also be used to reduce the expansion cost, notably if you use templates in long tables with a lot of repeated inline CSS styles: using page stylesheets can help reduce that cost a lot.

Other common cases include overly long talk pages, which may need archiving (replace old talks by a navigation template pointing to the archives; don't transclude too many subpages).

However, #time has another limit on the number of locale instances it can load and work with. It is rarely reached but may occur in some multilingual pages using too many locales simultaneously. Most pages should limit themselves to using only the default wiki language, the page's language, or the user's preferred language (or native autonyms only for language names) to avoid hitting that limit.

RobinHood70 (talkcontribs)

If I understood the code correctly, #time is basically implementing its own version of an expensive parser call function, presumably since #time on its own is too cheap to count every single time as expensive. That 6000 characters isn't for a single call; it's the total length of the first parameter for all calls on the page. It's constantly increasing and the only time it's ever reset is on a call to ParserClearState.

Verdy p (talkcontribs)

I've not said that; #time is not an expensive call. But it has limits on the maximum length of its parameter for the format string, and a secondary limit on the number of locales it can process from the last optional parameter. It may produce errors, but not to the point of causing a server-side HTTP 500 error: you'll get a red message and a tracking category added to the page when these limits are exceeded, but there will still be a default behavior, and the rest of the page will be processed.

Likewise, expensive parser calls (like #ifexist) are counted, and even if the limit is reached it does not cause the page not to be processed or the server to reply with an HTTP 500 error without any content displayed. Instead a default behavior is chosen arbitrarily (e.g. #ifexist will operate as if the page name given in its parameter did not exist). When template transclusions or parser function expansions cause the maximum page size to be exhausted, there's a default behavior as well: the template is not expanded; instead MediaWiki just displays a visible link with the template page name, and the rest is expanded if possible.

However, the hard limits that cause server-side 500 errors are memory limits for the expansion of the whole page in some cases, but most of the time it is the time limit (about 10 seconds) which may be reached on pages with too much content that takes too much time to process (especially in Lua modules, e.g. with some infoboxes trying to load too much data from Wikidata). All this has nothing to do with #time.

You did not post any hint about which page causes an error for you, so it's impossible to investigate what the real issue is. But I really doubt this is caused by #time: the 6000 characters are certainly much more than enough for the format string, or your wiki page is definitely broken and has to be fixed (e.g. there were mismatched closing braces, or some unclosed "nowiki" section or HTML comment inside the parameters, causing a parameter to be larger than expected).

RobinHood70 (talkcontribs)

You're misunderstanding what I'm saying completely. Pull out of your current mindset about what's going on here because you've misread both the initial report and my comments, and you're down a path that's completely unrelated to what's going on.

The OP isn't getting a 500 error or a "too many expensive parser functions" error or any other such thing. All they said was that they were getting the error "too many #time calls". That's a defined error in ParserFunctions/i18n/en.json, so we can infer that the error is coming from ParserFunctions, not somewhere else.

Now, re-read the code linked to, above. That error occurs not just when any format text is 6000 characters or greater, it occurs when the total length of the format parameter to all calls exceeds 6000. Notice that the length is accumulating via self::$mTimeChars += strlen( $format );. For whatever reason, that function has been designed to be self-limited in a fashion similar to an expensive parser function, but not actually part of that mechanism.

Verdy p (talkcontribs)

OK, this message is misleading. I did not see that there was a "+=" accumulating all calls to #time, and I don't understand at all why this is done. The only useful thing would be to limit the size of the format string itself (and 6000 is even too large for that, when this string should probably never exceed about 256 bytes). If there are other performance issues, the message saying that there are too many "calls" is misleading, and instead of accumulating lengths, it should use a separate counter (incremented by 1 for each call; in that case formatting 6000 dates would seem reasonable). As it is, formatting the same number of dates in some user languages gives a variable result: it may pass in English but not in Chinese, if those languages need non-ASCII separators or extra characters like the CJK year/month/day symbols. So the implementation (or design choice) is problematic, as well as the message reported.

I don't know why formatting many dates (in the same target language) would be expensive, when we do that for free when formatting numbers. Even if this requires loading locale data, this is done only once and cached for each target language.

With a total of 6000 bytes for all occurrences of date format strings, and with each format string taking about a dozen bytes (sometimes a bit more), this means we can only format about 500 dates on the same page: this is really not a lot; many data tables or talk pages will contain or exceed that number (notably when signatures are generated by a template or extension, and not by "tildes" expanded on each saved edit). This will then impact even the ongoing changes needed for talk pages and signatures (depending on how "tildes" are expanded) and will affect many pages showing modest tables with dates (e.g. in articles about sports, history, and so on, possibly generated from data loaded from Wikidata).

This can also affect the usability of administrative pages showing reports, making them randomly usable depending on the user language, even though the real cost is the same. Formatting dates costs much less than parsing a page or tidying the whitespace or HTML result; there's even more cost in HTML comments and indentation that can fill up large amounts of data in the loaded page, need extra I/O on database storage, and use extra memory and CPU on servers, all much higher than the length of these small format strings. A total of 6000 bytes for all format strings is ridiculously small; it would not even change anything if it was 8 KB, and in most cases these short strings are normally interned once they are parsed as parameters, all the cost being in the main MediaWiki parser and not in the #time call itself.

Reply to "too many #time calls error"