Help talk:Extension:ParserFunctions

About this board

Replace Function and Links

3
70.160.223.43 (talkcontribs)

I have a template variable called cast that I then link to the actor's page, like [[{{{cast|}}}]].

This works well unless there is more than one actor; in that situation they are separated with a semicolon.

I'd like to #replace the semicolon with the end of a link and the start of a new link, given that the starting and ending link markup is already hardcoded: <nowiki>[[{{#replace:|;|]][[}}]]</nowiki>

The above example doesn't work and I think it's because of an order of operations issue of when the replace is done and links are made, but I'm not sure.

I've tried the ReplaceSet extension, and changed my code to <nowiki>[[{{#replace:|;=]][[}}]]</nowiki>, but it also didn't work.

Dinoguy1000 (talkcontribs)

You can accomplish this by creating templates that contain just the characters ] and [ (or ]] and [[, to halve the template transclusions), then use those templates in place of the literal characters in your #replace. For example, creating "Template:))!" with the code ]] and "Template:!((" with [[, then using [[{{#replace:{{{cast}}}|;|{{))!}}{{!((}}}}]], will result in the list of links you expect.

However, this isn't a particularly readable method; the better option would be to write a Lua module (if you have that extension installed), or to use Extension:Arrays functions (if you have that one installed).

Verdy p (talkcontribs)

Note that the code suggested above by Dinoguy1000 removes all separators between links, so all actor names linked in the list would be glued together. You need to include separators (e.g. a semicolon and space) with: [[{{#replace:{{{cast}}}|;|{{))!}}; {{!((}}}}]].

The same will be true with Extension:Arrays for processing the semicolon-separated list (without necessarily needing any extra template transclusions like {{))!}} and {{!((}}), because each enumerated item can be formatted separately as a full link, and the code can supply the extra separators between each item when appending all the formatted links.

Note also that in {{#replace: text | search | replacement }}, the third parameter (the replacement string) is separated from the second (the search string) by a vertical bar, not an equals sign.

So maybe what you wanted to do, using only #replace and no extra templates, was actually [[{{#replace:{{{cast|}}}|;|]]; [[}}]]. It still works in MediaWiki because square brackets inside paired braces are left unchanged (not preprocessed as links) before the #replace parser function is called, so the ]] inside the #replace parameters does not close the [[ that appears before the function call; likewise, the [[ inside the parameters is not closed by the ]] that appears after the call. With that syntax, a cast value of Alice;Bob is transformed into [[Alice]]; [[Bob]], as expected, which MediaWiki then processes as two wikilinks. With Dinoguy1000's suggestion you'd get [[Alice]][[Bob]] (most probably not what you want, given the description in your question).

Reply to "Replace Function and Links"

too many #time calls error

8
Findsky (talkcontribs)

I have a page that uses #time too many times, and it ends with an error. Can I change something to increase the number of calls allowed?

Matěj Suchánek (talkcontribs)

No, it's hard-coded in the extension's source.

RobinHood70 (talkcontribs)

There's always the option of directly altering the extension file itself, though if you do that, you bear the responsibility for any issues it may cause and you'll have to re-apply the change any time you download an update. If you really need to do that, though, it's just a matter of changing the private const MAX_TIME_CHARS = 6000; near the beginning of the file Matěj Suchánek linked to, to a higher number.
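
For instance (purely illustrative, any value of your choosing works the same way), changing that line to private const MAX_TIME_CHARS = 12000; would double the allowance.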

Verdy p (talkcontribs)

I don't think this constant limits the number of calls as stated above; it sets a limit on the length of the format string to be parsed. So #time is not counted as a "costly parser function call" (like #ifexist): it is a memory limit, and I think it's reasonable that a format string should not be arbitrarily long text. Its purpose is just to format a datetime value with a few punctuation characters and short words that may need to be escaped in quotation marks (e.g. "on", "at", "to", "the" and so on in English). So 6000 characters should always be more than sufficient for any datetime format string (in almost all common cases this length does not exceed about a dozen characters, and very frequently it is fewer than 10!).

Are you sure that the reported error is about #time? As you don't provide any hint about which page is affected (or on which wiki, because it is not this one, judging by your contribution history here!), we can only guess at the possible common causes.

Isn't it about too many calls to #ifexist (possibly via some transcluded templates where such calls cannot be conditionally avoided for well-known and frequently used page names), or an error caused by too-expensive Lua expansions (be careful about what you place in infoboxes; maybe there are spurious data in Wikidata, or insufficient filters in the data queries)?

One way to isolate these cases is to edit the page or a section of it, comment out some parts, and preview it (looking at the parser statistics displayed at the bottom of the preview, or in the HTML comments at the end of the "content" section). If one part generates too much, that's a sign it should go into a separate page or subpage (not to be transcluded, but linked to).

Other tricks can also be used to reduce the expansion cost; notably, if you use templates in long tables with many repeated inline CSS styles, using page stylesheets can help reduce that cost a lot.

Other common cases include overly long talk pages that may need archiving (replace old talks with a navigation template linking to the archives, and don't transclude too many subpages).

However, #time has another limit on the number of locale instances it can load and work with. It is rarely reached, but may occur on some multilingual pages using too many locales simultaneously. Most pages should limit themselves to the default wiki language, the page's language, or the user's preferred language (or native autonyms only, for language names) to avoid hitting that limit.

RobinHood70 (talkcontribs)

If I understood the code correctly, #time basically implements its own version of an expensive-parser-function limit, presumably since #time on its own is too cheap to count every single call as expensive. That 6000 characters isn't for a single call; it's the total length of the first parameter across all calls on the page. It's constantly increasing, and the only time it's ever reset is on a call to ParserClearState.

Verdy p (talkcontribs)

I've not said that; #time is not an expensive call. But it has a limit on the maximum length of its format-string parameter, and a secondary limit on the number of locales it can process from the last optional parameter. It may produce errors, but not to the point of causing a server-side HTTP 500 error: you'll get a red message and a tracking category added to the page when these limits are exceeded, but there will still be a default behavior, and the rest of the page should be processed.

Likewise, expensive parser calls (like #ifexist) are counted, and even if the limit is reached, it does not cause the page to go unprocessed or the server to reply with an HTTP 500 error and no content. Instead a default behavior is chosen arbitrarily (e.g. #ifexist will operate as if the given page name did not exist). When template transclusions or parser-function expansions exhaust the maximum page size, there is likewise a default behavior: the template is not expanded; instead MediaWiki just displays a visible link with the template page name, and the rest is expanded if possible.

However, the hard limits that cause a server-side 500 error are memory limits for the expansion of the whole page in some cases, but most of the time it is the time limit (about 10 seconds), which may be reached on pages with too much content that takes too long to process (especially in Lua modules, e.g. with some infoboxes trying to load too much data from Wikidata). All this has nothing to do with #time.

You did not post any hint about which page causes the error for you, so it's impossible to investigate the real issue. But I really doubt this is caused by #time: 6000 characters is certainly more than enough for the format string, unless your wiki page is definitely broken and has to be fixed (e.g. mismatched closing braces, or an unclosed "nowiki" section or HTML comment inside the parameters, causing a parameter to be larger than expected).

RobinHood70 (talkcontribs)

You're misunderstanding what I'm saying completely. Pull out of your current mindset about what's going on here because you've misread both the initial report and my comments, and you're down a path that's completely unrelated to what's going on.

The OP isn't getting a 500 error or a "too many expensive parser functions" error or any other such thing. All they said was that they were getting the error "too many #time calls". That's a defined error in ParserFunctions/i18n/en.json, so we can infer that the error is coming from ParserFunctions, not somewhere else.

Now, re-read the code linked to above. That error does not occur only when a single format string is 6000 characters or longer; it occurs when the total length of the format parameter across all calls exceeds 6000. Notice that the length is accumulating via self::$mTimeChars += strlen( $format );. For whatever reason, that function has been designed to be self-limited in a fashion similar to an expensive parser function, but it is not actually part of that mechanism.
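
As a rough worked example: with a common five-character format string such as Y-m-d, about 1,200 #time calls on one page (1,200 × 5 = 6,000 characters) would exhaust that budget, no matter how short each individual format string is.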

Verdy p (talkcontribs)

OK, this message is misleading; I did not see that there was a "+=" accumulating all calls to #time, and I don't understand at all why this is done. The only useful thing would be to limit the size of the format string itself (and 6000 is even too large for that, when this string should probably never exceed about 256 bytes). If there are other performance issues, the message saying that there are too many "calls" is misleading, and instead of accumulating lengths it should use a separate counter (incremented by 1 for each call; in that case formatting 6000 dates would seem reasonable). As it is, formatting the same number of dates in different user languages gives variable results: it may pass in English but not in Chinese, if non-ASCII separators or extra characters like the CJK year/month/day symbols are needed. So the implementation (or design choice) is problematic, as is the reported message.

I don't know why formatting many dates (in the same target language) would be expensive, when we do that for free when formatting numbers. Even if it requires loading locale data, that is done only once and cached for each target language.

With a total of 6000 bytes for all occurrences of date format strings, and each format string taking about a dozen bytes (sometimes a bit more), we can only format about 500 dates on the same page. That is really not a lot; many data tables or talk pages will reach or exceed that number (notably when signatures are generated by a template or extension rather than by tildes expanded on each saved edit). This will then impact even the ongoing changes planned for talk pages and signatures (depending on how tildes are expanded) and will affect many pages showing modest tables with dates (e.g. articles about sports, history, and so on, possibly generated from data loaded from Wikidata).

This can also affect the usability of administrative pages showing reports, making them usable or not almost at random depending on the user language, even though the real cost is the same. Formatting dates costs much less than parsing a page or tidying the whitespace or HTML result; there's even more cost in HTML comments and indentation, which can fill up large amounts of data in the loaded page, requiring extra I/O on database storage and extra memory and CPU on servers, all far higher than the length of these small format strings. A total of 6000 bytes for all format strings is ridiculously small; it would not even change anything if it were 8 KB, and in most cases these short strings are interned once they are parsed as parameters, all the cost being in the main MediaWiki parser and not in the #time call itself.

Reply to "too many #time calls error"

Helpful magic words needed

3
Sarang (talkcontribs)

The Commons template "Information" and all its derivatives need parameter values for <code>date</code> and <code>author</code>.

Parser functions or magic words to generate these values would be very helpful.

Currently some users take "<code><nowiki>~~~~~</nowiki></code>" as a workaround for the date; but this needs correction, because a date consists of year-month-day and is not a timestamp, which also includes the time of day.

For the author a valid user name is needed, not a signature as created by <code><nowiki>~~~</nowiki></code>; nevertheless some users take it as a workaround, because no better possibility exists. But it causes more trouble and needs cleanup, because the author should be the Wikipedia user, not a link to their talk page.

Possible names for these functions might be e.g. ~date and ~user, or ~~~d and ~~~u.

~~~~

Dinoguy1000 (talkcontribs)

Date can be inserted via some variant of {{ subst:#time: F j, Y | now }}, which can be included in whatever default skeletons are provided for copy-pasting to quickly fill out on pages. Username is a slightly more interesting case, assuming the syntax automatically substitutes when the page is saved, as ~~~~ et al do, but this is the wrong place to request such an addition; you'll need to file a feature request on Phabricator.

Tacsipacsi (talkcontribs)

For the username, one can use {{subst:REVISIONUSER}} (although I don’t think ~~~ is wrong; different users write their names differently). For the date, it should be {{subst:#time:Y-m-d}} (ISO 8601 format) so that it can be formatted in the user’s interface language. (The |now part is unnecessary, as “now” is the default anyway if nothing else is specified.)
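
For example (assuming the standard Information parameter names), a preloaded file-description skeleton could prefill |date={{subst:#time:Y-m-d}} and |author=[[User:{{subst:REVISIONUSER}}|{{subst:REVISIONUSER}}]]; both substitute to fixed text when the page is saved.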

Reply to "Helpful magic words needed"

How to config list of full month names?

7
Plantaest (talkcontribs)

There is a small problem. {{#time: F | 2022-1-1 }} will produce the result Tháng 1 on Vietnamese Wikipedia, but the community wants to localize the result to tháng 1 (first character is lowercase). How to config list of full month names of Vietnamese language? Thanks!

Verdy p (talkcontribs)

"#time" and "#timel" are part of the core ParserFunction extension, prebundled with MediaWiki. Its localisation is part of that extension itself. See Extension:ParserFunctions.

Note that its translation is said there to be made on translatewiki.net (in the ext-parserfunctions message group). Once translated there, the messages are later synchronized to the MediaWiki repository, and a bit later deployed to the wikis (this usually takes about one week, but may be more).

However, that message group does not contain month names (or weekday names), but only a few usage error messages: instead, these month/weekday names (and other related date/time items or formats, with support for many calendars) seem to be imported into the Mediawiki project from CLDR.

But you should look at how they are translated in CLDR. See https://cldr.unicode.org/translation/date-time/date-time-names for the description, and then look at the data charts (on https://unicode-org.github.io/cldr-staging/charts/latest/by_type/date_&_time.fields.html) and search for "vi" in the web page with your browser (you'll see that they were vetted in CLDR with leading capitals). You may attempt to contact the CLDR project (or participate in it, including in its own local discussions) if Vietnamese people think that such forced capitalization should not be used in Vietnamese (this capitalization is not a requirement; it is not forced in many other languages).

If CLDR rejects the request but the Wikimedia community wants the change anyway, MediaWiki may override these inside its data sources in PHP: ask on Phabricator if you need such overrides. (If you make such a Phabricator request, they will probably first check the opinion of the CLDR technical committee and its vetters, but getting a reply from them can take long: new CLDR data is released only about yearly, the vetting process has always been very slow in reaching consensus for changes, and sometimes a single review cycle is not sufficient, so it can take several years to reach the minimum vetting quorum needed for such changes; the CLDR TC may sometimes decide faster than the vetters.) Note also that CLDR data is not used just by MediaWiki; it is used for the localisation of a lot of software (including, for example, standard C libraries and i18n components inside operating system APIs). However, a request to change *only* the capitalization should be easier to obtain (there's likely no major technical issue).

Plantaest (talkcontribs)
Verdy p (talkcontribs)

sprintfDate is part of PHP... it uses data from CLDR (directly inside its sources, or indirectly via some OS API, I don't know). Also, I don't know if PHP developers have provided a way to supply overrides. Generally, date elements and formats are so common in so much software that i18n libraries have stopped maintaining such data themselves; so this is done in CLDR, which supports the vastest set of languages and scripts, with maximal interoperability across systems (and with less risk of serious errors or ambiguities that could cause serious problems, e.g. when processing data containing dates/times).

Today, CLDR data is de facto the standard used in a lot of systems for such very common things, because it is technically "clean" (this does not mean it is "complete" or the only possible choice for localization: CLDR is not an absolute requirement for software, which can still provide overrides or ignore some parts of the data completely, but then they have to maintain those overrides themselves and bear the risks).

For some i18n libraries, however, such CLDR updates can take many years... or will never happen if that software is not updated, notably applications statically linked with C standard libraries, when the systems on which such software is installed are not updated themselves. (There are examples in security-critical components implementing well-known protocols like HTTP and MIME, where any change is blocked or will not happen just because CLDR vetted a preference for something else; this generally happens for English date formats, documented and stabilized in IETF RFCs or standards from ISO, W3C, ITU, and other international standards bodies: the CLDR TC acknowledges that, and when needed provides a few specific "technical locale codes", such as "root" or "POSIX".)

Plantaest (talkcontribs)

Thanks again, I will follow your instructions. If changing the CLDR data is difficult, I think I will add a lowercase() call in some wiki templates or Lua modules.

Verdy p (talkcontribs)

But you should still try to create a ticket for that on Phabricator (which will also allow tracking an identical request made to CLDR, and then following the decisions and updates: if an override used in MediaWiki becomes unnecessary later because CLDR accepts the change, the tracker will help clean up data that we likely don't want to maintain alone). Note that a well-documented community decision in Wikimedia is not ignored by the CLDR TC: they honor requests backed by evidence; the CLDR data is just the current best "state of the art" they know about (that's why it includes a "vetting" process open to everyone in the world: that data CAN change later).

Note: be careful in templates or modules NOT to force a lowercase() call or similar {{lc:}} parser function call without knowing precisely the language to which it applies (it must still not apply to English or German, for example, nor in "default" branches covering all the languages you're not currently handling in these templates/modules: do it specifically for the Vietnamese languages and varieties you support). And be prepared to support users of your wiki, who may complain that some other templates or pages no longer work as expected, showing a lowercase initial instead of the expected leading capital: announce the change in a public area, and document in an appropriate place how to simply solve that "problem" (e.g. with a simple {{ucfirst:}} in wiki pages and templates).

You don't want to see others reverting your change or flaming you (or, worse, getting into an edit war and blocking you, regardless of the effort you took to check and test as many affected pages as you could find). One way to avoid that risk is to first add a tracking category in the templates/modules before applying the change, and look at their existing usage on your wiki (see the "Special:WhatLinksHere" tool). There may be some limited temporary quirks when applying the change, but explain that affected pages are being updated to take the change into account, and fix the pages that can't work instantly with the compatibility code you've prepared: this takes some time (maybe several days to do correctly, with enough care and without the haste that risks introducing new errors), so ask people to be patient and even to help you in the process. Quirks that can happen include, for example, pages auto-categorized with some date element: the change may alter how a target category is named; other affected names could be the names of subpages. In some cases it may help to add redirect pages to make a smoother transition, with fewer people complaining or hostile to your change. See for example how the "c:" interwiki prefix was added a few years ago for Wikimedia Commons, instead of just "commons:" (which unfortunately was also one of its local namespaces): it took some time of preparation to detect affected pages, to document the changes and how to fix some remaining easy-to-fix bugs that might persist undetected, and to hear the discussions about how to perform the change with less friction.

Plantaest (talkcontribs)

Thanks for the advice. I will be careful with changes! The Vietnamese Wikipedia community is very kind; just discuss first, everyone will follow. (I'm not good at English, so I can only write basic sentences)

Reply to "How to config list of full month names?"

Using {{lc:}} in templates

5
Dandorid (talkcontribs)

It took me a while to figure out why my Template:IPaddr broke when I moved a {{lc:}} that encompassed the whole template to somewhere deep within, where it was actually needed. Instead of {{lc: [template contents]}} I wrote just {{lc:{{{1|}}}}}, as only the first anonymous argument needed lower case substitution. At first glance this code is perfectly right and it worked like a charm. However, when used to display the IPv6 loopback address ('::1') the working of the template collapsed, as the wikicode generated a <dl><dd></dd>...</dl> sequence!

Using {{lc:<nowiki>{{{1|}}}</nowiki>}} seems to stop the interpretation of {{{1}}}, or even {{lc:<nowiki></nowiki>{{{1|}}}}}, but I am wondering if there is a nicer way to tell {{lc:}} not to interpret the argument if it starts with a colon.

Verdy p (talkcontribs)

Why not use {{lc:<nowiki/>{{{1|}}}}}?

Internally, the empty <nowiki/> tag is replaced, before the wikitext is preprocessed, by a special marker (delimited by a pair of reserved ASCII controls forbidden in valid HTML, and containing an identifier which is not case-sensitive; the identifier references the hidden content of the nowiki tag, but here it is empty): this should prevent the colon following this strip marker from being interpreted as part of the "lc:" syntax for calling the built-in parser function, or as part of a namespace prefix, so this colon (at the start of the IPv6 address coming from the expanded value of parameter {{{1|}}}) will be passed verbatim to the built-in function (the built-in function will also receive the special marker; it may decide to drop it).

Later, that strip marker will be removed when the final HTML is tidied (at the end of the template/function expansion phase), if the built-in function has not already stripped it from its input parameter in its return value. Some parser functions first strip their parameters, for example removing leading/trailing spaces or strip markers; others don't and preserve them (this may have surprising results in some parser functions, notably those extracting substrings or computing string lengths, if they are not aware of the possible presence of strip markers, which should be either kept entirely or removed completely, but not subdivided). The same remark applies to invocations of Lua functions stored in Scribunto modules, but it is less visible there, given that function invocations start with #invoke: immediately followed by a module name that cannot start with a colon, then a vertical bar and the function name which also cannot start with a colon, and all other optional function parameters are prefixed by a vertical bar.

If this still does not work, maybe a workaround would be to use a Lua function invocation (instead of a wiki template transclusion) to process the IPv6 address (it may even be more efficient, given the many subtle complexities of its variable allowed formats, if you want to transform it into a canonical form).

Note also that an IPv6 address starting with :: may also be written with a 0:: prefix instead (so ::1 and 0::1 are both valid and equivalent): this 0::/16 block is currently only used for a few special IPv6 addresses, and only the least significant 64-bit part (in fact only 32 bits in almost all existing implementations) may carry a host address part (there's no other assignment in the 0::/16 IPv6 address block).

Note also that there exist two "canonical forms" for IPv6 addresses: one where only the first sequence of /(0{1,4}:)+/ 16-bit fields is replaced by "::" and all other occurrences of /0{1,4}:/ are replaced by "0:"; the other removes any occurrence of "::" and inserts the necessary number of "0:" fields to match the expected 128-bit length of IPv6 (i.e. eight 16-bit fields). There cannot be two occurrences of "::" in the same IPv6 address. In both canonical forms the letter case is unified (for the ASCII hexadecimal digits, generally to lowercase). Some applications also do not compress leading zeroes in each 16-bit field, using a fixed format instead, so that a field may start with 2, 3 or 4 zeroes; but the compressed form, without leading zeroes in fields and where the first sequence of zero fields separated by ":" is collapsed into a remaining "::", is the most widely used, as it is easier to read.

For internal processing this formatting generally does not matter; everything is performed internally in protocol frames using 128-bit binary fields, not strings... except for IPv6 addresses between square brackets when they are used as hostnames in URLs: because URLs are indexed like domain names, canonicalization of the string form is necessary and usually uses the most compact format, even if this does not change the protocol for DNS queries. Reverse IPv6 DNS lookups use a very different "backward" dotted format, grouping hexadecimal digits one by one and not by groups of four... with an exception for a small range of IPv6 addresses designed for "compatibility", strictly equivalent to IPv4 addresses: in that case the reverse IPv6 DNS lookup falls back, for the final least significant 32-bit part, to reverse IPv4 lookup byte by byte. Modern DNS implementations do not need such a fallback and can also do reverse lookups in that space without any special delegation: the IPv4 address is already reindexed inside IPv6 by internet gateways for routing and delegation announcements; but if your DNS provider does not handle this small IPv6 space, the fallback to IPv4 queries still works in that subspace and will keep working for a long time. In practice it is rarely used by final clients, which for now all use a dual stack; it is mainly relevant to IPv6-only clients, and then generally handled immediately by the router of their upstream ISP or DNS provider. Modern OSes all support this specific compatibility space in their IPv6 implementations when performing an IPv4 lookup over an IPv6 connection, whether with UDP, TCP, or the new secure HTTP proxying protocol (aka "DoH").

Other forms do not use strings at all, but a binary representation: an unsigned 128-bit integer, an ordered pair of unsigned 64-bit integers, or an ordered vector of sixteen unsigned 8-bit bytes. This is generally used for internal purposes, such as indices and the efficient implementation of filters/routers/firewalls, to compress their databases and speed up processing, but it is of course also the format used in the core IPv6 protocol itself.

Note also that if the IP address was already preprocessed once using {{lc:}} in a template, and the template result is then passed as a parameter value to another template, you may have difficulty passing this value unless you again use another empty nowiki tag to prefix it. You should also not embed the IPv6 address inside a nowiki tag. Strip markers for empty nowiki tags are normally identical in all occurrences, as their content value is the same constant empty string, so these markers use the same internal reference identifier (MediaWiki may also optimize these constants internally and automatically reduce sequences of empty nowiki markers to a single one; the internal form of this marker is implementation-dependent but does not matter, all that matters is the pair of leading/trailing forbidden control characters that surround them).

Dandorid (talkcontribs)

Thanks for this elaborate answer. I think that your proposed solution is a valid one. I have, for the moment, settled for putting a <span> inside it, which also works.

Your explanation about IPv6 addresses is correct, but the point is that we need to describe exactly that on the wiki page for IPv6 addresses. Therefore, the IPaddr template needs to take any form (full, shortened, valid or even invalid, as the text may be) and still try to display it like an IP address. It is not likely that this template would be called from another template, as it is a markup template used directly in the written text. But anyway it is good to know that these issues with {{lc:}} exist, so I can be vigilant.

Verdy p (talkcontribs)

Doing IP address sanitization and normalization is really tricky with MediaWiki parser functions in templates. It's a use case where Lua modules called via Scribunto make it much easier to implement and test correctly. Such a module can handle the presence of "nowiki" tags (or strip markers) in the given parameters; it can easily manage whitespace trimming, letter case, and the various ways to abbreviate an IPv6 address, with a variable number of colons and the possible presence of surrounding square brackets found in URLs, and then return a canonicalized form of the address that is easy to compare (it may return several forms: a short form using "::", an expanded fixed-length form using only single ":" colons, a hexadecimal form without separators, an IPv4 dotted-decimal form for some compatibility ranges, all with a single letter case for the hexadecimal parts...). There are already Lua modules using such techniques.
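
To give an idea, here is a minimal sketch of such a module (the module and function names are placeholders, not an existing Commons module; it only expands a plain IPv6 address and lowercases it, without validation, IPv4-mapped handling, or the other output forms described above):

local p = {}

function p.expand( frame )
    -- trim the argument and drop any strip markers left by <nowiki/>
    local addr = mw.text.killMarkers( mw.text.trim( frame.args[1] or '' ) )
    -- drop URL-style surrounding brackets and unify the letter case
    addr = addr:gsub( '^%[', '' ):gsub( '%]$', '' ):lower()
    local function split( s )
        local t = {}
        for f in s:gmatch( '[^:]+' ) do
            t[#t + 1] = f
        end
        return t
    end
    local fields
    local head, tail = addr:match( '^(.*)::(.*)$' )
    if head then
        -- expand "::" into the missing number of zero fields
        local left, right = split( head ), split( tail )
        fields = {}
        for _, f in ipairs( left ) do fields[#fields + 1] = f end
        for _ = 1, 8 - #left - #right do fields[#fields + 1] = '0' end
        for _, f in ipairs( right ) do fields[#fields + 1] = f end
    else
        fields = split( addr )
    end
    if #fields ~= 8 then
        return addr  -- not a plain 8-field address: leave it unchanged
    end
    for i, f in ipairs( fields ) do
        -- strip leading zeroes in each 16-bit field (invalid fields become 0)
        fields[i] = string.format( '%x', tonumber( f, 16 ) or 0 )
    end
    return table.concat( fields, ':' )
end

return p

Invoked as {{#invoke:IPv6|expand|{{{1|}}}}} (hypothetical module name), a value of ::1 would come back as 0:0:0:0:0:0:0:1, and the leading-colon problem with {{lc:}} disappears because the colon never reaches the template syntax.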

Matěj Suchánek (talkcontribs)

Sounds like another interesting case of phab:T14974. I don't think any way of avoiding it will be nice.

Reply to "Using {{lc:}} in templates"

Cannot subst string functions?

4
Summary by RZuo

StringFunctions is not enabled on Commons.

RZuo (talkcontribs)

{{Countries of Asia|prefix={{subst:#sub:{{subst:FULLPAGENAME}}|0|{{subst:#pos:{{subst:FULLPAGENAME}}| in }}}}}}

for example, i used this string in c:special:diff/702219712. why is the substitution of the string functions not done?

Verdy p (talkcontribs)

You're just trying to use an uninstalled extension: "#sub:" and "#pos:" are part of the "string functions" extension, which is not available on Commons (and not even needed on the page you tried to edit to use them). Commons uses the basic parser functions, and some other extensions, but not this one.

Before using "subst:", first check that your edit works without it; add "subst:" only after tests.

RZuo (talkcontribs)

obviously i'm just using that page to test, not intending to use the string on that page. can't even understand that?

Verdy p (talkcontribs)

Just try {{#sub:ABC|1}} in a local sandbox or in a preview on Commons, and you'll see that it is not recognized (and left unexpanded, so you cannot use "subst:" on such an invocation if the extension is not installed and supported on the wiki).

The "string functions" (i.e. "#sub:", "#pos:", "#len:") are not par of the core parser functions (you are talking here on a page about the core ParserFunctions, which does not include these functions). Commons however allows you to use another extension, Scribunto, which supports string functions in a more advanced module.

The situation may differ between wikis (e.g. the "string functions" work on Translatewiki.net, which conversely does not support the "Scribunto" extension for calling Lua modules; the "string functions" are documented on another help page, not this one).

First identify the extension defining the function, and then check whether it is deployed on the target wiki (there's a summary matrix on MediaWiki.org about the deployment of extensions, but you can also look at the "Special:Version" page on the target wiki).

What is ParserFunctions programming language?

5
Sokote zaman (talkcontribs)

Which programming language do the functions in ParserFunctions use?

Keyacom (talkcontribs)

The #time function uses PHP's date() format syntax, except that it also defines extra functionality through x-prefixed format codes.

The #expr function uses some custom language. Its operators are similar to the ones used in SQL (hence a single equals sign for equality).

Sokote zaman (talkcontribs)

Thank you for your reply. What language do the other functions use? Thanks.

Keyacom (talkcontribs)

Also:

  • #timel uses the same syntax as #time
  • #ifexpr expressions use the same syntax as #expr
  • all of these functions are coded in PHP.
Sokote zaman (talkcontribs)

Thank you for your reply. Thank you

Reply to "What is ParserFunctions programming language?"

How to compare text strings ?

9
Place Clichy (talkcontribs)

Hello!

I am looking for a way to compare 2 text strings alphabetically in template code. I want to add a parent bilateral-relations category; these are in the format of e.g. commons:Category:Relations of Bangladesh and Myanmar (note the alphabetical order: Bangladesh < Myanmar). What I would like to do is something like this:

{{#ifexpr: {{{1}}} < {{{2}}} 
   | [[Category:Relations of {{{1}}} and {{{2}}}]]
   | [[Category:Relations of {{{2}}} and {{{1}}}]]
}}

Of course this does not work, because #ifexpr compares numerical expressions, not alphabetical ones. The way I am currently doing it is (simplified):

{{#ifexist: Category:Relations of {{{1}}} and {{{2}}}
   | [[Category:Relations of {{{1}}} and {{{2}}}]]
   | [[Category:Relations of {{{2}}} and {{{1}}}]]
}}

The trouble is that if parameters are in the reverse alphabetical order (e.g. {{{1}}} is Myanmar and {{{2}}} is Bangladesh) and there is a category redirect (e.g. commons:Category:Relations of Myanmar and Bangladesh softly redirects to commons:Category:Relations of Bangladesh and Myanmar), then this code adds the redirected category instead of the target one.

Does anyone have any idea? The template in question is commons:Template:Aircraft of in category.

Verdy p (talkcontribs)

There's no built-in support in "#ifexpr:" or "#if:" to compare strings. You need another parser function; for example you can use "#invoke:" via Scribunto to call a function defined by a Lua module. Note however that Lua basically performs a byte-wise lexicographic comparison of strings with its "<" operator: it does not trim them, does not parse any HTML (not even HTML comments that may be present in parameters), does not convert HTML character entities, does not normalize strings, and does not perform any UCA collation (so "é" would sort *after* "f" and not between "e" and "f").

You may want to call:

{{#ifeq: {{{1|}}} | {{{2|}}}
| <!-- empty -->
| {{#ifexpr: {{#invoke:Modulename|compare|{{{1|}}}|{{{2|}}}}} < 0
  |  {{#ifexist: Category:Relations of {{{1|}}} and {{{2|}}}
     | [[Category:Relations of {{{1|}}} and {{{2|}}}]]
     | [[Category:Relations of {{{2|}}} and {{{1|}}}]]
     }}
  | {{#ifexist: Category:Relations of {{{2|}}} and {{{1|}}}
    | [[Category:Relations of {{{2|}}} and {{{1|}}}]]
    | [[Category:Relations of {{{1|}}} and {{{2|}}}]]
    }}
  }}
}}
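
A minimal sketch of the Lua side that such a call assumes (the module name "Modulename" and the function name "compare" are just the placeholders used above; the comparison is plain byte-wise, per the caveats already noted):

local p = {}

function p.compare( frame )
    -- trim both arguments, then compare them byte-wise (no UCA collation)
    local a = mw.text.trim( frame.args[1] or '' )
    local b = mw.text.trim( frame.args[2] or '' )
    if a < b then
        return -1
    elseif a > b then
        return 1
    end
    return 0
end

return p

Saved as e.g. Module:Modulename, it would be called exactly as in the snippet above.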

However, this does not resolve the redirects (#ifexist gives false hints there). For that you need a Lua module that can not only test the effective existence of either link, but also detect whether one is a redirect and get its target (it has to load the page and parse its beginning, because MediaWiki still does not expose in Lua whether a page is a redirect and what its target is; MediaWiki internally parses pages, detects that, and maintains it in a cache used when loading any page name via links, but it does not index that information in an accessible way; loading and parsing the page manually in Lua is a bit costly and error-prone due to the MediaWiki syntax).

So the best you can do is use your template with parameters 1 and 2 and not perform any test on them, but then update the pages containing transclusions of your template to use explicit parameter values in the correct order (swapping that order if one target is a redirect).

There are other caveats: parameters 1 and 2 may contain disambiguation suffixes that may be removed in the binary relation (e.g. "Paris, Texas" and "Austin, Texas": would you name your category "Relations of Austin, Texas and Paris, Texas", or "Relations of Austin and Paris (Texas)"?). Beware that naming pages automatically is tricky: there are frequently "aliases" (e.g. "Relations of France with the United States" or the reverse; note that there may be other ways to express the combination), and preferences that may change over time (or that need to take into account decisions that are not always the same between countries or languages, and are sometimes conflicting). You also have to manage the possible insertion of articles (like "the" in English) before some entity names, which may not be present when entity names are used alone in page names (e.g. with "United States": "Relations of France and the United States", "Relations of the United Kingdom and the United States", "Relations of the United States and Vietnam").

Such binary relations with arbitrary combinations should be avoided; they explode combinatorially and are a nightmare to maintain (e.g. for 200 countries, you get almost 40,000 relations, and most of them will be empty; and for ternary relations you'd reach about 8,000,000!). They should be created manually and added individually where relevant.

Dinoguy1000 (talkcontribs)

MediaWiki still does not expose in Lua if a page is a redirect and what is its target

This is blatantly false; the mw.Title library supports finding if a title is a redirect and what title it redirects to, as seen with e.g. w:Template:Target of.
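
For illustration, a minimal sketch of how that looks from a module (the module name is a placeholder; note that mw.title.new counts as an expensive function, and this only follows hard #REDIRECT pages, not soft {{Category redirect}} templates):

local p = {}

function p.resolve( frame )
    -- return the redirect target of the given page, or the page itself otherwise
    local t = mw.title.new( frame.args[1] or '' )
    if t and t.isRedirect and t.redirectTarget then
        return t.redirectTarget.fullText
    end
    return t and t.fullText or ''
end

return p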

Verdy p (talkcontribs)

Interesting to know, because the last time I checked there was no such function in the Scribunto library. So it was added after many years of asking for it (yes, I know it was present in the internal PHP API, but it was not there at the beginning, and pages had to be parsed to find whether one was a redirect and to find its target; this was added to accelerate navigation, because of course the MediaWiki parser can store the result when parsing a saved page)!

Also, please moderate your terms and avoid such a hasty, thoughtless reply in your first sentence. For many years we had to use a workaround for that (for example on Commons) because there was no such built-in support. And remember that this question is essentially about categories on Commons, rather than the English Wikipedia.

Dinoguy1000 (talkcontribs)

isRedirect has been part of Lua since at the latest March 2013; redirectTarget dates to May 2016 (phab:T68974). Hardly "recent" on either account, when we've only had Scribunto/Lua since 2012 or so.

Also please moderate your terms and avoid such fast unthought reply in your first phrase.

Given basically everything I've seen you say/do, and my own past interactions with you, I think I won't, thanks.

Tacsipacsi (talkcontribs)

Actually, it doesn’t really matter whether Scribunto provides information on what MediaWiki thinks to be a redirect; it won’t catch category redirects using c:Template:Category redirect anyway. Category redirects are rarely if ever real redirects.

Verdy p (talkcontribs)

On Commons there are redirects on categories. Especially those given in the example above.

There's a workaround actually used on Commons that can also detect soft redirects on categories and find their targets (it cannot use redirectTarget, which is also very costly, just like almost all functions in the mw.title module in Lua, and which does not work in practice here due to its limitations). This still requires parsing category pages, because there's still no support in MediaWiki for them (by some extension?). I've tried redirectTarget, and yes, your suggestion does not work and is not a correct reply to the request made above, so my reply was correct (absolutely not "blatantly false" as you said in your abusive reply).

And if you (Dinoguy1000) don't want to moderate your terms in a direct reply to a thread where you were not involved or cited at all, then you are clearly abusing the contributor terms, because you don't provide any help to anyone, and you are here just to cause trouble.

Place Clichy (talkcontribs)

Thanks for the input! I guess that my question is now: is there an available Lua-coded function which compares two text strings alphabetically e.g. is A < B? User:Verdy p mentioned that Lua basically performs a lexicographic comparison of strings with its "<" operator but I'm not sure how I can use this operator, and writing a Lua module entirely for that seems overkill and out of my reach.

The suggestion of putting parameters in the right order in the first place is not feasible, as the template does other things too. Obviously Aircraft of Brazil in France (populated by {{Aircraft of in category|Brazil|France}}) is not the same as Aircraft of France in Brazil (populated by {{Aircraft of in category|France|Brazil}}); however, both should be in the same parent Category:Relations of Brazil and France.

I do not really intend to test for category redirects. Category redirects are always soft redirects (both on Commons and on the English Wikipedia), so they're hard to track.

The article before the country name is managed by {{CountryPrefixThe}} and that works well.

Re: other suggestions, there is in fact an implicit assumption that this one template will only be used for country names found in commons:Category:Bilateral relations and its subcategories. There may therefore be no need to clean HTML formatting, disambiguators and the like. In case the bilateral relations category does not exist, the template's code catches in a maintenance category and it can be created manually. Of course, there are some cases that cannot be entirely foreseen, such as the inconsistent use of China vs. People's Republic of China in the bilateral relations category tree, but they can, or have to, be managed manually.

My main concern really is the management of these category redirects related to alphabetical order.

Verdy p (talkcontribs)

Lexicographic comparison means that it only compares the texts byte by byte (they are UTF-8 encoded). Lua strings themselves do not directly handle Unicode or the related UCA collation.

MediaWiki provides an API in the "mw.ustring" module, which adds some support for Unicode, but no comparison operator or UCA collation for now (what it supports is the concept of Unicode code points, so that a single code point may be encoded in several UTF-8 bytes and positions for substrings are counted in code points, aware of their variable encoding length; it also supports normalization, as well as the case conversion needed by MediaWiki for its built-in basic parser functions "lc:", "uc:", "lcfirst:", "ucfirst:"; note that it does not perform any MediaWiki parsing, so it's up to the caller to manage trimming).

So for now there's no collation in mw.ustring, and therefore no function you can call from it to compare strings. Some modules have defined a "weak" collation algorithm for sorting. But that still won't be sufficient for your need on Commons, because there's actually NO standard for now fixing the order in category names between "Relations of A and B" and "Relations of B and A". So you'll end up with categories redirecting from one to the other (using {{Category redirect|Target name}}): you need to parse the target page in Lua to detect these templates and find the target you'd like to link to (that will solve the ordering problem without needing any collation, and also take into account the problem of the variable disambiguation suffixes that may be needed in category names).
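
A rough sketch of that kind of parsing (the module and function names are placeholders; the pattern only catches a plain {{Category redirect|Target}} call and would need refinement for real use):

local p = {}

function p.categoryTarget( frame )
    -- expects a full page name such as "Category:Relations of Myanmar and Bangladesh"
    local t = mw.title.new( frame.args[1] or '' )
    if not t then
        return ''
    end
    local text = t:getContent() or ''
    -- look for the first unnamed parameter of {{Category redirect|...}}
    local target = text:match( '{{%s*[Cc]ategory redirect%s*|%s*([^|}]+)' )
    if target then
        return 'Category:' .. mw.text.trim( target )
    end
    return t.fullText
end

return p

This is essentially what the modules mentioned just below do, with much more care.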

As you see, there's no "simple" instantaneous solution. This requires code and tests, navigating all the categories you'll want to link to and finding how they are effectively named.

There's a module on Commons for that, "Module:Redirect", and others can help you manage category redirections.

One example is "Module:Countries" that performs such detection of redirects (pluis handles know "aliases" for category names that don't always need a disambiguation suffixes, or a leading article), and also provides a basic collation for its listed items (note that they are ordered using their *translated* names, found in Wikidata; that order is "crude", but for now has been sufficient even if technically it's still not fully UCA compliant, and cannot manage collation orders depending on the language used: the order is locale-neutral, similar to the UCA DUCET, except that it sometimes needs tweaks, notably in Chinese where the order of times can be tuned, or in Germanic languages that consider letters with diacritics sorted as primary letters at end of their alphabet and not as secondary variants: tweaking the order is made in data modules).

Reply to "How to compare text strings ?"

#sub,{{#sub:<nowiki>This is a </nowiki>test|1}}

10
Istudymw (talkcontribs)

This should return est!

Verdy p (talkcontribs)

No, #sub: returns the substring starting at character 1 as specified here (no ending position is specified, so this runs to the end of the string). #sub is a parser function, so its parameters are NOT preprocessed by MediaWiki; they can contain any syntax needed, not necessarily MediaWiki or HTML, and they are also not stripped of leading/trailing whitespace, because the parameter is not named. It is only processed by PHP.

So #sub will return "<nowiki>This is a </nowiki>test" unchanged!

(the same would be true if you used a parserfunction call to a Lua module using #invoke).

Parser functions can do whatever they want with their parameters, and return a string in any format, which can further be used as an argument to call another parser function (or Lua module). Only the parser function itself can decide whether to strip leading/trailing whitespace, HTML comments, or "nowiki" tags. At that level MediaWiki only processes pipes (|) to separate parameters, and "noinclude" or "includeonly" tags.

Then, after the call, the return value will be processed by MediaWiki: at that time it will process "<nowiki>This is a </nowiki>test" for the rest of the expansion of the page. The "nowiki" tags will then be handled by MediaWiki and result in "This is a test" being displayed. The "nowiki" tag does not remove its content; it just indicates that the content it surrounds must not be parsed by MediaWiki, in case it contains some wiki syntax (such as "~~~~", which would otherwise be replaced by the user's signature).

You are most probably confusing it with the "noinclude" tag.

RobinHood70 (talkcontribs)

On both wikis I tried it on, it does return "est". It won't work on Wikimedia wikis, though, as they've disabled the string functions on their wikis.

Verdy p (talkcontribs)

I don't know where you tested it; but clearly "<nowiki>This is a </nowiki>" should not be deleted at all.

However, as the "nowiki" tag looks like an HTML tag, it may be stripped by a function that drops HTML tags to make a plain-text-only string; but that implementation would be bogus as well, because in that case it might transform "<span>This is a </span>test" into "test" (not really what HTML considers the plain text of the HTML content, which should be "This is a test", e.g. when using the standard DOM API for HTML or XML), and certainly not "est" (which makes no sense at all in HTML, XML, or MediaWiki!): why would you want to drop an extra character AFTER the closing tag?

So this is likely a bug in the "#sub" parser function implementation (which is, as you said, part of the "string functions", not enabled on Wikimedia wikis). I tested that "#sub" parser function on other wikis where string functions are enabled, and the "t" after the closing tag is NOT removed. The wikis that do remove it may not have been updated with the correct version of the string functions fixing that very undesirable bug (their internal HTML-tag-stripping code has a problem, such as a bogus regular expression).

On which wiki do you see that result? Which version of the "string functions" do they use (look at their Special:Version page)?

---

I found a wiki that has that bogus behavior in #sub: Translatewiki.net, which uses incorrect HTML-stripping code that strips too much, where it should return either "This is a test" (if it knows and respects the semantics of MediaWiki "nowiki" tags) or "test" (if it strips the whole "nowiki" element with its content, the same way it would strip "ABC<script>...</script>DEF" into "ABCDEF" and not "ABCEF").

RobinHood70 (talkcontribs)

I tested it on my test bed wikis, which are mostly just past the setup point and nothing more. I specifically tested on both 1.29 and 1.35, since I figured Parsoid might make a difference (not that it should, since this is at the pre-processor level, but I figured it was a good idea to try both). I haven't updated it in a while, so I don't have anything more recent installed yet.

I'm not sure I follow your logic on what should be stripped, because you would think that stripping one character would either strip the < off of the <nowiki> or, if it had parsed that properly, it would strip the T from This, not the t from test. I'm assuming that's an artifact of the strip item process, though, so the nowiki section gets ignored entirely.

RobinHood70 (talkcontribs)

Oh and to answer your question about versions, both wikis have ParserFunctions 1.6.0.

Verdy p (talkcontribs)

Note that Parsoid has no effect on that. This is purely a bug inside the implementation of the "string functions" extension (which is not supported directly by Wikimedia wikis and core MediaWiki developers). Instead, Wikimedia uses the supported Scribunto extension and implements these functions in Lua (but note that Translatewiki.net still does not support Scribunto/Lua...).

The effect of "#sub" is very weird, avoid it as much as possible on your wikis! (Note that in Lua, string indexing starts at 1, whereas in string functions, string indexing starts at 0).

If we assume that "#sub" uses string indexing starting at 0, then "<nowiki>This is a </nowiki>test" will be first "HTML-stripped" into "test", then it returns the substring starting at position 1, i.e. drops the first character "t" and returns "est". If string functions were not using "HTML-striping", the result would be "nowiki>This is a </nowiki>test", where it drops only the first "<".

I tested it in a sandbox page on Translatewiki.net, and visibly #sub in the string functions really uses string indexing starting at 0; it first trims its string parameter, then drops all HTML-like or XML-like elements (including "nowiki", even if it's not really HTML or XML) **with** their content, before computing and returning the substring. Because whitespace trimming is performed before the "HTML tag stripping", if you want to disable whitespace trimming of the parameter, you can surround the value with "<nowiki/>", so:

  • "ABC{{#sub:<nowiki/> DEF <nowiki/>|1}}XYZ" returns "ABCDEF XYZ"
  • "ABC{{#sub:<nowiki/> <br>DEF <nowiki/>|1}}XYZ" returns "ABCbr>DEF XYZ" (so the "HTML stripping" is not real, apparently it just strips "nowiki" opening and closing tags, **after** the initial whitespace trimming of the argument string)
RobinHood70 (talkcontribs)

As I said, I wouldn't have expected Parsoid to affect the results, since parsing the parameter itself is entirely at the preprocessor level. There was a lot that changed in 1.35 besides Parsoid, though, so I figured it made sense to check both.

As for what's supported by MediaWiki, it's been my experience that they don't seem to realize that not everybody is on the same update cycle they are or running all the same extensions as they are. Even so, at this point, Scribunto and ParserFunctions are both optional. Until they're a required part of the install process, I would expect WMF to support anything that they're distributing. I just checked and at least as of 1.38.1, both are being distributed as optional components.

RobinHood70 (talkcontribs)

You got me curious, so I looked at the version in 1.38 and now I see what's going on. Firstly, it's using the older Parser Function syntax where it parses all of the parameters first and then passes them along to the function that's handling that specific parser function, in this case runsub. So, if I recall correctly, that means the input to the function is converted to "<stripmarker>test". The very first thing runsub does is call killMarkers, so now it's left with just "test". From there, it's obvious why it produces "est".

Edit: I see you've updated your reply with similar info. At least now we understand. And I agree, for straight text, #sub is fine, but for anything out of the ordinary, avoid #sub at all costs.

RobinHood70 (talkcontribs)

You can see the same results at en.uesp.net (which is MW 1.29.3) and starfield.wiki.net (which is on 1.35.2).

Reply to "#sub,{{#sub:<nowiki>This is a </nowiki>test|1}}"

If 1979, create category 1970s

3
AmeliaLH (talkcontribs)

Hi! I'm sorry if this is a stupid question but numbers tend to hurt my brain. On my wiki, I have a template that generates links to specific dates based on three parameters: month, day, and year. However, I want the <code>year</code> field to also generate a category for the decade that year is a part of. For example, Jan|17|1979 should generate the category 1970s. I feel like this can be done via #expr somewhere, but it's going over my head.

Any help would be appreciated.

Aidan9382 (talkcontribs)

If I'm not mistaken, it sounds like something like {{#expr:1979-(1979 mod 10)}} or {{#expr:floor(1979/10)*10}} would do what you are asking for. There may be a simpler solution, but it's not coming to mind right now, so hopefully this'll do.
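
For instance, assuming the year is the template's third unnamed parameter, the decade category could be produced with something like [[Category:{{#expr: {{{3|}}} - ({{{3|}}} mod 10) }}s]] (the parameter number is only a guess about how your template is set up).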

Dinoguy1000 (talkcontribs)

If you have access to stringfunctions, {{ #sub: 1979 | 0 | 3 }}0 will also work. (If you need it to work on years that aren't four digits long, you can manage it by replacing the "3" with {{ #expr: {{ #len: 1979 }} - 1 }}, but by that point you might as well just use one of Aidan's #expr suggestions.)

Reply to "If 1979, create category 1970s"