Topic on Talk:Parsing/Replacing Tidy

2A01:E0A:30:2B70:0:0:D724:C9A4 (talkcontribs)

Lots of templates do not test for the presence or absence of a leading colon in page names, but simply always prepend a ":" to make sure they get a link, rather than rendering a file as an image (or audio/video player), categorizing a page, or adding interwikis. Extra leading colons are harmless, but now the lint checker complains about them, and I don't see why this needs fixing, given that page names can never start with a colon.

Fixing that in templates is sometimes very complex, as it forces testing the values to check whether they start with a colon and generating the colon conditionally (this test increases the expansion node count, so it will break several pages, and it also increases the expansion depth by 1)...

Can't we simply fix that in the new parser instead of asking people to fix pages and templates? -- Verdy p (talk) 01:21, 22 June 2018 (UTC)

JJMC89 (talkcontribs)

[[::Test]] gives [[::Test]] with either parser.

Verdy p (talkcontribs)

These were harmless, and this is a radical change... All of these were equivalent, each generating a wikilink to the same page:

  • [[::Test]], [[:Test]], [[Test]]
    (gives now: [[::Test]], Test, Test)
  • [[:::fr:Test]], [[::fr:Test]], [[:fr:Test]]
    (gives now: [[:::fr:Test]], [[::fr:Test]], fr:Test)
    (but not [[fr:Test]], which generates interwiki metadata rather than a link to a resolved wiki)
  • [[::Category:Test]], [[:Category:Test]]
    (gives now: [[::Category:Test]], Category:Test)
    (but not [[Category:Test]], which sets categorization metadata)

In images, we can use |link= with a value that can be either a wikilink or a URL (starting with "http:", "https:" or "//"); the colon may also be used to force a page-name wikilink instead of a URL starting with "http". Testing parameter values to know whether to generate a colon is complex, or will require adding helper templates that decide when to generate a colon according to specific rules.

I do not see the point of displaying the "[[" and "]]" brackets verbatim around multiple leading colons, except for [[:]], where no page name follows the leading colons and it is impossible to generate a link.

If an article has to refer to a title starting with ":", we need to change the page name: do you want to allow page names starting with colons, or just a page whose title is ":" (such a page name would be quite difficult to use as the target of wikilinks or URLs)?

Or do you plan to use "::" to add new syntactic features to MediaWiki (making it easier to disambiguate page names from interwiki, namespace or special prefixes)?


Note that when we use multiple leading colons in a visual editor (and in this talk thread, which uses "Flow"), the whole link, brackets included, now becomes surrounded by "nowiki" tags (added silently). This does not happen when using the wikitext editor. I think this silent (and unexpected) addition of "nowiki" is a nuisance (pollution, in fact). It unnecessarily obscures the code (and also caused edit bugs in this message when I added the "(gives now: ...)" lines above, where all subsequent tags or wiki markup were corrupted).

Arlolra (talkcontribs)

I believe [[:::fr:Test]] would always have been rendered as plaintext; it's only 1 or 2 leading colons that would have worked. However, 2 colons would have resulted in a leading colon being part of the link text, which differs from the single colon escaping.

Verdy p (talkcontribs)

There's absolutely NO possibility for a link to start with a colon, as it is invalid in every page name (on all wikis, not just those from Wikimedia).

And there's absolutely no point in rendering that as plain text with visible brackets surrounding the text with the multiple leading colons, and no point in enforcing it (badly) by silently inserting nowiki tags, which also obscure the wikitext and make it even less editable.

Why do you want to see "[[::" and "]]" as plain text? If one really wants to see that, the "nowiki" tags can be added manually to escape them **only** where this plain text is expected (an extremely rare case, compared to the very common cases where extra leading colons may be inserted by templates using optional parameters, which may be empty, for an optional namespace indicator or interwiki prefix).

Leading colons are used explicitly to force interpretation as a wikilink rather than as a rendered image or categorization metadata, or to distinguish between template names in the template namespace (no leading colon) and a transcluded page from another specified namespace or the root namespace.

This new requirement removes that useful distinction, breaks many pages, and greatly complicates the development of templates by forcing them to inspect the values of substituted parameters (we now need to add various tricky "#if" tests, the expansion time or nesting level increases, and the number of expanded nodes increases because parameters have to be expanded multiple times... in summary, this puts additional load on the server or makes some pages impossible to render correctly due to resource exhaustion caused by these extra tests).

So, in summary, I do not like this new requirement, which just breaks things and makes them more complicated (and does not even help other parsers to disambiguate things). For me, any wikitext sequence matched by this regexp (except those found in "nowiki" sections, or in HTML comments, which are normally stripped in an early first stage of the parser, before "includeonly", "noinclude" and "onlyinclude" sections are handled in a second early stage):

\[\[[ \t]*:*[ \t]*([^:\|\[\]][^\|\]]*)[ \t]*(\|[^\]]+)?\]\]

is a wikilink (or interwiki link) if there are one or more leading colons (after stripping ignorable whitespace, indicated just as [ \t]* in this simplified regexp, although there are other ignorable whitespace characters). Its target is the page matched by the first capturing group (in blue), rendered as an HTML link with inline text content; the content of the second capturing group (in green) is the inline content to render inside that HTML link, independently of the number of colons (in red).

Note that the content matched by the first (blue) group above may include transclusions of templates (or expansions of magic keywords) surrounded by {{...}}; their expansion could return leading colons, which should also be silently discarded if they are in excess and at least one colon already precedes them...

Only when there is no leading colon at all (in red, or in the expansion of wiki templates in the blue group) may the target be interpreted as inline file rendering (image thumbnails, or audio/video player objects) or as a categorization (when the content of the first capturing group starts with the special namespace names for files or categories); otherwise it will also generate an HTML link displaying the inline content of the second group.
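
The reading described above can be sketched in Python. This is only an illustration of the proposal, not how any MediaWiki parser works: the regexp is the simplified one quoted in this post, with one change (the run of leading colons gets its own capture group so it can be counted). The function name `classify`, the result labels and the English-only prefix list are all assumptions made for the example; template expansion and bare interlanguage prefixes such as [[fr:Test]] are not modeled.

```python
import re

# The simplified pattern from the post, with one change for illustration:
# the run of leading colons is captured as its own group so it can be counted.
LINK_RE = re.compile(
    r"\[\[[ \t]*(:*)[ \t]*"   # opening brackets, optional leading colons
    r"([^:|\[\]][^|\]]*)"     # target: must not start with ':', '|', '[' or ']'
    r"[ \t]*(\|[^\]]+)?\]\]"  # optional "|label", then closing brackets
)

# Hypothetical, English-only prefix list; a real wiki also has localized aliases.
SPECIAL_PREFIXES = ("file:", "image:", "category:")


def classify(wikitext):
    """Return (kind, target, label) under the proposed rule, or None if no match."""
    m = LINK_RE.fullmatch(wikitext)
    if m is None:
        return None  # e.g. "[[:]]": no page name follows the colons
    colons, target, label = m.group(1), m.group(2).strip(), m.group(3)
    # Drop the leading pipe of the label; fall back to the target as link text.
    label = label[1:] if label else target
    # Only a link with no leading colon at all may embed a file or categorize.
    if not colons and target.lower().startswith(SPECIAL_PREFIXES):
        return ("embed-or-categorize", target, label)
    return ("link", target, label)
```

Under this sketch, `classify("[[::Test]]")` and `classify("[[:Test]]")` both describe a plain link to Test, whatever the number of colons, while `classify("[[:]]")` returns `None` and would stay as plain text.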

Arlolra (talkcontribs)

To be clear, before any changes were made,

[[:{3,} ... ]]

already rendered as plaintext. It was only,

[[:{1,2} ... ]]

that gave the desired wikilink escaping.

The change was made because the 2 was likely the arbitrary result of some refactoring and not an explicit goal. There was no comment in the parser saying why it existed. And the functionality of being able to escape a wikilink did not depend on it.

When it came time to write another wikitext parser, it was a surprising find.

The point of the linting pass was to try and determine the extent to which it was relied upon.

As we saw in the ambassadors thread, it did result in some template authors having to use cumbersome workarounds rather than specifying that page titles passed to the templates shouldn't be manually colon escaped to begin with.

At the time I said I'd be willing to revert the change if it proved too bothersome but that was a year ago and there hasn't been much noise about it.

The proposal you're making here,

[[:+ ... ]]

is obviously more lenient and seems fine, but let's not pretend like that was ever the case.
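
For comparison, the three colon-count rules mentioned in this thread can be written side by side. This is a rough Python sketch based only on what is said here: the rule names, the helper `is_escaped_link` and the simplified pattern are assumptions for illustration, not parser code. It only counts colons, so it does not model the point made earlier that two colons used to leave a leading colon in the link text.

```python
import re

# Split a bracketed link into its run of leading colons and the rest.
PARTS_RE = re.compile(r"\[\[(:*)([^:\]][^\]]*)\]\]")

# How many leading colons each rule accepts as an escaped wikilink.
RULES = {
    "before the change": lambda n: 1 <= n <= 2,  # [[:{1,2} ... ]]
    "after the change": lambda n: n == 1,        # a single colon only
    "proposed": lambda n: n >= 1,                # [[:+ ... ]]
}


def is_escaped_link(wikitext, rule):
    """True if the leading colons escape the link under the named rule."""
    m = PARTS_RE.fullmatch(wikitext)
    if m is None:
        return False
    return RULES[rule](len(m.group(1)))
```

For example, `is_escaped_link("[[::Test]]", rule)` is true for "before the change" and "proposed" but false for "after the change", and `[[:::fr:Test]]` is accepted only by the proposed rule.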

Reply to "extra leading colons"