Topic on Extension talk:SyntaxHighlight

Novem Linguae (talkcontribs)

Is lang="regex" supported? Is there an alias for it that I'm missing? If not supported, what's the proper repo to request its addition?

Dinoguy1000 (talkcontribs)

Regex might be supported as part of Perl (that's the first thing I'd try after lang="regex", at least). If it's not supported at all (or you can't find any other language that supports it), the place to request support would be the Pygments project, after which you'd need to file a task on Wikimedia Phabricator to have the Pygments version used in this extension updated.

Stang (talkcontribs)
Dinoguy1000 (talkcontribs)

Yes, that's what I said.

PerfektesChaos (talkcontribs)

Well, the problem is that there is no single RegExp syntax but a large variety of dialects. Each dialect needs its own lexer, since some elements are permitted in one dialect but unknown, and therefore an error, in another.
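To illustrate the dialect point, here is a minimal sketch that uses Python's standard `re` module as one example dialect. The helper name `compiles` and the sample patterns are made up for illustration. It shows a named group in the `(?<name>...)` spelling accepted by PCRE being rejected by Python, and a Lua-style pattern compiling but meaning something else.

```python
import re


def compiles(pattern: str) -> bool:
    """Return True if Python's `re` dialect accepts the pattern."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


# Same idea (a named group matching four digits), two spellings.
PYTHON_NAMED = r"(?P<year>\d{4})"   # Python's spelling
PCRE_NAMED = r"(?<year>\d{4})"      # spelling accepted by PCRE, not by Python's `re`

# Lua writes "one or more digits" as %d+; to Python's `re`, "%" is a plain literal.
LUA_DIGITS = "%d+"

print(compiles(PYTHON_NAMED))                     # True
print(compiles(PCRE_NAMED))                       # False
print(re.fullmatch(LUA_DIGITS, "2024") is None)   # True: it does not match digits here
```

A highlighter faces the same situation: a single "regex" lexer would have to pick one of these readings.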

Verdy p (talkcontribs)

Since the transition to Pygments, there is no longer support for any regular expression syntax, notably lang="pcre" (which was commonly used on various talk pages, notably on Wikidata, which uses PCRE extensively) or the legacy lang="regexp". All these talk pages are now listed in a tracking category for all kinds of errors produced by "syntaxhighlight" (there is no separate tracking category for languages indicated in the "lang" attribute that are not, or no longer, recognized).

PCRE regular expressions are not complex to parse, but a lexer should recognize all features of PCRE 5+, including multiline expressions with whitespace compression, comments (important on talk pages), and distinctive coloring for the contents of character classes, opening/closing brackets, operators, escapes, defined label names, and a few keywords in special constructs. The "regexp" language is just a subset (no multiline mode or comments, and a much more limited syntax).

We should also have a recognizer for Lua patterns (lang="luapattern"?), which are much more restricted (notably no support for repetition within a limited range, '|' alternation, defined subpatterns, or extended character classes based on POSIX or Unicode character properties).
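As a rough sketch of the coloring categories mentioned above (character classes, brackets, operators, escapes), the following splits a pattern into labelled tokens using only Python's standard `re` module. This is not a Pygments lexer and not a complete PCRE parser. The category names and the `tokenize` helper are hypothetical, and it ignores extended-mode comments, named groups, and most special constructs.

```python
import re

# Hypothetical token categories. Alternatives are tried in order, so escapes
# are recognized before brackets and operators.
TOKEN_RE = re.compile(
    r"""
    (?P<escape>\\.)                               # backslash escape, e.g. \d
  | (?P<char_class>\[\^?\]?(?:\\.|[^\]\\])*\])    # bracketed character class
  | (?P<group>\(\?[:=!]|\(|\))                    # group brackets, a few openers
  | (?P<operator>[|*+?]|\{\d+(?:,\d*)?\})         # alternation and quantifiers
  | (?P<literal>.)                                # any other single character
    """,
    re.VERBOSE | re.DOTALL,
)


def tokenize(pattern: str) -> list[tuple[str, str]]:
    """Return (category, text) pairs covering the whole pattern."""
    return [(m.lastgroup, m.group()) for m in TOKEN_RE.finditer(pattern)]


for category, text in tokenize(r"(?:\d{2,4}|[a-z]+)"):
    print(f"{category:<10} {text}")
```

A real lexer for lang="pcre" would need many more rules, and a Lua pattern recognizer would need a different table altogether, since Lua uses `%` rather than `\` for escapes.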

Verdy p (talkcontribs)

Also, we could previously use lang="HTML5", which worked for colorizing most of the MediaWiki wiki syntax (lang="html" did not). Now we have to use lang="html" (lang="html5" is not recognized). There is an evident need to create some common aliases.
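The alias idea could look something like the sketch below. This is purely illustrative and is not how the extension or Pygments resolves names. The table `LANG_ALIASES` and the function `normalize_lang` are hypothetical, and the only entry is the html5 case from this thread.

```python
# Hypothetical alias table: legacy or alternative names mapped to the name
# that the highlighter currently recognizes.
LANG_ALIASES = {
    "html5": "html",
}


def normalize_lang(lang: str) -> str:
    """Lower-case the lang attribute and map known aliases to their target."""
    key = lang.strip().lower()
    return LANG_ALIASES.get(key, key)


print(normalize_lang("HTML5"))  # html
print(normalize_lang("html"))   # html
```

Names without an entry pass through unchanged, so an unrecognized value such as "pcre" would still end up in the error tracking category.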
