Topic on Extension talk:TemplateStyles

deduplication of scoped styles : max stripped size limit exploded.

Verdy p (talkcontribs)

Currently the styles are supposed to be scoped. This means that each reference to the stylesheet causes the whole stylesheet to be "stripped", and the size of the stylesheet is accumulated again for each reference, even if the stylesheet is finally "deduplicated" by replacing further references to the same stylesheet with a "link" element. For pages that use the same stylesheet repeatedly, this has the same effect (in terms of parser limits) as if we had included the same styles repeatedly.

So we still easily reach the 5MB limit on "stripped contents", and we need complex code to decide when to include the "templatestyles" element so as to avoid growing the stripped contents. This happens **even** if the stripped content is identical.

Things would be easier if the "<templatestyles>" tag had a simple "scope=content" attribute, saying that the content does not need to be stripped, that the stylesheet will instead be generated once (preferably in the page header), and that no additional "link" elements need to be included in the page content.

This does not change the rule about rewriting all the selectors found in the stylesheet so that they apply only inside the ".mw-content" element (so that these styles cannot be used to override the other parts of the page).

This would save a lot of memory, by just assuming that pages using such a stylesheet will not use selectors that may conflict with other contents of the page.

Is it possible to add this "scope=content" attribute, or possibly to allow specifying additional selectors to restrict the stylesheet to other containers of the page?

Note that stylesheets may be quite long, containing multiple contextual styles for different classes.

One example is the pages about Unicode tables on French Wikipedia. They work now, but the setup is still fragile and requires complex management to track when the same stylesheet may be referenced. Basically, the code ensures that the stylesheet is now referenced only at the start of each table, and no longer for each cell or table row. A flag indicates when the stylesheet should be included; when the flag is absent, the stylesheet is assumed to be already loaded, and the classes are used directly without including the same stylesheet again and again.

Without these flags, the parser rapidly exceeds the "max stripped size limit", notably when the stylesheet grows in size (for example when adding new styles for other new Unicode scripts, for which we define a set of usable fonts per script, or because new fonts become available to support a script), or when we make too many identical references to the same stylesheet through multiple inclusions of templates.

Another way to manage this would be to not count the stripped content of the stylesheet multiple times: a single copy would be generated, and all further references would reuse the same generated stylesheet ID, so that all the parser has to do is generate the "link" element with that existing ID.
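
The count-once idea can be sketched in a few lines of Python. This is only a toy model, not MediaWiki's code: the class name StripTracker, the use of a SHA-1 hash as the stylesheet ID, and the 5,000,000-byte value for the limit are all assumptions made for illustration.

```python
import hashlib


class StripTracker:
    """Hypothetical tracker that charges each distinct stylesheet only once."""

    def __init__(self, limit=5_000_000):  # assumed value for the "5MB" limit
        self.limit = limit
        self.total = 0        # bytes counted against the limit so far
        self._seen = set()    # IDs of stylesheets already charged

    def reference(self, css):
        """Register one reference and return which element would be emitted."""
        key = hashlib.sha1(css.encode("utf-8")).hexdigest()
        if key in self._seen:
            # Same stylesheet ID as before: nothing more is counted.
            return "link"
        size = len(css.encode("utf-8"))
        if self.total + size > self.limit:
            raise OverflowError("stripped size limit exceeded")
        self._seen.add(key)
        self.total += size
        return "style"


if __name__ == "__main__":
    tracker = StripTracker()
    css = ".uni { font-family: serif; }"  # hypothetical stylesheet content
    kinds = [tracker.reference(css) for _ in range(4096)]
    print(kinds[0], kinds[1], tracker.total)
```

With this accounting, 4096 references to the same stylesheet count its size once, and only a second, different stylesheet would add to the total.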

Tgr (talkcontribs)

This would be more useful as a bug report, with fewer suggestions and instead the exact error you are getting (and ideally a link to a page where the error is reproduced).

Verdy p (talkcontribs)

This concerns all these pages linked together: https://fr.wikipedia.org/wiki/Table_des_caract%C3%A8res_Unicode_(0000-0FFF)

They use custom styles to render all cells with common styles, plus specific styles per script (basically a set of fonts usable for each script).

You may see that each cell calls a template called "Uni", "UniCtrl", "UniCtrlForm", "UniCombXxxx" (one for each script defining combining characters, with a suitable base character, needed notably for Indic scripts that require special handling), or "UniCombDouble" (for double diacritics).

Initially these tables used inline styles, but this limited the evolution of the tables, the description of details, and the inclusion of links on each character. So these pages started to explode. I converted them to use template styles for a few scripts, but when the number of scripts to support grew, I immediately reached the 5MB limit, because each invocation of "<templatestyles>", even with exactly the same stylesheet, caused the FULL stylesheet to be counted multiple times (and in fact parsed multiple times into "sanitized CSS").

Currently the code does not seem to detect that the same stylesheet is referenced multiple times. Each inclusion causes the size of the "stripped" stylesheet to be counted. On a single page where the stylesheet could be referenced up to 4096 times, this means that the same stylesheet was counted as 4096 times its size, even if it was exactly the same one (same generated stylesheet ID). In addition, each invocation caused an identical "link" element to be generated. Only the first occurrence of the stylesheet did not get the "link" element, but directly the "style" element containing the whole "sanitized CSS".

So the computed limit was wrong. The final effect was that MediaWiki generated a lot of red error messages in the page because the maximum stripped size was reached.
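
To see how quickly this accounting runs over the limit, here is a toy calculation in Python. It is only a sketch of the behavior described above, not MediaWiki's implementation: the function names are made up, the "5MB" limit is taken as 5,000,000 bytes, and the 2,000-byte stylesheet size is a hypothetical figure.

```python
# Toy model of the cumulative counting described above; not MediaWiki's code.
STRIPPED_SIZE_LIMIT = 5_000_000  # assumed: the "5MB" limit, taken as 5,000,000 bytes


def cumulative_stripped_size(stylesheet_size, references):
    """Total counted when every reference adds the full stylesheet size."""
    return stylesheet_size * references


def references_until_limit(stylesheet_size):
    """Largest number of references that still stays within the limit."""
    return STRIPPED_SIZE_LIMIT // stylesheet_size


if __name__ == "__main__":
    size = 2_000  # hypothetical sanitized stylesheet size in bytes
    print(cumulative_stripped_size(size, 4096))  # 8192000, above the limit
    print(references_until_limit(size))          # 2500
```

Under these assumptions, even a 2,000-byte stylesheet passes the limit well before the 4096th reference, although the page only ever needs one copy of it.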

This is bad behavior: if the same stylesheet is referenced again, you don't need to count the size of the repeated stylesheet again, and you don't even need to generate the "link" element. The stylesheet can be included once in the page header, it is sanitized only once, and no "link" element is ever needed!

Verdy p (talkcontribs)

Note that there are no more errors, because I found a way to avoid them: basically, the invocation of "<templatestyles>" is skipped most of the time using a condition in the template:

{{#if:{{{styles|}}}|<templatestyles ...>}}

But then I need to say somewhere in the wiki page where to set the styles=1 option. To simplify things, I have set styles=1 only in the table header, but you can see that each table in the page above generates a copy of the link element, and if you look at the parser statistics in the HTML comments at the end of the content, you can see that it reached about 170KB when the actual stylesheet is much smaller: there are 36 tables in the page, so the sanitized stylesheet size is counted 36 times.
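
As a rough check of those figures, the size of one copy can be estimated by dividing the reported total by the number of tables. The sketch below assumes that the 170KB is exactly 170,000 bytes and that nothing other than the 36 copies of the stylesheet is counted in it; the function name is made up for the example.

```python
def per_copy_size(total_bytes, copies):
    """Estimate the size of one stylesheet copy from the reported total."""
    return round(total_bytes / copies)


if __name__ == "__main__":
    # About 170KB reported for 36 tables, one stylesheet reference per table.
    print(per_copy_size(170_000, 36))  # 4722
```

So under these assumptions a single sanitized copy would be a little under 5KB, and that is all that would need to be counted for the whole page.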

Normally such a conditional #if should not even be needed: we should just use templatestyles directly, and MediaWiki should detect by itself when a new stylesheet is needed and must be sanitized, and then generate it in the page header.

I don't see any point in inserting the same "link" element multiple times in the same page, referencing exactly the same stylesheet (same generated sanitized stylesheet ID). A single inclusion of the "style" element would be enough, and no further "deduplication" of stripped content is ever needed (if you want to count them, each distinct stylesheet should be counted only once, and it should be parsed and sanitized only once).
