Topic on Talk:Page Previews/API Specification

Add markup for chemical formulas

Summary by Jdlrobson

TextExtracts removes `span` elements from the output, whereas the new endpoint retains them. This means Template:chem works fine with the new endpoint's extracts.

OVasileva (WMF) (talkcontribs)
Phuedx (WMF) (talkcontribs)

@OVasileva (WMF): Just to note that those examples are from enwiki. HTML previews are only enabled on the Beta Cluster wikifarm. This may explain the behavior in your last two examples.

Now, the chem template produces spans with inline styles, not super- or subscript tags (sup and sub respectively). In the former case, TextExtracts will strip the inline styles from the span, so the formatting is lost; in the latter case, TextExtracts will preserve the tags.
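
To make the difference concrete, here is a minimal sketch of what a style-stripping step does to each kind of markup. The `strip_inline_styles` helper and the simplified sample strings are assumptions for illustration only; this is not the TextExtracts implementation.

```python
import re

# Hypothetical stand-in for the style-stripping step (not TextExtracts code).
STYLE_ATTR = re.compile(r'\s+style="[^"]*"')


def strip_inline_styles(html: str) -> str:
    """Remove every style="..." attribute, leaving tags and text in place."""
    return STYLE_ATTR.sub("", html)


# Simplified chem-style output: the subscript exists only as inline CSS.
chem_span = 'H<span style="vertical-align:-0.4em;font-size:80%">2</span>O'
# Semantic markup: the subscript is carried by the tag itself.
sub_tag = "H<sub>2</sub>O"

print(strip_inline_styles(chem_span))  # H<span>2</span>O
print(strip_inline_styles(sub_tag))    # H<sub>2</sub>O
```

Once the style attribute is gone, the bare span carries no positioning, so the first result renders as plain "H2O"; the sub tag has no style attribute to lose.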

The reason that the chem template produces spans with inline styles is so that it can correctly format things like charge. Consider the example in the template's documentation: if the template were to produce sup and sub tags, then the 4 and 2- in the example wouldn't be above and below one another.

I see two ways of solving this.

  1. The chem template does wrap its output in a <span class="chemf"> element. We could disable the inline style stripping behavior for this wrapper element and its children.
  2. Create a page-preview-preserve or pp-preserve class, which, like the above, would disable all stripping behavior in the API for this element. We'd then ask the chem template author(s) to use the class in their template.
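
Both options come down to the same mechanism: skip style stripping inside any element that carries a marker class. A rough sketch of that idea using Python's standard `html.parser` follows. The `StyleStripper` class, the `strip_styles` helper, and the sample markup are hypothetical; `pp-preserve` is the class proposed above, not one that exists today; and the sketch assumes well-formed input.

```python
from html import escape
from html.parser import HTMLParser

# "chemf" covers option 1; "pp-preserve" is the proposed class from option 2.
PRESERVE_CLASSES = {"chemf", "pp-preserve"}
VOID_TAGS = {"br", "img", "hr", "wbr"}  # tags that never get a closing tag


class StyleStripper(HTMLParser):
    """Drop style attributes except inside a preserved element's subtree."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []
        self.preserve_depth = 0  # > 0 while inside a preserved subtree

    def handle_starttag(self, tag, attrs):
        classes = set((dict(attrs).get("class") or "").split())
        entering = bool(classes & PRESERVE_CLASSES)
        if self.preserve_depth == 0 and not entering:
            attrs = [(k, v) for k, v in attrs if k != "style"]
        if tag not in VOID_TAGS and (self.preserve_depth or entering):
            self.preserve_depth += 1
        rendered = "".join(f' {k}="{escape(v or "")}"' for k, v in attrs)
        self.out.append(f"<{tag}{rendered}>")

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return  # void elements were already emitted by handle_starttag
        if self.preserve_depth:
            self.preserve_depth -= 1
        self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))


def strip_styles(html: str) -> str:
    parser = StyleStripper()
    parser.feed(html)
    parser.close()
    return "".join(parser.out)


sample = (
    '<p style="line-height:1">'
    '<span class="chemf nowrap">H<span style="font-size:80%">2</span>O</span>'
    ' is <span style="color:red">water</span></p>'
)
print(strip_styles(sample))
```

In this sample the paragraph and the unrelated span lose their styles, while the chemf wrapper and its child span keep theirs. Supporting a new template then only means adding a class name to the set (option 1) or having the template emit the shared class (option 2).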

I'm leaning towards something like #2 as it'd also allow editors to mark, say, parentheticals that they want preserved in the preview… OTOH #1 would be quicker to implement but puts the burden of finding and adding exceptions to the whitelist firmly on the maintainers of Popups.

Also, I've updated the spec to add an exception for sup and sub tags.

Phuedx (WMF) (talkcontribs)

@OVasileva (WMF): As we discussed today, there is a third alternative:

We could conditionally disable the inline style stripping and span flattening behaviour on one or more wikis – these are the two processing steps that break the output of the chemf template for HTML previews – and test whether it causes more harm than good.
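
As a rough illustration of what a per-wiki switch might look like, the sketch below runs the two steps as a small pipeline and skips the ones a wiki has opted out of. Everything here is an assumption: the function names, the `DISABLED_STEPS` table, the wiki identifiers, and the reading of "flattening" as replacing a span with its contents.

```python
import re


def strip_inline_styles(html: str) -> str:
    """Remove style="..." attributes."""
    return re.sub(r'\s+style="[^"]*"', "", html)


def flatten_spans(html: str) -> str:
    """Assumption: flattening means replacing each span with its contents."""
    return re.sub(r"</?span\b[^>]*>", "", html)


STEPS = [strip_inline_styles, flatten_spans]

# Hypothetical per-wiki opt-out table; the wiki IDs are placeholders.
DISABLED_STEPS = {"testwiki": {"strip_inline_styles", "flatten_spans"}}


def process(html: str, wiki: str) -> str:
    """Run every step that is not disabled for the given wiki."""
    disabled = DISABLED_STEPS.get(wiki, set())
    for step in STEPS:
        if step.__name__ not in disabled:
            html = step(html)
    return html


chem = '<span class="chemf">H<span style="font-size:80%">2</span>O</span>'
print(process(chem, "otherwiki"))  # H2O
print(process(chem, "testwiki"))   # markup returned unchanged
```

Keeping the opt-out in one table would make it easy to compare previews with and without the two steps on a trial wiki before deciding whether to change the default.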


P.S. As a Sandman fan, you may be thinking of the same scene as I am when I say "there is a third alternative".

Jdlrobson (talkcontribs)

If we're not using TextExtracts this example works fine. The new endpoint generates a great summary for this example page.

Phuedx (WMF) (talkcontribs)

@Jdlrobson: For reference, please could you link a paste or dump the summary?

Jdlrobson (talkcontribs)

Summaries look like so:

http://0.0.0.0:6927/en.wikipedia.org/v1/page/preview-html/Hydrogen_peroxide

<p style="font-size:100%; line-height:1;"><b>Hydrogen peroxide</b> is a <a href="/wiki/Chemical_compound" title="Chemical compound">chemical compound</a> with the formula <span class="chemf nowrap">H<span style="display:inline-block;margin-bottom:-0.3em;vertical-align:-0.4em;line-height:1em;font-size:80%;text-align:left"><br>
2</span>O<span style="display:inline-block;margin-bottom:-0.3em;vertical-align:-0.4em;line-height:1em;font-size:80%;text-align:left"><br>
2</span></span>. In its pure form, it is a colourless <a href="/wiki/Liquid" title="Liquid">liquid</a>, slightly more <a href="/wiki/Viscosity" title="Viscosity">viscous</a> than <a href="/wiki/Properties_of_water" title="Properties of water">water</a>. Hydrogen peroxide is the simplest <a href="/wiki/Peroxide" title="Peroxide">peroxide</a> (a compound with an oxygen–oxygen <a href="/wiki/Single_bond" title="Single bond">single bond</a>). It is used as an <a href="/wiki/Oxidizer" class="mw-redirect" title="Oxidizer">oxidizer</a>, <a href="/wiki/Bleach" title="Bleach">bleaching</a> agent and <a href="/wiki/Disinfectant" title="Disinfectant">disinfectant</a>. Concentrated hydrogen peroxide, or "<a href="/wiki/High-test_peroxide" title="High-test peroxide">high-test peroxide</a>", is a <a href="/wiki/Reactive_oxygen_species" title="Reactive oxygen species">reactive oxygen species</a> and has been used as a <a href="/wiki/Propellant" title="Propellant">propellant</a> in <a href="/wiki/Rocket" title="Rocket">rocketry</a>.<sup id="cite_ref-4" class="reference"><a href="#cite_note-4">[4]</a></sup> Its chemistry is dominated by the nature of its unstable <a href="/wiki/Peroxide" title="Peroxide">peroxide</a> bond.</p>

http://0.0.0.0:6927/en.wikipedia.org/v1/page/preview-html/Dioxygenyl

<p>The <b>dioxygenyl</b> <span>ion</span>, <span class="chemf nowrap">O<span style="display:inline-block;margin-bottom:-0.3em;vertical-align:-0.4em;line-height:1em;font-size:80%;text-align:left">+<br>2</span></span>, is a rarely-encountered <span>oxycation</span> in which both <span>oxygen atoms</span> have a formal <span>oxidation state</span> of +<span class="nowrap"><span> </span></span><span class="frac nowrap"><sup>1</sup><span>⁄</span><sub>2</sub></span>. It is formally derived from <span>oxygen</span> by the removal of an <span>electron</span>:</p>