Topic on Talk:Parsoid/Language conversion/Preprocessor fixups

Cscott (talkcontribs)

All wikis except for wikidata are now complete. For some reason the JSON blobs in the wikidata dump seem to take much longer than any other wiki. I'm not entirely sure how/if to fix them, either -- it depends on whether the type of the wikidata item is wikitext, plaintext, or something else. There don't seem to be very many matches in wikidata in any case.

Elitre (WMF) (talkcontribs)
Thiemo Kreuz (WMDE) (talkcontribs)

There are 2 results in other namespaces. I have not looked at them in detail, but I think it's fine to let them break and fix after the fact.

The 8000+ results in the main namespace are mostly labels of chemicals (obviously from a mass import, because they have sequential IDs). These labels are meant to be plain text, and must be wikitext-escaped when used in a wikitext context. The code we maintain does this. It might be that Lua modules and other code written by volunteers use these labels in the wrong way, assuming they don't need escaping. But this is not a new issue introduced by the planned change. Such code already does unexpected things whenever it encounters something that looks like wikitext syntax. If it did not break before on labels that contain [[, or {{, or ''', it won't break now with -{.

Elitre (WMF) (talkcontribs)

That's reassuring to hear. Danke.

Elitre (WMF) (talkcontribs)

So if I understand it correctly, the question now is: what if a template elsewhere starts embedding that content after the change? Should we nowiki everything just to stay safe?

Cscott (talkcontribs)

In general, properly escaping wikitext (other than wrapping <nowiki> around the whole thing) is quite tricky. So I'd hope that any existing code would in fact be just wrapping <nowiki> around everything, and thus wouldn't require any changes. But if someone was trying to be clever and (for example) only escape "special characters" like [[, then they might miss the newly-special -{ sequence. I thought it was worth bringing this to the attention of the wikidata team just in case they knew of any specific code which we could proactively patch.
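To illustrate the difference between the two approaches, here is a minimal Python sketch. The function names, the lists of "special" sequences, and the shortened label are all hypothetical, made up for this example; this is not the code Wikidata or any template actually uses. It assumes that entities are still decoded inside <nowiki>, so `&` and `<` are escaped before wrapping.

```python
import re

# Illustrative, non-exhaustive lists of two-character sequences.
OLD_SPECIAL = ("[[", "{{")
NEW_SPECIAL = OLD_SPECIAL + ("-{",)  # "-{" becomes special after the change


def wrap_nowiki(text: str) -> str:
    """Wrap the whole string in <nowiki> (the simple, robust approach)."""
    # Escape & and < so an embedded "</nowiki>" cannot end the wrapper early.
    safe = text.replace("&", "&amp;").replace("<", "&lt;")
    return "<nowiki>" + safe + "</nowiki>"


def selective_escape(text: str, special=OLD_SPECIAL) -> str:
    """Hypothetical 'clever' escaper: split only the listed sequences."""
    for seq in special:
        # Insert <nowiki/> after the first character whenever the second follows.
        pattern = re.escape(seq[0]) + "(?=" + re.escape(seq[1]) + ")"
        text = re.sub(pattern, seq[0] + "<nowiki/>", text)
    return text


# Shortened, made-up label in the style of the chemical names discussed here.
label = "3-{[2,15-dimethyl]sulfanyl}-propanoic acid"

print(selective_escape(label))               # "-{" is left untouched
print(selective_escape(label, NEW_SPECIAL))  # "-{" is split by <nowiki/>
print(wrap_nowiki(label))                    # whole value wrapped
```

With the old list, the selective escaper leaves -{ in place, which is exactly the case that would need patching; the whole-string wrapper needs no change.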

Elitre (WMF) (talkcontribs)

I'm reading the exchange between DePiep and Thiemo here and (again, if I'm understanding correctly: I wasn't able to track down a chemistry template recalling data from Wikidata) I dunno if we should be worried, because I don't know if templates embedding data from Wikidata actually nowiki anything.

DePiep (talkcontribs)

Please continue at: Parsoid/Language conversion/Preprocessor fixups/20170501#wikidatawiki

Example. Such a template would be like :en:Template:Infobox chemical

<nowiki>{{infobox |label1=IUPAC name |data1={{#property:P123456}} }}</nowiki>

With the property value being, as an example case, "3-<nowiki/>{[(1''S'',2''R'')-2,15-Dimethyl-5,14-dioxotetracyclo]sulfanyl}-propanoic acid" (that is, the real value contains the tricky sequence unescaped; it is escaped here only for the obvious reason that this page would otherwise break).

Note 1: In practice this is not often done in enwiki chemical templates, because editors are worried about data quality & sourcing.

Note 2: The IUPAC name is not yet a Property in Wikidata. The example still holds.

Now, we know that the value is safe within Wikidata. But in this case, it is read into an enwiki article for regular template parameter processing. This way, that value, while safe in WD, can create the error we are trying to prevent, in enwiki.
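To make the example concrete, here is a small Python sketch that checks a value for the problematic sequence before it is passed on as a template parameter. It assumes the construct opens with -{ and closes with }-; the function name and regular expression are hypothetical and only approximate what the preprocessor would match.

```python
import re

# Assumption: a language-converter construct is delimited by "-{" and "}-".
CONVERTER_SPAN = re.compile(r"-\{.*?\}-", re.DOTALL)


def find_converter_spans(value: str) -> list[tuple[int, str]]:
    """Return (offset, matched text) for each "-{ ... }-" span in value."""
    return [(m.start(), m.group(0)) for m in CONVERTER_SPAN.finditer(value)]


# The example value from this post, without the <nowiki/> used for display.
name = ("3-{[(1''S'',2''R'')-2,15-Dimethyl-5,14-dioxotetracyclo]"
        "sulfanyl}-propanoic acid")

for offset, span in find_converter_spans(name):
    print(offset, span)
```

For the name above, the span runs from the -{ after the leading "3" up to the }- before "propanoic", so most of the name would be read as converter markup rather than as text.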

IKhitron (talkcontribs)

Off-topic: can anyone finally explain to me what the code -{...}- is supposed to mean, if it is not an error? Thank you.

SSastry (WMF) (talkcontribs)
IKhitron (talkcontribs)

Thank you.