Parsoid/Language conversion/Preprocessor fixups/Edit logbook

This page (b)logs the editing process of the page-lists that were made on 20 March 2017. It is fluid, as in: edits done and developing insight. It is more flexible than its parent page, which should be more stable for future reference and documentation. Posts might be signed. -DePiep (talk) 17:16, 15 April 2017 (UTC)

The March 20 list
On March 20, 2017, sixteen wikis were scanned for "-{" code in source text. They were listed per wiki. These are the pages (articles and non-articles) that might need to be fixed.

(Number of pages may be incorrect, +/- 10%)
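As an illustration of what such a scan involves, here is a minimal sketch in Python. It assumes the scan looked for the literal two-character sequence "-{"; the regex actually used for the March 20 list is not given on this page. The function name `find_markers` and the sample text are hypothetical.

```python
import re

# Assumption: the scan matched the literal sequence "-{".
# The real regex behind the March 20 list is not documented here.
OPEN_MARKER = re.compile(r"-\{")


def find_markers(wikitext):
    """Return (line_number, column, line_text) for every "-{" in wikitext."""
    hits = []
    for line_no, line in enumerate(wikitext.splitlines(), start=1):
        for match in OPEN_MARKER.finditer(line):
            # Columns are reported 1-based, like the line numbers.
            hits.append((line_no, match.start() + 1, line))
    return hits


# Hypothetical sample text, for illustration only.
sample = (
    "IUPAC name: 2-{4-chlorophenyl}ethanol\n"
    "No marker here.\n"
    "See -{zh-hans:A; zh-hant:B}- for a variant."
)

for line_no, col, text in find_markers(sample):
    print(f"line {line_no}, col {col}: {text}")
```

A scan like this only finds candidates. Whether a hit is a false positive (a chemical name) or intentional LanguageConverter markup still has to be decided per page.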

ceb
ceb
 * Status: Exclusively valid language converter markup; perhaps LanguageConverter was enabled on cebwiki at one point in the past?

de
de
 * Status:

en
en || 552 || 106 ||
 * Status: Done, see Edit Rules. DePiep (talk) 22:19, 11 April 2017 (UTC)

es
es
 * Status:

fr
fr || 104 || 14
 * Status:

it
it || 77 || 187
 * Status:

ja
ja || 87 || 2
 * Status:

mw*
mw || 35 || 9 ||
 * Status: Some /zh translations are LanguageConverter-related false positives.

nl
nl || 32 || 62
 * Status:

pl
pl || 51 || 2
 * Status:

pt
pt || 73 || 1
 * Status:

ru
ru || 95 || 3
 * Status:

sv
sv || 7 || 4 ||
 * Status: Done, see Edit Rules. -DePiep (talk) 20:47, 14 April 2017 (UTC)

vi
vi || 13 || 9 ||
 * Status: Done, see Edit Rules. Needs some edit requests (convert templates). -DePiep (talk) 22:04, 14 April 2017 (UTC)

war
war || 5 || 1 ||
 * Status: Done, see Edit Rules. -DePiep (talk) 19:26, 14 April 2017 (UTC)

zh
zh || 716 || 117 ||
 * Status: These are probably intentional uses of valid language converter markup, since LanguageConverter is enabled for zhwiki
 * Redlink now but has numbers? -DePiep (talk) 19:16, 15 April 2017 (UTC)

Edit process
I am:
 * At home at enwiki I am an editor (TE, no admin).
 * For the enwiki list I could use my AutoWikiBrowser licence (en:WP:AWB), and of course individual page editing.

I know:
 * I am maintaining several chemicals templates (with that 'IUPAC name'), and that is why I arrived here.
 * I have no grasp of the MW side, the main issue (parsoid, LangConverter, parsing issues). I really edit in the blind for this. However, I do note that the name 'LanguageConverter' is misleading (Wikidata should do that!). It is more like a script converter ;-).

I do:
 * Only listed pages are approached.
 * Learning: Questions are gathered (e.g. what to do with .js, module pages)
 * All edits are manual (by individual page), or by AWB (check-before-save). No bot.
 * AWB is not a bot: I must check each edit individually, and I do. Some checks are a glance, and some require researching the edit. This may also cause mistakes at the typo level. Such an error, once it has passed my own check, might be hard to find.


 * enwiki
 * First runs (500 P), got the issue, met questions


 * Other lang wikis
 * Low numbers: did per-page edit (sv, vi, war)
 * Higher numbers: will try to get AWB access for these lang-wikis, per local wiki.
 * zhwiki: the original list gives numbers, but today the page is redlinked. No action by me.
 * met more questions


 * Sister projects (like wikiquote)
 * mw is listed, will approach that one.
 * Other sister project: not listed, no action

Edit Rules

 * This set of Rules is developing by editing experience. April 2017.

Edit Rules as applied:
 * Edits done in the listed pages:
 * Change  into
 * In chemical names (mostly IUPAC names and similar; could be 75% of all affected pages)
 * When it is not a balanced full pair (for example, the right hyphen is missing, so do edit)
 * In species description (example: Oloo, G.W. (1975) Sugarcane. 1.- {Aulacaspis} spp. and other scales; note: closing hyphen not present)
 * In module documentation pages ( Module:.../doc ; /doc has regular wikitext)
 * In language construct descriptions: "{NounRoot}- ___ - {PosSuffix}" in [[:en:Heuristic evaluation]]


 * in url: change into  (example Alan Turing)
 * Better not : because the opening curly bracket is a reserved character in en:CS1 and en:CS2
 * Note that Help:CS1 says that the { should be escaped when it occurs in urls used in citation template parameters. So, only when a "-{" occurs in a url in a citation template parameter do you need to escape both of those characters. But if the "-{" occurs in a url outside of that, it is sufficient to just escape "-". SSastry (WMF) (talk) 18:05, 11 May 2017 (UTC)
 * Would it hurt or break anything if I did it everywhere for the -{ sequence in an url? Any advice to restrain outside of enwiki? -DePiep (talk) 18:09, 11 May 2017 (UTC)
 * No, it is safe to escape both always. I was just clarifying the guideline. SSastry (WMF) (talk) 18:12, 11 May 2017 (UTC)
 * If  is unsafe in citation template parameters, you should never find a   to begin with, right?  It should already be , which should be fine as-is. cscott (talk) 18:23, 11 May 2017 (UTC)


 * Removed when typo (for example  in wikitable pipe code)


 * Pending
 * When in static archive or log page (mostly before 2010).
 * Protected page (example vi:Bản mẫu:Convert/Dual/LoffAoffDxSoffT)


 * Listed pages not edited:
 * No edit when in ns Module (Lua code; but did edit when .../doc page)
 * No edit when (possibly) intentionally used for LanguageConverter: en:Bulgarian language has "-{ost/est}" (See note R/hyphen missing, so do edit)
 * No edit when inside TeX source code description (example: en:File:Homotopy lifting property.svg). (See note R)
 * No edit when in Regex string:  (example from vi:MediaWiki:Gadget-navpop.js). .js page exempt same as module? (See note R)


 * Note R
 * These rules might need refinement, to select the true positives for editing.
 * To check: when unbalanced hyphen, then no confusion (do edit "-{ost/est}").
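The rules above amount to a small decision procedure, which can be sketched in Python as below. This is only an approximation under stated assumptions, not the process actually followed (all edits were manual or AWB with check-before-save). It assumes English namespace prefixes (`Module:`), assumes that escaping in a url means percent-encoding ("-" as `%2D`, "{" as `%7B`), and uses the hypothetical names `triage` and `escape_in_url`.

```python
import re

# A complete "-{ ... }-" pair, which may be intentional LanguageConverter markup.
BALANCED = re.compile(r"-\{.*?\}-", re.DOTALL)
# Rough url matcher (assumption): stops at whitespace, "]", "|", "<" or ">".
URL = re.compile(r"https?://[^\s\]|<>]+")


def escape_in_url(url):
    """Percent-encode every "-{" in a url, escaping both characters."""
    return url.replace("-{", "%2D%7B")


def triage(title, wikitext):
    """Return a short label saying how a listed page would be handled."""
    # Lua code is left alone, but its /doc subpage is regular wikitext.
    if title.startswith("Module:") and not title.endswith("/doc"):
        return "skip: Module namespace (Lua code)"
    if title.endswith(".js"):
        return "skip: .js page"
    if "-{" not in wikitext:
        return "nothing to do"
    if any("-{" in m.group(0) for m in URL.finditer(wikitext)):
        return "edit: escape inside url"
    if BALANCED.search(wikitext):
        # Could be a chemical name or real markup: a human has to look.
        return "check by hand: balanced pair, may be intentional"
    return "edit: unbalanced, no closing }-"


print(triage("Module:Example", "x -{ y"))
print(triage("Module:Example/doc", "x -{ y"))
print(triage("Some page", "see http://example.org/a-{b} here"))
print(escape_in_url("http://example.org/a-{b}"))
```

The "check by hand" branch reflects Note R: a balanced pair alone does not tell a chemical name from intentional markup, so the sketch does not try to decide that case.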

Future
After processing the list (all wikis), we could


 * notepad notes only


 * do
 * rerun the same listing by same regex. (expect: 10% of 1500 = 150 pages to reconsider)
 * refine the regex
 * list all lang-wikis
 * list all sister wikis
 * when ns:module, add some marking
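A rerun could be compared with the first listing along these lines. This is a hypothetical sketch: the helper names and the page titles are made up for illustration, and the `Module:` prefix assumes English namespace names.

```python
def remaining_pages(first_run, rerun):
    """Pages present in both runs: listed before and still matching now."""
    return sorted(set(first_run) & set(rerun))


def mark_modules(titles):
    """Append a marker to Module-namespace titles, excluding /doc subpages."""
    return [
        t + " [module]"
        if t.startswith("Module:") and not t.endswith("/doc")
        else t
        for t in titles
    ]


# Hypothetical titles, for illustration only.
first_run = ["Alan Turing", "Module:Example", "Module:Example/doc", "Bulgarian language"]
rerun = ["Module:Example", "Bulgarian language", "Some new page"]

print(mark_modules(remaining_pages(first_run, rerun)))
```

Pages that appear only in the rerun ("Some new page" here) are new occurrences rather than leftovers, and would need a separate look.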

For me and for this: maybe not needed (easier to run?!)
 * do not
 * not the quotes -- we should understand the situations by then (they do help now!)
 * no split over url-chem-other subsets (because we want automated approach anyway)
 * no split per wiki (automated approach)


 * q
 * btw, what if in 2019 someone enters "-{" in enwiki?