Parsoid/Language conversion/Preprocessor fixups/Edit logbook

This page (b)logs the editing process for the page lists that were made on 20 March 2017. It is fluid: it records edits as they are done and insight as it develops. It is more flexible than its parent page, which should be more stable for future reference and documentation. Posts might be signed. -DePiep (talk) 17:16, 15 April 2017 (UTC)

The March 20 list
On March 20, 2017, sixteen wikis were scanned for the "-{" sequence in source text. The results were listed per wiki. These are the pages (articles and non-articles) that might need to be fixed.

(Page counts may be off by about ±10%.)

ceb
 * Status: Exclusively valid language converter markup; perhaps LanguageConverter was enabled on cebwiki at one point in the past?

de
 * Status:

en
en || 552 || 106 ||
 * Status: Done, see Edit Rules. DePiep (talk) 22:19, 11 April 2017 (UTC)

es
 * Status:

fr
fr || 104 || 14
 * Status:

it
it || 77 || 187
 * Status:

ja
ja || 87 || 2
 * Status:

mw*
mw || 35 || 9 ||
 * Status: Some /zh translations are LanguageConverter-related false positives.

nl
nl || 32 || 62
 * Status:

pl
pl || 51 || 2
 * Status:

pt
pt || 73 || 1
 * Status:

ru
ru || 95 || 3
 * Status:

sv
sv || 7 || 4 ||
 * Status: Done, see Edit Rules. -DePiep (talk) 20:47, 14 April 2017 (UTC)

vi
vi || 13 || 9 ||
 * Status: Done, see Edit Rules. Needs some edit requests (convert templates). -DePiep (talk) 22:04, 14 April 2017 (UTC)

war
war || 5 || 1 ||
 * Status: Done, see Edit Rules. -DePiep (talk) 19:26, 14 April 2017 (UTC)

zh
zh || 716 || 117 ||
 * Status: These are probably intentional uses of valid language converter markup, since LanguageConverter is enabled for zhwiki.
 * The list page is a redlink now, but the table has numbers? -DePiep (talk) 19:16, 15 April 2017 (UTC)
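The scan behind this list can be sketched in Python as below. This is a minimal sketch, assuming the page sources are already available as strings; the regex and tooling actually used for the March 20 run are not recorded here, and `find_hits` and `pages_with_hits` are hypothetical helper names.

```python
import re

# The literal two-character sequence the pages were scanned for.
HIT = re.compile(r"-\{")


def find_hits(wikitext: str, context: int = 15) -> list[tuple[int, str]]:
    """Return (line number, snippet) for every "-{" in the wikitext."""
    hits = []
    for m in HIT.finditer(wikitext):
        line_no = wikitext.count("\n", 0, m.start()) + 1
        lo = max(0, m.start() - context)
        hi = m.end() + context
        # Flatten newlines so each snippet fits on one list line.
        snippet = wikitext[lo:hi].replace("\n", " ")
        hits.append((line_no, snippet))
    return hits


def pages_with_hits(pages: dict[str, str]) -> list[str]:
    """Sorted titles of pages whose source contains at least one "-{"."""
    return sorted(title for title, text in pages.items() if HIT.search(text))


if __name__ == "__main__":
    pages = {"Bulgarian language": 'has "-{ost/est}" here', "Other": "no hits"}
    print(pages_with_hits(pages))
    print(find_hits(pages["Bulgarian language"]))
```

Such a scan over-reports on purpose: intended converter markup (as on zhwiki) is caught as well, which is why every hit still gets a per-page check.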

Edit process
I am:
 * My home wiki is enwiki, where I am a template editor (TE), not an admin.
 * For the enwiki list I can use my AutoWikiBrowser permission (en:WP:AWB), and of course edit individual pages.

I know:
 * I maintain several chemistry templates (the ones with the 'IUPAC name'), and that is how I arrived here.
 * I have no grasp of the MediaWiki side of the main issue (Parsoid, LanguageConverter, parsing). I am really editing blind here. I do note, however, that the name 'LanguageConverter' is misleading: Wikidata should do that! It is more like a script converter ;-).

I do:
 * Only listed pages are handled.
 * Learning: questions are gathered (e.g. what to do with .js and module pages).
 * All edits are manual (per page) or by AWB (check before save). No bot.
 * AWB is not a bot: I must check each edit individually (I do; some need only a glance, others require some research before saving). This can also cause mistakes at the typo level, and such an error, once past my own check, might be hard to find.


 * enwiki
 * First runs (500 pages): came to understand the issue, ran into questions.


 * Other lang wikis
 * Low numbers: edited page by page (sv, vi, war).
 * Higher numbers: will try to get AWB access on each of these language wikis.
 * zhwiki: the original list gives numbers, but today the page is redlinked. No action by me.
 * ran into more questions


 * Sister projects (like Wikiquote)
 * mw (mediawiki.org) is listed; will approach that one.
 * Other sister projects: not listed, no action.

Edit Rules

 * This set of rules is developing through editing experience (April/May 2017).

Edit Rules (or guidelines) as applied:
 * Edits done in the listed pages:
 * Change  into
 * In chemical names: mostly IUPAC names and similar; possibly 75% of all affected pages.
 * When not a balanced full pair (the right-hand hyphen is missing), do edit. Example: en:Bulgarian language has "-{ost/est}". Warning: articles about languages could very well contain language converter code, so do not edit that code.
 * In species descriptions. Example: Oloo, G.W. (1975) Sugarcane. 1.- {Aulacaspis} spp. and other scales. (Also unbalanced.)


 * In module documentation pages. Module:.../doc pages are in the Module namespace (where Lua code is expected), but by wiki configuration /doc pages hold regular wikitext.
 * In language construct descriptions. Example: "{NounRoot}- ___ - {PosSuffix}" in en:Heuristic evaluation (also unbalanced).
 * Emoticons made of characters, like :- {
 * When used as a show-the-template trick: {- {Harvnb}}


 * In URLs: change  into  (see en:Alan Turing; see also the Discussion).
 * Removed when a typo (for example  in wikitable pipe code).
 * Not restricted, do edit:
 * (see exception) When in a static archive or log page (mostly from before 2010, for some reason). Keeping the page unbroken trumps keeping it static. The edit should not, and may not, change content. (So far seen in enwiki and dewiki.)
 * Exception: when a log page is intended for automated reading (? no example found).
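The core rule above (escape a "-{" unless it closes as a balanced "-{ ... }-" pair) could be sketched as follows. This assumes the replacement is the hyphen written as the HTML entity `&#45;`, following the Discussion's point that escaping "-" alone is sufficient outside citation-template URLs; the exact replacement text used on this page is not preserved. `closes_as_converter` and `escape_unbalanced` are hypothetical helpers and only a first pass: the manual check (for example, on articles about languages) still decides.

```python
import re

# Assumed replacement: hyphen as the entity &#45;, so no "-{" remains.
ESCAPED = "&#45;{"


def closes_as_converter(text: str, start: int) -> bool:
    """True if the "-{" at `start` has a matching "}" directly followed by "-"."""
    depth = 0
    for i in range(start + 1, len(text)):
        ch = text[i]
        if ch == "{":
            depth += 1
        elif ch == "}":
            depth -= 1
            if depth == 0:
                return text[i + 1:i + 2] == "-"
    return False  # no matching "}" at all: unbalanced


def escape_unbalanced(text: str) -> str:
    """Escape every "-{" that does not form a balanced "-{ ... }-" pair."""
    out = []
    last = 0
    for m in re.finditer(r"-\{", text):
        if closes_as_converter(text, m.start()):
            continue  # looks like intended converter markup: leave it
        out.append(text[last:m.start()])
        out.append(ESCAPED)
        last = m.end()
    out.append(text[last:])
    return "".join(out)


if __name__ == "__main__":
    print(escape_unbalanced("N-{2-[(4-chlorophenyl)methyl]ethyl}amine"))
    print(escape_unbalanced("-{zh-hans:A;zh-hant:B}-"))
```

Brace counting (rather than a single regex) is used so that nested brackets in chemical names do not end the match early.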


 * Pending (Rules to be looked at)
 * Protected page (example: vi:Bản mẫu:Convert/Dual/LoffAoffDxSoffT): needs an edit request plus a check afterwards.
 * When in a file name (image name): "File:{Subject name in English} relief location map- {language}.svg". Could occur both in wikitext and in a wikilink. (See example; could not test.)


 * Not edited
 * No edit when it is an intended language converter construct. Expect this in bi-script wikis like zhwiki. (Expand this: list the bi-script wikis? Note that bi-language often means bi-script, as with Serbian.)
 * No edit when in the Module namespace (Lua code); but did edit .../doc pages.
 * Tricky situation: "--{par="width", ..." is a comment line in Lua. Do not edit.


 * No edit when inside a TeX source code description (example: en:File:Homotopy lifting property.svg; file description pages tend to have these).
 * No edit when in a regex string:  (example from vi:MediaWiki:Gadget-navpop.js). Is a .js page exempt, the same as a module? (See Note R.)
 * No edit when in <math> code (? not seen yet, no example).
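Some of the "Not edited" rules are mechanical enough to pre-filter hits before the manual check. A minimal sketch, assuming canonical English namespace names (localized wikis such as viwiki use other prefixes) and hypothetical helper names; the TeX-description and intended-converter cases are left to human judgment, and treating whole .js pages as skipped is only one answer to the open Note R question.

```python
from typing import Optional


def skip_page(title: str) -> Optional[str]:
    """Return a reason to skip the whole page, or None if it may be edited."""
    if title.startswith("Module:") and not title.endswith("/doc"):
        return "Lua module source"
    if title.endswith(".js"):
        # Still an open question (Note R); treated as skip in this sketch.
        return "JavaScript page"
    return None


def skip_hit(text: str, pos: int) -> Optional[str]:
    """Return a reason to leave the "-{" at text[pos] alone, or None."""
    if pos > 0 and text[pos - 1] == "-":
        # "--{" starts a Lua comment, e.g. --{par="width", ...
        return "Lua comment"
    before = text[:pos]
    # Inside <math> if the last "<math" opener comes after the last closer.
    if before.rfind("<math") > before.rfind("</math>"):
        return "inside <math>"
    return None


if __name__ == "__main__":
    print(skip_page("Module:Foo"), skip_page("Module:Foo/doc"))
    print(skip_hit('--{par="width"', 1))
```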


 * Note R
 * These rules might need Refinement to select the true positives for editing. That could be decided when viewing the page.

Discussion

 * -{ in url (en:Alan Turing)
 * Better not : because the opening curly bracket is a reserved character in en:CS1 and en:CS2.
 * Note that Help:CS1 says that the { should be escaped when it occurs in urls used in citation template parameters. So, only when a "-{" occurs in a url in a citation template parameter, do you need to escape both of those characters. But if the "-{" occurs in a url outside of that, it is sufficient to just escape "-". SSastry (WMF) (talk) 18:05, 11 May 2017 (UTC)
 * Would it hurt or break anything if I did it everywhere for the -{ sequence in a URL? Any advice on restraint outside of enwiki? -DePiep (talk) 18:09, 11 May 2017 (UTC)
 * No, it is safe to always escape both. I was just clarifying the guideline. SSastry (WMF) (talk) 18:12, 11 May 2017 (UTC)
 * If  is unsafe in citation template parameters, you should never find a   to begin with, right?  It should already be , which should be fine as-is. cscott (talk) 18:23, 11 May 2017 (UTC)
 * , said it right: "should not find". I don't know how or if the enwiki CS-people have cleaned this up yet (they take errors in batches, by categorising etc.). Did find them in the Alan Turing example. Not an issue; I pick them up along the way. -DePiep (talk) 19:17, 11 May 2017 (UTC)
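Per the thread above, escaping both characters in a URL is always safe, so one approach is to percent-encode "-{" as "%2D%7B" inside URLs only and leave the rest of the wikitext alone. A sketch under stated assumptions: the URL pattern is a rough approximation, not MediaWiki's exact external-link grammar, and `escape_url` / `escape_urls_in` are hypothetical names.

```python
import re
from urllib.parse import unquote

# Rough external-URL pattern (an assumption): a URL runs until whitespace,
# "]", "|", "<", ">" or a double quote.
URL = re.compile(r'https?://[^\s\]|<>"]+')


def escape_url(url: str) -> str:
    """Percent-encode both characters of every "-{" in a URL."""
    return url.replace("-{", "%2D%7B")  # "-" is %2D, "{" is %7B


def escape_urls_in(wikitext: str) -> str:
    """Apply escape_url() to URL parts only; other text is left untouched."""
    return URL.sub(lambda m: escape_url(m.group(0)), wikitext)


if __name__ == "__main__":
    fixed = escape_urls_in("[http://example.org/a-{b} text] x-{y")
    print(fixed)
    print(unquote("%2D%7B"))  # decodes back to the original "-{"
```

Because "-" is an unreserved character, "%2D" and "-" should be treated as equivalent by servers, so the escaped link points to the same resource.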

Future
After processing the list (all wikis), we could


 * notepad notes only


 * do
 * rerun the same listing with the same regex (expected: 10% of 1500 = 150 pages to reconsider)
 * refine the regex
 * list all lang-wikis
 * list all sister wikis
 * when in ns:Module, add some marking
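A re-run that also marks Module hits could look like this. A sketch under the same assumptions as before (page texts already in memory, canonical "Module:" prefix); `relist` and `is_module_source` are hypothetical names, and `HIT` is where a refined regex would replace the bare pattern.

```python
import re
from collections import Counter

# Bare "-{" pattern; a refined regex would replace this.
HIT = re.compile(r"-\{")


def is_module_source(title: str) -> bool:
    # Module:.../doc pages hold wikitext, so they are not marked as modules.
    return title.startswith("Module:") and not title.endswith("/doc")


def relist(wikis: dict[str, dict[str, str]]) -> dict[str, Counter]:
    """Per wiki, count pages with at least one hit, split into module/other."""
    report = {}
    for wiki, pages in wikis.items():
        counts = Counter()
        for title, text in pages.items():
            if HIT.search(text):
                counts["module" if is_module_source(title) else "other"] += 1
        report[wiki] = counts
    return report


if __name__ == "__main__":
    sample = {"en": {"Module:X": "--{a", "Module:X/doc": "-{b", "P": "none"}}
    print(relist(sample))
```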

For me and for this: maybe not needed (easier to run?!)
 * do not
 * not the quotes: we should understand the situations by then (they do help now!)
 * no split over url-chem-other subsets (because we want automated approach anyway)
 * no split per wiki (automated approach)


 * q
 * By the way, what if someone enters "-{" in enwiki in 2019?