Wikimedia Developer Summit/2017/Wikitext 2.0

Session Overview
Title: Wikitext 2.0

Day & Time: January 9, 2017; 1:30 - 2:40 pm

Room:

Phabricator Task Link: T151950

Facilitator(s):

Note-Taker(s): Tilman Bayer, Amir Aharoni

Remote Moderator:

Advocate:

Slides: https://commons.wikimedia.org/wiki/File:Wikitext_2.0.wikimedia.devsummit.2017.pdf

Purpose
Present a few different proposals for addressing problems in wikitext and get feedback about them: what seems reasonable, what is too much, what is too little, what things have we not accounted for, etc.

Discussion Topics

Action Items

Chronology
Subbu: "wikitext 2.0" is an umbrella term for a host of ideas, not a specific proposal

pop quizzes for audience about non-local effects

why is this a problem? (...)

e.g. section editing: preview of section might not be accurate

when I joined Parsoid team back in 2012, I got a lot of sympathetic comments ;)

Can we do anything about it?

Solution constraints: ...

Five ideas for the future

1. Parsoid and Linter (deployed later this quarter)

2. More structured processing model instead of string-based (Typed wikitext)

3. Force it to be balanced

...

The Parsoid experience:

..

Idea is to leverage this going forward

If you look at corpus, most templates (statistically) are well-behaved

but Parsoid has fair amount of hacks

can we use this information to fix broken wikitext

[slide] Linter: Migrate wikitext patterns

"Typed Wikitext" :

[slide] Why do we have non-local effects

string processing, without semantic information

preprocessing like in C compilers (replace transclusion)

semantic.. can cross boundaries of transclusions
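
A minimal sketch of the non-local effect described on this slide, assuming a toy expander. The `expand` and `unclosed_tags` functions and the template names are hypothetical and are not MediaWiki's preprocessor. Because transclusions are replaced as plain strings before any structure is parsed, an unclosed tag inside one template changes how the rest of the page is read.

```python
import re

# Hypothetical templates: "open-bold" leaves its <b> tag unclosed.
TEMPLATES = {
    "open-bold": "<b>important",
    "note": "see also",
}

def expand(wikitext):
    """Replace each {{name}} with its template text, purely as strings."""
    return re.sub(
        r"\{\{([^{}|]+)\}\}",
        lambda m: TEMPLATES[m.group(1).strip()],
        wikitext,
    )

def unclosed_tags(html):
    """Return names of tags opened but never closed (toy check, no attributes)."""
    stack = []
    for closing, name in re.findall(r"<(/?)(\w+)>", html):
        if not closing:
            stack.append(name)
        elif stack and stack[-1] == name:
            stack.pop()
    return stack

page = "{{open-bold}} and {{note}} and plain text"
expanded = expand(page)
# The <b> opened inside the first template now covers everything after it.
leaked = unclosed_tags(expanded)
```

Here the text outside the first template ends up inside the bold tag, even though nothing in the page source itself mentions bold.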

[slide] Typed wikitext

make sure we can recognize multi-template blocks

[slide] DOM Forest type

this is the biggest change

unbalanced is rendered as well-formed HTML

types: infobox, navbox (to explore)

[slide] Parsing strategy

[slide] Parsing strategy (contd.)

could generate things prohibited by HTML5

[slide] Pop quiz revisited

[slide] Potential benefits

...simpler codebase, language spec...

...incremental updates (templates etc?)

[slide] Challenges and drawbacks

Most importantly: can not be half-hearted, need to go all the way

[slide] Minimizing disruption

[slide] Balanced Templates

https://phabricator.wikimedia.org/T114445

[slide] Enforcing balance

add marker to well-behaved templates " "

Templates without the marker will render, but be "slower" or "less safe"
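
One way to picture what forcing a template's output to be balanced could mean, as a rough sketch only. The `Balancer` class and `balance` function are hypothetical and built on Python's standard `html.parser`. This is not the algorithm from T114445, and it ignores HTML5 nesting rules. It closes tags the fragment left open and drops closing tags that have no matching opener, so nothing leaks past the fragment boundary.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (partial list, for illustration).
VOID = {"br", "hr", "img", "wbr"}

class Balancer(HTMLParser):
    """Toy balancer: closes unclosed tags, drops stray closing tags."""

    def __init__(self):
        super().__init__(convert_charrefs=False)
        self.out = []
        self.stack = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID:
            self.stack.append(tag)
        self.out.append(self.get_starttag_text())

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray closing tag: drop it
        # Close everything opened after the matching start tag, then the tag itself.
        while self.stack:
            top = self.stack.pop()
            self.out.append("</%s>" % top)
            if top == tag:
                break

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append("&%s;" % name)

    def handle_charref(self, name):
        self.out.append("&#%s;" % name)

def balance(fragment):
    """Return the fragment with every opened tag closed inside it."""
    parser = Balancer()
    parser.feed(fragment)
    parser.close()
    while parser.stack:
        parser.out.append("</%s>" % parser.stack.pop())
    return "".join(parser.out)
```

For example, `balance("<b>important")` closes the bold tag inside the fragment, so text following the template is unaffected.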

[slide] Context & intent matters

b tag in first template leaked out

[slide] Context & intent matters (#2)

b tag doesn't leak out any more

template still doesn't do exactly what [one wants?]

had to close a tag here, open a tag there, did that based on knowledge what template does

[slide] tricky case:  tags

[slide] Variant: Balance inference

overlap with Subbu's proposal

get performance in all cases instead of just opt-in cases

[slide] Variant: "Zero Parsers in Core"

wikitext is just an extension

Parsers and Template engines can be pluggable

[slide] Part 2: Wikitext syntax

[slide] Pop quiz: Template processing (1)

[slide] Pop quiz: Template processing (2)

[slide] Pop quiz: Template processing (3)

again, inserting colon generates newline

C. Scott: Our templates and syntax are coupled together too tightly, which is unfortunate.

[slide] Pop quiz: Escaping

nowikis are a pretty reliable way of escaping, but they don't always work

= as a template will work, but must exist as such in the wiki.

entities mostly work, except where our regexes disallow them

[slide] Pop quiz: precedence

How are brackets grouped?

right-most opening thing wins

plus special rule for the number of braces: 3 win over 2

third rule: broken constructs get matched

Actually well-defined, Tim wrote this up years ago

[slide] Other syntactic monsters

will skip over these

[slide] Syntax is a problem

Very few 3rd party users of wikitext (perhaps none)

we don't really have an archive of Wikipedia, except archive.org

[slide] Five ideas for the future

(new version of this slide)

"i made a slight change, here are the 500 pages this breaks, please fix them"

now, two crazy ideas for syntax:

[slide] Long

[slide] Long template arguments

(new syntax for unbalanced templates)

[slide] A better way to escape arguments

[slide] Citation regions

will also let you do citation regions (from Brion[?]'s 2015 Wikimania talk)

[slide] Do we need more arguments

[slide] T149659

[slide] Syntax reform for wikitext

this is the closest we'll get to wikitext 2.0, which is why it's at the end of this talk ;)

"single sheet of paper": eliminate corner cases

eliminate backtracking: --> O(n) processing time

[slide] What does it look like?

Change some current syntax:

 bla  -> {  bla  }

Heading
-> {=Heading=}

(and some more)
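
A minimal sketch of why a unique opening delimiter allows a single O(n) pass, using only the {=Heading=} form shown above. The `tokenize` function and its token names are hypothetical and do not come from the proposal. The scanner knows which construct it is in as soon as it sees the opener, and an unclosed opener is emitted as literal text rather than triggering a re-scan.

```python
def tokenize(text):
    """Split text into ('text', s) and ('heading', s) tokens in one pass."""
    tokens = []
    i, n = 0, len(text)
    start = 0            # start of the current run of characters
    in_heading = False
    while i < n:
        two = text[i:i + 2]
        if not in_heading and two == "{=":
            if start < i:
                tokens.append(("text", text[start:i]))
            in_heading = True
            i += 2
            start = i
        elif in_heading and two == "=}":
            tokens.append(("heading", text[start:i]))
            in_heading = False
            i += 2
            start = i
        else:
            i += 1
    if in_heading:
        # Unclosed opener: keep it as literal text, no backtracking.
        tokens.append(("text", "{=" + text[start:]))
    elif start < n:
        tokens.append(("text", text[start:]))
    return tokens
```

The index `i` only ever moves forward, so each character is looked at a bounded number of times regardless of how many openers are left unclosed.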

[slides] Escaping is fundamental

[slide] credits

those were five ideas, what are yours?

DISCUSSION
Anomie: re single quote for italics, why not just use the i tag?

CScott: because it's more characters

I'm open to other options

Ed: ...

CScott: using a unique start tag avoids backtracking

Kaldari: benefits for developers obvious, but what are the benefits for editors?

Subbu: for the typed wikitext proposal, more predictable behavior of templates and wikitext

more fine grained editing (at the level of list items, table rows, cells, maybe) and fewer edit conflicts

performance (no longer have to be afraid of editing templates and overloading the cluster)

better editing experience in VE

CScott: gives more powerful option to e.g. edit lists(?) in VE?

Gabriel: what are the key questions you would like to get input on today?

Subbu: mostly, get an understanding on what kind (size) of changes might be acceptable

learn of concerns we don't know about

CScott: 1. everyone should know where it's going, leave with good idea of solution space

2. figure out priorities

Halfaker:

means of migration

I'm totally happy with learning new markup languages, but e.g. the Research namespace on Meta - I couldn't convince the Meta community

What are ways for me as early adopter?

Subbu: We are starting to think about it this quarter. Getting Linter up. Not starting from substantial changes, like syntax. Might be abstracted with Content Handler. The problems will begin when we change syntax.

I think most people won't even notice with enforcing typing, it affects mostly template authors

Kinzler:

multi-content revisions are a different topic, there will be an unconference session tomorrow

cross-type transclusions. We have a new type: "tabular data". Will it be possible to transclude it in other content? Transclusion using the content objects themselves?

CScott:

...sometimes extensions need to pass around (more elaborate DOMs ?), e.g. citation list

Forrester: When writing in non-Latin script (e.g. RTL), mixing it with wiki syntax might be terrible. Some communities depend on templates or other tricks to avoid actual wiki syntax.

CScott: that's actually an argument against b and i tags (because the letters are derived from the English words "bold" and "italic")

JForrester: we have one kind of transclusion that is already different, "File:"

Anomie: but localizing names would make cross-wiki copy-paste difficult (what's the German equivalent of "nowiki"?)

CScott: If you use VE, then it's auto-converted.

Ed: VE already can do this[?] with image copypaste, converts to DOM and back to wikitext

Brad: With some of these proposals, will we now have 3 parsers to maintain?

Subbu: The proposal with typed wikitext is to move over completely over time. As for Parsoid, the plan is to move over to using Parsoid instead of the PHP parser

Brad: Question about 3rd parties using MediaWiki

Tim: at offsite last year, discussed converting Parsoid from JS to PHP. Around 35000 lines of code. With HHVM, PHP might be ready for this

Gabriel: session tomorrow to discuss distribution (?) strategies for 3rd parties.

Subbu, CScott: straw poll on these five ideas:

Linting: good: 20 or so? bad: none

Typed wikitext: better semantic - many yes (15 or so?). one no/lower


#balance: fewer pros (than preceding items; 10 - 15), no noes

long argument syntax: few pros (similar to #balance), 1 or 2 nos?

reforming wikitext from scratch: yes: ca. 3, no: many more

CScott: converting everything to Markdown[?] (majority against)

Halfak: ways to help?

Subbu: help with talking to communities; come talk to us

CScott:

I wrote equivalent of mwparserfromhell with Parsoid[?], nobody uses it yet, if you have bot tasks etc. I would love to try it out

[question about Markdown idea]

JForrester: third option - add semantic info

JForrester: in VE, we would really like for Reading to move to Parsoid HTML

because then you could copypaste...

but that's hard