Wikimedia Developer Summit/2017/Wikitext 2.0
Session Overview
Title: Wikitext 2.0
Day & Time: January 9, 2017; 1:30 - 2:40 pm
Room:
Phabricator Task Link: T151950
Facilitator(s):
Note-Taker(s): Tilman Bayer, Amir Aharoni
Remote Moderator:
Advocate:
Session Summary
Purpose
Present a few different proposals for addressing problems in wikitext and get feedback about them: what seems reasonable, what is too much, what is too little, what things have we not accounted for, etc.
Agenda
Style
Discussion Topics
[To notetaker(s): What useful insights would you like to provide to someone who didn't attend the session? This section is the only one that makes this template different from the Etherpad template's chronology section. It's great if you were able to capture the entire conversation, but, in this section, we encourage you to add only a brief summary covering the key discussion points. Apply your imagination to categorize the content.]
Action Items
Chronology
Subbu: "wikitext 2.0" is an umbrella term for a host of ideas, not a specific proposal
pop quizzes for audience about non-local effects
why is this a problem? (...)
e.g. section editing: preview of section might not be accurate
when I joined Parsoid team back in 2012, I got a lot of sympathetic comments ;)
Can we do anything about it?
Solution constraints: ...
Five ideas for the future
1. Parsoid and Linter (deployed later this quarter)
2. More structured processing model instead of string-based (Typed wikitext)
3. Force it to be balanced
...
The Parsoid experience:
..
Idea is to leverage this going forward
If you look at corpus, most templates (statistically) are well-behaved
but Parsoid has fair amount of hacks
can we use this information to fix broken wikitext
[slide] Linter: Migrate wikitext patterns
"Typed Wikitext" :
[slide] Why do we have non-local effects
string processing, without semantic information
preprocessing like in C compilers (replace transclusion)
semantic.. can cross boundaries of transclusions
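A minimal sketch of the non-local effect described on this slide, assuming a toy template store and plain string substitution. The template name `OpenBold`, its body, and the helper functions are invented for illustration; this is not how the PHP parser or Parsoid is implemented.

```python
from html.parser import HTMLParser

# Hypothetical template store; the name and body are invented for this sketch.
TEMPLATES = {
    "OpenBold": "<b>important",  # opens <b> and never closes it
}


def expand_as_strings(text):
    """Preprocessor-style expansion: paste each template body into the page text."""
    for name, body in TEMPLATES.items():
        text = text.replace("{{" + name + "}}", body)
    return text


class BoldTracker(HTMLParser):
    """Collects the text that ends up inside a <b> element."""

    def __init__(self):
        super().__init__()
        self.depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "b":
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == "b" and self.depth > 0:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth > 0:
            self.chunks.append(data)


def bold_text(html):
    tracker = BoldTracker()
    tracker.feed(html)
    tracker.close()
    return "".join(tracker.chunks)


page = "{{OpenBold}} and then unrelated text"
expanded = expand_as_strings(page)
print(expanded)             # <b>important and then unrelated text
print(bold_text(expanded))  # important and then unrelated text
```

Because the template body is pasted in as a string, the unclosed tag ends up styling text that the template author never saw, which is the kind of boundary-crossing the slide refers to.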
[slide] Typed wikitext
make sure we can recognize multi-template blocks
[slide] DOM Forest type
this is the biggest change
unbalanced is rendered as well-formed HTML
types: infobox, navbox (to explore)
[slide] Parsing strategy
[slide] Parsing strategy (contd.)
could generate things prohibited by HTML5
[slide] Pop quiz revisited
[slide] Potential benefits
...simpler codebase, language spec...
...incremental updates (templates etc?)
[slide] Challenges and drawbacks
Most importantly: can not be half-hearted, need to go all the way
[slide] Minimizing disruption
[slide] Balanced Templates
https://phabricator.wikimedia.org/T114445
[slide] Enforcing balance
add marker to well-behaved templates "{{#balance}}"
Templates without the marker will render, but be "slower" or "less safe"
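As a rough illustration of what "well-behaved" could mean here, the sketch below checks whether a template's expanded output opens and closes its own tags in order. This is a simplified notion of balance chosen for the example, not the definition or algorithm from T114445; the `is_balanced` helper and the list of void elements are assumptions.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, assumed for this sketch).
VOID_ELEMENTS = {"br", "hr", "img", "input", "link", "meta", "wbr"}


class BalanceChecker(HTMLParser):
    """Tracks open tags on a stack and flags any mismatched close tag."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.ok = True

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_ELEMENTS:
            self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_ELEMENTS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        else:
            # Close tag with no matching open tag, or closed out of order.
            self.ok = False


def is_balanced(fragment):
    """True if every tag opened in the fragment is closed in it, in order."""
    checker = BalanceChecker()
    checker.feed(fragment)
    checker.close()
    return checker.ok and not checker.stack


print(is_balanced("<b>bold</b> text"))  # True
print(is_balanced("<b>bold"))           # False: <b> would leak out
print(is_balanced("text</b>"))          # False: closes a tag it did not open
```

A check along these lines could, in principle, decide whether a template is eligible for a marker like `{{#balance}}`; the actual eligibility rules are not described in these notes.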
[slide] Context & intent matters
<b> tag in first template leaked out
[slide] Context & intent matters (#2)
b tag doesn't leak out any more
{{Tb}} template still doesn't do exactly what [one wants?]
had to close a tag here, open a tag there, did that based on knowledge what template does
[slide] tricky case: <a> tags
[slide] Variant: Balance inference
overlap with Subbu's proposal
get performance in all cases instead of just opt-in cases
[slide] Variant: "Zero Parsers in Core"
wikitext is just an extension
Parsers and Template engines can be pluggable
[slide] Part 2: Wikitext syntax
[slide] Pop quiz: Template processing (1)
[slide] Pop quiz: Template processing (2)
[slide] Pop quiz: Template processing (3)
again, inserting colon generates newline
C. Scott: Our templates and syntax are coupled together too tightly, which is unfortunate.
[slide] Pop quiz: Escaping
nowikis are a pretty reliable way of escaping, but they don't always work
{{=}} as a template will work, but must exist as such in the wiki.
entities mostly work, except where our regexes disallow them
[slide] Pop quiz: precedence
How are brackets grouped?
right-most opening thing wins
plus special rule for number of braces: 3 win over 2
third rule: broken constructs get matched
Actually well-defined, Tim wrote this up years ago
[slide] Other syntactic monsters
will skip over these
[slide] Syntax is a problem
Very few 3rd party users of wikitext (perhaps none)
we don't really have an archive of Wikipedia, except archive.org
[slide] Five ideas for the future
(new version of this slide)
"i made a slight change, here are the 500 pages this breaks, please fix them"
now, two crazy ideas for syntax:
[slide] Long
[slide] Long template arguments
(new syntax for unbalanced templates)
[slide] A better way to escape arguments
[slide] Citation regions
will also let you do citation regions (from Brion[?]'s 2015 Wikimania talk)
[slide] Do we need more arguments
[slide] T149659
[slide] Syntax reform for wikitext
this is the closest we'll get to wikitext 2.0, which is why it's at the end of this talk ;)
"single sheet of paper" eliminate corner cases
eliminate backtracking --> O(n) processing time
[slide] What does it look like?
Change some current syntax:
'''bla''' -> {''bla''}
==Heading== -> {=Heading=}
(and some more)
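The two rewrites noted above can be sketched as a small converter. This covers only the two examples captured in these notes (triple-quote bold and a level-2 heading); how the proposal treats other heading levels, italics, or nested quotes is not recorded here, so the function leaves everything else untouched. The function name and regular expressions are assumptions for illustration.

```python
import re

# '''bla''' -> {''bla''}  (only the simple, non-nested case)
BOLD = re.compile(r"'''(.+?)'''")

# ==Heading== -> {=Heading=}  (level-2 headings only; other levels are left alone)
HEADING_2 = re.compile(r"^==([^=].*?)==[ \t]*$", re.MULTILINE)


def to_proposed_syntax(text):
    """Rewrite the two constructs shown on the slide; leave the rest as-is."""
    text = BOLD.sub(r"{''\1''}", text)
    text = HEADING_2.sub(r"{=\1=}", text)
    return text


print(to_proposed_syntax("==Intro==\nSome '''bold''' text"))
# {=Intro=}
# Some {''bold''} text
```

Each construct in the new form begins with a distinct opening brace, which relates to the stated goal of letting a parser commit on the start token instead of backtracking.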
[slides] Escaping is fundamental
[slide] credits
those were five ideas, what are yours?
DISCUSSION
Anomie: re single quote for italics, why not just use <i>?
CScott: because it's more characters
I'm open to other options
Ed: ...
CScott: using a unique start tag avoids backtracking
Kaldari: benefits for developers obvious, but what are the benefits for editors?
Subbu: for the typed wikitext proposal, more predictable behavior of templates and wikitext
more fine grained editing (at the level of list items, table rows, cells, maybe) and fewer edit conflicts
performance (no longer have to be afraid of editing templates and overloading the cluster)
better editing experience in VE
CScott: gives more powerful option to e.g. edit lists(?) in VE?
Gabriel: what are the key questions you would like to get input on today?
Subbu: mostly, get an understanding on what kind (size) of changes might be acceptable
learn of concerns we don't know about
CScott: 1. everyone should know where it's going, leave with good idea of solution space
2. figure out priorities
Halfaker:
means of migration
I'm totally happy with learning new markup languages, but e.g. the Research namespace on Meta - I couldn't convince the Meta community
What are ways for me as early adopter?
Subbu: We are starting to think about it this quarter. Getting Linter up. Not starting from substantial changes, like syntax. Might be abstracted with Content Handler. The problems will begin when we change syntax.
I think most people won't even notice with enforcing typing, it affects mostly template authors
Kinzler:
multi-content revisions are a different topic, there will be an unconference session tomorrow
cross-type transclusions. We have a new type: "tabular data". Will it be possible to transclude it in other content? transclusion using content objects itself?
CScott:
...sometimes extensions need to pass around (more elaborate DOMs ?), e.g. citation list
Forrester: When writing in non-Latin script (e.g. RTL), mixing it with wiki syntax might be terrible. Some communities depend on templates or other tricks to avoid actual wiki syntax.
CScott: that's actually an argument against <b> and <i> tags (because the letters are derived from the English words "bold" and "italic")
JForrester: we have one kind of transclusion that is already different, "File:"
Anomie: but localizing names would make cross-wiki copy-paste difficult (what's the German equivalent of "nowiki")
CScott: If you use VE, then it's auto-converted.
Ed: VE already can do this[?] with image copypaste [ ie. "mini| .. ..> "[[File:...|thumb|..]] ?], converts to DOM and back to wikitext
Brad: With some of these proposals, will we now have 3 parsers to maintain?
Subbu: The proposal with typed wikitext is to move over completely over time; As for Parsoid, the plan is to move over to using Parsoid instead of the PHP parser
Brad: Question about 3rd parties using MediaWiki
Tim: at offsite last year, discussed converting Parsoid from JS to PHP. Around 35000 lines of code. With HHVM, PHP might be ready for this
Gabriel: session tomorrow to discuss distribution (?) strategies for 3rd parties.
Subbu, CScott: straw poll on these five ideas:
Linting: good: 20 or so? bad:none
Typed wikitext: better semantic - many yes (15 or so?). one no/lower
#balance: fewer pros (than preceding items; 10 - 15), no noes
long argument syntax: few pros (similar to #balance), 1 or 2 nos?
reforming wikitext from scratch: yes: ca. 3, no: many more
CScott: converting everything to Markdown[?] (majority against)
Halfak: ways to help?
Subbu: help with talking to communities; come talk to us
CScott:
I wrote equivalent of mwparserfromhell with Parsoid[?], nobody uses it yet, if you have bot tasks etc. I would love to try it out
[question about Markdown idea]
JForrester: third option - add semantic info
JForrester: in VE, we would really like for Reading to move to Parsoid HTML
because then you could copypaste...
but that's hard
<end of session>