Topic on Talk:Wikimedia Developer Summit/2017/Topic ideas

WikiDev17 topic: Wikitext

RobLa-WMF (talkcontribs)
This is the text of the "Wikitext" section of WikiDev17/Topic ideas as of this writing:

How do we make manipulating the information on our websites easier and more useful? (both for humans and computers). Improving Modular Wikitext Maintenance

  • Infoboxes from wikidata, categories from wikidata, wikidata in commons, oh my!
  • Visual editing of templates, alternative template mechanisms, etc
  • Wikitext 2.0 -- how to shave off the rough edges but still provide a text-based power-user editing interface
  • Global pages, Global templates, etc
  • Improving composition of text and media content on the page
  • Moving to a Glossary model for LanguageConverter rules
  • Splitting metadata (categories, page flags, etc) from content in the DB
  • Multi-Content Revisions

Fora: wikitech-l, wikitext-l, and Wikidata readers and participants

Qgil-WMF (talkcontribs)

Isn't "Wikitext" too wide an umbrella for a Summit main topic?

Cscott (talkcontribs)

It may be a wide umbrella, but the Venn diagram combining wikitext development with typical summit participants is small! Most WMF engineers don't actually work directly with wikitext or the wikitext workflow (templates, scripts, gadgets), odd though that may sound. This suggests that the actual breadth of the topic will be limited in practice by the expertise of the folks we can recruit to participate.

(And besides, "wikitext" was Rob's title. My topic description was "maintenance of wikitext", more or less.)

RobLa-WMF (talkcontribs)

A couple months ago, @John Vandenberg and I had a great conversation about the future of Wikitext (in the "CommonMark" topic on the Markdown RFC). He wrote:

[A world where Markdown becomes the accepted standard for wikitext is a world] in which the MediaWiki (and wiki) technical community has failed (that includes me), as the world overtook it with a less useful format, and the generic terms 'wiki' and 'markup' have lost their meaning, and the Wikimedia content is stored in a format that is no longer a format that MediaWiki believes and invests in.

He suggests that MediaWiki wikitext is still in the running, and that we should do a few things if we believe in the format. His suggestions:

  • Invest heavily [into the Wikitext spec], and into alternative/reference implementations, including providing a saner version of the syntax that cater for the needs of other organisations that consider the security and processing overhead of wikitext to be problems compared to markdown. It should be the default for new MediaWiki installs, and older installs should have a nice tool that converts the crazy unspecified wikitext into 'wikitext-simplified' where possible.
  • Work with other wiki vendors that might want to make 'markdown' syntax an implicit choice, not clearly identified as markdown syntax, stressing that causes confusion for users.
  • And work with other wiki vendors to increase compatibility between their syntax and ours, so that 'wikitext' and 'wikitext-simplified' are viable long term formats.

Other wiki vendors have given us up for dead (see Google Trends comparison for MediaWiki and Markdown over time). Let's not roll over and play dead. WikiDev17 seems like a great opportunity to gather the wiki markup community and inclusively shape our long term markup strategy.

Cscott (talkcontribs)

Hm, I'm not sure I agree that we should double down on wiki markup. That's partly the sunk-cost fallacy, I think. Markdown has succeeded in large part because it's generally *nicer to write*, with fewer corner cases in the syntax and fewer special cases that apply only to mediawiki ([[..]] syntax, magic RFC links, {{..}} behavior, etc). I certainly prefer to write markdown where possible, although that's in large part due to the `...` construct, which is extremely useful in the type of writing I do most often.

We have written a markdown-to-wikitext converter using Parsoid. The main thing that "the less useful format" (markdown) is missing is an extension mechanism (which other competitors such as reStructuredText include). If a standard extension mechanism could be defined for markdown, then you could build a template mechanism and special "link to a WMF project" syntax on top of that.

As it turns out, the template mechanism we have for wikitext is pretty fragile as well, in addition to being completely non-portable to non-mediawiki usage. And it's pretty fundamental! So thinking hard about a more elegant and general template mechanism -- or at least cleaning up some of the grungy corners of our existing mechanism, like token concatenation and start-of-line fudging -- would be vital if we actually wanted to push "wikitext" further into the world.
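To make the "token concatenation" corner concrete, here is a minimal sketch of what can go wrong when templates are expanded purely as text. It is a toy model, not how MediaWiki or Parsoid actually expand templates: the `lb` template, the `expand` and `links` helpers, and the two regular expressions are all hypothetical and only illustrate the idea.

```python
import re

# Hypothetical template table: "lb" emits only the opening half of a link.
TEMPLATES = {"lb": "[["}


def expand(source: str) -> str:
    """Replace each {{name}} with its template body, purely as text."""
    return re.sub(r"\{\{(\w+)\}\}", lambda m: TEMPLATES[m.group(1)], source)


def links(source: str) -> list:
    """Find complete [[target]] links in a piece of text."""
    return re.findall(r"\[\[([^\[\]]+)\]\]", source)


page = "{{lb}}Foo]]"
expanded = expand(page)

# Before expansion there is no complete link in the page source;
# after expansion the template output and the page text fuse into one.
print(links(page))
print(expanded)
print(links(expanded))
```

Neither the template nor the page contains a whole link, yet one appears once their text is glued together. A tool that looks at the page source alone cannot tell where the link starts, which is part of why this kind of behavior is hard to specify and hard to port.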

But in any case, I do think that work on "wikitext simplified" (syntax and templates) would be interesting. Lots of our power users would like something that is very close to the wikitext they are familiar with, but without some of the hidden pitfalls. Lots of our engineers would like to have a parser/template implementation which is drastically simpler than what we have now.

But the other side of this ought to be (in my opinion) decoupling the core of mediawiki from the particular choice of markup language. If you want to have a pure-VE wiki with HTML-native storage, you should be able to. If you want to use markdown (+a template mechanism) for your wiki, you should be able to do that. Similarly with "legacy wikitext" and "wikitext 2.0" and who-knows-what in the future. Step 0 was gwicke's introduction of "lossless round trip translation" as embodied with Parsoid, which decoupled the user experience from the actual markup stored in our database. Step 1 is completing that process so the entire mediawiki core is markup agnostic. *Then* we can have a thousand flowers bloom. If one of the hardy flowers is "simplified wikitext", hurrah. If users really prefer markdown syntax, good for them. We can make both work, and (to a large degree) even let "edit in markdown syntax" co-exist with "edit in wikitext syntax", with the content represented in the underlying database using some other format altogether.
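As a rough illustration of what a markup-agnostic core could look like, the sketch below keeps content in a neutral stored form and treats each syntax as a pluggable pair of parse and serialize functions. Everything here is an assumption made for illustration: the `Doc` span list, `make_handler`, `HANDLERS`, and `convert` are hypothetical names, the only feature supported is bold text, and real wikitext quote handling is far more involved than this.

```python
import re
from typing import Callable, Dict, List, Tuple

# Toy stored form: a list of (kind, text) spans, where kind is "text" or "bold".
Doc = List[Tuple[str, str]]
Handler = Tuple[Callable[[str], Doc], Callable[[Doc], str]]


def make_handler(marker: str) -> Handler:
    """Build a (parse, serialize) pair for a toy syntax whose only feature is bold."""
    pattern = re.compile(re.escape(marker) + r"(.+?)" + re.escape(marker))

    def parse(source: str) -> Doc:
        doc: Doc = []
        pos = 0
        for m in pattern.finditer(source):
            if m.start() > pos:
                doc.append(("text", source[pos:m.start()]))
            doc.append(("bold", m.group(1)))
            pos = m.end()
        if pos < len(source):
            doc.append(("text", source[pos:]))
        return doc

    def serialize(doc: Doc) -> str:
        return "".join(marker + t + marker if kind == "bold" else t for kind, t in doc)

    return parse, serialize


# Pluggable registry: the "core" only knows the stored form, not any syntax.
HANDLERS: Dict[str, Handler] = {
    "wikitext": make_handler("'''"),
    "markdown": make_handler("**"),
}


def convert(source: str, src: str, dst: str) -> str:
    """Parse with one syntax, serialize with another, via the stored form."""
    parse, _ = HANDLERS[src]
    _, serialize = HANDLERS[dst]
    return serialize(parse(source))


print(convert("a '''b''' c", "wikitext", "markdown"))
print(convert("a '''b''' c", "wikitext", "wikitext"))
```

Adding another syntax in this model means registering one more handler, and an edit made in one syntax can be shown in another as long as both round-trip through the stored form without loss. The toy does not attempt escaping, so text that happens to contain the other syntax's marker would not survive conversion.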

Cscott (talkcontribs)

At the parsing team offsite, my thoughts above got boiled down into a "zero parsers in core" proposal (parsers should be fully pluggable and optional), and a concrete "simplified wikitext" proposal that I hope to write up properly soon. (In the meantime see https://gerrit.wikimedia.org/r/316237 ).

It also became apparent (to me at least) that "wikitext syntax" and "template semantics" are largely orthogonal. Some of what we think of as "wikitext 2.0" is actually related to the template engine, and can be worked in with proposals like {{#balance}} irrespective of wikitext syntax reform discussions.

SSastry (WMF) (talkcontribs)

Exactly! :-) I think syntactical changes should be decoupled from changes to the processing model. I think we should go all the way with adopting DOM semantics for wikitext and {{#balance}} is just the first step along the way.

SSastry (WMF) (talkcontribs)

I think Parsing/Notes/Wikitext 2.0, Parsing/Notes/Wikitext 2.0/Strawman Spec, and Parsing/Notes/Two Systems Problem are all relevant to this topic. We are going to be talking in depth about some of these at the parsing team offsite in October and possibly the editing team offsite as well, but yes, there are a lot of group discussions to be had in this area to cohere around strategies. I think "evolving wikitext" or "technical debt in wikitext" or "wikitext maintenance" are all interesting subtopics in this area.

SSastry (WMF) (talkcontribs)

And, to clarify my comment about "group discussions ... to cohere around strategies", the discussions will probably benefit from the participation of template editors, bot writers, and tooling developers.

Qgil-WMF (talkcontribs)

Reading back the list of points in @RobLa-WMF's first post, I think one title that would capture the intention could be "Handling wiki content beyond plaintext".

This main topic would be a good umbrella for those points and the topics discussed here.

Antigng (talkcontribs)

Any attempt, in the name of evolution, modification, or technical debt payment, that changes the wikitext syntax greatly will never be supported by the community. Storing content in any form other than the old wikitext (no matter how efficiently it is parsed) is also a bad idea, since this will make database dumping (in wikitext form) impossible, and all bots that rely on dumped data will stop working.
