Parsoid/Wikimania 2014

Let's summarize our talk plans for Wikimania.

Talk abstract
Parsoid is changing the way we can work with wiki content by representing it as equivalent and editable semantic HTML+RDFa markup. It powers the VisualEditor, but is also used by a growing number of innovative projects including the Flow discussion system, the Kiwix offline reader, and the new ContentTranslation and PDF rendering systems. In the longer term it is on track to provide the default content representation and Wikitext user interface for MediaWiki.

In this presentation, we will illustrate some of the problems we faced while building the bi-directional conversion between Wikitext and HTML. We will show how we addressed some of them, and which limitations remain. We will also describe how we systematically test the quality of the conversion to catch issues like 'dirty diffs' early before they break pages in production, and where this testing has failed in the past.

The second focus of the presentation will be on how the HTML+RDFa format and the Parsoid API can help you write more powerful gadgets, bots, and editing or data extraction tools. We will illustrate this using examples from existing projects. As a more hands-on example, we will demonstrate a small editing gadget for micro-contributions.

Finally, we will close our presentation by talking about future plans for Parsoid and MediaWiki's content representation in general. This includes directly storing HTML+RDFa for pages to speed up the site for editors, and research into HTML-based templating and visual diffing.

Questions to answer

 * What is the problem we are trying to solve?
 * Why is this hard? (examples!)
 * How do we address those problems?
 * How does having HTML+RDFa enable new features?
 ** Kiwix, PDF rendering, LintTrap, Google, translations, Flow, ...
 * How can I use this in my gadget / bot / whatever?
 ** How to use the API (hopefully with a save API by then)
 * What are the future plans re content, templating etc?
 ** Rashomon, fast page loads for logged-in users, HTML templating, ?
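
For the gadget / bot question, one possible demo is pulling template calls out of the HTML without touching wikitext. The sketch below is only an illustration. It assumes that Parsoid marks template output with typeof="mw:Transclusion" and stores the call in a JSON data-mw attribute with a "parts" list, as in the hand-written sample string. The names TransclusionCollector, find_templates and SAMPLE_HTML are hypothetical, and real output should be checked against the current DOM spec.

```python
import json
from html.parser import HTMLParser


class TransclusionCollector(HTMLParser):
    """Collect (template name, params) pairs from elements marked as transclusions."""

    def __init__(self):
        super().__init__()
        self.templates = []

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        # Assumption: template output carries typeof="mw:Transclusion".
        if "mw:Transclusion" not in (attr.get("typeof") or "").split():
            return
        # Assumption: data-mw holds JSON with a "parts" list of template calls.
        data_mw = json.loads(attr.get("data-mw") or "{}")
        for part in data_mw.get("parts", []):
            # Parts may also be plain strings (literal wikitext); skip those.
            if isinstance(part, dict) and "template" in part:
                template = part["template"]
                name = template.get("target", {}).get("wt", "").strip()
                params = {
                    key: value.get("wt", "")
                    for key, value in template.get("params", {}).items()
                }
                self.templates.append((name, params))


def find_templates(html):
    """Return a list of (name, params) tuples for every template call found."""
    collector = TransclusionCollector()
    collector.feed(html)
    collector.close()
    return collector.templates


# Hand-written sample, not real Parsoid output.
SAMPLE_HTML = (
    '<span about="#mwt1" typeof="mw:Transclusion" '
    'data-mw=\'{"parts":[{"template":{"target":{"wt":"convert"},'
    '"params":{"1":{"wt":"5"},"2":{"wt":"km"}},"i":0}}]}\'>5 km</span>'
)

print(find_templates(SAMPLE_HTML))
```

For the sample string, find_templates returns one entry: the template name "convert" with parameters "1" = "5" and "2" = "km". A bot could use the same approach to audit or rewrite template arguments.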

Outline
Here's one possible outline (cscott):


 * 1) Introduction / motivating example
 * 2) The difficulties of wikitext
 * 3) * wikitext tarpits
 * 4) * parser codebase
 * 5) * practical issues: hard to write bots, etc
 * 6) Vision
 * 7) * A more standard representation
 * 8) * Editable with existing CE tools
 * 9) The HTML+RDFa promised land
 * 10) * Some examples: it's just HTML!
 * 11) ** can use jQuery to find all links, etc
 * 12) * RDFa semantic data
 * 13) Current applications
 * 14) * VisualEditor
 * 15) * Kiwix
 * 16) * PDF rendering
 * 17) Future applications
 * 18) * LintTrap
 * 19) * better templating
 * 20) * easier bots
 * 21) * unified storage?
 * 22) Community
 * 23) * How can I use Parsoid? (Parsoid API service, etc)
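
To make the "it's just HTML!" point concrete, here is a minimal sketch of the find-all-links idea, written in Python with only the standard library instead of jQuery. It assumes wiki links are emitted as a elements with rel="mw:WikiLink" and external links with rel="mw:ExtLink". The sample markup is hand-written for illustration, and WikiLinkCollector, find_wiki_links and SAMPLE_HTML are hypothetical names.

```python
from html.parser import HTMLParser


class WikiLinkCollector(HTMLParser):
    """Collect href values of <a> elements whose rel includes mw:WikiLink."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr = dict(attrs)
        # Assumption: internal wiki links are tagged rel="mw:WikiLink".
        rel_tokens = (attr.get("rel") or "").split()
        if "mw:WikiLink" in rel_tokens and attr.get("href"):
            self.links.append(attr["href"])


def find_wiki_links(html):
    """Return the href of every wiki link in the given HTML, in document order."""
    collector = WikiLinkCollector()
    collector.feed(html)
    collector.close()
    return collector.links


# Hand-written sample, not real Parsoid output.
SAMPLE_HTML = (
    '<p>See <a rel="mw:WikiLink" href="./Kiwix">Kiwix</a> and '
    '<a rel="mw:ExtLink" href="https://example.org/">an external site</a>, '
    'then <a rel="mw:WikiLink" href="./Parsoid">Parsoid</a>.</p>'
)

print(find_wiki_links(SAMPLE_HTML))
```

For the sample, only the two wiki links are returned and the external link is skipped. In a browser gadget the equivalent would be a single attribute selector on the rel value, which is the contrast with regex-matching wikitext that the talk could draw.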