Topic on Talk:VisualEditor/Design/Software overview


I think you're making some serious mistakes there. I was also thinking about such a live editor and semantic autoformatting, and instead of starting to hack right away (OK, I did, and rewrote Preprocessor_DOM in JavaScript) I pondered a lot about parsing and editing. I would have loved to join the hackathon, but I had to study for my exams.

At first I also thought about a top-down document model, but I quickly came to the conclusion that this is only doable for very, very simple pages. An autoformatter that sees an unclosed table/div/whatever never knows what's hidden in the templates that follow. A live parser/autoformatter/semantic lexer has to use a bottom-up model, just like the current parser. The steps would be:

  1. Extracting the XML-like tag hooks, comments and inclusion handlers (what to do if they are malformed? Currently they run to the end)
  2. Parsing headings, templates and template arguments
  3. Expanding templates
  4. Parsing wikitext into tables/blocks/images/whatever and doing text annotations
  5. Tidying the generated HTML for output

The current parser does the first two steps together, but semantically they could be separated. I'm not sure about the fourth step; I haven't dived into the source code yet, so maybe I'm writing nonsense about that.
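To illustrate why the order of the steps matters, here is a minimal sketch of steps 1 to 4 in Python. It is a toy, not the actual MediaWiki parser: the function names, the `TEMPLATES` store and the `table-end` template are made up for the example, templates take no arguments, and "parsing tables" is reduced to counting table opens and closes.

```python
import re

# Hypothetical template store; the name and content are made up for illustration.
TEMPLATES = {"table-end": "|}"}


def strip_comments(text):
    """Step 1 (simplified): drop HTML comments; an unclosed one runs to the end."""
    return re.sub(r"<!--.*?(?:-->|\Z)", "", text, flags=re.S)


def expand_templates(text, templates, max_depth=10):
    """Steps 2-3 (simplified): replace argument-less {{name}} calls by their body."""
    pattern = re.compile(r"\{\{([^{}|]+)\}\}")

    def substitute(match):
        name = match.group(1).strip()
        # Unknown templates are left untouched.
        return templates.get(name, match.group(0))

    for _ in range(max_depth):  # guard against self-including templates
        expanded = pattern.sub(substitute, text)
        if expanded == text:
            break
        text = expanded
    return text


def unclosed_tables(text):
    """Step 4 (very simplified): count table opens minus table closes."""
    lines = [line.lstrip() for line in text.splitlines()]
    opened = sum(1 for line in lines if line.startswith("{|"))
    closed = sum(1 for line in lines if line.startswith("|}"))
    return opened - closed


source = '{| class="wikitable"\n| cell <!-- note -->\n{{table-end}}'

# Looking only at the page source, the table appears to be unclosed.
before = unclosed_tables(strip_comments(source))

# After expanding templates, the closing "|}" shows up and the table is balanced.
after = unclosed_tables(expand_templates(strip_comments(source), TEMPLATES))
```

In this sketch `before` is 1 and `after` is 0: a formatter that judges the table before step 3 would wrongly "fix" a page that is actually well-formed.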

My conclusion is that a semantic lexer has to start at the bottom, while an autoformatter or editing transaction needs to run back down from the top (the generated result). Anything else would restrict the syntax that can be supported.

Of course, I think it's right to have the document-block-annotatedText model as a data format for saving pages, with the possibility of rendering to HTML4, HTML5, PDF, RSS etc., for quickly generating cached content and, most of all, for creating diffs. But for editing we will have to go deeper into wikitext, which has to stay as uncomfortable as it is today, and templates should not be a part of the DOM.
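For concreteness, this is roughly how I picture such a document-block-annotatedText structure. It is only a sketch under my own assumptions: the class and field names are invented here and are not taken from the VisualEditor design, and the renderer only handles annotations that do not overlap.

```python
from dataclasses import dataclass, field
from html import escape


@dataclass
class Annotation:
    start: int  # offset of the first annotated character
    end: int    # offset just past the last annotated character
    tag: str    # e.g. "b" or "i"


@dataclass
class Block:
    kind: str   # e.g. "p" or "h2"
    text: str
    annotations: list = field(default_factory=list)


@dataclass
class Document:
    blocks: list = field(default_factory=list)


def block_to_html(block):
    """Render one block; assumes the annotations do not overlap."""
    parts = []
    pos = 0
    for ann in sorted(block.annotations, key=lambda a: a.start):
        parts.append(escape(block.text[pos:ann.start]))
        inner = escape(block.text[ann.start:ann.end])
        parts.append("<%s>%s</%s>" % (ann.tag, inner, ann.tag))
        pos = ann.end
    parts.append(escape(block.text[pos:]))
    return "<%s>%s</%s>" % (block.kind, "".join(parts), block.kind)


def document_to_html(doc):
    """One possible output format; PDF or RSS would walk the same structure."""
    return "\n".join(block_to_html(block) for block in doc.blocks)


doc = Document([
    Block("h2", "Constraints"),
    Block("p", "Templates are hard", [Annotation(14, 18, "b")]),
])
html_output = document_to_html(doc)
```

Here `html_output` is `<h2>Constraints</h2>` followed by `<p>Templates are <b>hard</b></p>` on the next line. Since the text and the annotations are stored separately, a diff can compare blocks without having to reparse any markup.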

Trevor Parscal (WMF)

You have some great points and have clearly thought this out. Most of what you are focusing on has to do with the parser, so you might want to get involved over here. One thing I will say, though, is that it's important to remember that there are, and always will be, many edge cases that aren't being addressed. What we hope to do is meet in the middle, between supporting exotic cases and reforming content. While it may not be reasonable for us to support every imaginable edge case, it is quite reasonable for us to provide alternative solutions to the use cases that are causing the edge cases. With careful consideration and research, these alternative solutions can serve the use case and the editor software equally well. It's important to keep a sense of balance in this work, neither diving too deep into edge cases nor pretending there are none. Hopefully you can help User:Brion VIBBER and others who are focused on the parser to keep that balance and contribute your expertise.

This post was posted by Trevor Parscal (WMF), but signed as Trevor Parscal.
