User:SSastry (WMF)

My name is Subramanya Sastry and in May 2012, I joined the Visual Editor team at the Wikimedia Foundation as a senior software engineer. I will primarily work with Gabriel Wicke on the Parsoid backend parser piece.

Wiki pages with wikitext use cases/tests

 * Quotes
 * UL/OL Lists
 * DL Lists

Other useful wiki pages to test against

 * MediaWiki Formatting Help page: Help:Formatting
 * Big page that can be a stressor: en:Wikipedia:Village_pump_(technical)

Parsoid/VE Notes
Notes I am making as I work through the code/algorithms/strategies for parsing wikitext in the context of the Visual Editor project. These notes may reflect a partial understanding or even misunderstanding of the issues involved, and are more notes to myself than anything else.
 * Handling whitespace

While the specific newline issues that prompted the note above have mostly been addressed, its broad idea is applicable and possibly useful in a more general sense, not just for whitespace: use the original wikitext to serialize most of the original text. This has the added benefit that minor edits do not require serializing a humongous DOM. For example, if someone corrects a typo on the Barack Obama page, does it really make sense to re-serialize everything? Would it be simpler to issue a patch request to the PHP service to string-replace specific sections of the original wikitext?
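
The string-replace step itself is simple. Below is a minimal sketch, assuming the editor can describe each change as a hypothetical (start, end, replacement) triple of character offsets into the original wikitext. The patch format and the apply_patches name are illustrative only, not an existing Parsoid or MediaWiki API.

```python
from typing import List, Tuple

# Hypothetical patch format: (start, end, replacement), where start/end are
# character offsets into the ORIGINAL wikitext (end is exclusive).
Patch = Tuple[int, int, str]


def apply_patches(wikitext: str, patches: List[Patch]) -> str:
    """Splice replacement strings into the original wikitext.

    Untouched text is copied verbatim, so a one-word typo fix on a huge
    page never requires re-serializing the whole document.
    """
    ordered = sorted(patches, key=lambda p: p[0])

    # Reject overlapping or out-of-range patches instead of guessing.
    prev_end = 0
    for start, end, _ in ordered:
        if start < prev_end or end < start or end > len(wikitext):
            raise ValueError(f"bad or overlapping patch at {start}-{end}")
        prev_end = end

    pieces = []
    cursor = 0
    for start, end, replacement in ordered:
        pieces.append(wikitext[cursor:start])  # unchanged source text
        pieces.append(replacement)             # edited section
        cursor = end
    pieces.append(wikitext[cursor:])           # tail after the last patch
    return "".join(pieces)


# Usage: fix a single typo without touching the rest of the page.
src = "'''Barack Obama''' is the 44th Presidnet of the [[United States]]."
start = src.index("Presidnet")
fixed = apply_patches(src, [(start, start + len("Presidnet"), "President")])
print(fixed)  # '''Barack Obama''' is the 44th President of the [[United States]].
```

Patches are applied in a single left-to-right pass over the original offsets, so earlier replacements never shift the positions of later ones.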

More generally, it may be useful to think of serialization as a diff patch where applicable. I am not sure how easy this will be, but it is worth considering for large pages, where over time most changes will be minor relative to the size of the page. Serialization has to be complete in and of itself to support all use cases and cannot rely on modification hints for correctness. But modification hints from the Visual Editor could help the serializer optimize performance by focusing on the modified bits and patching the source wikitext string rather than regenerating it from scratch.
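
A minimal sketch of hint-driven selective serialization follows. It assumes each top-level node carries a hypothetical src_range (offsets into the original wikitext) and that the editor reports the ids of nodes it touched. Node, serialize_node, and modified_ids are illustrative names, not Parsoid's actual data model. Without hints, every node is regenerated by the full serializer, so that path stays complete on its own. Hints only let the serializer copy untouched nodes verbatim. Separators between blocks are simply normalized to a blank line here; a real serializer would also have to preserve the original inter-node whitespace.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Node:
    # Hypothetical top-level DOM node: an id, a tag, its current content,
    # and (start, end) offsets into the original wikitext, if known.
    node_id: str
    tag: str  # e.g. "p" or "h2"
    text: str
    src_range: Optional[Tuple[int, int]] = None


def serialize_node(node: Node) -> str:
    """Stand-in for the full DOM-to-wikitext serializer (hypothetical)."""
    if node.tag == "h2":
        return f"== {node.text} =="
    return node.text


def serialize(nodes: List[Node], original: str,
              modified_ids: Optional[Set[str]] = None) -> str:
    """Serialize top-level nodes, reusing original wikitext where hinted.

    modified_ids=None means "no hints": every node is regenerated.
    With hints, unmodified nodes that have a source range are copied
    verbatim from the original wikitext.
    """
    out = []
    for node in nodes:
        reuse = (modified_ids is not None
                 and node.node_id not in modified_ids
                 and node.src_range is not None)
        if reuse:
            start, end = node.src_range
            out.append(original[start:end])  # untouched: keep source as-is
        else:
            out.append(serialize_node(node))  # modified or unknown: regenerate
    # Simplification: block separators are normalized to one blank line.
    return "\n\n".join(out)


# Usage: only the paragraph was edited (typo fix "teh" -> "the").
original = "==History==\n\nSome text with a typo: teh end."
p_start = original.index("Some")
nodes = [
    Node("n1", "h2", "History", (0, len("==History=="))),
    Node("n2", "p", "Some text with a typo: the end.", (p_start, len(original))),
]
print(serialize(nodes, original, modified_ids={"n2"}))
# ==History==
#
# Some text with a typo: the end.
```

With hints, the unmodified heading keeps its original "==History==" spelling instead of being normalized to "== History ==", which avoids spurious diffs as well as saving work.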