Wikitext/Parsoid test cases


 * list any existing test cases used for editors/parsers
 * adapt MediaWiki's parser tests
 * list pages that are known to be problematic in some system
 * build a corpus of Wikipedia pages to use for tests
 * collect the newparser bugs filed in Bugzilla
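
As a starting point for adapting the parser tests, here is a rough sketch of a reader for the block layout used in MediaWiki's parserTests.txt (sections introduced by "!! test", "!! input", "!! result" and closed by "!! end"). The function name parse_parser_tests and the SAMPLE text are made up for illustration; the real file also has options, articles and hooks sections that this sketch simply stores without interpreting.

```python
# Sketch only: parse_parser_tests and SAMPLE are hypothetical names, not part
# of MediaWiki. Only the "!! keyword" block layout is taken from parserTests.txt.

SAMPLE = """\
!! test
Simple paragraph
!! input
This is a simple paragraph.
!! result
<p>This is a simple paragraph.
</p>
!! end
"""


def parse_parser_tests(text):
    """Split parserTests-style text into a list of {section: body} dicts."""
    cases = []
    current = None   # sections of the test case being read, or None
    section = None   # name of the section currently being filled
    for line in text.splitlines():
        if line.startswith("!!"):
            keyword = line[2:].strip().lower()
            if keyword == "test":
                # A new test case starts; its first section holds the title.
                current = {"test": []}
                section = "test"
            elif keyword == "end":
                if current is not None:
                    cases.append({k: "\n".join(v) for k, v in current.items()})
                current = None
                section = None
            elif current is not None:
                # Any other keyword (input, result, options, ...) opens a section.
                section = keyword
                current[section] = []
        elif current is not None and section is not None:
            current[section].append(line)
    return cases


if __name__ == "__main__":
    for case in parse_parser_tests(SAMPLE):
        print(case["test"], "->", sorted(case))
```

Each resulting dict could then be fed to whichever test framework is chosen, with "input" as the wikitext and "result" as the expected HTML.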

Consider converting test cases by hand to the initial AST (see Future/AST).

Timeline
Neil and Trevor on Wikimedia's parser team are looking at JavaScript testing frameworks to build the initial tests for the JS-based demo and exploratory work.

We plan to have tests available from day 1 of the main exploratory demo; maintenance of those tests will be ongoing.

Getting involved

 * Join the wikitext-l mailing list if you are interested in following along or getting involved; there should be posts from Brion, Trevor, or Neil at least a couple of times a week, and we're going to need feedback and help!
 * Give feedback on the preliminary docs and demos via Future/AST (to come soon)
 * Collect references to existing alternate parser output formats via Future/AST
 * Collect test cases (example pages, known problematic pages, corpus from Wikipedia, adapted parser tests) here

Forward / Reverse Test Cases
Wikia uses the cases found in http://trac.wikia-code.com/browser/wikia/trunk/tests/acceptance/com/wikia/selenium/tests/RTETest.java#L21 to test the idempotency of the forward/reverse parsing process, i.e. that converting wikitext to HTML and back again yields the original wikitext.
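
A forward/reverse check of this kind could look roughly like the sketch below. This is not Wikia's code: to_html and to_wikitext are toy stand-ins that only understand '''bold''', and round_trips and failing_cases are hypothetical helper names. The point is only the shape of the test: run each case forward, then in reverse, and compare with the original.

```python
import re

# Toy stand-ins for a real forward parser and reverse parser (assumptions for
# illustration only). They handle nothing but '''bold''' <-> <b>bold</b>.


def to_html(wikitext):
    """Forward direction: wikitext to HTML."""
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)


def to_wikitext(html):
    """Reverse direction: HTML back to wikitext."""
    return re.sub(r"<b>(.+?)</b>", r"'''\1'''", html)


def round_trips(wikitext):
    """True if forward followed by reverse reproduces the input exactly."""
    return to_wikitext(to_html(wikitext)) == wikitext


def failing_cases(cases):
    """Return the test cases that do not survive the round trip."""
    return [case for case in cases if not round_trips(case)]


if __name__ == "__main__":
    cases = ["'''bold'''", "plain text", "<b>raw html</b>"]
    print(failing_cases(cases))
```

With these toy converters the literal HTML case fails, because the reverse step rewrites <b> as wiki bold markup even though the original used the HTML tag. Real cases would fail for analogous reasons, which is what makes such a list useful as a regression suite.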

Edge Cases
Wikia's Rich Text Editor detects the following cases and, when found, degrades to source mode:
 * COMMENT - comment found in the middle of wikitext line - foo bar
 * COMPLEX.01 - wikitext marker found in original wikitext (triggered in RTEData::replaceIdxByData)
 * COMPLEX.02 - data marker found in original wikitext (triggered in RTEData::replaceIdxByData)
 * COMPLEX.03 - template found within template call -
 * COMPLEX.04 - marker found in table's attributes - {|
 * COMPLEX.05 - marker found in row attributes - |-
 * COMPLEX.06 - marker found in table's caption
 * COMPLEX.07 - marker found in original wikitext (triggered in RTEData::replaceIdxByData)
 * COMPLEX.08 - marker found in HTML tag attributes - 
 * COMPLEX.09 - double brackets found in image/video caption - [[Image:Foo.png|]]
 * COMPLEX.10 - table cell line begins with a comment
 * COMPLEX.11 - parser hook found inside HTML table -
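
The detect-and-degrade approach above can be sketched as follows for two of the listed cases. This is a guess at the general shape, not the Rich Text Editor's implementation: the function names, the heuristics and the "wysiwyg" mode label are all assumptions, and the checks are deliberately crude (for example, triple-brace parameters are not handled specially).

```python
# Sketch only: hypothetical heuristics for two of the edge cases listed above.
# None of these names come from Wikia's code base.


def has_nested_template(wikitext):
    """Rough COMPLEX.03 check: a '{{' opened while another is still open."""
    depth = 0
    i = 0
    while i < len(wikitext) - 1:
        pair = wikitext[i:i + 2]
        if pair == "{{":
            depth += 1
            if depth > 1:
                return True
            i += 2
        elif pair == "}}":
            depth = max(0, depth - 1)
            i += 2
        else:
            i += 1
    return False


def detect_edge_cases(wikitext):
    """Return the set of edge-case codes that the heuristics find."""
    found = set()
    for line in wikitext.splitlines():
        idx = line.find("<!--")
        # COMMENT: a comment that starts after other text on the same line.
        if idx > 0 and line[:idx].strip():
            found.add("COMMENT")
    if has_nested_template(wikitext):
        found.add("COMPLEX.03")
    return found


def choose_editor_mode(wikitext):
    """Degrade to source mode whenever any edge case is detected."""
    return "source" if detect_edge_cases(wikitext) else "wysiwyg"


if __name__ == "__main__":
    for text in ["plain text", "foo <!-- note --> bar", "{{outer|{{inner}}}}"]:
        print(repr(text), sorted(detect_edge_cases(text)), choose_editor_mode(text))
```

Returning the set of codes rather than a bare boolean keeps the reason for degrading visible, so each code can double as a label for the test pages collected here.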