Wikitext/Parsoid test cases


 * list any existing test cases used for editors/parsers
 * adapt MediaWiki's parser tests
 * list pages that are known to be problematic in some system
 * build a corpus of Wikipedia pages to use for tests
 * newparser bugs in Bugzilla
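
As a starting point for adapting MediaWiki's parser tests, here is a minimal sketch of a reader that splits a parserTests.txt-style file into individual cases. It assumes each case is wrapped in "!! test" and "!! end" markers with "!! input" and "!! result" sections in between; the function name parseParserTests and the sample case are made up for illustration, and other markers (articles, hooks, options handling) are simply skipped or stored untouched.

```javascript
// Minimal reader for the "!! section" layout of parserTests.txt.
// Assumption: each case runs from "!! test" to "!! end", and every
// "!! name" line in between starts a new named section.
function parseParserTests(text) {
  const cases = [];
  let current = null; // case being built, or null when outside a case
  let section = null; // name of the section currently receiving lines

  for (const line of text.split('\n')) {
    const marker = /^!!\s*(\w+)\s*$/.exec(line);
    if (marker) {
      const name = marker[1].toLowerCase();
      if (name === 'test') {
        // Start a new case; the lines that follow are its title.
        current = { test: [] };
        section = 'test';
      } else if (name === 'end') {
        // Close the case and join each section's lines back into a string.
        if (current) {
          const done = {};
          for (const key of Object.keys(current)) {
            done[key] = current[key].join('\n');
          }
          cases.push(done);
        }
        current = null;
        section = null;
      } else if (current) {
        // Any other marker inside a case opens a new section.
        section = name;
        current[section] = [];
      }
      continue;
    }
    if (current && section) {
      current[section].push(line);
    }
  }
  return cases;
}

// Hypothetical sample in the assumed layout.
const sample = [
  '!! test',
  'Simple paragraph',
  '!! input',
  'This is a simple paragraph.',
  '!! result',
  '<p>This is a simple paragraph.',
  '</p>',
  '!! end',
].join('\n');

const cases = parseParserTests(sample);
console.log(cases.length, cases[0].test);
```

A known limitation of this sketch: a wikitext line that itself looks like "!! word" would be mistaken for a section marker.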

Consider converting test cases by hand to the initial AST format (see Future/AST)

Test frameworks in progress

 * Round-trip testing (parse to AST, then return to original source)
   * node.js CLI script to run through an XML export dump: roundtrip.js, worker.js, roundtrip-test.js
     * needs better reporting & refactoring
   * XML dump of Wikia's round-trip test cases: wikia-rte-roundtrip-tests.xml
 * HTML rendering output tests
   * incomplete: node.js CLI script to run through MediaWiki's parserTests.txt cases:
     * runs through but doesn't test output or do anything useful yet
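
The round-trip idea (parse to AST, serialize back, compare with the original source) can be sketched as a small harness. This is not the code in roundtrip.js; the names roundTrip, toyParse and toySerialize are hypothetical, and the toy parser only knows level-2 headings so that the example stays self-contained.

```javascript
// Round-trip check: source -> AST -> source, then compare byte for byte.
// `parse` and `serialize` are passed in, so any parser can be plugged in.
function roundTrip(source, parse, serialize) {
  const output = serialize(parse(source));
  return { ok: output === source, source, output };
}

// Toy stand-ins for illustration only: the "AST" is a flat list of line nodes.
function toyParse(source) {
  return source.split('\n').map((text) => {
    const heading = /^==\s*(.*?)\s*==$/.exec(text);
    return heading
      ? { type: 'heading', text: heading[1] }
      : { type: 'text', text };
  });
}

// Always writes headings with spaces, so "==Title==" will not survive.
function toySerialize(ast) {
  return ast
    .map((node) => (node.type === 'heading' ? `== ${node.text} ==` : node.text))
    .join('\n');
}

const pages = ['== Title ==\nBody text', '==Title==\nBody text'];
const results = pages.map((page) => roundTrip(page, toyParse, toySerialize));
const failures = results.filter((r) => !r.ok);
console.log(`${results.length - failures.length}/${results.length} round-tripped`);
```

The second page fails because the toy serializer normalizes heading spacing. That is the kind of lossy step a round-trip test is meant to expose.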

Todo (Brion's working on these):
 * refactor the CLI test cases into a common framework
 * automatic multiprocess w/ web workers on both CLI and web
 * CLI nicely formatted output (colors, exit code)
 * web nicely formatted output (table w/ colors, find a way to integrate in testswarm)
 * web-accessible data sources
 * HTML rendering tests
 * sensible HTML comparison so that unimportant differences don't register as failures
 * flag certain test cases as not relevant to the JS parser to reduce terror
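
For the HTML comparison item, one possible approach is to normalize both sides before comparing. The sketch below is an assumption about what counts as an unimportant difference (tag-name case, attribute order, attribute quote style, whitespace runs, whitespace between tags); normalizeHtml and normalizeTag are hypothetical names, and the regular expressions are a rough stand-in for a real HTML parser.

```javascript
// Rewrite one opening tag with a lowercase name and sorted, double-quoted attributes.
function normalizeTag(match, name, rest) {
  const selfClosing = /\/\s*$/.test(rest);
  const attrSource = rest.replace(/\/\s*$/, '');
  const attrRe = /([^\s=\/]+)(?:\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s"']+)))?/g;
  const attrs = [];
  let m;
  while ((m = attrRe.exec(attrSource)) !== null) {
    // Exactly one of the three value groups is set; bare attributes get "".
    const value = m[2] ?? m[3] ?? m[4] ?? '';
    attrs.push(`${m[1].toLowerCase()}="${value}"`);
  }
  attrs.sort();
  return `<${[name.toLowerCase(), ...attrs].join(' ')}${selfClosing ? ' /' : ''}>`;
}

// Assumption: whitespace is never significant. That is wrong inside <pre>
// and inside attribute values, so treat this as a starting point only.
function normalizeHtml(html) {
  return html
    .replace(/<([a-zA-Z][a-zA-Z0-9]*)([^<>]*)>/g, normalizeTag)
    .replace(/<\/([a-zA-Z][a-zA-Z0-9]*)\s*>/g, (match, name) => `</${name.toLowerCase()}>`)
    .replace(/\s+/g, ' ')   // collapse whitespace runs
    .replace(/>\s+</g, '><') // drop whitespace between adjacent tags
    .trim();
}

const expected = '<P class="x"  id=\'y\'>Hello   world</P>\n<br/>';
const actual = '<p id="y" class="x">Hello world</p><br />';
console.log(normalizeHtml(expected) === normalizeHtml(actual));
```

Genuine content differences survive normalization, so a test still fails when the text or the set of attributes differs.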

Timeline
Neil & Trevor on Wikimedia's parser team are looking at JavaScript testing frameworks to build initial tests for JS-based demo/exploratory work.

We plan to have tests available from day 1 of the main exploratory demo; maintenance of those tests will be ongoing.

Getting involved

 * Join the wikitext-l mailing list to follow along or get involved. There should be posts from Brion, Trevor, or Neil at least a couple of times a week, and we're going to need feedback and help!
 * Give feedback on the preliminary docs & demos via Future/AST (to come soon)
 * Collect references to existing alternate parser output formats via Future/AST
 * Collect test cases (example pages, known problematic pages, corpus from Wikipedia, adapted parser tests) here

Forward / Reverse Test Cases
Wikia uses the cases found in http://trac.wikia-code.com/browser/wikia/trunk/tests/acceptance/com/wikia/selenium/tests/RTETest.java#L21 to test the idempotency of the forward/reverse parsing process.

Edge Cases
Wikia's Rich Text Editor detects the following cases and, when found, degrades to source mode:
 * COMMENT - comment found in the middle of wikitext line - foo bar
 * COMPLEX.01 - wikitext marker found in original wikitext (triggered in RTEData::replaceIdxByData)
 * COMPLEX.02 - data marker found in original wikitext (triggered in RTEData::replaceIdxByData)
 * COMPLEX.03 - template found within template call -
 * COMPLEX.04 - marker found in table's attributes - {|
 * COMPLEX.05 - marker found in row attributes - |-
 * COMPLEX.06 - marker found in table's caption
 * COMPLEX.07 - marker found in original wikitext (triggered in RTEData::replaceIdxByData)
 * COMPLEX.08 - marker found in HTML tag attributes - 
 * COMPLEX.09 - double brackets found in image/video caption - [[Image:Foo.png|]]
 * COMPLEX.10 - table cell line begins with a comment
 * COMPLEX.11 - parser hook found inside HTML table -

Interesting Wikipedia content

 * en:Barack Obama - size and template tests, especially cite templates
 * en:Wikipedia:IPA for Portuguese and Galician - nested tables with rowspan, colspan and templates
 * en:Template:IPAc-pl
 * en:Template:WPBannerMeta/core - Template and parser function torture test
 * The cite template system in the English Wikipedia:

Template and parser function edge cases

 * meta:Help:Table and meta:User_talk:Patrick
 * es:Plantilla:Columnas