UploadWizard/Message parser

Features
We now have an experimental message string library that combines wikitext, jQuery, and our internationalization process to give frontend developers some very powerful features.

Simple internationalized message strings with a global function
Users of the MwEmbed or ResourceLoader frameworks will be familiar with this already: one can define a global message function to snatch translated messages out of the air.

One simply defines a message, in PHP -- for extensions this is typically in an ".i18n.php" file.

'mwe-upwiz-file-all-ok' => 'All uploads were successful!',

And then, with a little more glue, you can use this message on the frontend in JavaScript:

$( '.status' ).append( gM( 'mwe-upwiz-file-all-ok' ) );

With results like:

All uploads were successful!

And if you were using a different language, like French:

Tous les imports ont réussi !
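
To make the idea concrete, here is a minimal sketch of what such a global lookup function might do under the hood. It is not the actual gM implementation: the name lookupMessage, the messages table, and the angle-bracket fallback for unknown keys are all assumptions made for illustration.

```javascript
// Hypothetical message store. In practice the translated strings for the
// user's language would be delivered by the server.
const messages = {
    'mwe-upwiz-file-all-ok': 'All uploads were successful!'
};

// Look up a message by key. Unknown keys come back wrapped in angle
// brackets so missing translations are easy to spot (an assumption of
// this sketch, not documented behaviour).
function lookupMessage( key ) {
    return Object.prototype.hasOwnProperty.call( messages, key ) ?
        messages[ key ] :
        '<' + key + '>';
}

console.log( lookupMessage( 'mwe-upwiz-file-all-ok' ) );
```

Swapping the contents of the messages table for another language's strings is all it takes to change the output, which is why the calling code never needs to know which language is active.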

Using jQuery mode
The above example showed you how "string mode" works. It's called "string mode" because the gM function returns a string, which you then have to do something with. But for the majority of cases, we want to append data into a jQuery-selected node. In "jQuery mode" we can leverage jQuery directly, like so:

$( '.status' ).msg( 'mwe-upwiz-file-all-ok' );

With the same result:

All uploads were successful!

And of course .msg is chainable like most other jQuery functions.

Most features work identically in string mode and in jQuery mode. But the jQuery mode is more concise, and more powerful, and there are some advanced features that only work with jQuery mode. Read on!

Using 'magic'
Some common replacements can be made in your messages. Within MediaWiki, this is sometimes called 'magic'.

PHP: 'mwe-upwiz-deeds-macro-prompt' => '{{SITENAME}} requires you to provide copyright information.',

JavaScript: $( '.prompt' ).msg( 'mwe-upwiz-deeds-macro-prompt' );

Result: The Awesome Wiki requires you to provide copyright information.
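
As a rough illustration of how a magic word such as MediaWiki's {{SITENAME}} could be expanded on the frontend, consider the sketch below. The function name expandMagic and the siteConfig object are hypothetical; the real library may obtain the site name differently.

```javascript
// Hypothetical site configuration. The real site name would come from the wiki.
const siteConfig = { SITENAME: 'The Awesome Wiki' };

// Replace {{WORD}} with the matching configuration value.
// Unknown magic words are left untouched.
function expandMagic( text, config ) {
    return text.replace( /\{\{([A-Z]+)\}\}/g, function ( match, word ) {
        return Object.prototype.hasOwnProperty.call( config, word ) ?
            String( config[ word ] ) :
            match;
    } );
}

console.log( expandMagic(
    '{{SITENAME}} requires you to provide copyright information.',
    siteConfig
) );
```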

Using parameters
One can use $1..$n to signify a parameter to replace:

PHP: 'mwe-upwiz-autoconverted' => 'This file was automatically converted from the $1 format to the $2 format',

JavaScript: $( '.note' ).msg( 'mwe-upwiz-autoconverted', 'TIFF', 'JPEG' );

Result: This file was automatically converted from the TIFF format to the JPEG format
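
Positional replacement of this kind can be sketched in a few lines. The function name replaceParams is made up for this example, and the sketch works on plain strings only, so it says nothing about how the real library orders parsing and replacement.

```javascript
// Replace $1..$n with the corresponding entry of params ($1 is params[0]).
// Placeholders without a matching parameter are left as they are.
function replaceParams( text, params ) {
    // A callback is used so that "$" characters inside a parameter value
    // are inserted literally rather than treated as replacement patterns.
    return text.replace( /\$(\d+)/g, function ( match, n ) {
        const index = parseInt( n, 10 ) - 1;
        return index >= 0 && index < params.length ?
            String( params[ index ] ) :
            match;
    } );
}

console.log( replaceParams(
    'This file was automatically converted from the $1 format to the $2 format',
    [ 'TIFF', 'JPEG' ]
) );
```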

Using parameters with grammar like PLURAL
Since UploadWizard handles one or more files simultaneously, this is used all over the place.

PHP: 'mwe-upwiz-upload-count' => '$1 of $2 {{PLURAL:$2|file|files}} uploaded',

JavaScript: $( '.upload-count' ).msg( 'mwe-upwiz-upload-count', 3, 5 );

Result: 3 of 5 files uploaded

Note that, depending on the language, $1 and $2 might not be rendered in the same order as English.
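
For a feel of what PLURAL has to do, here is a deliberately simplified sketch that only knows the English rule (singular for exactly 1, plural otherwise). Other languages have more plural forms and different rules, which this does not attempt to cover. The function name expandPlural is an assumption for illustration, as is the {{PLURAL:$n|singular|plural}} input shape with exactly two forms.

```javascript
// Expand {{PLURAL:$n|singular|plural}} using the English rule only,
// then substitute any remaining $n placeholders.
function expandPlural( text, params ) {
    const withPlurals = text.replace(
        /\{\{PLURAL:\$(\d+)\|([^|}]*)\|([^|}]*)\}\}/g,
        function ( match, n, singular, plural ) {
            const count = Number( params[ parseInt( n, 10 ) - 1 ] );
            return count === 1 ? singular : plural;
        }
    );
    // The chosen form may itself contain placeholders such as "$1 edits",
    // so parameters are substituted after the plural form is picked.
    return withPlurals.replace( /\$(\d+)/g, function ( match, n ) {
        const index = parseInt( n, 10 ) - 1;
        return index >= 0 && index < params.length ?
            String( params[ index ] ) :
            match;
    } );
}

console.log( expandPlural( '$1 of $2 {{PLURAL:$2|file|files}} uploaded', [ 3, 5 ] ) );
```

Because translators control the whole message, a translation is free to put $2 before $1 or to pick different plural forms; the calling code stays the same.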

Creating links
PHP: 'mwe-upwiz-previously-uploaded' => 'This file was previously uploaded to {{SITENAME}} and is already available [$1 here].',

JavaScript: $( '.warning' ).msg( 'mwe-upwiz-previously-uploaded', 'http://sample.com/wiki/File:SomeImage.jpg' );

Result: This file was previously uploaded to The Awesome Wiki and is already available here. (The word "here" is rendered as a link to http://sample.com/wiki/File:SomeImage.jpg.)
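
The external-link syntax [$1 label] could be turned into HTML roughly as follows. This is a string-mode sketch with made-up function names (escapeHtml, renderLinks); it escapes only the link target and label, performs no URL scheme validation, and is not how the library's jQuery mode builds nodes.

```javascript
// Escape characters that are special in HTML text and attribute values.
function escapeHtml( value ) {
    return String( value )
        .replace( /&/g, '&amp;' )
        .replace( /</g, '&lt;' )
        .replace( />/g, '&gt;' )
        .replace( /"/g, '&quot;' );
}

// Convert "[$n label]" into an anchor whose href is parameter n.
// Links whose parameter is missing are left as plain wikitext.
function renderLinks( text, params ) {
    return text.replace( /\[\$(\d+) ([^\]]+)\]/g, function ( match, n, label ) {
        const url = params[ parseInt( n, 10 ) - 1 ];
        if ( url === undefined ) {
            return match;
        }
        return '<a href="' + escapeHtml( url ) + '">' + escapeHtml( label ) + '</a>';
    } );
}

console.log( renderLinks(
    'This file is already available [$1 here].',
    [ 'http://sample.com/wiki/File:SomeImage.jpg' ]
) );
```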

History
So, I was hired to deal with multimedia uploads, but ended up taking a detour for a few weeks to write a limited wikitext parser. Here's why.

In the course of writing UploadWizard, I started to rely on MwEmbed's message library, which had limited wikitext parsing. This was a great help to internationalization, since one could simply define a message like this:

{{SITENAME}} requires you to provide copyright information for {{PLURAL:$1|this work|these works}}, to make sure everyone can legally reuse {{PLURAL:$1|it|them}}.

...and be assured of this working in every language that TranslateWiki knew about.

MwEmbed was ultimately not accepted for integration into MediaWiki, so the ResourceLoader framework was invented to replace it. But ResourceLoader had little or no support for wikitext-parsed messages like the above.

Michael Dale wrote another class (MwMessage.js) to supply the needed features and some advanced ideas like using jQuery in arguments to create advanced behaviours. But I felt that it was still a bit too hacky and had some annoying flaws. For one, parameters like $1 were replaced before the message was actually parsed, leading to some unnecessary convolutions, and potential sources of error (what if your parameter string contained valid wikitext?). Also, very similar code had to be repeated all over for strings and for jQuery.

The biggest flaw, however, was that the parsing was done on the client side. It seemed to me that it would be dramatically easier if MediaWiki would only give us a structure that the frontend could use more easily. In this way we could avoid parsing on the client side entirely, and only ship a tiny, tiny library to render that structure.
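
To show how small such a renderer could be, here is a sketch that walks a pre-parsed tree in the s-expression style used by the parser comment on this page (CONCAT, PLURAL, REPLACE). The function name renderAst is hypothetical, the plural rule is English-only, and node types other than those three are rejected, so treat it as an illustration rather than the shipped code.

```javascript
// Render a pre-parsed message tree to a string.
// A node is either a plain string or an array: [ operation, ...arguments ].
function renderAst( node, params ) {
    if ( typeof node === 'string' ) {
        return node;
    }
    const op = node[ 0 ];
    const args = node.slice( 1 );
    switch ( op ) {
        case 'CONCAT':
            return args.map( function ( arg ) {
                return renderAst( arg, params );
            } ).join( '' );
        case 'REPLACE':
            // Zero-based index: $1 is argument 0.
            return String( params[ args[ 0 ] ] );
        case 'PLURAL': {
            const count = Number( renderAst( args[ 0 ], params ) );
            const forms = args.slice( 1 );
            // English-only rule: first form for exactly 1, last form otherwise.
            const form = count === 1 ? forms[ 0 ] : forms[ forms.length - 1 ];
            return renderAst( form, params );
        }
        default:
            throw new Error( 'Unknown operation: ' + op );
    }
}

const undeleteAst = [ 'CONCAT',
    'Undelete ',
    [ 'PLURAL',
        [ 'REPLACE', 0 ],
        'one edit',
        [ 'CONCAT', [ 'REPLACE', 0 ], ' edits' ]
    ]
];

console.log( renderAst( undeleteAst, [ 3 ] ) );
```

Since parameters are only looked up while walking the tree, a parameter value that happens to contain wikitext is never parsed, which sidesteps the replace-before-parse problem described earlier.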

/*
 * Parses the input wikiText into an abstract syntax tree, essentially an s-expression.
 *
 * Why you would want to do this: ASTs make some complex stuff easy.
 *   - this allows the parser to be stateless -- all parsing state is in the AST, which is cached,
 *     or is just there temporarily as replacement parameters are swapped in
 *   - it's MUCH simpler to do complex replacements, such as swapping in jQuery objects into
 *     wikitext like "[$1 my link]". No string hackery required
 *   - decouples target format from parsing -- you can output a string of HTML, or a
 *     jQuery-compatible array of nodes, or whatever you like.
 *
 * However, these are also all arguments for doing this on the server. Stay tuned... we can reduce
 * the code on the client by half by porting everything below this function to PHP.
 *
 * Examples:
 *   "A simple input string" => "A simple input string"
 *   "Simple {{...}} message" => [ 'CONCAT', 'Simple ', [ 'TEMPLATE' ], ' message' ];
 *   "Undelete {{PLURAL:$1|one edit|$1 edits}}" =>
 *     [ 'CONCAT',
 *       'Undelete ',
 *       [ 'PLURAL',
 *         [ 'REPLACE', 0 ],  // zero-based index. $1 is argument 0
 *         'one edit',
 *         [ 'CONCAT',
 *           [ 'REPLACE', 0 ],
 *           ' edits'
 *         ]
 *       ]
 *     ]
 *
 * The following code is a highly hand-hacked and optimized
 * parser based on a generated PEG parser using the grammar in the file mediawiki.parser.peg
 *
 * CAVEAT: This does not parse all wikitext, and it makes a lot of assumptions that may not reflect
 * how the actual parser works. It works for pretty much all cases where we will want to pass translation
 * strings to the frontend, however.
 *
 * More caveats: there are a lot of things here which could be more efficient, but it's already pretty
 * efficient and we may not use this client side for very long until we move it server side.
 *