UploadWizard/Message parser

We now have an experimental message string library, parserPlus, that leverages wikitext, jQuery, and our internationalization tools to deliver some very powerful features to frontend developers.

The library is located in mediawiki.language.parser.js.

This library was inspired by some features in Michael Dale's MwEmbed, but was written from scratch by User:NeilK.

Simple internationalized message strings with a global function
Users of the MwEmbed or ResourceLoader frameworks will be familiar with this already: one can define a global message function to snatch translated messages out of the air.

One simply defines a message in PHP; for extensions, this is typically in an ".i18n.php" file.

'mwe-upwiz-file-all-ok' => 'All uploads were successful!',

And then, with a little more glue, you can use this message on the frontend in JavaScript:

$( '.status' ).append( gM( 'mwe-upwiz-file-all-ok' ) );

With results like:

All uploads were successful!

And if you were using a different language, like French:

Tous les imports ont réussi !
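As a concrete picture of "string mode", here is a minimal sketch of a global message function at its simplest: a lookup from message key to already-translated text. The `makeMessageFunction` helper and the inline message tables are hypothetical, not the parserPlus API; in a real setup the messages come from the PHP definitions through the glue mentioned above, and the fallback for an unknown key may differ.

```javascript
// Hypothetical sketch of a string-mode message function: it only looks up
// already-translated text by key. Not the parserPlus implementation.
function makeMessageFunction(messages) {
  return function (key) {
    if (Object.prototype.hasOwnProperty.call(messages, key)) {
      return messages[key];
    }
    // Assumed fallback for unknown keys; the real library may differ.
    return '<' + key + '>';
  };
}

// One table per user language; the key stays the same.
var gM = makeMessageFunction({
  'mwe-upwiz-file-all-ok': 'All uploads were successful!'
});
var gMfr = makeMessageFunction({
  'mwe-upwiz-file-all-ok': 'Tous les imports ont réussi !'
});

console.log(gM('mwe-upwiz-file-all-ok'));   // All uploads were successful!
console.log(gMfr('mwe-upwiz-file-all-ok')); // Tous les imports ont réussi !
```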

Using jQuery mode
The above example showed you how "string mode" works. It's called "string mode" because the gM function returns a string, which you then have to do something with. But for the majority of cases, we want to append data into a jQuery-selected node. In "jQuery mode" we can leverage jQuery directly, like so:

$( '.status' ).msg( 'mwe-upwiz-file-all-ok' );

With the same result:

All uploads were successful!

And of course .msg is chainable like most other jQuery functions.

Most features work identically in string mode and in jQuery mode. But the jQuery mode is more concise, and more powerful, and there are some advanced features that only work with jQuery mode. Read on!

Using 'magic'
Some common replacements can be made in your messages. Within MediaWiki, this is sometimes called 'magic'.

PHP: 'mwe-upwiz-deeds-macro-prompt' => '{{SITENAME}} requires you to provide copyright information.',

JavaScript: $( '.prompt' ).msg( 'mwe-upwiz-deeds-macro-prompt' );

Result: The Awesome Wiki requires you to provide copyright information.
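A minimal sketch of what a magic replacement amounts to, assuming magic words are written as `{{NAME}}` and their values come from a dictionary shaped like the `magic` option in the initialization code under "How to use it". The `applyMagic` helper is hypothetical, and this regular-expression version is only an illustration of the substitution, not how parserPlus does it internally.

```javascript
// Hypothetical helper: replaces {{NAME}} with values from a magic dictionary.
// Words with a colon, like {{PLURAL:...}}, do not match this pattern.
function applyMagic(text, magic) {
  return text.replace(/\{\{([A-Z]+)\}\}/g, function (match, name) {
    // Leave unknown magic words untouched rather than deleting them.
    return Object.prototype.hasOwnProperty.call(magic, name) ? magic[name] : match;
  });
}

var magic = { SITENAME: 'The Awesome Wiki' };

console.log(applyMagic('{{SITENAME}} requires you to provide copyright information.', magic));
// The Awesome Wiki requires you to provide copyright information.
```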

Using parameters
One can use $1..$n to signify a parameter to replace:

PHP: 'mwe-upwiz-autoconverted' => 'This file was automatically converted from the $1 format to the $2 format.',

JavaScript: $( '.note' ).msg( 'mwe-upwiz-autoconverted', 'TIFF', 'JPEG' );

Result: This file was automatically converted from the TIFF format to the JPEG format.

Note that, depending on the language, $1 and $2 might not appear in the same order as in English.
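The sketch below shows why parameter order in a translation doesn't matter: each `$n` is looked up by its number, not by where it sits in the string, so a translation can mention `$2` before `$1`. `replaceParams` and the reordered wording are made up for illustration.

```javascript
// Hypothetical sketch of $1..$n substitution. Parameters are looked up by
// number, not by position, so a translation can reorder them freely.
function replaceParams(text, params) {
  return text.replace(/\$(\d+)/g, function (match, n) {
    var index = parseInt(n, 10) - 1; // $1 is params[0]
    return index >= 0 && index < params.length ? String(params[index]) : match;
  });
}

var english = 'This file was automatically converted from the $1 format to the $2 format.';
// Made-up wording that mentions $2 before $1, as a translation might.
var reordered = 'Converted to $2 (originally $1).';

console.log(replaceParams(english, [ 'TIFF', 'JPEG' ]));
// This file was automatically converted from the TIFF format to the JPEG format.
console.log(replaceParams(reordered, [ 'TIFF', 'JPEG' ]));
// Converted to JPEG (originally TIFF).
```

Unknown parameters such as `$3` with only two values are left as-is, which makes a missing argument easy to spot.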

Using parameters with grammar like PLURAL
Since UploadWizard handles one or more files simultaneously, this is used all over the place.

PHP: 'mwe-upwiz-upload-count' => '$1 of $2 {{PLURAL:$2|file|files}} uploaded',

JavaScript: $( '.upload-count' ).msg( 'mwe-upwiz-upload-count', 3, 5 );

Result: 3 of 5 files uploaded
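A minimal sketch of how `{{PLURAL:$n|singular|plural}}` can be resolved, assuming English rules only: one form for exactly 1, another for everything else. Many languages need different rules and more forms, which this sketch does not attempt; `renderPlural` and `englishPluralIndex` are hypothetical names.

```javascript
// English-only plural rule: index 0 for exactly one, index 1 otherwise.
function englishPluralIndex(count) {
  return count === 1 ? 0 : 1;
}

function renderPlural(text, params) {
  // First resolve PLURAL using the numeric value of the referenced parameter...
  var withPlurals = text.replace(/\{\{PLURAL:\$(\d+)\|([^}]*)\}\}/g, function (match, n, forms) {
    var count = Number(params[parseInt(n, 10) - 1]);
    var choices = forms.split('|');
    return choices[Math.min(englishPluralIndex(count), choices.length - 1)];
  });
  // ...then substitute the plain $1..$n parameters (also inside chosen forms).
  return withPlurals.replace(/\$(\d+)/g, function (match, n) {
    var value = params[parseInt(n, 10) - 1];
    return value === undefined ? match : String(value);
  });
}

var msg = '$1 of $2 {{PLURAL:$2|file|files}} uploaded';
console.log(renderPlural(msg, [ 3, 5 ])); // 3 of 5 files uploaded
console.log(renderPlural(msg, [ 1, 1 ])); // 1 of 1 file uploaded
```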

Creating links
PHP: 'mwe-upwiz-previously-uploaded' => 'This file was previously uploaded to {{SITENAME}} and is already available [$1 here].',

JavaScript: $( '.warning' ).msg( 'mwe-upwiz-previously-uploaded', 'http://sample.com/wiki/File:SomeImage.jpg' );

Result (simulated): This file was previously uploaded to The Awesome Wiki and is already available here.
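In string mode, a link can only come back as HTML text. The sketch below, assuming the parameter is a URL string, shows one way to turn `[$1 here]` into an `<a>` element without letting message text inject markup: escape the whole message first, then build the link. `renderLinks` and `escapeHtml` are hypothetical helpers, and the example message omits `{{SITENAME}}` to stay focused on the link. In jQuery mode the library returns real nodes instead of a string.

```javascript
// Hypothetical string-mode sketch of the [$n link text] syntax, assuming the
// parameter is a URL string. The whole message is HTML-escaped first, so the
// only markup in the output is the <a> element this function builds.
function escapeHtml(s) {
  return String(s)
    .replace(/&/g, '&amp;')
    .replace(/</g, '&lt;')
    .replace(/>/g, '&gt;')
    .replace(/"/g, '&quot;');
}

function renderLinks(text, params) {
  return escapeHtml(text).replace(/\[\$(\d+) ([^\]]+)\]/g, function (match, n, label) {
    var url = params[parseInt(n, 10) - 1];
    if (typeof url !== 'string') {
      return match; // non-URL parameters are out of scope for this sketch
    }
    // label was already escaped along with the rest of the message.
    return '<a href="' + escapeHtml(url) + '">' + label + '</a>';
  });
}

var html = renderLinks(
  'This file was previously uploaded and is already available [$1 here].',
  [ 'http://sample.com/wiki/File:SomeImage.jpg' ]
);
console.log(html);
```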

Using functions as parameters
But what if you want to add JavaScript behaviour to a message? Say you have some short help text, and you want the user to be able to click a link in it to find out more:

Click here to learn more about writing a good title for your image.

String hackery doesn't work, since what you want is to preserve JavaScript bindings. If your HTML passes through any string manipulation, those bindings are all gone.

You might try something horribly convoluted, like embedding actual HTML in the message (nasty), fishing it all out with string manipulation, and then adding the JS binding. That works if there's one link... but what if there are two? A translation might put the links in a different order, so you wouldn't know which was which.

But, this is straightforward with parserPlus:

PHP: 'mwe-upwiz-title-help' => 'Click here to learn more about [$1 writing a good title] for your file.',

JavaScript:

var $titleDialog = ...;
$( '.description-help' ).msg( 'mwe-upwiz-title-help', function () {
	$titleDialog.dialog( 'open' );
} );

When you use a function in the place where a link HREF should go, the library interprets that to mean you want to make a link with a click handler.
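Why a function can stand in for an href is easiest to see without a browser. The DOM-free sketch below models elements as plain objects (a stand-in for jQuery nodes) and dispatches on the parameter's type: a function becomes a click handler, anything else becomes an `href`. `makeLink`, the object shape, and the `'#'` placeholder href are assumptions, not the library's internals.

```javascript
// Hypothetical, DOM-free sketch: elements are plain objects rather than
// jQuery nodes. The link operation picks its behaviour from the parameter's
// type: a function becomes a click handler, anything else becomes an href.
function makeLink(target, children) {
  var link = { tag: 'a', attrs: {}, children: children, onclick: null };
  if (typeof target === 'function') {
    link.attrs.href = '#'; // assumed placeholder href for handler-only links
    link.onclick = target; // the real function object, not a string copy
  } else {
    link.attrs.href = String(target);
  }
  return link;
}

var dialogOpened = false;
var helpLink = makeLink(function () { dialogOpened = true; }, [ 'writing a good title' ]);
var urlLink = makeLink('http://sample.com/wiki/File:SomeImage.jpg', [ 'here' ]);

helpLink.onclick(); // simulate the user clicking the link
console.log(dialogOpened, urlLink.attrs.href);
```

Because the handler is stored as a function rather than serialized into markup, it keeps its closure over variables like `$titleDialog`.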

Here are screenshots of a real interaction that uses very similar code to what's above:

Using jQuery-created HTML as parameters
In the examples above we learned that [$1 link text] could be used in message strings to create a link. One can use a URL as a parameter, or even a function to make a click handler. But what if you wanted something even more complicated? What if you want to change the href, title, target, or indeed any other property? In that case you can use a jQuery node:

PHP: 'mwe-upwiz-title-help' => 'Click here to learn more about [$1 writing a good title] for your file.',

JavaScript:

var $titleDialog = ...;
$( '.description-help' ).msg( 'mwe-upwiz-title-help',
	$( '<a>' )
		.click( function () { $titleDialog.dialog( 'open' ); } )
		.attr( { title: 'someTitle', id: 'someId' } )
		.addClass( 'tooltip' )
);

In that case we are telling it not to bother creating a link, because we supply one in the jQuery-created <a>. The parserPlus library will "wrap" that <a> node around the "writing a good title" text.

But you don't even have to use anchor tags or the link syntax. You can use any element created with jQuery and have it replace one of the $1..$n parameters.
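The key property is that the parameter object is inserted as itself, never converted to a string, so its bindings survive. The sketch below shows that idea without a DOM: the message is split into a list of pieces and each `$n` is replaced by the parameter value unchanged. `renderToPieces` and the `authorInput` stand-in are hypothetical.

```javascript
// Hypothetical sketch: split a message into pieces, replacing each $n with the
// parameter value itself (objects are kept by reference, not stringified).
function renderToPieces(text, params) {
  var pieces = [];
  var re = /\$(\d+)/g;
  var last = 0;
  var match;
  while ((match = re.exec(text)) !== null) {
    if (match.index > last) {
      pieces.push(text.slice(last, match.index)); // literal text before $n
    }
    var value = params[parseInt(match[1], 10) - 1];
    pieces.push(value === undefined ? match[0] : value);
    last = re.lastIndex;
  }
  if (last < text.length) {
    pieces.push(text.slice(last)); // trailing literal text
  }
  return pieces;
}

// Stand-in for a jQuery-created <input> with an event handler attached.
var authorInput = { tag: 'input', value: 'SomeUser', onkeyup: function () {} };

var pieces = renderToPieces('I, $2, the copyright holder...', [ 1, authorInput ]);
console.log(pieces[1] === authorInput); // true: the same object, not a copy
```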

Here's a case where we are creating a form. We want the user to claim authorship of one or more files they just uploaded, with the ceremonial language "I, so-and-so, the copyright holder of these works...".

We'll put the author name input right in the "so-and-so" place, and prefill it with the user's MediaWiki username. In case they want to use their real name, they can still change it. This is hard to do with just string hacking, because we also want to preserve certain complicated bindings on the author input. But we can do it with parserPlus.

PHP: 'mwe-upwiz-source-ownwork-assert' =>

'I, $2, the copyright holder of {{PLURAL:$1|this work|these works}}, hereby irrevocably grant anyone the right to use {{PLURAL:$1|this work|these works}} for any purpose, as long as they credit me and share derivative work under the same terms.',

JavaScript:

// a complicated object with bindings...
var $authorInput = $j( '<input>' )
	.attr( { name: 'author', type: 'text', value: wgUserName } )
	.addClass( 'mwe-upwiz-sign' )
	.keyup( valueChangedBinding );

// just uploading a single file...
var uploadCount = 1;

$j( '<div>' ).msg( 'mwe-upwiz-source-ownwork-assert', uploadCount, $authorInput );

Result:

How to use it
Here is some sample initialization code that works for me at the moment. It is inefficient because it creates two identical parsers when all that is wanted is two different kinds of output from the same parser. This may change in the near future.

jQuery( document ).ready( function () {
	// add "magic" to the language template parser for keywords
	var options = { magic: { 'SITENAME': wgSiteName } };

	// create jQuery plugin, outputting jQuery nodes
	$.fn.msg = mediaWiki.language.getJqueryMessagePlugin( options );

	// create global, string-outputting function (useful when jQuery nodes won't do, like attributes)
	window.gM = mediaWiki.language.getMessageFunction( options );
} );

History and future directions
So, I was hired to deal with multimedia uploads, but ended up taking a detour for a few weeks to write a limited wikitext parser. Here's why.

In the course of writing UploadWizard, I started to rely on MwEmbed's message library, which had limited wikitext parsing. This was a great help to internationalization, and the PLURAL support was nice.

MwEmbed was ultimately not accepted for integration into MediaWiki, so the ResourceLoader framework was invented to replace that. But we had little or no support for wikitext-parsed messages. Simple replacements were handled, but not complicated or nested parsing.

Michael Dale and NeilK (that's me) wrote another class (MwMessage.js) to supply the needed features and some advanced ideas like dropping jQuery nodes right into message strings. But I felt that it was still a bit too hacky and had some annoying flaws. For every message, you needed to instantiate another parser. Also, parameters like $1 were replaced before the message was actually parsed, leading to some unnecessary convolutions and code repetition for the advanced jQuery-oriented features that Michael was exploiting heavily.

The biggest flaw, however, was that the parsing was done on the client side. It seemed to me that it would be dramatically simpler if MediaWiki would only give us a pre-parsed structure. In this way we could avoid parsing on the client side entirely, and only ship a tiny, tiny library to render code.

But our standard PHP parser doesn't give us any output like that. (Yet.) I had heard about PEG-based parsers, and I experimented a little with one, and I had something that worked really well almost immediately.

However, for the time being, that parser is written in JavaScript. So it's still doing the parsing on the frontend, which bloats this library unnecessarily. If we could simply transfer the parser to the server, the library would shrink from 7k compressed to about 2k compressed.

Under the hood
This is a relatively simple parser-emitter.

The parser itself is hand-hacked JavaScript, originally generated from a PEG grammar (see the directory where the library is located).

It parses the input wikitext into an abstract syntax tree, essentially an s-expression in JSON.
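For readers curious what "parses into an s-expression" involves, here is a small hand-written recursive-descent parser for a tiny subset: plain text, `$n`, and `{{PLURAL:$n|...}}`. It is not the PEG-generated parser from the library, only a sketch that emits the same style of JSON tree as the "Undelete" example in the Examples section; `parseMessage` is a hypothetical name.

```javascript
// Hypothetical recursive-descent parser for: plain text, $n, {{PLURAL:$n|...}}.
// Not the library's PEG-generated parser; it only emits the same tree style.
function parseMessage(input) {
  var pos = 0;

  // $n -> [ 'REPLACE', n - 1 ]; caller guarantees input at pos matches /\$\d/.
  function parseReplacement() {
    var m = /^\$(\d+)/.exec(input.slice(pos));
    pos += m[0].length;
    return [ 'REPLACE', parseInt(m[1], 10) - 1 ];
  }

  // {{PLURAL:$n|form|form...}} -> [ 'PLURAL', [ 'REPLACE', n - 1 ], form, ... ]
  function parsePlural() {
    pos += '{{PLURAL:'.length;
    if (!/^\$\d/.test(input.slice(pos))) {
      throw new Error('PLURAL needs a $n argument at ' + pos);
    }
    var node = [ 'PLURAL', parseReplacement() ];
    while (input[pos] === '|') {
      pos += 1;
      node.push(parseSequence(true)); // each form may itself contain $n
    }
    if (!input.startsWith('}}', pos)) {
      throw new Error('Unclosed PLURAL at ' + pos);
    }
    pos += 2;
    return node;
  }

  // Reads until end of input or, inside PLURAL, until '|' or '}}'.
  function parseSequence(inPlural) {
    var nodes = [];
    var text = '';
    while (pos < input.length) {
      if (inPlural && (input[pos] === '|' || input.startsWith('}}', pos))) {
        break;
      }
      var node = null;
      if (/^\$\d/.test(input.slice(pos))) {
        node = parseReplacement();
      } else if (input.startsWith('{{PLURAL:', pos)) {
        node = parsePlural();
      }
      if (node) {
        if (text) {
          nodes.push(text);
          text = '';
        }
        nodes.push(node);
      } else {
        text += input[pos];
        pos += 1;
      }
    }
    if (text) {
      nodes.push(text);
    }
    if (nodes.length === 0) {
      return '';
    }
    // Plain text stays a plain string; anything else becomes CONCAT.
    if (nodes.length === 1 && typeof nodes[0] === 'string') {
      return nodes[0];
    }
    nodes.unshift('CONCAT');
    return nodes;
  }

  return parseSequence(false);
}

console.log(JSON.stringify(parseMessage('Undelete {{PLURAL:$1|one edit|$1 edits}}')));
```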

Why this is good: abstract syntax trees make some complex stuff easy.
 * this allows the parser to be purely functional -- the AST stays the way it is, the parser stays the way it is, and all parsing state exists only temporarily as one is walking the tree. The algorithm to walk the tree fits in a few lines, because it's just depth-first recursion. (A clever JavaScript compiler can make some great optimizations here.)
 * it's MUCH simpler to do complex replacements, even ones that preserve bindings. No string hackery required.
 * decouples target format from parsing -- if we are outputting text for an attribute, we can flatten the result to text. Otherwise, we deal with jQuery-wrapped DOM nodes.
 * when we move the parsing to the server, we will be doing things in the right order -- "magic" replacements like SITENAME can occur on the server, and parameter replacement can happen on the client. All current parsers do this in the opposite (wrong) order.
 * playing around with this kind of technology makes it more clear how to do rich text editing, which is a coming focus for our tech team.
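The point about decoupling the target format from parsing can be made concrete: the same tree is walked with a different rule for joining pieces. In the hypothetical `render` below, a `'text'` target flattens everything to a string (as needed for an attribute), while a `'nodes'` target keeps a list in which non-string parameters, standing in for jQuery nodes, pass through as the same objects. Only `REPLACE` and `CONCAT` are handled in this sketch.

```javascript
// Hypothetical sketch: one AST, two output targets. Only CONCAT differs.
function render(ast, params, target) {
  function walk(node) {
    if (!Array.isArray(node)) {
      return node; // strings and numbers are leaves
    }
    var args = node.slice(1).map(walk);
    switch (node[0]) {
      case 'REPLACE':
        return params[args[0]];
      case 'CONCAT':
        return target === 'text'
          ? args.map(String).join('')   // flatten for attributes
          : [].concat.apply([], args);  // keep objects; flatten nested lists one level
      default:
        throw new Error('Unknown operation: ' + node[0]);
    }
  }
  return walk(ast);
}

var claim = [ 'CONCAT', 'I, ', [ 'REPLACE', 1 ], ', the copyright holder' ];
// Stand-in for a jQuery-created <input>; toString gives its text form.
var authorInput = {
  value: 'SomeUser',
  toString: function () { return this.value; }
};

console.log(render(claim, [ 1, authorInput ], 'text')); // I, SomeUser, the copyright holder
var nodes = render(claim, [ 1, authorInput ], 'nodes');
console.log(nodes[1] === authorInput); // true
```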

Examples
Simple strings are stored as simple strings.

"A simple input string" => "A simple input string"

Anything which uses constructs which require parsing turns into this kind of s-expression:

"Undelete {{PLURAL:$1|one edit|$1 edits}}" =>

[ 'CONCAT',
	'Undelete ',
	[ 'PLURAL',
		[ 'REPLACE', 0 ], // zero-based index: $1 is argument 0
		'one edit',
		[ 'CONCAT', [ 'REPLACE', 0 ], ' edits' ]
	]
]

These structures are cached, and when it comes time to render them, we walk the nodes depth-first and perform the operations requested -- replace, concatenate, plural, link, and so on. One can create new operations just by adding a new item to a dictionary held in the library, and simple replacements can be easily configured with parser options via "magic".
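The rendering step can be sketched as a few lines of depth-first recursion over the same s-expression, with operations looked up in a dictionary. The operation set, the English-only plural rule passed in as `pluralIndex`, and the `makeEmitter` name are assumptions for illustration; the library's own dictionary also covers links and other operations.

```javascript
// Hypothetical depth-first emitter: each operation name maps to a function in
// a dictionary, so a new operation is just a new entry.
function makeEmitter(pluralIndex) {
  var operations = {
    CONCAT: function (args) {
      return args.join('');
    },
    REPLACE: function (args, params) {
      return params[args[0]]; // args[0] is the zero-based parameter index
    },
    PLURAL: function (args) {
      var forms = args.slice(1);
      return forms[Math.min(pluralIndex(Number(args[0])), forms.length - 1)];
    }
  };

  function emit(node, params) {
    if (!Array.isArray(node)) {
      return node; // strings and numbers are leaves
    }
    var op = operations[node[0]];
    if (!op) {
      throw new Error('Unknown operation: ' + node[0]);
    }
    // Depth-first: emit every child, then apply this node's operation.
    var args = node.slice(1).map(function (child) {
      return emit(child, params);
    });
    return op(args, params);
  }

  return { operations: operations, emit: emit };
}

var english = makeEmitter(function (n) { return n === 1 ? 0 : 1; });
var ast = [ 'CONCAT', 'Undelete ',
  [ 'PLURAL', [ 'REPLACE', 0 ], 'one edit', [ 'CONCAT', [ 'REPLACE', 0 ], ' edits' ] ] ];

console.log(english.emit(ast, [ 1 ])); // Undelete one edit
console.log(english.emit(ast, [ 3 ])); // Undelete 3 edits

// Adding an operation is one dictionary entry.
english.operations.UPPER = function (args) {
  return args.join('').toUpperCase();
};
console.log(english.emit([ 'UPPER', 'undelete' ], [])); // UNDELETE
```

This version evaluates every PLURAL form before choosing one; that keeps the walker trivial, at the cost of a little wasted work on the unused forms.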