Extension:Scribunto/Tim's draft roadmap

Outline
The main use case for Lua in the short term will be people porting existing wikitext templates to Lua. So what we want is:


 * 1) Enough development and debugging features that the developers don't become frustrated and quit.
 * 2) Rough equivalents in Lua for things that are commonly done in wikitext.

On the first point, I think we'll be pretty well set once Brad's TemplateSandbox is deployed. Maybe a few bug fixes or improvements to the debug console feature will be needed.

Let's go through some of the requirements and approaches for the second point.

Parser-like features
Parser "variables" can be simulated simply by passing their wikitext to frame:preprocess. This is efficient and simple; it just makes some people say "ewww". The previous criticism of this technique, that it requires a frame object to always be available, was addressed by providing mw.getCurrentFrame. We could add thin wrapper functions to mw.lua to reduce the "ewww" factor without complicating the PHP/Lua interface.
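
A minimal sketch of what could be added to mw.lua for parser variables, assuming the variables of interest are {{PAGENAME}} and {{NAMESPACE}}. The names mw.pageName and mw.namespace and the helper variableWrapper are hypothetical, and the frame is a stub with canned values so the snippet runs outside Scribunto.

```lua
-- Stand-in for the Scribunto environment so the sketch runs on its own.
-- In a real module, mw.getCurrentFrame and frame:preprocess come from Scribunto.
local mw = {}
local stubFrame = {}

function stubFrame:preprocess( wikitext )
    -- Canned values for illustration; a real frame would run the preprocessor.
    local canned = {
        ['{{PAGENAME}}'] = 'Example page',
        ['{{NAMESPACE}}'] = 'Help',
    }
    return canned[wikitext] or wikitext
end

function mw.getCurrentFrame()
    return stubFrame
end

-- Hypothetical helper: build a zero-argument wrapper around one parser variable.
local function variableWrapper( name )
    local wikitext = '{{' .. name .. '}}'
    return function ()
        return mw.getCurrentFrame():preprocess( wikitext )
    end
end

-- Hypothetical names; the text does not say what the wrappers would be called.
mw.pageName = variableWrapper( 'PAGENAME' )
mw.namespace = variableWrapper( 'NAMESPACE' )

print( mw.pageName() )
print( mw.namespace() )
```

Everything stays on the Lua side; the only crossing into PHP is the existing preprocess call.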

Although LuaSandbox provides a low CPU time overhead for function calls to PHP, it's not so low that you would want to implement a complete user-exposed Title class using PHP for every method call. And many external users will be using LuaStandalone, which has a high overhead. Regardless of performance considerations, it's fairly awkward to implement Lua functions in PHP, since some data is lost in the transition (upvalues etc.).
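
One way to keep the number of PHP calls down is to fetch the data for an object in a single callback and answer every method from Lua. The sketch below only illustrates that idea: phpGetTitleData is a stub standing in for a PHP callback, and newTitle and its method names are hypothetical.

```lua
-- Stand-in for a PHP callback; counts how often the Lua/PHP boundary is crossed.
local phpCalls = 0
local function phpGetTitleData( text )
    phpCalls = phpCalls + 1
    -- A real callback would ask MediaWiki; this stub just splits on the first colon.
    local ns, rest = string.match( text, '^([^:]*):(.*)$' )
    if ns then
        return { namespace = ns, text = rest }
    end
    return { namespace = '', text = text }
end

-- Hypothetical constructor: one callback up front, all methods in pure Lua.
local function newTitle( text )
    local data = phpGetTitleData( text )  -- the only boundary crossing

    local title = {}
    function title:getNamespace()
        return data.namespace
    end
    function title:getText()
        return data.text
    end
    function title:getFullText()
        if data.namespace == '' then
            return data.text
        end
        return data.namespace .. ':' .. data.text
    end
    return title
end

local t = newTitle( 'Help:Contents' )
print( t:getNamespace(), t:getText(), t:getFullText() )
print( phpCalls )
```

However many methods are called on t, the counter stays at one per constructed title.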

Parser functions cannot easily be implemented using frame:preprocess, because:


 * Parameter substitution defeats memoization, so it's not so efficient.
 * Wikitext lacks an escaping format, so arbitrary strings cannot be fed to parser functions this way.
 * Constructing strings from frame.args to feed into frame:preprocess causes double-expansion, as discussed at Extension:Scribunto/Parser interface design.

It would probably be wise to implement a parser function equivalent of frame:expandTemplate, or to make frame:expandTemplate support the calling of parser functions. Then wrappers could be introduced in Lua to provide a tidy external interface:
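
A sketch of what such wrappers might look like, assuming a frame:callParserFunction method that takes the function name followed by its arguments as separate Lua strings. The signature, the wrappers table and the wrapper names are assumptions; the stub only echoes the call so the snippet runs on its own.

```lua
-- Stub frame. The callParserFunction signature is an assumption for this sketch.
local frame = {}

function frame:callParserFunction( name, ... )
    local args = { ... }
    -- A real implementation would invoke the PHP parser function.
    -- The stub returns a description of the call instead.
    return name .. '(' .. table.concat( args, ', ' ) .. ')'
end

-- Hypothetical tidy wrappers around individual parser functions.
local wrappers = {}

function wrappers.pagesInCategory( category )
    return frame:callParserFunction( 'pagesincategory', tostring( category ) )
end

function wrappers.anchorEncode( text )
    return frame:callParserFunction( 'anchorencode', tostring( text ) )
end

-- Arguments travel as Lua values, so pipes and braces need no escaping.
print( wrappers.anchorEncode( 'a|b}}' ) )
print( wrappers.pagesInCategory( 'Stubs' ) )
```

Because the arguments are never spliced into a wikitext string, the escaping and double-expansion problems listed above do not arise in the wrapper itself.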

Reusing the parser function interface means you automatically get resource limits.

On some specific parser functions:


 * urlencode is easily implemented in Lua, see https://test2.wikipedia.org/wiki/Module:Mw . It's probably better to import that code into Scribunto than to implement it as a parser function call.
 * lcfirst, ucfirst, lc, uc, formatnum, grammar, plural and language are implemented in my language module, soon to be submitted. I'm not sure where gender should go, since the parser function is not really a wrapper for the Language method.
 * #time should probably be added to the language module, but note that input length limits are required, similar to the existing #time implementation.
 * The following should probably be implemented as methods of a title class, not necessarily with the same names: localurl, localurle, fullurl, fullurle, canonicalurl, canonicalurle, namespace, namespacee, namespacenumber, talkspace, talkspacee, subjectspace, subjectspacee, pagename, pagenamee, fullpagename, fullpagenamee, subpagename, subpagenamee, basepagename, basepagenamee, talkpagename, talkpagenamee, subjectpagename, subjectpagenamee, #ifexist. PHP callbacks should be minimised, for performance.
 * padleft, padright: I would like an mw.utf8 module with some basic UTF-8 string handling functions. I don't favour squatting on global names like "ustring", "u" or "utf8" since those may in the future be used by Lua itself. The implementation options are pure Lua and PHP (mbstring) callbacks. I'd rather avoid having MediaWiki-specific C code unless benchmarks prove that it is absolutely necessary.
 * The following seem like a low priority to me; users can just use frame:callParserFunction or whatever: numberof*, pagesinnamespace, anchorencode, defaultsort, filepath, pagesincategory, pagesize, protectionlevel, displaytitle, formatdate.
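
For the padleft/padright item, the pure Lua option could look roughly like the sketch below. The table utf8x is a hypothetical stand-in for mw.utf8, the input is assumed to be valid UTF-8, and the pad is simplified to a single character, which is narrower than what the wikitext parser function accepts.

```lua
local utf8x = {}  -- hypothetical stand-in for mw.utf8

-- Count code points by counting every byte that is not a
-- UTF-8 continuation byte (0x80 to 0xBF).
function utf8x.len( s )
    local _, count = string.gsub( s, '[^\128-\191]', '' )
    return count
end

-- Pad on the left until the string is `width` code points long.
function utf8x.padleft( s, width, padChar )
    padChar = padChar or '0'
    local missing = width - utf8x.len( s )
    if missing <= 0 then
        return s
    end
    return string.rep( padChar, missing ) .. s
end

-- Pad on the right until the string is `width` code points long.
function utf8x.padright( s, width, padChar )
    padChar = padChar or '0'
    local missing = width - utf8x.len( s )
    if missing <= 0 then
        return s
    end
    return s .. string.rep( padChar, missing )
end

-- '\195\169' is the UTF-8 encoding of U+00E9, written as byte escapes.
print( utf8x.len( 'h\195\169llo' ) )
print( utf8x.padleft( '7', 3 ) )
```

Whether this is fast enough compared with mbstring callbacks is exactly the kind of question the benchmarks would have to answer.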


 * #tag is an interesting case, especially when it comes to construction of <ref>. Module developers on test2 were writing code like:

This was extremely slow; it rapidly exhausted the Lua time limit. The arguments turn out to be triple-parsed: once when expanding frame.args.foo, again when expanding the arguments to the #tag, and a third time when <references> calls Parser::recursiveTagParse. The situation would be slightly improved if this became:
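
The two call styles being compared might be sketched as follows. This is one possible reading, not the code from test2: the frame is a stub that only records what it is asked to do, frame.args is a plain table rather than a lazily expanded one, and the frame:callParserFunction signature is an assumption.

```lua
-- Stub frame that records calls instead of parsing anything.
local frame = { args = { foo = 'Smith 2010 | p. 5' }, log = {} }

function frame:preprocess( wikitext )
    table.insert( self.log, 'preprocess ' .. wikitext )
    return wikitext
end

function frame:callParserFunction( name, ... )
    table.insert( self.log,
        name .. ' called with ' .. select( '#', ... ) .. ' separate argument(s)' )
    return ''
end

-- Style 1: splice the already-expanded argument into wikitext and
-- send the whole string through the preprocessor again.
local function refBySplicing( f )
    return f:preprocess( '{{#tag:ref|' .. f.args.foo .. '}}' )
end

-- Style 2: hand the argument over as a value, with no wikitext string built.
local function refByArguments( f )
    return f:callParserFunction( '#tag', 'ref', f.args.foo )
end

refBySplicing( frame )
refByArguments( frame )
for _, line in ipairs( frame.log ) do
    print( line )
end
```

In the first style the pipe inside the argument ends up in the wikitext, where it would be read as an argument separator; in the second it stays inside a single Lua string.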

Then the arguments would only be double-parsed, once for the input arguments and again by Parser::recursiveTagParse. That's basically the same situation as original wikitext. But to get it down to a single parse operation would be complicated: there's no Parser method which provides only a main parse with no preprocessing, so even if you could tell Cite that some input arguments are already preprocessed, there would be no way for it to convert them to HTML for inclusion in the <references> output.

Anyway, CoreParserFunctions::tagObj would probably break if you tried to call it from Scribunto, so let's just blacklist it and have something like:
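
A sketch of the blacklist plus a dedicated entry point, under stated assumptions: the method name extensionTag, its (name, content, attributes) signature and the blacklist table are chosen here for illustration. The stub renders the tag as plain text only so the call can be inspected; a real implementation would hand the pieces to the parser.

```lua
local frame = {}

-- Parser functions that must not be reached through the generic interface.
local blacklisted = { ['#tag'] = true }

function frame:callParserFunction( name, ... )
    if blacklisted[ string.lower( name ) ] then
        error( 'callParserFunction: ' .. name .. ' is not available, use extensionTag', 2 )
    end
    -- Stub: describe the call instead of running it.
    return name .. '(' .. table.concat( { ... }, ', ' ) .. ')'
end

-- Hypothetical dedicated method for extension tags such as <ref>.
function frame:extensionTag( name, content, attributes )
    attributes = attributes or {}
    local keys = {}
    for key in pairs( attributes ) do
        table.insert( keys, key )
    end
    table.sort( keys )  -- deterministic attribute order

    local parts = { '<', name }
    for _, key in ipairs( keys ) do
        table.insert( parts, ' ' .. key .. '="' .. attributes[key] .. '"' )
    end
    table.insert( parts, '>' .. content .. '</' .. name .. '>' )
    return table.concat( parts )
end

print( frame:extensionTag( 'ref', 'Smith 2010', { name = 'smith' } ) )
print( pcall( frame.callParserFunction, frame, '#tag', 'ref', 'x' ) )
```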

Coding style
Lua OOP is like JavaScript except without .prototype and without a new operator. When you write a class wrapping a PHP interface, I've found that it is convenient to use local constructor variables to establish a private state. That suggests a style like:
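
A minimal sketch of that style, using a made-up widget object. The constructor name mw.newWidget and the stub phpGetLabel, which stands in for a PHP callback, are hypothetical.

```lua
local mw = {}

-- Stand-in for a PHP callback.
local function phpGetLabel( id )
    return 'widget #' .. id
end

function mw.newWidget( id )
    -- Private state: only the methods defined below can see these locals.
    local label = phpGetLabel( id )
    local useCount = 0

    local widget = {}

    function widget:getLabel()
        useCount = useCount + 1
        return label
    end

    function widget:getUseCount()
        return useCount
    end

    return widget
end

local w = mw.newWidget( 7 )
print( w:getLabel() )
print( w:getUseCount() )
print( w.label )  -- nil: the state is not reachable as a field
```

The object is just a table of closures, so there is nothing for calling code to tamper with except the methods themselves.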

The disconcerting thing about this (for a PHP programmer anyway) is that there isn't any class name, and there is no obvious place to put static methods. And having the constructor in the mw module makes it hard to split the code into modules. You can kind of fake it with code along the lines of:
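
One way to do that is to give each kind of object its own table holding a new function and any static functions. The sketch below assumes a table called mw.widget; isValidId and foo are made-up members.

```lua
local mw = {}
mw.widget = {}

-- A "static method": a plain function stored next to the constructor.
function mw.widget.isValidId( id )
    return type( id ) == 'number' and id > 0
end

function mw.widget.new( id )
    if not mw.widget.isValidId( id ) then
        error( 'mw.widget.new: invalid id', 2 )
    end

    local widget = {}

    function widget:foo( times )
        if type( times ) ~= 'number' then
            -- No function is really called mw.widget:foo, but the
            -- error message can pretend that one is.
            error( 'mw.widget:foo: expected a number', 2 )
        end
        return string.rep( 'foo', times )
    end

    return widget
end

local w = mw.widget.new( 3 )
print( w:foo( 2 ) )
print( mw.widget.isValidId( -1 ) )
```

The mw.widget table could live in its own file and be attached to mw by whatever loads it.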

We now have modularity. Technically, there's still no class name, and no function called mw.widget:foo, but maybe we can pretend that that is the name of the function for the purposes of error messages.

In JavaScript, we use an initial capital letter on a member name to indicate that it is a class. By "class", we mean something that is meant to be fed into the "new" operator. Since there is no "new" operator in Lua and no real class names, assuming you follow the style above, it doesn't make sense to me to use initial capitals. Maybe others have different opinions.

I suppose you could use an initial capital to designate any table with a member called "new", but the distinction between a class and a module would then be so fine that we would be introducing a difficult memorization task for our users: "was the function I want a module function mw.language.isValidCode or a static method mw.Language.isValidCode?"