Extension:Scribunto/Tim's draft roadmap

From mediawiki.org


The main use case for Lua in the short term will be people porting existing wikitext templates to Lua. So what we want is:

  1. Enough development and debugging features that the developers don't become frustrated and quit.
  2. Rough equivalents in Lua for things that are commonly done in wikitext.

On the first point, I think we'll be pretty well set once Brad's TemplateSandbox is deployed. Maybe a few bug fixes or improvements to the debug console feature will be needed.

Let's go through some of the requirements and approaches for the second point.

Parser-like features

Parser "variables" like {{PAGENAME}} can be simulated simply with frame:preprocess('{{PAGENAME}}') etc. This is efficient and simple; it just makes some people say "ewww". The previous criticism of this technique, that it requires a frame object to always be available, was addressed by providing mw.getCurrentFrame(). We could add to mw.lua:

local currentPageName
function mw.getCurrentTitle()
    -- Cache the page name so that preprocess() runs at most once
    if currentPageName == nil then
        currentPageName = mw.getCurrentFrame():preprocess('{{PAGENAME}}')
    end
    return mw.title.new( currentPageName )
end

etc., to reduce the "ewww" factor without complicating the PHP/Lua interface.
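
The caching in that wrapper can be tried outside MediaWiki. The following is only a sketch: the mw table, the frame object and mw.title.new() are hypothetical stubs written for this example, not the real Scribunto interface, and the stub always returns the fixed page name 'Sandbox'.

```lua
-- Hypothetical stand-ins so the sketch runs in a plain Lua interpreter.
local preprocessCalls = 0
local mw = { title = {} }

function mw.getCurrentFrame()
    local frame = {}
    function frame:preprocess( text )
        preprocessCalls = preprocessCalls + 1
        -- A real frame would expand the wikitext; the stub returns a fixed name.
        return 'Sandbox'
    end
    return frame
end

function mw.title.new( text )
    -- Stub title object holding only the text it was built from.
    return { text = text }
end

-- The wrapper from the text: the page name is fetched once and then reused.
local currentPageName
function mw.getCurrentTitle()
    if currentPageName == nil then
        currentPageName = mw.getCurrentFrame():preprocess( '{{PAGENAME}}' )
    end
    return mw.title.new( currentPageName )
end

local first = mw.getCurrentTitle()
local second = mw.getCurrentTitle()
print( first.text, second.text, preprocessCalls )
```

Both calls return a title for 'Sandbox', and preprocessCalls is 1 afterwards, so the second call never crosses into the (stubbed) parser.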

Although LuaSandbox provides a low CPU time overhead for function calls to PHP, it's not so low that you would want to be implementing a complete user-exposed Title class using PHP for every method call. And many external users will be using LuaStandalone, which has a high overhead. Regardless of performance considerations, it's fairly awkward to implement Lua functions in PHP, since some data is lost in the transition (upvalues etc.).

Parser functions cannot easily be implemented using frame:preprocess, because:

  • Parameter substitution defeats memoization, so it's not so efficient.
  • Wikitext lacks an escaping format, so arbitrary strings cannot be fed to parser functions this way.
  • Constructing strings from frame.args to feed into frame:preprocess() causes double-expansion, as discussed at Parser interface design.

It would probably be wise to implement a parser function equivalent of frame:expandTemplate(), or to make frame:expandTemplate() support the calling of parser functions. Then wrappers could be introduced in Lua to provide a tidy external interface:

function mw.pageExists( name )
    -- #ifexist returns its second argument ('1') only if the page exists
    local ret = mw.getCurrentFrame():callParserFunction{ name = 'ifexist', args = {name, '1'} }
    return ret == '1'
end

Reusing the parser function interface means you automatically get resource limits.

On some specific parser functions:

  • urlencode is easily implemented in Lua, see https://test2.wikipedia.org/wiki/Module:Mw . It's probably better to import that code into Scribunto than to implement it as a parser function call.
  • lcfirst, ucfirst, lc, uc, formatnum, grammar, plural and language are implemented in my language module, soon to be submitted. I'm not sure where gender should go, since the parser function is not really a wrapper for the Language method.
  • #time should probably be added to the language module, but note that input length limits are required, similar to the existing #time implementation.
  • The following should probably be implemented as methods of a title class, not necessarily with the same names: localurl, localurle, fullurl, fullurle, canonicalurl, canonicalurle, namespace, namespacee, namespacenumber, talkspace, talkspacee, subjectspace, subjectspacee, pagename, pagenamee, fullpagename, fullpagenamee, subpagename, subpagenamee, basepagename, basepagenamee, talkpagename, talkpagenamee, subjectpagename, subjectpagenamee, #ifexist. PHP callbacks should be minimised, for performance.
  • padleft, padright: I would like an mw.utf8 module with some basic UTF-8 string handling functions. I don't favour squatting on global names like "ustring", "u" or "utf8" since those may in the future be used by Lua itself. The implementation options are pure Lua and PHP (mbstring) callbacks. I'd rather avoid having MediaWiki-specific C code unless benchmarks prove that it is absolutely necessary.
  • The following seem like a low priority to me; users can just use frame:callParserFunction() or whatever: numberof*, pagesinnamespace, anchorencode, defaultsort, filepath, pagesincategory, pagesize, protectionlevel, displaytitle, formatdate.
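
To give a feel for the pure Lua option mentioned for urlencode and padleft, here is a minimal sketch. The names urlencode, mwutf8.len and mwutf8.padleft are hypothetical stand-ins for whatever the real modules end up exporting. The sketch assumes valid UTF-8 input and a single-character pad string, and the urlencode shown is plain percent-encoding, which does not try to reproduce every mode of the parser function.

```lua
-- Percent-encode every byte that is not an RFC 3986 unreserved character.
local function urlencode( s )
    return ( s:gsub( '[^%w%-%._~]', function ( c )
        return string.format( '%%%02X', string.byte( c ) )
    end ) )
end

-- Hypothetical stand-in for the proposed mw.utf8 module.
local mwutf8 = {}

-- Count code points by counting every byte that is not a UTF-8
-- continuation byte (0x80 to 0xBF). Assumes valid UTF-8 input.
function mwutf8.len( s )
    local n = 0
    for i = 1, #s do
        local b = s:byte( i )
        if b < 0x80 or b >= 0xC0 then
            n = n + 1
        end
    end
    return n
end

-- Pad on the left to a width measured in code points, not bytes.
-- Assumes pad is a single character (it may be multi-byte).
function mwutf8.padleft( s, width, pad )
    pad = pad or '0'
    local missing = width - mwutf8.len( s )
    if missing <= 0 then
        return s
    end
    return string.rep( pad, missing ) .. s
end

local eAcute = '\195\169'  -- U+00E9 as two UTF-8 bytes
print( urlencode( 'a b&c' ) )
print( #eAcute, mwutf8.len( eAcute ) )
print( mwutf8.padleft( eAcute, 3 ) )
```

The byte length of the sample string is 2 but its code point length is 1, which is why padding with the plain # operator would come out one character short.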

CoreParserFunctions::tagObj() would probably break if you tried to call it from Scribunto, so we may have to blacklist it from callParserFunction and implement something like:

frame:makeExtensionTag{ name = 'ref', contents = frame.args.foo }

Coding style

Lua OOP is like JavaScript except without .prototype and without a new operator. When you write a class wrapping a PHP interface, I've found that it is convenient to use local constructor variables to establish a private state. That suggests a style like:

function mw.newWidget()
    local widget = {}
    local privThing = true

    function widget:foo()
        -- privThing is visible here as an upvalue
    end

    return widget
end

The disconcerting thing about this (for a PHP programmer anyway) is that there isn't any class name, and there is no obvious place to put static methods. And having the constructor in the mw module makes it hard to split the code into modules. You can kind of fake it with code along the lines of:

mw.widget = {}

function mw.widget.someStaticMethod()
end

function mw.widget.new()
    local widget = {}
    local privateData

    function widget:foo()
        -- privateData is visible here as an upvalue
    end

    return widget
end

We now have modularity. Technically, there's still no class name, and no function called mw.widget:foo(), but maybe we can pretend that that is the name of the function for the purposes of error messages.
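
A runnable sketch of this style, with an invented widget that just stores a size. The mw table here is a local stub, and isValidSize, getSize and grow are hypothetical names chosen for the example.

```lua
local mw = { widget = {} }  -- stand-in for the real mw table

-- "Static" method: lives on the module table and needs no instance.
function mw.widget.isValidSize( size )
    return type( size ) == 'number' and size > 0
end

function mw.widget.new( size )
    -- The error message pretends that "mw.widget.new" is the function's name.
    assert( mw.widget.isValidSize( size ), 'mw.widget.new: invalid size' )

    local widget = {}
    local privSize = size  -- private: reachable only through the closures below

    function widget:getSize()
        return privSize
    end

    function widget:grow( amount )
        privSize = privSize + amount
        return self
    end

    return widget
end

local a = mw.widget.new( 2 )
local b = mw.widget.new( 10 )
a:grow( 3 )
print( a:getSize(), b:getSize(), a.privSize )
```

Each instance gets its own privSize, and a.privSize is nil because the value never appears in the returned table. The cost of this style is that every instance carries its own copies of the method closures.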

In JavaScript, we use an initial capital letter on a member name to indicate that it is a class. By "class", we mean something that is meant to be fed into the "new" operator. Since there is no "new" operator in Lua, and no real class names if you follow the style above, it doesn't make sense to me to use initial capitals. Maybe others have different opinions.

I suppose you could use an initial capital to designate any table with a member called "new", but the distinction between a class and a module would then be so fine that we would be introducing a difficult memorization task for our users: "was the function I want a module function mw.language.isValidCode() or a static method mw.Language.isValidCode()?"