Extension:Scribunto/Parser interface design

From mediawiki.org

Proposed interface summary

<!-- Invoke a Lua function from wikitext -->
{{ #invoke: module_name | function_name | arg1 | arg2 | name1 = value1 }}

-- Module:module_name
local p = {}
function p.function_name( frame )
	-- Get arg1
	local arg1 = frame.args[1]

	-- Get name1
	local name1 = frame.args.name1

	-- Get name1 by preprocessing
	name1 = frame:preprocess( '{{{name1}}}' )

	-- Put all arguments into a real table
	local t = {}
	for name, value in frame:argumentPairs() do
		t[name] = value
	end

	-- Make a <ref> tag
	local s = frame:preprocess( '<ref>Note</ref>' )

	-- Call a template
	s = s .. frame:expandTemplate{ title = 'tpl', args = {foo = arg1} }

	-- Return expanded text
	return s
end
return p

Design principles

My design principles for the parser interface are:

  • It should appear to be native to Lua. It's desirable to map parser concepts from PHP, but not syntax details.
  • It should be flexible enough to allow for future developments, such as greater integration with the current preprocessor or close integration with Gabriel Wicke's proposed token-based parser.
  • It should encourage brief but readable code.
  • It should be efficient, or at least the interface should not preclude an efficient future implementation.

Lua overview and conventions

Lua has a single data structure called a table. It is similar to a PHP array in that it is a hybrid of an integer-indexed array and a hash table. It is similar to a JavaScript object in that a table can be used as an object that contains both methods and properties in the same namespace. It provides JavaScript-like syntax for accessing elements: foo.bar is equivalent to foo['bar'] (the dot form works for any string key that is a valid identifier).
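A small illustration of the table behaviour just described. The variable names are arbitrary; the snippet runs in plain Lua 5.1 or later.

```lua
-- One table used both as an array (integer keys) and as a hash table (string keys)
local t = { 'first', 'second', colour = 'red' }

print( t[1] )                    -- first (array indices start at 1)
print( #t )                      -- 2 (the length operator ignores string keys)
print( t.colour == t['colour'] ) -- true (dot syntax is shorthand for a string key)

-- A function stored under a string key acts as a method; methods and
-- properties share the same namespace
t.describe = function ( self )
	return self.colour .. ' table with ' .. #self .. ' items'
end
print( t:describe() )            -- red table with 2 items
```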

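Normal object method calls are written with a colon: obj:func(). This is shorthand for obj.func(obj), which passes obj as an implicit first argument conventionally named self. Static function calls, which receive no such argument, are written with a dot: obj.func().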
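The following sketch, using an arbitrary counter table, shows that the colon form is only shorthand: obj:func(x) is the same call as obj.func(obj, x).

```lua
local counter = { count = 0 }

-- Defining with a colon adds an implicit first parameter named self
function counter:increment( step )
	self.count = self.count + ( step or 1 )
	return self.count
end

counter:increment()              -- count is now 1
counter.increment( counter, 5 )  -- same call written with a dot: count is now 6
-- counter.increment() would fail here, because self would be nil

-- A "static" function takes no self and is called with a dot
function counter.describe( n )
	return 'count = ' .. n
end
print( counter.describe( counter.count ) )  -- count = 6
```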

Named arguments, or some approximation to them, are commonly recommended by experienced software developers. Lua has syntactic support for a particular implementation of named arguments: some_function{foo = bar} is equivalent to some_function({foo = bar}). This example calls some_function with a single table argument. The table contains a single element with the key "foo" and the value of the variable bar.

I propose that we use such named arguments in all functions that would otherwise accept two or more arguments, and in functions that accept a table as their only argument.
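As a sketch of that convention, here is a hypothetical makeLink function (not part of any proposed interface) that takes all of its parameters in one table, so callers name each argument and may omit optional ones:

```lua
-- Hypothetical function following the named-argument convention
local function makeLink( options )
	local target = assert( options.target, 'target is required' )
	local text = options.text or target  -- optional, defaults to the target
	return '[[' .. target .. '|' .. text .. ']]'
end

-- Table-constructor call syntax: the parentheses can be omitted
print( makeLink{ text = 'home', target = 'Main Page' } )  -- [[Main Page|home]]
-- Optional arguments can simply be left out
print( makeLink{ target = 'Help:Contents' } )             -- [[Help:Contents|Help:Contents]]
```

Because the arguments are named, their order at the call site does not matter.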

Named template arguments

Templates use a combination of named and numbered arguments: {{template | name = value }} and {{template | value }}. Parser functions have an internal interface which provides only positional arguments, and equals signs need to be interpreted by the parser function.

I propose to hide this implementation detail from Lua scripts, by providing template-like named and numbered arguments. Top-level Lua functions would receive a frame object representing the set of all arguments.
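To make the mapping concrete, the sketch below assumes the parser function receives a list of raw argument strings and converts them into a template-style table. The helper names splitArgs and trim are hypothetical, and the handling shown (split at the first "=", trim named values, number unnamed values from 1 without trimming) follows the usual template conventions rather than any specific Scribunto code:

```lua
-- Hypothetical helper: remove leading and trailing whitespace
local function trim( s )
	return ( s:match( '^%s*(.-)%s*$' ) )
end

-- Hypothetical helper: turn raw parser-function arguments such as
-- ' arg1 ' or 'name1 = value1' into a template-style argument table
local function splitArgs( raw )
	local args = {}
	local position = 0
	for _, item in ipairs( raw ) do
		local name, value = item:match( '^([^=]*)=(.*)$' )
		if name then
			-- Named: split at the first '=' and trim both sides;
			-- an explicit '2 = x' fills numbered slot 2
			name = trim( name )
			args[ tonumber( name ) or name ] = trim( value )
		else
			-- Unnamed: take the next number, keeping whitespace as-is
			position = position + 1
			args[ position ] = item
		end
	end
	return args
end

local args = splitArgs{ ' arg1 ', 'arg2', ' name1 = value1 ', 'url = a=b' }
print( '[' .. args[1] .. ']' )  -- [ arg1 ] (unnamed, untrimmed)
print( args.name1 )             -- value1
print( args.url )               -- a=b (only the first '=' separates the name)
```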

Parent frame access

One of the reasons we believe Lua implementations of metatemplates will be faster than the existing wikitext versions is that we can avoid handling large numbers of arguments in the wikitext parser entirely. In wikitext, every triple brace and every pipe has a substantial cost. By allowing Lua to read the arguments of the template from which the script is invoked, we eliminate the need for large "proxy" invocations.

For example, imagine this template converted to Lua. If parent frame access is not allowed, then the template will be essentially the same, except with {{Citation/core replaced with {{#invoke:Citation/core. If we do allow parent frame access, then this template can be extremely short, with the task of mapping input argument names moved to Lua.

In the current parser implementation, the set of template arguments is called a frame, and the set of template arguments available to the caller of the current template is called the parent frame. If template arguments are essentially the same as Lua script arguments, then we can borrow this terminology.

So I propose to provide a getParent() method in the frame object that we pass to Lua.
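A minimal sketch of how a module might use getParent(). The makeFrame helper is a hypothetical stand-in for the frame objects #invoke would pass in, and the function name cite and the parameter names last and title are illustrative assumptions (Surname1 is the name used elsewhere on this page). With this approach the calling template could shrink to a single call such as {{#invoke:Citation/core|cite}}.

```lua
-- Hypothetical stand-in for a frame: only args and getParent are modelled
local function makeFrame( args, parent )
	return {
		args = args,
		getParent = function ( self ) return parent end,
	}
end

local p = {}

-- Reads the arguments given to the calling template directly,
-- so the argument-name mapping happens in Lua instead of wikitext
function p.cite( frame )
	local args = frame:getParent().args
	local author = args.last or args.Surname1 or 'Anonymous'
	local title = args.title or ''
	return author .. '. "' .. title .. '".'
end
-- A real module page would end with: return p

-- Simulate a page containing {{Citation | Surname1 = Smith | title = Lua }}
local parent = makeFrame( { Surname1 = 'Smith', title = 'Lua' } )
local frame = makeFrame( {}, parent )
print( p.cite( frame ) )  -- Smith. "Lua".
```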

To protect the consistency of the empty-frame expansion cache (Parser::$mTplExpandCache), I originally proposed that we not provide access to grandparent frames. However, the way the empty-frame cache works was changed in gerrit:135980, and it is now safe to expose grandparent frames.

Index metamethod

The current preprocessor has a kind of "dead branch elimination". The input text is converted to a tree, then if a subtree is not referenced, it does not have to be "expanded" to plain text. In particular, if you call a template like this:

{{ template1 | arg = {{template2}} }}

Then if you never write {{{arg}}} in template1, template2 never needs to even be loaded from the database.

If we provide access to arguments as a simple table with text in the values, then all arguments will need to be fully expanded prior to passing control to Lua.

It would be possible to provide a table with an __index metamethod which expands the requested argument on demand. For example, reading frame.args.foo would expand and return the argument named "foo" at the moment it is first accessed.
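The sketch below shows one way such a table could work in plain Lua. The makeLazyArgs function and its expand callback are hypothetical; expand stands in for the preprocessor call that turns an argument's unexpanded subtree into text. Expanded values are cached in a separate table, so the proxy itself stays empty:

```lua
-- Hypothetical lazy argument table: values are expanded only when read
local function makeLazyArgs( unexpanded, expand )
	local cache = {}
	return setmetatable( {}, {
		-- Called only for keys missing from the (always empty) proxy table
		__index = function ( _, name )
			local node = unexpanded[ name ]
			if node == nil then
				return nil
			end
			if cache[ name ] == nil then
				cache[ name ] = expand( node )
			end
			return cache[ name ]
		end,
	} )
end

local expansions = 0
local args = makeLazyArgs(
	{ arg = '{{template2}}', other = '{{template3}}' },
	function ( text )
		expansions = expansions + 1
		return 'expanded ' .. text
	end
)
print( args.arg )    -- expanded {{template2}}
print( args.arg )    -- served from the cache, no second expansion
print( expansions )  -- 1: 'other' was never expanded
```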

The major disadvantage of such a table is that iteration over frame.args is not possible using the normal Lua construct:

for k, v in pairs(frame.args) do
   -- ...
end

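The loop body would not be executed at all, because pairs() only visits keys that are actually stored in the table, and Lua 5.1 has no metamethod to change that. But perhaps this can be overcome with clear documentation. Users would instead be instructed to use a special iterator factory: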

for k, v in frame:argumentPairs() do
   -- ...
end

I think the advantage in brevity outweighs this potential pitfall. It would even be possible to make a local alias:

local p = frame.args
if p.Surname1 then
   -- ...
end
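A sketch of how the argumentPairs() iterator factory might be implemented on top of such a lazy table. newFrame and its parameters are hypothetical, string.upper stands in for the preprocessor, and the example confirms that a plain pairs() loop sees nothing:

```lua
-- Hypothetical frame: 'unexpanded' maps argument names to raw input,
-- 'expand' stands in for the preprocessor
local function newFrame( unexpanded, expand )
	local frame = {}
	frame.args = setmetatable( {}, {
		__index = function ( _, name )
			local raw = unexpanded[ name ]
			if raw ~= nil then
				return expand( raw )
			end
		end,
	} )

	-- Iterator factory: walks the names of the unexpanded arguments and
	-- fetches each value through frame.args, so it is expanded on demand
	function frame:argumentPairs()
		local name = nil
		return function ()
			name = next( unexpanded, name )
			if name ~= nil then
				return name, self.args[ name ]
			end
		end
	end

	return frame
end

local frame = newFrame( { 'a', name1 = 'b' }, string.upper )

local viaPairs = 0
for _ in pairs( frame.args ) do
	viaPairs = viaPairs + 1
end
print( viaPairs )  -- 0: the proxy table itself holds no keys

local seen = {}
for name, value in frame:argumentPairs() do
	seen[ name ] = value
end
print( seen[1] .. ' ' .. seen.name1 )  -- A B
```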

Preprocessor expansion

It will be useful to allow Lua to expand templates and other preprocessor input text. Some use cases are:

  • As an alternative to a string literal, to include snippets of wikitext which are intended to be editable by people who don't know Lua.
  • During migration, to call complex metatemplates which have not yet been ported to Lua, or to test migrated components independently instead of migrating all at once.
  • To provide access to miscellaneous parser functions and variables.
  • To allow Lua to construct tag invocations, such as <ref> and <gallery>.

I propose providing an interface for recursive expansion called frame:preprocess(). This function would expand wikitext, and the arguments available via triple braces would be the arguments to #invoke. It would essentially be a wrapper around PPFrame::expand(Preprocessor::preprocessToObj()).
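To illustrate the intended argument semantics only, here is a toy stand-in for frame:preprocess() that substitutes un-nested {{{name}}} and {{{name|default}}} references from the #invoke arguments. The toyPreprocess name is hypothetical, and it does none of the real preprocessor's work (templates, parser functions, tags, nesting):

```lua
-- Toy stand-in for frame:preprocess(): only triple-brace substitution
local function toyPreprocess( args, text )
	local result = text:gsub( '{{{([^{}]*)}}}', function ( inner )
		-- Split an optional default at the first '|'
		local name, default = inner:match( '^([^|]*)|(.*)$' )
		if not name then
			name = inner
		end
		name = name:match( '^%s*(.-)%s*$' )
		local value = args[ tonumber( name ) or name ]
		if value ~= nil then
			return value
		elseif default ~= nil then
			return default
		end
		-- Undefined with no default: leave the reference as written
		return '{{{' .. inner .. '}}}'
	end )
	return result
end

local args = { 'x', name1 = 'value1' }
print( toyPreprocess( args, 'A={{{1}}}, B={{{name1}}}, C={{{missing|none}}}, D={{{missing}}}' ) )
-- A=x, B=value1, C=none, D={{{missing}}}
```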

I considered providing an interface for preprocessing via the return value:

   return {expand = true, text = '<ref>hello</ref>'}

But it seems that such an interface would not provide any special benefits over recursion.

Avoiding double-expansion

The model for preprocessing that we try to present to the user is that every piece of text is expanded only once. This makes it possible to use {{((}}, which conventionally expands to a double-brace "{{", and expect the double brace to make it through to the output without it being interpreted as the start of a template invocation.

A Lua script will need to enforce this model itself. Its arguments will already be expanded (or an interface for expanding them will be provided), yet, as explained above, it will also need an interface for expanding unexpanded text.

Inevitably, some community members will see it as convenient to break this model, and will provide an interface for double-expansion wrapped in a template. We have to be prepared for this. But there's no need to encourage it.

Suppose a Lua script wishes to pass some previously expanded text through to a subsidiary template as an argument. One might do this:

return frame:preprocess('{{template | foo = ' .. foo .. '}}')

This causes double-expansion of the argument, and breaks if it contains a pipe character. Providing an interface for child frame generation (i.e. a PPCustomFrame wrapper) would allow this to be avoided:

local newFrame = frame:newChild( {foo = foo} )
return newFrame:preprocess( '{{template | foo = {{{foo|}}} }}' )

Perhaps it would be useful to include that interface, but it is verbose, and users might write code along the lines of the first example for convenience or brevity. We would be more likely to avoid the problem altogether if we provided a direct interface to template invocation:

return frame:expandTemplate{title = 'template', args = {foo = foo}}
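The pipe problem can be reproduced in a few lines. parseCall is a hypothetical, deliberately naive splitter that, like the preprocessor for a simple template call, treats every top-level pipe as an argument separator; the final lines show the table an expandTemplate call would receive instead:

```lua
-- Hypothetical naive splitter for a single-level template call
local function parseCall( wikitext )
	local inner = assert( wikitext:match( '^{{(.*)}}$' ), 'not a template call' )
	local parts = {}
	for part in ( inner .. '|' ):gmatch( '([^|]*)|' ) do
		parts[ #parts + 1 ] = part
	end
	return parts
end

local foo = 'red|green'  -- already-expanded text that happens to contain a pipe

-- String building: the pipe in foo becomes an extra, unintended argument
local parts = parseCall( '{{template | foo = ' .. foo .. '}}' )
print( #parts )  -- 3: 'template ', ' foo = red', 'green'

-- Table-based call: the value travels as data and is never re-parsed
local call = { title = 'template', args = { foo = foo } }
print( call.args.foo )  -- red|green
```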

Invoking Lua from wikitext

During previous discussions, it was judged undesirable to allow Lua expressions to be embedded in wikitext directly. Instead we have a parser function which calls a Lua function loaded from a page in the new Module namespace. The proposed syntax is:

{{ #invoke: module_name | function_name | arg1 | arg2 | name1 = value1 }}

Colon characters would not be allowed in the module name; they would be reserved for use in a future interwiki module repository feature:

{{ #invoke: commons:module_name | function_name | arg1 | arg2 | name1 = value1 }}
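If module names are restricted this way, the first #invoke argument could be validated with a check like the following. parseModuleTarget is hypothetical, and splitting at the first colon is only a guess at how the reserved interwiki form might be interpreted:

```lua
-- Hypothetical parser for the module-name argument of #invoke
local function parseModuleTarget( target )
	target = target:match( '^%s*(.-)%s*$' )
	local prefix, name = target:match( '^([^:]*):(.*)$' )
	if not prefix then
		-- No colon: an ordinary local module name
		return { name = target }
	end
	-- At most one colon, separating a non-empty prefix from a non-empty name
	if prefix == '' or name == '' or name:find( ':', 1, true ) then
		return nil, 'invalid module name: ' .. target
	end
	return { interwiki = prefix, name = name }
end

local localTarget = parseModuleTarget( ' module_name ' )
print( localTarget.name )                                -- module_name
print( parseModuleTarget( 'commons:module_name' ).interwiki )  -- commons
print( parseModuleTarget( 'a:b:c' ) )                    -- nil plus an error message
```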