Extension talk:Scribunto/Victor's API proposal

A few nice-to-have things
--Tgr (talk) 09:04, 3 June 2012 (UTC)
 * an iterator for going through the characters of a UTF-8 string (you could do that via len+sub, but that is uglier and probably much less efficient)
 * higher-level Unicode functions: normalizations, character classes, sort keys
 * encoding conversion (this is useful when creating external links to search and similar services that expect non-UTF-8 input)
 * basic data structures (one thing I was missing immediately was a set for efficient whitelist/blacklist lookup) and better support for the built-in table structure (find, index, map etc.)
 * I've added an iterator to the specification. I'm not sure about Unicode normalization; I think we should just normalize all Scribunto output to NFC, which is what MediaWiki uses as a convention (or it is probably already done elsewhere in the parser or OutputPage). Shipping Unicode character data is possible, but I am not quite sure we want to ship it with the extension (it's quite large). Data structures may be implemented as user libraries; we may ship them with Scribunto if many users find them useful. vvvt 12:40, 3 June 2012 (UTC)

Normalization with character class support provides a nice and language-independent way of stripping accents, which might be good for creating IDs and the like. (Admittedly, it might be a lot of effort compared to the benefits.) A standard data library is very important IMHO; you avoid people reinventing the wheel in every single Wikipedia, and thus save a lot of time for more useful activities. Also, extending string or table is quite natural for a data library; if a zillion scripts all have their own separate way of doing that, it will result in the Lua version of DLL hell. --Tgr (talk) 20:35, 5 June 2012 (UTC)
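
To make the first and fourth wishlist items concrete, here is a minimal sketch in plain Lua. The helper names uchars and Set are hypothetical and not part of the proposal. The iterator assumes well-formed UTF-8 and does no validation.

```lua
-- Hypothetical helper: iterate over the UTF-8 characters of a string.
-- Assumes well-formed UTF-8 input; no validation is performed.
local function uchars(s)
  local i = 1
  return function()
    if i > #s then return nil end
    local j = i + 1
    -- Extend the current character over continuation bytes (0x80-0xBF).
    while j <= #s and s:byte(j) >= 0x80 and s:byte(j) < 0xC0 do
      j = j + 1
    end
    local ch = s:sub(i, j - 1)
    i = j
    return ch
  end
end

-- Hypothetical helper: turn a list into a set for constant-time lookups.
local function Set(list)
  local set = {}
  for _, v in ipairs(list) do
    set[v] = true
  end
  return set
end

local out = {}
for ch in uchars("na\195\175ve") do  -- "naïve"
  out[#out + 1] = ch
end
print(#out)  -- 5

local allowed = Set({ "b", "i", "span" })
print(allowed["span"] == true)   -- true
print(allowed["script"] == nil)  -- true
```

A table keyed by the values themselves is the usual Lua idiom for a whitelist, which is why a dedicated set type may be more of a convenience than a necessity.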

ustring OOP
You could actually make ustring work with OO fairly easily: since all strings have their metatable.__index set to string by default, anything you add to that will show up as a method on all strings. So, something like this:

string.ufind = ustring.find

would enable this:

someUnicodeString:ufind(...)
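
A self-contained sketch of the same trick, using a hypothetical stand-in for ustring that only implements a character counter (the real ustring functions are not reproduced here):

```lua
-- Hypothetical stand-in for the proposed ustring library.
local ustring = {}

-- Count UTF-8 characters by counting every byte that is not a
-- continuation byte (0x80-0xBF). Assumes well-formed input.
function ustring.len(s)
  local count = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b >= 0xC0 then
      count = count + 1
    end
  end
  return count
end

-- Strings share one metatable whose __index is the string table,
-- so this one assignment adds the method to every string.
string.ulen = ustring.len

local s = "h\195\169llo"  -- "héllo"
print(#s)        -- 6 (bytes)
print(s:ulen())  -- 5 (characters)
```

Because the string table is shared, the new method becomes visible to every script running in the same Lua state, which is one kind of side effect to watch for.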


 * That is a nice approach we should consider; I am wary of possible side effects though. vvvt 12:38, 3 June 2012 (UTC)


 * A jQuery-esque approach might be better for efficiency (you only need to parse the string to unicode characters once, then you can use array operations):

local str = U('abcdef')


 * Since parens are optional, you can even recreate the Python syntax:

local str = u'abcdef'


 * --Tgr (talk) 20:28, 5 June 2012 (UTC)


 * Yes, we have considered this, and this is basically not quite possible to do correctly in Lua. vvvt 21:28, 5 June 2012 (UTC)
 * By that, you mean it would not be treated as a string? I think the syntax still has merit. (You cannot use a jQuery object in place of a DOM object either, and they are still very comfortable tools.) If you plan to do a lot of string operations (which wouldn't be a rare thing for a template), then the cost of splitting into UTF-8 characters every single time might become non-trivial; plus you could use object-oriented syntax without worries. You will have to reimplement almost all string functions anyway, and you could teach all MediaWiki-specific APIs to understand it, so there would not be all that many places where you would need to convert back manually to string. --Tgr (talk) 16:19, 9 June 2012 (UTC)
 * That's not the only problem with that. Another is the inability to implement the length operator and the newindex operator correctly at the same time. Also, it is a reference type, which makes it impossible to use adequately as a table key (something people do very often). vvvt 00:59, 19 June 2012 (UTC)
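
The objections in this thread can be illustrated with a small sketch. U here is a hypothetical wrapper that splits a string into UTF-8 characters once and stores them in a table; it is not part of any proposed API. As far as I know, Lua 5.1 does not honour __len on tables, and __newindex only fires for keys that are absent, so a wrapper that keeps its characters in the array part gets a working # but cannot intercept overwrites.

```lua
-- Hypothetical wrapper: split a string into UTF-8 characters once.
-- The pattern matches one non-continuation byte followed by any
-- continuation bytes (assumes well-formed UTF-8).
local function U(s)
  local chars = {}
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return chars
end

local a = U("h\195\169")  -- "hé"
local b = U("h\195\169")

print(#a)      -- 2 (characters, since they sit in the array part)
print(a == b)  -- false: two distinct tables with equal contents

-- As table keys, wrappers are compared by identity, not by content.
local seen = {}
seen[a] = true
print(seen[b])  -- nil

-- Plain strings are compared by value, so this lookup succeeds.
local plain = {}
plain["h\195\169"] = true
print(plain["h" .. "\195\169"])  -- true

-- __newindex is only consulted for keys that do not exist yet.
local writes = 0
setmetatable(a, {
  __newindex = function(t, k, v)
    writes = writes + 1
    rawset(t, k, v)
  end,
})
a[1] = "x"  -- existing key: metamethod is not called
a[3] = "y"  -- new key: metamethod is called
print(writes)  -- 1
```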

Tim's comments
Capitalisation should be consistent: function and variable names should start with a lower-case letter.

I think we should follow the JS API where possible, so that people have less to learn. Specifically:
 * mw.config.get instead of mw.lang.UILanguage, mw.lang.contentLanguage, mw.site.siteName, mw.site.version, mw.url.server, mw.url.serverName, mw.url.scriptPath
 * mw.message instead of mw.lang.message
 * mw.Title(name):<method> instead of mw.title.<method>(name)
 * mw.Title(name):getUrl instead of mw.url.local(name)
 * By extension: mw.Title(name):getFullUrl instead of mw.url.full(name), mw.Title(name):getCanonicalUrl instead of mw.url.canonical(name)
 * mw.language instead of mw.lang
 * mw.language.convertNumber instead of mw.language.formatNumber
 * mw.Uri.encode instead of mw.url.encode

Where possible, we should simulate the native Lua API instead of inventing our own, specifically:
 * os.date('!*t') instead of mw.time.UTC
 * os.date('*t') instead of mw.time.local
 * os.time instead of mw.time.unixTimestamp
 * os.date(f,t) instead of mw.time.format(f,t)
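
For reference, a short sketch of what the native calls listed above return in stock Lua. Whether and how a sandboxed Scribunto environment would expose os is an open question and is not assumed here.

```lua
-- Format a Unix timestamp as UTC; the leading "!" selects UTC.
print(os.date("!%Y-%m-%d %H:%M:%S", 0))  -- 1970-01-01 00:00:00

-- "*t" returns a table of fields instead of a formatted string.
local utc = os.date("!*t", 86400)
print(utc.year, utc.month, utc.day)  -- 1970  1  2

-- os.time() with no arguments returns the current Unix timestamp.
local now = os.time()
print(type(now))  -- number

-- Without "!", the fields are in the server's local time zone.
local localNow = os.date("*t", now)
print(type(localNow.hour))  -- number
```

Note that the format strings follow C strftime conventions, not the #time parser function syntax.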

mw.query.blockSize: a block is a thing that stops a user from editing, so this needs to be called something else. I think it may have to be smaller than 100 -- was that number derived from benchmarking?

mw.site.numberOf*: I don't think it's necessary to implement these.

This proposed ustring interface is certainly an improvement on the previous one.

mw.log needs to be added. That is the top priority in my opinion, since it can't be simulated by recursive parsing.

-- Tim Starling (talk) 07:04, 2 July 2012 (UTC)


 * Following the MediaWiki JS API may be a good idea, but I do not think we should make it a key principle. I think following common Lua conventions and being consistent is much more important. PHP is a language whose API was designed on the principle "mimic other APIs", and that is one of the key reasons it is considered particularly badly designed.


 * The thing is that JS does not have properties and Lua does, which is why the mw.config.get interface would look clumsy here. I do not really think that imitating the JS API will be a serious advantage. When people need to get the site name, they will most probably open the reference rather than try "What if mw.config.get works here?". Even if there is an advantage, in a year everyone will know how to find the site name, but people will be stuck with having to use mw.config.get('wgSiteName') instead of mw.site.siteName. The first one has three disadvantages:
 * "config" is not really a meaningful name. Site name is a piece of information about site in general, that's what "site" module name means; "config", on other hand, refers to an implementation detail, namely to the fact that site name is a configuration variable.
 * get is superfluous when you have properties.
 * The "wg" prefix is also superfluous.
 * The problem is, once you give users mw.config.get interface, it's going to be used forever. So, our only chance to get it right is not to include it from the beginning (otherwise, later I would feel bad about it, just as you felt about ParserFunctions).
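
A minimal sketch of the two styles side by side, assuming a hypothetical settings table in place of real MediaWiki configuration. The names config, site and settings are illustrative only.

```lua
-- Hypothetical settings table standing in for MediaWiki configuration.
local settings = { wgSiteName = "Example Wiki", wgVersion = "1.20alpha" }

-- Getter style, mirroring the JS API.
local config = {}
function config.get(name)
  return settings[name]
end

-- Property style: a read-only table that looks values up on access.
local names = { siteName = "wgSiteName", version = "wgVersion" }
local site = setmetatable({}, {
  __index = function(_, key)
    local setting = names[key]
    if setting then
      return settings[setting]
    end
    return nil
  end,
  __newindex = function(_, key)
    error("site." .. tostring(key) .. " is read-only", 2)
  end,
})

print(config.get("wgSiteName"))  -- Example Wiki
print(site.siteName)             -- Example Wiki

-- Assigning to a property raises an error instead of silently succeeding.
local ok = pcall(function() site.siteName = "Other" end)
print(ok)  -- false
```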


 * So I'd prefer that we keep mw.site.siteName, etc. Also, I believe we should keep the message function in its submodule so we stick to our convention (mw.submodule.function). We could, however, provide some shortcuts for frequently used functions (like mw.message being a copy of mw.language.message, since the former is easier to type).


 * Having said all that, I agree with you on the mw.Title suggestion, except that I would just make it mw.Title(...).fullURL instead of mw.Title(...):getFullURL, since getX functions are bad when you can have proper properties.
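
One way such properties could be computed on access is sketched below. Title, the server name and the article path are all hypothetical, and no URL encoding is attempted beyond replacing spaces with underscores.

```lua
-- Hypothetical server and article path; real values would come from the wiki.
local server = "https://example.org"
local articlePath = "/wiki/"

-- Properties are computed on access through __index, so callers
-- write title.fullURL rather than title:getFullURL().
local titleMeta = {
  __index = function(self, key)
    if key == "localURL" then
      return articlePath .. self.name
    elseif key == "fullURL" then
      return server .. articlePath .. self.name
    end
    return nil
  end,
}

-- Hypothetical constructor; only replaces spaces with underscores.
local function Title(name)
  local normalized = name:gsub(" ", "_")
  return setmetatable({ name = normalized }, titleMeta)
end

local t = Title("Main Page")
print(t.localURL)  -- /wiki/Main_Page
print(t.fullURL)   -- https://example.org/wiki/Main_Page
```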


 * I do not understand the reason for changing "formatNumber" to "convertNumber", since I can't find it in the JS API. Nor do I get where the "mw.Uri" name comes from. We don't have it in the JS API, and it contradicts the "start with a lower-case letter" convention you begin with.


 * I do not want to make the MediaWiki time API look like the native Lua date API. The reason is that the MediaWiki time API is much more powerful and not exactly compatible. Date formatting here actually depends on the current language and introduces important language-specific modifiers like xg. Date parsing is also more powerful because it can parse things like "+2 weeks" (and I did not find a way to do that in Lua). As a bonus, many people around here are already familiar with MW conventions through the #time parser function. Those interfaces are different and we should not really mix them. We may implement os.date/os.time functions as a compatibility layer, but I think this is not a priority for now.


 * mw.query.blockSize may be renamed to mw.query.batchSize. "100" was a number you suggested when I asked you about the limits for set queries back in Berlin. I could do some benchmarking if you give me some ideas about what I should be looking for.
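
If the limit is understood as the maximum number of items per set query (an assumption on my part about what the proposal means), splitting a longer list into batches could look like this. The batches helper is hypothetical.

```lua
-- Hypothetical helper: split a list into consecutive batches of at
-- most `size` items each.
local function batches(list, size)
  local result = {}
  for i = 1, #list, size do
    local batch = {}
    for j = i, math.min(i + size - 1, #list) do
      batch[#batch + 1] = list[j]
    end
    result[#result + 1] = batch
  end
  return result
end

-- 250 made-up page names split with a batch size of 100.
local titles = {}
for i = 1, 250 do
  titles[i] = "Page " .. i
end

local parts = batches(titles, 100)
print(#parts, #parts[1], #parts[3])  -- 3  100  50
```

A benchmark would then vary the size argument and measure the cost per batch.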


 * Could you elaborate on mw.site.numberOf* properties? I can't see any problems with them, and they are really easy to implement (in fact, I began with them in my prototype).


 * What's mw.log? Is it a debug function? That may be useful, but I think before introducing random debug functions we should come up with some vision (preferably coherent) of how users are supposed to debug those scripts. I mean, we should probably design the whole debug interface, and then see what we need to implement as a part of API.


 * vvvt 00:45, 6 July 2012 (UTC)