Extension talk:Scribunto/Victor's API proposal

A few nice-to-have things
--Tgr (talk) 09:04, 3 June 2012 (UTC)
 * an iterator for going through the characters of a UTF-8 string (you could do that via len+sub, but that is uglier and probably much less efficient)
 * higher-level Unicode functions: normalizations, character classes, sort keys
 * encoding conversion (this is useful when creating external links to search engines and similar services which expect non-UTF-8 input)
 * basic data structures (one thing I was missing immediately was a set for efficient whitelist/blacklist lookup) and better support for the built-in table structure (find, index, map etc.)
 * I've added an iterator to the specification. I'm not sure about Unicode normalization: I think we should just normalize all Scribunto output to NFC, which is the MediaWiki convention (or it is probably already done somewhere else in the parser or OutputPage). Shipping Unicode character data is possible, but I am not sure we want to bundle it with the extension (it's quite large). Data structures may be implemented as user libraries; we may ship them with Scribunto if many users find them useful. vvvt 12:40, 3 June 2012 (UTC)
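
For illustration, a character iterator of the kind requested above can be sketched in plain Lua with a byte pattern. This is not the iterator from the specification; the name utf8Chars is hypothetical, and the sketch assumes the input is valid UTF-8.

```lua
-- Hypothetical helper: iterate over the characters of a UTF-8 string.
-- Assumes valid UTF-8 input; malformed sequences are not detected.
local function utf8Chars(s)
  -- One lead byte (any byte that is not a continuation byte 0x80-0xBF)
  -- followed by zero or more continuation bytes.
  return s:gmatch("[^\128-\191][\128-\191]*")
end

local chars = {}
for ch in utf8Chars("h\195\169llo") do  -- "héllo"; the é takes two bytes
  chars[#chars + 1] = ch
end
print(#chars)  -- 5
```

Compared with a len+sub loop, this walks the string once instead of locating each character from the start.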

Normalization with character class support provides a nice and language-independent way of stripping accents, which might be good for creating IDs and the like. (Admittedly, it might be a lot of effort compared to the benefits.) A standard data library is very important IMHO; you avoid people reinventing the wheel in every single Wikipedia, and thus save a lot of time for more useful activities. Also, extending string or table is quite natural for a data library; if a zillion scripts each have their own separate way of doing that, it will result in the Lua version of DLL hell. --Tgr (talk) 20:35, 5 June 2012 (UTC)
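
As a rough sketch of the whitelist/blacklist set mentioned in this thread: a plain Lua table keyed by the members already gives constant-time lookup. The constructor name Set and the sample values are hypothetical, not part of any proposed library.

```lua
-- Hypothetical helper: build a set from a list of values.
local function Set(list)
  local set = {}
  for _, v in ipairs(list) do
    set[v] = true  -- membership is stored as key -> true
  end
  return set
end

local allowed = Set{ "b", "i", "span" }

print(allowed["span"] == true)   -- true
print(allowed["script"] == nil)  -- true
```

Shipping one such helper in a shared library would spare each wiki from writing its own variant.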

ustring OOP
You could actually make ustring work with OO fairly easily: since all strings have their metatable.__index set to string by default, anything you add to that will show up as a method on all strings. So, something like this:

string.ufind = ustring.find

would enable this:

someUnicodeString:ufind(...)


 * That is a nice approach we should consider; I am wary of possible side effects though. vvvt 12:38, 3 June 2012 (UTC)
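
A minimal sketch of the approach, runnable in plain Lua. Since ustring is not available outside Scribunto, ulen below is a hypothetical stand-in that counts characters by counting non-continuation bytes, assuming valid UTF-8.

```lua
-- Hypothetical stand-in for a ustring length function.
local function ulen(s)
  -- gsub returns the new string and the number of matches;
  -- every byte outside 0x80-0xBF starts a new character.
  local _, count = s:gsub("[^\128-\191]", "")
  return count
end

-- All strings share one metatable whose __index is the string table,
-- so this adds the method to every string at once.
string.ulen = ulen

local s = "h\195\169llo"  -- "héllo"
print(#s)        -- 6 (bytes)
print(s:ulen())  -- 5 (characters)
```

The same mechanism explains the side-effect concern: the change is global, so every script sharing the string table sees the added method.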


 * A jQuery-esque approach might be better for efficiency (you only need to parse the string into Unicode characters once, then you can use array operations):

local str = U('abcdef')


 * Since parens are optional, you can even recreate the Python syntax:

local str = u'abcdef'


 * --Tgr (talk) 20:28, 5 June 2012 (UTC)


 * Yes, we have considered this, and it is basically not possible to do correctly in Lua. vvvt 21:28, 5 June 2012 (UTC)
 * By that, you mean it would not be treated as a string? I think the syntax still has merit. (You cannot use a jQuery object in place of a DOM object either, and they are still very comfortable tools.) If you plan to do a lot of string operations (which wouldn't be a rare thing for a template), then the cost of splitting into UTF-8 characters every single time might become non-trivial; plus you could use object-oriented syntax without worries. You will have to reimplement almost all string functions anyway, and you could teach all MediaWiki-specific APIs to understand it, so there would not be all that many places where you would need to convert back manually to string. --Tgr (talk) 16:19, 9 June 2012 (UTC)
 * That's not the only problem with that. Another is the inability to implement the length operator and the newindex operator correctly at the same time. Also, it is a reference type, which makes it impossible to use adequately as table keys (something people do very often). vvvt 00:59, 19 June 2012 (UTC)
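
The reference-type objection can be illustrated with a small sketch. The wrapper below is hypothetical (the names u, U and chars are invented here, and valid UTF-8 is assumed); it only shows that two wrappers with equal content are distinct table keys.

```lua
-- Hypothetical wrapper: parse a string into characters once.
local U = {}
U.__index = U

local function u(s)
  local chars = {}
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return setmetatable({ chars = chars }, U)
end

-- Convert back to a plain string.
function U:__tostring()
  return table.concat(self.chars)
end

local a, b = u'abc', u'abc'

local seen = {}
seen[a] = true
print(seen[b])  -- nil: b is a different table, although the content is equal

-- Converting back to a plain string restores value semantics.
seen[tostring(a)] = true
print(seen[tostring(b)])  -- true
```

Plain strings are compared by value when used as keys, which is why the wrapper would need a manual conversion at every such point.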

Tim's comments
Capitalisation should be consistent: function and variable names should start with a lower-case letter.

I think we should follow the JS API where possible, so that people have less to learn. Specifically:
 * mw.config.get instead of mw.lang.UILanguage, mw.lang.contentLanguage, mw.site.siteName, mw.site.version, mw.url.server, mw.url.serverName, mw.url.scriptPath
 * mw.message instead of mw.lang.message
 * mw.Title(name):<method> instead of mw.title.<method>(name)
 * mw.Title(name):getUrl instead of mw.url.local(name)
 * By extension: mw.Title(name):getFullUrl instead of mw.url.full(name), mw.Title(name):getCanonicalUrl instead of mw.url.canonical(name)
 * mw.language instead of mw.lang
 * mw.language.convertNumber instead of mw.language.formatNumber
 * mw.Uri.encode instead of mw.url.encode
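
To make the method-style call shape concrete, here is a hypothetical mock in plain Lua. It is not the Scribunto API: newTitle stands in for mw.Title, and the "/wiki/" prefix is a made-up placeholder for the local URL form.

```lua
-- Hypothetical mock; only the object-with-methods call shape matters here.
local Title = {}
Title.__index = Title

local function newTitle(name)
  return setmetatable({ name = name }, Title)
end

-- Placeholder URL scheme, assumed for illustration only.
function Title:getUrl()
  local path = self.name:gsub(" ", "_")  -- keep only the first result of gsub
  return "/wiki/" .. path
end

print(newTitle("Main Page"):getUrl())  -- /wiki/Main_Page
```

A side note in favour of the method form: local is a reserved word in Lua, so a field of that name could only be reached with bracket syntax such as mw.url["local"].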

Where possible, we should simulate the native Lua API instead of inventing our own, specifically:
 * os.date('!*t') instead of mw.time.UTC
 * os.date('*t') instead of mw.time.local
 * os.time instead of mw.time.unixTimestamp
 * os.date(f,t) instead of mw.time.format(f,t)
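
For reference, the native calls listed above behave as follows in standard Lua. This is a sketch of the stock os library only; whether and how Scribunto would expose it is not settled here.

```lua
local now = os.time()            -- current Unix timestamp

local utc = os.date('!*t', now)  -- table of UTC fields (year, month, day, hour, ...)
local loc = os.date('*t', now)   -- same fields in the server's local time zone

-- os.date(f, t): format timestamp t with format string f;
-- the leading '!' selects UTC.
local epoch = os.date('!%Y-%m-%d', 0)
print(epoch)  -- 1970-01-01
```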

mw.query.blockSize: a block is a thing that stops a user from editing, so this needs to be called something else. I think it may have to be smaller than 100 -- was that number derived from benchmarking?

mw.site.numberOf*: I don't think it's necessary to implement these.

This proposed ustring interface is certainly an improvement on the previous one.

mw.log needs to be added. That is the top priority in my opinion, since it can't be simulated by recursive parsing.

-- Tim Starling (talk) 07:04, 2 July 2012 (UTC)