Extension talk:Scribunto/Victor's API proposal

A few nice-to-have things
--Tgr (talk) 09:04, 3 June 2012 (UTC)
 * an iterator for going through the characters of a UTF-8 string (you could do that via len+sub, but that is uglier and probably much less efficient)
 * higher-level Unicode functions: normalizations, character classes, sort keys
 * encoding conversion (useful when creating external links to search and similar services that expect non-UTF-8 input)
 * basic data structures (one thing I missed immediately was a set for efficient whitelist/blacklist lookup) and better support for the built-in table structure (find, index, map, etc.)
 * I've added an iterator to the specification. I'm not sure about Unicode normalization; I think we should just normalize all Scribunto output to NFC, which is the MediaWiki convention (or this is probably already done somewhere else, in the parser or OutputPage). Unicode character data is possible, but I am not quite sure we want to ship it with the extension (it's quite large). Data structures may be implemented as user libraries; we may ship them with Scribunto if many users find them useful. vvvt 12:40, 3 June 2012 (UTC)
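
For illustration, a character iterator of the kind requested above can be sketched in plain Lua. This is not the iterator from the specification; `uchars` is a hypothetical name, and the sketch assumes the input is already valid UTF-8.

```lua
-- Hypothetical helper: iterate over the characters of a UTF-8 string.
-- Assumes valid UTF-8 input; no validation is performed.
local function uchars(s)
    -- One character = one non-continuation byte followed by any number
    -- of continuation bytes (0x80-0xBF, i.e. decimal 128-191).
    return string.gmatch(s, "[^\128-\191][\128-\191]*")
end

local text = "h\195\169llo"   -- "héllo" written as UTF-8 bytes
local out = {}
for ch in uchars(text) do
    out[#out + 1] = ch
end
-- #text is 6 (bytes), #out is 5 (characters)
```

Each step of the loop yields one whole character, so there is no need to call len and sub by position.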

Normalization with character class support provides a nice, language-independent way of stripping accents, which might be good for creating IDs and the like. (Admittedly, it might be a lot of effort compared to the benefits.) A standard data library is very important IMHO; it keeps people from reinventing the wheel in every single Wikipedia, and thus saves a lot of time for more useful activities. Also, extending string or table is quite natural for a data library; if a zillion scripts each have their own way of doing that, the result will be the Lua version of DLL hell. --Tgr (talk) 20:35, 5 June 2012 (UTC)
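
As a rough idea of how small such a library piece could be, here is a minimal sketch of the set mentioned earlier for whitelist lookup. `Set` is a hypothetical helper, not part of any shipped Scribunto library; it relies only on ordinary Lua tables.

```lua
-- Hypothetical Set helper: store each value as a table key so that a
-- membership test is a single table lookup.
local function Set(list)
    local set = {}
    for _, value in ipairs(list) do
        set[value] = true
    end
    return set
end

local allowed = Set{ "b", "i", "span" }
local isAllowed = allowed["span"] == true     -- true
local isBlocked = allowed["script"] == true   -- false (lookup gives nil)
```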

ustring OOP
You could actually make ustring work with OO fairly easily: all strings share a metatable whose __index is set to the string table by default, so anything you add to that table will show up as a method on all strings. So, something like this:

string.ufind = ustring.find

would enable this:

someUnicodeString:ufind(...)


 * That is a nice approach we should consider; I am wary of possible side effects though. vvvt 12:38, 3 June 2012 (UTC)
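
A small sketch of both the approach and the side effect, in plain Lua. The `ustring` table here is a stand-in written for this example (the real library is assumed to provide something similar), and a sandboxed environment may not permit modifying `string` at all.

```lua
-- Stand-in for the proposed ustring library (hypothetical): counts
-- UTF-8 characters by counting non-continuation bytes.
local ustring = {}
function ustring.len(s)
    local _, count = string.gsub(s, "[^\128-\191]", "")
    return count
end

-- Strings share one metatable whose __index is the string table,
-- so a field added there becomes a method on every string.
string.ulen = ustring.len

local s = "h\195\169llo"   -- "héllo" written as UTF-8 bytes
local byteLen = s:len()    -- 6 bytes
local charLen = s:ulen()   -- 5 characters

-- The side effect: the change is global, so unrelated strings (and any
-- other script sharing this string table) see the new method too.
local alsoPatched = ("x").ulen ~= nil   -- true
```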


 * A jQuery-esque approach might be better for efficiency (you only need to parse the string into Unicode characters once, then you can use array operations):

local str = U('abcdef')


 * Since parentheses are optional when a function is called with a single string literal, you can even recreate the Python syntax:

local str = u'abcdef'


 * --Tgr (talk) 20:28, 5 June 2012 (UTC)


 * Yes, we have considered this, and it is not really possible to do correctly in Lua. vvvt 21:28, 5 June 2012 (UTC)
 * By that, you mean it would not be treated as a string? I think the syntax still has merit. (You cannot use a jQuery object in place of a DOM object either, and they are still very comfortable tools.) If you plan to do a lot of string operations (which wouldn't be a rare thing for a template), then the cost of splitting into UTF-8 characters every single time might become non-trivial; plus you could use object-oriented syntax without worries. You will have to reimplement almost all string functions anyway, and you could teach all MediaWiki-specific APIs to understand it, so there would not be all that many places where you would need to convert back manually to a string. --Tgr (talk) 16:19, 9 June 2012 (UTC)
 * That's not the only problem with that. Another is the inability to implement the length operator and the __newindex metamethod correctly at the same time. Also, it is a reference type, which makes it impossible to use adequately as a table key (something people do very often). vvvt 00:59, 19 June 2012 (UTC)
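
The reference-type point can be illustrated with a minimal sketch. `U` here is a hypothetical constructor written for this example, not an existing API; it splits the string into a table of characters once and assumes valid UTF-8. The length operator issue is not shown, since it depends on the Lua version (Lua 5.1 does not consult `__len` for tables).

```lua
-- Hypothetical U() constructor: split a UTF-8 string into a table of
-- characters once, so later operations can index characters directly.
local function U(s)
    local chars = {}
    for ch in string.gmatch(s, "[^\128-\191][\128-\191]*") do
        chars[#chars + 1] = ch
    end
    return chars
end

local a = U'abc'   -- parentheses omitted: single string literal argument
local b = U'abc'

-- Plain strings are values: equal contents give the same table key.
local byString = {}
byString['abc'] = 1
local stringHit = byString['abc']   -- 1

-- Wrapper objects are references: equal contents, but different keys.
local byObject = {}
byObject[a] = 1
local objectHit = byObject[b]       -- nil
local sameObject = (a == b)         -- false
```

Any code that keys a table by such an object would have to convert it back to a plain string first.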