Extension talk:Scribunto/Victor's API proposal

A few nice-to-have things
--Tgr (talk) 09:04, 3 June 2012 (UTC)
 * an iterator for going through the characters of a UTF-8 string (you could do that via len+sub, but that is uglier and probably much less efficient)
 * higher-level Unicode functions: normalizations, character classes, sort keys
 * encoding conversion (useful when creating external links to search engines and similar services that expect non-UTF-8 input)
 * basic data structures (one thing I was missing immediately was a set for efficient whitelist/blacklist lookup) and better support for the built-in table structure (find, index, map etc.)
 * I've added an iterator to the specification. I'm not sure about Unicode normalization: I think we should just normalize all Scribunto output to NFC, which is what MediaWiki uses as a convention (or it is probably already done somewhere else, in the parser or OutputPage). Shipping Unicode character data is possible, but I am not quite sure we want to include it in the extension (it's quite large). Data structures may be implemented as user libraries; we may ship them with Scribunto if many users find them useful. vvvt 12:40, 3 June 2012 (UTC)
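For illustration, a minimal sketch of what a character iterator could look like in plain Lua. This is not the iterator that was added to the specification; the name `uchars` is hypothetical, and the code assumes the input is valid UTF-8 (stray continuation bytes are simply skipped).

```lua
-- Hypothetical helper: iterate over the characters of a UTF-8 string.
-- Each character is one non-continuation byte followed by zero or more
-- continuation bytes (0x80-0xBF, i.e. decimal 128-191).
local function uchars(s)
  return s:gmatch("[^\128-\191][\128-\191]*")
end

-- "h\195\169llo" is "héllo" written with explicit UTF-8 bytes.
local seen = {}
for ch in uchars("h\195\169llo") do
  seen[#seen + 1] = ch
end
```

Compared with the len+sub approach, this walks the string once instead of re-scanning it from the start for every character.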

Normalization with character class support provides a nice and language-independent way of stripping accents, which might be good for creating IDs and the like. (Admittedly, it might be a lot of effort compared to the benefits.) A standard data library is very important IMHO; you avoid people reinventing the wheel in every single Wikipedia, and thus save a lot of time for more useful activities. Also, extending string or table is quite natural for a data library; if a zillion scripts all have their own separate way of doing that, the result will be the Lua version of DLL hell. --Tgr (talk) 20:35, 5 June 2012 (UTC)
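As an example of the kind of helper such a library might standardize, here is a rough sketch of a set built on a plain table, used for whitelist lookup. The names `Set` and `whitelist` are made up for illustration and are not part of any Scribunto API.

```lua
-- Hypothetical helper: build a set from a list of values.
-- Membership is a single table lookup instead of a linear scan.
local function Set(list)
  local set = {}
  for _, value in ipairs(list) do
    set[value] = true
  end
  return set
end

local whitelist = Set{ "b", "i", "span" }

local allowed = whitelist["span"] == true    -- value is in the set
local rejected = whitelist["script"] == nil  -- value is not in the set
```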

ustring OOP
You could actually make ustring work with OO fairly easily: since all strings share one metatable whose __index is set to the string table by default, anything you add to that table will show up as a method on all strings. So, something like this:

string.ufind = ustring.find

would enable this:

someUnicodeString:ufind(...)
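A self-contained sketch of the same trick, using a stand-in `ustring` table because the real library is not available outside Scribunto. The `ustring.len` implementation here is an assumption made for the example and expects valid UTF-8.

```lua
-- Stand-in for the proposed ustring library (hypothetical implementation).
local ustring = {}

-- Count characters by counting bytes that are not UTF-8 continuation bytes.
function ustring.len(s)
  local count = 0
  for i = 1, #s do
    local b = s:byte(i)
    if b < 0x80 or b > 0xBF then
      count = count + 1
    end
  end
  return count
end

-- Adding the function to the string table makes it a method on every string.
string.ulen = ustring.len

local text = "h\195\169llo"      -- "héllo" as UTF-8 bytes
local byteLength = #text         -- length in bytes
local charLength = text:ulen()   -- length in characters
```

Note that the assignment changes the global string table, so every script sharing that Lua state sees the new method.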


 * That is a nice approach we should consider; I am wary of possible side effects though. vvvt 12:38, 3 June 2012 (UTC)


 * A jQuery-esque approach might be better for efficiency (you only need to parse the string into Unicode characters once, then you can use array operations):

local str = U('abcdef')


 * Since Lua lets you omit the parentheses when the only argument is a string literal, you can even recreate the Python syntax:

local str = u'abcdef'


 * --Tgr (talk) 20:28, 5 June 2012 (UTC)
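A rough sketch of how such a wrapper might work: the string is split into characters once, and later operations index into the resulting array. `U`, `UString` and its methods are hypothetical names, the input is assumed to be valid UTF-8, and negative indices are not handled.

```lua
-- Hypothetical wrapper type holding a pre-split array of characters.
local UString = {}
UString.__index = UString

local function U(s)
  local chars = {}
  -- One non-continuation byte plus any continuation bytes = one character.
  for ch in s:gmatch("[^\128-\191][\128-\191]*") do
    chars[#chars + 1] = ch
  end
  return setmetatable({ chars = chars }, UString)
end

-- Length in characters is just the array length.
function UString:len()
  return #self.chars
end

-- Substring by character positions (positive indices only).
function UString:sub(i, j)
  return table.concat(self.chars, "", i, j or #self.chars)
end

local u = U                       -- alias for the Python-like call style
local str = u'h\195\169llo'       -- "héllo" as UTF-8 bytes
```

After the one-time split, `str:len()` and `str:sub(2, 2)` do not need to re-scan the bytes of the original string.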