Extension talk:Scribunto/Victor's API proposal

From mediawiki.org

A few nice-to-have things

  • an iterator for going through the characters of a UTF-8 string (you could do that via len+sub, but that is uglier and probably much less efficient)
  • higher-level Unicode functions: normalization, character classes, sort keys
  • encoding conversion (this is useful when creating external links to search and similar services that expect non-UTF-8 input)
  • basic data structures (one thing I missed immediately was a set for efficient whitelist/blacklist lookup) and better support for the built-in table structure (find, index, map etc.)

--Tgr (talk) 09:04, 3 June 2012 (UTC)
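
For illustration, a minimal sketch of the kind of character iterator asked for above, written in plain Lua. The helper name utf8Chars is hypothetical and not part of any specification; the sketch assumes the input is valid UTF-8 and does no validation.

```lua
-- Hypothetical helper: iterate over the UTF-8 encoded characters of s.
-- Assumes s is valid UTF-8. A character starts with any byte outside
-- 0x80-0xBF and continues with zero or more bytes inside 0x80-0xBF.
local function utf8Chars(s)
    return string.gmatch(s, "[^\128-\191][\128-\191]*")
end

-- "h\195\169llo" is "héllo" with the é written as explicit bytes.
local chars = {}
for ch in utf8Chars("h\195\169llo") do
    chars[#chars + 1] = ch
end
-- chars now holds 5 entries; chars[2] is the two-byte sequence for é.
```

Each character is visited once in a single pass, which avoids the repeated scanning that a len+sub loop would need.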

I've added an iterator to the specification. I'm not sure about Unicode normalization; I think we should just normalize all Scribunto output to NFC, which is what MediaWiki uses as a convention (or it is probably already done somewhere else in the parser or OutputPage). Providing Unicode character data is possible, but I am not quite sure we want to ship it with the extension (it's quite large). Data structures may be implemented as user libraries; we may ship them with Scribunto if many users find them useful. vvvt 12:40, 3 June 2012 (UTC)
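
As a rough idea of how small such a user library could be, here is a sketch of a set built on a plain Lua table. The names Set and allowedTags are made up for the example.

```lua
-- Hypothetical user-library helper: build a set from a list of values.
-- Membership tests are then a single table lookup.
local function Set(list)
    local set = {}
    for _, value in ipairs(list) do
        set[value] = true
    end
    return set
end

-- Example whitelist lookup.
local allowedTags = Set{ "b", "i", "span" }
local spanAllowed = allowedTags["span"] == true
local scriptAllowed = allowedTags["script"] == true
```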

Normalization with character class support provides a nice and language-independent way of stripping accents, which might be good for creating IDs and the like. (Admittedly, it might be a lot of effort compared to the benefits.) A standard data library is very important IMHO; you avoid people reinventing the wheel in every single Wikipedia, and thus save a lot of time for more useful activities. Also, extending string or table is quite natural for a data library; if a zillion scripts all have their own separate way of doing that, it will result in the Lua version of DLL hell. --Tgr (talk) 20:35, 5 June 2012 (UTC)

ustring OOP

You could actually make ustring work with OO fairly easily: since all strings have their metatable.__index set to string by default, anything you add to that will show up as a method on all strings. So, something like this:

   string.ufind = ustring.find

would enable this:

   someUnicodeString:ufind(...)
That is a nice approach we should consider; I am wary of possible side effects though. vvvt 12:38, 3 June 2012 (UTC)
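
To make the mechanism and its side effect concrete, here is a self-contained sketch in plain Lua. The function ulen is a hypothetical stand-in for a ustring function, since ustring itself is not available outside Scribunto.

```lua
-- Hypothetical stand-in for a ustring length function: count UTF-8
-- characters by counting every byte that is not a continuation byte.
local function ulen(s)
    local _, count = string.gsub(s, "[^\128-\191]", "")
    return count
end

-- Adding it to the shared string table makes it a method on every
-- string, because the string metatable's __index is that table.
string.ulen = ulen

local word = "h\195\169llo"     -- "héllo", with é as explicit bytes
local byteLength = #word        -- counts bytes
local charLength = word:ulen()  -- counts characters
```

The side effect is that string is shared: every script running in the same Lua state sees string.ulen, and two scripts defining it differently would overwrite each other.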
A jQuery-esque approach might be better for efficiency (you only need to parse the string into Unicode characters once, then you can use array operations):
   local str = U('abcdef')
Since parens are optional, you can even recreate the Python syntax:
   local str = u'abcdef'
--Tgr (talk) 20:28, 5 June 2012 (UTC)
Yes, we have considered this, and this is basically not quite possible to do correctly in Lua. vvvt 21:28, 5 June 2012 (UTC)
By that, you mean it would not be treated as a string? I think the syntax still has merit. (You cannot use a jQuery object in place of a DOM object either, and they are still very comfortable tools.) If you plan to do a lot of string operations (which wouldn't be a rare thing for a template), then the cost of splitting into UTF-8 characters every single time might become non-trivial; plus you could use object-oriented syntax without worries. You will have to reimplement almost all string functions anyway, and you could teach all MediaWiki-specific APIs to understand it, so there would not be all that many places where you would need to convert back manually to string. --Tgr (talk) 16:19, 9 June 2012 (UTC)
That's not the only problem with that. Another is the inability to implement the length operator and the newindex operator correctly at the same time. Also, it is a reference type, which makes it impossible to use adequately as a table key (something people do very often). vvvt 00:59, 19 June 2012 (UTC)
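
The table-key problem can be shown in a few lines of plain Lua. The wrapper U below is a hypothetical minimal version of the proposed object, just a table holding the original string.

```lua
-- Hypothetical wrapper object: a table holding the original string.
local function U(s)
    return { value = s }
end

local counts = {}
counts[U("abc")] = 1   -- keyed by this particular table object
counts["abc"] = 2      -- keyed by the string's value

-- A second wrapper with the same content is a different object,
-- so it does not find the entry stored under the first one.
local viaWrapper = counts[U("abc")]
local viaString = counts["abc"]
```

Here viaWrapper is nil while viaString is 2: tables are compared by identity when used as keys, whereas strings are compared by value.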

Tim's comments

Capitalisation should be consistent: function and variable names should start with a lower-case letter.

I think we should follow the JS API where possible, so that people have less to learn. Specifically:

  • mw.config.get() instead of mw.lang.UILanguage, mw.lang.contentLanguage, mw.site.siteName, mw.site.version, mw.url.server, mw.url.serverName, mw.url.scriptPath
  • mw.message() instead of mw.lang.message()
  • mw.Title(name):<method>() instead of mw.title.<method>(name)
  • mw.Title(name):getUrl() instead of mw.url.local(name)
    • By extension: mw.Title(name):getFullUrl() instead of mw.url.full(name), mw.Title(name):getCanonicalUrl() instead of mw.url.canonical(name)
  • mw.language instead of mw.lang
  • mw.language.convertNumber instead of mw.language.formatNumber
  • mw.Uri.encode instead of mw.url.encode

Where possible, we should simulate the native Lua API instead of inventing our own, specifically:

  • os.date('!*t') instead of mw.time.UTC
  • os.date('*t') instead of mw.time.local
  • os.time() instead of mw.time.unixTimestamp
  • os.date(f,t) instead of mw.time.format(f,t)
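
For reference, the native calls listed above behave as follows in standard Lua; this sketch only uses the stock os library and makes no assumption about how Scribunto would expose it.

```lua
local now = os.time()             -- current Unix timestamp (a number)
local utc = os.date("!*t", now)   -- table of UTC fields: year, month, day, hour, ...
local loc = os.date("*t", now)    -- the same fields in the server's local time

-- With a format string, os.date returns a string; "!" selects UTC.
local epoch = os.date("!%Y-%m-%d %H:%M:%S", 0)
```

Formatting timestamp 0 in UTC yields "1970-01-01 00:00:00". The format codes are those of C's strftime.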

mw.query.blockSize: a block is a thing that stops a user from editing, so this needs to be called something else. I think it may have to be smaller than 100 -- was that number derived from benchmarking?

mw.site.numberOf*(): I don't think it's necessary to implement these.

This proposed ustring interface is certainly an improvement on the previous one.

mw.log() needs to be added. That is the top priority in my opinion, since it can't be simulated by recursive parsing.

-- Tim Starling (talk) 07:04, 2 July 2012 (UTC)

Following the MediaWiki JS API may be a good idea, but I do not think we should make it a key principle. I think following common Lua conventions and being consistent is much more important. PHP is a language whose API was designed on the principle "mimic other APIs", and that is one of the key reasons it is considered particularly badly designed.
The thing is that JS does not have properties and Lua does, which is why a mw.config.get() interface would look clumsy here. I do not really think that imitating the JS API will be a serious advantage. When people need to get the site name, they will most probably open the reference and not try "What if mw.config.get() works here?". Even if there is an advantage, in a year everyone will be aware of how to find the site name, but people will be stuck with having to use mw.config.get( "wgSitename" ) instead of mw.site.siteName. The first one has three disadvantages:
  1. "config" is not really a meaningful name. The site name is a piece of information about the site in general, that's what the "site" module name means; "config", on the other hand, refers to an implementation detail, namely to the fact that the site name is a configuration variable.
  2. get() is superfluous when you have properties.
  3. The "wg" prefix is also superfluous.
The problem is, once you give users the mw.config.get() interface, it's going to be used forever. So, our only chance to get it right is not to include it from the beginning (otherwise, later I would feel bad about it, just as you felt about ParserFunctions).
So I'd prefer that we keep mw.site.siteName, etc. Also, I believe we should keep mw.language.message() in its place so we stick to our convention (mw.submodule.function()). We could, however, provide some shortcuts for frequently used functions (like mw.message() being a copy of mw.language.message(), since the former is easier to type).
That said, I agree with you on the mw.Title() suggestion, except that I would just make it mw.Title(...).fullURL instead of mw.Title(...):getFullURL(), since getX() functions are bad when you can have proper properties.
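
As an illustration of the property style argued for here, the following plain-Lua sketch exposes read-only properties through a metatable. The config table and its values are hypothetical stand-ins for MediaWiki configuration, and none of the names are a confirmed Scribunto interface.

```lua
-- Hypothetical backing store standing in for MediaWiki configuration.
local config = { wgSitename = "Example Wiki", wgVersion = "1.20" }

-- Map friendly property names to the underlying configuration variables.
local siteProperties = { siteName = "wgSitename", version = "wgVersion" }

-- The site table stays empty; every read goes through __index and every
-- write goes through __newindex, which rejects it.
local site = setmetatable({}, {
    __index = function(_, key)
        local variable = siteProperties[key]
        if variable then
            return config[variable]
        end
        return nil
    end,
    __newindex = function(_, key)
        error("site." .. tostring(key) .. " is read-only", 2)
    end,
})

local name = site.siteName
```

With this arrangement site.siteName reads like a plain field, while the "wg" prefix and the lookup stay hidden as implementation details.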
I do not understand the reason for changing "formatNumber" to "convertNumber", since I can't find it in the JS API. Nor do I get where the name "mw.Uri" comes from. We don't have it in the JS API, and it contradicts the "start with a lower-case letter" convention you began with.
I do not want to make the MediaWiki time API look like the native Lua date API. The reason for this is that the MediaWiki time API is much more powerful and not exactly compatible. Date formatting here actually depends on the current language and introduces important language-specific modifiers like xg. Date parsing is also more powerful because it can parse stuff like "+2 weeks" (and I did not find a way to do that in Lua). As a bonus, many people around here are already familiar with MW conventions through the #time parser function. Those interfaces are different and we should not really mix them. We may implement os.date()/os.time() functions as a compatibility layer, but I think this is not a priority for now.
mw.query.blockSize may be renamed to mw.query.batchSize. "100" was a number you suggested when I asked you about the limits for set queries back in Berlin. I could do some benchmarking if you give me some ideas about what I should be looking for.
Could you elaborate on mw.site.numberOf* properties? I can't see any problems with them, and they are really easy to implement (in fact, I began with them in my prototype).
What's mw.log()? Is it a debug function? That may be useful, but I think before introducing random debug functions we should come up with some vision (preferably coherent) of how users are supposed to debug those scripts. I mean, we should probably design the whole debug interface, and then see what we need to implement as part of the API.
vvvt 00:45, 6 July 2012 (UTC)
I favor an API that is "Lua-like" over something forcefully resembling our JS API (though I think both should be key criteria in the API design). The best way to counter possible confusion might be to have built-in documentation (like api.php does, but nicer UI :D ) and possibly to extend CodeEditor with libraries for our private APIs and with code completion.
Debugging is an interesting thing btw. I have found that due to the server-side execution, a huge amount of trial-and-error writing is involved right now. We could build in a full debugger, but that is always going to be difficult due to server-side execution, I think. An alternative might be "live editing", with an indicator for whether or not module.unittests are succeeding (and if not, bring up something like the stack trace)? That would almost enforce an approach of test-driven development. TheDJ (talk) 07:39, 6 July 2012 (UTC)
See modules mediawiki.Uri and mediawiki.Title. About the use of uppercase, see also Thread:Talk:ResourceLoader/Default modules/Capitalization of module names. Helder 14:44, 6 July 2012 (UTC)

Enabled?

Are these packages enabled anywhere? Or is it just a plan? --Amir E. Aharoni (talk) 20:20, 4 September 2012 (UTC)

Scribunto is enabled on mediawiki.org and on the test2 test wiki. We don't have a plan, yet, for when to enable it across other WMF wikis.
Tim just said: "it's not even a plan, just an idea" regarding the packages discussed on Extension:Scribunto/API specification -- so, just brainstorming. Sumana Harihareswara, Engineering Community Manager (talk) 00:18, 6 September 2012 (UTC)

Language module

Functions like case conversion and plural formatting depend on language, so they belong in a language object, like in MediaWiki's PHP API. It doesn't make sense to make them static functions. The JS API exposes static functions which use the user language, and Victor's proposed API exposes static functions which use the content language. Instead, I would like to have an mw.getContentLanguage(), which returns an object with various methods more or less named after the Language class methods. It would correspond to $wgContLang.

We could also have an mw.getLanguage(code) function. This would help to support multi-language wikis such as Commons, where $wgContLang doesn't necessarily match the language used on the page. -- Tim Starling (talk) 03:23, 19 November 2012 (UTC)
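
A rough sketch of what such language objects could look like in plain Lua follows. Everything in it is hypothetical: the function names merely echo the ones suggested above, the content language is hard-coded, and the plural rules are toy rules for two language codes rather than MediaWiki's real ones.

```lua
local Language = {}
Language.__index = Language

-- Toy plural rules keyed by language code; each returns an index into
-- the list of forms. Real rules are far richer than this.
local pluralRules = {
    en = function(n) return n == 1 and 1 or 2 end,
    ja = function() return 1 end,  -- no singular/plural distinction
}

-- Hypothetical counterpart of the suggested mw.getLanguage(code).
local function getLanguage(code)
    return setmetatable({ code = code }, Language)
end

-- Hypothetical counterpart of mw.getContentLanguage(); "en" is assumed.
local function getContentLanguage()
    return getLanguage("en")
end

function Language:getCode()
    return self.code
end

-- Pick the form matching n, falling back to the last form given.
function Language:plural(n, forms)
    local rule = pluralRules[self.code] or pluralRules.en
    return forms[rule(n)] or forms[#forms]
end

local content = getContentLanguage()
local one = content:plural(1, { "page", "pages" })
local many = content:plural(3, { "page", "pages" })
local japanese = getLanguage("ja"):plural(3, { "peji" })
```

Because the rules live on the object, the same call gives language-appropriate results for whichever language object it is made on, which is the point of not using static functions.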