User:Sumanah/Lua vs Javascript

From mediawiki.org

Forwarded conversation -- Subject: Lua versus JavaScript


On 20/08/11 12:08, Erik Moeller wrote:

Given that we're currently evaluating Lua vs. a custom scripting
language, one question that's on my mind is whether we've sufficiently
evaluated the role JS could play beyond gadgets -- both for
server-side gadgets, as suggested by Trevor, and for inline scripting.
The idea of having a single scripting language that can be used by our
tools/gadgets community is certainly appealing, and the ever-growing
ecosystem around both client-side and server-side JS is as well.

Tim's reply:

I agree that server-side JS would be ideal from a user perspective, which is why I've researched the various compiler/interpreter options for it in some detail. I've settled on Lua as my preferred solution despite its relative unfamiliarity among our users because:

  • All memory allocation, including stack space, is done via a configurable hook function. Thus memory accounting can easily be implemented (I've done it already). Infinite recursion does not lead to a segfault, and there's no chance of a user script sending a server into swap.
  • It's fast. The interpreter is fast, and there's a mature JIT compiler which is very easy to integrate. Code compiled with the Lua JIT compiler executes much faster than code written in PHP running under Zend. Execution speed is critical for the citation application, which has a lot of code executed very often.
  • It's designed to be integrated, so development time is very small. The C interface is well-documented.
  • It can be embedded in the same address space as PHP while still allowing memory and CPU limits. I've benchmarked PHP -> Lua and Lua -> PHP calls at around 0.5us on my laptop, which is at least an order of magnitude faster than any IPC solution.
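
The memory-accounting point in the first bullet can be sketched in a few lines. What follows is a JavaScript model of the idea only, not the Lua C API: the `makeAccountingAllocator` name, the `(oldSize, newSize)` signature and the 1024-byte budget are all assumptions made for illustration, loosely modelled on an allocator hook that is told the old and new size of each block. When every allocation passes through one hook, the hook can keep a running total and refuse requests that would exceed the budget.

```javascript
// Hypothetical model of a memory-accounting allocation hook.
// Every allocation, resize and free goes through `alloc`, so usage can be capped.
function makeAccountingAllocator(limitBytes) {
  let used = 0;
  return {
    // oldSize: current size of the block (0 for a new block)
    // newSize: requested size (0 means free)
    // Returns true if granted, or null if the request would exceed the limit.
    alloc(oldSize, newSize) {
      const next = used - oldSize + newSize;
      if (newSize > oldSize && next > limitBytes) {
        return null; // refuse: the caller must handle the failure
      }
      used = next;
      return true;
    },
    used() {
      return used;
    },
  };
}

const allocator = makeAccountingAllocator(1024);
const first = allocator.alloc(0, 512);   // new 512-byte block: granted
const second = allocator.alloc(0, 1024); // would bring the total to 1536: refused
const freed = allocator.alloc(512, 0);   // free the first block
```

Shrinking and freeing are always granted in this sketch, so a script that has hit the limit can still release memory.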

Lua is intended to be easy to learn, and easy to use for short scripting tasks. It has a feature set very similar to JavaScript, with first-class functions and prototype-based OOP.
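
For readers who know neither term, here is a small JavaScript illustration of the two features named above. The `twice`, `template` and `citation` names are made up for the example. Lua expresses the same delegation pattern through metatables rather than a built-in prototype link, so this is a sketch of the shared concepts, not a claim that the syntax carries over.

```javascript
// First-class functions: a function is a value that can be passed and returned.
function twice(f) {
  return function (x) {
    return f(f(x));
  };
}
const addTwo = twice(function (n) {
  return n + 1;
});

// Prototype-based OOP: an object delegates missing property lookups to another object.
const template = {
  render() {
    return '{{' + this.name + '}}';
  },
};
const citation = Object.create(template); // citation's prototype is `template`
citation.name = 'cite web';

const result = addTwo(40);          // applies the increment twice
const rendered = citation.render(); // `render` is found on the prototype
```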

Of the potential JavaScript solutions, Rhino is the one I liked the most, and spent the most time talking with Brion about. The problem with it is that it's not feasible to embed it in the same address space as PHP, and startup time is very slow, so you'd have to implement it as a standalone daemon listening on a TCP or Unix socket. That architecture would be complex to develop, would have poor performance compared to Lua, and would probably make it impossible to implement memory limits via control of the Java heap size.

Neither SpiderMonkey nor V8 gives you the ability to control memory and stack usage. They're both difficult to embed due to poor documentation and the relative scarcity of embedded implementations. The developers are very focused on the browsers that they primarily support, and less so on a broader ecosystem of end users.

-- Tim Starling


Trevor's reply:

These are all excellent reasons, and it's clear you've done some solid research, but I can't help but feel that the user experience would be taking a back seat to ease of integration if we use Lua. I feel strongly that having a one-language system shouldn't really be a nice-to-have, it should be a requirement, especially if we are talking about introducing yet another language.

...Make the syntax of [any] template language identical to JavaScript....

There's also an abandoned but mostly functional JavaScript interpreter written in PHP out there[1]. I realize that it would not be as fast as running programs in V8 or Rhino, but if we are talking about replacing complex templates with JavaScript code, we are still likely to see a dramatic improvement even with a slower interpreter because people will be able to express themselves in much simpler ways. An added bonus is that MediaWiki doesn't add any dependencies because it's all done in PHP code.

I have used Node.js a lot, and know of modules such as node-sandbox[2] which provide some of the limitation and isolation that is needed. Some JavaScript to PHP bridging will need to be added of course, and it's not 100% clear how to best do that, but I'm sure it's possible to do. It's also very likely that JavaScript code running on Node.js is going to start being used on the server side in our infrastructure in the near future. Chat, real-time collaboration, and other websocket/long-polling communication services are natural uses of Node.js, and nearly impossible to do in PHP, even with a very small number of users.

Hopefully we can come up with something that can take advantage of the rising popularity and familiarity of JavaScript to make our wiki easier to learn and use.

- Trevor

[1] http://j4p5.sourceforge.net/roadmap.php

[2] https://github.com/dominictarr/node-sandbox



Tim: ..... I think there's a risk that the citation templates would actually execute more slowly in an interpreter running on top of PHP, such as WikiScripts or this JavaScript implementation, than they do currently. The nature of the code is such that it puts quite a lot of performance pressure on the executor.

I'm anxious to see benchmarks of the citation templates converted to run in WikiScripts, because I think they'll be shockingly bad and will vindicate my approach. There's no memory limit. It's just a few lines of wrapper JavaScript and a wall clock timer. It's trivial to write a script that fills up all memory and sends a server into swap. That's not a failure mode we want, from a sysadmin perspective.

The stability of the site is the most important consideration. We have a reach of 400 million people and a metatemplate editor community of about 10. So I reckon approximately 99.99999% of our users care more about whether the site is up than whether we use Lua or JavaScript.

-- Tim Starling



Michael Dale: Some thoughts:

Has a PHP-based interpreter for Lua been written? Would the template language make MediaWiki incompatible for vanilla PHP-based installs? WMF would (of course) be running some native embeddable interpreter, but the idea of a PHP-based fall-back seems attractive.

Has a JavaScript Lua interpreter been written? Would browser based rich text editors need to include a Lua interpreter of some kind? Would the existing wikitext backwards compatibility be obtainable as "near hanging fruit" per the rich text editor efforts that ~has to~ run in JavaScript? Has the JavaScript wikitext parser work been compared to its PHP counterparts? Is there any possibility of crossover development efforts there?

How do Lua based libraries for CSS DOM / HTML / XML traversal and manipulation compare to JavaScript based libraries?

Is there really a risk that a Rhino, V8 or IonMonkey based JavaScript JIT would be slower than the existing PHP-based template system? It may be that the startup time and memory management of JS is less flexible in the current ecosystem of tools. Will that hold true into the future? If Lua has better embeddable performance characteristics right now, does that mean it's "better suited"?

Do we want to preserve the class of "template editor community" to small numbers of individuals? Can we run tests of any kind to help compare ease of addressing traditional and foreseen needs of server side wiki scripting comparing JS to Lua?

While the accessibility and "ease of use" characteristics are harder to evaluate and test than direct embed time and memory constraint characteristics, it seems performance characteristics should inherently take a back seat to user accessibility and crossover development. Performance constraints can be addressed with predictable engineering efforts, while a less accessible or less 'well understood' language can adversely affect contributions and development times in ways that are not easy to directly address on a mass scale by "just adding hardware".

If WMF needs to allocate X times as much RAM and Y times more CPU, it seems like a more predictable cost than teaching people Lua. Unless the development costs of implementing JavaScript wiki script somehow greatly outweigh all these variable accessibility and cross development costs for a larger set of individuals, it seems like JavaScript would be the preferred solution.

There seems to be so much momentum and network effects around JavaScript that I would think the debate would be around "how" to implement JavaScript as the next wiki script language not "if" it should be the next wiki script language.

peace, --michael


Tim: No. MediaWiki installations on shared hosts and the like can support Lua by installing the standard interpreter binary (say by FTP upload) and shelling out to it. Support for such a scheme is already implemented; it was done by Fran Rogers in 2008.

No. However, there is an incomplete Lua to JS translator.

I don't see why that would be necessary. There's no DOM manipulation involved in the target application, so I don't see why that would be a concern.

No. I said there was a risk that an interpreter written in PHP and running on top of Zend may be slower than the existing PHP template system.

I'm not a futurologist. I am, however, tired of waiting for the perfect solution to magically appear.

No.

The design of the interface between the scripting language and PHP will have to be done with input from the people who write templates currently. The feature set of JavaScript and Lua is pretty much identical, so it's hard to imagine how testing could identify something that favours one language over the other.

It doesn't matter how much RAM you buy. If there's no limit on how much RAM a script uses, then it will be able to exhaust all available resources.

It doesn't matter how many cores you have: script execution will run on a single core, and users will have to sit around waiting while execution completes.

That assumes we have enough development time to spend on implementing the features we need inside some JavaScript compiler. We don't have an hour of development time to spend for every hour of editor time we save, because there are more editors than developers.

-- Tim Starling


Erik:

Tim, many thanks for the detailed explanation of why you chose to go for a Lua prototype implementation. This as well as the other comments in this thread has been hugely valuable to me.

I've looked at the prior wikitech threads and I haven't seen these specific arguments there, so I'm guessing this (as well as some of the other considerations mentioned in this thread) would be valuable to share either on-list or on-wiki. I'd also suggest that detailed technical discussion take place there so more folks have a chance to weigh in or write code to prove people wrong. :)

My tentative takeaways so far (which I'm happy to post to a relevant public thread as well):

1) IMO this would be useful to keep as a possible hacking and discussion project for New Orleans, depending on the state of the implementations at that time.

2) There's been agreement in this thread that JS would be preferable from a user/dev perspective, but it's also clear that Lua is the closest thing we have so far to a working implementation that can scale for the particular use case of inline-scripts (which, not to forget, really is a tough one since we need to have all template code in a page with hundreds of templates executed with minimal wait time for the editor on save or preview).

I'd love to see proof-of-concept implementations of inline-scripting in JS that could scale with acceptable performance/execution characteristics.

3) Given that it's not entirely clear that Brainfuck<Template programming or the other way around, it's pretty evident that any inline scripting solution that meets the real world use cases would be a huge improvement on current state. That doesn't mean we don't have an obligation to get things right -- but I'd be thrilled to see something deployed that gets us 80% of the way there. ;-)

One open question I have, which came up briefly in this discussion: Are there significant implications of this decision for the editor/parser work? My understanding is that the visual editor will never have to execute template code -- that it will only need to be aware of the template calls and the rendered output as delivered by the server. Is that correct? Are there cases where the client will have to execute these inline scripts, either in the context of the editor, or in some other specific future applications we or others may want to develop?

I'm fine with a solution that's imperfect for devs, but I want to make sure we're not accidentally painting ourselves into corners. :)

Thanks, Erik


Brion:

This is still a bit icky, and won't work at all on some hosts that disable or limit execution/shell-outs. That may be a decision we're willing to make, but it does up the dependency & installation requirements for anybody that actually wants to make use of these templates.


If based on a fully-compatible parser in the client side to do all the rendering, then yes. If communicating with a server-side parser to render out templates, then probably not.

Since we expect to start editor test deployments with client-side JS code this may or may not be something we need to worry about (since I presume it will be a while yet before we have those JS or Lua templates running in production).


*nod* resource limits are a hard requirement here. It's trivial to write JS code to use up all your RAM:

function eat() {
  var s = '*', // start with one char!
       a = []; // keep all our data around so it can't even be GC'd
  while(true) {
    a.push(s);
    s = s + s; // keep making bigger strings forever
  }
}

I'm sure the equivalent Lua code is pretty straightforward as well. ;)

On node.js this crashes the process after just 29 iterations, as V8's heap size is currently locked at a relatively small 1 or 1.9GB and it doesn't take long to reach that; if you're more patient, you could easily stay under that limit and still lock up huge amounts of memory for as long as each script runs.
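
The arithmetic behind that growth can be worked out without allocating anything. The sketch below is a naive model, with the `naiveFootprintBytes` name and the one-byte-per-character figure both assumed for illustration. It ignores whatever the engine does internally (shared or lazily concatenated string storage, per-object overhead), so it shows how fast doubling runs away and does not describe V8's actual behaviour.

```javascript
// Naive model of eat(): after n loop iterations, s has been doubled n times and
// the array retains every earlier value of s. Engine internals are ignored.
function naiveFootprintBytes(iterations, bytesPerChar) {
  const currentLength = 2 ** iterations;      // length of s after n doublings
  const retainedLength = 2 ** iterations - 1; // 1 + 2 + ... + 2^(n-1) chars held in a
  return (currentLength + retainedLength) * bytesPerChar;
}

const after29 = naiveFootprintBytes(29, 1); // assumes one byte per character
const after29InGiB = after29 / 2 ** 30;     // just under 1 GiB in this model
```

Each extra iteration roughly doubles the total, so raising the limit buys very few additional iterations.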


We need to be able to halt the embedded script after some amount of time (wall clock or opcode ticks, whatever) or on some amount of memory usage.

I know the JS systems have a way to halt on time -- browsers will pop up a "Do you want to stop this script?" dialog if something runs too long -- but don't know offhand how easy it is to limit on memory usage, especially if an IPC or network-based server handles multiple scripts from one process context.
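
As one illustration of halting on time, current Node.js releases expose a `timeout` option on the `vm` module's run functions. The sketch below assumes that API; the `runWithTimeLimit` helper and its return shape are made up for the example. The `vm` module is not a security boundary, and this does nothing about memory, which is exactly the gap described above.

```javascript
const vm = require('vm');

// Run source text in a fresh context with a wall-clock limit in milliseconds.
// Returns { ok: true, value } on completion or { ok: false, error } on failure.
function runWithTimeLimit(source, timeoutMs) {
  try {
    const value = vm.runInNewContext(source, {}, { timeout: timeoutMs });
    return { ok: true, value };
  } catch (err) {
    return { ok: false, error: String(err && err.message) };
  }
}

const finished = runWithTimeLimit('1 + 2', 100);         // completes normally
const runaway = runWithTimeLimit('while (true) {}', 50); // interrupted by the timer
```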

If they're one-off spawned processes then you can of course use ulimit as we do for shelling out to convert, latex etc -- if using a networked server then it may be necessary to do some fancy footwork setting up process-wide limits (and/or setting heap limits explicitly, if possible) and respawning processes if/when they fail.

Just as a note -- SpiderMonkey (the C++-based JS engine in Firefox) at least has support for allocating data from different script domains in separate "compartments" which can at least be monitored separately in about:memory in the latest versions. Whether it's possible to have it actually cap those compartments separately I don't know.


I'm actually not exactly sure how to limit the Lua heap size either though; the documentation doesn't seem very clear on that...

I get the impression it requires implementing a custom allocator function that itself keeps track of how much has been allocated? http://lua-list.2524044.n2.nabble.com/New-behaviour-of-lua-Alloc-missing-in-list-of-API-changes-td6220112.html

-- brion


Tim:

[considering Extension:WikiScripts I believe ]

I think running untrusted JavaScript on the client side with no memory limit would be almost as bad as running it on the server side.

The abstraction of memory allocation is the hard part. If they have that but not memory limits, we could probably add memory limits.

Yes, and return NULL when the limit is exceeded. Here's my custom allocator:

<http://svn.wikimedia.org/viewvc/mediawiki/trunk/php/luasandbox/luasandbox.c?revision=94863&view=markup#l504>

Lua is able to tolerate having its allocator function return NULL, unlike most C programs which develop nasty bugs or crash when malloc() returns NULL. It does a longjmp() (or throws an exception when compiled under C++) to return control to lua_pcall(), which then returns an error message to its caller.

longjmp() safety is not entirely trivial. The calling code has to be aware of the possibility of a longjmp() so that it doesn't leak memory or corrupt the state in other ways. It's basically the same as exception safety in C++, except without the compiler support, and without the bulk of the participating developers being aware of the issue.
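
The shape of that control flow, and of the cleanup problem, can be modelled in JavaScript, where exceptions and `finally` play the roles that longjmp() and hand-written cleanup play in C. This is an analogy under stated assumptions, not the luasandbox code: `MemoryLimitError`, `protectedCall`, `hostFunction` and the `openHandles` counter are hypothetical names introduced for the sketch.

```javascript
// Hypothetical stand-in for the error raised when the allocator refuses a request.
class MemoryLimitError extends Error {}

// Model of a protected call: run fn and turn a memory failure into a return
// value, instead of letting it unwind through the caller.
function protectedCall(fn) {
  try {
    return { ok: true, value: fn() };
  } catch (err) {
    if (err instanceof MemoryLimitError) {
      return { ok: false, message: err.message };
    }
    throw err; // unrelated failures are not swallowed
  }
}

// Host code that holds a resource across a call that may unwind.
let openHandles = 0;
function hostFunction(work) {
  openHandles++; // acquire
  try {
    return work();
  } finally {
    openHandles--; // released even when `work` unwinds
  }
}

const outcome = protectedCall(() =>
  hostFunction(() => {
    throw new MemoryLimitError('not enough memory');
  })
);
```

In C there is no `finally`: the release step has to be arranged by hand around every call that might unwind, which is the part that is easy to get wrong.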

-- Tim Starling



Brion:

On Wed, Aug 24, 2011 at 5:24 PM, Tim Starling <tstarling@wikimedia.org> wrote:

[on crazy js code]


Indeed. :) One would want to do some sandboxing there as well... JS-on-JS sandboxing could be done with an intermediary layer like caja[1] or by separating into an isolated iframe context; in any case that's not a bridge that needs immediate crossing.

[1] http://code.google.com/p/google-caja/


[spidermonkey & memory compartments]


Good to keep in mind!


+1 awesome :D Certainly looks like we'll get good prototyping out of the lua end first.

-- brion