Topic on Talk:Release Management RFP/2013/NicheWork and Hallo Welt!

Release Management and Visual Editor

MarkAHershberger (talkcontribs)

Ori.livneh asks: "How could we make VisualEditor more usable for third parties? As release managers, what role would you play in this process?"

The dependency on node.js is really problematic for tarball users. Installing an extension (any extension, even without a dependency on node.js) is really too much of a problem right now. We plan to address that, but that solution probably won't be able to take into account a dependency on node.js.

A WYSIWYG editor is one of the top requests and some third parties will be able to use VisualEditor. However, making it "more usable for third parties" would mean making installation as easy as installing a PHP-only extension. That means rewriting Parsoid in PHP -- something I just don't see the WMF doing.

Bottom line: the user already has a PHP webserver set up and configured. Installing and configuring a node.js server "just" to enable an extension -- when lots of people have already installed MediaWiki without a WYSIWYG editor -- isn't going to happen.

MarkAHershberger (talkcontribs)

Forgot to address: As release managers, what role would you play in this process?

As release managers, our focus is on third-party users. We would like to work with the WMF to raise concerns that we or those in the MediaWiki-using community have. Perhaps the Foundation would be willing to help develop a PHP alternative to the node.js dependency. That would be ideal.

But we could also use this capacity to raise funds aimed at developing the PHP alternative, or simply gather a group of volunteer developers and coordinate their work.

Jdforrester (WMF) (talkcontribs)

Just to clarify from the VisualEditor/Parsoid stand-point, we're keen to enable MW to run wikitext-less, instead storing pages in the HTML+RDFa spec format (and so have no need of Parsoid, but there would be need for a small post-processor that we want to split from Parsoid anyway, which could more trivially be written in PHP than nodeJS); this has been discussed as part of the much-mooted "MediaWiki 2.0" a few times, but nothing's decided yet, of course. :-)

MarkAHershberger (talkcontribs)

That sounds great, but if VisualEditor/Parsoid is only a shim while we wait for the great MediaWiki 2.0, it would be nice to support the users of the MediaWiki tarball itself while we wait for the transition to MW2.0.

Part of it is a communication problem (which is what most problems on open source projects are). The idea of MW2.0 has been "in the air" but the MW tarball-using community doesn't have a clue what that would entail. And since the idea is only "in the air", there haven't been any focused discussions with third-party users about their needs and uses.

Yes, developers would love HTML+RDFa -- it would give them a better toolset. Editors would like a more visual interface -- it would make working on the wiki easier. For those two needs, HTML+RDFa is wonderful.

But people have modified MediaWiki over and over again. That isn't anyone's fault but their own, of course. However, that means that people who patched 1.10 to their needs are still running it. So part of the work is engaging the community and figuring out how to get them to keep current.

It becomes harder to do that, of course, if they're relying on their hacks that are tied to wikitext and WMF suddenly creates MW2.0 with HTML+RDFa.

So, yeah, coming out and saying something that sounds like "The node.js dependency doesn't matter because we're going to be changing the page store" sounds much scarier than saying "Why don't we port Parsoid to PHP?"

(I understand that you weren't saying that the node.js dependency doesn't matter, but I'm just telling you what it sounds like. This is probably a discussion that belongs somewhere else, though.)

Jdforrester (WMF) (talkcontribs)

Completely understand. And I look forward to the person-years' worth of work to make a dead-end port of Parsoid into PHP being done by someone who cares enough. But I think we should be clear and open that MW users - be they huge organisations that can manage or tiny operations that run off free PHP-only webserver tiers - don't have an upgrade path right now to VisualEditor, and that that's a significant potential issue for many. :-(

MarkAHershberger (talkcontribs)

Do Wikia and WikiHow fit the definition of "huge corporations that can manage"? They are certainly the two largest non-WMF installations that I know of.

WikiHow hasn't yet upgraded beyond 1.12. Wikia did recently upgrade to 1.19.

They both have a significant base of code to support and it sounds like MW2.0 as you've envisioned it is essentially a fork that will be incompatible with anything done for MW1.x.

As a result, I don't think porting Parsoid to PHP would be a dead end. There is a huge installed base of wikis that could use such a thing.

One of my clients is a large multinational with significant IT resources. However, they were still running MW 1.11 until recently when I upgraded them to 1.19. I think most uses "in the wild" are more like that than someone who is ready now to upgrade to whatever the next new thing is.

They'd be happy to have a visual editor that works in MW 1.x (and that is what VisualEditor offers) instead of something that makes their current modifications unusable.

Jdforrester (WMF) (talkcontribs)
Do Wikia and WikiHow fit the definition of "huge corporations that can manage"? They are certainly the two largest non-WMF installations that I know of. WikiHow hasn't yet upgraded beyond 1.12. Wikia did recently upgrade to 1.19.

Yes, they were the organisations I was thinking of, alongside corporate internal users of MediaWiki like my former employers (I'm well aware of their current MW statuses; remember that I work with Wikia every day on VisualEditor :-)).

They both have a significant base of code to support and it sounds like MW2.0 as you've envisioned it is essentially a fork that will be incompatible with anything done for MW1.x.

You seem to be confused; I'm not proposing or particularly defending this course of action, merely mentioning it because it seemed like you were unaware of it in the nature of your comments. The architectural and future strategy decisions of MediaWiki are something in which I take a strong interest in a personal and professional capacity, but I am not part of any secret cabal that is making these decisions. Indeed, the biggest problem with MediaWiki's future I would highlight (as you did) is that there doesn't appear to be such a person or group, and decisions don't get made so much as stumbled into.

As a result, I don't think porting Parsoid to PHP would be a dead end. There is a huge installed base of wikis that could use such a thing.

Providing a nicer horse is still a dead end in the world of the motor car. The fact that most people currently have horses doesn't mean that they will forever. "Dead end" does not mean "bad", it means "a shrinking user base", in this case.

One of my clients is a large multinational with significant IT resources. However, they were still running MW 1.11 until recently when I upgraded them to 1.19. I think most uses "in the wild" are more like that than someone who is ready now to upgrade to whatever the next new thing is.

Absolutely, and that's why I'd be keen to see the model of LTS continued in the future. :-)

They'd be happy to have a visual editor that works in MW 1.x (and that is what VisualEditor offers) instead of something that makes their current modifications unusable.

… and if you're happy to run a NodeJS server in your mix, VisualEditor/Parsoid will likely support MW 1.x for a considerable period. However, if you want a PHP-only shop, I can't see WMF spending the quite large resources from donor funds to port Parsoid over from NodeJS to PHP, given that part of the point of Parsoid was to get away from PHP's slowness for a task of this sort. In that case, the choices would be to run MW in wikitext-less mode (so no need for Parsoid) or to try to get someone to port Parsoid to PHP or another language of your choice.

Hence my point - if not WMF, who will do this? We can't just tell people using MW 1.x that "it'll be all right" if we don't have a plan.

MarkAHershberger (talkcontribs)
You seem to be confused; I'm not proposing or particularly defending this course of action, merely mentioning it because it seemed like you were unaware of it in the nature of your comments.

You're right. I wasn't aware of anyone talking about MW2.0 in that way. I proposed naming the latest MW release "2.0" but didn't really get much support for that. No one pointed me to MediaWiki 2.0, either. I like my reasoning for 2.0 better.

And thanks for clarifying what you mean. We seem to have a similar understanding of the situation.

Providing a nicer horse is still a dead end in the world of the motor car.

In the late 19th century, the buggy whip business was still not a dead end even though the motor car was invented. We don't know yet if this redesigned MW with a new page store will even catch on or give people a reason to start thousands of new wikis.

Predicting a shrinking user base with MW2.0 as the cause is premature when the thing is still just a pipe dream.

I can't see WMF spending the quite large resources from donor funds to port Parsoid over from NodeJS to PHP [...] if not WMF, who will do this? We can't just tell people using MW 1.x that "it'll be all right" if we don't have a plan.

I totally agree. The WMF should not put any money into that. They already put money into Parsoid and it seems to be working well.

If this thing happens, it has to be done by people outside the Foundation. If we win this proposal, that is something that Markus and I will look to get funds for outside the Foundation. The Foundation has made it clear that they do not want to be the only organisation funding MW's release management, so I don't see a problem.

given that part of the point of Parsoid was to get away from PHP's slowness for a task of this sort.

I'm not familiar with any information in this area. I'm not saying it doesn't exist or that you aren't right -- I'm just completely ignorant. What is the task that PHP is slow at? How does node.js handle that task better?

GWicke (talkcontribs)

Just a note on the Parsoid plans: We are working towards HTML-only MediaWiki roughly following the roadmap at Parsoid/Roadmap. At some point, Parsoid will no longer be needed for new wikis without wikitext-based templates.

For existing wikis with wikitext-based templating (including Lua etc) we will however still need to have Parsoid around for a long time.

With the emergence of cheap VMs (~$7/month for 1 GB of RAM), the market for hosting has shifted a lot in recent years. Most users of such VMs would be much better served with a proper Debian package ('apt-get install mediawiki') that installs and configures a basic set of dependencies including memcached, the PHP Lua extension, Parsoid and Varnish. Similar packaging could be done for popular one-click installers in admin panels. Specialized wiki hosters already have the resources and motivation to set all this up manually.

A PHP port of Parsoid would at best deliver a terribly slow parser that nobody would want to use, especially not on the resource-limited shared hosting installs it would supposedly serve. In most benchmarks PHP is at least an order of magnitude slower than V8.

MarkAHershberger (talkcontribs)
Most users of such VMs would be much better served with a proper Debian package ('apt-get install mediawiki') that installs and configures a basic set of dependencies including memcached, the PHP Lua extension, Parsoid and Varnish.

Totally agree! This is why I started working with the Debian and Fedora packages instead of telling users "don't use the packaged MediaWiki". As long as there is a smooth pathway to do that, it shouldn't be a problem.

The problem is with the thousands of installed wikis that are already out there. If we can provide them with a supported WYSIWYG editor that they can install into their existing wiki, that would be a real win.

A PHP port of Parsoid would at best deliver a terribly slow parser that nobody would want to use.

I would like to see a side-by-side comparison. I haven't looked at benchmarks for PHP vs V8. I have heard a lot of good things about V8 and, yes, PHP is crap. That isn't the point, though.

Most people don't run sites as popular as Wikipedia and a slower parser won't matter. I have a client who runs on a VM with about 512 MB of memory. Speed isn't what he is concerned with. If you have a PHP parser that is slower, but doesn't take minutes to run, then the installation and overhead of running node.js is going to be the bigger problem.

MarkAHershberger (talkcontribs)
In most benchmarks PHP is at least an order of magnitude slower than V8.

After I went looking for benchmarks, I came across the one you probably meant when you said that. You'll find some benchmarks saying PHP is an order of magnitude slower when we're talking about both languages performing the same operation 1,000,000 times in under one second. For most people running MW, these benchmarks are meaningless -- both ran more times than they need in under a second and the PHP version is easier to install on their webserver. For a site like WMF, though, these things make a big difference.

But you can also find benchmarks where PHP performs 50x faster than V8 because the PHP is using code written in C underneath.

Shootouts like this, which are simply mathematical computations run multiple times, are meaningless, though. Let's see how fast code written to do something we want done performs in each language.

If V8 is still better, then maybe we can skip node.js and just run the JavaScript the way we handle Scribunto: in a PHP module.

GWicke (talkcontribs)

Here is a benchmark that is very close to what we do in Parsoid: build and traverse a lot of data structures. V8 is 3.2 times slower than C, while PHP is 57 times slower. Here is a comparison between V8 and PHP across several benchmarks. Apart from pure efficiency, we also use asynchronous operations massively to perform hundreds of API requests in parallel on large pages. This effectively hides the API latency and distributes processing for a single page across many cores and machines. Despite all this, rendering a large page from scratch in Parsoid still takes up to 40 seconds.

In PHP, there is no decent support for highly asynchronous processing. For a large and complex page like en:Barack Obama you would simply get a timeout.
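As a rough illustration of the latency-hiding point above, here is a minimal sketch of issuing many slow requests concurrently instead of one after another. It is written in Python with `asyncio` purely for illustration and is not Parsoid's code; `fetch_fragment`, the 20 fake template names and the 10 ms latency are all hypothetical stand-ins for real API requests. It shows only how overlapping the waits shortens wall-clock time, not the distribution of work across cores or machines.

```python
import asyncio
import time


async def fetch_fragment(name: str, latency: float) -> str:
    """Hypothetical stand-in for one API request (e.g. a template expansion)."""
    await asyncio.sleep(latency)  # simulates waiting on the network/API
    return f"<expanded:{name}>"


async def render_sequential(names, latency):
    # One request at a time: total time is roughly len(names) * latency.
    return [await fetch_fragment(n, latency) for n in names]


async def render_concurrent(names, latency):
    # All requests in flight at once: total time is roughly one latency.
    # gather() returns results in the same order as its arguments.
    return await asyncio.gather(*(fetch_fragment(n, latency) for n in names))


def timed(coro):
    """Run a coroutine to completion and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro)
    return result, time.perf_counter() - start


names = [f"template{i}" for i in range(20)]
seq_result, seq_time = timed(render_sequential(names, 0.01))
con_result, con_time = timed(render_concurrent(names, 0.01))

print(f"sequential: {seq_time:.3f}s, concurrent: {con_time:.3f}s")
```

Both variants produce the same fragments in the same order; only the waiting overlaps in the concurrent one.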

MarkAHershberger (talkcontribs)
In PHP, there is no decent support for highly asynchronous processing.

Now we're getting somewhere. This is a real problem with PHP. It is designed to process a request in the webserver that it can respond to immediately (or some approximation thereof).

Still, people are doing asynchronous things with PHP all the time. What is the communication model for asynchronous activities in node.js that PHP could not support?

MarkAHershberger (talkcontribs)

I did want to point out that I agree that those particular benchmarks are kind of shocking. I tried switching to an array instead of objects (thinking maybe the OO bit was causing the slowness) but that didn't improve things.

My next step is to look at HipHop.

But I'd still like to get an answer to my first question: what is the communication model that you're using for asynchronous actions that PHP could not support?
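For readers who have not seen the kind of workload being discussed ("build and traverse a lot of data structures"), the sketch below shows its general shape. It is an assumption that the benchmark cited earlier in the thread looks like this; the thread does not reproduce it. The code is Python for illustration only, and the names `Node`, `build_objects` and `build_tuples` are hypothetical. It builds the same binary tree once from objects and once from plain tuples, loosely mirroring the objects-versus-arrays experiment mentioned above, and makes no timing claims.

```python
class Node:
    """A tree node represented as an object with two child references."""
    __slots__ = ("left", "right")

    def __init__(self, left=None, right=None):
        self.left = left
        self.right = right


def build_objects(depth):
    # Allocate a complete binary tree of the given depth out of Node objects.
    if depth == 0:
        return Node()
    return Node(build_objects(depth - 1), build_objects(depth - 1))


def count_objects(node):
    # Traverse the whole tree, counting every node.
    if node.left is None:
        return 1
    return 1 + count_objects(node.left) + count_objects(node.right)


def build_tuples(depth):
    # Same tree shape, but each node is a plain (left, right) tuple.
    if depth == 0:
        return (None, None)
    return (build_tuples(depth - 1), build_tuples(depth - 1))


def count_tuples(node):
    left, right = node
    if left is None:
        return 1
    return 1 + count_tuples(left) + count_tuples(right)


DEPTH = 10
object_nodes = count_objects(build_objects(DEPTH))
tuple_nodes = count_tuples(build_tuples(DEPTH))
print(object_nodes, tuple_nodes)
```

A complete binary tree of depth d has 2^(d+1) - 1 nodes, so both counts should agree. Timing the two build-and-count pairs in each candidate language would give the side-by-side comparison asked for earlier.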
