Topic on Talk:Offline content generator/Architecture

Simple set up for casual MediaWiki users

Saper (talkcontribs)

As one of the few people who actually cared about Extension:Collection bugs in the recent past:

I understand that some developers fell in love with node.js (is MediaWiki going to be rewritten in JavaScript for phase four?), but please, please keep it simple to install for Jimmy the casual MediaWiki user, who has PHP and maybe Python installed on his Linux/Windows box.

I've seen some people reporting bugs against their own mwlib installations, as some organisations don't want to use the public PediaPress renderer.

I understand that there are some performance issues right now - it would certainly be beneficial to explain at the beginning of the document why we are doing this and what options we have.

I also don't think that having Wikipedia use the cool nodejs/redis solution while leaving Jimmy the casual MediaWiki user with mwlib is viable - users will be frustrated that their wikis don't produce the same output as Wikipedia, and there will be lots of unnecessary troubleshooting work. I'm not sure how replicating the WMF setup will be possible for a moderately advanced user who isn't living in a Puppet world.

Mwalker (WMF) (talkcontribs)

For the class of user that would be installing and running their own mwlib installs, the nodejs and redis queue system would require a similar amount of complexity, or less. We're moving away from the mwlib system for several reasons -- one of which is how difficult it is to run in production.

The primary dependencies as proposed, Node and Redis, are available as Ubuntu packages (we're going to run node 0.10 in production only so that the WMF cluster is running a unified version). Installation is then a matter of pulling the git repository of the new renderer and starting the node process -- we will most likely provide an upstart script so that daemonization is easy.

We will continue to use standard Ubuntu packages for additional binary dependencies like PhantomJS, pdftk, pngcrush, and LaTeX. For those packages which must be backported (if any; we don't currently have one beyond node), the package will be in the gerrit repo for download. Ideally we will eventually provide a Debian package of this solution.
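A minimal sketch of the install path described above, assuming Ubuntu; the exact package names, the repository URL, and the upstart job name ("ocg") are illustrative assumptions, not the actual ones:

```shell
# Binary dependencies from standard Ubuntu packages (names assumed)
sudo apt-get install nodejs redis-server phantomjs pdftk pngcrush texlive

# Pull the renderer from gerrit (URL is a placeholder) and install
# its node module dependencies
git clone https://gerrit.wikimedia.org/r/EXAMPLE/renderer renderer
cd renderer
npm install

# Daemonize via the provided upstart script (job name assumed)
sudo start ocg
```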

Maintaining compatibility with the existing internal API is a design requirement. That means that any user who chooses to use the mwlib solution provided by PediaPress will be able to do so. (And in fact the WMF will be using that same service to provide the on-demand book printing.)

I chose not to use Python in this case for the render server because our backend renderer is PhantomJS, which is controlled via JavaScript. Using Node means that we have all our code in one language. Additionally, there is no render system which would be purely a drop-in with just PHP/Python unless we pushed rendering down into the user's browser -- which we don't wish to do.

Mwalker (WMF) (talkcontribs)

You are correct, though, that this solution does require Parsoid. I feel that it is a reasonable requirement; more and more features for MediaWiki are requiring it (VisualEditor and Flow are the big ones).

In the bigger context, something has to parse the wikitext into usable output. We can't just take the output from api.php?action=render, because it doesn't provide enough semantic information (I have no idea what that API is designed to be used for, but it's clearly not this). Maybe in the future we will be able to use a similar native API call; but I only have till the 22nd to come up with something usable for the WMF.

Anomie (talkcontribs)

The API doesn't have an "action=render", so I'm not sure what you're talking about there.

Jeremyb (talkcontribs)

maybe index.php?action=render ?

Mwalker (WMF) (talkcontribs)

Ah, no; I mis-remembered -- it's action=parse, which appears to give the HTML output of the PHP parser. Which is great; but it is of course missing the RDF markup -- and you have to traverse it looking for specific classes to remove things like edit links and the table of contents.
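For reference, the call being discussed looks something like this (the endpoint and page title are arbitrary examples). The response contains the PHP parser's HTML, with no RDFa attributes and with UI chrome such as edit-section links and the table of contents mixed in:

```shell
# action=parse returns the PHP parser's HTML rendering of a page
# as JSON; the wiki and page title here are example values.
curl -s 'https://en.wikipedia.org/w/api.php?action=parse&page=Book&prop=text&format=json'
```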

Anomie (talkcontribs)

Of course, since the PHP parser doesn't generate RDF markup in the first place. That's something that was introduced in Parsoid due to the needs of VE.

GWicke (talkcontribs)

Not just VE. The intention was always to expose all semantic information so that any kind of client can easily extract and manipulate it.

Cscott (talkcontribs)

If the user has PHP and "one other scripting language" installed, it doesn't seem to make a compelling difference whether that "other scripting language" is python or node.

That said, currently the real barrier to Jane Wikipedia is actually all the *other* stuff needed for PDF rendering: fonts, LaTeX, Python extensions, imagemagick, pngcrush, etc. There are lots of issues here, but rest assured we're not deliberately trying to make the system harder to install.

Saper (talkcontribs)

As a person who actually needed to compile v8 to get node working right, I disagree that it is easy.

node also has an interesting way of plugging in its modules (quite cool in my opinion, but sometimes confusing), and I don't believe that installing the necessary npm packages via standard OS distribution means is workable in the long term.
