User:NeilK/Worklog/2011-01-25 to 2011-01-31

Projects: Upload Wizard, MediaStorage, Resource Loader

Status:

Last week:
 * Yak shaving with mw.Language.js
 * all the above from last week
 * Gathering stats to provide some minimum performance metrics for MediaStorage - it's amazing how little we know
 * Trying to land some backend changes into 1.17, although I realize this may be a lost cause now -- will be talking in person to Roan, Sam

This week:
 * landing major changes to mw.Language.js into trunk
 * again UploadStash backend changes for 1.17
 * working on multimedia agenda (w/Kaldari?)

January 25, 2011
Reconciling patches to mw.Language etc.

Trying to write better parser from scratch; think I will give up and use or write a simpler parser library

January 26, 2011
Michael Dale in the house!

Was trying to work on alternative parser. Michael may have talked me out of that...

Ended up pairing with Michael Dale on fixing his patch up -- we both agreed that the jQuery functionality was best broken out into a jQuery-specific module. Paired on making that work from about 12:30pm to 4:00pm, including Jasmine tests. Note: it is not much fun trying to extend the libs that Roan & Trevor have written, they are a bit privacy-happy (correction: or are they -- maybe Message is extendable after all?)

Should discuss, anyway.

So now gM only returns text, and to use jQuery parameters, you have to provide a selector and use $j.mwMessage;

Rationalized some aspects of this library. Managed to reduce several lines with the use of regular expressions.

Am still unhappy with mdale's use of the term 'swap', especially 'magic swap', which is pretty unclear to a developer. Also unhappy that we basically have two parsers in operation: the one in mwMessage does link parsing by itself, plus template argument substitution (including jQuery object magic), BUT BUT BUT then calls out to mediawiki.language.parser to do hacky-regex-rec-descent-style parsing for PLURAL and all other such things.

Still would like a parser that gave us an AST, rather than all this hackery. It might take a bit more code though, although perhaps WikiEditor could share it. mdale points out that even in MediaWiki proper, there are 'phases' to parsing, like doing links before templates. However, having an AST does not preclude this; we could do two passthroughs.
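A rough sketch of what "two passthroughs" over an AST could look like. The node shapes (CONCAT, LINK, TEMPLATE, HTML) and the function names are made up for illustration and are not any existing MediaWiki format; HTML escaping is left out to keep it short.

```javascript
// Hypothetical AST: a node is either a string or [ 'TYPE', child, child, ... ].
// None of these node names come from MediaWiki; they are illustration only.

// Apply `rewrite` bottom-up to every array node and return a new tree.
function walk( node, rewrite ) {
	if ( typeof node === 'string' ) {
		return node;
	}
	var children = node.slice( 1 ).map( function ( child ) {
		return walk( child, rewrite );
	} );
	return rewrite( [ node[ 0 ] ].concat( children ) );
}

// Phase 1: turn LINK nodes into HTML nodes, leave everything else alone.
// (No escaping here; a real pass would need it.)
function linkPass( node ) {
	if ( node[ 0 ] === 'LINK' ) {
		return [ 'HTML', '<a href="/wiki/' + node[ 1 ] + '">' + node[ 2 ] + '</a>' ];
	}
	return node;
}

// Phase 2: expand toy templates from a caller-supplied table.
function makeTemplatePass( magic ) {
	return function ( node ) {
		if ( node[ 0 ] === 'TEMPLATE' && Object.prototype.hasOwnProperty.call( magic, node[ 1 ] ) ) {
			return [ 'HTML', magic[ node[ 1 ] ] ];
		}
		return node;
	};
}

var phaseAst = [ 'CONCAT', 'Visit ', [ 'LINK', 'Help', 'the help page' ], ' on ', [ 'TEMPLATE', 'SITENAME' ] ];
var afterLinks = walk( phaseAst, linkPass );
var afterTemplates = walk( afterLinks, makeTemplatePass( { SITENAME: 'ExampleWiki' } ) );
```

Each phase is just another function handed to the same walker, so the "links before templates" ordering is simply the order of the two calls, and the original tree is never modified.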

Alolita strongly suggests briefing Kaldari on multimedia stuff before he goes on vacation.

Need to commit mw.language stuff, but a number of minimum todos first.

TODO -- move Jasmine tests out of UploadWizard

January 27, 2011
Attended presentation re: editor retention. Interesting stats & research.

Sat in on an LQT interface presentation, and later intervened a bit with LiquidThreads design. Last night took Andrew (LQT dev) out to dinner & discussed his plans for a framework to allow him to version every object he was working with, as well as create "soft" parent-child relationships between practically any object. Seemed a bit dangerous/overkill to me (and would kill query performance if you had to do resolution in two or three queries) so I wanted to follow up in the morning.

Turns out this also involves another ambition of Brandon's to generate events for every imaginable action that happens on MediaWiki, which can be turned into talk page additions and/or emails. Generally a good idea for engagement but disagreed that this necessitated a particular db architecture. Suggested a strategy of using a message queue / pubsub, which would then decouple LQT database architecture from all these ambitious messaging plans. Brandon & Andrew agreed this design made sense. (That said, the LQT design is still extraordinarily abstract because they have plans to be able to add comments and threads to virtually any object tracked by MediaWiki, as well as the ability to "move" any of these discussions anywhere. Personally I don't quite see the use of this (while it mirrors one benefit of the Wiki -- others could reorganize discussions -- the concept is very alien unless you are used to MediaWiki). But it's not my project...)
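To make the decoupling argument concrete, here is a minimal in-process stand-in for the pubsub idea. Everything in it is hypothetical (the EventBus name, the 'thread.reply' topic, the payload fields); a real deployment would presumably sit on an external message queue rather than an in-memory object.

```javascript
// Hypothetical in-process event bus; stands in for a real message queue.
function EventBus() {
	this.subscribers = {};
}

EventBus.prototype.subscribe = function ( topic, handler ) {
	if ( !this.subscribers[ topic ] ) {
		this.subscribers[ topic ] = [];
	}
	this.subscribers[ topic ].push( handler );
};

EventBus.prototype.publish = function ( topic, payload ) {
	( this.subscribers[ topic ] || [] ).forEach( function ( handler ) {
		handler( payload );
	} );
};

var bus = new EventBus();
var outbox = [];

// Consumers depend on the event shape only, not on how LQT stores threads.
bus.subscribe( 'thread.reply', function ( e ) {
	outbox.push( 'email to ' + e.watcher + ': new reply in "' + e.subject + '"' );
} );
bus.subscribe( 'thread.reply', function ( e ) {
	outbox.push( 'talk page note for ' + e.watcher );
} );

// The producer just announces what happened and knows nothing about consumers.
bus.publish( 'thread.reply', { subject: 'Logo proposal', watcher: 'ExampleUser' } );
```

The point of the design is that the email and talk-page features can be added, removed, or rewritten without touching the producer's storage layer.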

Met with Kaldari re: multimedia agenda, discussed outstanding issues with UploadWizard, introduced him to buglist, etc. He's going on vacation so it was not a major in-depth thing. He's already very familiar with UploadWizard (was one of our best testers and suggesters-of-features).

Trying to find a way to commit all these changes we made yesterday. Difficulty making some of these libraries functional & working with tests. There are a number of WTFs that bug me, and I'd rather not commit yet:


 * an already-marked FIXME -- to commit as we have it now means that core has to depend on an optional lib... the concept of a 'format' property for .msg seems dumb to me, parser should just add new functionality or replace the old.
 * parser is stateful. To parse a new template, you have to create a new parser. And feed in all the options you had before (like magic, variable replacements, and other such setup). Is this REALLY necessary?
 * doing $n replacements has to happen before parsing. Kind of nuts. And that means it's done in several different ways since we want to support Michael's jQuery & linking cases, so sometimes we replace the $n directly with a regex and other times we drop in a span element with fancy properties. Meh.
 * I was halfway to making a stateless parser to an AST when Michael talked me out of it, but really... getting sick of this
 * Since everyone hates looking up globals, perhaps would work this way:

var p = new parser().setLang( 'en' ).addMagicWords( { 'sitename' : wgSitename } ) ... ;
window.gM = function () { return p.msg.apply( p, arguments ); };
jQuery.fn.msg = function () { return p.jqueryMsg.apply( p, arguments ); };

January 28, 2011
Morning -- trying to integrate code with mediawiki.js in core. Encountering some oddness...

mdale & I tried to get some answers about the more unusual aspects of mediawiki.js. For instance, whether 'mediaWiki.html.element' wasn't redundant with jQuery. Krinkle & Roan both admitted they weren't sure what the point was (Tim Starling wrote this) but it might have something to do with the CDATA container this uses.

Going to try to do a stateless parser, kind of sick of all this weirdness with the library.

Time passes...

Wrote a parser (transformer, really) from an imaginary Wikitext abstract syntax tree (AST) format to HTML. Wrote many tests. All seems to work, very well. Parser is stateless, and can substitute new parameters for a message with every invocation.
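For illustration, a minimal sketch of a stateless AST-to-HTML transformer. The node shapes below are assumptions invented for this example (the actual AST format is not shown in these notes), and the PLURAL rule is English-only.

```javascript
// Hypothetical node shapes (assumed for this sketch):
//   'text'                                      literal text
//   [ 'CONCAT', node, ... ]                     children joined together
//   [ 'REPLACE', n ]                            the caller's n-th parameter ($1 is n = 0)
//   [ 'PLURAL', countNode, singular, plural ]   English-only rule

function escapeHtml( s ) {
	return String( s )
		.replace( /&/g, '&amp;' )
		.replace( /</g, '&lt;' )
		.replace( />/g, '&gt;' )
		.replace( /"/g, '&quot;' );
}

// Pure function: everything it needs arrives as arguments, so one AST can be
// rendered again and again with different parameters.
function emit( node, params ) {
	if ( typeof node === 'string' ) {
		return escapeHtml( node );
	}
	switch ( node[ 0 ] ) {
		case 'CONCAT':
			return node.slice( 1 ).map( function ( child ) {
				return emit( child, params );
			} ).join( '' );
		case 'REPLACE':
			return escapeHtml( params[ node[ 1 ] ] );
		case 'PLURAL':
			var count = Number( emit( node[ 1 ], params ) );
			return emit( count === 1 ? node[ 2 ] : node[ 3 ], params );
		default:
			throw new Error( 'Unknown node type: ' + node[ 0 ] );
	}
}

var uploadAst = [ 'CONCAT',
	[ 'REPLACE', 0 ], ' ',
	[ 'PLURAL', [ 'REPLACE', 0 ], 'file', 'files' ],
	' uploaded by ', [ 'REPLACE', 1 ]
];
var one = emit( uploadAst, [ 1, 'Alice' ] );
var many = emit( uploadAst, [ 3, '<Bob>' ] );
```

Because `emit` keeps no state between calls, nothing has to be rebuilt or re-configured to render the same message with new parameters.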

Now to write the next part: wikitext to AST.

January 31, 2011
Wrote the second half of the new AST scheme: A parser from Wikitext to this AST format.

Used PEG (Parsing Expression Grammars) as a base. Generated JavaScript parsing code from the grammar via an online PEG tool. Created a PEG format file, saved locally too. PEG is great because it naturally parses to an array-of-arrays and we can add little transformation clauses after each parser combination.

However, the PEG generator generates very dumb code, appropriate for C rather than JavaScript that people will download. So looked through the source for patterns (also with help of some parsing textbooks) and managed to create combinators that reduced the size of the actual parsing code to just 6K or so (there is another 5K of boilerplate, possibly could be reduced).
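A small sketch of the combinator style, to show why it stays compact. These are not the combinators or the grammar actually written here; the names (`literal`, `regex`, `sequence`, `choice`, `many`, `transform`) and the toy message grammar are assumptions for illustration. Each parser is a function from (input, position) to either null or a value plus the new position.

```javascript
function literal( str ) {
	return function ( input, pos ) {
		return input.slice( pos, pos + str.length ) === str ?
			{ value: str, pos: pos + str.length } : null;
	};
}

// `re` must start with ^ and must not match the empty string.
function regex( re ) {
	return function ( input, pos ) {
		var m = re.exec( input.slice( pos ) );
		return m ? { value: m[ 0 ], pos: pos + m[ 0 ].length } : null;
	};
}

// All parsers must match one after another; yields an array of their values.
function sequence( parsers ) {
	return function ( input, pos ) {
		var values = [];
		for ( var i = 0; i < parsers.length; i++ ) {
			var r = parsers[ i ]( input, pos );
			if ( r === null ) {
				return null;
			}
			values.push( r.value );
			pos = r.pos;
		}
		return { value: values, pos: pos };
	};
}

// First parser that matches wins (PEG-style ordered choice).
function choice( parsers ) {
	return function ( input, pos ) {
		for ( var i = 0; i < parsers.length; i++ ) {
			var r = parsers[ i ]( input, pos );
			if ( r !== null ) {
				return r;
			}
		}
		return null;
	};
}

// Zero or more repetitions; never fails.
function many( parser ) {
	return function ( input, pos ) {
		var values = [], r;
		while ( ( r = parser( input, pos ) ) !== null ) {
			values.push( r.value );
			pos = r.pos;
		}
		return { value: values, pos: pos };
	};
}

// The "little transformation clause" attached after a match.
function transform( parser, fn ) {
	return function ( input, pos ) {
		var r = parser( input, pos );
		return r === null ? null : { value: fn( r.value ), pos: r.pos };
	};
}

// Toy grammar: $n parameters, {{PLURAL:$n|one|many}}, and plain text.
var param = transform( regex( /^\$\d+/ ), function ( s ) {
	return [ 'REPLACE', parseInt( s.slice( 1 ), 10 ) - 1 ];
} );
var text = regex( /^[^${|}]+/ );
var plural = transform(
	sequence( [ literal( '{{PLURAL:' ), param, literal( '|' ), text, literal( '|' ), text, literal( '}}' ) ] ),
	function ( v ) { return [ 'PLURAL', v[ 1 ], v[ 3 ], v[ 5 ] ]; }
);
var message = transform( many( choice( [ plural, param, text ] ) ), function ( v ) {
	return [ 'CONCAT' ].concat( v );
} );

function parse( input ) {
	var r = message( input, 0 );
	if ( r.pos !== input.length ) {
		throw new Error( 'Parse error at offset ' + r.pos );
	}
	return r.value;
}

var parsed = parse( '$1 {{PLURAL:$1|file|files}} uploaded' );
```

The grammar itself is only a handful of lines once the combinators exist, and the output is the same kind of array-of-arrays that a PEG generator would produce.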

Trevor saw this and suggested we move the wikitext-to-AST part into PHP anyway. Which is an obviously good idea, since only 10% of the strings even need it; they'll be bloated in size but it's nothing compared to the parser library. In fact that's what I wanted originally, but I thought of it in terms of instrumenting the existing parser. (I tried to do this a few months ago and failed: there are no discrete parsing stages, so it's impossible.) But Trevor points out we can just have this special-purpose parser for emitting AST.

So, this may make it into 1.17...

Miscellaneous notes (carried forward from last week)
Ok. reconciling all these different implementations of parsing and language in MediaWiki.

We have:


 * Michael's patch in branch MwEmbedStandalone bug 20962.
 * my hacked version in trunk - extensions/UploadWizard/resources
 * the newly-namespaced version in trunk - resources

What we need to do:


 * bring the best of Michael's stuff into the mediawiki version (?) The patch affects quite a bit, in ways that are helpful to what Michael needs
 * mw.Parser is radically different
 * includes functionality to do fallback languages which the resource loader should itself be doing.

 * verify that his slow parserTest still works
 * try the jasmine test (will have failures, but will mostly work)
 * bring THOSE changes into the trunk/resources version
 * verify slow ParserTest still works
 * try the jasmine test (will have failures, but will mostly work)
 * namespace out the languages as decided
 * try if that works in the jasmine test (should completely work)
 * work on rewriting basic parsing algorithm