Berlin Hackathon 2011/Notes/Saturday

Wheee, new pad! Have a nice day

Live feed

Wifi was missing at the Comebackpackers hostel until 10am. That made some people move to the betahaus quickly.

intro by Daniel Kinzler

Livestream will be wired, not wireless, anyway

17 bugs smashed on Friday! (including code reviews.) Mark your smashed bugs on the poster in the back...
 * FIXED Bug 28945 - Keyboard shortcuts on history page no longer work in 1.18

Trevor Parscal: why we should move section edit links
In lab testing:
 * People are confused about it, click to try to edit the text below vs above
 * Visibility -- people didn't ID that there was a section edit link to click! Huge deterrent to people even getting to edit link in the right place.
 * In 2009, after this testing, we hacked it so they would always be next to section heading (Wikia also did something like this) -- scrapped in favor of other things, lost in doing other things.
 * Patches went into Bugzilla & rotted.
 * Finally made the change.

We've done qualitative & quantitative testing.

Last month, tracked 4.7 million users - A/B testing. New one -- 116% more likely to actually use it! 9% increase in completed edits.

So if we move it to the (right? left?) - solve usability prob

today: sprint. Nontrivial -- need to change DOM output, and some wikis, like German Wikipedia, are hacking around it so we need to fix that. Working in a corner at betahaus today.

result at the end of the hackathon: no, this did not get done

Rob Lanphier:
MediaWiki release roadmap, cooked up yesterday on the back of an envelope, quick-n-dirty. 1st release: MediaWiki 1.17, any day now! A few last installer bugs.

next release: 1.18 -- looking at code review backlog, about 2000 unreviewed commits in trunk & in recently branched 1.18 branch. Based on past history: if we really buckle down, 2 or 2.5 months. So -- hopefully, end of July for 1.18 release. That release -- not many features. Right now, some gender fixes for user page links, photo rotation... minor stuff in the branch, want to get into swing of doing releases on fairly frequent basis. backlog of commits from Dec 2010

1.19 - slated for Oct 2011. Session handling code to rework for that.... have not yet solicited other features to target for 1.19. Driven by needs for data center.

1.20 -- target HipHop compilation support. Potentially have HipHop, huge performance boost for the site.

That's the release roadmap as we are currently thinking about it. Please come by, talk to General Engineering Group for details.

(see MediaWiki roadmap for more)

Denny V.
2 topics.

1: Data Summit in Sebastopol from February -- data & Wikipedia projects

2 themes: 1st, how to use data within WM projects. Denny is involved with Semantic MW, thinking about how to make this happen, would like your input

Current state of Semantic MediaWiki -- we have a reputation of not being scalable, of being ugly & cluttering up code. Changed a lot in last few years! scalable under spec circumstances

improved UI a lot ...you don't have to change your wiki syntax. I can talk with you about it

2nd topic: WMDE express diversity within WP, find out if people have diff opinions metrics to measure how diff people edit, interaction, notice edit wars, understand diversity in opinions RENDER project

Nimish Gautam: Metrics
We're trying to be metrics-centric we have many sources for metrics

ad hoc research We don't have a good way to integrate especially for fundraiser

page views, ..

trying to set up a platform ...make it into small modules

Everyone in the office calls Nimish "Mishmish" because Trevor's daughter could not pronounce his name

Newspapers claim I'm from India, but actually born in

i18n / l10n guy translatewiki

Raymond Spekking makes daily commits for localization, which saves Siebrand time, and he can work on other things

Microsoft a while ago open sourced WikiBhasha - JavaScript translation helper. Would love someone to have a look at that extension, ..., take cruft out of UI ..maybe add support for Google MT. Ultimate goal: run on Wikimedia cluster

JavaScript is not his

there was a protracted conversation recently re parser

Wiki code currently under GPL -- common choice for new code viral - anything that links to it is automatically GPLd

a number of famous projects under GPL, inc Viral: in both Linux and Wordpress, benevolent dictator decided to interpret somewhat more liberal than FSF would

you might want other projects to be able to use the parser most people -- hard licensing that invokes viral -- don't want commercial use but we want everyone to use everything content, valuable, already licensed for commercial use!

her advice: pick LGPL or BSD ten years on the OSI board, can argue it until cows come home ;)

public knitting -- if you have ADD issues and can't be at a computer all the time, this helps! also very attractive to women

small, new at WMDE - have a contractor for it Johannes Kroll is doing great job Toolserver WikiSense/CategoryIntersect ("CatScan"), next generation Who of you knows CatScan? (between 10 or 1/2 of us know) "not too many"

nasty hack DK wrote a few years ago politician & physician ....find intersection of articles in 2 categories but a lot of projects intersect. relational DBs are not a good tool for this. Tool times out, cuts off output, etc

Graph Processor core can load a graph structure into memory, traverse and intersect quickly (less than a second), spit out CSV list
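To make the in-memory idea concrete, here is a minimal Python sketch, not the actual Graph Processor code. The toy data and the names `children`, `articles`, `members`, `intersect` and `to_csv` are all assumptions made for illustration.

```python
from collections import deque

# Hypothetical toy data: category -> subcategories, category -> articles.
children = {
    "People": ["Politicians", "Physicians"],
    "Politicians": ["German politicians"],
    "Physicians": [],
    "German politicians": [],
}
articles = {
    "Politicians": {"Alice"},
    "German politicians": {"Bob", "Carol"},
    "Physicians": {"Bob", "Dave"},
}

def members(root):
    """Collect all articles in root and its subcategories (breadth-first)."""
    seen, found = {root}, set()
    queue = deque([root])
    while queue:
        cat = queue.popleft()
        found |= articles.get(cat, set())
        for sub in children.get(cat, []):
            if sub not in seen:  # guard against category cycles
                seen.add(sub)
                queue.append(sub)
    return found

def intersect(cat_a, cat_b):
    """Articles that fall under both category trees."""
    return members(cat_a) & members(cat_b)

def to_csv(names):
    """One article per line, sorted, as a stand-in for the CSV list."""
    return "\n".join(sorted(names))
```

With the whole graph in memory, the intersection is a set operation rather than a chain of relational joins, which is presumably where the speedup comes from.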

next: TCP server, interface to multi graph processor cores connect to one core (for one wiki), ask for intersection, get back a list can ask for shortest path to root node, between 2 graph nodes
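The "shortest path to root node" query can be sketched as a breadth-first search over parent links. This is an illustrative Python sketch only; the `parents` table and the `shortest_path` function are hypothetical and say nothing about the real server protocol.

```python
from collections import deque

# Hypothetical parent links: category -> its parent categories.
parents = {
    "German politicians": ["Politicians"],
    "Politicians": ["People"],
    "Physicians": ["People"],
    "People": [],
}

def shortest_path(start, goal):
    """Breadth-first search upward; returns the list of nodes, or None."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in parents.get(path[-1], []):
            if nxt not in seen:  # avoid revisiting nodes in cyclic graphs
                seen.add(nxt)
                queue.append(path + [nxt])
    return None
```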

want to add other classical graph stuff, like closest common ancestor. next: Python and PHP client libraries. updater component: poll wiki databases, feed to Graph Proc Server, to get live representation of wiki structures on the toolserver. want to integrate it with main search infrastructure (lucene) to search in specific topic areas on the wiki

(previous work on E-Mail notification - now switched on for user talk pages Brion saved us from a mess.... fixed flaws and weird things -thanks-)

I activated OpenID on my own wikis, fixed several bugs. OpenID extension has several bugs which prevents it from running with contemporary MediaWiki and PHP versions. Code not being maintained for a while.

Underlying third-party library (Apache 2.0 license) needs maintenance patches for PHP 5.3.x - see  and, in general.

Will contact Danese about licensing... want to bring lib into our repo - but better avoid a fork of php-openid. (a "make" Makefile is now included in the OpenID extension which i) downloads and ii) applies patches to the library. This facilitates the installation.)

I will propose a few new parameters - knowing that no one likes too many new parameters in default settings (done in trunk version)

Those who don't know about OpenID -- a framework to allow single sign-on based on a cert server, which acts as an OpenID server. Wikimedia or MediaWiki would act as a server. Google, VeriSign, & other free services allow you to open an account there & use this ID for login for private MediaWikis.... a goal of Thomas is to convince hackers to allow OpenID for account creation or for login

On my server, fine granular possibility to select what MediaWiki can allow (OpenID consumer, OpenID server and more: (solved and committed)(sc)(sc)(not yet implemented) list of all bugs:

Sunday early morning update: just got the current OpenID version running with my patched php-openid version on mediawiki trunk version rev. 88127. Please help me with

Neil K.: HackPad
Did a hack last week at Foundation Hackpad we don't know what would happen if we had this kind of editing interface for MediaWiki thought what is the simplest thing we could do to make this happen friends -- came in to work on simple protocol to make etherpad deriv work as edit

demo time! (ha, no time to set up)

simple user script, Neil will show it to you

this works, on MediaWiki.org at least

I wrote the first cut of the user script below - add it to $wikihostname/wiki/User:$YourUserName/common.js

importScriptURI("");

they have modifications to Etherpad that they are going to release soon.

Forgot to mention this while I was talking: I am working on an extension to generalize invoking a "remote" editor.

What we want to know from you: - What makes sense - one user owns the changes, has the sole power to submit? everybody can submit, make it a free-for-all? - would you like to join others in a live session? - would you like to see what sessions are currently live if you visit a wiki page? - Ideas, feedback! --> neilk@wikimedia.org

Wikitech-l post Hackpad blog post

Rob Halsell: Sailing
Danese insisted that I talk, and that non tech was ideal. Thus the greatest sport ever was discussed, SAILING.

team building been on a boat? or a sailboat? I got into sailing. You have ~ 7 people. Everyone has to be in total sync or you "crab" & sink. Everyone works in sync, and you have a skipper. You cannot go into the wind. Powerboating: you can just drive. But sailboat: tacking! Like community programming. [laughter] Everything is pretty open. Same tech we've been using for thousands of years. Skipper in back of boat -- everyone has to do what they say. So you don't have to think at all, which is a nice break! [RobH refuses to sing "I'm on a Boat"]

2 sails. 2 people per sail. Job: move sail 3 inches every few minutes.

very fun. Rob wants to sail, so if you are in SF, come sail with me! I just want an excuse to get on the water. (or in Washington DC, it all depends where I am that month)

used for small wikis but not for bigger wikis comments on talk pages, watchlists notif changes on user talk pages not done by users themselves ("foreign changes") Audience: Ops is talking about giving more user .... turning on email notification

Tim turned on this notification before we even finished talking about it. HUZZAH TIM STARLING! (Does he like Champagne?)

What about watchlist notifications? I need more statistic data. How many watchlist changes need to be notified per day ? The figure is certainly much higher than for changes on one's own user_talk page <after-talk edit

Hashtag game is basically everyone presenting the three hashtags they wrote on arrival on their presentation card. It is a good laughing session :-)

(12:05pm local time) to get to know each other Danese is singing WE WANT LUNCH. <--- s/LUNCH/coffee/g

We are not going to add all the tags in etherpad #la

As of 12:49pm local -- general hacking & conversation, in main room & in "the arena" (separate room with stadium-style bench seating)

(as of 1:18pm local: Technical Liaison & Developer Relations group talks about responsibilities)

Parser lightning talks
What is your view on a start to a grammar based Wikitext parser created by a student of professor Dirk Riehle? (in the afternoon)

In MediaWiki, lightning concept is incompatible with parser!!

Inline Editor demo page (requested ;)):

(as of 2:34pm local: about to start more lightning talks)

Daniel introduces the session, Danese is going to lead this session of short presentations

PARSERRRRR (pirate version: parsARRR) 10 talks!

We have many conversations that have happened recently on wikitech-l re parser

visual editor is one of the two very high priorities for coming year, but not the only thing we want to do with parser. still doing evaluation on whether should use existing work or write from scratch, people talking today from both camps

Framing this....

We have many open questions re parser. But we need to do something because it does not currently support some goals, such as visual editor. That's the big push for this year.

We are evaluating what to write/build & what to leverage

These talks: no decisions being made, listening to everyone.

Will be included in the short talks: 1. What did you do? 2. Why? 3. What did you learn about Parsing / RTE that we *need* to know? 4. Your best a

each talk: 10 min, incl optional demo

feels partially responsible for the WMF going on track with "we need to change the parser to achieve visual editing"; has been doing research in this area

spent time in Berlin last time trying to rewrite the parser but not just so we can have formal grammar but he needed something out of the parser i.e. identify where the output is coming from + have a document structure

we have a serialization of the DOM wants to work on it in memory rather than moving chunks of text around

and impose restrictions on what wikitext is

some things are done with wikitext that shouldn't be done with it (e.g. one template opens table, another closes it, table rows, etc.)

probably need to reform. make visual interfaces for all the wikitext possibilities?!?! well, time to reform

what we need to think about in the future:

editor end: drives many of those specs

Collab realtime editing: maybe not in our 1-yr plan, but it's in the future of the internet. Put our efforts into something that can become realtime collab?

talked to Etherpad, Google Wave guys, operational transformation transactional system

iFrame disaster 2009 --- iFrame for wiki editor.... display surface, same as Google Docs, not like Wave (short leash) .... rendering from scratch, including

text selection & layout

no chart, so, diagram with waving hands

editing - rampup fast, but hit ceiling fast too.

display surface - make everything from scratch, but way higher ceiling because full control

Google Docs is actually a rendering of LaTeX!!!!! can we do LaTeX too? please? It's beautiful.

One point came up in talks NeilK and I had with both Etherpad and Google Wave ppl: right now, wikitext all inline annotation

we have to encode those marks before & after. Etherpad..... in another table, "from this offset to another offset is italic" offset annotation
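The difference between inline marks and offset annotation can be sketched as follows. This is an assumption-laden Python illustration, not Etherpad's actual format: the function name `inline_to_offsets` and the `(start, end, "italic")` tuple layout are made up here, and only the two-apostrophe italic mark is handled (bold and nested marks are ignored).

```python
import re

def inline_to_offsets(wikitext):
    """Convert ''italic'' inline marks into (plain_text, [(start, end, 'italic')])."""
    plain = []
    spans = []
    pos = 0   # position in the plain text built so far
    last = 0  # position in the source already consumed
    for m in re.finditer(r"''(.+?)''", wikitext):
        before = wikitext[last:m.start()]
        plain.append(before)
        pos += len(before)
        inner = m.group(1)
        # Record the annotation separately instead of keeping marks in the text.
        spans.append((pos, pos + len(inner), "italic"))
        plain.append(inner)
        pos += len(inner)
        last = m.end()
    plain.append(wikitext[last:])
    return "".join(plain), spans
```

The text itself stays free of markup, and the formatting lives in a side table of offsets.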

use a Wave bridge, try to find a way for MediaWiki to speak to Wave in a way it understands

lot of tech benefits to what it allows us to do most powerful thing it would give us is semantic stuff could fundamentally change way we edit content

"that's all the

Slides:

a grant that GRNET got => usability testing and development presenting along with Dimitris Mitropoulos

has built an interface a year ago to edit in wikitext but you can instantly see the output when parsed

sentences are highlighted popup with a mini editor (including the toolbar) click a bar on the left = edit the paragraph / page / etc.

currently doable with the current parser would be easier with a new parser really easy to see what wikitext maps to what output

did usability testing edit mode: (text, references, lists, etc.) turns out users didn't really get that

each module will detect a sentence / template / etc. goal = only detect simple sentences, i.e. that don't contain complex templates etc. so that

Q: what happens when you try to edit a sentence with a lot of citations? A: there's a module for citations in the current implementation, but there are still a few problems ;)

the way they do it is sentence-matching module: adds spans for sentences, templates, etc. we don't actually parse the wikitext, but rather the spans? (to check)

still challenges, eg. misnesting, dependencies

Now Dimitris, talking about the UX test

parsing, implementation, data -- probs resolved. Also tested editor on mobile devices, went pretty well

compatibility testing: browsers, mobile devices, etc.

usability testing -- funny part. 14 users, 6 novice, 6 experienced

2 parts of test, easy + difficult used a copy of an existing entry on a suburb of Athens

asked user to delete a paragraph: some selected a para & pressed delete key swap 2 paras? most couldn't see how to edit the section as Adding external link: probs with syntax, people did not use example button

Video of usability experiments...

not only newbies could benefit, but also expert editors who like to "hack into the wikitext"

with a new parser it won't be necessary to do such a hack

switching to the "I wrote a parser, lemme tell you about it" part of the session

< this presentation has slides > < this presentation had Edward Tufte kill a few kittens >

open source

Parser in Java based on PEG (parsing expression grammar)
 * needs additional postprocessing in Java as well beyond the low-level PEG grammar
 * round-trip support: can fully recover original wikitext from the AST
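A rough illustration of what round-trip support means, in Python rather than the Java/PEG implementation described in the talk: every node keeps the exact source characters it was parsed from, so concatenating the nodes reproduces the input. The `Node` class, the `TOKEN` pattern and the two recognized constructs (comments and double-bracket links) are assumptions chosen to keep the sketch small.

```python
import re
from dataclasses import dataclass

@dataclass
class Node:
    kind: str    # "text", "comment" or "link"
    source: str  # exact original characters this node was parsed from

# Toy tokenizer: HTML comments and [[...]] links; everything else is text.
TOKEN = re.compile(r"(?P<comment><!--.*?-->)|(?P<link>\[\[.*?\]\])", re.S)

def parse(wikitext):
    nodes, last = [], 0
    for m in TOKEN.finditer(wikitext):
        if m.start() > last:
            nodes.append(Node("text", wikitext[last:m.start()]))
        nodes.append(Node(m.lastgroup, m.group(0)))
        last = m.end()
    if last < len(wikitext):
        nodes.append(Node("text", wikitext[last:]))
    return nodes

def unparse(nodes):
    """Nothing was thrown away, so joining the sources restores the input."""
    return "".join(n.source for n in nodes)
```

Keeping comments and whitespace in the tree is what makes the round trip lossless.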

Building a DOM on top of the AST which can help hide complexities like having two kinds of tables (HTML or wikitext)

half the MW .... have to run tidy.... rely on browser to fix nesting of HTML tags for them

parser cannot refuse a wrong nested document! (in wikitext every input document is valid... it just doesn't always mean what you want ;)

prefixes... comments.... part of doing roundtripping. cannot throw away XML if you want to go to XML)

what problems can we actually solve with Sweble?

we have 30+ implementations. Lots of people tried to implement parsers. Crashed & burned, or <100% compatibility, or don't produce AST

There will never be a formal spec or grammar for wikitext! (explanation in slide)

Wikitext.next?

Let's move on & see.

Like Trevor, he is convinced that most changes we need are small, authors won't mostly notice, espec if they just provide content & don't try to template.

Hard: experts will be able to handle new syntax.

Apostrophe syntax MUST be changed, and everyone will notice.

Fixing textual expansion process: hard. Other templates simply replaced textually. ...... makes WYSIWYG really difficult. performancewise, make wikitext closer to programming langs .... AST with defined root node. store template pages as ASTs, don't have to reparse everything due to change somewhere in the doc.

His proposal (slide) ...will have to convert all articles into new format...

Not enough time to demo. Go to

The proposal is that the community agree on a new (.next) Wikitext and the Sweble Wikitext parser can help you convert old articles to the new easier Wikitext format. Why invest any more time into a Wikitext syntax that everyone wants to

< this presentation has slides >

TWX editor approach

new wysiwyg editor, still at the beginning. running demo that can be shown afterwards

our editor is divided into 2 parts: backend PHP, frontend JS

[BUNCH OF DIAGRAMS HERE]

editor control class, UI for the editor

reparser textbox lying underneath; uses the normal conventional way of putting it back into MediaWiki database

it allows for intermediate interactions if user tries to do something they're not allowed...... prohibition, limit may be useful in future

Dojo framework....

doesn't parse whole wikitext, writes back edited elements in same way they came into front end

control class, element builder

recursion

of interest to us: this builder creates an XML doc that is sent to front end, creates a tree with 2 main branches. 1st: html branch displayed in editor, contains normal HTML syntax the editor can display. for each element/tag, there's a twx .....

then a second branch that contains for each tag complete wikitext syntax as existed on the page, & twxid, name of editor, so as not to interfere with the formatting, things done by normal IDs and classes

afterwards: these elements coming back from editor are compared. if nothing has changed, the original wiki syntax completely rewritten to back end.
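The compare-and-write-back step might look roughly like this. It is a Python sketch under stated assumptions, not TWX code (whose backend is PHP): the element dictionaries with `twxid`, `html` and `wikitext` keys, the `write_back` function and the `to_wikitext` converter argument are all hypothetical.

```python
def write_back(elements, edited_html, to_wikitext):
    """Rebuild page wikitext after an editing session.

    elements:    list of dicts with 'twxid', 'html' and 'wikitext' keys,
                 as sent to the front end.
    edited_html: dict mapping twxid -> HTML returned by the editor.
    to_wikitext: converter used only for elements that actually changed.
    """
    out = []
    for el in elements:
        new_html = edited_html.get(el["twxid"], el["html"])
        if new_html == el["html"]:
            out.append(el["wikitext"])  # untouched: reuse original syntax verbatim
        else:
            out.append(to_wikitext(new_html))
    return "".join(out)
```

Because unchanged elements are copied back verbatim, an edit to one sentence cannot reformat the rest of the page.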

actual class diagram is EVEN MORE COMPLEX [scary diagram here]

analyzer also uses .... element definition & other classes.... IDs elements, tells controller....

not a real parser per parser theory, but works well for our purposes some elements of it might be of interest as we consider writing a new MW parser. what Foundation is planning: supposed to have an intermediate format in the middle (between wikitext and HTML), so this might be an interesting architecture

analyzer builde

no live demo here -- still difficult to handle, but it works. we cannot yet: handle templates (probably 1st question you'll ask!). can handle links, normal formats, other things, etc.

neil shows the upload wizard in German, and French pluralization - done in client (browser), not MW itself i18n experts say you should not have text, then a field, then text. but we can with this parser, easily

tooltips that contain links click handlers in jQuery combined with messages

Michael Dale had a way of doing this with jQuery + string hacking. then Neil looked at how to do it with a real parser: PEG grammar - small, does everything we saw in demo + more

simple string

 neilk_: who's messaging you? neilk_: how's

EVEN MORE COMPLICATED string +jQuery expression -- parsed in

parses wikitext to s-expressions in JSON

"late binding" -- replacement texts, user options, etc. parsed structure is perfectly cacheable & easy to deal with parsing hundreds of strings fast!
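A small Python sketch of the parse-once, bind-late idea (the real code is a client-side JavaScript PEG parser; this is only an illustration). The operator names `concat` and `param` and the functions `parse_message` and `render` are assumptions, and only `$1`-style parameters are handled.

```python
import json
import re

def parse_message(msg):
    """Parse 'text $1 text' into a nested-list s-expression."""
    expr = ["concat"]
    last = 0
    for m in re.finditer(r"\$(\d+)", msg):
        if m.start() > last:
            expr.append(msg[last:m.start()])
        expr.append(["param", int(m.group(1)) - 1])
        last = m.end()
    if last < len(msg):
        expr.append(msg[last:])
    return expr

def render(expr, params):
    """Late binding: parameters are substituted only at render time."""
    if isinstance(expr, str):
        return expr
    op, *args = expr
    if op == "concat":
        return "".join(render(a, params) for a in args)
    if op == "param":
        return str(params[args[0]])
    raise ValueError(f"unknown operator: {op}")

# The parsed form is plain JSON, so it can be cached independently of any user.
cached = json.dumps(parse_message("$1 uploaded $2 files"))
```

The cached structure contains no user-specific values, which is what makes it safe to cache statically and bind per request.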

we learned about parsing:

abstract syntax trees are awesome! will do more for you than you think they will

< insert lots of one grammar, multiple languages should use late binding: more static caching

conversation with Hannes, re Sweble

if wikitext were radically simplified, this would be enough --- put logic elsewhere, not in the parsing part. Computer scientists learned this in 1970s, we will learn it soon

parser creators created something awesome, why we are all here. This is my small contribution

Danese notes: there will be a discussion afterwards & parser-interested folks should physically move to front of room



He knows only parsing, not preprocessing parse text from left to right in one parse

try to make parser that is as compatible with MW parser as possible

good framework for experimenting

implementation techniques used: fallback.... target for the parser is to create valid HTML valid HTML has simple structured grammar, so target grammatical structure is fairly simple in antlr, divided lexical analysis & parser into 2 parts

lexer....

parser part.... builds tree structure..... simple rules, pretty clean

but needed to push more & more parser logic into the lexer to make a compatible parser

2 reasons why lexer rules complicated: 1) similar syntax for internal links and images but need to be parsed differently 2) ...

made heavy use of "semantic predicates"

tokens -- pairs, openers & closers? inline? block? nest?

from this table, generate code from logical machinery that turns tokens on & off

by using approach, conjecture that you can write a parser that is compatible with MW parser Take strange edge cases (bold/italic...) & handle them exactly as MW parser does!

[laughter at an edge case]

Make it extremely compatible with current parser

2nd advice: look at xwiki engine..... impressed by their architecture, how they do visual editing + syn

(example links may not work) -- 13:19, 15 June 2011 (UTC)

Maciej works for Wikia's Engineering team based in Poznan, Poland.

The talk is about handling templates in WYSIWYG editors.

Wikia uses placeholders for some templates where it cannot make an RTE editable widget used Content-editable, with some parts marked non-editable. (A feature added in IE 5.5, which "is not really a browser").

First implementation was a placeholder in the form of a green puzzle icon. By using Content-editable we can tell the browser that these placeholders are not to be "editable". So we can finally display *some* templates visually.

demo at:

Q: how do you deal with templates that break HTML? A: none, right now

Second demo (hack):

This is a fresh piece of code from our Wikia All-Hands hackathon yesterday. The hack is about in-line editing. *Not* guaranteed to work under IE. ;)

based on the inline editor template: offers fields "in-place" same for images -- can click to see the properties which generate the "uneditable" bit of content -- does cool HTML animation to "flip" the element to reveal a f

translate MW interface messages we need more language tagging in mediawiki

mix of different languages in the interface part of MW some from target language, content language, some from fallback language

ways for screen readers to know what is in which language, so it's not gibberish

another drawback: when looking for something with google search, can restrict language, but.. <- ability for parser to keep language tagging info through to output would be helpful, by letting the output actually tag Der BlahBlah
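What "letting the output actually tag" a fragment could mean, as a hedged Python sketch: wrap any message whose language differs from the page language in an element carrying the standard HTML `lang` attribute. The function `tag_message` and its parameters are hypothetical, not an existing MediaWiki API.

```python
from html import escape

def tag_message(text, msg_lang, page_lang):
    """Wrap text in a span with a lang attribute if it differs from the page language."""
    if msg_lang == page_lang:
        return escape(text)
    return f'<span lang="{escape(msg_lang, quote=True)}">{escape(text)}</span>'
```

Screen readers and search engines can then tell which language each fragment is in, instead of guessing from the page as a whole.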

some messages are translated, but not completely: some words are not translatable / exist in multiple languages (example: "e-mail"), but in that case, you still need to pronounce it right according to la

4:10pm

+ implementations. Lots of people tried to implement parsers

parsing happens on two levels: preprocessor (which Tim put a lot of time into some years ago): well-defined, has specifications divides things in templates, parser functions, comments, etc.

+ another poorly-defined component (Parser class)

do we need a new parser?

not really, but it'll help

saw the example of inline editor, but would also be nice to be able to dive into a table or infobox (template)

situation has been kind of embarrassing - everyone writes their own parser that partly works but having a firm exact specification is really important to take our millions of pages of content, and facilitate reuse same evolution in HTML with HTML5: explaining how you should parse a document that is not correctly formed (?)

the people doing templates don't read our instructions saying "don't do that"

a new parser will make templates more portable ("hey, that thing looks cool on Wikipedia, I want the same on my wiki!!1!") .."template libraries"

don't need many new features in our syntax, have a lot already which are well-defined

Read my lips: NO NEW SYNTAX

no real reason for it, and too much data already in the existing format no reforming the syntax, instead focus on having a good abstract syntax tree that is compatible with our current syntax, and can output to a visual editor and other channels

will still use the regular parser for everything - at some point in the future, swap it out. use gadgets instead for...

big issue: open/close templates (e.g.,  , etc.) one way is to make the parser smart enough to handle them, rather than eradicate them

they are structured like a bullet list ("* ..* ..." turned into a list in parser)

happening now:

try out "parser playground" gadget in your preferences on mediawiki.org (showing XML tree with various templates, ...)

implementing a really crappy test parser that breaks text into paragraphs... 3 steps: 1) abstract syntax tree ... 3) convert into HTML will turn that into somewhat more usable things in near future, check(ed?) into SVN
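For a feel of how small such a test parser can be, here is a Python sketch covering only the first and last steps named above (the middle step is not recorded in these notes, so it is left out). The tree shape and the names `parse_paragraphs` and `to_html` are assumptions, not the code that was checked into SVN.

```python
from html import escape

def parse_paragraphs(text):
    """Step 1: build a tiny tree, one 'paragraph' node per blank-line-separated block."""
    blocks = [b.strip() for b in text.split("\n\n")]
    return {
        "type": "document",
        "children": [{"type": "paragraph", "text": b} for b in blocks if b],
    }

def to_html(tree):
    """Final step: serialize the tree to HTML, escaping the text content."""
    return "\n".join(f"<p>{escape(c['text'])}</p>" for c in tree["children"])
```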

return to slides: "what's happening now?"

Etherpad...

timeline: next weeks, more experimentation by August some cool demos for Wikimania, an actual editor that can do something later live on some WPs ... go find what breaks (weird scary templates / pages etc.)

questions for