Berlin Hackathon 2011/Notes/Friday

MediaWiki hackathon 2011, organized by Wikimedia Deutschland

Berlin Hackathon 2011
Blog post: 2011/05/13/wikimedia-tech-crowd-and-mediawiki-developers-gathering-in-berlin/
Live video streaming: http://www.livestream.com/wikimediaberlin
Bugs to smash: https://bugzilla.wikimedia.org/buglist.cgi?keywords=bugsmash
 * #mwhack11 on Twitter/Identi.ca
 * #mwhack11 IRC channel

Notes from Saturday: mwhack11Sat
Notes from Sunday: mwhack11Sun

Friday, 13 May 2011, first day

2:15pm -- Pavel Richter gives the introduction & thanks. 2:17 -- Daniel K's notes:

there will be signs
 * not enough chairs
 * ladies' toilets 1 floor below
 * elevator (lift) broken. Close doors after you use it! (once it works again)
 * tomorrow & sunday cafe downstairs is closed -- we must enter through the courtyard
 * we have a cloakroom in the first room to your left (coming through the hallway?)
 * If you don't want your photo taken & published, put a red sticker on your nametag, & don't stand in front of a camera
 * we have 24-hr access to the venue, but someone needs to volunteer to take responsibility for the keys

The Wiki Loves Monuments folks leave.

2:20 - Danese notes

Who's been to an unconference before? ... & the DC hackathon? We'll run it like that.

Danese notes who wants to talk. All the parser talks will be tomorrow (Saturday), along with visual editing. [Parser stuff is on Saturday because Wikia's Poland team won't be in until then.]

Friday talks
(Summaries of the presentations are at the bottom of this pad.)

Mark Bergsma -- New datacenter (presented) + IPv6 (not yet presented)

 * Emmanuel -- Kiwix + offline (presented)
 * Purodha -- Narayam (presenting)
 * Hay Kranen (Husky) -- PhotoCommons (presented)
 * Kolossos (Tim Alder) -- OSM integration
 * Ryan Kaldari -- Make WikiLove, not wikiwar (presented -- now with kittens! And sausages!!!1!)
 * hashar -- PHPUnit
 * Patrick -- Mobile gateway
 * Ryan Lane -- community-oriented test & dev
 * Krinkle (Timo Tijhof) -- Creating QUnit tests and distributing JavaScript testing (TestSwarm)

Saturday talks
 * Domas -- High performance trolling
 * Tim Starling -- HipHop
 * Brandon -- Identity

Parser talks for Saturday: Hannes, Trevor (with editor), Jan Paul (Inline editor), Brion, NeilK, Andreas, Wikia, Achim, Purodha (see note above regarding wikia folks)

Planned sprints
 * New data center (Mark Bergsma)
 * Bugsmash (Mark H. + Sumana) -- https://bugzilla.wikimedia.org/buglist.cgi?title=Special%3ASearch&quicksearch=bugsmash&list_id=9974
 * CiviCRM audit tool (Arthur)
 * HTTPS (Ryan Lane)
 * WikiLove (Ryan Kaldari) -- done some sprinting already!
 * Custom upload wizard for WLM + API (Hay)
 * Section edit link productization (Trevor)
 * MirrorBrain content repository (Tomasz)

"Write something nice" whiteboard on the wall near the registration desk: bug count.

Ok, can we start any sprints TODAY, NOW? Mark, after his talk.

(need more whiteboards!)

Danese declares us ready to get started. She'll be running a timer to ensure talks are only 10 minutes.

Mark Bergsma: The new datacenter
We have a 2nd primary data center now! Ashburn, Virginia. Racked over 50 pallets of servers; each pallet consisted of anywhere from 6 to 14 servers, depending on their weight. Over 375 pieces of equipment were installed. (Applause for the ops team.) Not online yet -- awaiting connectivity between the old & new datacenters; delays. Will happen next week or the week after. [diagram!] pmtpa (Tampa, Florida) -- old; eqiad (Virginia) -- new. Next week: 1 link up. 1 router in each DC, 10-gigabit, so they can route traffic together as 1 network. We'll have Squid serve both sites' traffic; users will get traffic from the closest DC. Cache as much as possible locally...

At the start, we will not have both data centers active. Primary & secondary will swap all the time -- do a swap every few months... If we switch over, it's a big push: caches will be completely empty, causing problems until they refill. Send about 90% of traffic to the primary, 10% to the other just for testing; Squids are easy that way, and we already have the setup... App servers in multiple places... We tried this in Korea, but we only sent a few wikis to that cluster; this time, we want to replicate completely. Data consistency problems.

databases will be replicated as they are now within 1 dc

[more diagrams] (note for later: get the slides of the presentations)

Big question: memcached. It contains objects; how do we deal with this? Only 1 data center? But then the other one would have to act as the host... every request would take 60 ms, so no. A small portion of requests will be slow. We might replace memcached with Membase, which supports replication.
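The trade-off above can be sketched as a toy model (the class and key names here are illustrative, not Wikimedia code): with a replicating store, writes fan out to a cache in every data center and reads stay local, avoiding the ~60 ms inter-DC round trip.

```javascript
// Toy model of a replicating object cache (illustrative only; Membase's
// real replication is server-side, not client fan-out like this).
class ReplicatedCache {
  constructor(dcs) {
    // one in-memory cache per data center
    this.caches = new Map(dcs.map(dc => [dc, new Map()]));
  }
  set(key, value) {
    // fan the write out to every data center's cache
    for (const cache of this.caches.values()) cache.set(key, value);
  }
  get(dc, key) {
    // reads are always local: no inter-DC round trip
    return this.caches.get(dc).get(key);
  }
}

const cache = new ReplicatedCache(['pmtpa', 'eqiad']);
cache.set('objectcache:Page1', '<p>html</p>');
console.log(cache.get('eqiad', 'objectcache:Page1')); // '<p>html</p>'
```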

We will start replicating data pretty much now. We bought a NetApp storage system, so we can do backups & share the deployment system. /home filesystems... can have existing scripts available. Working on that now.

Hay Kranen: Photocommons
PhotoCommons: a WordPress plugin to include images from Wikimedia Commons. It's hard to include images in WordPress, so the plugin facilitates attribution (username, backlink, license, etc.). Demo: a basic WP blog with the standard theme. There's a little Wikimedia Commons logo in the WYSIWYG editor; the plugin uses the API in the back-end. Hay demoes the search for "Berlin": the search textbox gives you suggestions as you type (using the search API). When a file is selected, the plugin adds a WordPress shortcode with the file name & width to the edit window, linking to the original Commons page. In the future: username + license. mw:PhotoCommons

Check it out from Subversion! File bugs in Bugzilla! Q: how do you do suggestions when you search? A: the API! Easy! Q: what to do at the hackathon? A: Use it! Improve it! File bugs! Break it! Port it to other CMSes!
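The suggest-then-insert flow described above can be sketched like this (a hedged guess at the mechanics: the shortcode name and attributes are assumptions for illustration, not necessarily PhotoCommons's exact format): a suggestion request against the Commons API's opensearch endpoint, plus the shortcode string inserted into the post.

```javascript
// Build the URL used for search-as-you-type suggestions against the
// Wikimedia Commons API (action=opensearch is a real API module).
function suggestUrl(term) {
  const params = new URLSearchParams({
    action: 'opensearch',
    search: term,
    namespace: '6', // NS_FILE: only suggest file pages
    format: 'json',
  });
  return 'https://commons.wikimedia.org/w/api.php?' + params;
}

// Build the shortcode placed in the post body (hypothetical format).
function shortcode(fileName, width) {
  return `[photocommons file="${fileName}" width="${width}"]`;
}

console.log(suggestUrl('Berlin'));
console.log(shortcode('Brandenburger Tor abends.jpg', 300));
```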

Ryan Kaldari: WikiLove
First image: a negative-templates-vs-praise chart, https://secure.wikimedia.org/wikipedia/en/wiki/File:Praise_versus_Negative_templates,_English_Wikipedia_2004-2011.png (per 2011/05/02/neweditorwarnings/): new editors are being driven away. In the old days, people might say "your user page is dull" & that was the worst of it; if you look at a brand-new user's talk page now, it's just templates about things they did wrong. Observation: there are lots of tools like Twinkle to leave automated warnings on user talk pages, but no equivalent tool for positive messages. So what can we do that's simple, to fix this? It could be easier to give praise! So -- WikiLove: http://www.mediawiki.org/wiki/WikiLove User:Kaldari/WikiLove Q: maybe users are worse now? A: no, look at this stat: https://secure.wikimedia.org/wikipedia/commons/ Q: so now if a user is vandalizing my user page, are we supposed to praise them? A: no, but look at good first contributions, & longtime contributors who never got a word of thanks. There's a user script you can use on live Wikipedia right now; Kaldari uses it on himself -- demo. On User talk: pages you now see a cute icon, next to History etc. Choose, for example, a kitten! So easy -- no need to look up a template, parameters or syntax. So go out and give people kittens if you have seen them around & never given them anything positive. We're taking this idea & want to make it even more useful to WMF -- we want to use it to collect data about how people are using it: if a certain user gets lots of positive reinforcement & barnstars, does it affect their editing?

Q: I made... people do not thank others because they don't care... barnstars are subjective (unclear question/comment). A: yes, and some wikis don't have barnstars. We are developing an extension to see how people are using it, but also to make it configurable & localisable for each project. On the Russian Wikipedia, people want to give each other meat; the German Wikipedia -- beer?

Demo of the new version -- not totally built out yet. Many types of awards, incl. the original barnstar... a message, such as "good work!", a preview, a button (send WikiLove), & you see the newly revised page! Brandon Harris did the design work, Jan Paul Posma the coding. This extension needs love, so please work on it!

Danese says: people who are downloading & torrenting, cut it out! The guy doing the streaming is on wifi, and we want a good framerate. Tomorrow: wired connection.

Danese reminisces about the different lengths of languages, as you experience when being live-translated: Spanish is long, French is quick and short, and if you tell a joke, you can hear the punchline arrive at different times around the room.

Who came here from furthest away? Tim Starling wins again (Sydney).

Emmanuel Engelhart (Kelson): Kiwix & offline
(3:04pm local)

http://www.kiwix.org/index.php/Main_Page Slides: http://www.kiwix.org/images/a/a9/Hackathon_2011.odp

Kiwix software: Kiwix, software for the desktop; kiwix-serve, an HTTP server offering similar features; a DVD launcher, installers, etc; a set of scripts to select, mirror, & build ZIM files; the biggest library of ZIM files. The focus is on software, but we also have partnerships for distributing content.

co-founder of openZIM

Kiwix reads ZIM -- openZIM is to define ???? & provide a reference implementation. openZIM is a big thing & essential! Founded a few years ago by 2 German guys, now financed by WMCH.

Kiwix mostly takes the best from other projects, e.g. Mozilla's Gecko, and Xapian for full-text search ((C)Lucene doesn't work). Glue code in JavaScript & C++.

(slide explains how we get from an XML dump to a ZIM file)

WMF UX improvement effort: improve search (full-text & in-place), introduce a content manager, improve some other points such as localisation.

New content manager: more than an HTTP client; upload/download based on Metalink (FTP, HTTP, BitTorrent); MirrorBrain.

(look at the slides, he didn't get to finish them)

Q: Danese got a request from the director of the GNOME Foundation, who wants offline Spanish Wikipedia on some computers -- directions on making your own Wikipedia extract in language X? A: not so easy right now. We have a script... there's a ZIM file in Spanish that's not so old, right now.

Tomorrow there will be work on the offline reader. Any non-technical folks in Berlin who want to help out with the user testing should show up 4-6 pm Saturday at the Betahaus & find Sumana Harihareswara or Ryan Kaldari or Nicole Ebber

Purodha: Narayam
Keyboard mapping for Indic languages: mw:Extension:Narayam. There are about a dozen languages that the scripts support. Special JavaScripts watch the keyboard buffer and run regexps on it. This is a new extension and we need to think about design decisions. Purodha is interested in getting this stuff going and needs testers -- hackathon participants, help!
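The regexp-on-the-keyboard-buffer idea can be sketched as follows (a toy rule set for illustration, not Narayam's actual transliteration scheme): each keystroke appends to the buffer, and the first rule whose pattern matches the tail rewrites it.

```javascript
// Toy transliteration rules, longest patterns first so "kha" is not
// consumed by the shorter "a" rule. (Illustrative only.)
const rules = [
  [/kha$/, 'ख'],
  [/ka$/, 'क'],
  [/ga$/, 'ग'],
  [/a$/, 'अ'],
];

// Rewrite the end of the buffer with the first matching rule.
function applyRules(buffer) {
  for (const [pattern, replacement] of rules) {
    if (pattern.test(buffer)) return buffer.replace(pattern, replacement);
  }
  return buffer;
}

// Simulate typing "kha" one key at a time, as the keyboard watcher would:
let buf = '';
for (const key of 'kha') buf = applyRules(buf + key);
console.log(buf); // 'ख'
```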

(talks end 3:25pm local)

Then: hacking
Some folks are already hacking & bugwhacking. Mark Hershberger, Bugmeister, is leading the bugsmashing: track what's been closed, keep the number written up, be the whip!

 * 10201 FIXED
 * 24261 WORKSFORME
 * 28634 FIXED, NEEDS TESTING
 * 15461 FIXED, needs testing on IE8 and IE9

HipHop discussion


Possible Varnish deployment strategy

1.18 first, adding Varnish at the same time as HipHop. We have more ops people now. Ops project (Varnish), deployment project (compiling, etc.), and the MediaWiki side.

domas: deployment of a new feature: you need to support the previous and the new version in the same binary, and then do a soft switch when we deploy, in order to be able to revert; it also helps identify what broke.

robla: timeline? We want to start now. RobLa: do we know enough to plan and have a timeline? CT: milestones?

domas: our goal with HipHop: not to reduce the number of servers, but to improve the user experience, because it sucks (general agreement)

robla: compilation infrastructure -- not dedicated hardware, but some compilation platform. Ryan Lane: maybe we can use the virtualization cluster for this, so we don't buy hardware for every specific task. [AMEN BROTHER]

Chad: a bunch of people are in a corner talking about HipHop -- but guillom says they're almost done... RobLa has told me that he'd like me to get help packaging HipHop for different distributions (RHEL, Debian, etc).
Chad: Well, for Debian, it works almost out-of-the-box except for 2 custom package builds.
Chad: I'd really like to push the HipHop devs to push those changes upstream to their respective developers, so those customizations can disappear.
Chad: Especially with libcurl, the changes are *trivial*.
Chad: "On the HipHop note, I was going to repurpose project2 (our BZ test host) as a HipHop test box."
Guillaume: robla and priyanka say project2 has already been repurposed (for many things); right now it's apparently an analytics box used by nimish.
 * (Chad) Can we get a new box then? It doesn't even have to be new--I'd just like us to have a dedicated host to test on.
 * [can we get a nice .deb that can be installed on Ubuntu boxes? Can have that for both devs & Wikimedia deployment]

HipHop is a superset of Zend, so Zend support should be easy.

Milestones (by tim)
 * 1.17
 * 1.18
 * working build environment to build the MW core and get it to work
 * then extensions that we're using on Wikimedia (~between 80 and 100)

tim: build script: a list of files, an autoloader; a build script for extensions.

domas: MW as we run it has many conditional includes; all the extensions have to be autoloaded. You don't want a script to be included if it has side-effects.
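The point about replacing conditional includes can be illustrated with a sketch of what an autoloader build step might produce (hypothetical code, not Tim's actual build script): a flat class-to-file map, with duplicate declarations rejected loudly instead of being resolved by include order.

```javascript
// Build a class -> file map from each file's declared classes,
// failing on duplicates (the kind of error a linter should catch).
function buildAutoloadMap(declarations) {
  const map = new Map();
  for (const [file, classes] of Object.entries(declarations)) {
    for (const cls of classes) {
      if (map.has(cls)) {
        throw new Error(`class ${cls} declared in both ${map.get(cls)} and ${file}`);
      }
      map.set(cls, file);
    }
  }
  return map;
}

const map = buildAutoloadMap({
  'includes/Title.php': ['Title'],
  'extensions/WikiLove/WikiLove.hooks.php': ['WikiLoveHooks'],
});
console.log(map.get('WikiLoveHooks')); // 'extensions/WikiLove/WikiLove.hooks.php'
```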

domas: in order to do HipHop, we need organizational support (meaning consensus throughout engineering)

-> fixing the standard extension template layouts, making sure that core & extensions remain consistent. A linter to check for common errors on commit might be nice; at least a regular scan that whinges would be very useful to avoid regressions that might get missed by humans.

See about running a regular HipHop test run to warn?

[UtfNormal needs a damn fix ;)]

Future HipHop stuff can make use of HipHop-specific features as extensions to MediaWiki's standard behavior:
 * parallel execution
 * post-send execution
 * etc

HipHop: Sumana will target Fedora and Ubuntu for recruiting packagers, maybe CentOS (RPMs anyway). (Sumana to help get contributors to package HipHop for different *n*x distros, to make it easier to work with.)

Multi-data center serving & squid conversations
[via Mark] there's a fix coming in for Squid to keep serving old pages if horrible errors happen -- this is usually what we want and will make some types of temporary errors and outages less disruptive. Yay!

Varnish.... what's the status?

[Domas recommendation] Easiest thing we can do now is to prep for quick failover: Then when it's time to fail over, the core cached data (parser cache, front-end HTTP caches) are already filled and ready to go. Sessions currently are in the main memcached which there's no good way to replicate, but we could probably switch them to DB or something else.
 * front-end caches maintain local HTTP caches
 * replicate databases
 * replicate parser cache <- high-scale-ready DB-backed parser cache is ready to go in 1.17 (check w/ Tim for the setting that should be used before deployment)
 * either drop sessions, or replicate them too

[Consider security issues w/ database replication of sessions; default is to store them in DB but we haven't really thought about them in that use for production in a long time.]

[Tim, domas, mark] various ideas about memcached & parser cache size, eviction strategy <- someone fill in details please

Mark interjected the requirement that we need to rework how we do session handling. Currently, we rely on PHP built-in sessions, which in turn rely on Memcached.
 * target 1.19
 * what actually needs changing?

Possible new timeline:
 * 1.18 -- minimal Het Deploy -- July 2011
 * 1.19 -- new session handling code -- October 2011
 * 1.20 -- HipHop deployment -- January 2012


 * New side project: Continuous external storage recompression

Discussion about XMLDumps and replication: (Tim, Mark, Domas, Rob, Ashar F, CT, Danese)

Tim suggests that we rethink our XML dump strategy to make it possible to have replication done by anyone.

localtime 5:31 -- some discussions & work happening: git's suitability; PHP performance; localization tools; extensionizing WikiLove.

Discussion of post- vs pre-commit review. We don't have any control over the pace at which patches come in! We seem hostile if someone commits a rewrite, vs ... branch... any sort of action. If we keep the review backlog small all the time, we can ... branch & release, and get it out quicker.

mdale, neil, the Firefogg dude -- chunked uploading protocol improvement plans, based on one of Google's protocols. Sounds pretty good; mdale is handling the server implementation (?). We think this avoids the issues with file concatenation that worried Tim about its predecessor, and Neil's got a good idea of how to do it on a non-filesystem file store (save the chunks as individual files in a private temporary store; when complete, merge them together into a single object in the public file store).

[This should work for any file uploads from FileAPI-enabled browsers as well as for specialized stuff like FireFogg. UploadWizard could start uploading your files immediately after you've selected them, and reconnect if your net goes out! --brion]
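Neil's merge step described above might look roughly like this (the class and method names are made up for illustration): chunks land in a private temporary store in any order, and are concatenated into a single object only once all of them are present.

```javascript
// Sketch of chunked-upload assembly on a non-filesystem store.
class ChunkedUpload {
  constructor(totalChunks) {
    this.totalChunks = totalChunks;
    this.chunks = new Map(); // index -> Buffer: the "temporary store"
  }
  addChunk(index, data) {
    // out-of-order and retried chunks are fine: last write wins
    this.chunks.set(index, Buffer.from(data));
  }
  isComplete() {
    return this.chunks.size === this.totalChunks;
  }
  assemble() {
    if (!this.isComplete()) throw new Error('upload incomplete');
    // concatenate chunks in index order into one object for the public store
    const ordered = [...Array(this.totalChunks).keys()].map(i => this.chunks.get(i));
    return Buffer.concat(ordered);
  }
}

const up = new ChunkedUpload(3);
up.addChunk(2, 'gg]');   // out-of-order arrival
up.addChunk(0, '[theo');
up.addChunk(1, 'ra:o');
console.log(up.assemble().toString()); // '[theora:ogg]'
```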

Added sausages to WikiLove; smashed 7+ bugs; JavaScript error.

started around 6:40pm local

Ryan Lane's presentation: testing
Community- oriented test and development

Improve staff/volunteer collaboration; goal = an environment where people can work together.

< this presentation has slides > -- http://wikitech.wikimedia.org/index.php?title=File:Community_Oriented_Test_and_Development.pdf&page=1

People will be able to have more permissions -- do more stuff without being root -- including developers & volunteers.

OpenStack -- virtualization software, like Amazon EC2: make virtual machines, add IP addresses & DNS entries, add SSH keys, log in... one project = one security zone.

A clone of our production environment; everyone with shell access is in one particular project; test core changes, extensions.

Our Puppet config will be checked into a git repository. Developers can experiment in their testing environment, then make merge requests.

(Ryan sings a song about how this could work, spinning a tale of DevOps paradise)

(rainbows & unicorns too?) invisible pink rainbows?

[here is a diagram] (ASCII art?) puppet is represented by a star because it is magic

Ryan Lane's MediaWiki extension: mw:Extension:OpenStackManager

Q: more people get root access, multiple instances of prod clones? So it'll break relatively often? A: you have shell access to the projects you're working on. You can use projects for more than building new architectures -- security groups inside a default project.

Deployment mirrors -- we should have these... what's the expectation for keeping this up?

[not clear on what kind of access is being discussed here]

Hashar: PHPUnit
< this presentation has slides > -- http://noc.wikimedia.org/~hashar/pres/mwtesting/

Introducing PHPUnit. Install it from PEAR; don't install from distros, they have old versions.

For MediaWiki: phase3/tests/phpunit. Everything is written in PHP; start the tests with "make". Tests are run against your live database -- (Chad: We need to fix this.)

FAILURES!!!!11!!

There's a coverage system to find code not covered by the tests.

~70% of MediaWiki is executed during the current tests; for includes/: ~30%.

"And you will end up in CR wall of shame."

"For those not sleeping yet..."

Colored lines of code let you see the coverage.

Tim Alder (Kolossos): OSM Integration
< the audience roars with hunger >

Slides: File:Fossgis2011-WP-GEO.pdf (starting at page 12)

Postgres, PostGIS, hstore, Mapnik

client side: mostly with OpenLayers

Wikipedians like that we have multilingual maps

Live, transparent trials -- can also be used on other maps... a map of surveillance cameras, maps of smoking & no-smoking restaurants... different styles.

some toolserver URLs

We want integration inside Wikipedia. Do we have an OSM map inside...? Yes: w:de:Hilfe:OpenStreetMap/en

In the German Wikipedia, to the right of the coordinates, there's a link to "map" or "Karte"; if you click it, you get the map in the article, & in the map, Wikipedia objects are linked!

Choose your language in the map... on the right, options: background, overlay.

Translation is done via translatewiki; maps are activated for all users on half a dozen wikis; nearly 10% of the traffic of the OSM main server.

In the future, we want to show more than just points! We want to show things like rivers, & details... We could make a query to OSM, but we can also go a different way, perhaps better.

(diagram)

Give objects in OSM a Wikipedia tag....

interwiki names use interwiki links from the Wikipedia articles

lines! complex areas! point clouds!

So we need more OSM objects that have Wikipedia tags: right now there are about 100K... it should be more like a million -- we have 1M+ coordinates in WP.

A tool to connect objects in OSM with WP articles (?): http://wiki.openstreetmap.org/wiki/JOSM/Plugins/RemoteControl/Add-tags -- so we have an add-tags tool.
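Connecting OSM objects to articles hinges on reading the wikipedia tag; here is a minimal sketch following the common `wikipedia=lang:Title` convention (a simplification -- real tagging also has legacy variants this ignores):

```javascript
// Parse an OSM wikipedia tag value like "de:Brandenburger Tor" into a
// language + title pair plus the article URL.
function parseWikipediaTag(value) {
  const m = /^([a-z-]+):(.+)$/.exec(value);
  if (!m) return null;
  const [, lang, title] = m;
  return {
    lang,
    title,
    // Wikipedia article URLs use underscores instead of spaces
    url: `https://${lang}.wikipedia.org/wiki/${encodeURIComponent(title.replace(/ /g, '_'))}`,
  };
}

console.log(parseWikipediaTag('de:Brandenburger Tor'));
// { lang: 'de', title: 'Brandenburger Tor',
//   url: 'https://de.wikipedia.org/wiki/Brandenburger_Tor' }
```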

Patrick: Mobile portal (mobile site rewrite)
< this presentation has very, very wordy slides > Current implementation: in Ruby. Patrick is rewriting the gateway in PHP as a MediaWiki extension. Patrick just had Ed Tufte kill a kitten: http://markandrewgoetz.com/blog/index.php/2009/11/my-new-wallpaper/

Rewrite rationale: the ecology is PHP (Sumana's summary).

PatchOutputMobile -- Patrick's MW extension

What's next? Proper device detection with WURFL & testing on various mobile devices -- the latter once Patrick gets further along! He wants people testing everywhere and will want lots of help. Q from Mark Hershberger: why not a skin instead of an extension? (per the wikitech-l question) A: mobile output... ended up using skins a little; looked at Wikia's approach, but it seemed like it would give him less flexibility wrt the DOM. Will look again & see. Q: We are depending on the DOM. libxml... he suppresses warnings and thinks it is fine. (Discussion of the fact that we put our HTML through Tidy, so we do not have to worry about warnings, parsing bad HTML, etc. [libxml2's HTML parsing mode works very well for well-formed code, and its extra warnings are easily suppressed. I've seen horrible parsing failure modes with unterminated list tags etc. -- but this never happens in MW's output, so it's not an issue. --brion])

jQuery

Now a demo of what he has so far!

examples in Japanese, Wiktionary

Q. What about Arabic? RTL? A: Thank you for listening to this presentation!

Timo Tijhof a.k.a. Krinkle
He has been working on an automated JavaScript testing tool for MediaWiki using TestSwarm. QUnit = a JS testing suite, like Selenium / PHPUnit. I <3 that the computer's interface is in Dutch.

http://toolserver.org/~krinkle/testswarm

SWARM!! COOOOL

QUnit

"This is so cool"

About using QUnit, a test suite written by the same authors who made jQuery (jQuery being the JavaScript framework we use). The goal eventually is to plug it into TestSwarm, which is a way to distribute testing to thousands of users, all with different OSes / browsers :) guillom & I are too amazed at the awesomeness of Swarm to take notes in Etherpad! hehe It has colors! And animations! And a countdown!
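For flavor, here is the shape of a QUnit-style test (with a minimal stand-in `test`/`equal` shim defined inline so the sketch runs without the library; the helper being tested is hypothetical, not an actual MediaWiki test):

```javascript
// Minimal shim mimicking QUnit's test(name, fn) + equal() shape.
const results = [];
function test(name, fn) {
  const assert = {
    equal(actual, expected) {
      results.push({ name, ok: actual === expected, actual, expected });
    },
  };
  fn(assert);
}

// A hypothetical helper of the kind MediaWiki's JS tests would cover:
function normalizeTitle(s) {
  return s.trim().replace(/ /g, '_');
}

test('normalizeTitle', assert => {
  assert.equal(normalizeTitle(' Main Page '), 'Main_Page');
  assert.equal(normalizeTitle('QUnit'), 'QUnit');
});

console.log(results.every(r => r.ok)); // true
```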

This is just awesome.

a browser can come back and catch up on tests that it missed

I think Timo wins the applaudimeter test today.

Do we still need Selenium Grid? UNANSWERED!! > NO!

Logistics
Wifi will improve this weekend -- the other Betahaus people are not here this weekend.

Mark Hershberger will monitor & possess the key

we have a small sound system .... the arena is a nice place to hang out

The Betahaus will have the borrowed Apple dongles through the weekend, near the projector.

be back by 11am Sat for t-shirts!

Recap of Friday
 * added sausages in WikiLove (Ryan) & extensionized it
 * smashed bugs
 * JavaScript lazy load fix
 * DC planning
 * HipHop planning
 * translate extension progress
 * git planning
 * release plan for MediaWiki
 * Indic languages localization progress

Organizational stuff by Daniel Kinzler. Tonight: no closing time, but someone needs to be responsible for the key (Mark Hershberger aka hexmode). No definite dinner plans.

Saturday: doors will open around 10am, sessions will start at 11am. We'll have t-shirts; arrive before 11am to get one.

Have a great day guys!