Berlin Hackathon 2011/Notes/Friday
MediaWiki hackathon 2011, organized by Wikimedia Deutschland
- mwhack11 on Twitter/Identi.ca
- mwhack11 IRC channel
Berlin Hackathon 2011
- blog post: WMFBlog:2011/05/13/wikimedia-tech-crowd-and-mediawiki-developers-gathering-in-berlin/
- live video streaming: http://www.livestream.com/wikimediaberlin
- bugs to smash: https://bugzilla.wikimedia.org/buglist.cgi?keywords=bugsmash
Friday, 13 May 2011, first day
2:15pm -- Pavel Richter gives the introduction & thanks
2:17 -- Daniel K's notes:
- not enough chairs
- ladies' toilets 1 floor below
- elevator (lift) broken. Close doors after you use it! (once it works again)
- tomorrow & Sunday the cafe downstairs is closed -- we must enter through the courtyard; there will be signs
- we have a cloakroom in the first room to your left (coming through the hallway?)
- If you don't want your photo taken & published, put a red sticker on your nametag, & don't stand in front of a camera
- we have 24-hr access to the venue, but someone needs to volunteer to take responsibility for the keys
Wiki Loves Monuments folks leave
2:20 - Danese notes
Who's been to an unconference before? ... & the DC hackathon? We'll run it like that.
Danese notes who wants to talk. All the parser talks will be tomorrow (Saturday), along with visual editing [Parser stuff on Saturday because Wikia's Poland team won't be in until then.]
Friday talks
(summarizations of presentations are on the bottom of this pad.)
Mark Bergsma -- New datacenter (presented) + IPv6 (not yet presented)
Saturday talks
Domas -- High performance trolling
Tim Starling -- HipHop
Brandon -- Identity
Parser talks for Saturday: Hannes, Trevor (with editor), Jan Paul (Inline editor), Brion, NeilK, Andreas, Wikia, Achim, Purodha (see note above regarding wikia folks)
Planned sprints
- New data center (Mark Bergsma)
- bugsmash (Mark H. + Sumana) -- https://bugzilla.wikimedia.org/buglist.cgi?title=Special%3ASearch&quicksearch=bugsmash&list_id=9974
- CiviCRM audit tool (Arthur)
- HTTPS (Ryan Lane)
- WikiLove (Ryan Kaldari) - done some sprinting already!
- Custom upload wizard for WLM + API (Hay)
- Section edit link productization (Trevor)
- MirrorBrain content repository (Tomasz)
"Write something nice" whiteboard on the wall near the registration desk: bug count.
Ok, can we start any sprints TODAY, NOW? Mark, after his talk.
(need more whiteboards!)
Danese declares us ready to get started. She'll be running a timer to ensure talks are only 10 minutes.
Mark Bergsma: The new datacenter
We have a 2nd primary data center now! Ashburn, Virginia.
Racked over 50 pallets of servers. Each pallet consisted of anywhere from 6 to 14 servers, depending on the weight of the servers. Over 375 pieces of equipment were installed. (applause for the ops team)
Not online yet -- awaiting connectivity between the old & new datacenters; delays. Will happen next week or the week after.
[diagram!] pmtpa (Tampa, Florida) - old; eqiad (Virginia) - new.
Next week: 1 link up, 1 router in each DC, 10-gigabit; they can route traffic together as 1 network.
We'll have Squids serve traffic for both; a user will get traffic from the closest DC. Cache as much as possible locally...
At the start, we will not have both data centers active. Primary & secondary will swap all the time -- do a swap every few months... When we switch over, it's a big push: caches will be completely empty, causing problems until refilled. Send about 90% of traffic to the primary, 10% to the other just for testing; Squids are easy that way, we already have the setup... app servers in multiple places... We tried this in Korea, but we only sent a few wikis to that cluster; this time we want to replicate completely. Data consistency problems.
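The 90/10 test split mentioned above can be done deterministically, e.g. by hashing the client IP so a given client consistently lands in the same data center. A minimal Python sketch of the idea -- the function names and the CRC32 hash choice are illustrative assumptions, not the actual Squid setup:

```python
import zlib

def pick_datacenter(client_ip, test_fraction=0.10):
    """Deterministically route ~test_fraction of clients to the secondary DC.

    Hashing the IP (rather than picking randomly per request) keeps each
    client pinned to one DC, which avoids cache/session ping-pong.
    """
    bucket = zlib.crc32(client_ip.encode()) % 100
    return "eqiad-secondary" if bucket < test_fraction * 100 else "pmtpa-primary"

# Rough check that the split is near the requested fraction:
ips = ["10.0.%d.%d" % (a, b) for a in range(10) for b in range(100)]
share = sum(pick_datacenter(ip) == "eqiad-secondary" for ip in ips) / len(ips)
print("secondary share: %.2f" % share)
```

Because the choice is a pure function of the IP, repeating a request always yields the same data center.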
databases will be replicated as they are now within 1 dc
[more diagrams] (note for later: get the slides of the presentations)
big question: memcache. contains objects. how to deal with this?
Only 1 data center for memcached? But then the other one would have to act as host... every request would take 60 ms, so no. A small portion of requests will be slow. Might replace memcached with Membase, which supports replication.
We will start replicating data pretty much now. We bought a NetApp storage system, so we can do backups & share the deployment system (home/fs)... existing scripts can stay available. Working on that now.
Hay Kranen: Photocommons
PhotoCommons: a WordPress plugin to include images from Wikimedia Commons.
- It's hard to include images in WordPress; the plugin facilitates attribution (username, backlink, license, etc.)
- Demo: basic WP blog with the standard theme; there's a little Wikimedia Commons logo in the wysiwyg editor. Uses the API in the back-end.
- Hay demoes a search for "Berlin"; the search textbox gives you suggestions as you type (using the search API).
- When an image is selected, the plugin adds a WordPress shortcode to the edit window with the file name & width, and links to the original Commons page.
- In the future: username + license.
mw:PhotoCommons
Check it out from Subversion! File bugs in Bugzilla! Question: how do you do suggestions when you search? Answer: the API! Easy! Q: what to do at the hackathon? A: Use it! Improve it! File bugs! Break it! Port it to other CMSes!
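Search-as-you-type suggestions like these can come from the standard MediaWiki `opensearch` API module. A small Python sketch of building such a request against Commons (the endpoint and parameters follow the documented api.php interface; this is an illustration, not PhotoCommons' actual code):

```python
from urllib.parse import urlencode

COMMONS_API = "https://commons.wikimedia.org/w/api.php"

def suggestion_url(prefix, limit=10):
    """Build an opensearch request URL for type-ahead suggestions."""
    params = {
        "action": "opensearch",  # standard MediaWiki suggestion module
        "search": prefix,
        "limit": limit,
        "format": "json",
    }
    return COMMONS_API + "?" + urlencode(params)

print(suggestion_url("Berlin"))
```

A client would fetch this URL on each keystroke and render the returned title list under the textbox.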
Ryan Kaldari: WikiLove
First image: negative templates vs praise chart https://secure.wikimedia.org/wikipedia/en/wiki/File:Praise_versus_Negative_templates,_English_Wikipedia_2004-2011.png
Per WMFBlog:2011/05/02/neweditorwarnings/ new editors are being driven away. In the old days, people might say "your user page is dull" & that was the worst of it. If you look at a brand-new user's talk page, it's just templates about things they did wrong.
Observation: lots of tools like Twinkle to leave automated warnings on user talk pages, but no equivalent tool for positive messages. So what can we do that's simple, to fix this? It could be easier to give praise! So - WikiLove: http://www.mediawiki.org/wiki/WikiLove User:Kaldari/WikiLove
Q: maybe users are worse now? A: no, look at this stat. https://secure.wikimedia.org/wikipedia/commons/
Q: so now if a user is vandalizing my user page, are we supposed to praise them? A: no, but look at good first contributions, & longtime contributors who never got a word of thanks.
There's a userscript you can use on live Wikipedia right now. Kaldari uses it on himself - demo. On User talk pages you now see a cute icon, next to History etc. Choose, for example, a kitten! So easy -- didn't have to look up a template, parameters or syntax. So go out and give people kittens if you have seen them & never given them anything positive.
We're taking this idea & want to make it even more useful to WMF - we want to use it to collect data about how people are using it. If a certain user gets lots of positive reinforcement & barnstars, does it affect their editing?
Q: I made... people do not thank others because they don't care... barnstars are subjective. (unclear question/comment) A: yes, and some wikis don't have barnstars. We are developing an extension to see how people are using it, but also to make it configurable & localisable to each project. Russian Wikipedia: people want to give each other meat. German Wikipedia -- beer?
Demo of new version -- not totally built out yet. Many types of awards, incl. Barnstar (the original)... add a message, such as "good work!", get a preview, hit the button (send WikiLove), & you see the newly revised page!
Brandon Harris did design work, Jan Paul Posma did coding. This extension needs love, so please work on it!
Danese says -- people who are downloading & torrenting, cut it out! Guy doing streaming is on wifi, we want a good framerate. Tomorrow: wired connection.
Danese reminisces about the different lengths of languages, as you experience when being live-translated... Spanish is long, French is quick and short, and if you tell a joke, you can hear the punchline arrive at different times around the room.
Who came here from furthest away? Tim Starling wins again (Sydney).
Emmanuel Engelhart (Kelson): Kiwix & offline
Kiwix software:
- Kiwix, software for the desktop
- kiwix-serve, an HTTP server offering similar features
- DVD launcher, installers, etc.
- a set of scripts to select, mirror, & build ZIM files
- the biggest ZIM file library
Focus is on software, but we also have partnerships for distributing content
co-founder of openZIM
Kiwix reads ZIM -- openZIM is to define ???? & provide a reference implementation. openZIM is a big thing & essential! Founded a few years ago by 2 German guys, now financed by WMCH.
Kiwix mostly takes the best from other projects
(slide explains how we get from an XML dump to a ZIM file)
WMF UX improvement effort
- improve search, fulltext & in-place
- introduce a content manager
- improve some other points, such as localisation
New content manager: more than an HTTP client; upload/download based on Metalink (FTP, HTTP, BitTorrent); MirrorBrain
(look at the slides, he didn't get to finish them)
Q: Danese got a request from the director of the GNOME Foundation, who wants offline Spanish Wikipedia on some computers. Directions on making your own Wikipedia extract in language X? A: not so easy right now. We have a script... there's a file in Spanish that's not so old, right now.
Tomorrow there will be work on the offline reader. Any non-technical folks in Berlin who want to help out with the user testing should show up 4-6 pm Saturday at the Betahaus & find Sumana Harihareswara or Ryan Kaldari or Nicole Ebber
Purodha: Narayam
(talks end 3:25pm local)
Then: hacking
Some folks are already hacking & bugwhacking. Mark Hershberger, Bugmeister, to lead bugsmashing. Track what's been closed, keep number written up, be the whip!
HipHop discussion
<no notes taken during this period>
Possible Varnish deployment strategy:
- 1.18 first, adding Varnish at the same time as HipHop
- we have more ops people now
- ops project (Varnish), deployment project (compiling, etc.), MediaWiki side
domas: deployment of a new feature: you need to support the previous and the new version in the same binary, and then do a soft switch when we deploy, in order to be able to revert + it helps identify what broke
robla: timeline? we want to start now. RobLa: do we know enough to plan and have a timeline? CT: milestones?
domas: our goal with HipHop: not to reduce the number of servers, but to improve the user experience, because it sucks (general agreement)
robla: compilation infrastructure -- not dedicated hardware, but some compilation platform. Ryan Lane: maybe we can use the virtualization cluster for this, so we don't buy hardware for every specific task [AMEN BROTHER]
Chad: a bunch of people are in a corner talking about HipHop -- but guillom says they're almost done...
Chad: RobLa has told me that he'd like for me to get help packaging HipHop for different distributions (RHEL, Debian, etc).
Chad: Well, for Debian it works almost out-of-the-box, except for 2 custom package builds.
Chad: I'd really like to push the HipHop devs to push those changes upstream to their respective developers, so those customizations can disappear. Especially with libcurl, the changes are *trivial*.
Chad: "On the HipHop note, I was going to repurpose project2 (our BZ test host) as a HipHop test box."
Guillaume: robla and priyanka say project2 has already been repurposed (for many things); right now it's apparently an analytics box used by nimish.
- (Chad) Can we get a new box then? It doesn't even have to be new--I'd just like us to have a dedicated host to test on.
- [can we get a nice .deb that can be installed on Ubuntu boxes? Can have that for both devs & Wikimedia deployment]
HipHop is a superset of Zend, so Zend support should be easy.
Milestones (by tim)
- working build environment to build the MW core and get it to work
- then extensions that we're using on Wikimedia (~between 80 and 100)
tim: build script: list of files, autoloader build script for extensions
domas: MW as we run it has many conditional includes; all the extensions have to be autoloaded; you don't want a script to be included if it has side-effects
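Tim's build-script idea (generate the autoloader's file list from the sources) could look roughly like the following Python sketch, which scans PHP files for class declarations and emits a classmap. The file layout and the regex are illustrative assumptions, not MediaWiki's actual build tooling:

```python
import re

# Matches `class Foo` / `interface Foo` declarations in PHP source.
DECL_RE = re.compile(r'^\s*(?:abstract\s+|final\s+)?(?:class|interface)\s+(\w+)', re.M)

def build_classmap(sources):
    """Map class name -> file path, given {path: php_source} pairs.

    With such a map, the autoloader includes a file only when one of its
    classes is first used -- no conditional includes, no load-time side effects.
    """
    classmap = {}
    for path, src in sources.items():
        for name in DECL_RE.findall(src):
            classmap[name] = path
    return classmap

sources = {
    "extensions/WikiLove/WikiLove.hooks.php": "<?php\nclass WikiLoveHooks {}\n",
    "extensions/WikiLove/ApiWikiLove.php": "<?php\nfinal class ApiWikiLove {}\n",
}
print(build_classmap(sources))
```

A real build script would walk the extension directories instead of taking strings, but the output shape (class -> file) is the same.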
domas: in order to do HipHop, we need organizational support (meaning consensus throughout engineering)
-> fixing the standard extension template layouts, making sure that core & extensions remain consistent. A linter to check for common errors on commit might be nice; at least a regular scan that whinges would be very useful to avoid regressions that might get missed by humans.
See about running a regular HipHop test run to warn?
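A commit-time linter like the one proposed could start with very simple pattern checks. A hypothetical minimal sketch in Python, flagging the conditional includes domas mentions (the single regex check is an illustration; a real linter would need many more rules and a proper parser):

```python
import re

# Flag `include`/`require` statements nested inside an `if` block --
# these defeat a build-time autoloader (hypothetical, minimal check).
COND_INCLUDE_RE = re.compile(
    r'if\s*\([^)]*\)\s*{[^}]*\b(?:include|require)(?:_once)?\b', re.S)

def lint(src):
    """Return a list of warnings for one PHP source string."""
    warnings = []
    if COND_INCLUDE_RE.search(src):
        warnings.append("conditional include: breaks static autoloading")
    return warnings

good = "<?php require_once 'WikiLove.hooks.php';"
bad = "<?php if ($wgUseWikiLove) { require_once 'WikiLove.hooks.php'; }"
print(lint(good), lint(bad))
```

Run over every file touched by a commit, this kind of scan catches regressions mechanically instead of relying on reviewers to spot them.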
[UtfNormal needs a damn fix ;)]
Future HipHop stuff can make use of HipHop-specific features as extensions to MediaWiki's standard behavior:
- parallel execution
- post-send execution
HipHop: Sumana will target Fedora and Ubuntu for recruiting packagers, maybe CentOS (RPMs anyway) (Sumana to help get contributors to package HipHop for different *n*x distros, to make it easier to work with)
Multi-data center serving & squid conversations
[via Mark] there's a fix coming in for Squid to keep serving old pages if horrible errors happen -- this is usually what we want and will make some types of temporary errors and outages less disruptive. Yay!
Varnish.... what's the status?
[Domas recommendation] Easiest thing we can do now is to prep for quick failover:
- front-end caches maintain local HTTP caches
- replicate databases
- replicate parser cache <- high-scale-ready DB-backed parser cache is ready to go in 1.17 (check w/ Tim for the setting that should be used before deployment)
- either drop sessions, or replicate them too
Then when it's time to fail over, the core cached data (parser cache, front-end HTTP caches) are already filled and ready to go. Sessions currently are in the main memcached which there's no good way to replicate, but we could probably switch them to DB or something else.
[Consider security issues w/ database replication of sessions; default is to store them in DB but we haven't really thought about them in that use for production in a long time.]
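The session point in the list above amounts to putting sessions behind a narrow storage interface, so the memcached backend can be swapped for a replicated DB-backed one without touching callers. A Python sketch of that shape (names and the dict-backed store are illustrative, not MediaWiki's session code):

```python
class DictBackend:
    """Stands in for any key-value backend (memcached, a replicated DB table)."""
    def __init__(self):
        self.data = {}
    def get(self, key):
        return self.data.get(key)
    def set(self, key, value):
        self.data[key] = value

class SessionStore:
    """Sessions behind a minimal interface, so the backend is swappable."""
    def __init__(self, backend):
        self.backend = backend
    def save(self, session_id, payload):
        self.backend.set("session:" + session_id, payload)
    def load(self, session_id):
        return self.backend.get("session:" + session_id)

# Failover prep: point the store at a backend that is replicated to the
# other data center, instead of the unreplicated memcached.
replicated_db = DictBackend()
store = SessionStore(replicated_db)
store.save("abc123", {"user": "Example"})
print(store.load("abc123"))
```

With this indirection in place, "replicate sessions" vs "drop sessions" becomes a choice of backend rather than a code rewrite.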
[Tim, domas, mark] various ideas about memcached & parser cache size, eviction strategy <- someone fill in details please
Mark interjected the requirement that we need to rework how we do session handling. Currently, we rely on PHP built-in sessions, which in turn rely on Memcached.
- target 1.19
- what actually needs changing?
Possible new timeline:
- 1.18 - minimal Het Deploy - July 2011
- 1.19 - new session handling code - October 2011
- 1.20 - HipHop deployment - January 2012
- New side project: Continuous external storage recompression
Discussion about XMLDumps and replication: (Tim, Mark, Domas, Rob, Ashar F, CT, Danese)
Tim suggests that we rethink our XML dump strategy to make it possible to have replication done by anyone.
localtime 5:31 -- some discussions & work happening are:
- git's suitability
- PHP performance
- localization tools
- extensionizing WikiLove
Discussion of post- vs pre-commit review. We don't have any control of the pace at which patches come in! We seem hostile if someone commits a rewrite, vs ... branch... any sort of action. If we have a small review backlog all the time, we can do ... branch & release, and get it out quicker.
mdale, neil, firefogg dude -- chunked uploading protocol improvement plans based on one of Google's protocols. Sounds pretty good; mdale handling the server implementation (?). We think this avoids the issues with file concatenation that worried Tim about its predecessor, and Neil's got a good idea of how to do it on a non-filesystem file store (save the chunks as individual files in a private temporary store; when complete, merge them into a single object in the public file store).
[This should work for any file uploads from FileAPI-enabled browsers as well as for specialized stuff like FireFogg. UploadWizard could start uploading your files immediately after you've selected them, and reconnect if your net goes out! --brion]
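The chunk-assembly idea Neil describes can be sketched in a few lines of Python. In-memory dicts stand in for the private temporary store and the public file store, and all names are illustrative, not the actual UploadWizard API:

```python
def store_chunk(temp_store, upload_id, index, data):
    """Save one uploaded chunk as its own object in the private temp store."""
    temp_store.setdefault(upload_id, {})[index] = data

def finalize_upload(temp_store, public_store, upload_id, filename, total_chunks):
    """Once all chunks have arrived, merge them in order into one public object."""
    chunks = temp_store.get(upload_id, {})
    if len(chunks) != total_chunks:
        raise ValueError("upload incomplete: %d/%d chunks" % (len(chunks), total_chunks))
    public_store[filename] = b"".join(chunks[i] for i in range(total_chunks))
    del temp_store[upload_id]  # reclaim temporary space

temp, public = {}, {}
store_chunk(temp, "u1", 1, b"world")   # chunks may arrive out of order
store_chunk(temp, "u1", 0, b"hello ")
finalize_upload(temp, public, "u1", "Greeting.txt", 2)
print(public["Greeting.txt"])  # b'hello world'
```

Because each chunk is an independent object until finalization, the client can retry or resume any single chunk after a dropped connection, which is exactly the reconnect behavior brion describes.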
started around 6:40pm local
Ryan Lane's presentation: testing
Community-oriented test and development
Improve staff/volunteer collaboration; goal = an environment where people can work together
< this presentation has slides > -- http://wikitech.wikimedia.org/index.php?title=File:Community_Oriented_Test_and_Development.pdf&page=1
people will be able to have more permissions -- do more stuff without being root
including developers & volunteers
OpenStack -- virtualization software, like Amazon EC2. Make virtual machines, add IP addresses, DNS entries, add SSH keys, log in... one project = one security zone
clone of our production environment everyone with shell access in one particular project test core changes, extensions
our puppet config will be checked into a git repository. Developers can experiment in their testing environment, then make merge requests
(Ryan sings a song about how this could work, spinning a tale of DevOps paradise)
(rainbows & unicorns too?) invisible pink rainbows?
[here is a diagram] (ASCII art?)
puppet is represented by a star because it is magic
Ryan Lane's MediaWiki extension: mw:Extension:OpenStackManager
Q: more people get root access? Multiple instances of prod clones? So it'll break relatively often? A: you have shell access to the projects you're working on. You can use projects for more than building new architectures -- security groups inside a default project.
deployment mirrors -- we should have these.... expectation for keeping this up?
[not clear on what kind of access is being discussed here]
Hashar: PHPUnit
< this presentation has slides > -- http://noc.wikimedia.org/~hashar/pres/mwtesting/
Introducing PHPUnit. Install it from PEAR; don't install from distros, they have old versions.
For MediaWiki: phase3/test/phpunit. Everything written in PHP; start tests with "make". Tests are run against your live database -- (Chad: we need to fix this)
coverage system to find code uncovered by the tests
70% executed during current tests for MediaWiki; for includes: ~30%
"And you will end up in CR wall of shame."
"For those not sleeping yet..."
colored lines of code to see the coverage
Tim Alder (Kolossos): OSM Integration
< the audience roars with hunger >
Slides: File:Fossgis2011-WP-GEO.pdf (starting at page 12)
Postgres, PostGIS, hstore, Mapnik
client side: mostly with OpenLayers
Wikipedians like that we have multilingual maps
Live, transparent trials -- can also use on other maps.... map of surveillance cameras, maps of smoking & no-smoking restaurants..... different styles
some toolserver URLs
We want integration inside Wikipedia. We have an OSM map inside....? Yes: w:de:Hilfe:OpenStreetMap/en
in the German wikipedia, on the right side of the coordinates, we have a link to "map" or "Karte," so if you click, in the article you have the map, & in map, wikipedia objects linked!
choose your language in the map.... on the right, options -- background, overlay
translation done via translatewiki; maps activated for all users on half a dozen wikis; nearly 10% of the OSM main server
In the future, we want to show more than just points! We want to show something like rivers, & details.... could make a query to OSM, but we can also go a different way, perhaps better
Give objects in OSM a Wikipedia tag....
interwiki names use interwiki links from the Wikipedia articles
lines! complex areas! point clouds!
So we need more OSM objects that have Wikipedia tags right now, about 100K..... should be more like a million -- we have 1M+ coordinates in WP
tool to connect objects in OSM with WP articles (?) http://wiki.openstreetmap.org/wiki/JOSM/Plugins/RemoteControl/Add-tags so we have an add-tags tool
Patrick: Mobile portal (mobile site rewrite)
< this presentation has very very wordy slides > wikitech:File:Mobile site rewrite.pdf
Current implementation: in Ruby. Patrick is rewriting the gateway in PHP as a MediaWiki extension.
Patrick just had Ed Tufte kill a kitten: http://markandrewgoetz.com/blog/index.php/2009/11/my-new-wallpaper/
Rewrite rationale: the ecology is PHP (Sumana's summary)
PatchOutputMobile -- Patrick's MW extension
What's next? proper device detection with WURFL & testing on various mobile devices.
the latter once Patrick gets further along! wants people testing everywhere, will want lots of help
Q from Mark Hershberger: why not a skin instead of an extension? (per wikitech-l question) A: mobile output... ended up using skins a little; looked at Wikia's approach; seemed like it would give him less flexibility wrt the DOM. Will look again & see.
Q: We are depending on the DOM. libxml... he suppresses warnings and thinks it is fine. (Discussion of the fact that we put our HTML through Tidy, so we do not have to worry about warnings, parsing bad HTML, etc.) [libxml2's HTML parsing mode works very well for well-formed code, and its extra warnings are easily suppressed. I've seen horrible parsing failure modes with unterminated list tags etc. -- but this never happens in MW's output, so it's not an issue. --brion]
Now a demo of what he has so far!
examples in Japanese, Wiktionary
Q. What about Arabic? RTL? A: Thank you for listening to this presentation!
Timo Tijhof a.k.a. Krinkle
Logistics
Wifi will improve this weekend -- other Betahaus people are not here this weekend
Mark Hershberger will monitor & possess the key
we have a small sound system .... the arena is a nice place to hang out
the betahaus will have the borrowed Apple dongles through the weekend, near projector
be back by 11am Sat for t-shirts!
Recap of Friday
Organizational stuff by Daniel Kinzler. Tonight: no closing time, but someone needs to be responsible for the key (Mark Hershberger aka hexmode). No definite dinner plans.
Saturday: doors will open around 10am, sessions will start at 11am. We'll have t-shirts; arrive before 11am to get one.
Have a great day guys!