Berlin Hackathon 2011/Notes/Sunday

From mediawiki.org

Wheee, new pad! Have a nice day

Live feed http://www.livestream.com/wikimediaberlin

Notes from Friday: etherpad:mwhack11Fri
Notes from Saturday: etherpad:mwhack11Sat

Bug smash https://bugzilla.wikimedia.org/buglist.cgi?keywords=bugsmash&resolution=---&list_id=6729


                                               SUNDAY   

Code Sprint

Daniel opens the session at 11:25am

There's a survey sheet in the attendance kit -- please fill it out, it will be collected
Nicole has more copies at the desk

What did we do yesterday?

  • crash the cluster at 4am (not really, it was just to have some fun this morning)
  • bugsmash: 65 bugs fixed so far!
  • HTTPS & IPv6: work still going on
  • usability testing for Kiwix offline Wikipedia (easy to find content via search, a few oddities in Kiwix vs. other browsers with regard to tabs, more analysis to follow)
  • joke bug 28984 (patch is 5.6MB!)
  • implementing global watchlist - partial implementation (people have been asking for it for 10 years) -- thanks, Victor!

A few talks, some short sprinting -- bulk of afternoon to explore town! (or just stay here and code more)

Danese's observations, for use by people

  • tech days tomorrow
  • 12:30pm lunch
  • 1:30 work starts
  • ask Carrie for special meal requests
  • Management group -- meet up at 11am Monday -- Alolita, Tomasz, RobLa, CT, Mark B.

(we also need a list of people who want their picture taken by Guillaume for the staff page)


Wiki Loves Monuments.... API requests -- Neil K took some requirements for uploading interface, not aware of any API requests

Brandon's talk: Identity

http://livestre.am/MkVg < this presentation has slides >

With the WMF yesterday - talked re parser
WMF is focusing on 2 projects - parser, & -1 to 100 Edits
team mostly about editor retention
ER is very sticky prob
no 1 silver bullet
many small things we can do, many of which will be very controversial
big fear: we will turn Wikipedia into Facebook (boos)
many reasons to fear that
but one thing FB does right -- create community
community is comprised of individuals
every person creates their own ID, ID is important
you decide who you are, & you have conversations with others, make connections
more people will join your conversations .... until you have what is actually a community.
Standard way of building, indiv upward: ID + conversation = community.
But WP did it backwards. Focused on community first, not ID

Community is fracturing, some are not coming back. Part of the reason: They don't necessarily feel engaged. People aren't seen as individuals, don't feel like individuals, don't want to get involved.

So we are going to start focusing a little more on that, like allowing you to create ID. want to be able to say "Ah, I know you, you have these interests in common with me, I can invite you to foo project"

WE ARE NOT FACEBOOK and will never be

Diff social networks have diff reasons. Facebook: espec to get laid. [laughter] We can't do that

We have a cycle -- we will use the community to define our identity. You are a Wikipedian, or a Wikimedian, or a person on WMCommons, part of this wikiproject, interested in spires. We will use that, reinforce your identity. "I am part of this community. I am an editor. I am a dev. I am a Wikipedian." That in turn will reinforce community, driving communication, trust

you will see many small things come forward that will be about this type of stuff..... probably won't affect you in the beginning, at first mostly affecting new users, small changes, account creation project (add interest -> structured data -> profiles, which then shows up in watchlists so that new page patrollers don't kill these people).

no big changes at first, but over time: sum total == something completely different.

It's time to achieve Zen acceptance that we will add these things, because in order to save the projects, we're going to actually "kill" the community side.

Q: I have been editing for 6 years, I come to conferences & hackathons & chapters, & I still have an "I vs. them" feeling! so many of us have this, even veterans.
A: veterans will also be affected, not just new users...... discussion re making it easier to get involved in relevant WikiProjects, find relevant users with common interests

way to look at structure of online communities: warrens vs plazas
warrens: small groups of people, hard to find & join, easy to make a big impact once you have joined
plazas are opposite: huge groups, easy to police because you can see anything that goes on, but hard to do any major work
WP/WM projects are almost completely warrens: difficult to find new things to do that aren't immediately obvious to you outside of your immediate interest group.
Easy to think "those people on WikiSource are weird, we at WikiNews are cool"
GOAL: facilitate movement around projects
You write news & WP about spiders, you should write a textbook on spiders!

Q (GerardM): re chapters: possible for people to say they are a member of a chapter?
A: absolutely.
Q (GerardM): translatewiki.net is part of the ecosphere, movement... would you like to have things like translatewiki be part of this community & ID?
A: I don't feel qualified to have an opinion on that.
Q (GerardM): re Facebook: I love .... fan page ... originally, outside WMF because separate entities, chapters....
A: implementation....... I am a member of the following groups? [inaud]
Q (GerardM): Babel templates.... I am a member of FSF.... makes sense to have that as well
A: that is where we come to the fine line between us & FB.... I am a member of this chapter, or translatewiki, ok. But "I work for [job]" is too far -- we are not LinkedIn. A million diff associations..... we shall have to have this discussion
Q (GerardM): Is this community thing Wikimedia, or movements? CC, etc. is part of what we do?
A: we will start with simple "tell us 5 things you are interested in" and later get into associations


Q (Victor): Some will be controversial, example "I am part of anticommunist movement" .....
A: ex. republican/democrat. There will always be drama around diff people being part of opposed orgs. It will be controversial (for indiv users).... we will help you structure that, not just a template? (unclear)


Domas

http://livestre.am/MkXv
Domas works for Facebook.

Here to talk about performance
have worked on making WP sites a bit faster for the past few years.
Overview - concerns.

Generally: spending money building data centers we don't have .... to build servers?

all the web boxes that have mediawiki ...

40% -> 80% on busy days
might look like we have headroom, but now perf problems -- mult parts
what if someone edits a template used on every page on website
might send cluster into death spiral

more important for retention: what experience does a logged in user get?
sluggish & nonresponsive: bad
I remember when an average thing on the website: 10 seconds. Then we got to ... 1/2 second (fantastic)
eventually down again, gotten to ... page needs reparsing, other things happening, other process... viewing pages for diff settings... painful exp for user
back in the day, 3 second response times or 0.5 seconds
faster, more resources, more users come in!
industry belief: if user comes to site & it is slow they will not stay
if fast, they'll stay

#1 frustration -- if you hit a parser cache miss on a heavily edited page

20 or 30 seconds first few .... browser to another window

parser cache work can solve this for viewers
but if an editor has to wait 30 seconds to see if edit went through, very expensive

prob has mult components
major component: 80% of parse, 90% is parsing references & citations
ref block at bottom is 50%+ of parsing workload
one thing .... optimization, extensions, better complaining layers
more efficient languages is another way

MediaWiki is becoming complex software. gender caches, user pref mgmt, so many forms

bulk load of that..... work is loading datability????
baseline is extensive
30 milliseconds to send a basic page to user
squid, just a few ms

a very easy way to decrease pageload & editing times: a better runtime like HipHop
but we have had transition in the MediaWiki code to the fact that heavy PHP code is no longer....
mult optimizations, autoloader gone, etc made PHP codebase faster

we have to look at what is the slow part in the code? things like .... how do you make efficient software?

don't do what you don't have to do
don't do Esperanto l10n on a random extension if you're on an en.wp page
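
The "don't do what you don't have to do" rule can be sketched as lazy message loading. This is not MediaWiki's actual localisation code: `MessageStore`, `load_messages`, and the sample catalogues are hypothetical names made up for illustration. The only point is that a request for an English page never pays for loading Esperanto messages.

```python
from typing import Callable, Dict

# Hypothetical message catalogues; a real system would read these from disk.
CATALOGUES: Dict[str, Dict[str, str]] = {
    "en": {"edit": "Edit", "history": "View history"},
    "eo": {"edit": "Redakti", "history": "Vidi historion"},
}


class MessageStore:
    """Loads a language's messages only the first time that language is used."""

    def __init__(self, loader: Callable[[str], Dict[str, str]]) -> None:
        self._loader = loader
        self._loaded: Dict[str, Dict[str, str]] = {}
        self.load_count = 0  # how many catalogues were actually loaded

    def get(self, lang: str, key: str) -> str:
        if lang not in self._loaded:
            # Pay the loading cost only for languages this request needs.
            self._loaded[lang] = self._loader(lang)
            self.load_count += 1
        return self._loaded[lang][key]


def load_messages(lang: str) -> Dict[str, str]:
    """Stand-in for an expensive load (file parsing, cache lookup, ...)."""
    return dict(CATALOGUES[lang])


store = MessageStore(load_messages)
print(store.get("en", "edit"))
print(store.get("en", "history"))
print(store.load_count)  # only the English catalogue has been loaded so far
```

The second lookup reuses the already loaded English catalogue, so `load_count` stays at one until some request actually asks for another language.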

feature rot

RobLa: how did we arrive at identifying references as a big performance hog?
forceprofile=true, action=purge: you see what gets executed

We have a profiler where we can see that references formatting is a few dozen percent
Profiling data: http://noc.wikimedia.org/cgi-bin/report.py?db=all

Domas & Tim looked
logic is extremely complex
one of the major costs going forward is the cost of templates
identify expensive templates
guessing game
there's no way right now to know which template is causing the most load. No tools.

in DB optimization, our DB is in a reasonable shape because we try not to do stupid things
e.g. in RC it's reasonable to fetch 500 lines
every feature that needs to fetch millions of lines to show something is expensive
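
The difference between "fetch 500 lines" and "fetch millions of lines" can be sketched with an in-memory SQLite table. The table `recentchanges_demo`, its columns, and the row count are assumptions invented for this example; they are not the real MediaWiki schema or Wikimedia's database setup.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE recentchanges_demo (id INTEGER PRIMARY KEY, ts INTEGER, title TEXT)"
)
conn.execute("CREATE INDEX rc_demo_ts ON recentchanges_demo (ts)")
conn.executemany(
    "INSERT INTO recentchanges_demo (ts, title) VALUES (?, ?)",
    [(i, f"Page {i}") for i in range(10_000)],
)

# Bounded: with an index on ts, the database can stop after 500 rows.
latest = conn.execute(
    "SELECT ts, title FROM recentchanges_demo ORDER BY ts DESC LIMIT 500"
).fetchall()
print(len(latest), latest[0])

# Unbounded: a report-style query that has to look at every row to answer.
total = conn.execute(
    "SELECT COUNT(*) FROM recentchanges_demo WHERE title LIKE '%7'"
).fetchone()[0]
print(total)
```

The first query does the same amount of work however large the table grows; the second grows with the table, which is the kind of feature the talk calls expensive.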

"show me the last changes" is a complex issue
every reporting feature, e.g. show user contributions by namespace, requires more data, is more expensive

our data access is usually so efficient that new features add more complexity and get more expensive

what DB performance optimization has to be done


HipHop = hackathon project at Facebook
prototype in a few days, then worked 6 months on it, then he got a team
then it was deployed little by little, first 10% of requests, then 50%, etc.
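
A percentage rollout like the one described (10% of requests, then 50%, and so on) is commonly done by hashing a stable key into a bucket from 0 to 99. The sketch below is an assumption about how such a split could look, not a description of Facebook's actual deployment; `bucket`, `use_new_runtime`, and the request keys are hypothetical.

```python
import hashlib


def bucket(request_key: str) -> int:
    """Map a key to a stable bucket in the range 0..99."""
    digest = hashlib.sha256(request_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_runtime(request_key: str, rollout_percent: int) -> bool:
    """Return True if this request falls inside the rollout percentage."""
    return bucket(request_key) < rollout_percent


keys = [f"request-{i}" for i in range(10_000)]
for percent in (10, 50, 100):
    share = sum(use_new_runtime(k, percent) for k in keys) / len(keys)
    print(f"target {percent}% -> observed {share:.1%}")
```

Because the bucket is derived from the key alone, anything routed to the new runtime at 10% stays on it when the percentage is raised to 50%.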

more optimization going on

Tim Starling is known in the internal PHP community as the Wikipedia guy who looks for and reports inefficiencies


Tim Starling: Hip Hop PHP to C

http://livestre.am/MkYS
Tim presents code from parser.php as transformed by HipHop

there's a lot of C++ from scratch, but it's much much faster than in PHP

Facebook: gained about a factor 3, but it varies

initial benchmark by Tim: about a factor 5

Wikimedia is looking to switch to this
going to work on both HipHop and Zend

deployment to Wikimedia in late 2011 / early 2012

That's the schedule we're looking at
for the most part, engineers don't need to know about it, don't need to install HipHop on your laptop
as long as you're writing code that works like most extensions, things will keep working
as long as you use the autoloader - that's the main thing
so, I have some notes on MW ... WM....

we have a few dev guidelines
avoid reparse definitions in the same file....
that's the main restriction for regular developers

64-bit operating sys patches are not really ready for it yet there's a 16-bit port? Domas asks.

implementation is incomplete -- replaced everything from Zend, not just executor. replaced everything from PHP. PHP is wrapper around C libraries, & they're shared Zend & HipHop

pretty rudimentary ???

plenty of bugs in it yet, but it's in a state where we can use it in dev process

installer works!

Q.: we at Wikia have been looking for 3 months, good results ??? in most cases. Added extensions we have to add, down to about 2x?? don't know why.... encourage you to run with full extension set
A. still TODO. Obv thing to check: make sure extensions are compiled. don't accidentally switch into interp mode
HipHop site: "drop a few zend features no one uses such as eval" -- drop most lang features? we told them not to use it
most works like ???? - get defined in var names -- all those dynamic features in PHP are there in hiphop. it just works. make sure code is compiled, tho


may distribute VM that has everything compiled & installed.

45 minutes to compile a small thing on this laptop. Domas can do in 15.
But .... [xkcd joke] http://xkcd.com/303/
this is probably the 1st reason to switch to HipHop, excuse for "it's compiling"  :-)

Roan: I have a serious Q..... reading TODO lists. security bugs, compression, services crashing..... how ... generally nonbroken is MW in HH?
A. don't think it is security sensitive..... abort..... stack trace.... binary garbage...... written in C++ so not that broken really.
Q. functionality.... how useful is MW right now?
A. it works. don't have binary here to show you? doesn't work now because someone broke profile
Agile sys breaks when someone moves files, need to fix up a few little things, bugs we'll hit occasionally, but FB is using HH exclusively now, & they've been very responsive to our bug reports..... complained on mailing list a few months ago, they're all fixed now
[inaud joke re Zend]
Domas says: I added stuff so I could compile it? HH - told them, over a weekend they added these extensions
Tim: you file a bug in Zend, some guy closes it as bogus, you msg someone on IRC, you are told to go to Mailing List, maybe someone fixes it, maybe not
with HH, file in bug tracker, Domas .... [inaudible]

Danese: "so finally a good reason for Domas to be at FB"


Q. Why go to all this trouble of writing in PHP and then converting to C++, why not just write MW in C++ in the first place?
A. Domas: Not everyone knows C++.

 FB has more coding in a week than MW in a lifetime

you don't want to force everyone to write C++. If they can quickly prototype and then .... reasonable efficiency
Tim: There are tens of thousands of MediaWiki installations, perhaps hundreds of thousands. If WMF decided to rewrite in C++, the PHP codebase would still have to continue, perhaps maintained as a community fork, to support those installations. Developing in PHP means that we can share development effort with the non-WMF user community.


RobLa: if anyone is a good packager for Ubuntu or Fedora, please help us package HipHop.
Domas: anyone trying?

Domas: it's a compiler..... not easy, so end result is an app server that has mult... mode.....

RobLa: having a set of packages that will set up a dev env for someone -- better than now

Tim: there are CentOS pkgs, but -- we use Macs & then Ubuntu on the servers.

Domas: many dependencies: .... centos 5 .... Ubuntu... easy to hack a VirtualBox image?.... SSH in.... [completely inaudible] Domas says: there's more to say than this


Mark Bergsma on IPv6

http://livestre.am/MkYS <this presentation has slides> http://www.nedworks.org/~mark/tmp/Wikimedia%20IPv6.pdf

IPv4: intro to why IPv6 is needed
32 bits, we are running out in the next few months, some stocks will run out completely

we as WMF may have IPv4 addresses, but some new users in Asia or Africa will not, so they will be unable to access sites over IPv4.
We don't have that many addresses in IPv4. Just opened a new data center, doubled our IP address usage.
new projects -- WM labs

IPv6: not many differences: addresses are just way longer. 128 bits instead of 32. hex notation

incompatible with IPv4.
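
The 32-bit vs 128-bit difference and the hex notation can be seen with Python's standard `ipaddress` module. The two addresses are reserved documentation examples chosen for this sketch, not Wikimedia addresses.

```python
import ipaddress

v4 = ipaddress.ip_address("192.0.2.1")    # documentation range, not a real host
v6 = ipaddress.ip_address("2001:db8::1")  # documentation range, not a real host

print(v4.version, v4.max_prefixlen)  # 4 32
print(v6.version, v6.max_prefixlen)  # 6 128

# "::" abbreviates a run of zero groups; exploded shows all eight hex groups.
print(v6.exploded)  # 2001:0db8:0000:0000:0000:0000:0000:0001

# Size of each address space.
print(2 ** 32)   # 4294967296
print(2 ** 128)  # 340282366920938463463374607431768211456
```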

migrating to IPv6: not many users have IPv6 config
last few months: improvement
still, great majority doesn't, even at this hackathon!
some users & their computers THINK they have IPv6
broken connectivity! timeout, broken UX
we are testing..... enable IPv6 on our site, about 0.30% of users, WM will be broken

testing on en.wikipedia
JavaScript, some images serve via IPv6
stat from 2 days ago.... 0.26% cannot reach WP

Anon users IDd by IP address. Banning, etc. done by that.
most websites don't show IPs to their users, we do
as soon as we turn it on, most users will need to work with IPv6 addresses
we'll have to have docs & support, espec re banning
will need help with that as well
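
Since blocks are keyed on IP addresses, anyone handling them will have to match IPv6 addresses and ranges as well as IPv4 ones. The sketch below is not MediaWiki's block code: `BLOCKED` and `is_blocked` are hypothetical, and the ranges are reserved documentation prefixes. It shows that one membership check can cover both address families, and that different spellings of the same IPv6 address match the same range.

```python
import ipaddress
from typing import List, Union

Network = Union[ipaddress.IPv4Network, ipaddress.IPv6Network]

# Hypothetical block list; both ranges are reserved documentation prefixes.
BLOCKED: List[Network] = [
    ipaddress.ip_network("198.51.100.0/24"),
    ipaddress.ip_network("2001:db8:1234::/48"),
]


def is_blocked(address: str) -> bool:
    """Return True if the address falls inside any blocked range."""
    ip = ipaddress.ip_address(address)
    # A membership test across IP versions simply returns False.
    return any(ip in net for net in BLOCKED)


print(is_blocked("198.51.100.7"))              # True
print(is_blocked("2001:db8:1234:5::1"))        # True
print(is_blocked("2001:db8:ffff::1"))          # False
print(is_blocked("2001:0DB8:1234:0:0:0:0:1"))  # True, same range spelled differently
```

Parsing the address before comparing is the design choice that matters here: comparing raw strings would treat `2001:db8:1234::1` and `2001:0DB8:1234:0:0:0:0:1` as different users.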

in 2005 it was briefly enabled for < 1 day
IPv6 tunnel broke for too many users, so we disabled it. River worked on this?
2008 -- IPv6 enabled on entire network. Europe people - good - upstreams [inaud]

we held off for a while....

our IPv6 network connectivity is now better, IPv6 enabled for some misc services not that many

as of 2 years ago, an experiment, like whitelist at Google, some businesses invited to use IPv6 as long as they promise to fix things that break
IPv6 DNS results
same on smaller scale at WM
40 Europe networks.... uploads of images & media over IPv6 + JS testing

many of you heard, June 8 World IPv6 Day!
things will break for everyone equally. Insurance policy, I guess for 24 hours.
We don't know whether we will have test page, participate

TODO: 3 weeks left

Infrastructure

   proxies reachable over IPv6
   squid - does not support in our version, so we'll put another proxy in front to bridge.  not hard, needs to be done, Ryan Lane & Mark working over last few days

MediaWiki -- mostly done -- we'll find out on June 8th!

   would be nice if General Engineering would pay attention, fix bugs as they come up
   test wiki
   many users using MW over IPv6 over last few years

User doc/support/communication

   how to work, how to deal with..... hope the community will step in -- Sumana to coordinate
   anyone who can help, write doc, work with the community

Mark happy to answer questions, help


Q: last IPv6 day, Maytag let people see whether they can receive IPv6 ... IPv4 image telling them they ..... IPv6 -- tell them not to display the image. We could do that to increase visibility without breaking? tell them to contact us.
A. JavaScript testing we've already been doing... could easily implement that.


Now, last presentation of day!

Lang Committee language issues in a diff room - please talk to them -- he figures that's a better way to do it

Mathias Schindler: WebP

http://livestre.am/Ml1z <this presentation has slides>

WebP is a codec for lossy image compression released by Google to make web faster. It is related to the WebM video codec.

Google has bought the company On2, which produced the VP8 codec, and has released the source code and made a patent pledge. Some of you might know the VP3 codec, which is now Ogg Theora video.

WebP is basically the same as a single frame of a WebM video. Technically, every browser that can display a WebM video should have no problems displaying WebP; there is a JS hack available to demo it.

Google has released sw to convert images into WebP. The performance promise is up to 40% smaller in size compared to JPEG at comparable quality.

On Wikipedia pages with lots of thumbnails (for example Albrecht Dürer), the page can easily be larger than 2 MB in size, with 75% of it being thumbnail.
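
Putting the two figures together gives a rough upper bound on the saving for such a page. This is only arithmetic on the numbers above, assuming the best-case 40% applies to every thumbnail and taking the page as exactly 2 MB; `page_size_after_webp` is a made-up helper name.

```python
def page_size_after_webp(page_mb: float, thumb_share: float, saving: float) -> float:
    """Estimate the page size if every thumbnail shrinks by `saving`."""
    thumbs_mb = page_mb * thumb_share
    other_mb = page_mb - thumbs_mb
    return other_mb + thumbs_mb * (1.0 - saving)


before = 2.0  # MB, page with many thumbnails
after = page_size_after_webp(before, thumb_share=0.75, saving=0.40)
print(f"{before:.1f} MB -> {after:.1f} MB")               # 2.0 MB -> 1.4 MB
print(f"whole page shrinks by {1 - after / before:.0%}")  # 30%
```

So in the best case 1.5 MB of thumbnails become 0.9 MB, and the page as a whole drops by about 0.6 MB.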

Browsers which support WebP right now: Google Chrome, Opera
Firefox waiting to officially support it.
MSIE only via Google Chrome Frame
No sign of Safari support yet, but signs of support in the webkit renderer.

Magnus Manske has proof of concept for English Wikipedia
add a line into your personal JS file User:[your name]/common.js

Add this line:

importScript('User:Magnus Manske/webp.js');

redirects links to thumbnails.... toolserver..... <slide>
try it yourself, add Magnus' tool in your personal user JS
registered users who opt in to receive them.

Q. from mdale: normal WebM support for videos -- are people interested? talk to Mdale

Q: Tomcat???? [inaud]: The question was about the quality of compression for png files
A. PNG file for good reasons on commons, serve as PNG file. If just png because the uploader did not understand use case, serve as JPEG. still smaller but ... depending on compress again? [inaud]
a fundamental issue with WMCommons: does not deal with raw upload well, so people attempt TIFF or JPEG instead of raw and then deal with thumbnails in a wrong format..... we recompress & add digital artifacts :-(


Danese: OK, that is the end of the talks

Who goes to tech days? WMF employees & 1 soon-to-be employee. Other folks: it will be boring if you do not work for us!

Thanks for making it out here. We are not kicking you out because we're stopping talking. Daniel says they can stay till about 6pm. Nicole says: clean up by 6pm. So, stay until 5pm., or help us clean till 6 :-) Or you can go into city. Daniel wants suggestions.

if you want to go out and explore the city
museum of computer games? people can get together < show of hands >

   Sumana?

can we take pictures in the museum? :) [didn't catch other ideas]

Danese: lunch will be shortly (yay!)


please fill out survey so they know it's useful & can decide whether to do it again
1 time of year to get together & solve stuff

hacker lounge tonight at C-base starting at 8pm
really nice place, with flying saucers and all!!1!


Amir - one of the Haifa Wikimania organizers
before Wikimania, 2-day hackathon, you are all welcome there

   what do you want it to be like?  tell me

Sumana has a few ibuprofen pills left :-)

< group photo now > Group photo time! https://secure.wikimedia.org/wikipedia/commons/wiki/File:Wikimedia_Hackathon_Berlin_2011_group_photo.jpg

lunch?


Daniel K wants to talk to people who can do a JavaScript project


RobLa, Erik Z, Asher, Nimish, & others talked about analytics

Sumana, Joachim, & Markus Glaser talked about Selenium, PHPUnit, QUnit, Cruise Control, & testing infrastructure & process

Trevor, Roan, Krinkle, & Mark worked on ResourceLoader
Wikia was working on language-variant caching?


Language committee had a meeting

Danny B. aiming to publicize the need for content creators to create accessible content -- talked with Sumana & Guillaume

Ops worked through many shell bugs, changed the hetdeploy schedule to concentrate on doing less & smaller at first


Hashar notes: Kaldari working with Jan-Paul, good example of collaboration
use code review to give positive & negative feedback