Wikimedia Platform Engineering/MediaWiki Core Team/Quarterly review, July 2014/Notes

The following are notes from the Wikimedia Foundation's MediaWiki Core team quarterly review meeting, July 16, 2014, 1PM - 2:30PM PDT. (Meeting overview page)

Present: Lila Tretikov, Erik Möller, Rob Lanphier, Tomasz Finc, Ori Livneh, Brion Vibber, Gabriel Wicke, Aaron Schulz, Tilman Bayer (taking minutes), Chris Steipp, Chad Horohoe, Howie Fung, Toby Negrin (from 1:45)

Participating remotely: Arthur Richards, Greg Grossmeier, Nik Everett, Bryan Davis, Brad Jorsch, Giuseppe Lavagetto, Sumana Harihareswara, Mark Bergsma, Chris McMahon

''Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.''



Team allocations
Rob: welcome

Allocations in the past quarter:
 * lots of projects in parallel, getting better at focusing
 * HHVM/Performance (Ori, Tim, Aaron)
 * SecurePoll/SUL (Dan, Brad, Chris)
 * Search (Chad, Nik)
 * Architecture-RfCs (Sumana)
 * Deployment Tooling (Greg, Antoine, Sam, Bryan)


 * HHVM was biggest project last q: Ori, Tim, Aaron
 * Securepoll, SUL: Dan was PM for these, Chris did prep work on SUL

Apart from that, Chris does lots of security reviews

Search: Chad and Nik have been working on this for the past year or so; deployed to most wikis now

Nik: joined 1y+1m ago, worked exclusively on search

Sumana, team: facilitate architecture discussions, RfC (here meaning: someone planning big arch change will document it, ask for feedback)

Tim spent some time on this in this q too

Release Engineering & QA: shared with MW Core team

Upcoming allocations:

SUL: Dan, Chris, Bryan with others (Kunal coming in)

Data & Developer Hub: Sumana will focus on this

HHVM
Ori:

team: me, Tim, Aaron, with help from Ops

and contributions from across Engineering, plus Facebook

high-level goal (not this q): cut the time for saving an edit in half; the current save time is wildly out of line with user expectations

caveat: won't affect cached views (ca. 95%)

but reduce current "penalty" for logged in users - think of it as a negative customer loyalty program ;)

Lila: this is a great goal. Timeline?

Ori: Q1 goal is for backend processing time (not yet for delay experienced by users)

in e.g. VE, part of delay is network latency

why we can still hit goal for actual save time:

optimizing (other things)

Lila: goal for when users will see this?

Ori: at end of this q, users should see benefit of HHVM

upstream contributions: Tim etc. touched >17k lines of code

got other developers involved in HHVM (platform?), generated a lot of bug reports

app server stack redo (including e.g better monitoring):

far more productive relationship between Core and Ops

HHVM - Q1 Performance Targets
not yet articulated as user perceived performance, unfortunately

reduce wall time for WikiPage::doEditContent and EditPage::getPreviewText by 50%

Erik: what is the 50th percentile currently?

Ori: just over 5 sec for both of these
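The target above is defined over median wall time for the two functions named. A minimal sketch of how such a "cut the median in half" target could be checked against sampled timings (the 5-second baseline is from the notes; the sample data and the helper are invented for illustration):

```python
# Sketch: check a "reduce median wall time by 50%" target against sampled timings.
# The ~5s baseline comes from the meeting notes; the sample data is made up.

def percentile(samples, p):
    """Nearest-rank percentile of a list of numbers."""
    ordered = sorted(samples)
    k = max(0, min(len(ordered) - 1, round(p / 100 * len(ordered)) - 1))
    return ordered[k]

baseline_median = 5.0            # seconds ("just over 5 sec for both")
target_median = baseline_median / 2  # the Q1 goal: cut wall time in half

# Hypothetical post-HHVM timing samples for WikiPage::doEditContent (seconds)
samples = [2.1, 2.4, 1.9, 2.6, 2.3, 2.0, 2.2, 2.5, 1.8, 2.35]

median = percentile(samples, 50)
print(f"median={median:.2f}s, target={target_median:.2f}s, met={median <= target_median}")
```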

Gabriel: there's also an architectural issue that we can address by removing parsing from the critical path, enabling async saves

HHVM - Plan
upgrade Ubuntu etc.

start with testwiki, gradually deploy to more from there

Performance - benchmarks
Had planned to establish set of key perf indicators, whose meaning is understood by all engineers, and which are credible to them

and can be tracked on weekly timeframe

postponed to Q2

RobLa: questions about HHVM?

Lila: Performance benchmarks goal is great. sad to see this postponed, really important

Rob: we have some indicators already, e.g. last year we disabled ULS (fonts) based on them, reenabled later

or TimedMediaHandler: made some changes based on these indicators

it's more about establishing KPI for everyone, clearly visible for all

Ori: a year ago, didn't have any at all

now on 1/1000 sampled basis - operational and working

need a lot of focused work on rigorous definitions
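The 1/1000 sampling Ori mentions can be sketched as a simple per-request coin flip, with sampled counts scaled back up by the sampling rate. All numbers here are invented; this is the idea, not the production instrumentation:

```python
import random

SAMPLE_RATE = 1 / 1000  # per the notes: metrics collected on a 1-in-1000 basis

def should_sample(rng):
    """Decide whether to record timing data for this request."""
    return rng.random() < SAMPLE_RATE

# Simulate a stream of requests and estimate total volume from the sample.
rng = random.Random(42)
total_requests = 1_000_000
sampled = sum(1 for _ in range(total_requests) if should_sample(rng))
estimated_total = sampled / SAMPLE_RATE
print(f"sampled={sampled}, estimated_total={estimated_total:.0f} (actual {total_requests})")
```

The trade-off is the usual one: sampling keeps overhead negligible, at the cost of statistical noise in rare or fast code paths.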

Arthur: what percentage of non-MW Core people's work needs to be spent on HHVM work?

Ori: official answer: none, unofficial: as much as I can get out of them ;)

haven't formalized that yet, would be useful to codify it

give folks a sense that they are permitted to work on it

but leveraging enthusiasm is valuable too ;)

team is small, e.g. FB has order of magnitude more staff for this task

Lila: this is backend, so probably more a project management thing than product management

make clear how people are split

Erik: Some casual work is fine, e.g. people have natural impulse to improve the work environment [here: for HHVM] once they see it

but if it becomes a longer time commitment...

let's chat a bit more about that later

Gabriel: how are you trying to achieve logged-in user performance goal? will you still fall through to PHP? Ori: yes, HHVM speed-up

RobLa: e.g. Tim's time is hard to account for

my tenure at the Foundation has basically been focused on making sure there aren't too many things that only Tim can do ;)

Ori: long track record of voluntary contributions to MW development

existing pattern of social interaction based on non-coercion and consensus building

challenging at times, but rewarding

this perhaps explains some of the resistance

Lila: sure, if it works and milestones are hit, that's fine

Chris: thought about separate perf test environment, do you see value in that?

Ori: highly relevant, tested on machine in prod environment

FB has tiers, they can draw boundary around a set of machines, tell load balancers to put a certain load on these machines, deploy code specifically on that tier, observe effect

Gabriel:... systematic perf testing in Parsoid has been very valuable in highlighting perf regressions before deploy

Greg: ...

Single User Login Finalisation (SUL)
Product Owner: Dan

Engineers: Bryan Davis, Kunal Mehta, Chris Steipp

Community: Keegan Peterzell, Rachel diCerbo

all wikis maintain separate user account tables

most people have global account, but some people grandfathered in with local accounts

causes a lot of problems with global feature development, e.g. Flow

or: going from enwiki to Commons for image upload, separate login ruins workflow

SUL finalisation work required
6 pieces of work:
 * Big rename script (the finalization proper)
 * pre-finalization: merge "easy" accounts (those without clashes; several thousand done already)
 * pre-finalization: request a rename (e.g. send message warning about forcible rename). Need tool so they don't create even more accounts to file these requests on Meta, and to support stewards who are going to do the actual renaming work. Will talk to stewards about their needs
 * post finalization: log in with old credentials (covers people who haven't received or read the notifications)
 * post finalization: global account merge
 * post finalization: global rename user

E.g. user "Axel" - should not end up with 10 global accounts

Erik: are we going to create a tool for these users?

Dan: probably not
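The "easy" pre-finalization merge described above covers local accounts that can be attached to a global account without any clash. A toy sketch of that idea follows; the data shapes and the email-based clash check are invented for illustration and are not SUL's actual schema or policy:

```python
# Toy sketch of the "easy merge" idea: a local account can be attached to the
# global account of the same name when no conflicting claim exists. The data
# model here is invented for illustration; it is not SUL's actual schema.

def find_easy_merges(local_accounts, global_owners):
    """
    local_accounts: list of (wiki, username, email)
    global_owners: dict username -> email of the global account holder
    Returns local accounts that can be merged without a clash: the username
    either has no global account yet, or the emails match.
    """
    easy = []
    for wiki, name, email in local_accounts:
        owner_email = global_owners.get(name)
        if owner_email is None or owner_email == email:
            easy.append((wiki, name))
    return easy

locals_ = [
    ("enwiki", "Axel", "axel@example.org"),
    ("dewiki", "Axel", "someone-else@example.org"),  # clash: different person
    ("frwiki", "Berta", "berta@example.org"),        # no global account yet
]
print(find_easy_merges(locals_, {"Axel": "axel@example.org"}))
# [('enwiki', 'Axel'), ('frwiki', 'Berta')]
```

Accounts that fall through this filter (like the dewiki "Axel" above) are the ones that need the rename/request tooling described in the list.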

Lila: all of this planned this q?

Dan: yes, the engineering work. not yet the community-facing work, but will set a date for that before end of Q1. And need communications plan

broken accounts are still being created, Kunal fixing this now

Chad: even if it's only the easy accounts, that still shrinks the problem's scope

CirrusSearch
Team
 * Engineers: Nik Everett, Chad Horohoe
 * Others: Andrew Otto (Ops), Dan Garry (product)

Timing
 * Targeting migration of all remaining Wikipedias this quarter, but might be next quarter

Hardware
 * Servers we have now were leftovers
 * Just got ops traction to buy more (this morning!)

Ops report
 * Need more

Once we migrate all the wikis what next?

Nik:

rationale: old system not well understood, hard to fix

e.g. 3 months ago, something went wrong, took out search on enwiki for 20min

We chose rebuilding with something else. Share more code with rest of world - easier to debug, get support etc.

and, state of the art moved on twice since we implemented Lucene-search

Lila: based on Elasticsearch? (yes) Scaling?

Chad: We've done a lot of tuning

Nik: Elasticsearch works for document sets both smaller and larger than ours; the difficult thing is that we want to provide the rich query set that regular users need. It's turned out to be a good choice for that. We've done some things to scale it
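To illustrate the "rich query set" point, this is roughly the shape of a full-text query body an application sends to Elasticsearch. The `multi_match`/field-boost syntax is standard Elasticsearch query DSL, but the field names and boost weights here are invented; they are not CirrusSearch's actual mapping:

```python
import json

def build_search_query(terms, size=10):
    """Build an Elasticsearch-style full-text query body. Field names and
    boost values are illustrative, not CirrusSearch's actual configuration."""
    return {
        "query": {
            "multi_match": {
                "query": terms,
                # Boost title matches over body text (hypothetical weights)
                "fields": ["title^3", "text"],
            }
        },
        "size": size,
    }

body = build_search_query("single user login")
print(json.dumps(body, indent=2))
```

Tuning layers like these per-field boosts is part of what "a lot of tuning" means in practice.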

I have become the #3 upstream contributor to Elasticsearch (among those who don't work for them)

Lila: got it :)

Nik: I'm on Java side, Chad on MW side

also: Andrew Otto (about 10% of his time), Dan Garry (Product)

Got power users to try it out, tweaked it based on their needs

Dan: main goal was feature parity

Nik: yes, but we snuck in a few nice things for Commons

no Ops support beyond me

hardware issues: old machines, took as much as we could from Tampa, shipped

but expect new machines soon

Elasticsearch's recommendation is usually to throw more hardware at it, but talking with them to find different solutions

Targeted completion of deployment this q, but might slip to next

Lila: how far deployed now?

Chad: around 98% of wikis, but in terms of traffic, about 50% (not on enwiki yet)

Nik: then work on features for power users

Architecture/RFC
Team:
 * MW Core: Tim Starling
 * Product Owner: Sumana Harihareswara
 * Others: Brion Vibber …

…

Issues
 * What's in scope?
 * logistical help needed

Proposed changes
 * Expanding the group reviewing RFCs
 * Rechristen "architecture committee", or "team"

RobLa:

a fair amount of time from Tim (besides HHVM), Sumana, Brion, Mark B

big changes to MW architecture, in the hope it will spur more change

prior to that, lots of people who wanted to make big changes, but didn't know how to get there

had summit earlier this year

documented security, perf requirements

Sumana: many wanted guidance on basic principles

Lila: need high level goals guiding architecture changes. It's like an oil change on a car ;)

quantify and qualify goals, after say 3 years

many different possibilities to box this

Erik: had some of that conversation at summit

specifically, service-oriented architecture (SOA) Requests for comment/Services and narrow interfaces

enable developers to work without deep knowledge of core

e.g. Parsoid: loosely coupled to applications that use it

Sumana: documents aim to be more descriptive than aspirational

often say "see architecture vision discussion"

Rob: this org is a bit different from other orgs: elsewhere, people are happier to tear everything down and rebuild

here, deep seated pragmatism (almost to the point of conservatism)

for many years, light staffing meant keeping site running was a huge achievement

these habits have persisted beyond point of usefulness

Lila: not questioning whether we should, just asking for definition of goals

Erik: will have summit again in January, attached to All Staff meeting

also including conversation with 3rd party users

Rob: there is stuff that has been laying dormant because of this

e.g. Tim's focus on HHVM: important and correct decision, but still means he's not working on these things

not a good plan yet how to move RfCs forward, want to talk about this

general logistical help needed

expand group of people working on this (beyond Brion, Tim and Mark - which was kind of an arbitrary decision based on their experience and seniority within the org)

Erik: seems straightforward for you guys to nominate new members

Tomasz: also, articulate what architects are responsible for

Lila: I agree

Erik: unusual here is that a lot of RfC proposals come from community, more open process

Lila: I see

Erik: goals could help prioritize RfCs

Sumana: I have done that as the person facilitating RfCs

but that didn't enable the respective developers to prioritize it in their work

Erik: ...

Tim: we should indeed enlarge the group, call it "arch committee" instead

public + private meetings, 5-6 people

Brion: rotating memberships, rather than fixed people as "big arbiters of arch"

also to prevent burnout - could become a high-bandwidth thing

need better idea of what is in scope, and what are smaller issues

seen a wide range of proposals

concentrate on high-end arch changes

define reason, expected effect, timeline, resourcing level that WMF can provide

Mark: agrees expanding would be good, if only for bandwidth concern (I haven't been able to spend much time on it)

Sumana: suggested some delegates already

Arthur: talked about this with Rob

Lila: doesn't all need to be discussed in this meeting, but ...

Erik: the above is more an improvement of what we are already doing

should have separate conversation on defining priorities

Sumana: but need to discuss in this meeting who will take over once I stop facilitating in a few weeks

Gabriel: overarching direction might just emerge as more stakeholders come in and give input; convergence needs time

Erik: yes, some specific goals might need time. but near term, can get consensus on SOA goal in general

Lila: defining outcomes as part of process is fine, but...

--> plan some future discussions on this (Wikimania in-person maybe?)

Data & Developer Hub
Sumana:

Team:
 * MW Core: Brad Jorsch
 * Product Owner: Sumana Harihareswara
 * Others: Juliusz Gonera, Moiz Syed

Details: https://www.mediawiki.org/wiki/Data_%26_Developer_Hub

nutshell: WMF staff (& external community) need better guidance to use our APIs

Sumana: In order for us to be able to rapidly develop new features, and rapidly onboard new developers - both at Wikimedia Foundation and in our open source community - we need documentation. And our current documentation for the main MediaWiki web API is not good enough, which causes a lot of redundant time-wasting questions in IRC and the mailing lists. As we move into a more service-oriented architecture approach, it gets even more important to bootstrap internal developers better. Right now, we have experienced engineers who end up duplicating effort or writing lower-performance code because they don't know what our APIs can do, and of course better documentation would also help our new hires a lot as well.

So we're creating a Data & Developer Hub to centralize a bunch of scattered resources we have in a lot of different places. This will also help external developers in our community use our APIs, such as our mobile-centric APIs, Wikidata, Upload, Geodata, and so on, and our data dumps and other data sources.

problem to be solved:

lot of redundant communication

not bootstrapping our internal devs well

collect scattered resources in one place

will also help external developers

me in (at least) 6 months: find out which APIs need work most

Q1 goals:
 * developer hub prototype
 * landing page, 3 projects showcased, 3 apis documented
 * api sandbox functional prototype
 * clear process for future improvements

Ori: is RC stream (recent changes stream) among the projects to be showcased in this quarter?

Sumana: it's a strong candidate, probably the first thing we use as an example. (PS, after the meeting: see the http://juliuszgonera.com/wddh/ prototype, which will probably use RCstream as an example)

Ori: I think it should be. It's something users have engaged with repeatedly (e.g. "Listen to Wikipedia" got a lot of traction), and developers and researchers will use it a lot

Erik: +100, will get used a lot

Sumana: will have "Inspired" section, Listen to Wikipedia will feature there; http://seealso.org/ as part of inspiration for this

but also will make sure we support internal needs for work on priority goals

so, will consider this API among others, holistically
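The kind of consumer the RCstream discussion envisions boils down to parsing a stream of recent-change events and filtering them. A minimal sketch, leaving the transport out of scope (RCstream itself used socket.io at the time); the event fields below are illustrative, not the exact RCstream schema:

```python
import json

# Sketch of consuming recent-changes events, e.g. for a researcher counting
# edits on one wiki. Field names are illustrative, not the exact schema.

raw_events = [
    '{"wiki": "enwiki", "type": "edit", "title": "HHVM", "user": "ExampleUser"}',
    '{"wiki": "dewiki", "type": "edit", "title": "Lucene", "user": "Beispiel"}',
    '{"wiki": "enwiki", "type": "log", "title": "Main Page", "user": "AdminBot"}',
]

def edits_for_wiki(lines, wiki):
    """Parse newline-delimited JSON events and keep edits on one wiki."""
    events = (json.loads(line) for line in lines)
    return [e["title"] for e in events if e["wiki"] == wiki and e["type"] == "edit"]

print(edits_for_wiki(raw_events, "enwiki"))  # ['HHVM']
```

Documenting exactly this kind of event schema and a worked example is what the hub's "3 projects showcased" goal would cover.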

SecurePoll redesign
Dan:

We do a lot of elections in community (e.g. ArbCom, Board)

currently need to hand-write XML specifying setup for SecurePoll each time

Lila: is this an extension or part of MW core? (an extension)

can we use this for internal polling instead of e.g. SurveyMonkey?

Dan: not designed for that use case, but not much of a stretch

Chad: could be worse - imagine we would install Limesurvey again ;)
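To make concrete what "hand-writing XML for each election" involves, and why a redesign with a setup form would help, here is a sketch that programmatically builds an election config. The element and attribute names are invented for illustration; they are not SecurePoll's actual configuration schema:

```python
import xml.etree.ElementTree as ET

# Sketch of generating an election setup document instead of hand-writing it.
# The <election>/<option> vocabulary here is hypothetical, not SecurePoll's.

def build_election_config(title, start, end, options):
    election = ET.Element("election", {"title": title, "start": start, "end": end})
    for name in options:
        ET.SubElement(election, "option", {"name": name})
    return ET.tostring(election, encoding="unicode")

xml = build_election_config(
    "2014 Example Committee Election",
    "2014-08-01", "2014-08-14",
    ["Candidate A", "Candidate B"],
)
print(xml)
```

A redesigned SecurePoll would do the equivalent from a web form, removing the error-prone manual step.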

Team offsite
Rob:

historically team has been less process-oriented, will discuss with Arthur on whether/how to change team process
 * Chad: We're going to either come back with a process or kill each other trying

Detailed allocations - next quarter
(slide)

Questions?

Gabriel: Should perhaps have a separate conversation, but am still wondering about your plan on how to achieve authenticated user perf goals?

Ori: SPDY is very important, haven't had bandwidth for that yet, but would support efforts

Gabriel: in services we are aiming for speeding up authenticated page views by making those page views static / fully cached as well; currently reimplementing a lot of user preference stuff with APIs & client side code to make this possible

Erik: this calls for separate conversation

RobLa: annual goals: https://www.mediawiki.org/wiki/Wikimedia_Engineering/2014-15_Goals#MediaWiki_Core

Q3/Q4 still in draft form