Wikimedia Platform Engineering/MediaWiki Core Team/Quarterly review, July 2014/Notes

From mediawiki.org

The following are notes from the Wikimedia Foundation's MediaWiki Core team quarterly review meeting, July 16, 2014, 1PM - 2:30PM PDT. (Meeting overview page)

Present: Lila Tretikov, Erik Möller, Rob Lanphier, Tomasz Finc, Ori Livneh, Brion Vibber, Gabriel Wicke, Aaron Schulz, Tilman Bayer (taking minutes), Chris Steipp, Chad Horohoe, Howie Fung, Toby Negrin (from 1:45)

Participating remotely: Arthur Richards, Greg Grossmeier, Nik Everett, Bryan Davis, Brad Jorsch, Giuseppe Lavagetto, Sumana Harihareswara, Mark Bergsma, Chris McMahon

Please keep in mind that these minutes are mostly a rough transcript of what was said at the meeting, rather than a source of authoritative information. Consider referring to the presentation slides, blog posts, press releases and other official material.

Presentation slides from the meeting

Team allocations

Rob: welcome
Allocations in the past quarter:

  • lots of projects in parallel, getting better at focusing
    • HHVM/Performance (Ori, Tim, Aaron)
    • SecurePoll/SUL (Dan, Brad, Chris)
    • Search (Chad, Nik)
    • Architecture-RfCs (Sumana)
    • Deployment Tooling (Greg, Antoine, Sam, Bryan)
  • HHVM was biggest project last q: Ori, Tim, Aaron
  • Securepoll, SUL: Dan was PM for these, Chris did prep work on SUL

Apart from that, Chris does lots of security reviews
Search: Chad and Nik working on this for past year or so. Deployed to most wikis now
Nik: joined 1y+1m ago, worked exclusively on search
Sumana, team: facilitate architecture discussions, RfC (here meaning: someone planning big arch change will document it, ask for feedback)
Tim spent some time on this in this q too
Release Engineering & QA: shared with MW Core team

Upcoming allocations:
SUL: Dan, Chris, Bryan with others (Kunal coming in)
Data & Developer Hub: Sumana will focus on this

HHVM

Ori:
team: me, Tim, Aaron, with help from Ops
and contributions from across Engineering, plus Facebook
high-level goal (not this q): cut time for saving an edit in half; current save time is wildly out of bounds with user expectations
caveat: won't affect cached views (ca. 95%)
but reduce current "penalty" for logged in users - think of it as a negative customer loyalty program ;)
Lila: this is a great goal. Timeline?
Ori: Q1 goal is for backend processing time (not yet for delay experienced by users)
in e.g. VE, part of delay is network latency
why we can still hit goal for actual save time:
optimizing (other things)
Lila: goal for when users will see this?
Ori: at end of this q, users should see benefit of HHVM
upstream contributions: Tim etc. touched >17k lines of code
got other developers involved in HHVM (platform?), generated a lot of bug reports
app server stack redo (including e.g. better monitoring):
far more productive relationship between Core and Ops

HHVM - Q1 Performance Targets

not yet articulated as user perceived performance, unfortunately
reduce wall time for WikiPage::doEditContent and EditPage::getPreviewText by 50%
Erik: what is the 50th percentile currently?
Ori: just over 5 sec for both of these
Gabriel: there's also an architectural issue that we can address by removing parsing from the critical path, enabling async saves
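
A minimal sketch of how the 50% wall-time target above could be checked against the 50th percentile. The function name and the sample timings are hypothetical; they are not measurements from the meeting.

```python
from statistics import median


def meets_target(baseline_ms, current_ms, reduction=0.5):
    """Return True if the median of current_ms is at most
    (1 - reduction) times the median of baseline_ms."""
    baseline_p50 = median(baseline_ms)
    current_p50 = median(current_ms)
    return current_p50 <= (1 - reduction) * baseline_p50


# Hypothetical wall times in milliseconds for one method,
# e.g. WikiPage::doEditContent, before and after a change.
baseline = [4800, 5100, 5300, 5000, 6200]
after = [2300, 2500, 2600, 2400, 3100]

print(meets_target(baseline, after))
```

Comparing medians rather than means keeps a few very slow saves from dominating the result, which matches the question about the 50th percentile.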

HHVM - Plan

upgrade Ubuntu etc.
start with testwiki, gradually deploy to more from there


Performance - benchmarks

Had planned to establish set of key perf indicators, whose meaning is understood by all engineers, and which are credible to them
and can be tracked on weekly timeframe
postponed to Q2

RobLa: questions about HHVM?
Lila: Performance benchmarks goal is great. sad to see this postponed, really important
Rob: we have some indicators already, e.g. last year we disabled ULS (fonts) based on them, reenabled later
or TimedMediaHandler: made some changes based on these indicators
it's more about establishing KPI for everyone, clearly visible for all
Ori: a year ago, didn't have any at all
now on 1/1000 sampled basis - operational and working
need a lot of focused work on rigorous definitions
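
A rough sketch of what timing on a 1/1000 sampled basis can look like. This is an illustration only, not the instrumentation actually in use; `sampled_timer`, `timings` and `save_edit` are hypothetical names.

```python
import random
import time
from functools import wraps


def sampled_timer(rate, sink, rng=random.random):
    """Decorator: record the wall time of roughly a `rate` fraction
    of calls by appending (function name, seconds) to `sink`."""
    def decorate(func):
        @wraps(func)
        def wrapper(*args, **kwargs):
            if rng() >= rate:
                # Not sampled: call through without timing.
                return func(*args, **kwargs)
            start = time.perf_counter()
            try:
                return func(*args, **kwargs)
            finally:
                sink.append((func.__name__, time.perf_counter() - start))
        return wrapper
    return decorate


timings = []


@sampled_timer(rate=1 / 1000, sink=timings)
def save_edit(text):
    # Stand-in for the real work being measured.
    return len(text)
```

Sampling keeps measurement overhead low on busy code paths. The rigorous definitions mentioned above (what exactly is timed, and how samples are aggregated) still have to be agreed on separately.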

Arthur: what percentage of non-MW Core people's work needs to be spent on HHVM work?
Ori: official answer: none, unofficial: as much as I can get out of them ;)
haven't formalized that yet, would be useful to codify it
give folks a sense that they are permitted to work on it
but leveraging enthusiasm is valuable too ;)
team is small, e.g. FB has order of magnitude more staff for this task
Lila: this is backend, so probably more a project management thing than product management
make clear how people are split
Erik: Some casual work is fine, e.g. people have natural impulse to improve the work environment [here: for HHVM] once they see it
but if it becomes a longer time commitment...
let's chat a bit more about that later
Gabriel: how are you trying to achieve logged-in user performance goal? will you still fall through to PHP?
Ori: yes, HHVM speed-up
RobLa: e.g. Tim's time is hard to account for
my tenure at the Foundation has basically been focused on making sure there are not too many things that only Tim can do ;)
Ori: long track record of voluntaristic contributions to MW development
existing pattern of social interaction based on non-coercion and consensus building
challenging at times, but rewarding
this perhaps explains some of the resistance
Lila: sure, if it works and milestones are hit, that's fine

Chris: thought about separate perf test environment, do you see value in that?
Ori: highly relevant, tested on machine in prod environment
FB has tiers, they can draw boundary around a set of machines, tell load balancers to put a certain load on these machines, deploy code specifically on that tier, observe effect
Gabriel: ... systematic perf testing in Parsoid has been very valuable in highlighting perf regressions before deploy
Greg: ...


Single User Login Finalisation (SUL)

Product Owner: Dan
Engineers: Bryan Davis, Kunal Mehta, Chris Steipp
Community: Keegan Peterzell, Rachel diCerbo

all wikis maintain separate user account tables
most people have global account, but some people grandfathered in with local accounts
causes a lot of problems with global feature development, e.g. Flow
or: going from enwiki to Commons for image upload, separate login ruins workflow

SUL finalisation work required

6 pieces of work:

  • Big rename script (the finalization proper)
  • pre-finalization: merge "easy" accounts (those without clashes. done several 1000 already)
  • pre-finalization: request a rename (e.g. send message warning about forcible rename). Need tool so they don't create even more accounts to file these requests on Meta, and to support stewards who are going to do the actual renaming work. Will talk to stewards about their needs
  • post finalization: log in with old credentials (covers people who haven't received or read the notifications)
  • post finalization: global account merge
  • post finalization: global rename user

E.g. user "Axel" - should not end up with 10 global accounts
Erik: are we going to create a tool for these users?
Dan: probably not
Lila: all of this planned this q?
Dan: yes, the engineering work. not yet the community-facing work, but will set a date for that before end of Q1. And need communications plan
broken accounts are still being created, Kunal fixing this now
Chad: even if it's only the easy accounts, that still shrinks the problem's scope
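
To illustrate the distinction between "easy" accounts and clashing ones, here is a minimal sketch. It assumes each local account record carries an owner identifier. That is a simplifying assumption made for this example, since the notes do not say how ownership of local accounts is determined. All names and data are hypothetical.

```python
from collections import defaultdict


def split_by_clash(local_accounts):
    """Split usernames into (easy, clashing).

    local_accounts: iterable of (wiki, username, owner_id) tuples.
    A username is "easy" if every local account with that name
    belongs to the same owner, and "clashing" otherwise.
    """
    owners = defaultdict(set)
    for _wiki, username, owner_id in local_accounts:
        owners[username].add(owner_id)
    easy = {name for name, ids in owners.items() if len(ids) == 1}
    clashing = set(owners) - easy
    return easy, clashing


# Hypothetical data: two different people registered "Axel".
accounts = [
    ("enwiki", "Axel", "person-1"),
    ("dewiki", "Axel", "person-2"),
    ("enwiki", "Berta", "person-3"),
    ("commonswiki", "Berta", "person-3"),
]
easy, clashing = split_by_clash(accounts)
print(sorted(easy), sorted(clashing))
```

Under this assumption, the easy set can be merged without renaming anyone, while the clashing set needs the rename tooling listed above.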

CirrusSearch

Team

  • Engineers: Nik Everett, Chad Horohoe
  • Others: Andrew Otto (Ops), Dan Garry (product)

Timing

  • Targeting migration of all remaining Wikipedias this quarter, but might be next quarter

Hardware

  • Servers we have now were leftovers
  • Just got ops traction to buy more (this morning!)

Ops report

  • Need more

Once we migrate all the wikis what next?

Nik:
rationale: old system not well understood, hard to fix
e.g. 3 months ago, something went wrong, took out search on enwiki for 20min
We chose rebuilding with something else. Share more code with rest of world - easier to debug, get support etc.
and, state of the art moved on twice since we implemented Lucene-search
Lila: based on Elasticsearch? (yes) Scaling?
Chad: We've done a lot of tuning
Nik: Elasticsearch works for document sets both smaller and larger than ours, the difficult thing is that we want to support the rich set of queries that regular users run. It's turned out to be a good choice for that. We've done some things to scale it
I have become the #3 upstream contributor to Elasticsearch (who doesn't work for them)
Lila: got it :)
Nik: I'm on Java side, Chad on MW side
also: Andrew Otto (about 10% of his time), Dan Garry (Product)
Got power users to try it out, tweaked it based on their needs
Dan: main goal was feature parity
Nik: yes, but we snuck in a few nice things for Commons
no Ops support beyond me
hardware issues: old machines, took as much as we could from Tampa, shipped
but expect new machines soon
Elasticsearch's recommendation is usually to throw more hardware at it, but talking with them to find different solutions
Targeted completion of deployment this q, but might become next
Lila: how far deployed now?
Chad: around 98% of wikis, but in terms of traffic, about 50% (not on enwiki yet)
Nik: then work on features for power users

Architecture/RFC

Team:

  • MW Core: Tim Starling
  • Product Owner: Sumana Harihareswara
  • Others: Brion Vibber …


Issues

  • What's in scope?
  • logistical help needed

Proposed changes

  • Expanding the group reviewing RFCs
  • Rechristen "architecture committee", or "team"

RobLa:
a fair amount of time from Tim (besides HHVM), Sumana, Brion, Mark B
big changes to MW architecture, in the hope it will spur more change
prior to that, lots of people who wanted to make big changes, but didn't know how to get there
had summit earlier this year
documented security, perf requirements
Sumana: many wanted guidance on basic principles
Lila: need high level goals guiding architecture changes. It's like an oil change on a car ;)
quantify and qualify goals, after say 3 years
many different possibilities to box this
Erik: had some of that conversation at summit
specifically, service-oriented architecture (SOA; see Requests for comment/Services) and narrow interfaces
enable developers to work without deep knowledge of core
e.g. Parsoid: loosely coupled to applications that use it
Sumana: documents aim to be more descriptive than aspirational
often say "see architecture vision discussion"
Rob: this org is a bit different from other orgs in that elsewhere, people are more happy to tear everything down and rebuild
here, deep seated pragmatism (almost to the point of conservatism)
for many years, light staffing meant keeping site running was a huge achievement
these habits have persisted beyond point of usefulness
Lila: not questioning whether we should, just asking for definition of goals
Erik: will have summit again in January, attached to All Staff meeting
also including conversation with 3rd party users
Rob: there is stuff that has been lying dormant because of this
e.g. Tim's focus on HHVM: important and correct decision, but still means he's not working on these things
not a good plan yet how to move RfCs forward, want to talk about this
general logistical help needed
expand group of people working on this (beyond Brion, Tim and Mark - which was kind of an arbitrary decision based on their experience and seniority within the org)
Erik: seems straightforward for you guys to nominate new members
Tomasz: also, articulate what architects are responsible for
Lila: I agree
Erik: unusual here is that a lot of RfC proposals come from community, more open process
Lila: I see
Erik: goals could help prioritize RfCs
Sumana: I have done that as the person facilitating RfCs
but that didn't enable the respective developers to prioritize it in their work
Erik: ...
Tim: we should indeed enlarge the group, call it "arch committee" instead
public + private meetings, 5-6 people
Brion: rotating memberships, rather than fixed people as "big arbiters of arch"
also to prevent burnout - could become high bandwidth thing
need better idea of what is in scope, and what are smaller issues
seen a wide range of proposals
concentrate on high-end arch changes
define reason, expected effect, timeline, resourcing level that WMF can provide
Mark: agrees expanding would be good, if only for bandwidth concern (I haven't been able to spend much time on it)
Sumana: suggested some delegates already
Arthur: talked about this with Rob
Lila: doesn't all need to be discussed in this meeting, but ...
Erik: the above is more an improvement of what we are already doing
should have separate conversation on defining priorities
Sumana: but need to discuss in this meeting who will take over once I stop facilitating in a few weeks
Gabriel: overarching direction might just emerge as more stakeholders come in and give input; convergence needs time
Erik: yes, some specific goals might need time. but near term, can get consensus on SOA goal in general
Lila: defining outcomes as part of process is fine, but...

--> plan some future discussions on this (Wikimania in-person maybe?)


Data & Developer Hub

Sumana:
Team:

  • MW Core: Brad Jorsch
  • Product Owner: Sumana Harihareswara
  • Others: Juliusz Gonera, Moiz Syed

Details: https://www.mediawiki.org/wiki/Data_%26_Developer_Hub
nutshell: WMF staff (& external community) need better guidance to use our APIs

Sumana: In order for us to be able to rapidly develop new features, and rapidly onboard new developers - both at Wikimedia Foundation and in our open source community - we need documentation. And our current documentation for the main MediaWiki web API is not good enough, which causes a lot of redundant time-wasting questions on IRC and the mailing lists. As we move into a more service-oriented architecture approach, it gets even more important to bootstrap internal developers better. Right now, we have experienced engineers who end up duplicating effort or writing lower-performance code because they don't know what our APIs can do, and of course better documentation would also help our new hires a lot.

So we're creating a Data & Developer Hub to centralize a bunch of scattered resources we have in a lot of different places. This will also help external developers in our community use our APIs, such as our mobile-centric APIs, Wikidata, Upload, Geodata, and so on, and our data dumps and other data sources.

problem to be solved:
lot of redundant communication
not bootstrapping our internal devs well
collect scattered resources in one place
will also help external developers
me in (at least) 6 months: find out which APIs need work most


Q1 goals:

  • developer hub prototype
  • landing page, 3 projects showcased, 3 APIs documented
  • api sandbox functional prototype
  • clear process for future improvements

Ori: is RC stream (recent changes stream) among the projects to be showcased in this quarter?
Sumana: it's a strong candidate, probably first thing we use as example. (PS, after the meeting: See http://juliuszgonera.com/wddh/ prototype which will probably examine RCStream as example)
Ori: I think it should be. It's a combination of something that users have used repeatedly (e.g. "Listen to Wikipedia" got a lot of traction), developers and researchers will use this a lot
Erik: +100, will get used a lot
Sumana: will have "Inspired" section, Listen to Wikipedia will feature there; http://seealso.org/ as part of inspiration for this
but also will make sure we support internal needs for work on priority goals
so, will consider this API among others, holistically

SecurePoll redesign

Dan:
We do a lot of elections in community (e.g. ArbCom, Board)
currently need to hand-write XML specifying setup for SecurePoll each time
Lila: is this an extension or part of MW core? (an extension)
can we use this for internal polling instead of e.g. SurveyMonkey?
Dan: not designed for that use case, but not much of a stretch
Chad: could be worse - imagine we would install Limesurvey again ;)

Team offsite

Rob:
historically team has been less process-oriented, will discuss with Arthur on whether/how to change team process

  • Chad: We're going to either come back with a process or kill each other trying


Detailed allocations - next quarter

(slide)

Questions?

Gabriel: Should perhaps have a separate conversation, but am still wondering about your plan on how to achieve authenticated user perf goals?
Ori: SPDY is very important, haven't had bandwidth for that yet, but would support efforts
Gabriel: in services we are aiming for speeding up authenticated page views by making those page views static / fully cached as well; currently reimplementing a lot of user preference stuff with APIs & client side code to make this possible
Erik: this calls for separate conversation
RobLa: annual goals: https://www.mediawiki.org/wiki/Wikimedia_Engineering/2014-15_Goals#MediaWiki_Core
Q3/Q4 still drafty