Talk:Architecture Summit 2014/Service-oriented architecture

From mediawiki.org

Service Oriented Architecture session

Proposed Agenda:

  1. 5-20 minutes - various lightning talks about the different proposals on the table
  2. 40-60 minutes - discussion of big questions

Service-oriented architecture, a.k.a. Services and narrow interfaces (Slides)

Gabriel Wicke & Faidon Liambotis
  • Nik Everett: is it ok to expand the list of MediaWiki dependencies?
Mark Hershberger:
Yuri Astrakhan:
  • Tim Starling: Part of the motivating factor of this is wanting to rewrite some bits of this in Node, for example. Could we more quickly implement some aspects of SOA by just splitting out classes?
    • Gabriel: Many things that Parsoid has needed were possible via the existing API and some API extensions
  • Brad Jorsch: I'm a little worried that we'll end up with A->B->C->A calling loops, a complicated cycle that doesn't work well
    • FL: on the first point, other projects have been doing this successfully. this is pretty standard
    • OpenStack etc work this way and have healthy dev communities
    • Katie Filbert: map rendering?
    • GW: FL: (nod)
  • Chris Steipp: be careful where drawing lines in security, other concerns -- can be hard to split these things out cleanly.
    Elasticsearch was awesome, but we suffered from Lucene for years without great Java dev interest
  • Matt Flaschen: I agree we should have Debian packages, but it would be nice to have other ways. In some ways, we can get the benefit of just disentangling the PHP code.
  • Tim: I think it's going to be more difficult to set up a development instance. When I had to do lsearchd maintenance, I had to set up 3 different VMs. Swift, I think, required 4. This may argue in favor of PHP services, because we can make the split optional.
    • GW: completely agree. we can have a sqlite backend
    • Nik: I can't rebut because it's true, but we tried this with Cirrus. At least in that case, it's not too bad to set up. But that doesn't make it OK or easy.
    • Mark B: we can solve this by getting better setup.
    • Trevor: VE team has Parsoid service running. I can't help but hear a lot of fear of the unknown.
    • Tim: I'm not proposing that, I think that we can refactor our PHP services. Other comment not related to performance, I was concerned about performance when we added more to hook calls. We're going from a 20 microsecond hook call to a 20 millisecond one
    • GW: well we have no hook call at all because we're caching these things. PHP has a huge startup cost, but we can mitigate by caching things.
  • Brion: general question: anyone who thinks doing things as services is a bad idea?
(one maybe)
  • Chad: I think we need to tread carefully. Storage makes sense. Authentication may not make sense for reasons that Chris said. There's also a certain value in being able to download mediawiki and start hacking at it.
  • GW: how about better packaging?
  • Chad: I think packaging is a pain, but it might help
  • Brion: are VMs a good tool for this?
  • Brion: hands up if you've used Vagrant?
most hands
  • Hands up if you've gotten really confused
many hands
  • Hands up if you still like it
many hands
  • Mark H: we need to dedicate resources to support Vagrant
  • GW: we're in denial about packaging and that we may need to support that better
  • Ori: Debian may be in denial about modern software distribution
  • Faidon: (I didn't catch what he said)
  • Owen: good idea to have a simple implementation and more complicated implementation to prove quality of the interface. As a service, there's 20ms-60ms delay which happens with every request due to initialization overhead; this makes it difficult to use MW itself as a service component
  • GW: I think we can start by moving some things out such as Storage where we have a pilot project.
  • Owen: (missed comment)
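Owen's suggestion of pairing a simple and a more complicated implementation behind one interface, together with GW's mention of a sqlite backend and Storage as a pilot, could be sketched roughly as follows. This is only an illustration: `BlobStore`, `MemoryStore`, `SqliteStore` and `check_contract` are hypothetical names and are not part of MediaWiki or of any proposal discussed in the session.

```python
import sqlite3
from typing import Dict, Optional, Protocol


class BlobStore(Protocol):
    """Hypothetical narrow storage interface (an assumption, not a MediaWiki API)."""

    def put(self, key: str, value: bytes) -> None: ...

    def get(self, key: str) -> Optional[bytes]: ...


class MemoryStore:
    """Simple implementation: a plain dict, enough for a developer setup."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)


class SqliteStore:
    """More involved implementation: the same interface backed by SQLite."""

    def __init__(self, path: str = ":memory:") -> None:
        self._db = sqlite3.connect(path)
        self._db.execute(
            "CREATE TABLE IF NOT EXISTS blobs (key TEXT PRIMARY KEY, value BLOB)"
        )

    def put(self, key: str, value: bytes) -> None:
        # The connection context manager commits on success, rolls back on error.
        with self._db:
            self._db.execute(
                "INSERT OR REPLACE INTO blobs (key, value) VALUES (?, ?)",
                (key, value),
            )

    def get(self, key: str) -> Optional[bytes]:
        row = self._db.execute(
            "SELECT value FROM blobs WHERE key = ?", (key,)
        ).fetchone()
        return None if row is None else bytes(row[0])


def check_contract(store: BlobStore) -> bool:
    """Run the same behavioural checks against any implementation."""
    store.put("Main_Page", b"v1")
    store.put("Main_Page", b"v2")  # a second put overwrites the first
    return store.get("Main_Page") == b"v2" and store.get("missing") is None
```

Running `check_contract(MemoryStore())` and `check_contract(SqliteStore())` exercises both backends through the same calls, which is one way the interface itself gets tested rather than a single implementation of it.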
  • Yuri: we have a number of services implemented. First step: set up the procedure so that 3rd parties and Wikimedia all follow a similar path and all use something like Puppet. Once it has been proven to work with all three use cases, we know that it works. We should minimize impact on small 3rd party users
    1. large cluster deployment
    2. small deployments
    3. developers
    try to cover these cases well
  • Roan: support what Yuri said. There's value in starting with smaller pieces that are less controversial. Deploy, see what happens. I initially got up here to talk about the Tim/GW conversation. GW was talking about Parser Hooks, Tim was talking about general hooks. Everything should be hookable, but you generally shouldn't have to have cross-service hooks... no crossing service boundaries. If you have to do something to a service that needs something from another service, you are probably doing it wrong.
  • Dan: I think I'm in the minority as a non-PHP dev. Developer access to service oriented architecture: I'd be much more likely to contribute if there were smaller services I could contribute to. The bottleneck isn't how hard the dev environment is to set up.
    • Trevor: I'd like to offer that firing up MW shouldn't mean you have to fire up stuff on localhost just to get started. Most dev environments try to mimic Wikipedia *a little bit* .... with services you can use real ones and it's a more similar setup
    • Markus Glaser: I agree with Trevor's idea. I see a lot of use cases for NGOs, and they have very little technical skill. If I tell them to use Vagrant or a Debian package, they just won't do it. I'm afraid that if we do this, there won't be anyone to make sure that small use cases can still be supported.
  • Max S: while I generally agree with the stuff here, I don't think SOA makes sense for everything. I think it's all case-by-case.
  • F: we are discussing whether it's a sound concept,
    • GW: whether to keep doing this stuff in core....
    • Max: present section is a bit misleading (referring to second to last slide). we had a service before with NFS
  • Katie: translation should be a service that can be used outside of MediaWiki, possibly splitting out that code for other PHP developers
  • Tim: talk again about PHP services. A few other people mentioned problems (Roan, others). Trevor said you could use faux requests; I don't think that's an answer.
    Thinking more of an RPC than a REST architecture -- the call is agnostic to whether it is local or not; send the necessary configuration with the data so you don't have to configure the other side where possible
  • GW: in the Java land it was not very successful. (RPC calls can hide where errors occur)
  • Tim: using an unconfigured MW for the services would cut down on startup time
  • GW: Separate issue -- startup time
    External vs internal interfaces -- try to keep API surface area small
  • Tim: another possibility to avoid startup overhead: persistent MW process (easier under HipHop?)
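The location-agnostic call Tim describes above, with the configuration travelling alongside the data, might look roughly like this. It is a minimal sketch under stated assumptions: `render_title`, `LocalTransport`, `SerializingTransport` and `call_service` are hypothetical names, and the "remote" transport only round-trips through JSON in-process as a stand-in for a real network hop.

```python
import json
from typing import Any, Callable, Dict

Request = Dict[str, Any]
Handler = Callable[[Request], Dict[str, Any]]


def render_title(request: Request) -> Dict[str, Any]:
    """Hypothetical service handler.

    It reads its configuration from the request itself, so the serving
    side needs no configuration of its own.
    """
    config = request["config"]
    title = request["data"]["title"]
    return {"url": config["base_url"] + title.replace(" ", "_")}


class LocalTransport:
    """Calls the handler directly in the same process."""

    def __init__(self, handler: Handler) -> None:
        self._handler = handler

    def call(self, request: Request) -> Dict[str, Any]:
        return self._handler(request)


class SerializingTransport:
    """Stand-in for a remote transport (assumption: JSON on the wire).

    A real transport would send the serialized bytes to another host;
    here request and reply are only round-tripped through JSON.
    """

    def __init__(self, handler: Handler) -> None:
        self._handler = handler

    def call(self, request: Request) -> Dict[str, Any]:
        wire_request = json.dumps(request)
        wire_reply = json.dumps(self._handler(json.loads(wire_request)))
        return json.loads(wire_reply)


def call_service(transport: Any, data: Dict[str, Any],
                 config: Dict[str, Any]) -> Dict[str, Any]:
    """The caller's code is identical whichever transport is plugged in."""
    return transport.call({"data": data, "config": config})
```

With `config = {"base_url": "https://example.org/wiki/"}`, calling `call_service` with either transport and `{"title": "Main Page"}` yields the same result. GW's caveat still applies: a real remote transport adds timeouts and network failures that a uniform call signature can hide, so a fuller version would need to surface those errors explicitly.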
  • Mike Schwartz: this is the best option we have. Best way to break up a 3 million line code base. There are other websites nearly as big as Wikipedia that are operating successfully with SOA. Gives us freedom to evolve. Our cost in bringing up new developers is mastering the monolith. Ease of development for current devs is not what we should be optimizing for. We should be optimizing for the next 50 or 100 developers we bring up to speed. Let's do this. We're not breaking new ground here... we're following ground that was broken 5 years ago. Let's be bold.
  • Tyler: I agree with Mike. PHP is focused on things that you start up and die, it's not for persistence. I guess what I'm trying to say is that an SOA is the best option for a PHP based program.
  • Bryan: New to MW development, but has been doing software development for some ... amount of time. I think the things that Tim said and Gabriel said aren't at odds, but seeing each other's point needs to happen. To build a system, you need to draw the lines as a first step. You don't have to restrict yourself to one particular implementation. The discussion is just that we have to make it scale from enwiki to bob's train codebase.
  • Brad: while it may be easier
  • Erik: Is anyone proposing specific services that will be *required* separately, making MW dependent