User:GWicke/tmp/services talk

Hear, hear! Yeah, we like this one. See my own comments/proposal about MediaWiki as a Service here: User:Owyn/Mediawiki_As_Service. I was going to submit that myself, but rather than cluttering up the RFC page with a second, very similar proposal, I think we could work on merging the two. I'd be happy to help contribute more to this proposal. Owyn (talk) 00:37, 19 December 2013 (UTC)


 * There is indeed a lot of overlap here. I think we are proposing something in between your API and functional decomposition alternatives. Each service should have a clearly defined API, but I don't see a need to have only a single API service.
 * Let's try to merge the two. -- Gabriel Wicke (GWicke) (talk) 00:19, 3 January 2014 (UTC)

Seems like what we're doing already
There isn't really much I can say about this. Yes, we have always used external services, and will continue to do so. The decision as to whether something should be a service or should be integrated will continue to be made on a case-by-case basis. An RFC is meant to be a change proposal, but the picture painted here seems to be pretty similar to what we're doing and what we've always done.

There's not much difference between an HTTP service which takes POSTed data in some application-specific format and an application-specific binary protocol like Memcached or MySQL. Both approaches are limited in the same sense. One has the advantage of a common toolset, the other has better efficiency. Wirth's Law states that we will eventually migrate to the solution which is less efficient but more abstract. This is fine.

The RFC states that the move from NFS to Swift was the first migration of an existing service of MediaWiki to an HTTP service. This is incorrect: the migration of search from MySQL to Lucene.NET in 2005 was the first. But it doesn't seem like an important landmark to me, since we had similar services before that; they just used protocols other than HTTP.

As for SOAP, WSDL, etc., well, we've never used those and nobody is saying we should start. We generally haven't used REST in the past because, strictly defined, its applications are very limited. Swift calls itself REST, but it is unclear to me whether it would qualify as REST under the HATEOAS constraint. Luckily the RFC adds "plain HTTP" as an implementation option in addition to REST, which I think covers everything we've done in the past and everything we're planning on doing in the future.


 * "Currently, each team needs to handle the full stack from the front-end through caching layers and Apaches to the database."

Well yes, on some level this is true, but the situation is mitigated by a great deal of existing modularity. It's not necessary for every team to be aware of all the details of how MySQL or Varnish work, and each team is not required to reimplement those components. And there are plenty of small teams working on internal services, with little need for awareness of the frontend.

There has been some debate as to whether we should have product teams made up of various kinds of specialist, or whether we should have specialist teams which collaborate across teams to produce products. There is something to be said for the product team approach, even if it does require a broad view of the system architecture. The workload for product teams would be reduced by having well-documented modules provided by internal service teams.

Regardless of the level of modularity, software developers will always be faced with the problem of understanding and integrating several different modules.


 * "This tends to promote tight coupling of storage and code using it, which makes independent optimizations of the backend layers difficult. It also often leads to conflicts over backend issues when deployment is getting closer."

This is a good argument to use for splitting out a particular service, but I don't see any general principle. Sometimes a prospective backend layer can be identified for abstraction, sometimes not. Sometimes it makes sense to split the backend layer out into a separate network service, sometimes it doesn't.

-- Tim Starling (talk) 06:03, 20 December 2013 (UTC)


 * I do not think that it is far-fetched to state that current MediaWiki core is not structured as a set of services (local or remote) with narrow interfaces. The reasons are varied, but a large chunk can safely be explained historically and with concerns about third-party shared hosting use and packaging as described in this RFC.


 * "Both approaches are limited in the same sense. One has the advantage of a common toolset, the other has better efficiency. Wirth's Law states that we will eventually migrate to the solution which is less efficient but more abstract. This is fine."


 * I share your concern about efficiency, but actually believe that raising the level of abstraction in interfaces will prove beneficial for performance by enabling important macro-optimizations. Providing a high-level API to ask for a bit of information, rather than writing SQL queries describing how to retrieve it, lets us switch to the most efficient back-end implementation without modifying all consumers of this information. A higher-level interface also typically means that far fewer requests are necessary to perform the same task, which makes fixed per-request overheads less important. At the micro-optimization level, SPDY / HTTP 2.0 and relatively efficient service platforms are bringing per-request overheads down to a point where the dichotomy between efficient low-level protocols and inefficient high-level protocols is disappearing.


 * "The workload for product teams would be reduced by having well-documented modules provided by internal service teams."


 * Agreed, especially for lower-level interfaces. For high-level interfaces between services, on the other hand, I prefer REST-style interfaces for their promotion of narrowness, their support for distribution and external use, and their easier integration into a common parallel IO abstraction. Simplifying parallel IO is good for performance.


 * Regarding team structures, I believe that we can benefit from more emphasis on cross-cutting concerns and shared interfaces between products, and from fewer vertically siloed teams operating in relative isolation. Dividing the responsibility for the design and implementation of a product between sub-teams or just members of different groups (core / services for the back end vs. features / product for the front end, for example) can help to add a stronger horizontal axis to our communication and thinking. With sub-teams starting to negotiate the interface definition early in the development process, the result is more likely to strike a good balance between the needs of, say, the front end and the back end. Cross-cutting concerns and patterns at a given interface layer are more likely to be picked up with more horizontal communication. This has worked well for some teams, and I believe that embracing services at the core of MediaWiki can make this kind of division of labor more widely available to other teams.


 * "Sometimes a prospective backend layer can be identified for abstraction, sometimes not. Sometimes it makes sense to split the backend layer out into a separate network service, sometimes it doesn't."


 * I have a hard time coming up with a significant feature that would not be easier to build and maintain with a better back-end infrastructure, if only to avoid having to deal with details of storage and its optimization. Whether that service is implemented as a network service or local code can be an implementation / optimization question transparent to the consuming code.


 * None of these ideas are terribly earth-shattering or original. The contribution this RFC is intended to make is to more coherently describe this architectural option and its advantages and disadvantages so that we can consider it along with the ideas described in the Architecture guidelines. -- Gabriel Wicke (GWicke) (talk) 03:07, 3 January 2014 (UTC)
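
For illustration only, here is a minimal Python sketch of the point made above about high-level interfaces: consumers ask for a piece of information through a narrow interface, and whether that is served by local code or by a network service stays an implementation detail. Everything in it is an assumption made for the example and is not taken from the RFC or from MediaWiki: the names RevisionStore, LocalRevisionStore, HttpRevisionStore and render_summary, the "/page/<title>" URL layout, and the JSON response with a "text" field are all hypothetical.

```python
import json
import urllib.parse
import urllib.request
from abc import ABC, abstractmethod


class RevisionStore(ABC):
    """Hypothetical narrow interface: ask for information, not how to fetch it."""

    @abstractmethod
    def get_revision_text(self, title: str) -> str:
        """Return the current text of the page with the given title."""


class LocalRevisionStore(RevisionStore):
    """In-process backend; a plain dict stands in for real storage."""

    def __init__(self, pages):
        self._pages = dict(pages)

    def get_revision_text(self, title):
        return self._pages[title]


class HttpRevisionStore(RevisionStore):
    """Remote backend behind an assumed plain-HTTP endpoint.

    The URL layout and the JSON shape ({"text": ...}) are invented
    for this sketch; no existing service is implied.
    """

    def __init__(self, base_url):
        self._base_url = base_url.rstrip("/")

    def get_revision_text(self, title):
        quoted = urllib.parse.quote(title, safe="")
        with urllib.request.urlopen(f"{self._base_url}/page/{quoted}") as resp:
            return json.load(resp)["text"]


def render_summary(store: RevisionStore, title: str) -> str:
    """Consumer code: depends only on the interface, never on the backend."""
    text = store.get_revision_text(title)
    return f"{title}: {len(text)} characters"


if __name__ == "__main__":
    local = LocalRevisionStore({"Main Page": "Hello wiki"})
    print(render_summary(local, "Main Page"))
```

Swapping LocalRevisionStore for HttpRevisionStore would not require touching render_summary, which is the property the discussion above is concerned with. The sketch says nothing about which backend is faster in practice.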