Talk:Requests for comment/Services and narrow interfaces

From MediaWiki.org

Hear hear! Yeah, we like this one... see my own comments and proposal about MediaWiki as a Service here: User:Owyn/Mediawiki_As_Service. I was going to submit that myself, but rather than cluttering up the RFC page with a second, very similar proposal, I think we could work on merging the two. I'd be happy to help contribute more to this proposal. Owyn (talk) 00:37, 19 December 2013 (UTC)

There is indeed a lot of overlap here. I think we are advocating something in between your API and functional decomposition alternatives. Each service should have a clearly defined API, but I don't see a need to have only a single API service.
Let's try to merge the two. Can you add the content you think is missing? -- Gabriel Wicke (GWicke) (talk) 00:19, 3 January 2014 (UTC)

Seems like what we're doing already

There isn't really much I can say about this. Yes, we have always used external services, and will continue to do so. The decision as to whether something should be a service or should be integrated will continue to be made on a case-by-case basis. An RFC is meant to be a change proposal, but the picture painted here seems to be pretty similar to what we're doing and what we've always done.

There's not much difference between an HTTP service which takes POSTed data in some application-specific format and an application-specific binary protocol like Memcached or MySQL. Both approaches are limited in the same sense. One has the advantage of a common toolset, the other has better efficiency. Wirth's Law states that we will eventually migrate to the solution which is less efficient but more abstract. This is fine.
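To illustrate the point that both approaches are just application-specific framing over a byte stream, here is a minimal sketch that builds the request bytes for a Memcached text-protocol `set` (Memcached also has a binary variant) and for an HTTP/1.1 POST carrying the same value. The host `cache.example`, the `/store/...` path, and the raw-bytes body are assumptions for illustration only and do not correspond to any real Wikimedia service.

```python
def memcached_set(key: str, value: bytes, exptime: int = 0, flags: int = 0) -> bytes:
    # Memcached text protocol: "set <key> <flags> <exptime> <bytes>\r\n<data>\r\n"
    header = f"set {key} {flags} {exptime} {len(value)}\r\n".encode("ascii")
    return header + value + b"\r\n"


def http_post(host: str, path: str, value: bytes) -> bytes:
    # HTTP/1.1 POST whose body is an application-specific payload (raw bytes here).
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/octet-stream\r\n"
        f"Content-Length: {len(value)}\r\n"
        "\r\n"
    ).encode("ascii")
    return head + value


if __name__ == "__main__":
    payload = b"<p>Hello</p>"
    mc = memcached_set("enwiki:page:Foo", payload)
    hp = http_post("cache.example", "/store/enwiki:page:Foo", payload)
    # Both messages are a small header followed by the same payload;
    # only the size of the framing differs.
    print(len(mc) - len(payload), len(hp) - len(payload))
```

The HTTP header is larger, which is the efficiency cost mentioned above; in exchange, the HTTP message can be handled by generic tooling (proxies, caches, debuggers).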

The RFC states that the move from NFS to Swift was the first migration of an existing service of MediaWiki to an HTTP service. This is incorrect: the migration of search from MySQL to Lucene.NET in 2005 was the first. But it doesn't seem like an important landmark to me, since we had similar services before that; they just used protocols other than HTTP.

As for SOAP, WSDL, etc., well, we've never used those and nobody is saying we should start. We generally haven't used REST in the past because, strictly defined ([1] etc.), its applications are very limited. Swift calls itself REST, but it is unclear to me whether it would qualify as REST under the w:HATEOAS constraint. Luckily the RFC adds "plain HTTP" as an implementation option in addition to REST, which I think covers everything we've done in the past and everything we're planning to do in the future.
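The distinction between "plain HTTP" and REST under the HATEOAS constraint can be sketched as follows: a plain-HTTP client builds URLs from out-of-band knowledge, while a HATEOAS client starts at a root resource and only follows links returned by the server. The in-memory `SERVER` dictionary, the `_links` field, and the URL layout are all hypothetical; this is not Swift's or MediaWiki's actual API.

```python
import json

# Hypothetical in-memory "server": URL -> JSON response body.
SERVER = {
    "/": json.dumps({"_links": {"pages": "/v1/pages"}}),
    "/v1/pages": json.dumps({"_links": {"Main_Page": "/v1/pages/Main_Page"}}),
    "/v1/pages/Main_Page": json.dumps({"html": "<p>Welcome</p>", "_links": {}}),
}


def get(url: str) -> dict:
    """Stand-in for an HTTP GET that returns parsed JSON."""
    return json.loads(SERVER[url])


def plain_http_client(title: str) -> str:
    # The client hard-codes the URL scheme (out-of-band knowledge).
    return get(f"/v1/pages/{title}")["html"]


def hateoas_client(title: str) -> str:
    # The client discovers every URL by following server-provided links.
    root = get("/")
    pages = get(root["_links"]["pages"])
    return get(pages["_links"][title])["html"]
```

If the server moved pages to a new prefix and updated its links, only `hateoas_client` would keep working unchanged; that strictness is what limits the practical applications of REST in the sense described above.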

"Currently, each team needs to handle the full stack from the front-end through caching layers and Apaches to the database."

Well yes, on some level this is true, but the situation is mitigated by a great deal of existing modularity. It's not necessary for every team to be aware of all the details of how MySQL or Varnish work, and each team is not required to reimplement those components. And there are plenty of small teams working on internal services, with little need for awareness of the frontend.

There has been some debate as to whether we should have product teams made up of various kinds of specialists, or whether we should have specialist teams which collaborate across teams to produce products. There is something to be said for the product team approach, even if it does require a broad view of the system architecture. The workload for product teams would be reduced by having well-documented modules provided by internal service teams.

Regardless of the level of modularity, software developers will always be faced with the problem of understanding and integrating several different modules.

"This tends to promote tight coupling of storage and code using it, which makes independent optimizations of the backend layers difficult. It also often leads to conflicts over backend issues when deployment is getting closer."

This is a good argument to use for splitting out a particular service, but I don't see any general principle. Sometimes a prospective backend layer can be identified for abstraction, sometimes not. Sometimes it makes sense to split the backend layer out into a separate network service, sometimes it doesn't.

-- Tim Starling (talk) 06:03, 20 December 2013 (UTC)

Current MediaWiki core is not structured around a set of services with narrow interfaces. The reasons are varied, but a large part can safely be attributed to history and to concerns about third-party shared hosting use and packaging (for networked services), as described in this RFC.
"Both approaches are limited in the same sense. One has the advantage of a common toolset, the other has better efficiency. Wirth's Law states that we will eventually migrate to the solution which is less efficient but more abstract. This is fine."
I share your concern about efficiency, but I actually believe that raising the level of abstraction in interfaces will prove beneficial for performance by enabling important macro-optimizations. Providing a high-level API to ask for a piece of information, rather than writing SQL queries that describe how to retrieve it, lets us switch to the most efficient back-end implementation without modifying all consumers. A higher-level interface also typically means that far fewer requests are necessary to perform the same task, which makes fixed per-request overheads less important. At the micro-optimization level, SPDY / HTTP 2.0 and relatively efficient service platforms are bringing per-request overheads down to a point where the dichotomy between efficient low-level protocols and inefficient high-level protocols is disappearing.
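A minimal sketch of this argument, assuming a hypothetical high-level `PageHtmlService` interface that returns rendered HTML for a batch of titles. Consumers depend only on that interface, so the backend can be switched (here between an SQLite table and a plain dictionary standing in for a cache) without touching consumer code, and a batched call needs one request however many titles are asked for. All class and method names are invented for illustration.

```python
import sqlite3
from typing import Protocol


class PageHtmlService(Protocol):
    """Hypothetical high-level interface: ask for *what*, not *how*."""

    def get_html(self, titles: list[str]) -> dict[str, str]: ...


class SqliteBackend:
    """One possible implementation backed by SQL (in-memory for the sketch)."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.db = sqlite3.connect(":memory:")
        self.db.execute("CREATE TABLE page (title TEXT PRIMARY KEY, html TEXT)")
        self.db.executemany("INSERT INTO page VALUES (?, ?)", pages.items())
        self.requests = 0

    def get_html(self, titles: list[str]) -> dict[str, str]:
        if not titles:
            return {}
        self.requests += 1  # one batched round trip per call
        placeholders = ",".join("?" for _ in titles)
        rows = self.db.execute(
            f"SELECT title, html FROM page WHERE title IN ({placeholders})", titles
        )
        return dict(rows.fetchall())


class DictBackend:
    """An alternative implementation, e.g. a key-value cache."""

    def __init__(self, pages: dict[str, str]) -> None:
        self.pages = dict(pages)
        self.requests = 0

    def get_html(self, titles: list[str]) -> dict[str, str]:
        if not titles:
            return {}
        self.requests += 1
        return {t: self.pages[t] for t in titles if t in self.pages}


def render_sidebar(service: PageHtmlService, titles: list[str]) -> str:
    # The consumer neither knows nor cares which backend it is talking to.
    html = service.get_html(titles)
    return "".join(html[t] for t in titles if t in html)
```

Replacing `SqliteBackend` with `DictBackend` (or any faster store) changes no line of `render_sidebar`; with a low-level, per-page interface, the same consumer would instead issue one request per title.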
"The workload for product teams would be reduced by having well-documented modules provided by internal service teams."
Agreed, especially for lower-level interfaces. For high-level interfaces between services I prefer REST-style interfaces for their promotion of narrowness, their support for distribution and external use, and their easier integration into a common parallel I/O abstraction. Simplifying parallel I/O is good for performance.
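Why a common parallel I/O abstraction matters for performance can be sketched with Python's `asyncio`: independent service requests issued concurrently take roughly as long as the slowest one, rather than the sum of all of them. The `call_service` function and its latencies are hypothetical; `asyncio.sleep` stands in for network round trips.

```python
import asyncio
import time


async def call_service(name: str, latency: float) -> str:
    """Hypothetical service call; asyncio.sleep simulates network latency."""
    await asyncio.sleep(latency)
    return f"{name}: ok"


async def fetch_sequential(calls: list[tuple[str, float]]) -> list[str]:
    # Each request waits for the previous one: total time ~ sum of latencies.
    return [await call_service(name, lat) for name, lat in calls]


async def fetch_parallel(calls: list[tuple[str, float]]) -> list[str]:
    # All requests are in flight at once: total time ~ the largest latency.
    # gather() returns results in the order the calls were passed in.
    return list(await asyncio.gather(*(call_service(n, l) for n, l in calls)))


if __name__ == "__main__":
    calls = [("parser", 0.1), ("search", 0.1), ("media", 0.1)]
    for fetch in (fetch_sequential, fetch_parallel):
        start = time.perf_counter()
        results = asyncio.run(fetch(calls))
        print(fetch.__name__, results, f"{time.perf_counter() - start:.2f}s")
```

When every backend is reached through the same kind of request, one helper like `fetch_parallel` covers all of them; mixing several protocols would need a separate concurrency mechanism for each.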
Regarding team structures, I believe that we can benefit from more cross-team communication and shared interfaces / infrastructure between products. Dividing the responsibility for the design and implementation of a product between sub-teams or just members of different groups can help to add a stronger group axis to our communication and thinking. With sub-teams starting to negotiate the interface definition early in the development process, the result is more likely to strike a good balance between the needs of, for example, the front vs. back-end. Common patterns and interface needs are more likely to be picked up between members of a group working on similar parts of different projects. New engineers don't need to know the full stack to be productive, and feature teams can gradually build on more commonly useful infrastructure provided to them by other groups.
"Sometimes a prospective backend layer can be identified for abstraction, sometimes not."
I have a hard time coming up with any non-trivial feature that does not have any identifiable back-end layer, or does not benefit from higher-level back-end services. Whether those services are implemented as a network service or local code can be an implementation / optimization question transparent to the consuming code, especially when it is accessed through a REST-style interface.
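A minimal sketch of how local-versus-network can stay transparent to consumers behind a REST-style interface. The consumer issues a path such as `/page/{title}/summary`; a hypothetical `LocalTransport` dispatches it to in-process code, while a hypothetical `HttpTransport` would send the same path to a remote service (for example at an assumed `http://service.example`). The class names, route layout, and response fields are illustrative assumptions, not an existing MediaWiki API.

```python
import json
import urllib.request
from typing import Callable, Protocol

Handler = Callable[[str], dict]


class Transport(Protocol):
    def get(self, path: str) -> dict: ...


class LocalTransport:
    """Dispatches REST-style paths to in-process handlers (no network)."""

    def __init__(self) -> None:
        self.routes: dict[str, Handler] = {}

    def register(self, prefix: str, handler: Handler) -> None:
        self.routes[prefix] = handler

    def get(self, path: str) -> dict:
        for prefix, handler in self.routes.items():
            if path.startswith(prefix):
                return handler(path[len(prefix):])
        raise KeyError(f"no handler for {path}")


class HttpTransport:
    """Sends the same paths to a remote service over HTTP."""

    def __init__(self, base_url: str) -> None:
        self.base_url = base_url.rstrip("/")

    def get(self, path: str) -> dict:
        with urllib.request.urlopen(self.base_url + path) as resp:
            return json.load(resp)


def page_summary(transport: Transport, title: str) -> str:
    # Consumer code is identical whichever transport is plugged in.
    return transport.get(f"/page/{title}/summary")["summary"]


if __name__ == "__main__":
    local = LocalTransport()
    local.register("/page/", lambda rest: {"summary": f"Summary of {rest.split('/', 1)[0]}"})
    print(page_summary(local, "Main_Page"))
    # Swapping in HttpTransport("http://service.example") needs no consumer change.
```

Moving the summary code out of process then becomes a deployment decision: only the transport passed to `page_summary` changes.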
None of these ideas are terribly earth-shattering or original. The contribution this RFC is intended to make is to describe this architectural option and its advantages and disadvantages more coherently, so that we can consider it along with the ideas described in the Architecture guidelines. -- Gabriel Wicke (GWicke) (talk) 03:07, 3 January 2014 (UTC)

Shared hosting and small hosts

We need robust figures for the alleged similarity in cost between virtual servers and shared hosting. I remember some examples being given, but not a real market survey. Statistics on what people are actually using can be produced with queries such as wikiapiary:Host:Hosts makes ([2] would help improve the dataset). A non-negligible number of wikis run on free-of-charge shared hosting, in addition to those running on free wiki farms.

As for the importance of such users, my (biased) opinion is that they are the source of most of the positive innovation in MediaWiki. They are probably also the source of most of the MediaWiki code around: after all, most MediaWiki sysadmins are forced to be part-time developers as well. Indeed, running MediaWiki has never been as easy as running WordPress, and dozens of CMSes are more common than MediaWiki by one to three orders of magnitude. Moreover, what WMF, Wikia, wikiHow etc. make specifically for themselves is rarely used by non-WMF wikis.

--Nemo 14:35, 16 January 2015 (UTC) P.S.: Finally, I don't like that non-WMF users are confined to a subsection of this RfC, as if users were a roadblock rather than the lifeblood of MediaWiki. This RfC can't look serious to me until it becomes more balanced: the second and third sections should be split into three equal sections on benefits for all users, for Wikimedia projects, and for non-Wikimedia wikis.

@Nemo_bis: Regarding costs, some examples are [3][4][5][6]. The first is from one of the largest European hosting companies (OVH), and offers 1 GB of RAM and 10 GB of disk for $3/month. I think it's safe to say that you can run MediaWiki with Parsoid etc. on your own instance for $5/month or less, and prices continue to fall. -- Gabriel Wicke (GWicke) (talk) 16:45, 16 January 2015 (UTC)

Running WordPress on DigitalOcean is one of the most common questions from newcomers, owing to the popularity of cloud hosting. Over the past decade, traditional hosting has been losing ground because of performance and security concerns.

MZMcBride on security section

Isn't it trivial to argue that more points of entry and additional complexity (for example, introducing new programming languages such as Java or Hack or whatever) substantially increase the attack surface? --MZMcBride (talk) 20:37, 14 January 2015 (UTC)