Wikimedia Engineering/Service and REST API team

One of the outcomes of the architecture summit in January was a consensus to move towards a service-oriented architecture. Major reasons for this are:


 * enabling new engineers to be more productive by letting them concentrate on a part of the stack
 * improved reliability and security through fault and security context isolation
 * ease of testing, debugging, monitoring and scaling
 * supporting new uses of our data by rich clients and mobile through a well-structured and efficient public content API

A storage service and a REST content API are particularly good candidates for first steps in this direction. There has been some preliminary work on the Rashomon revision storage service by the Parsoid team and a generic PHP service interface with support for parallel operations by Aaron.

This has happened as side projects, with other team responsibilities taking priority. We need to do more if we really want to reap the benefits of a service-oriented architecture.

The new service team will take on this challenge. It will


 * guide existing work to think in terms of independent services and versioned APIs,
 * help to develop internal and external REST APIs (ex: Flow),
 * help to set up automated tests against those APIs,
 * identify and implement general backend services in support of other teams (ex: storage service), and
 * support packaging for internal and third party distribution.

In the short term, this can help other teams by removing some of their workload, avoiding duplicate work and helping with API implementations.

In the longer term, it will
 * build up a consistent REST content API to enable mobile and innovative feature projects,
 * provide back-end services like the Rashomon storage service, and
 * improve testing and thus the ability to deploy more often without breaking things.

Ultimately, it should help us make real progress towards a service-oriented architecture with strong APIs. It will hopefully also result in improved developer productivity, testability, security and agility as discussed in the SOA RFC.

= Plans for next fiscal =

Note: This is currently in the drafting stage.

Q4 / Q1

 * Implement first iteration of REST API front-end (restface): (G: 1-2m)
 * Alpha deploy to api.wikimedia.org (G: 1w)
 * Simple POC implementation of page metadata end point for Parsoid HTML views (redlinks etc) (M: 1w)
 * PDF deployment: (M: 2m)
 * First iteration on Rashomon (with versioned blob and queue buckets & lame auth) (G: 1-1.5m)
 * Deploy & start using it with Parsoid (G: 2w)
 * Wrap HTML load/save in restface (M: 2w, G: 1w)
 * Citation service (M,G: 2d-3w)
 * Mathoid deploy (M: 2w, G: 1w)
 * HTML templating: documentation, in particular KnockOff compiler & PHP implementation (G: 2d)
 * Iterate once we have feedback from users & think more about use for content & messages (G: 1w)

Q2

 * Security: Design & implement more intelligent security architecture / authentication solution (G: 2m)
 * Efficient page metadata end point for Parsoid HTML views (redlinks etc) (M: 1m, G: 3w)
 * Set up proper Varnish caching and purging (w/ help from ops) (G: 2w)
 * Packaging / deployment -- Make Debian/Ubuntu packages for front-end / PDF / Rashomon (M: 2w)
 * Structured API documentation -- set up frontend (G: 2w)
 * Help other teams like Mobile use page-related storage & build data extraction services (G: 3w)

Q3

 * Job queue runner using storage service queues (M,G: 2w)
 * Help with mobile API / service needs (ongoing)
 * Help with Flow API needs
 * Think about solution (bucket type?) for link table scaling, in collaboration with platform & ops
 * Implement new bucket types in storage service (G: 1m)
 * Prototype HTML content / i18n message templating solution in collaboration with Parsoid & platform (G: 1m)

Q4

 * Implement an efficient CentralNotice end point (M: 1m)
 * Iterate on HTML templating solution (G: 1m)
 * Multi datacentre operation (G: mostly ops, 1w)
 * Possibly implement link table solution in storage service (?)
 * Cacheable Wikidata API? Echo?
 * Possibly look into HTML diffing service for HTML-only operation

Interdependencies

 * Parsoid depends on Rashomon revision storage & content API
 * VE, Flow, Mobile & platform depend on HTML content & page metadata end points
 * Many stakeholders for the storage service (platform, features, mobile, dev community)
 * Many stakeholders for HTML templating (community, platform, features, mobile)

REST API front-end (working title: restface)

 * Goal: support high volume with low latency
 * Varnish caching & reliable purging
 * Usually thin wrapper around back-end services; normal case: just load from storage service
 * If missing, ask other services to create data on demand & save back to storage service
 * Consistent REST API with structured API docs
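
The load-or-generate flow described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not restface's actual design: `StorageService`, `FrontEnd`, `fake_render` and the key `Main_Page/html` are hypothetical names, and an in-memory dict stands in for the real storage back-end.

```python
from typing import Callable, Dict, List, Optional


class StorageService:
    """In-memory stand-in for the storage service (hypothetical interface)."""

    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._data.get(key)

    def put(self, key: str, value: str) -> None:
        self._data[key] = value


class FrontEnd:
    """Thin wrapper: serve from storage, generate on demand when missing."""

    def __init__(self, storage: StorageService,
                 generate: Callable[[str], str]) -> None:
        self.storage = storage
        self.generate = generate  # stand-in for a back-end service

    def get(self, key: str) -> str:
        stored = self.storage.get(key)
        if stored is not None:
            return stored             # normal case: just load from storage
        fresh = self.generate(key)    # missing: ask a back-end service to create it
        self.storage.put(key, fresh)  # save back so the next request is a plain load
        return fresh


calls: List[str] = []  # records how often the back-end service is asked


def fake_render(key: str) -> str:
    """Hypothetical back-end service that renders content for a key."""
    calls.append(key)
    return f"<p>rendered {key}</p>"


front_end = FrontEnd(StorageService(), fake_render)
first = front_end.get("Main_Page/html")   # generated and saved
second = front_end.get("Main_Page/html")  # served from storage
```

In this sketch the back-end service is only called for the first request; the second one is answered from storage, which is what keeps the front-end thin and cacheable.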

Enable move to native Parsoid HTML5 storage & page views

 * Use static Parsoid HTML5 for all page views
 * HTML5 load / save entry point for use by desktop and Mobile page views, VE, content translation and others
 * To power Mobile skin, apps
 * Improve desktop page view latency for editors (currently 50+% higher median page load times)
 * Page metadata entry point for rendering of red links and other bits currently implemented as server-side content transformations
 * Facilitate additional content derivative end points (e.g. Mobile: section loading, citations, section image urls)

Miscellaneous service end points

 * Citation expansion service entry point for VE & others: expand a URL to full citation data using Zotero data extractors
 * CentralNotice banner service

API end point design and prototyping support for other teams

 * Example: Help Flow team in the development of a REST API for use by rich front-end, mobile

Storage service

 * Performance benefits: VE async save, static HTML for mobile front-end & authenticated users
 * Aiming to be able to use this for regular page views in Q2
 * Improved page view performance for editors (currently 50+% slower)
 * Reduce load on PHP cluster (HW cost and energy savings)
 * Enables seamless and fast switching from page view to VE, async saving
 * Support for cross-datacenter replication, compression and even load distribution across storage cluster
 * Helps to solve scaling problems in MySQL (revision table, link tables)

Generalization of storage service to support different bucket types

 * Candidate bucket types, roughly by priority: versioned blob, queue, key-value, ordered key-value, counter
 * Features like authentication, TTL
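
To make the idea of bucket types more concrete, here is a minimal sketch of the two highest-priority candidates, a versioned blob bucket and a queue bucket, behind a common factory. All class, method and type names are assumptions made for illustration; they are not Rashomon's actual interface, and features like authentication and TTL are left out.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class VersionedBlobBucket:
    """Keeps every revision of a blob; reads default to the latest one."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[bytes]] = {}

    def put(self, key: str, blob: bytes) -> int:
        revs = self._revisions.setdefault(key, [])
        revs.append(blob)
        return len(revs) - 1  # revision number, starting at 0

    def get(self, key: str, revision: Optional[int] = None) -> Optional[bytes]:
        revs = self._revisions.get(key)
        if not revs:
            return None
        if revision is None:
            return revs[-1]  # latest revision
        if 0 <= revision < len(revs):
            return revs[revision]
        return None  # unknown revision


class QueueBucket:
    """First-in, first-out queue of messages."""

    def __init__(self) -> None:
        self._items: Deque[bytes] = deque()

    def push(self, item: bytes) -> None:
        self._items.append(item)

    def pop(self) -> Optional[bytes]:
        return self._items.popleft() if self._items else None


# Hypothetical registry mapping a bucket type name to its implementation.
BUCKET_TYPES = {"versioned_blob": VersionedBlobBucket, "queue": QueueBucket}


def create_bucket(bucket_type: str):
    """Create an empty bucket of the requested type."""
    return BUCKET_TYPES[bucket_type]()


pages = create_bucket("versioned_blob")
rev0 = pages.put("Foo", b"first draft")
rev1 = pages.put("Foo", b"second draft")

jobs = create_bucket("queue")
jobs.push(b"job-1")
jobs.push(b"job-2")
```

Further types such as key-value, ordered key-value or counter buckets would be added to the registry in the same way.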

Update & invalidation jobs

 * Ensure that stored data is kept up to date with changes, and front-end caches are invalidated
 * Possibly look into simple HTTP job runner using queue in storage service
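
A very rough sketch of such an update job loop, assuming a queue of changed keys: each job regenerates the stored data and then purges the matching front-end cache entry. The function name, the dict-based storage and cache, and the `regenerate` callback are all hypothetical, and the HTTP transport is omitted entirely.

```python
from collections import deque
from typing import Callable, Deque, Dict


def run_update_jobs(queue: Deque[str],
                    storage: Dict[str, str],
                    cache: Dict[str, str],
                    regenerate: Callable[[str], str]) -> int:
    """Drain the queue; return the number of jobs processed."""
    processed = 0
    while queue:
        key = queue.popleft()
        storage[key] = regenerate(key)  # bring stored data up to date
        cache.pop(key, None)            # purge the stale front-end cache entry
        processed += 1
    return processed


storage = {"A": "old A", "B": "old B"}
cache = {"A": "old A", "B": "old B"}
queue = deque(["A"])  # only page A changed

processed = run_update_jobs(queue, storage, cache, lambda key: f"new {key}")
```

After the run, page A is refreshed in storage and gone from the cache, while the untouched page B is still cached.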

Misc backend services

 * Deploy & maintain PDF render service
 * Maintain Math render service (Mathoid)

Structured API documentation

 * Goals: machine-readable API specs, browsable documentation & sandbox, auto-generated mock APIs
 * Help establish best practices in declarative API documentation using tools like Swagger
 * See this section in the content API RFC
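
As an illustration of how a machine-readable spec can drive an auto-generated mock API, the sketch below holds a tiny Swagger 2.0 style spec in a Python dict and answers requests from the examples it declares. The path `/page/{title}/html`, the example body and the helper names are assumptions made for this sketch, not part of any agreed API.

```python
from typing import Optional, Tuple

# Hypothetical spec fragment, following the Swagger 2.0 layout.
SPEC = {
    "swagger": "2.0",
    "info": {"title": "Content API (hypothetical)", "version": "0.1.0"},
    "paths": {
        "/page/{title}/html": {
            "get": {
                "summary": "Load the stored HTML of a page",
                "produces": ["text/html"],
                "responses": {
                    "200": {
                        "description": "The page HTML",
                        "examples": {"text/html": "<p>Hello</p>"},
                    }
                },
            }
        }
    },
}


def matches(template: str, path: str) -> bool:
    """True if a concrete path fits a template such as /page/{title}/html."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return False
    for t, p in zip(t_parts, p_parts):
        is_param = t.startswith("{") and t.endswith("}")
        if is_param:
            if p == "":
                return False  # a parameter must not be empty
        elif t != p:
            return False
    return True


def mock_response(spec: dict, method: str,
                  path: str) -> Optional[Tuple[int, str, str]]:
    """Return (status, mime type, body) from the first example in the spec."""
    for template, operations in spec["paths"].items():
        operation = operations.get(method.lower())
        if operation is None or not matches(template, path):
            continue
        for status, response in sorted(operation["responses"].items()):
            if not status.isdigit():
                continue  # skip entries such as "default"
            for mime, body in response.get("examples", {}).items():
                return int(status), mime, body
    return None


hit = mock_response(SPEC, "GET", "/page/Main_Page/html")
miss = mock_response(SPEC, "GET", "/unknown")
```

The same spec could also feed browsable documentation and a sandbox, so the mock, the docs and the tests all stay in step with one declarative source.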

Drive automated service testing

 * Mocking
 * Work with QA & Antoine on containerization
 * Try to leverage API specs

Evolve authentication in collaboration with platform

 * Develop security & authentication / authorization architecture in collaboration with platform
 * Least privilege
 * Isolation
 * Efficient for high request volumes
 * Using standards (OAuth2, OpenID Connect)
 * Document authentication requirements clearly in API spec

Deployment and packaging in collaboration with platform, ops

 * Drive packaging of services for practical third-party and internal use
 * Leverage packages as much as possible for deployment, DRY
 * Use Puppet for configuration management

HTML content

 * Continue work on HTML content templating in collaboration with Parsoid & other teams
 * Build on TAssembly, KnockOff
 * Stretch goal: Look into stand-alone HTML diffing service independent from Parsoid

= FAQ =

Division of labor between PHP and REST API
The PHP and REST APIs have different focuses and will be largely complementary. The PHP API offers powerful features including generators, but is not designed to serve content at high request rates: per-request overheads are high, and caching the output is generally not possible. Mostly because of these performance characteristics, it is not commonly consumed by MediaWiki itself.

The REST API on the other hand focuses on simple but high-volume interfaces with a small per-request overhead. It further supports caching, which makes it suitable for directly serving content at a larger scale.

Developers will need clear guidance on which interface they should use for specific tasks. The REST interface will start out fairly small and won't overlap with the PHP API. Over time we can consider gradually migrating simple and high-volume end points (like opensearch) to the REST interface. This will happen in close cooperation with the PHP interface team.

Also see these meeting notes from the API roadmap meeting last September.