Wikimedia Services/Roadmap

Note: This is currently in the drafting stage.

Q4 / Q1

 * Implement first iteration of REST API front-end (restface) (G: 1-2m)
 * Alpha deploy to api.wikimedia.org (G: 1w)
 * Simple POC implementation of page metadata end point for Parsoid HTML views (redlinks etc) (M: 1w)
 * PDF deployment (M: 2m)
 * First iteration on Rashomon (with versioned blob and queue buckets & rudimentary auth) (G: 1-1.5m)
 * Deploy & start using it with Parsoid (G: 2w)
 * Wrap HTML load/save in restface (M: 2w, G: 1w)
 * Citation service (M,G: 2d-3w)
 * Mathoid deploy (M: 2w, G: 1w)
 * HTML templating: documentation, in particular KnockOff compiler & PHP implementation (G: 2d)
 * Iterate once we have feedback from users & think more about use for content & messages (G: 1w)

Q2

 * Security: Design & implement more intelligent security architecture / authentication solution (G: 2m)
 * Efficient page metadata end point for Parsoid HTML views (redlinks etc) (M: 1m, G: 3w)
 * Set up proper Varnish caching and purging (w/ help from ops) (G: 2w)
 * Packaging / deployment -- Make Debian/Ubuntu packages for frontend / PDF / Rashomon (M: 2w)
 * Structured API documentation -- set up frontend (G: 2w)
 * Help other teams like Mobile use page-related storage & build data extraction services (G: 3w)

Q3

 * Job queue runner using storage service queues (M,G: 2w)
 * Help with mobile API / service needs (ongoing)
 * Help with Flow API needs
 * Think about solution (bucket type?) for link table scaling, in collaboration with platform & ops
 * Implement new bucket types in storage service (G: 1m)
 * Prototype HTML content / i18n message templating solution in collaboration with Parsoid & platform (G: 1m)

Q4

 * Implement an efficient CentralNotice end point (M: 1m)
 * Iterate on HTML templating solution (G: 1m)
 * Multi-datacenter operation (G: mostly ops, 1w)
 * Possibly implement link table solution in storage service (?)
 * Possibly a cacheable Wikidata API? Echo?
 * Possibly look into HTML diffing service for HTML-only operation

Interdependencies

 * Parsoid depends on Rashomon revision storage & content API
 * VE, Flow, Mobile & platform depend on HTML content & page metadata end points
 * Lots of stakeholders on storage service (platform, features, mobile, dev community)
 * Lots of stakeholders on HTML templating (community, platform, features, mobile)
 * We depend on ops for provisioning, deployment & monitoring

= Details on individual projects =

REST API front-end (working title: restface)

 * Goal: support high volume with low latency
 * Varnish caching & reliable purging
 * Usually thin wrapper around back-end services; normal case: just load from storage service
 * If missing, ask other services to create data on demand & save back to storage service
 * Consistent REST API with structured API docs
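
The load / create-on-demand / save-back flow above can be sketched as a small read-through handler. This is only an illustration: `InMemoryStorage`, `handle_get` and `fake_render` are hypothetical names standing in for the storage service, the front-end handler and a back-end service, not the actual restface or Rashomon interfaces.

```python
from typing import Callable, Dict, List, Optional


class InMemoryStorage:
    """Stand-in for the storage service (hypothetical interface)."""

    def __init__(self) -> None:
        self._blobs: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        return self._blobs.get(key)

    def put(self, key: str, value: str) -> None:
        self._blobs[key] = value


def handle_get(key: str, storage: InMemoryStorage,
               generate: Callable[[str], str]) -> str:
    """Serve from storage; on a miss, generate on demand and save back."""
    stored = storage.get(key)
    if stored is not None:
        return stored          # normal case: just load from storage
    fresh = generate(key)      # miss: ask a back-end service to create the data
    storage.put(key, fresh)    # save back so the next request is a plain load
    return fresh


calls: List[str] = []


def fake_render(key: str) -> str:
    """Pretend back-end service that records how often it is called."""
    calls.append(key)
    return "<p>rendered " + key + "</p>"


storage = InMemoryStorage()
first = handle_get("Main_Page/html", storage, fake_render)
second = handle_get("Main_Page/html", storage, fake_render)
```

The second request is answered from storage, so the back-end service is only asked once per missing item.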

Enable move to native Parsoid HTML5 storage & page views

 * Use static Parsoid HTML5 for all page views
 * HTML5 load / save entry point for use by desktop and Mobile page views, VE, content translation and others
 * Powers the Mobile skin and apps
 * Improve desktop page view latency for editors (currently 50+% higher median page load times)
 * Page metadata entry point for rendering of red links and other bits currently implemented as server-side content transformations
 * Facilitate additional content derivative end points (e.g. Mobile: section loading, citations, section image urls)

Miscellaneous service end points

 * Citation expansion service entry point for VE & others: expand a URL to full citation data using Zotero data extractors
 * CentralNotice banner service

API end point design and prototyping support for other teams

 * Example: Help the Flow team develop a REST API for use by rich front-ends and mobile

Storage service

 * See RFC for background
 * Aiming to be able to use this for regular page views in Q2
 * Improved page view performance for editors (currently 50+% slower)
 * Reduce load on PHP cluster (HW cost and energy savings)
 * Enables seamless and fast switching from page view to VE, async saving
 * Support for cross-datacenter replication, compression and even load distribution across storage cluster
 * Helps to solve scaling problems in MySQL (revision table, link tables)

Generalization of storage service to support different bucket types

 * Candidate bucket types, roughly by priority: versioned blob, queue, key-value, ordered key-value, counter
 * Features like authentication, TTL
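
As a rough illustration of the first two candidate bucket types, the sketch below models a versioned blob bucket and a queue bucket in memory. The class and method names are assumptions made for this example, not the storage service's actual API, and features like authentication and TTL are left out.

```python
from collections import deque
from typing import Deque, Dict, List, Optional


class VersionedBlobBucket:
    """Keeps every revision of a blob under its key (hypothetical API)."""

    def __init__(self) -> None:
        self._revisions: Dict[str, List[bytes]] = {}

    def put(self, key: str, blob: bytes) -> int:
        """Append a new revision and return its 1-based revision number."""
        revs = self._revisions.setdefault(key, [])
        revs.append(blob)
        return len(revs)

    def get(self, key: str, revision: Optional[int] = None) -> Optional[bytes]:
        """Return the requested revision, or the latest one if none is given."""
        revs = self._revisions.get(key)
        if not revs:
            return None
        if revision is None:
            return revs[-1]
        if 1 <= revision <= len(revs):
            return revs[revision - 1]
        return None


class QueueBucket:
    """First-in, first-out queue of messages (hypothetical API)."""

    def __init__(self) -> None:
        self._items: Deque[str] = deque()

    def enqueue(self, message: str) -> None:
        self._items.append(message)

    def dequeue(self) -> Optional[str]:
        """Return the oldest message, or None if the queue is empty."""
        return self._items.popleft() if self._items else None


pages = VersionedBlobBucket()
pages.put("Foo", b"<p>first</p>")
pages.put("Foo", b"<p>second</p>")

jobs = QueueBucket()
jobs.enqueue("update:Foo")
```

Giving each bucket type its own small interface keeps the differences (revision history versus ordering) explicit while they share the same storage back-end.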

Update & invalidation jobs

 * Ensure that stored data is kept up to date with changes, and front-end caches are invalidated
 * Possibly look into simple HTTP job runner using queue in storage service
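
A minimal sketch of such a job runner, assuming the queue holds page titles that need re-rendering. The `render` and `purge` callables are hypothetical stand-ins for a back-end service call and a front-end cache purge; a real runner would issue HTTP requests instead.

```python
from collections import deque
from typing import Callable, Deque, Dict, List


def run_update_jobs(queue: Deque[str], storage: Dict[str, str],
                    render: Callable[[str], str],
                    purge: Callable[[str], None]) -> int:
    """Drain the queue: refresh stored data first, then purge the cache."""
    processed = 0
    while queue:
        title = queue.popleft()
        storage[title] = render(title)  # bring stored data up to date
        purge(title)                    # then invalidate the front-end cache
        processed += 1
    return processed


storage: Dict[str, str] = {"Foo": "<p>old</p>"}
purged: List[str] = []
jobs: Deque[str] = deque(["Foo", "Bar"])

count = run_update_jobs(
    jobs,
    storage,
    lambda title: "<p>new " + title + "</p>",  # pretend re-render
    purged.append,                             # pretend cache purge
)
```

Updating storage before purging means that a cache refill triggered right after the purge already sees the new data.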

Misc backend services

 * Deploy & maintain PDF render service
 * Maintain Math render service (Mathoid)

Structured API documentation

 * Goals: machine-readable API specs; browsable documentation & sandbox; auto-generated mock APIs
 * Help establish best practices in declarative API documentation using tools like Swagger
 * See this section in the content API RFC
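
To illustrate how a machine-readable spec could drive an auto-generated mock API, here is a small sketch. The spec layout is a simplified, made-up structure loosely inspired by Swagger rather than the real Swagger schema, and the `/page/{title}/html` path and the `example` field are assumptions for this example only.

```python
from typing import Any, Dict, Optional, Tuple

# Hypothetical, simplified spec: path templates mapped to methods and examples.
SPEC: Dict[str, Any] = {
    "paths": {
        "/page/{title}/html": {
            "get": {
                "summary": "Load the HTML of a page",
                "example": {"status": 200, "body": "<p>Example page</p>"},
            }
        }
    }
}


def match_path(template: str, path: str) -> Optional[Dict[str, str]]:
    """Return the path parameters if `path` fits `template`, else None."""
    t_parts = template.strip("/").split("/")
    p_parts = path.strip("/").split("/")
    if len(t_parts) != len(p_parts):
        return None
    params: Dict[str, str] = {}
    for t, p in zip(t_parts, p_parts):
        if t.startswith("{") and t.endswith("}"):
            params[t[1:-1]] = p   # template variable: capture the segment
        elif t != p:
            return None           # literal segment does not match
    return params


def mock_request(spec: Dict[str, Any], method: str, path: str) -> Tuple[int, str]:
    """Answer a request with the example response declared in the spec."""
    for template, operations in spec["paths"].items():
        if match_path(template, path) is not None and method.lower() in operations:
            example = operations[method.lower()]["example"]
            return example["status"], example["body"]
    return 404, "not found"


status, body = mock_request(SPEC, "GET", "/page/Main_Page/html")
```

Because the mock reads only the spec, clients can be developed and tested against an end point before its implementation exists.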

Drive automated service testing

 * Mocking
 * Work with QA & Antoine on containerization
 * Try to leverage API specs

Evolve authentication in collaboration with platform

 * Develop security & authentication / authorization architecture in collaboration with platform
 * Least privilege
 * Isolation
 * Efficient for high request volumes
 * Using standards (OAuth2, OpenID Connect)
 * Document authentication requirements clearly in API spec
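
One way to combine least privilege with requirements documented per end point is a deny-by-default scope check, sketched below. The routes and scope names are purely illustrative assumptions; in practice the requirements would come from the API spec and the granted scopes from an OAuth2 token.

```python
from typing import Dict, FrozenSet, Tuple

# Hypothetical per-endpoint requirements, as they might be listed in an API spec.
REQUIRED_SCOPES: Dict[Tuple[str, str], FrozenSet[str]] = {
    ("GET", "/page/html"): frozenset(),
    ("PUT", "/page/html"): frozenset({"page:write"}),
    ("DELETE", "/page/html"): frozenset({"page:write", "page:delete"}),
}


def is_allowed(method: str, route: str, granted: FrozenSet[str]) -> bool:
    """Allow a request only if every required scope has been granted."""
    required = REQUIRED_SCOPES.get((method, route))
    if required is None:
        return False              # unknown end points are denied by default
    return required <= granted    # subset test: all required scopes present


reader: FrozenSet[str] = frozenset()
editor: FrozenSet[str] = frozenset({"page:write"})
```

The check is a single dictionary lookup and a set comparison, so it stays cheap at high request volumes.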

Deployment and packaging in collaboration with platform & ops

 * Drive packaging of services for practical third-party and internal use
 * Leverage packages as much as possible for deployment, DRY
 * Use Puppet for configuration management

HTML content

 * Continue work on HTML content & i18n message templating in collaboration with Parsoid & other teams
 * Build on TAssembly, KnockOff
 * Stretch goal: Look into stand-alone HTML diffing service independent from Parsoid