Wikimedia Enterprise

The OKAPI Team (Ocean of Knowledge API) is a new cross-departmental team at the Wikimedia Foundation, made up of people from the Technology, Product, and Advancement departments. Our core responsibilities are to uncover, design, and build products that remove load from our primary servers and that enable more sustainable and diverse funding of the Wikimedia movement.

Current Technical Roadmap
The current roadmap, as laid out by previous customer discovery initiatives related to developing third-party products.

Personae
These will evolve and grow as more user personas start to engage with the product.


 * Downloader - Any person who intends to access Wikimedia Foundation data through bulk data downloading.

Epic 1: HTML Dumps (MVP)
The first epic for OKAPI is to create "reliable" downloadable HTML dumps, published every two weeks through a simple web interface.

Goals

 * Already validated as valuable: Historically the most requested feature among large technology partners.
 * Take some pressure off internal Wikimedia infrastructure: When organizations pull our current dumps, they need to hit our systems for every file in order to parse the wikitext into HTML. Releasing HTML dumps would immediately relieve that load on other parts of the organization.
 * Standalone in nature: Of the projects we have already laid out for consideration, this is the most standalone. We can understand the specs without working with a specific partner, and we will not be forced into design decisions that would affect a later product, which buys us some time.
 * Get business development (BD) in good shape: Before launching a larger business development effort, we would like to have a product in place that the BD team can use to spark relationships.
 * Strong introductory project for contractors: It is limited in scope yet touches many different internal projects. As a way to become familiar with Wikimedia infrastructure, it offers a good look at the Foundation’s tech stack and norms, and the lessons learned will carry over into future initiatives.

Hosting locations
This project is intended to serve already-public data at massive scale, thereby reducing the burden on existing infrastructure and teams. We’re prototyping on AWS because it’s faster: while we’re at the prototype stage and still figuring out what this product should become, the ability to make rapid changes in response to user and engineer feedback takes precedence.

We expect to use Kubernetes to run the containerized tools we build. This keeps them highly portable and platform agnostic, including on Wikimedia's own cloud services, for the benefit of the Wikimedia movement. The code we produce will be published in a publicly accessible repository and will be licensed under a free software license.

Closely Related Endeavours:

 * Dumps and Data dumps - We are working with the Dumps team to learn from their challenges and, eventually, to combine our work on HTML Dumps and Wikitext Dumps.
 * Core Platform Team/Initiatives/API Gateway - Both efforts are part of the API strategy, but we are focusing on "users at large scale", whereas the API Gateway is much more focused on the rest of the Wikimedia engineering community.

Public and private technologies that are part of the current solution:

 * RESTBase Parsoid cache, to pull the most recent cached HTML into our dumps (e.g. https://en.wikipedia.org/api/rest_v1/page/title/Test and https://en.wikipedia.org/api/rest_v1/page/html/Test)
 * Event Streams API to monitor changes
 * ORES to monitor vandalism and help clean our dumps.
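One way the RESTBase HTML endpoint and the Event Streams API could work together when refreshing a dump is sketched below in Python. This is a minimal sketch under stated assumptions, not the project's pipeline. It reads raw server-sent-event lines from a recentchange-style stream and keeps edits and page creations in the main namespace of one wiki. It then maps each changed title, once, to its `page/html` URL. The event field names `wiki`, `type`, `namespace`, and `title` are assumed to follow the public recentchange schema. The helper names `html_url`, `changed_titles`, and `refresh_plan` are hypothetical. Network access and ORES-based vandalism filtering are deliberately left out.

```python
import json
from typing import Dict, Iterable, Iterator
from urllib.parse import quote

# Assumption: the English Wikipedia REST v1 base, as in the example URLs.
REST_BASE = "https://en.wikipedia.org/api/rest_v1"


def html_url(title: str, base: str = REST_BASE) -> str:
    """Build the page/html URL for a page title (hypothetical helper)."""
    # Spaces become underscores; everything else that is not URL-safe,
    # including '/', is percent-encoded (assumed to be what RESTBase expects).
    return f"{base}/page/html/{quote(title.replace(' ', '_'), safe='')}"


def changed_titles(sse_lines: Iterable[str], wiki: str = "enwiki") -> Iterator[str]:
    """Yield titles of main-namespace edits and page creations on one wiki.

    Assumes each event's JSON payload arrives on a single 'data:' line and
    carries 'wiki', 'type', 'namespace' and 'title' fields.
    """
    for line in sse_lines:
        if not line.startswith("data:"):
            continue  # ignore 'event:', 'id:', comment and blank lines
        # json.loads tolerates the optional space after 'data:'.
        event = json.loads(line[len("data:"):])
        if (
            event.get("wiki") == wiki
            and event.get("type") in ("edit", "new")
            and event.get("namespace") == 0
        ):
            yield event["title"]


def refresh_plan(
    sse_lines: Iterable[str], wiki: str = "enwiki", base: str = REST_BASE
) -> Dict[str, str]:
    """Map each changed title, once and in first-seen order, to its HTML URL."""
    plan: Dict[str, str] = {}
    for title in changed_titles(sse_lines, wiki):
        plan.setdefault(title, html_url(title, base))
    return plan


if __name__ == "__main__":
    sample = [
        "event: message",
        'data: {"wiki": "enwiki", "type": "edit", "namespace": 0, "title": "Test"}',
        'data: {"wiki": "dewiki", "type": "edit", "namespace": 0, "title": "Test"}',
        'data: {"wiki": "enwiki", "type": "log", "namespace": 0, "title": "Foo"}',
        'data: {"wiki": "enwiki", "type": "new", "namespace": 0, "title": "AC/DC"}',
        'data: {"wiki": "enwiki", "type": "edit", "namespace": 0, "title": "Test"}',
    ]
    for title, url in refresh_plan(sample).items():
        print(title, url)
```

With the sample lines, the plan contains `Test` and `AC/DC` once each, with the slash encoded as `AC%2FDC`. The `dewiki` edit and the `log` event are skipped. Deduplicating titles means a page edited many times between two dump runs would be fetched only once per refresh cycle.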

This project shares challenges and an overlapping problem space with the following:

 * Wikimedia update feed service - A previous paid data service that enabled third parties to maintain and update local databases of Wikimedia content.
 * Data request limitations - We are working with the team behind this effort to see how the idea could play into OKAPI products.
 * Kiwix - Our HTML Dumps epic overlaps with Kiwix's mwoffliner project, but our use cases do not overlap in any way. We are exploring leveraging their technology and potentially providing tools to collaborate as we exit the prototyping phase.