Kask

Kask stores and retrieves key/value pairs persistently. Keys are textual, values are arbitrary binary objects. Kask is intended for storing data such as web browser user sessions: relatively small amounts of data, but very large numbers of keys, and with tight latency requirements.

Kask is similar to Redis, but can be deployed to be distributed and replicated across several data centres. Unlike Redis, Kask exposes a RESTful HTTP API instead of its own custom query language, and it provides more access control.

Kask consists of an HTTP API layer in front of Apache Cassandra, the NoSQL database used for storage. The API layer provides a simple key/value store with arbitrary textual keys and binary values, offering operations to set, get, and delete key/value pairs, as well as authentication. Cassandra itself is hidden from the API user.
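As an illustration of the shape of such an API, the following is a minimal sketch of a key/value store served over HTTP, with an in-memory map standing in for Cassandra. It is not Kask's implementation: the /kv/ route prefix, the status codes, and all function names are assumptions made for this example.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"strings"
	"sync"
)

// store is an in-memory stand-in for the Cassandra-backed storage layer.
type store struct {
	mu   sync.RWMutex
	data map[string][]byte
}

// newHandler serves get, set, and delete on /kv/{key}.
// The /kv/ prefix is hypothetical, not Kask's actual route.
func newHandler() http.Handler {
	s := &store{data: make(map[string][]byte)}
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		key := strings.TrimPrefix(r.URL.Path, "/kv/")
		if key == "" || key == r.URL.Path {
			http.NotFound(w, r) // missing prefix or empty key
			return
		}
		switch r.Method {
		case http.MethodGet:
			s.mu.RLock()
			v, ok := s.data[key]
			s.mu.RUnlock()
			if !ok {
				http.NotFound(w, r)
				return
			}
			w.Header().Set("Content-Type", "application/octet-stream")
			w.Write(v)
		case http.MethodPut:
			v, err := io.ReadAll(r.Body)
			if err != nil {
				http.Error(w, err.Error(), http.StatusBadRequest)
				return
			}
			s.mu.Lock()
			s.data[key] = v
			s.mu.Unlock()
			w.WriteHeader(http.StatusCreated)
		case http.MethodDelete:
			s.mu.Lock()
			delete(s.data, key)
			s.mu.Unlock()
			w.WriteHeader(http.StatusNoContent)
		default:
			w.WriteHeader(http.StatusMethodNotAllowed)
		}
	})
}

// startServer runs the handler on a local test listener and returns its
// base URL together with a function that shuts it down.
func startServer() (string, func()) {
	srv := httptest.NewServer(newHandler())
	return srv.URL, srv.Close
}

// do issues one request and returns the status code and response body.
func do(method, url string, body []byte) (int, []byte) {
	req, err := http.NewRequest(method, url, bytes.NewReader(body))
	if err != nil {
		panic(err)
	}
	resp, err := http.DefaultClient.Do(req)
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	b, _ := io.ReadAll(resp.Body)
	return resp.StatusCode, b
}

func main() {
	base, stop := startServer()
	defer stop()
	url := base + "/kv/session-123"

	code, _ := do(http.MethodPut, url, []byte("hello"))
	fmt.Println("set:", code)
	code, body := do(http.MethodGet, url, nil)
	fmt.Println("get:", code, string(body))
	code, _ = do(http.MethodDelete, url, nil)
	fmt.Println("delete:", code)
	code, _ = do(http.MethodGet, url, nil)
	fmt.Println("get after delete:", code)
}
```

Mapping each operation onto a standard HTTP method means any HTTP client can act as a client of the store, which is the practical difference from a store that requires its own protocol.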

The Cassandra layer provides persistence, replication, and high availability across multiple data centres. It also handles the distributed quorum decisions that determine when an update has been accepted. Cassandra is a masterless system, which means no single node is necessary for the group of Cassandra nodes to work.
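For orientation, a quorum in Cassandra is a strict majority of the replicas for a key: the replication factor divided by two, rounded down, plus one. The sketch below only works through that arithmetic. The replication factors shown are illustrative assumptions, since this page does not state which replication factor or consistency level Kask is deployed with.

```go
package main

import "fmt"

// quorum returns the number of replicas that must acknowledge an
// operation for it to reach a quorum: a strict majority of the replicas.
func quorum(replicationFactor int) int {
	return replicationFactor/2 + 1
}

// tolerated returns how many replicas can be unavailable while a quorum
// can still be reached.
func tolerated(replicationFactor int) int {
	return replicationFactor - quorum(replicationFactor)
}

func main() {
	// Illustrative replication factors; not Kask's actual settings.
	for _, rf := range []int{1, 3, 5, 6} {
		fmt.Printf("replication factor %d: quorum %d, tolerates %d unavailable\n",
			rf, quorum(rf), tolerated(rf))
	}
}
```

Because any two majorities of the same replica set overlap in at least one replica, a quorum read is guaranteed to reach at least one replica that took part in the most recent quorum write.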

Authentication is (currently) done using TLS certificates on both the client and the server side. There are no shared secrets, but the CA that signs the certificates needs to be trusted by the API layer.
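In Go, certificate-based client authentication of this kind is typically expressed through the standard library's crypto/tls configuration. The following is a minimal sketch under that assumption, not Kask's actual code: serverTLSConfig and throwawayCA are hypothetical helpers, and the CA is generated on the fly purely so the example is self-contained.

```go
package main

import (
	"crypto/ecdsa"
	"crypto/elliptic"
	"crypto/rand"
	"crypto/tls"
	"crypto/x509"
	"crypto/x509/pkix"
	"encoding/pem"
	"errors"
	"fmt"
	"math/big"
	"time"
)

// serverTLSConfig builds a TLS configuration that rejects any client that
// does not present a certificate signed by one of the CAs in caPEM.
func serverTLSConfig(caPEM []byte) (*tls.Config, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no CA certificates found in PEM input")
	}
	return &tls.Config{
		ClientCAs:  pool,                           // CAs trusted to sign client certificates
		ClientAuth: tls.RequireAndVerifyClientCert, // the handshake fails without a valid one
		MinVersion: tls.VersionTLS12,
	}, nil
}

// throwawayCA generates a short-lived, self-signed CA certificate in PEM
// form. It exists for demonstration only; a real deployment would load the
// CA certificate from a file.
func throwawayCA() ([]byte, error) {
	key, err := ecdsa.GenerateKey(elliptic.P256(), rand.Reader)
	if err != nil {
		return nil, err
	}
	tmpl := &x509.Certificate{
		SerialNumber:          big.NewInt(1),
		Subject:               pkix.Name{CommonName: "example CA"},
		NotBefore:             time.Now().Add(-time.Minute),
		NotAfter:              time.Now().Add(time.Hour),
		IsCA:                  true,
		BasicConstraintsValid: true,
		KeyUsage:              x509.KeyUsageCertSign,
	}
	der, err := x509.CreateCertificate(rand.Reader, tmpl, tmpl, &key.PublicKey, key)
	if err != nil {
		return nil, err
	}
	return pem.EncodeToMemory(&pem.Block{Type: "CERTIFICATE", Bytes: der}), nil
}

func main() {
	caPEM, err := throwawayCA()
	if err != nil {
		panic(err)
	}
	cfg, err := serverTLSConfig(caPEM)
	if err != nil {
		panic(err)
	}
	fmt.Println("client certificates required:",
		cfg.ClientAuth == tls.RequireAndVerifyClientCert)
}
```

A real server would also set the Certificates field with its own certificate and key before passing the configuration to an http.Server. The client side mirrors this by presenting its certificate and trusting the CA that signed the server's.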

Dependency Management
The libraries an application depends on are as much a part of the final product as the code we write ourselves, and yet it is all too common for us to choose them indiscriminately, retrieve them via untrusted sources, and treat them (and the entire graph of transitive dependencies) as black boxes. Often this pattern is deeply ingrained in our tools and the culture surrounding them. Case in point: Kask is written in Go, where traditionally little emphasis has been placed on release management. Applications import external dependencies by referencing their remote Git repository, typically the HEAD of the master branch, and the result is statically compiled (requiring recompilation to link against any updated dependencies). This "run the latest of everything, and hope for the best" mentality is antithetical to quality software. It makes reproducibility prohibitively difficult, and the complete lack of environmental stability makes tracking defects (including those impacting security) and their interactions intractable.

Tooling notwithstanding, proper dependency management is difficult and labor intensive. It requires that each node in the dependency graph be release managed, and that compatibility between nodes be established to properly inform the edges. Change of any kind is as likely to introduce new bugs as it is to fix existing ones, and changes that alter existing functionality or introduce new functionality disproportionately so. Sound judgement is required to balance the value of an update against its risks. When changes are made, careful testing is needed to ensure continued compatibility and to flag any new regressions. This is a tremendous amount of work. Fortunately, there is an alternative to doing this ourselves.

Debian is a Linux distribution founded in 1993, with a long-standing reputation for quality control. Software that is packaged for Debian has been carefully curated. Packagers ensure that an active and responsive upstream exists, but accept responsibility for the duration of a release if an upstream becomes unwilling or unable to address issues. Care is taken to select the most appropriate version for release, and its transitive dependencies are satisfied by dependency relationships with other packages. Changes to a package during a stable release are made only on an as-needed basis (crippling bugs, security vulnerabilities, etc.), and are as minimally invasive as possible. Additionally, PGP signatures are used to establish a strong chain of trust between the developers who upload packages and the machines where they are ultimately installed. It would be difficult to overstate the amount of software life-cycle management work that goes into a distribution like Debian, work we do not have to do if we satisfy our dependencies using packaged software.

TL;DR: Kask's code dependencies are sourced entirely from what is available in Debian GNU/Linux (Stretch/9.8 at the time of writing).

Setup
Clone Kask's source code repository.

Builds at the Wikimedia Foundation are created using a Docker image generated by Blubber; using Blubber with Kask's deployment pipeline configuration is the easiest way to create a container for development use. Prebuilt, statically linked binaries for most platforms can be obtained from the Blubber download page.

Blubber outputs a Dockerfile based on Kask's pipeline configuration, and docker build will create the corresponding image.

Building
The following can be copied to a file and invoked as a script to issue commands inside the development container.

Releasing
To release a new version of Kask, simply create an annotated tag, and push it to Gerrit.

Running
The Wikimedia Foundation runs Kask in production using Kubernetes. The easiest way to get the service up and running is to use a Wikimedia Foundation Docker image.

Setup
Since the Foundation's registry does not implement the latest tag, the first step is to browse the list of available image tags and select an appropriate one; the following examples use that tag. Once you've selected a Docker image, use docker pull to retrieve a copy locally, and docker images to verify success. It may prove useful to create an alias for the chosen tag, both to have something more descriptive and to have a stable reference when starting containers. This step is entirely optional, though.

Starting a container
The container expects Kask's configuration file to exist at a fixed path. To accomplish this, we'll mount a local directory containing the configuration file inside the container at the expected location. The following assumes the configuration file is in the current working directory.