Kask

From MediaWiki.org

Kask is an opaque key-value data store with a RESTful (HTTP) interface. It uses Apache Cassandra for persistence, which makes it suitable for very large or high-volume data sets, and for applications that require geographically aware master-master replication and high availability.

Some of its features include:

API

Operations

get

URL /sessions/v1/{key}
Method GET
Params None
Data None
Success Example:
HTTP/1.0 200 OK
Content-Type: application/octet-stream
Content-Length: 12
Date: Mon, 22 Oct 2018 22:07:59 GMT

sample value
Error Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code Reason Example
400 Bad request
{
    "type": "https://www.mediawiki.org/wiki/errors/bad_request",
    "title": "Bad request",
    "detail": "The request was incorrect or malformed",
    "instance": "/sessions/v1/test_key"
}
401 Not authenticated and/or authorized to read.
{
    "type": "https://www.mediawiki.org/wiki/errors/not_authorized",
    "title": "Not Authorized",
    "detail": "You are not authorized to access this value",
    "instance": "/sessions/v1/test_key"
}
404 Key not found
{
    "type": "https://www.mediawiki.org/wiki/errors/not_found",
    "title": "Not Found",
    "detail": "The value you requested was not found",
    "instance": "/sessions/v1/test_key"
}
500 Internal server error
{
    "type": "https://www.mediawiki.org/wiki/errors/server_error",
    "title": "Internal Server Error",
    "detail": "The server encountered an error with your request",
    "instance": "/sessions/v1/test_key"
}
Example
$ curl http://api.example.org/sessions/v1/test_key
Notes
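The success and error responses above could be handled along the following lines. This is a minimal Python sketch, not part of Kask: KaskError and interpret_get are hypothetical names, and treating a 404 as "no value" rather than as an error is an assumption about what a caller wants.

```python
import json


class KaskError(Exception):
    """Hypothetical exception carrying an RFC 7807 problem document."""

    def __init__(self, status, problem):
        super().__init__(f"{status}: {problem.get('title', 'unknown error')}")
        self.status = status
        self.problem = problem


def interpret_get(status, content_type, body):
    """Map a GET response to the stored value, None, or a KaskError.

    status: HTTP status code; content_type: Content-Type header value (str);
    body: raw response bytes.
    """
    if status == 200:
        return body  # opaque bytes, returned untouched
    problem = {}
    # Only parse the body as JSON when the server labels it as a problem document.
    if content_type.split(";")[0].strip() == "application/problem+json":
        problem = json.loads(body.decode("utf-8"))
    if status == 404:
        return None  # assumption: a missing or expired key is not an error
    raise KaskError(status, problem)
```

A caller would pass in the status, Content-Type header, and body of whatever HTTP client it uses; the type, detail, and instance members of the problem document remain available on the raised exception.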

set

URL /sessions/v1/{key}
Method POST
Params None
Data The body of the request is the value represented as arbitrary bytes, using a content-type of application/octet-stream.


Example:
sample value
Success Example:
HTTP/1.0 201 CREATED
Content-Type: application/octet-stream
Content-Length: 0
Date: Tue, 23 Oct 2018 19:40:40 GMT
Error Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code Reason Example
401 Not authenticated and/or authorized to write.
{
    "type": "https://www.mediawiki.org/wiki/errors/not_authorized",
    "title": "Not Authorized",
    "detail": "You are not authorized to access this value",
    "instance": "/sessions/v1/test_key"
}
500 Internal server error
{
    "type": "https://www.mediawiki.org/wiki/errors/server_error",
    "title": "Internal Server Error",
    "detail": "The server encountered an error with your request",
    "instance": "/sessions/v1/test_key"
}
Example
$ curl -X POST -H 'Content-Type: application/octet-stream' -d 'sample value' \
       http://api.example.org/sessions/v1/test_key
Notes This operation assigns a value to key. The response does not differentiate between a request that created a new value and one that overwrote an existing one.


Values persist until they expire, as determined by a TTL set in the service configuration (read: a single TTL applying to all stored sessions). Attempts to retrieve a value after it has expired result in a 404 (see get, above).


When operated in a multi-DC environment, a successful return guarantees that subsequent calls to get will return the value assigned from the data-center local to the executing endpoint. Remote data-centers (remote from the endpoint executing the request) are updated with eventual consistency semantics.
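A set request could be assembled as follows; this is a minimal sketch using Python's standard library. The base URL is the placeholder host from the curl example, build_set_request is a hypothetical helper, and percent-encoding the key is an assumption (the description above does not state which characters a key may contain).

```python
from urllib.parse import quote
from urllib.request import Request

# Placeholder host taken from the curl example; not a real endpoint.
BASE_URL = "http://api.example.org/sessions/v1/"


def build_set_request(key, value):
    """Return a POST request that stores `value` (bytes) under `key`."""
    if not isinstance(value, bytes):
        raise TypeError("value must be bytes; Kask treats it as opaque")
    # Assumption: keys are percent-encoded so they are safe in a URL path.
    url = BASE_URL + quote(key, safe="")
    return Request(
        url,
        data=value,
        headers={"Content-Type": "application/octet-stream"},
        method="POST",
    )
```

Passing the result to urllib.request.urlopen would send it. A 201 response means the value was stored, whether or not the key previously existed.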

delete

URL /sessions/v1/{key}
Method DELETE
Params None
Data None
Success Example:
HTTP/1.0 204 NO CONTENT
Content-Type: application/octet-stream
Content-Length: 0
Date: Tue, 23 Oct 2018 19:40:40 GMT
Error Errors are JSON objects conforming to RFC7807 (Problem Details for HTTP APIs) with a content-type of application/problem+json.
Code Reason Example
401 Not authenticated and/or authorized to write.
{
    "type": "https://www.mediawiki.org/wiki/errors/not_authorized",
    "title": "Not Authorized",
    "detail": "You are not authorized to access this value",
    "instance": "/sessions/v1/test_key"
}
500 Internal server error
{
    "type": "https://www.mediawiki.org/wiki/errors/server_error",
    "title": "Internal Server Error",
    "detail": "The server encountered an error with your request",
    "instance": "/sessions/v1/test_key"
}
Example
$ curl -X DELETE http://api.example.org/sessions/v1/test_key
Notes This operation deletes the value associated with key, if it exists. The return status does not distinguish between a delete where no value was present (a no-op) and one where a value was successfully deleted.

When operated in a multi-DC environment, a successful return guarantees that subsequent GET requests will not return a value for key in any data-center (404 is returned). If a failure occurs, the disposition of the value is unknown (any or none of the values may have been deleted); failures should be handled accordingly.
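Because deleting an absent key is a no-op that still succeeds, one way to handle a failure is to repeat the request until it succeeds. The sketch below assumes that retrying is acceptable for the caller. delete_with_retry and the injected send callable (which performs the DELETE and returns the HTTP status code) are hypothetical, as are the retry count and delays.

```python
import time


def delete_with_retry(send, key, attempts=3, delay=0.5, sleep=time.sleep):
    """Call send(key) until it returns 204; return True on success.

    send: callable performing the DELETE and returning the status code.
    A 401 is not retried, since repeating the request cannot fix it.
    """
    for attempt in range(attempts):
        status = send(key)
        if status == 204:
            return True  # the value is gone (or was never there)
        if status == 401:
            return False
        if attempt < attempts - 1:
            sleep(delay * 2 ** attempt)  # exponential backoff between tries
    return False  # the disposition of the value is unknown
```

A False result after exhausting the attempts leaves the caller where the note above says it will be: the value may or may not still exist in some data-center.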

Developing

Dependency Management

The libraries an application depends on are as much a part of the final product as the code we write ourselves, and yet it is all too common for us to choose them indiscriminately, retrieve them from untrusted sources, and treat them (and the entire graph of transitive dependencies) as black boxes. Often this pattern is deeply ingrained in our tools and the culture surrounding them. Case in point: Kask is written in Go, where traditionally little emphasis has been placed on release management. Applications import external dependencies by referencing their remote Git repository, typically the HEAD of the master branch, and the result is statically compiled (requiring recompilation to link against any updated dependencies). This "run the latest of everything, and hope for the best" mentality is antithetical to quality software. It makes reproducibility prohibitively difficult, and the complete lack of environmental stability makes tracking defects (including those impacting security) and their interactions intractable.

Tooling notwithstanding, proper dependency management is difficult and labor intensive. It requires that each node in the dependency graph be release managed, and that compatibility between nodes be established to properly inform the edges. Change of any kind is as likely to introduce new bugs as it is to fix existing ones, and changes that alter existing functionality or introduce new functionality disproportionately so. Sound judgement is required to balance the value of an update against its risks. When changes are made, careful testing is needed to ensure continued compatibility and to flag any new regressions. This is a tremendous amount of work. Fortunately, there is an alternative to doing it ourselves.

Debian is a Linux distribution founded in 1993, with a long-standing reputation for quality control. Software that is packaged for Debian has been carefully curated. Packagers ensure that an active and responsive upstream exists, but accept responsibility for the duration of a release if an upstream becomes unwilling or unable to address issues. Care is taken to select the most appropriate version for release, and its transitive dependencies are satisfied by dependency relationships with other packages. Changes to a package during a stable release are made only on an as-needed basis (crippling bugs, security vulnerabilities, etc.), and are as minimally invasive as possible. Additionally, PGP signatures are used to establish a strong chain of trust between the developers who upload packages and the machines where they are ultimately installed. It would be difficult to overstate the amount of software life-cycle management work that goes into a distribution like Debian, work we do not have to do if we satisfy our dependencies using packaged software.

TL;DR Kask's code dependencies are sourced entirely from what is available in Debian GNU/Linux (Stretch/9.8 at the time of writing).

Setup

Clone Kask's source code repository. For example:

$ git clone https://gerrit.wikimedia.org/r/mediawiki/services/kask && cd kask

Builds at the Wikimedia Foundation are created using a Docker image generated by Blubber. Using Blubber with Kask's deployment pipeline configuration is the easiest way to create a container for development use. Prebuilt, statically linked binaries for most platforms can be obtained from the Blubber download page. Blubber outputs a Dockerfile based on Kask's pipeline configuration, and docker build creates the corresponding image.

$ blubber .pipeline/blubber.yaml build | docker build --tag kask-dev -
...
$ docker images
REPOSITORY                              TAG          IMAGE ID            CREATED              SIZE
kask-dev                                latest       5dec96f34114        About a minute ago   875MB
docker-registry.wikimedia.org/golang    1.11.5-1     58a912d05a49        2 months ago         344MB
Documentation of Git is beyond the scope of this guide; see https://git-scm.com/doc for detailed instructions.
Complete documentation of Blubber is beyond the scope of this guide; see Blubber for more information.
Documentation of Docker is beyond the scope of this guide; see https://docs.docker.com for complete usage information.

Building

The following can be copied to a file (buildenv.sh for example) and invoked as a script to issue commands inside the development container.

#!/bin/sh

# Wraps docker-run to issue commands inside the development container.
# Usage: buildenv.sh make
# Usage: buildenv.sh make unit-test
# Usage: buildenv.sh ./kask --config config.test.yaml

set -e

# Options passed to docker-run below (a comment cannot follow a trailing
# backslash, so they are documented here):
#   -u          Avoid permissions issues; use your UID inside the container
#   -e GOPATH   Use Debian-based dependencies installed inside the container
#   -e GOCACHE  See https://github.com/golang/go/issues/26280

docker run \
    --rm \
    --name kask-dev \
    -u "$(id -u)" \
    -e GOPATH=/usr/share/gocode \
    -e GOCACHE=/tmp/gocache \
    -v "$(pwd)":/kask \
    -w /kask \
    kask-dev \
    "$@"

Releasing

To release a new version of Kask, simply create an annotated tag, and push it to Gerrit.

user@host ~$ VERSION="v1.0.0"
user@host ~$ git tag -am "$VERSION release" $VERSION master
user@host ~$ git push gerrit $VERSION
Refer to the semver guidelines when selecting a version.
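Choosing the next tag under semantic versioning could be sketched as below. next_version is a hypothetical helper, not part of Kask's tooling; it assumes tags of the form vMAJOR.MINOR.PATCH, as in the example above, and ignores pre-release and build metadata.

```python
import re

_TAG = re.compile(r"^v(\d+)\.(\d+)\.(\d+)$")


def next_version(current, change):
    """Return the tag that follows `current` for a 'major', 'minor' or 'patch' change."""
    match = _TAG.match(current)
    if match is None:
        raise ValueError(f"not a vMAJOR.MINOR.PATCH tag: {current!r}")
    major, minor, patch = (int(part) for part in match.groups())
    if change == "major":  # incompatible API change
        return f"v{major + 1}.0.0"
    if change == "minor":  # backwards-compatible new functionality
        return f"v{major}.{minor + 1}.0"
    if change == "patch":  # backwards-compatible bug fix
        return f"v{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown change type: {change!r}")
```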

Running

The Wikimedia Foundation runs Kask in production using Kubernetes. The easiest way to get the service up and running is to use a Wikimedia Foundation Docker image.

Setup

Since the Foundation's registry does not implement the latest tag, the first step is to browse the list of available image tags and select an appropriate one. We'll use 2019-05-10-162420-production as the tag in the following examples. Once you've selected a tag, use docker pull to retrieve a copy of the image locally, and docker images to verify success.

$ IMAGE_TAG=2019-05-10-162420-production
$ docker pull docker-registry.wikimedia.org/wikimedia/mediawiki-services-kask:$IMAGE_TAG
2019-05-10-162420-production: Pulling from wikimedia/mediawiki-services-kask
5c86276767f3: Pull complete 
a413e562d2b8: Pull complete 
648d537effeb: Pull complete 
864e75c6ef22: Pull complete 
49e16a850e4b: Pull complete 
Digest: sha256:2d7f3118b6e091233e62760f37e44e590fd986f227516a1b54689507b433a41b
Status: Downloaded newer image for docker-registry.wikimedia.org/wikimedia/mediawiki-services-kask:2019-05-10-162420-production
$ docker images
REPOSITORY                                            TAG                           IMAGE ID      CREATED      SIZE
docker-registry.wikimedia.../mediawiki-services-kask  2019-05-10-162420-production  73515b6aa2d3  13 days ago  110MB
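Selecting a tag from the list could be sketched as follows. This assumes that tags follow the pattern of the example above, a YYYY-MM-DD-HHMMSS timestamp followed by a variant name such as production. latest_tag is a hypothetical helper, and the list of tags would have to be obtained from the registry separately.

```python
from datetime import datetime


def latest_tag(tags, variant="production"):
    """Return the most recent tag for `variant`, or None if there is none."""
    candidates = []
    for tag in tags:
        # Split "2019-05-10-162420-production" into timestamp and variant.
        stamp, _, name = tag.rpartition("-")
        if name != variant:
            continue
        try:
            when = datetime.strptime(stamp, "%Y-%m-%d-%H%M%S")
        except ValueError:
            continue  # skip tags that do not match the assumed format
        candidates.append((when, tag))
    return max(candidates)[1] if candidates else None
```

The returned string can be assigned to IMAGE_TAG in the commands above.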

It may prove useful to create an alias for the chosen tag, both to have something more descriptive, and to have a stable reference when starting containers. This step is entirely optional though.

$ docker tag docker-registry.wikimedia.org/wikimedia/mediawiki-services-kask:$IMAGE_TAG kask
$ docker images
REPOSITORY                                            TAG                           IMAGE ID      CREATED      SIZE
docker-registry.wikimedia.../mediawiki-services-kask  2019-05-10-162420-production  73515b6aa2d3  13 days ago  110MB
kask                                                  latest                        73515b6aa2d3  13 days ago  110MB

Starting a container

The container expects Kask's configuration file to exist as /etc/mediawiki-services-kask/config.yaml. To accomplish this, we'll mount a local directory containing the configuration (as config.yaml) inside the container as /etc/mediawiki-services-kask. The following assumes config.yaml is in the current working directory.

$ docker run --rm --name kask -v "$(pwd)":/etc/mediawiki-services-kask kask