Core Platform Team/Decisions Architecture Research Documentation/Using our Integration Testing Framework for Monitoring


Monitoring of production web services is currently accomplished with Service-Checker, which reads test data from an extension embedded in a service's OpenAPI specification. This works well, but has a number of drawbacks.

For example: OpenAPI specifications are a machine-readable definition of a service's interface, information that moves in lock-step with the implementing code. However, the extension used to define tests often requires case-by-case configuration. This mixing of concerns is problematic: it leads to duplication or fragmentation of the configuration, or worse, encourages an even tighter coupling by using the specification itself as application configuration (see RESTBase for an example of the latter).

Additionally, Service-Checker has a hard dependency on OpenAPI specifications, and so provides no answer for non-RESTful HTTP interfaces (MediaWiki's Action API, for example).

Finally, as we begin standardizing integration testing around our Integration Testing Framework, we want to minimize duplication (DRY) with regard to monitoring. Monitoring tests are, in fact, integration tests; as an initial step, we want to investigate whether the Integration Testing Framework is sufficient to provide monitoring for instances of the Kask service.

Investigation

We begin our investigation by comparing the current monitoring tool, Service-Checker, to our Integration Testing Framework against a set of minimum requirements for web service monitoring.

Requirements: Service-Checker vs. Integration Testing Framework (Mocha/SuperTest/Chai)

Must be configurable for use by various deployment setups
- Service-Checker: configurable by passing arguments such as the base URL, the timeout, and the test specification URL.
- Integration Testing Framework: Mocha provides many configuration options, which can be passed on the command line or in a configuration file. Additional configuration, such as the base URL, can be passed to tests as environment variables.

Must be able to discover tests
- Service-Checker: tests are found in the x-amples sections of the paths part of the OpenAPI specification.
- Integration Testing Framework: by default, Mocha looks for tests under the test/ directory, but it can be instructed to look under any other directory, or even in specific files.

Should be able to run tests locally and in CI
- Service-Checker: once installed, it can be run locally and in CI post-merge.
- Integration Testing Framework: Mocha tests can be run locally and in CI at various stages of the pipeline.

Must support HTTP requests
- Service-Checker: supports GET and POST requests, with headers, query parameters, and a body. Uses URL template interpolation, which supports simple, optional, and multiple parameter substitutions.
- Integration Testing Framework: supports GET, POST, PUT, DELETE, and any other method supported by Node, with headers, query parameters, a body, TLS options, and multipart data.

Must validate HTTP responses
- Service-Checker: supports assertions on the response status, headers, and body. Body data is assumed to be either JSON or plain text, and can be matched exactly or with a regular expression.
- Integration Testing Framework: SuperTest and the Chai assertion library provide many options and great flexibility for validating any part of the response.
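
To make the comparison concrete, here is a minimal sketch of what a monitoring-style test looks like in the Integration Testing Framework. The /healthz endpoint, the file name, and the BASE_URL environment variable are illustrative assumptions, not part of any existing service contract:

```javascript
// test/monitoring/health.js -- hypothetical file name.
'use strict';

const request = require('supertest');
const { expect } = require('chai');

// Base URL injected by the deployment setup (assumption: a BASE_URL env var).
const baseUrl = process.env.BASE_URL || 'http://localhost:8081';

describe('service health', () => {
  it('responds to GET /healthz with 200 and JSON', async () => {
    const res = await request(baseUrl).get('/healthz'); // hypothetical endpoint
    expect(res.status).to.equal(200);
    expect(res.headers['content-type']).to.match(/application\/json/);
  });
});
```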

As shown above, our testing framework meets the requirements for monitoring and even exceeds Service-Checker in certain respects, such as HTTP request support: it can send requests with any HTTP method, while Service-Checker only supports GET and POST. Also, because Service-Checker uses URL template interpolation, it limits the kinds of substitutions that can occur, while our testing framework allows any kind of URL to be used.
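
As an illustration of that added coverage, here is a sketch of a write-and-delete round trip, something Service-Checker's GET/POST-only model cannot express. The paths and status codes are hypothetical, loosely modeled on a Kask-style key-value interface:

```javascript
'use strict';

const request = require('supertest');

const baseUrl = process.env.BASE_URL || 'http://localhost:8081';

describe('storage round trip', () => {
  it('stores and then deletes a value', async () => {
    const key = 'monitoring-canary'; // hypothetical test key
    // Write a value (the path and the 201 response are assumptions).
    await request(baseUrl)
      .post(`/sessions/v1/${key}`)
      .set('Content-Type', 'application/octet-stream')
      .send(Buffer.from('canary-value'))
      .expect(201);
    // Clean up with DELETE, a method Service-Checker cannot send.
    await request(baseUrl).delete(`/sessions/v1/${key}`).expect(204);
  });
});
```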

On the other hand, because our testing framework provides so much flexibility, it creates a greater opportunity for misuse. We therefore suggest that all integration tests, whether used for monitoring or not, follow the same patterns and guidelines set out in the Integration Testing Framework repository.

Proposed approach

We are not proposing a change in overall functionality, but rather a change in test runner, from Service-Checker to our Integration Testing Framework. Currently, to use Service-Checker post-merge and in production, we pass the base URL, the spec URL (the location of the tests), and the timeout (the number of seconds to wait for each request). To use our framework in the same way, we propose adding integration tests to each service’s repository, limiting dependencies to those currently used in our framework where appropriate, and ensuring all dependencies are installed when the service is deployed. We would then only need a small script that sets the base URL as an environment variable and invokes Mocha with the location of the tests (a directory or specific files) and the timeout, as sketched below.
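
A minimal sketch of such a script, written as a Node wrapper; the file name, the BASE_URL variable, the test directory, and the 5-second timeout are assumptions for illustration:

```javascript
// monitor.js -- hypothetical wrapper script.
'use strict';

const { spawnSync } = require('child_process');

// The monitoring system passes the base URL of the instance to check.
const baseUrl = process.argv[2];
if (!baseUrl) {
  console.error('usage: node monitor.js <base-url>');
  process.exit(2);
}

// Invoke Mocha with the test location and a 5-second per-request timeout,
// exposing the base URL to the tests as an environment variable.
const result = spawnSync('npx', ['mocha', '--timeout', '5000', 'test/monitoring/'], {
  env: Object.assign({}, process.env, { BASE_URL: baseUrl }),
  stdio: 'inherit',
});

process.exit(result.status === null ? 1 : result.status);
```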

As for the tests, they should be written with monitoring in mind. That is to say, they shouldn’t exceed the timeout specified for each request (5 seconds) or for the entire service check (typically 60 seconds for Nagios-like systems). Also, up to this point, all integration tests have been written using the Chai assertion library; monitoring tests should follow the same pattern and limit the use of other assertion libraries where appropriate.
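
For example, Mocha’s per-suite timeout can enforce that per-request budget directly (a sketch; the endpoint is hypothetical):

```javascript
'use strict';

const request = require('supertest');
const { expect } = require('chai');

const baseUrl = process.env.BASE_URL || 'http://localhost:8081';

// Note the classic function expression: an arrow function would not bind
// Mocha's `this`, which carries the timeout setter.
describe('monitoring checks', function () {
  // Fail fast, so one hanging request cannot consume the whole check budget.
  this.timeout(5000);

  it('responds at the root within the time budget', async () => {
    const res = await request(baseUrl).get('/'); // hypothetical endpoint
    expect(res.status).to.equal(200);
  });
});
```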

Finally, we’d suggest running both Service-Checker and our Integration Testing Framework in parallel for a few weeks to gather live data and adjust guidelines and implementation details as necessary.