Using SuperTest Rather Than Creating an API Integration Test Runner

Decision
The Core Platform Team will use SuperTest together with Mocha to build end-to-end integration tests for MediaWiki's action API. We intend to use SuperTest for all end-to-end tests of APIs, both existing and new.

Rationale
SuperTest meets all our immediate needs for testing MediaWiki's action API out of the box, and is flexible enough to allow us to add any capabilities we may want in the future. While building our own test runner (Phester) would have produced a system that fits our functional needs exactly, it would have been risky and would have required quite a bit of up-front investment of resources. Also, the declarative approach of Phester would have been a disadvantage for some non-functional needs, like IDE integration.

Below is a list of the relevant requirements and criteria, with an assessment of how well SuperTest meets them, and how that compares to Phester.


 * Support/community: SuperTest is widely used, the SuperAgent library it is based on has been established for many years, and the Mocha framework is the standard in the Node.js ecosystem. Phester would have been a home-grown solution.
 * IDE integrations: Since SuperTest is based on Node.js, it benefits from the full feature set of JavaScript-enabled IDEs. Phester would have had very limited integration beyond syntax highlighting for YAML.
 * Debugging: SuperTest tests can be debugged using a JavaScript debugger. Phester tests would have had no debugging capability.
 * Running tests locally: SuperTest runs on Node.js, while Phester would have been implemented in PHP. Neither is great for people who don't use the respective language. Neither is a big problem though, especially since the plan is to containerize the test environment.
 * Running tests in CI: WMF is already running Mocha tests via Jenkins, so adding API tests is straightforward. Running Phester in Jenkins wouldn't have been hard either. Phester could have been integrated directly with PHPUnit, which would have allowed it to be used without any need for Jenkins to know about it.
 * Tests with multiple users/agents: Critical MediaWiki user stories often involve more than one user. SuperTest makes it easy to create one agent for each user, log in each agent, and test their interaction. Phester was supposed to have this ability built in as well, but implementing it would have taken some effort. Most alternative frameworks we investigated do not make this kind of thing easy, and several do not support this at all.
 * Parallel test execution: not seamless with SuperTest, but doable; JavaScript is asynchronous by nature, and there is tooling for running Mocha tests in parallel. Phester would have made parallel execution work transparently for test authors, but since PHP does not support threads, this would have been implemented using fork, which is rather heavyweight and somewhat tricky.
 * Building abstractions: Since neither SuperTest nor Mocha imposes any restrictions on what tests can do, it is easy to create abstractions specific to the target system (e.g. "login", "edit", etc.) that make tests easier to write and clearer to read. Phester was planned to have support for re-usable "resources" and request "prototypes", but the spec for these things was getting complicated, and it started to feel like we were building a programming language on top of YAML. Frameworks like behat and codeception encourage the creation of such abstractions, but the way this is done is rather idiosyncratic to the framework.
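
The multi-user and abstraction points above can be illustrated with a minimal sketch. It is not the team's actual test library: the WikiUser class, the /api.php path, the base URL, and the token variables are assumptions made up for illustration, and login and token fetching are left out. The only library calls relied on are SuperTest's request.agent() and the post(), type(), send(), and expect() methods of the requests it creates.

```javascript
'use strict';

// Assumed script path of the action API; adjust to the wiki under test.
const API_PATH = '/api.php';

// Hypothetical wrapper around one SuperTest agent, i.e. one wiki user.
// `agent` is expected to come from supertest's request.agent(...), which keeps
// cookies between requests, so each user gets a separate session.
class WikiUser {
  constructor(agent, name) {
    this.agent = agent;
    this.name = name;
  }

  // Send one action API request as a form-encoded POST and return the
  // pending request (a thenable, so callers can await it).
  action(action, params = {}) {
    return this.agent
      .post(API_PATH)
      .type('form')
      .send({ format: 'json', ...params, action });
  }

  // Domain-level helper: tests say "edit" instead of spelling out the request.
  edit(title, text, token) {
    return this.action('edit', { title, text, token });
  }
}

// Usage inside a Mocha test (needs the supertest package and a running wiki;
// the base URL and the token variables are placeholders):
//
//   const request = require('supertest');
//   const alice = new WikiUser(request.agent('http://localhost:8080'), 'Alice');
//   const bob = new WikiUser(request.agent('http://localhost:8080'), 'Bob');
//
//   it('lets Bob overwrite a page Alice created', async () => {
//     await alice.edit('Sandbox', 'first version', aliceToken).expect(200);
//     await bob.edit('Sandbox', 'second version', bobToken).expect(200);
//   });
```

Passing the agent in, rather than creating it inside the class, keeps the helper independent of how the wiki is reached and makes it possible to exercise the helper with a stand-in agent.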

Disadvantages considered, but outweighed by the advantages above:


 * Lock-in: SuperTest is just JavaScript. Porting to another system or even language would need a lot of manual work. A purely declarative approach based on YAML would have made it trivial to create a script that translates to some other format or language. We may still add support for a limited YAML based format (like X-Amples) to SuperTest.
 * Discipline: With SuperTest, keeping tests clean and readable can only be achieved with discipline. Phester would by nature enforce a clean structure of the tests, and prevent any "cheating".
 * Readability: While SuperTest allows abstractions to be created that make tests more expressive and concise, it comes with the syntactic clutter imposed by JavaScript - especially since we have to manage asynchronous HTTP requests throughout all testing code, making it necessary to constantly deal with Promises using async/await or then-chains. Phester tests are rather verbose since they are, by design, stuck on the level of HTTP requests. But they don't have any syntactic clutter; they are "all meat".
 * Fixtures: the concept of global fixtures is not native to Mocha or SuperTest, but it's possible to add it without too much pain. We'll have to be careful about how they behave with parallel testing, and be disciplined about not polluting them. Phester would have had the concept of global fixtures baked in, making them straightforward to create and use.
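
One way global fixtures could be added on top of Mocha is a small shared registry that creates each named fixture at most once. The following is a sketch under that assumption, not an existing Mocha or SuperTest feature; the fixture() function and the createPage placeholder are hypothetical names.

```javascript
'use strict';

// Hypothetical process-wide registry of named fixtures.
const registry = new Map();

// Return the fixture called `name`, creating it on first use.
// The promise is stored rather than the resolved value, so callers that ask
// for the same fixture concurrently share a single creation.
function fixture(name, create) {
  if (!registry.has(name)) {
    registry.set(name, Promise.resolve().then(create));
  }
  return registry.get(name);
}

// Usage inside a Mocha test (createPage is a placeholder for any async setup):
//
//   const page = await fixture('shared sandbox page', () => createPage('Sandbox'));
```

If test files are run in separate processes, each process holds its own registry, so this sketch does not by itself deduplicate fixtures across processes. Nothing in it stops a test from modifying a shared fixture either, which is where the discipline mentioned above comes in.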

Background
While planning the decoupling initiative, we identified the need for better test coverage to reduce fallout and improve confidence while refactoring. While having full unit test coverage would be ideal for the purpose, it seems impractical: writing tests for the old code is hard exactly for the reasons we want to refactor it, namely, tight coupling and poor modularity. Also, we would be writing unit tests for code we plan to deprecate and eventually remove.

So instead of unit tests, we decided to create a comprehensive suite of end-to-end integration tests for MediaWiki's action API. This should exercise all relevant code paths and enable us to make sure all relevant use cases and workflows remain functional. For this purpose, we started the Add API integration test initiative. The goal of that initiative is not just to provide a suite of tests for the action API, but also to establish tools, methods, and best practices that can be used for other initiatives like Core REST API, Remove storage from RESTBase, and Enable Multi-DC Session Storage.

An initial exploration of possible approaches to end-to-end testing for APIs led to the idea of generalizing the X-Amples mechanism currently in use with RESTBase. X-Amples follows a declarative approach, where HTTP requests and expectations about the response are encoded in YAML. This seemed straightforward and a good fit for JSON-based HTTP APIs, and it had the advantage of being language neutral. The fact that a number of testing frameworks use a similar approach (e.g. tavern, strest, and dredd) seemed to validate the approach, though the specifics of their respective feature sets proved incompatible with our requirements.

As a result of this exploration, we started to plan out the implementation of our own test runner, dubbed Phester. We built a very limited MVP to validate the idea, and at the same time kept exploring alternatives, including ones that didn't follow the YAML-based approach. The strongest contenders seemed to be the PHP-based behat and codeception testing frameworks, which are very powerful and flexible. However, both impose a very distinct approach to test implementation that is rather different from generally working with PHP, and both would need quite a bit of work for our use case of testing APIs.

After collecting feedback from outside the team, specifically from the Code Health Group, gathering experiences with writing dummy tests against the proposed spec for Phester, and re-evaluating our assumptions and priorities based on this, we extended our search for alternatives, with a stronger focus on existing tooling, IDE integrations, and extensibility. That search turned up SuperTest, which quickly convinced us.