Requests for comment/Unit testing
This proposal has been processed and should no longer be updated. The following pages are the (in)direct result of this RFC:
This document explains:
- why MediaWiki needs even better unit testing,
- what we're doing now,
- and what we need to do next.
This is also a request for comments & commitments to make our testing environment better.
Why MediaWiki needs even better unit testing 
There are lots of approaches to software testing, and "integration testing" (testing the integrated whole as it operates together) is one of them. But we've found that automated integration testing of MediaWiki is less useful, and more likely to break. That's why many MediaWiki contributors want to focus on "unit testing" right now.
Unit testing 
In unit testing, you precisely and quickly test one isolated part at a time (a unit). For each unit, there are one or more tests. And, instead of mucking around trying to create realistic input that covers every edge case, each test uses precisely defined decoy objects called mocks. Mock objects are stand-ins for all the code that the test isn't checking; each mock thus isolates the function under test. For example, if you want to test how an application deals with database failures, you create a mock database that's programmed to fail at appropriate times, and run the database access code against it.
If the mocks are right, then every time an automated test suite runs the unit test, the test will thoroughly exercise all the function's code paths, and we'll know if there's a failure.
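MediaWiki itself is PHP (where PHPUnit provides mock objects), but the idea above can be sketched in Python with the standard library's `unittest.mock`. The `save_page` function and `insert` method here are hypothetical, purely for illustration:

```python
from unittest.mock import Mock

# Hypothetical unit under test: saves a page and reports failure gracefully.
def save_page(db, title, text):
    try:
        db.insert(title, text)
        return "saved"
    except IOError:
        return "error: database unavailable"

# A mock database programmed to fail at the right moment,
# standing in for the real database the test isn't checking.
failing_db = Mock()
failing_db.insert.side_effect = IOError("connection lost")

working_db = Mock()  # a mock that just accepts every call

assert save_page(working_db, "Main Page", "hello") == "saved"
assert save_page(failing_db, "Main Page", "hello") == "error: database unavailable"

# The mock also records how it was called, so the test can verify
# the unit talked to the database exactly as expected.
failing_db.insert.assert_called_once_with("Main Page", "hello")
```

Both code paths of the unit are exercised without ever touching a real database, which is what keeps the test fast and deterministic.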
So you can see that unit tests are a valuable form of automation (and of automated regression testing). You instantly see not just that something's broken, but what's broken. When someone breaks a unit test, there is a specific function they know they need to investigate. And when developers want to refactor code, improve its performance, or add a new feature, the automatic regression testing is a huge time saver.
(And, to be honest, unit tests make it easier to encourage people to write quality code. They force programmers to produce self-contained, testable chunks of code - otherwise you can't write a test - and, at the same time, to encapsulate code reasonably so it can be reused. If a test fails, either the function needs fixing or the test is wrong and needs fixing. Either way, it's just one small, well-defined thing, so it's ridiculously specific and easy-to-act-on feedback.)
The future 
Unit testing highlights breaking code and makes it easier for developers to find what they broke. Unit tests, done properly and running quickly and consistently, should be our first line of defense in baking quality into MediaWiki. This is why we're prioritizing unit testing frameworks and tools.
What we're doing right now 
- for PHP: PHPUnit
- for JavaScript: TestSwarm
We also have a homemade integration test system for the parser code (parser tests).
But using TestSwarm doesn't rule out using virtual machines. Ashar notes that we can reproduce the Grid's advantages by running browsers in VMs and pointing their home pages at TestSwarm. We will probably (if needed) maintain a few VMs running old versions of browsers like IE or Safari. A solution remains to be found for testing the various mobile browsers.
PHP: PHPUnit, CruiseControl, and parser tests 
For example, we have parser tests, a homegrown set of tests that has been around for a very long time. They use strange edge-case markup as test inputs and confirm that the parser turns them into the expected HTML output. These are now integrated into our PHPUnit tests and thus run every time CruiseControl runs.
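The shape of a parser test is just a table of (wikitext input, expected HTML output) pairs run through the parser. A toy sketch in Python, with a deliberately minimal stand-in `parse` function that only handles one markup rule (the real parser and test format are far richer):

```python
import re

def parse(wikitext):
    # Hypothetical, minimal stand-in for the MediaWiki parser:
    # only converts '''bold''' markup to <b>...</b>.
    return re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)

# Each entry pairs edge-case markup with the HTML we expect back.
PARSER_TESTS = [
    ("plain text", "plain text"),
    ("'''bold''' word", "<b>bold</b> word"),
    ("a '''b''' c '''d'''", "a <b>b</b> c <b>d</b>"),
]

for wikitext, expected in PARSER_TESTS:
    assert parse(wikitext) == expected, f"failed on: {wikitext!r}"
```

Because the cases are plain data, anyone who finds a parser bug can add a new input/expected pair without writing test logic.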
Originally, Hexmode set up CruiseControl in mid-2010. By late 2010, people weren't paying attention to it, and it was no longer functioning. As of July 2011, Chad, Ashar and others have repaired CruiseControl. It currently fetches MediaWiki code every 2 minutes and runs PHPUnit automatically against the new code. If there are any failures, CruiseControl saves them to a text file; a bot then fetches it and announces the result in the #mediawiki IRC channel, yelling if something fails.
Because it runs on a schedule, rather than as a post-commit hook, it will only run once for a batch of commits, even if there are three or four commits in a row that it's testing. That has confusing consequences for the yelling bot. As of July 2011, it's possible to get into a situation where several commits have landed, one of which broke a test, but we're not sure which. This means a test breakage leads to an incomplete sense of ownership & shame. It's good, but we need to improve it to be truly effective.
Moreover, once a revision fails, every subsequent run will also be marked as a failure until it's fixed. That makes it hard to diagnose the root cause (although, given a test and known-good and known-bad end points, you can bisect to find the offending revision).
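Bisecting is plain binary search over the revision history: with a known-good start, a known-bad end, and a predicate that runs the test against a revision, the first bad revision falls out in O(log n) test runs. A minimal sketch in Python (the `test_passes` predicate is hypothetical; in practice it would check out that revision and run the failing PHPUnit test):

```python
def first_bad_revision(revisions, test_passes):
    """Binary-search a chronological list of revisions for the first one
    where test_passes() is False. Assumes revisions[0] passes,
    revisions[-1] fails, and the failure persists once introduced."""
    lo, hi = 0, len(revisions) - 1  # invariant: lo passes, hi fails
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if test_passes(revisions[mid]):
            lo = mid
        else:
            hi = mid
    return revisions[hi]

# Toy example: pretend revision 90507 introduced the breakage.
revs = list(range(90500, 90512))
assert first_bad_revision(revs, lambda r: r < 90507) == 90507
```

This is the same procedure `git bisect` automates, but it works with any revision numbering as long as the failure is persistent.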
What we should do next 
What's supposed to happen:
Anytime you develop a new piece of code, it would be great if you also supplied unit tests for it.
Our unit test coverage is terrible: 2% at most.
But our codebase is so spaghetti-like and global-ridden that writing proper unit tests will be very difficult until we do some refactoring, and that refactoring will be very time-consuming. Our cluttered, messy codebase is scaring off new contributors and slowing down experienced ones. And we now see that its messiness is also stopping us from taking steps towards automating quality into future releases.
This is the bulk of our technical debt that is keeping us from achieving velocity.
PHPUnit has some tools & tricks built in to deal with legacy codebase issues. For example, it can back up the global namespace before each test, let the test modify it, and restore it afterwards. The problem with this approach is that we have so many globals that the backup takes about two minutes, which is unacceptably slow for a post-commit hook.
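The backup-and-restore idea is simple to state: snapshot every global before the test, restore it afterwards, so tests can't leak state into each other; the cost grows with the size of the global namespace, which is exactly why it's slow for us. A sketch of the mechanism as a Python context manager over a dict of globals (the `wg*` names merely imitate MediaWiki's naming; this is not how PHPUnit implements it):

```python
import copy
from contextlib import contextmanager

@contextmanager
def backup_globals(namespace):
    """Deep-copy a dict of globals before a test and restore it afterwards.
    The deep copy is what costs time: it scales with the number and
    depth of the globals being protected."""
    snapshot = copy.deepcopy(namespace)
    try:
        yield namespace
    finally:
        namespace.clear()
        namespace.update(snapshot)

wg = {"wgSitename": "Wikipedia", "wgDBserver": "localhost"}
with backup_globals(wg) as ns:
    ns["wgSitename"] = "Testwiki"   # a test scribbles on a global...
assert wg["wgSitename"] == "Wikipedia"  # ...and it's restored afterwards
```

With only a handful of globals this is instant; with a namespace the size of MediaWiki's, the snapshot itself becomes the bottleneck.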
We want to focus on the testing that is amenable to post-commit hooks. For example, Ashar suggests we could set up a sanity-check group of fast tests for a post-commit hook, and only run the slower tests locally or via CruiseControl.
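One simple way to express such a split, sketched with Python's standard unittest (PHPUnit supports the same pattern via its test-group annotations): mark the slow tests and skip them unless explicitly requested, so the post-commit hook runs only the fast sanity checks. The environment variable name and test bodies here are illustrative:

```python
import os
import unittest

# Slow tests run only when explicitly requested (e.g. by CruiseControl).
RUN_SLOW = os.environ.get("RUN_SLOW_TESTS") == "1"
slow = unittest.skipUnless(RUN_SLOW, "slow test; set RUN_SLOW_TESTS=1 to include")

class SanityTests(unittest.TestCase):
    def test_fast_sanity_check(self):
        # Cheap check: always runs, suitable for a post-commit hook.
        self.assertEqual(2 + 2, 4)

    @slow
    def test_exhaustive_cases(self):
        # Stand-in for an expensive test left to the full run.
        self.assertEqual(len("a" * 100000), 100000)

# Run the suite programmatically; with RUN_SLOW_TESTS unset,
# the slow test is reported as skipped, not failed.
suite = unittest.TestLoader().loadTestsFromTestCase(SanityTests)
result = unittest.TestResult()
suite.run(result)
```

The same suite serves both purposes: the hook gets a fast verdict, and the scheduled run flips one switch to get full coverage.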
We already have the infrastructure (CruiseControl and PHPUnit), and the interest and skill to get it done. Let's do high-yield stuff first.
Request for comments & commitments 
What are the next steps?
Better code coverage would be excellent. One suggestion: "70% should probably be our aim, with critical code (such as validating user input) being at 100% and carefully reviewed to make sure we actually cover 100%."
So: remember to try to write testable code, and start considering writing tests alongside the code you contribute. Writing these tests and mocks correctly, and running them quickly and consistently, is key to making good unit tests & getting what we want out of unit testing as a practice.
If you're getting rid of unnecessary globals when you see them, great -- keep doing that. And Chad's global config object project is a strong step towards getting us out of global hell so we can have proper unit testing in the future, so please comment on his request for comments.
Statistics might also help us. Ashar says: "It would be great to have statistics for most changed functions between versions, or over 1, 3 or 6 months periods. That can help us identify the code which is stable enough and the parts that keep breaking/changing."
This document was originally written by Sumana with material by RobLa, Chad, Krinkle, and Hashar.