Requests for comment/Unit testing

Braindump from RobLa, 15 June, about our current testing frameworks and where they're headed.

What metaphor fits? Not housecleaning or proofreading or one big restructuring, but something more like a blob gradually coming along and correcting things, science-fictionally; Flylady, 5 minutes a day.

It's not just that things are a mess. Clutter is annoying. We sometimes make a distinction between clutter & mess: mess is bad, and it's easy to see how it's bad. [what's mess?] Clutter is also bad: long-term, we end up trying to sprint through molasses, scaring off new contributors & slowing down experienced contributors.

Let me tell you: why MediaWiki needs even better unit testing, what we're doing now, and what we need to do next.

Why MediaWiki needs even better unit testing
I naively used to think that testing software just meant looking at an installation and trying to break it and filing bug reports, maybe in a mechanized way. And that's a useful kind of testing -- it's "integration testing" (testing the integrated whole as it operates together). But there's also "unit testing," and that's what many MediaWiki contributors want to focus on right now.

To precisely and quickly test only one function at a time, each function gets one or more tests. And instead of mucking around trying to create realistic input that covers every edge case, each test uses a very precisely defined decoy object called a mock. Mock objects are stand-ins for all the code that the test isn't checking; each mock thus isolates the function under test. For example, if you want to test how an application deals with database failures, you create a mock database that's programmed to fail at appropriate times, then run the database access code against it.

If the mock is right, then every time an automated test suite runs the unit test, the test will thoroughly exercise all the function's code paths, and we'll know if there's a failure.
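MediaWiki's tests are written in PHP and JavaScript, but the mock idea is language-agnostic; here's a minimal sketch in Python using the stdlib `unittest.mock`. The `save_edit` function and its database interface are hypothetical stand-ins, not MediaWiki code.

```python
from unittest import mock

# Hypothetical function under test: saves an edit, reports success or failure.
def save_edit(db, page, text):
    try:
        db.insert("revision", {"page": page, "text": text})
        return "saved"
    except ConnectionError:
        return "db-error"

# A mock database programmed to fail at the appropriate time: no real
# database anywhere in sight, so the test is fast and precise.
failing_db = mock.Mock()
failing_db.insert.side_effect = ConnectionError("db down")
assert save_edit(failing_db, "Main_Page", "hello") == "db-error"

# A healthy mock isolates the happy path just as easily, and lets us
# verify exactly how the function talked to its collaborator.
healthy_db = mock.Mock()
assert save_edit(healthy_db, "Main_Page", "hello") == "saved"
healthy_db.insert.assert_called_once_with(
    "revision", {"page": "Main_Page", "text": "hello"})
```

PHPUnit's mock objects (`getMock()` and friends) play the same role for our PHP code.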

An introduction to unit testing.

So you can see that unit tests are a valuable form of automation (and automated regression testing). You instantly see not just that something's broken but what's broken: when someone breaks a unit test, there's a specific function they know they need to go investigate.

(And to be honest, unit tests make it easy to encourage people to write quality code. If a test fails, either the function needs fixing or the test is wrong and needs fixing.  And either way, it's just one small, well-defined thing, so it's ridiculously specific and easy-to-act-on feedback.)

In contrast, automated integration testing is more brittle. Selenium is an automated testing framework suited to integration testing: you start Selenium and it fires up a browser, performs scripted actions as a user would, and checks the actual output against the desired output. But that means we have to define the desired output in a way Selenium can programmatically test, and so the tests break when we change things that actual users wouldn't care about.

For example, a test might define success as "this div should have the value 1", and then the test will fail if we change the skin so that it's now a span instead of a div. And even if a failed test is a legitimate signal of a problem, we have to start from scratch investigating what to fix and how.
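That brittleness is easy to demonstrate. Here's a sketch of the "this div should have the value 1" assertion, done in plain Python with the stdlib HTML parser rather than actual Selenium (the element id is made up): the check passes on the old skin and fails on the new one, even though a user sees identical output.

```python
from html.parser import HTMLParser

class DivValueChecker(HTMLParser):
    """Collects the text content of every <div id="edit-count">."""
    def __init__(self):
        super().__init__()
        self.in_target = False
        self.values = []
    def handle_starttag(self, tag, attrs):
        if tag == "div" and ("id", "edit-count") in attrs:
            self.in_target = True
    def handle_data(self, data):
        if self.in_target:
            self.values.append(data.strip())
            self.in_target = False

def div_has_value(html, expected):
    checker = DivValueChecker()
    checker.feed(html)
    return expected in checker.values

old_skin = '<div id="edit-count">1</div>'
new_skin = '<span id="edit-count">1</span>'  # same content, different tag

assert div_has_value(old_skin, "1") is True   # test passes
assert div_has_value(new_skin, "1") is False  # test "fails", yet nothing a user cares about changed
```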

So unit tests, done properly and running quickly and consistently, should be our first line of defense in baking quality into MediaWiki.

This is why we're prioritizing unit testing frameworks over integration testing frameworks.

What we're doing right now
To quickly summarize the automated *integration* testing we have: we're running Selenium. Also, to use Selenium on a zillion virtual machines, each with a different environment, we use Selenium Grid.

[where can I see the output of those tests? how often are they run? on request???]

But most contributors ignore Selenium; they don't find its output useful. And Selenium Grid is an ops headache; sometimes a Selenium Grid test failure actually indicates a Grid or virtual machine problem.

We do want to do automated integration testing on zillions of environments, though, which is where Swarm comes in (thanks, Krinkle!). [link to Berlin presentation???] With Swarm we can crowdsource the grid. The Wikimedia Foundation machine that doles out testing jobs is robust. People opt in with their browsers; this decouples independent browsers from that central machine. Maybe we'll maintain a few VMs running IE6 or the like, which no one wants to run themselves.

QUnit: a (misleadingly named?) JavaScript-based testing framework. QUnit seems to have the added benefit of being able to do much more than just test your JavaScript. [??? find out more] QUnit & Swarm are related: TestSwarm depends on you writing tests in QUnit. [?? how tightly coupled are these?] Worrying to see "unit" in the name: is it a unit testing framework? We would need to get people to separate unit & integration tests.

Back to unit testing: the important parts here are Cruise Control, PHPUnit, and parser tests.

What's supposed to happen:

Any time you develop a new piece of code, you're supposed to write unit tests for it. That's the platonic ideal, anyway.

CruiseControl was recently set up to actually run PHPUnit automatically as a post-commit hook; it spits output to a bot in an IRC channel if something fails.

Does it run every time someone makes a commit? Unclear! It may be that builds get queued, so if one is already queued it will only run once even if there are three or four commits in a row. That has confusing consequences for the yelling bot: right now it's possible to get into a situation where there are five commits, one of which broke the build, and nobody's sure which, so there's not a full sense of ownership & shame. It's improved, but it needs to improve further to be truly effective.
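The blame-loss problem from queueing can be sketched in a few lines. This is a hypothetical coalescing build queue, not CruiseControl's actual implementation: commits that land while a build is pending get batched into one run, so a failure can only be attributed to the whole batch.

```python
# Sketch of a coalescing post-commit build queue (hypothetical): commits
# that arrive before the next build runs all get covered by that one build.
class BuildQueue:
    def __init__(self, run_tests):
        self.pending = []
        self.run_tests = run_tests

    def on_commit(self, rev):
        self.pending.append(rev)

    def run_build(self):
        """One build covers every queued commit, so blame is ambiguous."""
        revs, self.pending = self.pending, []
        if self.run_tests():
            return "PASS: r%d-r%d" % (revs[0], revs[-1])
        return "FAIL: one of %s broke the build" % (
            ", ".join("r%d" % r for r in revs))

queue = BuildQueue(run_tests=lambda: False)  # pretend the suite now fails
for rev in (101, 102, 103, 104, 105):        # five commits land before the build runs
    queue.on_commit(rev)
print(queue.run_build())
```

The bot can only yell "one of r101...r105 broke the build"; running one build per commit would pinpoint the culprit, at the cost of more build machine time.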

We're currently using several different tools:

 * Selenium
 * Selenium Grid
 * CruiseControl
 * QUnit
 * TestSwarm
 * PHPUnit & parser tests

Several of these things are divided into old school & new school:

 * Jasmine is old school; PHPUnit is new school.
 * QUnit might end up replacing Selenium & Jasmine.
 * Selenium Grid is old school; TestSwarm is new school.

Currently:

Additionally: unofficial ad hoc usage of QUnit & Test Swarm

We're not using Jasmine anywhere, I think?

Parser tests: these have been around forever, a homegrown set of tests pairing funky markup with its expected HTML output. Were they integrated into the PHPUnit tests???? So, running as a post-commit hook?
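Parser tests are essentially table-driven: each case pairs a snippet of wikitext with the HTML the parser is expected to emit. Here's a sketch of that shape, with a toy parser that handles only `'''bold'''` (nothing like the real MediaWiki parser):

```python
import re

def toy_parse(wikitext):
    """Toy stand-in for the MediaWiki parser: handles only '''bold'''."""
    html = re.sub(r"'''(.+?)'''", r"<b>\1</b>", wikitext)
    return "<p>%s</p>" % html

# Each parser test pairs funky markup with its expected HTML output.
PARSER_TESTS = [
    ("plain text",       "<p>plain text</p>"),
    ("'''bold''' word",  "<p><b>bold</b> word</p>"),
]

for wikitext, expected in PARSER_TESTS:
    actual = toy_parse(wikitext)
    assert actual == expected, "parser test failed: %r -> %r" % (wikitext, actual)
```

Because the cases are just data, anyone can add a regression case for a parser bug without writing test harness code, which is a big part of why parser tests have survived so long.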

What we should do next
BIG PROBLEM:

Unit test coverage is terrible: 2% tops.

Our codebase is so spaghetti & global-ridden that writing proper unit tests will be very difficult until we do some refactoring, which will be very time-consuming

This is the bulk of the technical debt that's keeping us from achieving velocity.

PHPUnit has some tools & tricks built in to deal with the legacy-codebase issue: it can spawn off a copy of the global namespace, let the test mod it up, then restore from the copy and you're done. The problem with this approach: we have so many globals that the copying step takes like 2 minutes.
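That backup-and-restore trick (PHPUnit's backupGlobals behavior) can be sketched generically; this Python version uses a dict as a stand-in for the global namespace, and the `wg*` names are just illustrative. The point is that the snapshot cost scales with how much global state exists, which is exactly why it's so slow for us.

```python
import copy

# Stand-in for our global-ridden namespace (the real codebase has vastly more).
GLOBALS_TABLE = {"wgSitename": "Wikipedia", "wgDBserver": "localhost"}

def run_test_with_global_backup(test, globals_table):
    """Sketch of PHPUnit-style global backup: copy, let the test mod it up, restore."""
    snapshot = copy.deepcopy(globals_table)   # this copy is what gets slow
    try:
        test(globals_table)
    finally:
        globals_table.clear()
        globals_table.update(snapshot)        # restore, and you're done

def messy_test(g):
    g["wgSitename"] = "TestWiki"              # the test mods up the globals
    assert g["wgSitename"] == "TestWiki"

run_test_with_global_backup(messy_test, GLOBALS_TABLE)
assert GLOBALS_TABLE["wgSitename"] == "Wikipedia"   # restored afterwards
```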

RobLa only wants to focus on the testing that is amenable to post-commit hooks: we already have the infrastructure, so it'll get done.

do high-yield stuff first

Me: temperately suggest that volunteers write unit tests along with their code. It's off-putting to see all the tech debt! It's unflattering and could put people off participating, but it's good to get people into the habit of thinking about this now.
 * look into our dox for how to write unit tests

Chad recently committed a global config object to trunk? That's a way forward to getting us out of global hell. This project does not have a project page.
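I haven't seen Chad's code, but the general shape of a config object as a way out of global hell is: functions take a config parameter instead of reading `$wgWhatever` globals, which makes them trivially unit-testable. A hypothetical before/after sketch (names invented for illustration):

```python
# Before: global hell. The function reaches into module-level state, so a
# test has to mutate (and later restore) the global to exercise it.
wgSitename = "Wikipedia"

def footer_before():
    return "Powered by %s" % wgSitename

# After: a config object is passed in, so a test just constructs its own.
class Config:
    def __init__(self, sitename):
        self.sitename = sitename

def footer_after(config):
    return "Powered by %s" % config.sitename

assert footer_after(Config("TestWiki")) == "Powered by TestWiki"  # no globals touched
```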

Redlink it off the WMF GenEng page; recruit volunteers to work on this.
 * Me: get braindump from Chad & put it on project page


 * Me: write this up, run it past Mark & Hashar for completeness?


Writing these tests and mocks correctly is important to making good unit tests & getting what we want out of unit testing as a practice.