Wikimedia Release Engineering Team/Book club/Software Testing Anti-Patterns

October 2019

This time we picked the blog post "Software Testing Anti-patterns" (http://blog.codepipes.com/testing/software-testing-antipatterns.html), published in April, 2018, by Kostis Kapelonis on the CodePipes blog.

This is a summary of our discussion of the article.

Overall comments:

E: Concise, feels very familiar from talking to Ž.
Ž: also feels very familiar, I can't remember disagreeing with anything in it.
L: I like the article, a couple of places where I object, nothing major -- I would recommend it.

Antipatterns 1-3: unit without integration, integration without unit, wrong kind of tests:

Ž: Most problems are teams not having a clear testing strategy.
- Testing Strategy: how do we test what is important? All of these antipatterns are rooted in this.
- I think we have all three of these antipatterns in the selenium tests
T: Examples, but not rationale.
Ž: All these antipatterns might be solved by thinking seriously as a team.
JH: Did talk about 3 layers of code - is that different than a strategy?
Ž: Did mention testing strategy through the article, and you could draw conclusions, but I have a feeling that a short section emphasizing explicit testing strategy would help.
E: I found it interesting that UI tests were not mentioned very much. Business logic is everywhere in a payment gateway, it seems to me, so it's strange to me that there are no unit tests. It seems to me that it should always be 3 layers.
Ž: the author did mention that UI tests were not going to be mentioned. UI tests might complicate this article a lot without adding a lot of value.
JR: This article leans towards testing from a developer perspective. There's a big overall antipattern re: scoping: as soon as an org hires dedicated test engineers, the rest of the org stops testing. The theme of this post is testing from the developer perspective, which leaves out a big chunk of the testing that's happening in the org today.
L: my main criticism is that it doesn't mention manual or exploratory testing at all.
Ž: Indeed, it doesn't. That follows from the scoping that the author did.

Antipattern(s) 4-n: testing the wrong functionality:

B: my favorite point was the TDD as religion point. I'm happy to see that called out as a bad idea.
E: I have seen TDD not working in practice.
L: I disagree.
JR: TDD as a testing thing vs a design thing. The reason things are being done is important. Unit testing is not solely for defining correctness, but also a means to help you design.
Ž: Unit tests tell you that you're building the right thing, integration tests test that you built the thing right

Flaky tests:

JH: If there are connection issues: is that the fault of the test?
Ž: If your tests are flaky then you should fix your environment.
JH: If your env is hard to set up, fix your env; this is one of my favorite points.
E: Flow does not work in beta labs, you cannot post new topic.
JR: There's a task about this: disabling selenium as voting, which I'm against. Do we have this problem with unit tests?
Ž: I don't have experience with flaky tests wrt unit tests. We have a speed problem with selenium tests. Timo mentioned that he optimized unit tests using a browser.
T: There have to be tools for finding flaky tests.
Ž: There's a Jenkins plugin to show tests and how long they've been failing... I don't think there's a good report, but there might be a plugin or something.
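
As a rough illustration of what a flaky-test finder has to do (this is not the Jenkins plugin mentioned above, just a minimal sketch), one approach is to re-run a test several times and flag it when the outcomes differ. The names `classify`, `sometimes_fails`, and `always_fails` are made up for this example, and the "flaky" test is simulated with a counter so that the sketch is deterministic.

```python
import itertools


def classify(test, runs=5):
    """Run a zero-argument test callable `runs` times and summarise the outcomes."""
    outcomes = set()
    for _ in range(runs):
        try:
            test()
            outcomes.add("pass")
        except AssertionError:
            outcomes.add("fail")
    if outcomes == {"pass"}:
        return "stable-pass"
    if outcomes == {"fail"}:
        return "stable-fail"
    return "flaky"  # the same test both passed and failed


# Simulated tests (hypothetical): one alternates between pass and fail,
# one always passes, one always fails.
_calls = itertools.count()


def sometimes_fails():
    if next(_calls) % 2 == 1:
        raise AssertionError("simulated intermittent failure")


def always_passes():
    pass


def always_fails():
    raise AssertionError("simulated permanent failure")
```

Re-running only tells you that a test is flaky, not why; the cause (environment, timing, shared state) still has to be tracked down separately.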

Code coverage:

L rant: Code coverage outside of people writing unit tests and measuring coverage is pointless.

Ž: antipattern 6 is an eye opener. I was pleasantly surprised about the author's metrics: those are pretty good numbers. Numbers like that are a better representation of our code health.
L: #6 I had disagreements. I've had the experience of getting to 100% coverage easily. I cheated by using TDD and it does not count the parts of the code that I have explicitly excluded.
E: If it's easily achievable then why not? If it requires too much effort OK; but if it's easily achievable why not?
JH: It could take longer to run.
E: I'm against all tests being blocking all the time.
JR: I think one of the challenges of having all our tests run all the time is that it diminishes some of the feedback to developers. Looking at tests through the abstraction of coverage is probably not the best way to make a decision about, say, deployment. Avoid the "thud" factor; i.e., running all the tests all the time and being proud of that. Coverage can hide critical issues. If you strive for 100% coverage, it may challenge you to make changes. This is important for the individual, but not the organization.

Converting production bugs to tests:

B: Can anyone give me a percentage of the time?
E: Unit tests are there usually.
Ž: 0% -- what prompted this is an incident: Swedish Wikipedia didn't have its database set to UTF-8, which caused some mojibake. There was a selenium test, but we missed this because it's a configuration option and none of our test environments had that setup. There was an incident report. The selenium tests used only ASCII, and we updated that.
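
To make the incident above concrete, here is a minimal sketch of why ASCII-only test data could not catch it. The function `store_and_load` is a hypothetical stand-in for a database whose read encoding is a configuration option; it is not how MediaWiki or the real test suite works.

```python
def store_and_load(text, storage_encoding):
    """Hypothetical storage layer: text is written as UTF-8 bytes, then read
    back using whatever encoding the storage is configured with."""
    raw = text.encode("utf-8")
    return raw.decode(storage_encoding)


def round_trips(text, storage_encoding):
    """True if the text comes back unchanged from the storage layer."""
    return store_and_load(text, storage_encoding) == text


ASCII_TITLE = "Stockholm"   # survives even a misconfigured encoding
NON_ASCII_TITLE = "Malmö"   # turns into mojibake when read as Latin-1
```

With the storage misconfigured as `latin-1`, `round_trips(ASCII_TITLE, "latin-1")` is still true, so a test using only ASCII passes, while the same check with `NON_ASCII_TITLE` fails. Catching the bug also needs a test environment with the same configuration as production, which is the other half of what went wrong.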

Test code as 2nd-class citizen:

Ž: Test code should be as good as production code; however, sometimes folks may not have enough experience to make test code as good as production code. For testers, it's not their primary task (writing code), so they need support: code review, pairing. People get really dogmatic about DRY, but sometimes that's not as useful in test code. More verbose test code is better. (JR +1)
L: agree, re:DRY. I've seen test code that is DRY/SOLID/KISS/etc that was hard to understand. The #1 responsibility of test code is understandability. If abstractions help with that, excellent, but if you need a test suite for your test suite that's bad.
E: Cucumber should clarify things.
Ž: There are layers of implementation there. Cucumber made it hard to find what was failing.

E: Do "running tests manually" and "converting production bugs into tests" contradict each other? This may inflate the test suite a lot.
Ž: A bug in production may be tested with a simple test.

Automating running of tests:

L: I find it necessary that developers should be able to run the tests manually. If developers need to push things to CI to see if it works, then that will increase the cycle-time up to even a few hours (up from a few seconds). We've noted a few times that we don't have the data.

Additionally, L wrote the following notes, but not all of them were brought up in the discussion.

I like that this is not specific to a language, framework, or other tool.
I'm not entirely sure the testing pyramid is sufficient, but it's OK for what it's meant to convey. I need to ponder on this further, as I don't have any concrete objections, just a feeling.
I note that the article doesn't deal with acceptance tests, or capacity tests, at all. That's fine, but should not be forgotten.
I'm not entirely happy with the article's definition of unit for unit testing. I'd rather say a unit is any part of the software that is tested in isolation from the rest of the software, or the rest of the universe. This is, however, not really relevant for discussing the article.
The article doesn't discuss how the programming language and its tooling affect tests. As a recent convert to Rust, and a former dabbler in Haskell, and a long-time user of Python, I think this warrants a discussion. My own experience is that a strong, static type system helps avoid a need to unit test in quite the same detail as I've needed to do in dynamic, semi-soft typing in Python.
AP1: test environments, and production environments, must be easy to set up, in any case. So many things go wrong if they aren't. It's not even utopian to require this, these days.
AP2: Here's my thinking: unit tests test small and possibly larger components, and can do so in very fine detail. The unit can and should be designed to be easy to test well. Unit tests are like testing each ball bearing, bolt, nut, and wire, which goes into building a car or a bicycle. Unit tests are thorough and make sure the unit fits its requirements, within the tolerance limits specified. When it comes time to assemble the car or bicycle, it's no longer necessary to test each part separately, and any testing can concentrate on whether the vehicle as a whole works. Acceptance testing is additionally needed to make sure the vehicle is the kind of vehicle that is wanted: a bicycle, and not a car.
AP3: Number of tests is pretty much meaningless. What matters is that the tests show that all the things you want the software to do, work.
AP4: It seems to me that we are in the position of having a large legacy application with less than acceptable automated testing. Getting to the point where we have strong confidence in our automated test suite is going to take a lot of effort and time. The suggestions in the article seem OK as pointers for how we can get there. What are our critical, core, and other code parts? Apart from MW, we have other software we need to care for. Some of this comes from elsewhere, but we're still responsible for keeping it running. More legacy. For any new software that we write ourselves, I think we should aim at having automated test suites we have very strong confidence in.
AP5: TL;DR: test expected behaviour of the unit, not its implementation. Tests should use the unit as if it were being used by other code.
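
A small sketch of the AP5 point, using a made-up `ShoppingCart` class (not from the article): the test goes through the same public methods other code would call, and never looks at the private `_items` dictionary.

```python
class ShoppingCart:
    """Hypothetical unit under test."""

    def __init__(self):
        # Internal detail: item name -> (unit price, quantity).
        # Tests should not depend on this layout.
        self._items = {}

    def add(self, name, unit_price, quantity=1):
        _, count = self._items.get(name, (unit_price, 0))
        self._items[name] = (unit_price, count + quantity)

    def total(self):
        return sum(price * count for price, count in self._items.values())


def test_total_reflects_added_items():
    # Uses only add() and total(), as calling code would.
    cart = ShoppingCart()
    cart.add("pen", 2, quantity=3)
    cart.add("book", 10)
    assert cart.total() == 16
```

A test that instead asserted on the contents of `cart._items` would fail as soon as the storage was changed to, say, a list of line items, even though the behaviour callers rely on stayed the same.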
AP6: Code coverage is easy, but almost entirely meaningless. Test coverage should be measured by how well the software fulfills its requirements, acceptance criteria, and use cases. That's much harder to do, since it requires writing all those things down first, and very few software projects do that. I don't really agree with the claim that it's hard to get to complete code coverage during tests, but I cheat. I wrote many years ago a Python unit test runner, which measures code coverage for a code module only when running that module's unit tests. It fails the test suite if all lines are not executed during those unit tests, excluding - and this is important - those lines that are specially marked as excluded. This, combined with TDD and carefully designing classes that are easy to test, turned out to make it quite easy to reach 100% coverage.
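
The runner described in AP6 is not reproduced here. What follows is only a rough sketch of the idea, under several assumptions: the marker text `# pragma: no cover`, the sample function `sign`, and all helper names are invented for the example, and lines are traced with the standard `sys.settrace` hook rather than a real coverage tool. The check fails (reports missing lines) if any statement in the function was not executed by the given calls, unless that line carries the marker.

```python
import ast
import sys

EXCLUDE_MARKER = "# pragma: no cover"  # assumed marker text, for illustration

SOURCE = '''\
def sign(x):
    if x < 0:
        return -1
    if x > 0:
        return 1
    if x != 0:
        raise ValueError("not comparable")  # pragma: no cover
    return 0
'''


def required_lines(source):
    """Line numbers of statements inside function bodies, minus excluded lines."""
    lines = set()
    for func in ast.walk(ast.parse(source)):
        if isinstance(func, ast.FunctionDef):
            for stmt in func.body:
                for node in ast.walk(stmt):
                    if isinstance(node, ast.stmt):
                        lines.add(node.lineno)
    excluded = {
        number
        for number, text in enumerate(source.splitlines(), start=1)
        if EXCLUDE_MARKER in text
    }
    return lines - excluded


def executed_lines(code, calls):
    """Run each zero-argument callable and record the lines executed in `code`."""
    seen = set()

    def tracer(frame, event, arg):
        if frame.f_code is not code:
            return None  # do not trace other functions
        if event == "line":
            seen.add(frame.f_lineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        for call in calls:
            call()
    finally:
        sys.settrace(previous)
    return seen


# Compile the sample "module" so its line numbers match SOURCE.
namespace = {}
exec(compile(SOURCE, "<sign>", "exec"), namespace)
sign = namespace["sign"]


def missing_lines(calls):
    """Sorted line numbers that the calls should have executed but did not."""
    return sorted(required_lines(SOURCE) - executed_lines(sign.__code__, calls))
```

Calling `sign` with a negative number, a positive number, and zero leaves no missing lines, because the `raise` line is marked as excluded. Calling it with only a positive number leaves lines 3, 6, and 8 unexecuted, and a runner built on this would fail the suite.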
AP7: I entirely agree that tests should be reliable and fast. Where do we fail with that?
AP8: I agree that automation (CI) must run tests for every commit, or otherwise frequently. I would like to add that developers must also be running at least some tests manually, so they know that what they push to CI is likely to work, because otherwise the edit-test-debug loop becomes far too long, which kills productivity. Creating a test instance of software for each change request is on the cards for new CI.
AP9: I find that the most important requirement for test code is that it's obviously correct, and that clarity is more important than almost any other aspect. It's better for test code to be repetitive than to use abstractions that obscure what's happening in the test, by requiring the reader to hold too much context in their mind at once. Of course, sometimes test code is clearer, more obvious, with the abstractions. Nobody said programming is easy. Also, remember to avoid accidental abstractions, where two parts of code happen to be similar, but only by happenstance, and not because they fundamentally, conceptually do the same thing. Extracting them into a helper function would be bad.
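
As a small illustration of the AP9 point, here is what "repetitive but obvious" test code can look like. The function `slugify` and the test names are hypothetical; the point is only that each test can be read on its own, without looking up a shared helper.

```python
def slugify(title):
    """Hypothetical function under test: lower-case words joined by hyphens."""
    return "-".join(title.lower().split())


def test_slugify_lowercases():
    assert slugify("Hello") == "hello"


def test_slugify_joins_words_with_hyphens():
    assert slugify("Hello World") == "hello-world"


def test_slugify_collapses_repeated_spaces():
    assert slugify("Hello   World") == "hello-world"
```

The three tests share the same shape, and it would be easy to fold them into one generic helper. Whether that makes the suite clearer or more obscure is the judgment call AP9 is about.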
AP10: Fully agreed.
AP11: I like TDD, but anything can be taken too far. I haven't seen people do that, myself, in person, only online. I've seen people do TDD mechanically, and get bad results. Note also that just doing tests first is not all TDD is. You're supposed to refactor liberally. Also, spikes and experimentation are fine to do without tests at all.
AP12: Badly titled. Better would be: "Not knowing your tools".
AP13: Agreed. Very similar to people having one bad experience with any other tool or methodology and then criticising it forever after. I do that, too much.