Don't target 100% coverage... but achieve it anyway!
I recently noticed that a lot of people were advising to reach 100% of code coverag...
You should only cover that which needs covering, no more, but that's most things, really.
But it is unwise to choose an arbitrary number, say 70% or 80%, and to require that level of measured coverage. Remember that the tool does not measure many things, so even 100% measured coverage probably does not mean that everything that matters is tested.
The problem with this approach (arbitrary levels of coverage) is that the coverage report is always filled with reds and yellows. You become accustomed to that and don't think twice about it. It's like getting used to an alarm and so you no longer hear it.
And what is it exactly that you're not covering? Can you tell? Do you know? Do you know why or why not?
So a better approach is to always require 100% coverage as measured by the tool (e.g., Istanbul). Then decide which features, paths, etc. do not need coverage, and disable coverage for them in the code, with an explanation.
Now your coverage report is binary: it is either PASS (all green) or FAIL. Easy. And you can automate it (CI/CD), too, to prevent deploying bad shit.
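As a rough sketch of that workflow in Python: coverage.py skips any line marked with `# pragma: no cover` (the Istanbul equivalent is `/* istanbul ignore next */`), and `coverage report --fail-under=100` fails with a non-zero exit status below 100%, which is what makes the CI check binary. The function and the `myapp` directory below are hypothetical; the point is only that the exclusion sits next to the code, together with its reason.

```python
import os


def config_dir() -> str:
    """Return the per-user config directory for a hypothetical app "myapp"."""
    # Deliberately excluded from measurement, with the reason right here.
    # When the pragma is on an `if` line, coverage.py excludes the whole block.
    if os.name == "nt":  # pragma: no cover - Windows-only branch; CI runs on Linux
        return os.path.join(os.environ.get("APPDATA", ""), "myapp")
    return os.path.join(os.path.expanduser("~"), ".config", "myapp")


if __name__ == "__main__":  # pragma: no cover - manual entry point only
    print(config_dir())
```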
And you can see with a simple global search what code is not covered and why. It is evident in code reviews, too.
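That search can be as plain as `grep -rn "pragma: no cover"`. A slightly stricter helper could also flag exclusions that carry no justification, so a review or CI step can reject them. This is a hypothetical sketch: it assumes the coverage.py pragma and a convention (assumed here, not enforced by any tool) that the reason follows the pragma on the same line.

```python
import re
from pathlib import Path

# coverage.py's default exclusion comment, capturing any trailing justification.
PRAGMA = re.compile(r"#\s*pragma:\s*no cover\b[\s:-]*(?P<reason>.*)$")


def scan_source(filename: str, text: str) -> list[tuple[str, int, str]]:
    """Return (filename, line number, reason) for each excluded line of one file."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        match = PRAGMA.search(line)
        if match:
            hits.append((filename, lineno, match.group("reason").strip()))
    return hits


def scan_tree(root: str) -> list[tuple[str, int, str]]:
    """Scan every .py file under root (the 'global search')."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        hits.extend(scan_source(str(path), path.read_text(encoding="utf-8")))
    return hits


def unexplained(hits: list[tuple[str, int, str]]) -> list[tuple[str, int, str]]:
    """Exclusions with no justification after the pragma."""
    return [hit for hit in hits if not hit[2]]
```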
And it's not arbitrary! You may find that you really only needed 50% coverage. Or that you'd have been living dangerously with less than 95%. In short, you get exactly the coverage you need, no more, no less, and an easy metric for success.
It astonishes me that this practice is virtually unheard of. I've been using it for probably a decade with great success. And if you write clean, simple, understandable code, you may find that 100% measured is easy to achieve even without ignoring anything.
In fact, if something is difficult to test properly, then it is probably poorly written. Better to fix it.
Struggles with coverage also probably mean you're writing too many unit tests and not enough integration tests. Testing how things work in isolation (unit) is almost worthless. I save it for simple utility functions, if then.
The best tests test the code as it is expected to work in the wild. Preferably in production mode and on production software.
Thanks for sharing your approach @chasm. As you mentioned, there are many interesting ways to use coverage; the only thing is that it requires a good understanding of what code coverage is all about, and disciplines/tools to achieve it without effort/pain.
Exactly, and that's where TDD proves super useful: you must write code that is easily testable, otherwise you can't go quickly and easily from RED to GREEN.
Integration tests are also very interesting, and yes, ideally we should be executing them in production. Two resources I like on the subject:
Hi, @antoinecoulon. I agree. Thanks for the links. They look like good resources. I'll check them out.
My experience is the other way round:
Struggles with coverage also probably mean you're writing too many integration tests and not enough unit tests.
Having said that, I work mostly on backends and libraries; front-end developers may have different experiences.
That makes no sense to me at all. I wonder if we're talking about the same things? I can write one integration test and test parts of several different units. A couple of integration tests and I might not need any unit tests at all.
By unit tests I mean those that test a function or component in isolation from other functions and components, often by mocking out dependencies. Sometimes called anti-social testing.
By integration tests I mean moving the mocks to the edges of the app -- only mocking external resources (generally API calls) -- then allowing the components to interact the way they will in production.
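To make the two definitions concrete, here is a minimal Python sketch with hypothetical names: `CartService` depends on `PriceRepository`, which calls an external price API (the URL is a placeholder). The unit test mocks the collaborator itself; the integration test wires the real components together and fakes only the HTTP call at the edge.

```python
from dataclasses import dataclass
from typing import Callable
from unittest.mock import Mock

PRICE_API = "https://api.example.com/prices/"  # placeholder URL


@dataclass
class PriceRepository:
    # The only external resource in this sketch: an HTTP GET returning JSON.
    http_get: Callable[[str], dict]

    def price_of(self, sku: str) -> float:
        return float(self.http_get(PRICE_API + sku)["price"])


@dataclass
class CartService:
    prices: PriceRepository

    def total(self, skus: list[str]) -> float:
        return round(sum(self.prices.price_of(sku) for sku in skus), 2)


def test_total_unit() -> None:
    # Unit test: the collaborator is mocked, so only CartService's own code runs.
    prices = Mock(spec=PriceRepository)
    prices.price_of.side_effect = {"a": 1.5, "b": 2.25}.__getitem__
    assert CartService(prices).total(["a", "b"]) == 3.75


def test_total_integration() -> None:
    # Integration test: real components interact; only the HTTP edge is faked.
    responses = {PRICE_API + "a": {"price": "1.5"}, PRICE_API + "b": {"price": "2.25"}}
    service = CartService(PriceRepository(http_get=responses.__getitem__))
    assert service.total(["a", "b"]) == 3.75
```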
My feeling is that if you're finding integration tests too difficult to write (they are easier in my experience), then it's time to rethink your architecture and your choices.
I don't see that backends are any different (I've built plenty). Libraries are probably more suited to unit tests, although functions intended to work together should be tested together, IMO.
Can you explain how and why integration tests (if we're talking about the same thing) make coverage a struggle?
Do you mind elaborating, please? If it's too broad for you to do justice to here, links are welcome.
Thanks for giving me the chance to elaborate on that subject. Before sharing my thoughts, here is one valuable link I can share with you: that wonderful talk by Ian Cooper, explaining which pain points you'll face when writing tests last.
Putting that talk aside, my humble opinion would be:
Test-Last will favor writing tests that are irrelevant because the code is already written, so how can you prove that a test serves a specific purpose? Say you write 150 lines of code and then write 3 tests, all of which pass. How can you know they pass because of your code, given that no test ever showed the behavior failing without that code? The danger here is that tests serving no purpose wrongly boost confidence in the code produced.

Test-Last is also one of the reasons developers might not write tests at all, as tests become painful and hard to write once all the code has already been written. When writing tests afterwards, you might end up dealing with implicit dependencies you have no direct control over, which makes the behavior of the system hard to test and can also make some tests non-deterministic. Take for instance a simple use case involving some kind of date (Date.now() is non-deterministic): how are you supposed to test it afterwards if you didn't realize in the first place that it could be written differently to make the test simple?
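To make the date case concrete, here is a hedged Python sketch, with time.time() standing in for JavaScript's Date.now(); the `SessionHard`, `Session` and `FakeClock` names are hypothetical. The first class reads the clock directly, so a test written afterwards has to sleep or patch `time.time` (e.g. with `unittest.mock.patch`); the second takes the clock as a parameter, which is the kind of design that writing the test first tends to push you towards.

```python
import time
from typing import Callable


class SessionHard:
    """Hidden dependency: reads the wall clock directly."""

    def __init__(self, ttl_seconds: float) -> None:
        self.ttl = ttl_seconds
        self.created = time.time()

    def is_expired(self) -> bool:
        return time.time() - self.created > self.ttl


class Session:
    """Explicit dependency: the clock is injected and defaults to time.time."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.time) -> None:
        self.ttl = ttl_seconds
        self.clock = clock
        self.created = clock()

    def is_expired(self) -> bool:
        return self.clock() - self.created > self.ttl


class FakeClock:
    """Test double: time only moves when the test says so."""

    def __init__(self, now: float = 0.0) -> None:
        self.now = now

    def __call__(self) -> float:
        return self.now

    def advance(self, seconds: float) -> None:
        self.now += seconds
```

A deterministic test then reads: create `clock = FakeClock(1000.0)` and `Session(60, clock=clock)`, call `clock.advance(61)`, and assert that the session is expired, with no sleeping and no patching.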
Now that's a simple case, but imagine writing dozens or hundreds of lines of code at once and then trying to figure out how to test that behavior with multiple nested implicit dependencies. Doing Test-Last will just make you feel that writing tests is the worst part of the day.
Watched the talk cuz this is a highly sensitive topic for me. Sadly, I didn't find a comparison between the two paradigms there. But I appreciate you obliging me.
My thought is that test-last by itself isn't a poor pattern to follow. If your hypothetical developer decides to test his 150 lines shabbily, that's his fault. He is probably not confident about the tests either. But you have the @covers annotation to focus the coverage report's surface area on a point of interest. This should answer the question of whether the code influences the SUT. It's one thing to influence the SUT, but whether it does so in the intended way can only be answered by mutation testing tools. That's not an exclusive preserve of test-lasters.

Implicit dependencies, again, are a fault of the developer, not the paradigm. People attribute arriving at good design to TDD. Maybe it's just me, but I've never encountered an issue with wiring dependencies or the container just to make dependencies available.
Now, I suspect this reply may come off as dismissive, even though that's not my intent. I'm open to any argument I can relate to that could convince me that testing last is actually detrimental. If you find any more, I will be here.
There is no direct comparison in this talk, but it exposes facts about what is most likely going to happen if you don't follow either a Test-First or a TDD approach.
Of course some developers can write reliable tests in a Test-Last fashion. My point (pushed a bit to the extreme, I admit it) is that by doing Test-Last you are more likely to end up writing tests that do nothing relevant, or that merely stand as confidence tests and are useless because other tests already cover the same use cases. When doing Test-First or TDD, you're forced by design to implement a behavior following a spec described in a test case, whereas with Test-Last it's much harder (though not impossible) and less natural to work that way. I have mostly done Test-Last, and I'm not afraid to admit that I had tons of headaches trying to write good, reliable unit tests after a few hours of development, covering multiple use cases including every variant, etc. And even after that, I had no guarantee that I had correctly tested the whole flow of the code I produced.
As I mentioned in the article:
Being able to design correctly has nothing really to do with TDD; it's just that the constraint of writing testable code in the first place, incrementally, helps you find a good code design by breaking complexity down piece by piece. If you're good at designing software, TDD will just let you do it even faster and more safely. As you appear to be curious about the subject, I must recommend the book about TDD with C++ by Jeff Langr, amazon.co.uk/Modern-Programming-Te..., which is probably the most valuable resource on the subject. It's easy and quick to read, and you'll understand the overall benefits of a First (Before) approach over a Last (After) one. There is no way you won't become a better developer after reading it :)
Goodhart's law: When a measure becomes a target, it ceases to be a good measure.
I could not agree more: the approach you describe here is pretty much how I achieved rock-solid 100% test coverage for pygeoif.
I used mutmut for mutation testing.
@ldrscke love that Goodhart's law! Interesting to see that something that was initially related to economics also applies to software engineering.
Thanks for sharing that about pygeoif; you're now my official proof that I'm not talking bullshit here 😎 Thank you for the reminder that code coverage is a metric, not an objective 😊
I would add that automated tests have never ensured that your software works. They ensure it works like before. Tests freeze behavior and architecture to ensure that code modifications won't change behavior in an unexpected way.
If you write too many tests, you might freeze your software to the point where you can't evolve it without refactoring lots of tests.
And unfortunately, this generally happens when you set a code coverage objective. You end up doing white-box testing only for the sake of reaching that corner-case line of code that never happens in the real world.
@tandrieu thanks to you for that feedback!
I agree with you that adding tests just to boost metrics will never be a good thing: as you underlined, it might freeze the software and put developers in a situation where refactoring is impossible because the tests are too tightly coupled to specific implementations.
IMHO that's where a discipline like TDD (when done well) gets highly effective, in the sense that tests get written as a consequence of implementing the specification and are not really coupled to implementation details. They stay high-level enough to allow refactoring while still providing good enough "coverage" of the production code.
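A hedged Python illustration of that coupling, with hypothetical names: both tests below pass today, but the first pins how `register` talks to its storage, so refactoring it (say, to a single "add if absent" call) breaks the test even though users would see no difference. The second pins only the specified behavior.

```python
from unittest.mock import Mock


class InMemoryUsers:
    """Simple in-memory store used by the behavior-focused test."""

    def __init__(self) -> None:
        self._emails: set[str] = set()

    def exists(self, email: str) -> bool:
        return email in self._emails

    def add(self, email: str) -> None:
        self._emails.add(email)

    def all(self) -> list[str]:
        return sorted(self._emails)


def register(users, email: str) -> bool:
    """Register a normalized email; return False if it is already taken."""
    key = email.strip().lower()
    if users.exists(key):
        return False
    users.add(key)
    return True


def test_register_white_box() -> None:
    # Pins HOW register works (which calls, with which arguments), not WHAT it does.
    users = Mock()
    users.exists.return_value = False
    register(users, " A@B.C ")
    users.exists.assert_called_once_with("a@b.c")
    users.add.assert_called_once_with("a@b.c")


def test_register_behaviour() -> None:
    # Pins only the specified behavior, so it survives internal refactorings.
    users = InMemoryUsers()
    assert register(users, " A@B.C ")
    assert not register(users, "a@b.c")
    assert users.all() == ["a@b.c"]
```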
What's your opinion about property-based testing?
Property-based testing is an interesting technique that I could have mentioned; you're completely right. In fact, mutation testing aims to achieve more or less the same goal: finding variants (or mutations) of your production code that your tests don't catch. However, we could also consider them complementary, the main difference being that PBT is generally meant to be easily configurable to test a specific set of inputs with custom business rules, whereas mutation testing mutates code automatically under the hood using a list of predefined mutators (for Stryker, here is the list of supported mutators: stryker-mutator.io/docs/mutation-t...).
They both have use cases, but my personal opinion is that most of the time, if you practice TDD correctly, most of the expected business use cases should be covered. Moreover, testing things that shouldn't happen is often a smell, in the sense that you could just ensure upfront that these things can't happen. Nevertheless, for some parts of the software where the input must be controlled in a critical way, you can always add another layer of safety (unit tests coming from TDD + PBT).
Note: I highly recommend github.com/dubzzz/fast-check for PBT in the JavaScript/TypeScript ecosystem.
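For readers who haven't tried PBT yet, here is a dependency-free Python sketch of the idea: generate many random inputs and check that a property holds for all of them. `check_property` is a hypothetical helper, not the API of fast-check or hypothesis, both of which offer much richer generators and shrink failing inputs to minimal counterexamples. The properties are checked against a toy run-length encoder.

```python
import random
from typing import Callable, Optional


def encode_rle(text: str) -> list[tuple[str, int]]:
    """Run-length encode a string, e.g. "aaab" -> [("a", 3), ("b", 1)]."""
    runs: list[tuple[str, int]] = []
    for ch in text:
        if runs and runs[-1][0] == ch:
            runs[-1] = (ch, runs[-1][1] + 1)
        else:
            runs.append((ch, 1))
    return runs


def decode_rle(runs: list[tuple[str, int]]) -> str:
    return "".join(ch * count for ch, count in runs)


def check_property(prop: Callable[[str], bool], gen: Callable[[random.Random], str],
                   trials: int = 500, seed: int = 0) -> Optional[str]:
    """Return the first generated input that violates `prop`, or None."""
    rng = random.Random(seed)  # fixed seed so any failure is reproducible
    for _ in range(trials):
        value = gen(rng)
        if not prop(value):
            return value
    return None


def random_text(rng: random.Random) -> str:
    # Small alphabet so that repeated characters (real runs) show up often.
    return "".join(rng.choice("ab ") for _ in range(rng.randint(0, 20)))


# Properties: facts that must hold for every input, not hand-picked examples.
def round_trips(text: str) -> bool:
    return decode_rle(encode_rle(text)) == text


def runs_are_maximal(text: str) -> bool:
    runs = encode_rle(text)
    return all(a[0] != b[0] for a, b in zip(runs, runs[1:]))
```

Both properties hold for every generated input, while a false property such as `len(encode_rle(t)) < len(t)` is refuted quickly, since the empty string and any string without repeated characters are counterexamples.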
Good introductions to property-based testing are the videos (mostly tutorials and conference talks) and blog posts about hypothesis. Never mind that it is a Python 🐍️ library; the approach is similar in other programming languages.
I finally finished my "A Tale of two Kitchens" post, which touches on mutation testing and coverage as well, @antoinecoulon
While I agree with the first part, 'don't target 100% coverage', the second part, 'but achieve it anyway!', is less obvious to me. Do you mean that we can run mutation testing on projects with low, nonexistent, or less trustworthy initial coverage?
No that's not really what I meant.
What I'm trying to say there is that the goal should be to achieve 100% coverage, but indirectly, leveraging disciplines such as TDD, because of Goodhart's law: *When a measure becomes a target, it ceases to be a good measure.*
There are many ways to reach 100% coverage, and the most important thing is not reaching 100%, it's how you got there. As shown in the little example, by focusing solely on covering each line of code independently, you might (wrongly) feel safe and forget to cover other cases (not detected by coverage, but by mutation testing). That's often (not always) the case with a Test-Last approach. That's why looking only at the coverage percentage is not enough.
So while having less than 100% clearly indicates that your code is missing some tests, the reverse is not true: having 100% coverage does not mean that your code has all the tests it should have, and mutation testing helps you measure that.
Even if it's not what I meant (explained above), there is no relationship between the usefulness of mutation testing and the amount of code covered by tests; it can still be valuable to use mutation testing even if you're not at 100% coverage. Mutation testing evaluates existing tests, even if there are only 2-3 tests in the whole codebase. In that case it would at least make those few tests safer, even though you have 1% code coverage.
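A small, hypothetical Python illustration of that asymmetry (a separate sketch, not the example from the article): both suites below reach 100% line coverage of `is_adult`, but only the second kills every mutant. Tools like mutmut or Stryker generate such mutants automatically; here they are written by hand to keep the sketch self-contained.

```python
def is_adult(age: int) -> bool:
    return age >= 18


def weak_tests(fn) -> bool:
    """Covers every line of is_adult, but never checks the boundary."""
    return fn(30) is True and fn(5) is False


def strong_tests(fn) -> bool:
    """Adds the boundary cases the specification actually cares about."""
    return weak_tests(fn) and fn(18) is True and fn(17) is False


# Hand-written mutants, similar in spirit to what mutation testing tools produce
# automatically (changing a relational operator, tweaking a constant).
MUTANTS = {
    ">= -> >": lambda age: age > 18,
    ">= -> <=": lambda age: age <= 18,
    "18 -> 19": lambda age: age >= 19,
}


def surviving_mutants(suite) -> list[str]:
    """A mutant 'survives' when the suite still passes against it."""
    assert suite(is_adult), "the suite must pass on the original code"
    return [name for name, mutant in MUTANTS.items() if suite(mutant)]
```

Here `surviving_mutants(weak_tests)` returns the `>= -> >` and `18 -> 19` mutants, pointing straight at the missing boundary test, while `surviving_mutants(strong_tests)` returns an empty list.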