Centric connect.engage.succeed

Thoughts on code quality and testing

Written by Redactie Craft - 31 May 2016

Writing code isn’t easy. Writing good quality code is difficult. Testing code quality is even more difficult. So what makes for good quality code and how do we test for it? The short answer is: it really depends.

As a developer, I often discuss what constitutes well-written code with other developers. Sometimes we end up agreeing, but usually it turns into a very lengthy discussion. Aspects which are thought to determine code quality include:

  • following SOLID principles;
  • unit testing and high code coverage (on the assumption that code coverage reflects the quality of your code);
  • being able to release code for production;
  • high customer opinion of the product;
  • low bug rates.

All of the above aspects have merit, but they could just as easily be wrong given that quality is subjective. Consider the following two applications I have recently been working on:

  • Application A: approx. 100,000 LOC, high complexity, follows SOLID principles, over 15,000 unit tests, code coverage >98%, released every month, 0 bugs over the last year, 75 known issues.
  • Application B: approx. 10,000 LOC, medium complexity, pragmatic approach to SOLID principles, some unit tests (approx. 300), high customer opinion of the product, a new version released to production each week, 10 bugs over the last year, 0 known issues.

There are advantages and disadvantages of each application’s quality aspects.

Application A has fewer bugs, but more work needs to go into rearranging code in order to allow for new features. Application B, on the other hand, has a higher bug rate, but can quickly accommodate new features and fixes for the bugs that do occur.

Code quality continuum

These application examples are at opposite ends of the ‘code quality continuum’, as I call it.

This continuum defines the quality aspects that constitute good quality code for each application and provides a measurable scale that allows for an objective view of each individual aspect. Besides acting as a scale, the continuum also provides the following: the rationale for each aspect, the advantages and disadvantages of being at either end of the scale and guidance on how to objectively verify the position on the scale. An example is given below.

Aspect name
  Rationale: The reasoning behind how this aspect applies to code quality
  High advantage: The advantages of being at a high position on this scale
  High disadvantage: The disadvantages of being at a high position on this scale
  Scale (1 to 5): Each scale position should define what it means to be at that position
  Low advantage: The advantages of being at a low position on this scale
  Low disadvantage: The disadvantages of being at a low position on this scale
  Verification: How to verify an application’s position on the scale
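To make the template concrete, it could be captured as a small data structure. The following is only a minimal sketch: the class name QualityAspect, its field names and every example value (including the wording of the five scale positions) are assumptions made for illustration and do not come from an existing tool or from the definitions table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QualityAspect:
    """One filled-in copy of the aspect template."""

    name: str
    rationale: str
    high_advantage: str
    high_disadvantage: str
    scale: dict[int, str]  # scale position (1 to 5) -> what that position means
    low_advantage: str
    low_disadvantage: str
    verification: str

    def __post_init__(self) -> None:
        # The template asks for a definition at every position on the scale.
        if set(self.scale) != {1, 2, 3, 4, 5}:
            raise ValueError("scale must define exactly the positions 1 to 5")

    def describe(self, position: int) -> str:
        """Return a one-line summary of what a given position means."""
        return f"{self.name} at {position}: {self.scale[position]}"


# Hypothetical example values, invented for illustration only.
release_frequency = QualityAspect(
    name="Release frequency",
    rationale="Frequent releases suggest a reliable delivery process",
    high_advantage="New features and fixes reach customers quickly",
    high_disadvantage="Less time to stabilise each release",
    scale={
        1: "quarterly or less often",
        2: "monthly",
        3: "every three weeks",
        4: "every two weeks",
        5: "weekly or more often",
    },
    low_advantage="More time to verify each release",
    low_disadvantage="Customers wait longer for fixes",
    verification="Count production releases over the last twelve months",
)

print(release_frequency.describe(5))
```

Writing out all five positions forces each aspect to state what a score means, which is what makes a position on the scale verifiable rather than a matter of opinion.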

The aspects I am most interested in are: separation of concerns, unit test count, code coverage percentage, code coverage exclusion percentage, release frequency and bug regression. Other concerns might be expressiveness of code, comment use, etc. For now, though, I will settle for the aspects I mentioned first.

Some of the aspects I mentioned above have no direct link to the actual code. However, in my experience, they are indicators of a software product’s general quality. These aspects are also often thought to constitute good code quality.

The table below provides possible definitions for several aspects.

[Table: definitions for several aspects]

If we take this table of code quality aspects and apply it to the applications mentioned above, we could come up with the following table.

Application vs. code quality aspect scale

Application     Separation of concerns   Unit test count   Release frequency   Bug regression   Avg. score
Application A   4                        5                 2                   1                3
Application B   2                        3                 5                   3                3.25

From the table above, we can conclude the following:

  • Both applications score roughly the same on average, though Application B scores slightly higher.
  • Due to the known issues in its code base, Application A scores poorly on bug regression.
  • Application B scores reasonably well on the unit test count.
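The averages in the table are plain unweighted means of the four aspect scores, which can be checked with a short sketch. The function names and the threshold of 2 used to flag aspects for improvement are assumptions chosen for this illustration; the scores themselves are taken from the table.

```python
ASPECTS = [
    "Separation of concerns",
    "Unit test count",
    "Release frequency",
    "Bug regression",
]

# Scores per application, in the same order as ASPECTS (from the table).
SCORES = {
    "Application A": [4, 5, 2, 1],
    "Application B": [2, 3, 5, 3],
}


def average_score(scores: list[int]) -> float:
    """Unweighted mean: every aspect counts equally."""
    return sum(scores) / len(scores)


def improvement_candidates(scores: list[int], threshold: int = 2) -> list[str]:
    """Aspects scoring at or below the (assumed) threshold."""
    return [aspect for aspect, score in zip(ASPECTS, scores) if score <= threshold]


for application, scores in SCORES.items():
    print(application, average_score(scores), improvement_candidates(scores))
```

Note that the unweighted mean treats all aspects as equally important. That is itself a choice, and a different weighting would change the ranking.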

Which application has the higher quality? Given the limited amount of data, that still cannot be determined. The table only indicates where each application can improve on each aspect's scale. To benchmark applications against each other, we would have to assess each application on more aspects, so as to enlarge the data set.

However, it is probably safe to say that an application scoring high on all aspects is of better quality than an application that scores lower.

In a real-world situation, these numbers might be used as quality gates to promote an application to production.
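A quality gate of this kind could look roughly like the sketch below. The function name failed_gates and the minimum scores are hypothetical, chosen only to illustrate the idea; the application scores are those from the table above.

```python
def failed_gates(scores: dict[str, int], minimums: dict[str, int]) -> list[str]:
    """Return the aspects whose score is below the required minimum.

    An aspect that has a minimum but no score counts as failed.
    """
    return [
        aspect
        for aspect, minimum in minimums.items()
        if scores.get(aspect, 0) < minimum
    ]


# Hypothetical gate: the minimum score required for promotion to production.
MINIMUMS = {"Release frequency": 2, "Bug regression": 2}

application_a = {
    "Separation of concerns": 4,
    "Unit test count": 5,
    "Release frequency": 2,
    "Bug regression": 1,
}
application_b = {
    "Separation of concerns": 2,
    "Unit test count": 3,
    "Release frequency": 5,
    "Bug regression": 3,
}

for name, scores in [("Application A", application_a), ("Application B", application_b)]:
    failures = failed_gates(scores, MINIMUMS)
    status = "blocked by " + ", ".join(failures) if failures else "may be promoted"
    print(f"{name}: {status}")
```

With these assumed minimums, Application A would be held back on bug regression while Application B would pass. Which minimums are appropriate depends on the application, so a different set of gates could reverse that outcome.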


Code quality is subjective and can only be made objective if it is broken down into individual aspects. Each aspect needs to be assessed independently. In the future, I may look into tooling that facilitates the assessment of these quality aspects.


  • Centric
    Peer Fisser
    13 August 2016
    A most interesting read! However, I believe your article is flawed on a number of matters which I would like to respectfully point out:

    First of all, the presented code quality table seems more like an arbitrary application aspect table. For example, release frequency has nothing to do with code quality: being able to fix bugs faster ensures faster functional code, not better code. Also, bug frequency is not a direct measure of code quality, nor is it accurate for application quality. All things being equal, bug frequency is expected to be related to the size and complexity of the application. Furthermore, unit test density (not count) is not a good measure. I could write a million unit tests for a hello world application to attain a phenomenal score without doing any actual application testing. Code coverage would be a much better measure, but also still not ideal. Also, test density in number of tests per 1000 LOC is a thousand times the quotient of the number of unit tests divided by LOC, not LOC divided by 1000. As for the SoC column, the scale raises questions. First you need to define what you mean by SoC. Assuming it's DI and SR (as it typically is), how will you measure the responsibility of all your classes? And what about the many other design principles?

    I'm sure you will agree that it is not fair to treat the advantages and disadvantages of a code quality aspect on the same footing. The advantages may outweigh some trivial disadvantages by far, and therefore related advantages and disadvantages may be deemed unacceptable. While this is not clearly reflected in the table, it is somewhat implied through the scale system. However, I would still object to what you refer to as "high disadvantage" and "low advantage", since they attempt to provide motivations for moving away from principles, instead of towards them. Also note the inverse relation between high advantage and low disadvantage (as well as of course its permutation), rendering two of the four rows redundant.

    While your scaling system is an ambitious attempt at objectively quantifying code quality, it ultimately fails on its inherent subjective qualities. Take for example the presumption that all aspects carry equal weight. Such a distribution (or any other for that matter) is inherently subjective, and so is the scale quantification, as well as any selection of aspects, which is the very thing we want to get rid of when it comes to judging quality of code. Perhaps that was also the point of your story, since you also correctly conclude that each aspect needs to be assessed independently. In any case, I would not presume that code quality is a subjective matter when it comes to the well-defined principles that are the culmination of decades of painstaking experience and deliberation from both developers and computer scientists. But any attempt at a general description with whatever arbitrary derivations from quality principles you can make, such as the one demonstrated here, most definitely is. Therefore I have not seen a subjective matter objectified, but rather the inverse.
  • Centric
    Robbie Kouwenberg
    21 October 2016
    @Peer Fisser:
    Thank you for your well-thought-out response; please accept my sincere apologies for the late reply.
    I believe your response points out exactly the reason I wrote this post in the first place since it is based on your personal opinions.

    You are right to point out that the presented code quality table seems arbitrary, however when you look from a standpoint of 'what constitutes a successful application' you might share the following opinions:
    - Being able to release frequently means quality in your delivery process, being able to fix bugs faster is simply an additional benefit.
    - High bug frequency constitutes a low quality of your application.
    My opinion is that there is no 'one quality scale to rule them all', it all depends on the type of application you are trying to make.
    Advantages and disadvantages are simply that, they carry no innate weight by themselves but gain it from the context of the application.
    I will agree that the terms used may imply moving away from principles which I am in no way advocating, I am however advocating that the choice of following principles be a conscious one.

    My point is that quality comes from getting the right product at the right time at the right place without losing yourself in the details of theory and opinionated bias.
    In the perfect world we as developers would have unlimited time and resources to make our applications conform to each and every principle, a costly and time consuming operation to say the least.
    In the real world we as developers will be required to make deliberate choices about what matters most for our customers, their product.
  • Houtzager ICT
    Stefan Houtzager
    23 April 2017
    Code quality cannot be measured with test coverage percentages. This is a dangerous myth.