Writing Tests for Data Access Code - Green Build Is Not Good Enough

Author: Petri Kainulainen Published: July 13, 2014

Tags: Clean Code, Integration Testing, Unit Testing

The first thing that we have to do before we can start writing integration tests for our data access code is to decide how we will configure our test cases.

We have two options: the right one and the wrong one.

Unfortunately, many developers make the wrong choice.

How can we avoid making the same mistake?

We can make the right decisions by following these three rules:

Rule 1: We Must Test Our Application

This rule seems obvious. Sadly, many developers use a different configuration in their integration tests because it makes their tests pass.

This is a mistake!

We should ask ourselves this question:

Do we want to test that our data access code works when we use the configuration that is used in the production environment, or do we just want our tests to pass?

I think that the answer is obvious. If we use a different configuration in our integration tests, we are not testing how our data access code behaves in the production environment. We are testing how it behaves when we run our integration tests.

In other words, we cannot verify that our data access code works as expected when we deploy our application to the production environment.

Does this sound like a worthy goal?

If we want to test that our data access code works when we use the production configuration, we should follow these simple rules:

We should configure our tests by using the same configuration class or configuration file which configures the persistence layer of our application.
Our tests should use the same transactional behavior as our application.

These rules have two major benefits:

Because our integration tests use exactly the same configuration as our application and share the same transactional behavior, our tests help us verify that our data access code is working as expected when we deploy our application to the production environment.
We don’t have to maintain different configurations. In other words, if we make a change to our production configuration, we can test that the change doesn’t break anything without making any changes to the configuration of our integration tests.
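A minimal sketch of these rules in plain Java, kept free of Spring so that it is self-contained: `PersistenceSettings`, `ProductionConfiguration`, and the property names are hypothetical stand-ins for whatever class or file configures the persistence layer of a real application. The point is that the application and the integration tests both build their settings through the same `load` method from the same source, instead of the tests keeping a copy of their own.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

// Hypothetical settings object for the persistence layer.
record PersistenceSettings(String jdbcUrl, String username, boolean autoCommit) {

    // The single place that turns configuration properties into settings.
    // Both the application and the integration tests call this method.
    static PersistenceSettings load(Properties props) {
        return new PersistenceSettings(
                require(props, "db.url"),
                require(props, "db.username"),
                Boolean.parseBoolean(props.getProperty("db.autoCommit", "false")));
    }

    // Fails fast instead of silently falling back to a test-only default.
    private static String require(Properties props, String key) {
        String value = props.getProperty(key);
        if (value == null || value.isBlank()) {
            throw new IllegalStateException("Missing required property: " + key);
        }
        return value;
    }
}

final class ProductionConfiguration {
    // Hypothetical production properties; in a real project these would be
    // read from the same file the deployed application reads.
    static final String PROPERTIES = """
            db.url=jdbc:postgresql://localhost:5432/app
            db.username=app
            db.autoCommit=false
            """;

    private ProductionConfiguration() {}

    static Properties read() {
        Properties props = new Properties();
        try {
            props.load(new StringReader(PROPERTIES));
        } catch (IOException e) {
            throw new IllegalStateException("Could not read configuration", e);
        }
        return props;
    }
}
```

At startup the application would call `PersistenceSettings.load(ProductionConfiguration.read())`, and the test setup would make exactly the same call, so a change to the production properties reaches the tests without touching any test configuration.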

Rule 2: We Can Break Rule One

There are no universal truths in software development. Every rule is valid only under certain conditions, and if the conditions change, we have to re-evaluate it. This applies to the first rule as well.

It is a good starting point, but sometimes we have to break it.

If we want to introduce a test-specific change to our configuration, we have to follow these steps:

Figure out the reason for the change.
List the benefits and drawbacks of the change.
If the benefits outweigh the drawbacks, we are allowed to change the configuration of our tests.
Document the reason why this change was made. This is crucial because it gives us the possibility to revert the change if we find out that making it was a bad idea.
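One way to enforce the last step is sketched below, assuming the configuration is expressed as a set of key/value properties; the `TestOverride` record and the `TestConfiguration` class are hypothetical. A test-specific change cannot be created without a written reason, and it can only replace a property that already exists in the production configuration, so every difference between the two stays explicit and easy to revert.

```java
import java.util.List;
import java.util.Properties;

// Hypothetical: a test-specific configuration change that cannot exist
// without a documented reason.
record TestOverride(String key, String value, String reason) {
    TestOverride {
        if (reason == null || reason.isBlank()) {
            throw new IllegalArgumentException(
                    "Test-specific override of '" + key + "' must document its reason");
        }
    }
}

final class TestConfiguration {
    private TestConfiguration() {}

    // Starts from a copy of the production properties and applies only the
    // explicitly documented overrides on top of them. The production
    // properties object itself is never modified.
    static Properties withOverrides(Properties production, List<TestOverride> overrides) {
        Properties copy = new Properties();
        copy.putAll(production);
        for (TestOverride override : overrides) {
            if (!copy.containsKey(override.key())) {
                throw new IllegalArgumentException(
                        "Override targets unknown property: " + override.key());
            }
            copy.setProperty(override.key(), override.value());
        }
        return copy;
    }
}
```

Keeping all overrides in one list also makes it easy to see how far the test configuration has drifted from the production configuration.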

For example, suppose that we want to run our integration tests against an in-memory database when they are run in a development environment (aka the developer’s personal computer) because this shortens the feedback loop. The only drawback of this change is that we cannot be 100% sure that our code works in the production environment, because the production environment uses a real database.

Nevertheless, the benefits of this change outweigh its drawbacks because we can (and we should) still run our integration tests against a real database. A good way to do this is to configure our CI server to run these tests.
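The in-memory database example could be wired up roughly as follows. This sketch assumes that the CI server exports an environment variable named `CI` with the value `true` (many hosted CI services do, but check yours) and that the H2 and PostgreSQL JDBC drivers are available; `DatabaseSelector` and both URLs are hypothetical. Only the choice of JDBC URL is shown.

```java
import java.util.Map;

// Hypothetical helper that picks the database used by the integration tests.
final class DatabaseSelector {
    // Assumed production URL; in practice this would come from the shared
    // production configuration instead of being hard-coded.
    static final String PRODUCTION_URL = "jdbc:postgresql://localhost:5432/app";

    // In-memory H2 database, used only for fast local runs.
    // DB_CLOSE_DELAY=-1 keeps it alive until the JVM exits.
    static final String IN_MEMORY_URL = "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1";

    private DatabaseSelector() {}

    // On the CI server the tests run against the real database engine;
    // on a developer machine they fall back to the in-memory database.
    static String integrationTestUrl(Map<String, String> environment) {
        boolean onCi = Boolean.parseBoolean(environment.getOrDefault("CI", "false"));
        return onCi ? PRODUCTION_URL : IN_MEMORY_URL;
    }
}
```

A test setup would call `DatabaseSelector.integrationTestUrl(System.getenv())`. Passing the environment in as a `Map`, instead of reading it inside the method, keeps the selection logic itself easy to test.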

This is of course a very simple (and maybe a bit naive) example and often the situations we face are much more complicated. That is why we should follow this guideline:

If in doubt, leave test config out.

Rule 3: We Must Not Write Transactional Integration Tests

One of the most dangerous mistakes that we can make is to modify the transactional behavior of our application in our integration tests.

If we make our tests transactional, we ignore the transaction boundary of our application and ensure that the tested code is always executed inside a transaction. This is extremely harmful because it hides possible errors instead of revealing them.

If you want to know how transactional tests can ruin the reliability of your test suite, you should read a blog post titled: Spring pitfalls: transactional tests considered harmful by Tomasz Nurkiewicz. It provides many useful examples of the errors that are hidden if you write transactional integration tests.
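The following plain-Java simulation (not Spring) shows the mechanism. It models a lazily loaded collection that can only be read while a transaction is open, which is one commonly cited example of an error that transactional tests hide; `FakeTransactions`, `LazyOrderLines`, `OrderService`, and `TransactionalTestDemo` are all hypothetical. In the application, the transaction ends when `findOrderLines` returns, so reading the lines afterwards fails. A test that wraps itself in a transaction keeps that transaction open, and the same code appears to work.

```java
import java.util.List;
import java.util.function.Supplier;

// Hypothetical, heavily simplified (and not thread-safe) model of a
// persistence context that is open only while a transaction is active.
final class FakeTransactions {
    private static int depth = 0;

    private FakeTransactions() {}

    static boolean active() {
        return depth > 0;
    }

    // Runs work inside a transaction; a nested call joins the outer one,
    // similar to the REQUIRED propagation behavior.
    static <T> T inTransaction(Supplier<T> work) {
        depth++;
        try {
            return work.get();
        } finally {
            depth--;
        }
    }
}

// A lazily loaded collection that can only be read inside a transaction.
final class LazyOrderLines {
    List<String> get() {
        if (!FakeTransactions.active()) {
            throw new IllegalStateException("could not initialize lazy collection: no transaction");
        }
        return List.of("line-1", "line-2");
    }
}

final class OrderService {
    // The application's transaction boundary is this method.
    LazyOrderLines findOrderLines() {
        return FakeTransactions.inTransaction(LazyOrderLines::new);
    }
}

final class TransactionalTestDemo {
    private TransactionalTestDemo() {}

    // Non-transactional test: uses the service the way a caller outside the
    // transaction boundary would, so the lazy-loading bug surfaces.
    static boolean nonTransactionalTestFails() {
        LazyOrderLines lines = new OrderService().findOrderLines();
        try {
            lines.get();
            return false;
        } catch (IllegalStateException expected) {
            return true;
        }
    }

    // Transactional test: the whole test runs inside one transaction, so the
    // service's own boundary is ignored and the same bug stays hidden.
    static boolean transactionalTestFails() {
        return FakeTransactions.inTransaction(() -> {
            LazyOrderLines lines = new OrderService().findOrderLines();
            try {
                lines.get();
                return false;
            } catch (IllegalStateException e) {
                return true;
            }
        });
    }
}
```

Both tests exercise identical code; only the transaction that the test itself opens differs, and that is enough to turn a real failure into a green build.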

Once again we have to ask ourselves this question:

Do we want to test that our data access code works when we use the configuration that is used in the production environment, or do we just want our tests to pass?

And once again, the answer is obvious.

Summary

This blog post has taught us three things:

Our goal is not to verify that our data access code is working correctly when we run our tests. Our goal is to ensure that it is working correctly when our application is deployed to the production environment.
Every test-specific change creates a difference between our test configuration and production configuration. If this difference is too big, our tests are useless.
Transactional integration tests are harmful because they ignore the transactional behavior of our application and hide errors instead of revealing them.

That is a pretty nice summary. We did indeed learn those things, but we learned something much more important as well. The most important thing we learned from this blog post is this question:

Do we want to test that our data access code works when we use the configuration that is used in the production environment, or do we just want our tests to pass?

If we keep asking this question, the rest should be obvious to us.

3 comments

ArunM Jul 15, 2015 @ 13:47

By saying "We should configure our tests by using the same configuration class or configuration file which configures the persistence layer of our application.", do you mean that we should test against the actual development database while doing an integration test?

Doesn't this make tests difficult to maintain? For example, suppose you had an application processing system with a service layer method that takes an application number and processes it completely (updating 4 to 5 tables with statuses such as application status, email sent status, ...).

In the above scenario, we would need to provide fresh test data (an application number) for each run of the integration test, which means that I need to know how to set up the test data (preconditions) for an application ...

- Petri Jul 15, 2015 @ 18:34
  
  Hi ArunM,
  
  Thank you for an interesting comment.
  
  Do you mean that we should test against the actual development database while doing an integration test ?
  
  If the term development database means the database that is used by the developer when he/she runs the application, the answer is no.
  
  I meant that we should run our integration tests against the database that is used when the application is deployed to the production environment. For example, if we use PostgreSQL as a production database, we should run our tests against a PostgreSQL database.
  
  However, the developer doesn't necessarily have to do it (see rule 2). He/she can run these tests against the H2 in-memory database and let the CI server run them against the PostgreSQL database. Also, if it makes sense to run the integration tests only on the CI server, that is fine too.
  
  This seems to make a test difficult to maintain ? For eg:- Suppose you had an application processing system and we have a service layer method which will take an application number, process it completely(Update 4 – 5 tables with status like application status,email sent status ..).
  
  I agree that writing integration tests for complex functions can be "hard". That is why we should always evaluate how many integration tests we should write for a function X.
  
  Sometimes these tests can be so hard to write that we should test the components by writing unit tests and write only a few integration / end-to-end tests which test only the happy path. Kenneth Truyers has written a good blog post titled: The Test Pyramid. I recommend that you read it.
  
  In the above scenario we will need to provide fresh test data(application Number) for each run of the integration test which means that I need to be aware of how to set up the test data(pre conditions) for setting up an application …
  
  Actually, I think that we should initialize our database into a known state before each integration test. I agree that managing test data can be a bit painful, but I think that it is still easier than debugging non-deterministic integration tests. Martin Fowler has written an excellent article that describes how you can get rid of non-deterministic tests. He also explains why you should do it.
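  A minimal sketch of that idea, using a hypothetical in-memory `ApplicationTable` in place of real database tables so that it runs on its own: `resetToKnownState` is called before each test, so every run of a test for the application processing example starts from the same data. The `ApplicationProcessor` class and the status values are also hypothetical. Against a real database, the same step is typically done with SQL scripts or a tool such as DbUnit.

  ```java
  import java.util.LinkedHashMap;
  import java.util.Map;

  // Hypothetical in-memory stand-in for a table of applications,
  // keyed by application number, with the processing status as the value.
  final class ApplicationTable {
      private final Map<String, String> rows = new LinkedHashMap<>();

      void put(String applicationNumber, String status) { rows.put(applicationNumber, status); }
      String status(String applicationNumber) { return rows.get(applicationNumber); }
      void clear() { rows.clear(); }
  }

  final class KnownStateFixture {
      private KnownStateFixture() {}

      // Puts the table into the same, documented state before every test,
      // so no test depends on data left behind by an earlier run.
      static void resetToKnownState(ApplicationTable table) {
          table.clear();
          table.put("APP-1", "RECEIVED");
          table.put("APP-2", "PROCESSED");
      }
  }

  // Hypothetical service method under test.
  final class ApplicationProcessor {
      private final ApplicationTable table;

      ApplicationProcessor(ApplicationTable table) { this.table = table; }

      // Only applications that are waiting for processing can be processed.
      void process(String applicationNumber) {
          if (!"RECEIVED".equals(table.status(applicationNumber))) {
              throw new IllegalStateException(
                      "Application is not waiting for processing: " + applicationNumber);
          }
          table.put(applicationNumber, "PROCESSED");
      }
  }
  ```

  Without the reset, a second run of the same test would fail because `APP-1` was already processed by the first run; with it, the test never has to search for fresh data.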
  