What I Learned This Week (Week 42/2013)

Author: Petri Kainulainen Published: October 20, 2013

Tags: Hibernate, JPA2, Learning, SQL, TDD, Technology Evaluation, Unit Testing

Each week I write a blog post which describes what I learned that week. I write these blog posts for two reasons.

First, I want to keep track of my personal development and writing regular blog posts is a great way to do it.

Second, I want to share my findings with you. I hope that you can use some of them in your daily work.

Let's get started and find out what I learned in week 42.

What I Learned in Week 42

First, low code coverage is a good indicator of technical debt.

My experience has taught me that when a software project has serious technical problems, it often has low code coverage as well. The most obvious problems caused by low code coverage are:

You have no easy way to verify that your code is working.
You have no easy way to ensure that your changes won’t break anything.

Of course, you might argue that unit tests can be used only to test individual components, and you would be right. That brings us to a lesser-known benefit of unit testing:

Unit testing is actually a design tool!

Writing unit tests will help you to identify crappy code even if you don't use TDD, as long as you remember this rule:

Crappy code is hard to test!

In other words, if writing tests for your code feels hard, it is a sign that your code is crap. Take a good look at your code and make it better. After you have cleaned up your code, you shouldn't have any trouble writing tests for it.
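
To make the rule concrete, here is a minimal sketch of one common case: code that reads the system clock directly is hard to test, while code that receives a clock from its caller is easy to test. The class and method names (HardToTestGreeter, TestableGreeter, fixedAt) are made up for this illustration, and the sketch assumes Java 8 or later for the java.time API.

```java
import java.time.Clock;
import java.time.Instant;
import java.time.LocalTime;
import java.time.ZoneOffset;

// Hard to test: the result depends on the real system clock, so a test
// cannot control which branch is executed.
class HardToTestGreeter {
    String greet() {
        return LocalTime.now().getHour() < 12 ? "Good morning" : "Good afternoon";
    }
}

// Easier to test: the clock is passed in, so a test can freeze the time.
class TestableGreeter {
    private final Clock clock;

    TestableGreeter(Clock clock) {
        this.clock = clock;
    }

    String greet() {
        return LocalTime.now(clock).getHour() < 12 ? "Good morning" : "Good afternoon";
    }
}

class GreeterDemo {
    // Hypothetical helper: a clock that always reports the given UTC instant.
    static Clock fixedAt(String isoInstant) {
        return Clock.fixed(Instant.parse(isoInstant), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        System.out.println(new TestableGreeter(fixedAt("2013-10-20T08:00:00Z")).greet());
        System.out.println(new TestableGreeter(fixedAt("2013-10-20T15:00:00Z")).greet());
    }
}
```

The difficulty of testing the first class is the design feedback: it points at a hidden dependency that the second class makes explicit.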

A word of warning, though: even though low code coverage means that you are probably in trouble, high code coverage doesn't necessarily mean that your application is debt free. For example, it is possible that you have technical debt in your tests!

Second, we cannot defeat Brooks's law.

Brooks's law is a software development principle which states that:

"Adding manpower to a late software project makes it later."

If our project is late (or in trouble) and we need to speed it up, the correct way to do this is to try to increase the productivity of our existing team members. We need to remove all unnecessary distractions so that these people can concentrate on getting the project back on track.

On the other hand, we don't live in an ideal world and sometimes it is necessary to add manpower to a project even if we know that it is going to hurt us.

If this happens, we have to minimize the damage.

One way to do this is to give our new team members easy tasks which don't require any domain knowledge. This can be a bit demotivating for the new team members, but it means that the old team members can spend more time working on the tasks which require domain knowledge.

If this is out of the question, one option is to pair each new team member with an old team member and assign tasks to each pair. This way the old team members can transfer their domain knowledge to the new ones. This is probably painful in the short run, but it will help the project in the long run.

If we cannot do this either, we are screwed and we should prepare to abandon the ship before it hits the iceberg.

Third, let your database do its job.

I have noticed that many developers think that the biggest benefit of ORMs is that developers can get an entity from the database and load its relationships lazily when they need them. In other words, these developers execute table joins in their code.

From my point of view, ORMs have three major benefits:

I don't have to write boilerplate code which transforms result sets into objects.
I don't have to create database queries which insert data into the database.
If I make changes to persistent objects inside a read-write transaction, I don't have to write these changes to the database manually.

I want to execute join operations in the database because:

Joins are the responsibility of a relational database, and relational databases are good at them.
This helps me to avoid the notorious n+1 selects problem.

Also, I am very well aware of the fact that some developers think that a join operation is slow, but in reality the performance of join operations is very good when they are done in the right way (for example, when the join columns are indexed).
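
The n+1 selects problem is easy to see by counting queries. The following sketch is only a simulation under stated assumptions: FakeDatabase is a made-up in-memory stand-in, not a real database or ORM, and the SQL in the comments describes what each method pretends to execute. It compares joining in application code (one query for the parent rows plus one query per parent) with a single join executed by the database.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Made-up stand-in for a relational database that counts the queries it receives.
class FakeDatabase {
    private static final class BookRow {
        final int authorId;
        final String title;

        BookRow(int authorId, String title) {
            this.authorId = authorId;
            this.title = title;
        }
    }

    private final List<Integer> authorIds = Arrays.asList(1, 2, 3);
    private final List<BookRow> books = Arrays.asList(
            new BookRow(1, "Book A"), new BookRow(1, "Book B"), new BookRow(2, "Book C"));
    private int queryCount = 0;

    int getQueryCount() {
        return queryCount;
    }

    // Pretends to run: SELECT id FROM authors
    List<Integer> selectAuthorIds() {
        queryCount++;
        return new ArrayList<Integer>(authorIds);
    }

    // Pretends to run: SELECT title FROM books WHERE author_id = ?
    List<String> selectTitlesByAuthor(int authorId) {
        queryCount++;
        List<String> titles = new ArrayList<String>();
        for (BookRow book : books) {
            if (book.authorId == authorId) {
                titles.add(book.title);
            }
        }
        return titles;
    }

    // Pretends to run:
    // SELECT a.id, b.title FROM authors a LEFT JOIN books b ON b.author_id = a.id
    Map<Integer, List<String>> selectTitlesJoinedWithAuthors() {
        queryCount++;
        Map<Integer, List<String>> result = new LinkedHashMap<Integer, List<String>>();
        for (int authorId : authorIds) {
            result.put(authorId, new ArrayList<String>());
        }
        for (BookRow book : books) {
            result.get(book.authorId).add(book.title);
        }
        return result;
    }
}

class NPlusOneDemo {
    // Join in code: one query for the authors, then one more query per author.
    static Map<Integer, List<String>> joinInCode(FakeDatabase db) {
        Map<Integer, List<String>> result = new LinkedHashMap<Integer, List<String>>();
        for (int authorId : db.selectAuthorIds()) {
            result.put(authorId, db.selectTitlesByAuthor(authorId));
        }
        return result;
    }

    // Join in the database: a single query returns the same data.
    static Map<Integer, List<String>> joinInDatabase(FakeDatabase db) {
        return db.selectTitlesJoinedWithAuthors();
    }

    public static void main(String[] args) {
        FakeDatabase first = new FakeDatabase();
        FakeDatabase second = new FakeDatabase();
        joinInCode(first);
        joinInDatabase(second);
        System.out.println("Join in code: " + first.getQueryCount() + " queries");
        System.out.println("Join in database: " + second.getQueryCount() + " queries");
    }
}
```

With three authors the first approach issues four queries and the second issues one. In a real application every extra query also costs a network round trip, so the gap grows with the number of parent rows.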

If you want to use the relational database only as a key-value store, you should seriously ask yourself this question:

Do I really need a relational database or should I use a *gasp* real key-value store?

Fourth, create one database query for each use case.

I have noticed that developers (me included) have a tendency to reuse as much code as possible. This is a good thing, but reusing database queries is a bad idea because you cannot optimize a shared query for any specific use case. This means that you have to create a query whose performance is good enough for all supported use cases.

In other words, the performance of your application is not as good as it could be.

That being said, if the same database query can truly support more than one use case (the same select clause and the same where clause can be used), it is fine to use this query for all of them. However, remember that if the situation changes, you have to create a new database query (or queries) instead of modifying the existing one.

Fifth, if you only show information, return data transfer objects instead of entities.

When you need to show information in the user interface, you might be facing one of the following situations:

You don’t need all fields of an entity.
You need to combine information from multiple entities but get only a few fields per entity.

If you face one of these situations, ask yourself two questions:

If I only need a few fields, does it make sense to get all fields of the entity?
If I need to combine information from multiple entities but I need only a few fields per entity, does it make sense to get all fields of all entities?

Let's be honest here. Often the only reason why we query entities is laziness. We are too lazy to think about these questions because we think that the overhead of querying entities doesn't matter. Well, how often do we test how big the overhead really is?

Exactly!

I used to be a huge fan of querying entities, but my recent experiences made me realize that it makes sense only if I want to update or delete something. If I only want to read information, often the fastest way to do it is to return DTOs instead of entities.

This takes a bit more work but we cannot just ignore the option which gives us the best performance just because we are lazy. Right?

By the way, here are some suggestions on how you can query DTOs instead of entities:

If you use Hibernate, you can use the AliasToBeanResultTransformer class.
If you like SQL, take a look at a library called jOOQ. It supports multiple ways to fetch data from the database, and provides an easy way to map your query results into POJOs. If the mapping capabilities of jOOQ are not good enough for you, you can also integrate jOOQ with ModelMapper.
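
The idea these tools share is that each row of a query which selects only the needed columns is copied into a small object. The sketch below illustrates that idea with plain reflection. It is not Hibernate's or jOOQ's implementation, and the names TodoListItemDto and AliasToBeanSketch, as well as the SQL in the comment, are assumptions made for this example.

```java
import java.lang.reflect.Method;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical DTO for a list view: only the two fields that the view shows.
class TodoListItemDto {
    private Long id;
    private String title;

    public Long getId() { return id; }
    public void setId(Long id) { this.id = id; }
    public String getTitle() { return title; }
    public void setTitle(String title) { this.title = title; }
}

class AliasToBeanSketch {
    // Copies each aliased column of a result row into the bean property with
    // the same name by calling the matching public one-argument setter.
    static <T> T toBean(Class<T> type, Map<String, Object> row) {
        try {
            T bean = type.getDeclaredConstructor().newInstance();
            for (Map.Entry<String, Object> column : row.entrySet()) {
                String alias = column.getKey();
                String setterName =
                        "set" + Character.toUpperCase(alias.charAt(0)) + alias.substring(1);
                for (Method method : type.getMethods()) {
                    if (method.getName().equals(setterName)
                            && method.getParameterCount() == 1) {
                        method.invoke(bean, column.getValue());
                    }
                }
            }
            return bean;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("Cannot map row to " + type.getName(), e);
        }
    }

    public static void main(String[] args) {
        // Stands in for one row of:
        // SELECT t.id AS id, t.title AS title FROM todos t
        Map<String, Object> row = new LinkedHashMap<String, Object>();
        row.put("id", 42L);
        row.put("title", "Write blog post");

        TodoListItemDto dto = toBean(TodoListItemDto.class, row);
        System.out.println(dto.getId() + ": " + dto.getTitle());
    }
}
```

The column aliases in the select clause have to match the property names of the DTO, which is why the query and the DTO are usually written together for one use case.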

What Did You Learn This Week?

Share your learning experiences or other comments in the comment section.

3 comments

Pedro Oct 29, 2013 @ 18:11

What about using DTOs with Spring Data JPA?

I would be really interested in that. For example, going back to my comment on the 4th Spring Data JPA tutorial, I'd like to get transactions and some card fields.

Thanks a lot.

- Petri Oct 29, 2013 @ 18:19
  
  Spring Data JPA doesn't have very good support for querying DTOs. However, it can be done.
  
  I don't like this approach because it means that I have to create a constructor which takes all required objects as constructor arguments. However, if this is not a problem for you, it should solve your problem.
  
Pedro Oct 30, 2013 @ 11:31

Another approach (not a better one) would be to define the relationship as EAGER. In this case JPA will fetch just one extra entity, not a set or a list, so the overhead won't be much higher. And the Spring Data code will be the same.
