The 10% discount ends 22nd of December. Get the master package of the Test With Spring course.

Spring Batch Tutorial: Reading Information From a Database

The previous part of my Spring Batch tutorial described how we can read information from CSV and XML files.

This is definitely a useful skill, but if we want to write real-life batch jobs, we have to know how we can read the input data of a Spring Batch job from a database.

This blog post helps us to learn that skill and describes how we can read information from a database by using either a cursor based approach or pagination.

Let’s start by taking a quick look at our example application.

If you are not familiar with Spring Batch, you should read the following blog posts before you continue reading this blog post:

Introduction to Our Example Application

During this tutorial we will implement several Spring Batch jobs that processes the student information of an online course. We have to create Spring Batch jobs that can export student information from the used database. The exported information is found from the students table that has the following columns:

  • The email_address column contains the email address of the student.
  • The name column contains the name of the student.
  • The purchased_package column contains the name of the purchased package.

When we start writing a batch job, our first step is to provide input data for our batch job. In this case, we have to read student information from the database and transform that information into StudentDTO objects which are processed by our batch job.

The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {
 
    private String emailAddress;
    private String name;
    private String purchasedPackage;
 
    public StudentDTO() {}
 
    public String getEmailAddress() {
        return emailAddress;
    }
 
    public String getName() {
        return name;
    }
 
    public String getPurchasedPackage() {
        return purchasedPackage;
    }
 
    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }
 
    public void setName(String name) {
        this.name = name;
    }
 
    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Let’s move on and read the input data by using a database cursor.

Reading Information by Using a Database Cursor

We can provide the input data for our batch job by configuring an ItemReader bean. If we want to configure an ItemReader bean, which reads the input data from the database by using a database cursor, we have to follow these steps:

  1. Create a configuration class and annotate the created class with the @Configuration annotation. This class is the configuration class of our batch job, and it configures the beans that describe the flow of our batch job.
  2. Create a method that configures our ItemReader bean, and ensure that the method returns an ItemReader<StudentDTO> object and takes a DataSource object as a method parameter.
  3. Implement the created method by following these steps:
    1. Create a new JdbcCursorItemReader object. This object reads the input data by opening a new JDBC cursor and continuously retrieving the next row from the ResultSet.
    2. Configure the used Datasource.
    3. Configure the SQL query that is used to fetch the input data from the database.
    4. Configure the RowMapper<T> object that transforms the information of a single database row into a new T object. Because the field names of the StudentDTO class matches with the column names of the students table, we can can use the BeanPropertyRowMapper<T> class.
    5. Return the created JdbcCursorItemReader object.

The source code of our configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.JdbcCursorItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

import javax.sql.DataSource;

@Configuration
public class DatabaseToXmlFileJobConfig {

    private static final String QUERY_FIND_STUDENTS =
            "SELECT " +
                "email_address, " +
                "name, " +
                "purchased_package " +
            "FROM STUDENTS " +
            "ORDER BY email_address ASC";

    @Bean
    ItemReader<StudentDTO> databaseXmlItemReader(DataSource dataSource) {
        JdbcCursorItemReader<StudentDTO> databaseReader = new JdbcCursorItemReader<>();

        databaseReader.setDataSource(dataSource);
        databaseReader.setSql(QUERY_FIND_STUDENTS);
		databaseReader.setRowMapper(new BeanPropertyRowMapper<>(StudentDTO.class));

        return databaseReader;
    }
}
This configuration reads all rows found from the students table into memory. We use this configuration because the students table has only three rows.

If you are dealing with large datasets, you probably don’t want to read the entire ResultSet into memory. If this is the case, you must limit the number of rows that are fetched from the database when more rows are needed. You can do this by invoking the setFetchSize() method of the AbstractCursorItemReader class.

Additional Reading:

Let’s find out how we can read the input data by using pagination.

Reading Information by Using Pagination

We can provide the input data for our batch job by configuring an ItemReader bean. If we want to configure an ItemReader bean, which reads the input data from the database by using pagination, we have to follow these steps:

  1. Create a configuration class and annotate the created class with the @Configuration annotation. This class is the configuration class of our batch job, and it configures the beans that describe the flow of our batch job.
  2. Create a method that configures our ItemReader bean, and ensure that the method returns an ItemReader<StudentDTO> object and takes a DataSource object as a method parameter.
  3. Implement the created method by following these steps:
    1. Create a new JdbcPagingItemReader object. This object reads the input data by using JDBC and paginates the query results.
    2. Configure the used DataSource.
    3. Configure the used page size. We use one as page size because the students table has only three rows. However, if this would be a real-life application, we would have to select the best page size by profiling our batch job.
    4. Configure the used PagingQueryProvider object. This object specifies the SQL query that fetches the input data from the database. Because we use the H2 in-memory database, we have to create a new H2PagingQueryProvider object.
    5. Configure the RowMapper<T> object that transforms the information of a single database row into a new T object. Because the field names of the StudentDTO class matches with the column names of the students table, we can can use the BeanPropertyRowMapper<T> class.
    6. Return the created JdbcPagingItemReader object.

The source code of our configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.JdbcPagingItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.PagingQueryProvider;
import org.springframework.batch.item.database.support.H2PagingQueryProvider;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

import javax.sql.DataSource;
import java.util.HashMap;
import java.util.Map;

@Configuration
public class DatabaseToCsvFileJobConfig {

    @Bean
    ItemReader<StudentDTO> databaseCsvItemReader(DataSource dataSource) {
        JdbcPagingItemReader<StudentDTO> databaseReader = new JdbcPagingItemReader<>();

        databaseReader.setDataSource(dataSource);
        databaseReader.setPageSize(1);

        PagingQueryProvider queryProvider = createQueryProvider();
        databaseReader.setQueryProvider(queryProvider);

		databaseReader.setRowMapper(new BeanPropertyRowMapper<>(StudentDTO.class));

        return databaseReader;
    }

    private PagingQueryProvider createQueryProvider() {
        H2PagingQueryProvider queryProvider = new H2PagingQueryProvider();

        queryProvider.setSelectClause("SELECT email_address, name, purchased_package");
        queryProvider.setFromClause("FROM students");
        queryProvider.setSortKeys(sortByEmailAddressAsc());

        return queryProvider;
    }

    private Map<String, Order> sortByEmailAddressAsc() {
        Map<String, Order> sortConfiguration = new HashMap<>();
        sortConfiguration.put("email_address", Order.ASCENDING);
        return sortConfiguration;
    }
}
The performance of this solution depends from the size of your dataset and the used database. Some implementations of the PagingQueryProvider interface paginate the query results by using the LIMIT and OFFSET keywords, and this is a lot slower than using the so called seek method.

Additional Reading:

Let’s move on and summarize what we learned from this blog post.

Summary

This blog post has taught us four things:

  • If we want to read information from a database by using JDBC cursors, we have to use the JdbcCursorItemReader class.
  • If we are using JDBC cursors, we can configure the fetch size by invoking the setFetchSize() method of the AbstractCursorItemReader class.
  • If we want to read information from a database by using JDBC and pagination, we have to use the JdbcPagingItemReader class.
  • If we use JDBC and pagination, the performance of our solution depends from the size of our dataset and the used database.

The next part of this tutorial describes how we can implement a custom ItemReader.

P.S. You can get the example applications of this blog post from Github: Spring example and Spring Boot example.

About the Author

Petri Kainulainen is passionate about software development and continuous improvement. He is specialized in software development with the Spring Framework and is the author of Spring Data book.

About Petri Kainulainen →

5 comments… add one
  • Hi Petri,

    thanks for the interesting tutorial series.

    As far as i know setFetchsize only defines how many rows are returned by the database in one chunk, but after all, the elements are all loaded in memory before being processed no ?

    thanks

    Reply
    • Hi Chakib,

      Are you talking about the setPageSize() method? In any case, I have the same impression as you. Although I haven’t debugged it, the Javadoc of that that method states that it configures the number of rows which are retrieved at a time.

      Also, it doesn’t make any sense that you could override the chunk size which determines the “length” of the transaction by simply setting the page size of your reader.

      Reply
  • Hi Petri,
    Initially thanks for the tutorial. My scenario is reading the data from on table and writing the records into another table using HibernateCursorItemReader and HibernateItemWriter. Is it possible.
    If possible how can I transfer data from reader to writer through xml configuration.

    Thanks.

    Reply
  • Hi Petri,
    Thanks for the great tutorial..
    I’m wondering can we use JPA / Hibernate entity or model to read from or write to database with this spring batch? Thank you.

    Reply

Leave a Comment