Spring Batch Tutorial: Reading Information From a Relational Database

The previous parts of my Spring Batch tutorial described how we can read information from CSV and XML files. These are definitely useful skills, but if we want to write real-life batch jobs, we have to know how we can read the input data of a Spring Batch job from a relational database.

After we have read this blog post, we:

  • Can read the input data of our batch job by using a database cursor.
  • Understand how we can read the input data of our batch job by using pagination.

Let’s start by taking a quick look at our example application.

Introduction to Our Example Application

In this blog post we will read the input data of our batch job from a relational database that contains the student information of an online course. The student information is stored in the students table, which has the following columns:

  • The email_address column contains the email address of the student.
  • The name column contains the name of the student.
  • The purchased_package column contains the name of the purchased package.
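
If we want to try the examples on our own computer, we also need the students table itself. Because this blog post only names the columns, the following CREATE TABLE statement is just a sketch: the column types and the choice of email_address as the primary key are assumptions, not something specified in this blog post.

-- A possible schema for the students table. Only the column names come
-- from this blog post; the types and the primary key are assumptions.
CREATE TABLE students (
    email_address VARCHAR(255) NOT NULL PRIMARY KEY,
    name VARCHAR(255) NOT NULL,
    purchased_package VARCHAR(255) NOT NULL
);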

The ItemReader which reads the student list from a relational database must return StudentDTO objects that are sorted in ascending order by using the email address of the student as the sort criterion. The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {
  
    private String emailAddress;
    private String name;
    private String purchasedPackage;
  
    public StudentDTO() {}
  
    public String getEmailAddress() {
        return emailAddress;
    }
  
    public String getName() {
        return name;
    }
  
    public String getPurchasedPackage() {
        return purchasedPackage;
    }
  
    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }
  
    public void setName(String name) {
        this.name = name;
    }
  
    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Next, we will find out how we can read the input data of our batch job by using a database cursor.

Reading the Input Data of Our Batch Job by Using a Database Cursor

We can provide the input data for our batch job by configuring an ItemReader bean. Because we want to read the student information from a relational database by using a database cursor, we have to configure this bean by following these steps:

First, we have to create the configuration class that contains the beans which describe the flow of our batch job. The source code of our configuration class looks as follows:

import org.springframework.context.annotation.Configuration;

@Configuration
public class DatabaseCursorExampleJobConfig {
}

Second, we have to create the method that configures our ItemReader bean. This method must take a DataSource object as a method parameter and return an ItemReader<StudentDTO> object. After we have created this method, the source code of our configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.sql.DataSource;
 
@Configuration
public class DatabaseCursorExampleJobConfig {
 
    @Bean
    public ItemReader<StudentDTO> itemReader(DataSource dataSource) {
         
    }
}

Third, we have to configure our ItemReader bean by following these steps:

  1. Create a new JdbcCursorItemReaderBuilder<StudentDTO> object. This object provides a fluent API which allows us to build a new JdbcCursorItemReader<StudentDTO> object that reads the input data by opening a JDBC cursor and continuously retrieves the next row from the ResultSet.
  2. Configure the name of the created JdbcCursorItemReader<StudentDTO> object.
  3. Configure the data source that provides database connections to the created JdbcCursorItemReader<StudentDTO> object.
  4. Configure the SQL query that’s used to query the input data from the used database.
  5. Configure the RowMapper<T> object that transforms the information of a single database row into a new T object. Because BeanPropertyRowMapper can map the underscore-separated column names of the students table (such as email_address) to the corresponding camelCase fields of the StudentDTO class, we can configure the used RowMapper by creating a new BeanPropertyRowMapper<StudentDTO> object.
  6. Return the created JdbcCursorItemReader<StudentDTO> object.

After we have configured our ItemReader bean, the source code of our configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

import javax.sql.DataSource;

@Configuration
public class DatabaseCursorExampleJobConfig {

    private static final String QUERY_FIND_STUDENTS =
            "SELECT " +
                    "email_address, " +
                    "name, " +
                    "purchased_package " +
            "FROM STUDENTS " +
            "ORDER BY email_address ASC";

    @Bean
    public ItemReader<StudentDTO> itemReader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<StudentDTO>()
                .name("cursorItemReader")
                .dataSource(dataSource)
                .sql(QUERY_FIND_STUDENTS)
                .rowMapper(new BeanPropertyRowMapper<>(StudentDTO.class))
                .build();
    }
}
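
Although the step and job configuration is outside the scope of this blog post, the following sketch illustrates how the configured ItemReader bean could be wired into a chunk-oriented step. This is only a minimal Spring Batch 4 style example: the StepBuilderFactory, the itemWriter bean, and the chunk size of three are assumptions and not part of the configuration described above.

import org.springframework.batch.core.Step;
import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.ItemWriter;
import org.springframework.context.annotation.Bean;

// A bean method that could be added to a configuration class such as
// DatabaseCursorExampleJobConfig. The itemWriter bean and the chunk size
// are assumptions.
@Bean
public Step readStudentsStep(StepBuilderFactory stepBuilderFactory,
                             ItemReader<StudentDTO> itemReader,
                             ItemWriter<StudentDTO> itemWriter) {
    return stepBuilderFactory.get("readStudentsStep")
            // Read and write the students in chunks of three items.
            .<StudentDTO, StudentDTO>chunk(3)
            .reader(itemReader)
            .writer(itemWriter)
            .build();
}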

Let’s find out how we can read the input data of our batch job by using pagination.

Reading the Input Data of Our Batch Job by Using Pagination

We can provide the input data for our batch job by configuring an ItemReader bean. Because we want to read the student information from a relational database by using pagination, we have to configure this bean by following these steps:

First, we have to create the configuration class that contains the beans which describe the flow of our batch job. The source code of our configuration class looks as follows:

import org.springframework.context.annotation.Configuration;

@Configuration
public class JDBCPaginationExampleJobConfig {
}

Second, we have to create the method that configures the SqlPagingQueryProviderFactoryBean bean and ensure that this method takes a DataSource object as a method parameter. The SqlPagingQueryProviderFactoryBean class has two responsibilities:

  • It auto-detects the used database.
  • It determines the appropriate PagingQueryProvider implementation that constructs the SQL queries which fetch the paginated data from the database.

After we have created this method, we have to implement it by following these steps:

  1. Create a new SqlPagingQueryProviderFactoryBean object.
  2. Configure the data source which is used to determine the type of the used database.
  3. Specify the SELECT and FROM clauses which query all rows from the source table.
  4. Specify the sort keys which are used to sort the query results of the executed database queries.
  5. Return the created SqlPagingQueryProviderFactoryBean object.

After we have configured the SqlPagingQueryProviderFactoryBean bean, the source code of our configuration class looks as follows:

import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.sql.DataSource;
import java.util.HashMap;
import java.util.Map;

@Configuration
public class JDBCPaginationExampleJobConfig {
    
    @Bean
    public SqlPagingQueryProviderFactoryBean queryProvider(DataSource dataSource) {
        SqlPagingQueryProviderFactoryBean provider = 
                new SqlPagingQueryProviderFactoryBean();

        provider.setDataSource(dataSource);
        provider.setSelectClause("SELECT email_address, name, purchased_package");
        provider.setFromClause("FROM students");
        provider.setSortKeys(sortByEmailAddressAsc());

        return provider;
    }

    private Map<String, Order> sortByEmailAddressAsc() {
        Map<String, Order> sortConfiguration = new HashMap<>();
        sortConfiguration.put("email_address", Order.ASCENDING);
        return sortConfiguration;
    }
}

Third, we have to create the method that configures our ItemReader bean. This method must take DataSource and PagingQueryProvider objects as method parameters, and it must return an ItemReader<StudentDTO> object.

After we have created this method, we must implement it by following these steps:

  1. Create a new JdbcPagingItemReaderBuilder<StudentDTO> object. This object provides a fluent API which allows us to build a new JdbcPagingItemReader<StudentDTO> object that reads the input data of our batch job one page at a time by using JDBC.
  2. Configure the name of the created JdbcPagingItemReader<StudentDTO> object.
  3. Configure the data source that provides database connections to the created JdbcPagingItemReader<StudentDTO> object.
  4. Configure the used page size. We use a page size of one because the students table has only three rows. However, if this were a real-life application, we should select the best page size by profiling our batch job.
  5. Configure the used PagingQueryProvider object. As we remember, this object constructs the SQL queries which fetch the paginated data from the database.
  6. Configure the RowMapper<T> object that transforms the information of a single database row into a new T object. Because BeanPropertyRowMapper can map the underscore-separated column names of the students table (such as email_address) to the corresponding camelCase fields of the StudentDTO class, we can configure the used RowMapper by creating a new BeanPropertyRowMapper<StudentDTO> object.
  7. Return the created JdbcPagingItemReader<StudentDTO> object.

After we have configured our ItemReader bean, the source code of our configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.PagingQueryProvider;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

import javax.sql.DataSource;
import java.util.HashMap;
import java.util.Map;

@Configuration
public class JDBCPaginationExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader(DataSource dataSource,
                                             PagingQueryProvider queryProvider) {
        return new JdbcPagingItemReaderBuilder<StudentDTO>()
                .name("pagingItemReader")
                .dataSource(dataSource)
                .pageSize(1)
                .queryProvider(queryProvider)
                .rowMapper(new BeanPropertyRowMapper<>(StudentDTO.class))
                .build();
    }

    @Bean
    public SqlPagingQueryProviderFactoryBean queryProvider(DataSource dataSource) {
        SqlPagingQueryProviderFactoryBean provider =
                new SqlPagingQueryProviderFactoryBean();

        provider.setDataSource(dataSource);
        provider.setSelectClause("SELECT email_address, name, purchased_package");
        provider.setFromClause("FROM students");
        provider.setSortKeys(sortByEmailAddressAsc());

        return provider;
    }

    private Map<String, Order> sortByEmailAddressAsc() {
        Map<String, Order> sortConfiguration = new HashMap<>();
        sortConfiguration.put("email_address", Order.ASCENDING);
        return sortConfiguration;
    }
}

Because the implementations of the PagingQueryProvider interface paginate the query results by using database-specific features, the performance of this solution depends on the size of our dataset and the used database. For example, some PagingQueryProvider implementations page through the result set by running LIMIT style queries on the sort key, while others rely on functions such as ROW_NUMBER().

We can now read the input data of our batch job from a relational database. Let’s summarize what we learned from this blog post.

Summary

This blog post has taught us three things:

  • If we want to read the input data of our batch job from a relational database by using a database cursor, we can build the JdbcCursorItemReader<T> object by using the JdbcCursorItemReaderBuilder<T> class.
  • If we want to read the input data of our batch job from a relational database by using pagination, we can auto-detect the used database and select the appropriate pagination logic by configuring the SqlPagingQueryProviderFactoryBean bean.
  • If we want to read the input data of our batch job from a relational database by using pagination, we can build the JdbcPagingItemReader<T> object by using the JdbcPagingItemReaderBuilder<T> class.

The next part of this tutorial describes how we can implement a custom ItemReader.

P.S. You can get the example applications of this blog post from GitHub.

Comments

  • Hi Petri,

    thanks for the interesting tutorial series.

    As far as I know, setFetchSize only defines how many rows are returned by the database in one chunk, but in the end, all the elements are loaded into memory before being processed, no?

    thanks

    • Hi Chakib,

      Are you talking about the setPageSize() method? In any case, I have the same impression as you. Although I haven’t debugged it, the Javadoc of that method states that it configures the number of rows which are retrieved at a time.

      Also, it doesn’t make any sense that you could override the chunk size which determines the “length” of the transaction by simply setting the page size of your reader.

  • Hi Petri,
    First of all, thanks for the tutorial. My scenario is reading the data from one table and writing the records into another table using HibernateCursorItemReader and HibernateItemWriter. Is that possible?
    If it is possible, how can I transfer data from the reader to the writer through XML configuration?

    Thanks.

  • Hi Petri,
    Thanks for the great tutorial.
    I’m wondering, can we use a JPA / Hibernate entity or model to read from or write to the database with Spring Batch? Thank you.

    • Hi,

      If you are reading information from the database, you can use an ItemReader that supports Hibernate or JPA.

      On the other hand, if you are writing information to the database, you have to implement a custom ItemWriter that writes the processed information to the database by using Hibernate or JPA.

      • Hi Petri,
        Thank you for the reply.
        I have been tasked to execute nested select queries as a step in my batch job.
        I was thinking that maybe I need to execute the select queries a couple of times from the inside out (the inner ones first).
        Or is it better to just use something like JPA’s @NamedQuery for these complex subqueries?
        What do you think?

        • Hi,

          I think that the best solution always depends on the queries you want to run. Also, if you have to optimize your queries, it is always a good idea to profile both implementations because the “logical” choice might not always be the fastest one. And one last thing: try to use SQL as much as you can. I have noticed that using SQL has had a dramatically positive impact on the performance of my batch jobs.

  • Hi Petri,

    I am using a paging ItemReader. Can you please help?
    I have added a detailed question on Stack Overflow; please find the URL below:

    https://stackoverflow.com/questions/51308687/org-springframework-batch-item-itemstreamexception-failed-to-initialize-the-rea/51373667#51373667

    It will be very helpful

    Thanks
    Pradeep

  • Hello Petri

    I am using a JdbcCursorItemReader and I have a complex SQL query to execute, with a lot of GROUP BY clauses and parameters.
    For example:
    JdbcCursorItemReader reader = new JdbcCursorItemReader();
    reader.setDataSource(dataSource);
    reader.setSql("Select 'a lot of group by' where country = ? and date = ?");
    I am trying to pass these parameters with a PreparedStatementSetter, but it is not working; it throws an exception: “Index 1 is out of range”.
    Do you have an example of a JdbcItemReader with a PreparedStatementSetter?

    Thanks a lot and congrats
    Jorge
