Spring Batch Tutorial: Reading Information From a Relational Database

The previous parts of my Spring Batch tutorial described how you can read information from CSV and XML files. These are definitely useful skills, but if you want to write real-life batch jobs, you have to know how you can read the input data of a Spring Batch job from a relational database.

After you have read this blog post, you:

  • Can read the input data of your batch job by using a database cursor.
  • Understand how you can read the input data of your batch job by using pagination.

Let's start by taking a quick look at the example application.

This blog post assumes that:

Introduction to the Example Application

During this blog post you will read the input data of your batch job from a relational database which contains the student information of an online course. The student information is found from the students table that has the following columns:

  • The email_address column contains the email address of the student.
  • The name column contains the name of the student.
  • The purchased_package column contains the name of the purchased package.

The ItemReader which reads the student list from a relational database must return StudentDTO objects which are sorted in ascending order by using the email address of the student as a sort criteria. The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {
  
    private String emailAddress;
    private String name;
    private String purchasedPackage;
  
    public StudentDTO() {}
  
    public String getEmailAddress() {
        return emailAddress;
    }
  
    public String getName() {
        return name;
    }
  
    public String getPurchasedPackage() {
        return purchasedPackage;
    }
  
    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }
  
    public void setName(String name) {
        this.name = name;
    }
  
    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Next, you will find out how you can read the input data of your batch job by using a database cursor.

Reading the Input Data of Your Batch Job by Using a Database Cursor

You can provide the input data for your batch job by configuring an ItemReader bean. Because you must read the student information from a relational database by using a database cursor, you have to configure this bean by following these steps:

First, you have to create the configuration class that contains the beans which describe the flow of your batch job. The source code of your configuration class looks as follows:

import org.springframework.context.annotation.Configuration;

@Configuration
public class DatabaseCursorExampleJobConfig {
}

Second, you have to create a method that configures your ItemReader bean. This method must take a DataSource object as a method parameter and return an ItemReader<StudentDTO> object. After you have created this method, the source code of your configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.sql.DataSource;
 
@Configuration
public class DatabaseCursorExampleJobConfig {
 
    @Bean
    public ItemReader<StudentDTO> itemReader(DataSource dataSource) {
         
    }
}

Third, you have to configure your ItemReader bean by following these steps:

  1. Create a new JdbcCursorItemReaderBuilder<StudentDTO> object. This object provides a fluent API which allows you to build a new JdbcCursorItemReader<StudentDTO> object that reads the input data by opening a JDBC cursor and continuously retrieves the next row from the ResultSet.
  2. Configure the name of the created JdbcCursorItemReader<StudentDTO> object.
  3. Configure the data source that provides database connections to the created JdbcCursorItemReader<StudentDTO> object.
  4. Configure the SQL query that's used to query the input data from the used database.
  5. Configure the RowMapper<T> object that transforms the information of a single database row into a new T object. Because the field names of the StudentDTO class matches with the column names of the students table, you can configure the used RowMapper by creating a new BeanPropertyRowMapper<StudentDTO> object.
  6. Return the created JdbcCursorItemReader<StudentDTO> object.

After you have configured your ItemReader bean, the source code of your configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

import javax.sql.DataSource;

@Configuration
public class DatabaseCursorExampleJobConfig {

    private static final String QUERY_FIND_STUDENTS =
            "SELECT " +
                    "email_address, " +
                    "name, " +
                    "purchased_package " +
            "FROM STUDENTS " +
            "ORDER BY email_address ASC";

    @Bean
    public ItemReader<StudentDTO> itemReader(DataSource dataSource) {
        return new JdbcCursorItemReaderBuilder<StudentDTO>()
                .name("cursorItemReader")
                .dataSource(dataSource)
                .sql(QUERY_FIND_STUDENTS)
                .rowMapper(new BeanPropertyRowMapper<>(StudentDTO.class))
                .build();
    }
}

Let's find out how you can read the input data of your batch job by using pagination.

Reading the Input Data of Your Batch Job by Using Pagination

You can provide the input data for your batch job by configuring an ItemReader bean. Because you must read the student information from a relational database by using pagination, you have to configure this bean by following these steps:

First, you have to create the configuration class that contains the beans which describe the flow of your batch job. The source code of your configuration class looks as follows:

import org.springframework.context.annotation.Configuration;

@Configuration
public class JDBCPaginationExampleJobConfig {
}

Second, you have create a method that configures the SqlPagingQueryProviderFactoryBean bean and ensure that this method takes a DataSource object as a method parameter. The SqlPagingQueryProviderFactoryBean class has two responsibilities:

  • It auto-detects the used database.
  • It determines the appropriate PagingQueryProvider implementation that constructs the SQL queries which fetch the paginated data from the database.

After you have created this method, you have to implement it by following these steps:

  1. Create a new SqlPagingQueryProviderFactoryBean object.
  2. Configure the data source which is used to determine the type of the used database.
  3. Specify the SELECT and FROM clauses which query all rows from the source table.
  4. Specify the sort keys which are used to sort the query results of the executed database queries.
  5. Return the created SqlPagingQueryProviderFactoryBean object.

After you have configured the SqlPagingQueryProviderFactoryBean bean, the source code of your configuration class looks as follows:

import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

import javax.sql.DataSource;
import java.util.HashMap;
import java.util.Map;

@Configuration
public class JDBCPaginationExampleJobConfig {
    
    @Bean
    public SqlPagingQueryProviderFactoryBean queryProvider(DataSource dataSource) {
        SqlPagingQueryProviderFactoryBean provider = 
                new SqlPagingQueryProviderFactoryBean();

        provider.setDataSource(dataSource);
        provider.setSelectClause("SELECT email_address, name, purchased_package");
        provider.setFromClause("FROM students");
        provider.setSortKeys(sortByEmailAddressAsc());

        return provider;
    }

    private Map<String, Order> sortByEmailAddressAsc() {
        Map<String, Order> sortConfiguration = new HashMap<>();
        sortConfiguration.put("email_address", Order.ASCENDING);
        return sortConfiguration;
    }
}

Third, you have to create a method that configures your ItemReader bean. This method must take DataSource and PagingQueryProvider objects as method parameters, and it must return an ItemReader<StudentDTO> object.

After you have created this method, you must implement it by following these steps:

  1. Create a new JdbcPagingItemReaderBuilder<StudentDTO> object. This object provides a fluent API which allows you to build a new JdbcPagingItemReader<StudentDTO> object that reads the input data of your batch job one page at a time by using JDBC.
  2. Configure the name of the created JdbcPagingItemReader<StudentDTO> object.
  3. Configure the data source that provides database connections to the created JdbcPagingItemReader<StudentDTO> object.
  4. Configure the used page size. You can use one as page size because the students table has only three rows. However, if this would be a real-life application, you should select the best page size by profiling your batch job.
  5. Configure the used PagingQueryProvider object. As you remember, this object constructs the SQL queries which fetch the paginated data from the database.
  6. Configure the RowMapper<T> object that transforms the information of a single database row into a new T object. Because the field names of the StudentDTO class matches with the column names of the students table, you can configure the used RowMapper by creating a new BeanPropertyRowMapper<StudentDTO> object.
  7. Return the created JdbcPagingItemReader<StudentDTO> object.

After you have configured your ItemReader bean, the source code of your configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.database.Order;
import org.springframework.batch.item.database.PagingQueryProvider;
import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder;
import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.jdbc.core.BeanPropertyRowMapper;

import javax.sql.DataSource;
import java.util.HashMap;
import java.util.Map;

@Configuration
public class JDBCPaginationExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader(DataSource dataSource,
                                             PagingQueryProvider queryProvider) {
        return new JdbcPagingItemReaderBuilder<StudentDTO>()
                .name("pagingItemReader")
                .dataSource(dataSource)
                .pageSize(1)
                .queryProvider(queryProvider)
                .rowMapper(new BeanPropertyRowMapper<>(StudentDTO.class))
                .build();
    }

    @Bean
    public SqlPagingQueryProviderFactoryBean queryProvider(DataSource dataSource) {
        SqlPagingQueryProviderFactoryBean provider =
                new SqlPagingQueryProviderFactoryBean();

        provider.setDataSource(dataSource);
        provider.setSelectClause("SELECT email_address, name, purchased_package");
        provider.setFromClause("FROM students");
        provider.setSortKeys(sortByEmailAddressAsc());

        return provider;
    }

    private Map<String, Order> sortByEmailAddressAsc() {
        Map<String, Order> sortConfiguration = new HashMap<>();
        sortConfiguration.put("email_address", Order.ASCENDING);
        return sortConfiguration;
    }
}
}
Because the implementations of the PagingQueryProvider interface paginate the query results by using database specific features, the performance of this solution depends from the size of your dataset and the used database.

Additional Reading:

You can now read the input data of your batch job from a relational database. Let's summarize what you learned from this blog post.

Summary

This blog post has taught you three things:

  • If you want to read the input data of your batch job from a relational database by using a database cursor, you can build the JdbcCursorItemReader<T> object by using the JdbcCursorItemReaderBuilder<T> class.
  • If you want to read the input data of your batch job from a relational database by using pagination, you can auto-detect the used database and select the approriate pagination logic by configuring the SqlPagingQueryProviderFactoryBean bean.
  • If you want to read the input data of your batch job from a relational database by using pagination, you can build the JdbcPagingItemReader<T> object by using the JdbcPagingItemReaderBuilder<T> class.

The next part of this tutorial describes how you can implement a custom ItemReader.

P.S. You can get the example applications of this blog post from Github.

12 comments… add one
  • Chakib Oct 2, 2016 @ 3:02

    Hi Petri,

    thanks for the interesting tutorial series.

    As far as i know setFetchsize only defines how many rows are returned by the database in one chunk, but after all, the elements are all loaded in memory before being processed no ?

    thanks

    • Petri Oct 4, 2016 @ 0:10

      Hi Chakib,

      Are you talking about the setPageSize() method? In any case, I have the same impression as you. Although I haven't debugged it, the Javadoc of that that method states that it configures the number of rows which are retrieved at a time.

      Also, it doesn't make any sense that you could override the chunk size which determines the "length" of the transaction by simply setting the page size of your reader.

  • Vamshi Sep 22, 2017 @ 13:38

    Hi Petri,
    Initially thanks for the tutorial. My scenario is reading the data from on table and writing the records into another table using HibernateCursorItemReader and HibernateItemWriter. Is it possible.
    If possible how can I transfer data from reader to writer through xml configuration.

    Thanks.

  • stanley Dec 12, 2017 @ 12:27

    Hi Petri,
    Thanks for the great tutorial..
    I'm wondering can we use JPA / Hibernate entity or model to read from or write to database with this spring batch? Thank you.

    • Petri Dec 13, 2017 @ 21:57

      Hi,

      If you are reading information from the database, you can use the ItemReader that supports Hibernate or JPA.

      On the other hand, if you are writing information to the database, you have to implement a custom ItemWriter that writes the processed information to the database by using Hibernate or JPA.

      • stanley Dec 14, 2017 @ 6:00

        Hi Petri,
        Thank you for the reply.
        I have been tasked to execute nested select queries a step in my batch job
        I was thinking maybe I need to execute the select queries a couple times from inside-out (from inner ones first)
        Or is it better to just use something like JPA's @NamedQuery for these complex subqueries?
        What do you think?

        • Petri Dec 19, 2017 @ 19:55

          Hi,

          I think that the best solution always depends from the queries you want to run. Also, if you have to optimize your queries, it always a good idea to profile both implementations because the "logical" choice might no always be the fastest one. And one last thing, try to use SQL as much as you can. I have noticed that using SQL has a dramatic positive impact to the performance of my batch jobs.

  • Pradeep Huded Jul 20, 2018 @ 15:01

    Hi Petri,

    I am using paging itemreader Please can u help
    i have added detailed question in stackoverflow please find below URL

    https://stackoverflow.com/questions/51308687/org-springframework-batch-item-itemstreamexception-failed-to-initialize-the-rea/51373667#51373667

    It will be very helpful

    Thanks
    Pradeep

  • Jorge Feb 13, 2019 @ 22:35

    Hello Petri

    I am using a JdbcCursorItemReader and I have a complex sql query to execute, with a lot of group by and parameters.
    Ex.
    JdbcCursorItemReader reader = new JdbcCursorItemReader()
    setDatasource
    set Sql("Select 'a lot of group by' where country = ? and date = ?")
    I am trying to pass these parameters with a PreparedStatementSetter but It is not working, throwing an Exception "Index 1 is out of range"
    Do you have an example with JdbcItemReader with PreparedStatementSetter?

    Thanks a lot and congrats
    Jorge

  • VishwasRaj Jul 20, 2022 @ 14:55

    Hi I have scenario where I want to read data row by row for that using JdbcCursorItemReader and after processing need to update same record in same table using ItemWriter and jdbcTemplate with same datasource what is used in reader did it, chunk size is 1 and FetchSize is 1, there are millions of records, issue faced is connection closed and job fails after long time.

    • Petri Jul 27, 2022 @ 8:58

      Can I see the full stack trace? Also, does your Spring Batch job update the records which are processed before the database connection is closed?

Leave a Reply