The previous parts of my Spring Batch tutorial described how you can read information from CSV and XML files. These are definitely useful skills, but if you want to write real-life batch jobs, you have to know how you can read the input data of a Spring Batch job from a relational database.
After you have read this blog post, you:
- Can read the input data of your batch job by using a database cursor.
- Understand how you can read the input data of your batch job by using pagination.
Let's start by taking a quick look at the example application.
- You are familiar with Spring Batch
- You can get the required dependencies with Maven or Gradle
Introduction to the Example Application
During this blog post you will read the input data of your batch job from a relational database which contains the student information of an online course. The student information is found from the students
table that has the following columns:
- The
email_address
column contains the email address of the student. - The
name
column contains the name of the student. - The
purchased_package
column contains the name of the purchased package.
The ItemReader
which reads the student list from a relational database must return StudentDTO
objects which are sorted in ascending order by using the email address of the student as a sort criteria. The StudentDTO
class contains the information of a single student, and its source code looks as follows:
public class StudentDTO { private String emailAddress; private String name; private String purchasedPackage; public StudentDTO() {} public String getEmailAddress() { return emailAddress; } public String getName() { return name; } public String getPurchasedPackage() { return purchasedPackage; } public void setEmailAddress(String emailAddress) { this.emailAddress = emailAddress; } public void setName(String name) { this.name = name; } public void setPurchasedPackage(String purchasedPackage) { this.purchasedPackage = purchasedPackage; } }
Next, you will find out how you can read the input data of your batch job by using a database cursor.
Reading the Input Data of Your Batch Job by Using a Database Cursor
You can provide the input data for your batch job by configuring an ItemReader
bean. Because you must read the student information from a relational database by using a database cursor, you have to configure this bean by following these steps:
First, you have to create the configuration class that contains the beans which describe the flow of your batch job. The source code of your configuration class looks as follows:
import org.springframework.context.annotation.Configuration; @Configuration public class DatabaseCursorExampleJobConfig { }
Second, you have to create a method that configures your ItemReader
bean. This method must take a DataSource
object as a method parameter and return an ItemReader<StudentDTO>
object. After you have created this method, the source code of your configuration class looks as follows:
import org.springframework.batch.item.ItemReader; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import javax.sql.DataSource; @Configuration public class DatabaseCursorExampleJobConfig { @Bean public ItemReader<StudentDTO> itemReader(DataSource dataSource) { } }
Third, you have to configure your ItemReader
bean by following these steps:
- Create a new
JdbcCursorItemReaderBuilder<StudentDTO>
object. This object provides a fluent API which allows you to build a newJdbcCursorItemReader<StudentDTO>
object that reads the input data by opening a JDBC cursor and continuously retrieves the next row from theResultSet
. - Configure the name of the created
JdbcCursorItemReader<StudentDTO>
object. - Configure the data source that provides database connections to the created
JdbcCursorItemReader<StudentDTO>
object. - Configure the SQL query that's used to query the input data from the used database.
- Configure the
RowMapper<T>
object that transforms the information of a single database row into a newT
object. Because the field names of theStudentDTO
class matches with the column names of thestudents
table, you can configure the usedRowMapper
by creating a newBeanPropertyRowMapper<StudentDTO>
object. - Return the created
JdbcCursorItemReader<StudentDTO>
object.
After you have configured your ItemReader
bean, the source code of your configuration class looks as follows:
import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.database.builder.JdbcCursorItemReaderBuilder; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.jdbc.core.BeanPropertyRowMapper; import javax.sql.DataSource; @Configuration public class DatabaseCursorExampleJobConfig { private static final String QUERY_FIND_STUDENTS = "SELECT " + "email_address, " + "name, " + "purchased_package " + "FROM STUDENTS " + "ORDER BY email_address ASC"; @Bean public ItemReader<StudentDTO> itemReader(DataSource dataSource) { return new JdbcCursorItemReaderBuilder<StudentDTO>() .name("cursorItemReader") .dataSource(dataSource) .sql(QUERY_FIND_STUDENTS) .rowMapper(new BeanPropertyRowMapper<>(StudentDTO.class)) .build(); } }
Let's find out how you can read the input data of your batch job by using pagination.
Reading the Input Data of Your Batch Job by Using Pagination
You can provide the input data for your batch job by configuring an ItemReader
bean. Because you must read the student information from a relational database by using pagination, you have to configure this bean by following these steps:
First, you have to create the configuration class that contains the beans which describe the flow of your batch job. The source code of your configuration class looks as follows:
import org.springframework.context.annotation.Configuration; @Configuration public class JDBCPaginationExampleJobConfig { }
Second, you have create a method that configures the SqlPagingQueryProviderFactoryBean
bean and ensure that this method takes a DataSource
object as a method parameter. The SqlPagingQueryProviderFactoryBean
class has two responsibilities:
- It auto-detects the used database.
- It determines the appropriate
PagingQueryProvider
implementation that constructs the SQL queries which fetch the paginated data from the database.
After you have created this method, you have to implement it by following these steps:
- Create a new
SqlPagingQueryProviderFactoryBean
object. - Configure the data source which is used to determine the type of the used database.
- Specify the
SELECT
andFROM
clauses which query all rows from the source table. - Specify the sort keys which are used to sort the query results of the executed database queries.
- Return the created
SqlPagingQueryProviderFactoryBean
object.
After you have configured the SqlPagingQueryProviderFactoryBean
bean, the source code of your configuration class looks as follows:
import org.springframework.batch.item.database.Order; import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import javax.sql.DataSource; import java.util.HashMap; import java.util.Map; @Configuration public class JDBCPaginationExampleJobConfig { @Bean public SqlPagingQueryProviderFactoryBean queryProvider(DataSource dataSource) { SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean(); provider.setDataSource(dataSource); provider.setSelectClause("SELECT email_address, name, purchased_package"); provider.setFromClause("FROM students"); provider.setSortKeys(sortByEmailAddressAsc()); return provider; } private Map<String, Order> sortByEmailAddressAsc() { Map<String, Order> sortConfiguration = new HashMap<>(); sortConfiguration.put("email_address", Order.ASCENDING); return sortConfiguration; } }
Third, you have to create a method that configures your ItemReader
bean. This method must take DataSource
and PagingQueryProvider
objects as method parameters, and it must return an ItemReader<StudentDTO>
object.
After you have created this method, you must implement it by following these steps:
- Create a new
JdbcPagingItemReaderBuilder<StudentDTO>
object. This object provides a fluent API which allows you to build a newJdbcPagingItemReader<StudentDTO>
object that reads the input data of your batch job one page at a time by using JDBC. - Configure the name of the created
JdbcPagingItemReader<StudentDTO>
object. - Configure the data source that provides database connections to the created
JdbcPagingItemReader<StudentDTO>
object. - Configure the used page size. You can use one as page size because the
students
table has only three rows. However, if this would be a real-life application, you should select the best page size by profiling your batch job. - Configure the used
PagingQueryProvider
object. As you remember, this object constructs the SQL queries which fetch the paginated data from the database. - Configure the
RowMapper<T>
object that transforms the information of a single database row into a newT
object. Because the field names of theStudentDTO
class matches with the column names of thestudents
table, you can configure the usedRowMapper
by creating a newBeanPropertyRowMapper<StudentDTO>
object. - Return the created
JdbcPagingItemReader<StudentDTO>
object.
After you have configured your ItemReader
bean, the source code of your configuration class looks as follows:
import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.database.Order; import org.springframework.batch.item.database.PagingQueryProvider; import org.springframework.batch.item.database.builder.JdbcPagingItemReaderBuilder; import org.springframework.batch.item.database.support.SqlPagingQueryProviderFactoryBean; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.jdbc.core.BeanPropertyRowMapper; import javax.sql.DataSource; import java.util.HashMap; import java.util.Map; @Configuration public class JDBCPaginationExampleJobConfig { @Bean public ItemReader<StudentDTO> itemReader(DataSource dataSource, PagingQueryProvider queryProvider) { return new JdbcPagingItemReaderBuilder<StudentDTO>() .name("pagingItemReader") .dataSource(dataSource) .pageSize(1) .queryProvider(queryProvider) .rowMapper(new BeanPropertyRowMapper<>(StudentDTO.class)) .build(); } @Bean public SqlPagingQueryProviderFactoryBean queryProvider(DataSource dataSource) { SqlPagingQueryProviderFactoryBean provider = new SqlPagingQueryProviderFactoryBean(); provider.setDataSource(dataSource); provider.setSelectClause("SELECT email_address, name, purchased_package"); provider.setFromClause("FROM students"); provider.setSortKeys(sortByEmailAddressAsc()); return provider; } private Map<String, Order> sortByEmailAddressAsc() { Map<String, Order> sortConfiguration = new HashMap<>(); sortConfiguration.put("email_address", Order.ASCENDING); return sortConfiguration; } } }
PagingQueryProvider
interface paginate the query results by using database specific features, the performance of this solution depends from the size of your dataset and the used database.
Additional Reading:
- Spring Batch Reference Documentation: Paging ItemReader Implementations
- The Javadoc of the
SqlPagingQueryProviderFactoryBean
class. - The Javadoc of the
JdbcPagingItemReaderBuilder<T>
class - The Javadoc of the
JdbcPagingItemReader<T>
class - The Javadoc of the
PagingQueryProvider
interface - The Javadoc of the
RowMapper<T>
interface - The Javadoc of the
BeanPropertyRowMapper<T>
class
You can now read the input data of your batch job from a relational database. Let's summarize what you learned from this blog post.
Summary
This blog post has taught you three things:
- If you want to read the input data of your batch job from a relational database by using a database cursor, you can build the
JdbcCursorItemReader<T>
object by using theJdbcCursorItemReaderBuilder<T>
class. - If you want to read the input data of your batch job from a relational database by using pagination, you can auto-detect the used database and select the approriate pagination logic by configuring the
SqlPagingQueryProviderFactoryBean
bean. - If you want to read the input data of your batch job from a relational database by using pagination, you can build the
JdbcPagingItemReader<T>
object by using theJdbcPagingItemReaderBuilder<T>
class.
The next part of this tutorial describes how you can implement a custom ItemReader
.
P.S. You can get the example applications of this blog post from Github.
Hi Petri,
thanks for the interesting tutorial series.
As far as i know setFetchsize only defines how many rows are returned by the database in one chunk, but after all, the elements are all loaded in memory before being processed no ?
thanks
Hi Chakib,
Are you talking about the
setPageSize()
method? In any case, I have the same impression as you. Although I haven't debugged it, the Javadoc of that that method states that it configures the number of rows which are retrieved at a time.Also, it doesn't make any sense that you could override the chunk size which determines the "length" of the transaction by simply setting the page size of your reader.
Hi Petri,
Initially thanks for the tutorial. My scenario is reading the data from on table and writing the records into another table using HibernateCursorItemReader and HibernateItemWriter. Is it possible.
If possible how can I transfer data from reader to writer through xml configuration.
Thanks.
Hi Petri,
Thanks for the great tutorial..
I'm wondering can we use JPA / Hibernate entity or model to read from or write to database with this spring batch? Thank you.
Hi,
If you are reading information from the database, you can use the
ItemReader
that supports Hibernate or JPA.On the other hand, if you are writing information to the database, you have to implement a custom
ItemWriter
that writes the processed information to the database by using Hibernate or JPA.Hi Petri,
Thank you for the reply.
I have been tasked to execute nested select queries a step in my batch job
I was thinking maybe I need to execute the select queries a couple times from inside-out (from inner ones first)
Or is it better to just use something like JPA's @NamedQuery for these complex subqueries?
What do you think?
Hi,
I think that the best solution always depends from the queries you want to run. Also, if you have to optimize your queries, it always a good idea to profile both implementations because the "logical" choice might no always be the fastest one. And one last thing, try to use SQL as much as you can. I have noticed that using SQL has a dramatic positive impact to the performance of my batch jobs.
Hi Petri,
I am using paging itemreader Please can u help
i have added detailed question in stackoverflow please find below URL
https://stackoverflow.com/questions/51308687/org-springframework-batch-item-itemstreamexception-failed-to-initialize-the-rea/51373667#51373667
It will be very helpful
Thanks
Pradeep
Hello Petri
I am using a JdbcCursorItemReader and I have a complex sql query to execute, with a lot of group by and parameters.
Ex.
JdbcCursorItemReader reader = new JdbcCursorItemReader()
setDatasource
set Sql("Select 'a lot of group by' where country = ? and date = ?")
I am trying to pass these parameters with a PreparedStatementSetter but It is not working, throwing an Exception "Index 1 is out of range"
Do you have an example with JdbcItemReader with PreparedStatementSetter?
Thanks a lot and congrats
Jorge
Hi I have scenario where I want to read data row by row for that using JdbcCursorItemReader and after processing need to update same record in same table using ItemWriter and jdbcTemplate with same datasource what is used in reader did it, chunk size is 1 and FetchSize is 1, there are millions of records, issue faced is connection closed and job fails after long time.
Can I see the full stack trace? Also, does your Spring Batch job update the records which are processed before the database connection is closed?