It is pretty easy to create a Spring Batch job that reads its input data from a CSV or an XML file because these file formats are supported out of the box.
However, if you want to read the input data of your batch job from a .XLS or .XLSX file that was created with Excel, you have to work a bit harder. This blog post helps you to solve that problem.
Let's get started.
- Spring Batch Tutorial: Introduction specifies the term batch job, explains why you should use Spring Batch, and identifies the basic building blocks of a Spring Batch job.
- Spring Batch Tutorial: Getting the Required Dependencies With Maven describes how you can get Spring Batch dependencies with Maven.
- Spring Batch Tutorial: Getting the Required Dependencies With Gradle describes how you can get Spring Batch dependencies with Gradle.
- Spring Batch Tutorial: Reading Information From a File describes how you can read information from CSV and XML files.
Introduction to the Example Application
During this blog post you will learn to configure an ItemReader
bean that can read the input data of your batch job from an Excel file. This file contains a student list that provides the following information for your batch job:
- The name of the student.
- The email address of the student.
- The name of the purchased package.
When you read the student information from an Excel file, you have to transform that information into StudentDTO
objects which are processed by your batch job. The StudentDTO
class contains the information of a single student, and its source code looks as follows:
public class StudentDTO { private String emailAddress; private String name; private String purchasedPackage; public StudentDTO() {} public String getEmailAddress() { return emailAddress; } public String getName() { return name; } public String getPurchasedPackage() { return purchasedPackage; } public void setEmailAddress(String emailAddress) { this.emailAddress = emailAddress; } public void setName(String name) { this.name = name; } public void setPurchasedPackage(String purchasedPackage) { this.purchasedPackage = purchasedPackage; } }
Before you can configure the ItemReader
that reads student information from your Excel file, you have to add a few dependencies to the classpath.
Getting the Required Dependencies
If you want to read the input data of your Spring Batch job from an Excel document, you have to add the following dependencies to the classpath:
- Spring Batch Excel is a Spring Batch extension that provides
ItemReader
implementations for Excel. Unfortunately at the moment the only way to get the required jar file is to build it from the source. - Apache POI provides a Java API which helps you to read (and create) Microsoft Office documents. It's an optional dependency of Spring Batch Excel, and you can use it for reading the input data of your batch job from .XLS and .XLSX documents.
Additional Reading:
After you have added the required dependencies to the classpath, you can finally configure the ItemReader
that reads the student information from your Excel spreadsheet.
Reading Information From an Excel File
The students.xlsx file contains the student list of an online course. This file is found from the classpath and its full path is: data/students.xlsx. The content of this Excel spreadsheet looks as follows:
NAME |EMAIL_ADDRESS |PURCHASED_PACKAGE Tony Tester |tony.tester@gmail.com |master Nick Newbie |nick.newbie@gmail.com |starter Ian Intermediate|ian.intermediate@gmail.com |intermediate
As you already know, you can provide the input data for your Spring batch job by configuring an ItemReader
bean. Because you must read the input data of your batch job from an Excel document, you can configure the ItemReader
bean by following these steps:
- Create an
ExcelFileToDatabaseJobConfig
class and annotate it with the@Configuration
annotation. This class is the configuration class of your batch job, and it contains the beans that describe the flow of your batch job. - Create a method that configures your
ItemReader
bean and ensure that the method returns anItemReader<StudentDTO>
object. - Implement the created method by following these steps:
- Create a new
PoiItemReader<StudentDTO>
object. - Ensure that the created reader ignores the header of your spreadsheet.
- Configure the created reader to read the student information from the
data/students.xlsx
file that's found from the classpath. - Configure the reader to transform a student information row into a
StudentDTO
object with theBeanWrapperRowMapper
class. This class populates the fields of the createdStudentDTO
object by using the column names given on the header row of your spreadsheet. - Return the created
PoiItemReader<StudentDTO>
object.
- Create a new
The source code of the ExcelFileToDatabaseJobConfig
class looks as follows:
import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.excel.RowMapper; import org.springframework.batch.item.excel.mapping.BeanWrapperRowMapper; import org.springframework.batch.item.excel.poi.PoiItemReader; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.core.env.Environment; import org.springframework.core.io.ClassPathResource; @Configuration public class ExcelFileToDatabaseJobConfig { @Bean ItemReader<StudentDTO> excelStudentReader() { PoiItemReader<StudentDTO> reader = new PoiItemReader<>(); reader.setLinesToSkip(1); reader.setResource(new ClassPathResource("data/students.xlsx")); reader.setRowMapper(excelRowMapper()); return reader; } private RowMapper<StudentDTO> excelRowMapper() { BeanWrapperRowMapper<StudentDTO> rowMapper = new BeanWrapperRowMapper<>(); rowMapper.setTargetType(StudentDTO.class); return rowMapper; } }
This approach works as long as your Excel spreadsheet has a header row and the column names of the header row can be resolved into the field names of the StudentDTO
class.
However, it's entirely possible that you have to read the input data from a spreadsheet that doesn't have a header row. If this is the case, you have to create a custom RowMapper
that transforms the rows of your spreadsheet into StudentDTO
objects.
You can create a custom RowMapper
by following these steps:
- Create a
StudentExcelRowMapper
class. - Implement the
RowMapper<T>
interface and pass the type of created object (StudentDTO
) as a type parameter. - Implement the
T mapRow(RowSet rowSet)
method of theRowMapper<T>
interface by following these steps:- Create a new
StudentDTO
object. - Populate the field values of the created object. You can read the column values of the processed row by invoking the
getColumnValue(int columnIndex)
method of theRowSet
interface. Also, you must remember that the index of the first column is 0. - Return the created
StudentDTO
object.
- Create a new
The source code of the StudentExcelRowMapper
class looks as follows:
import org.springframework.batch.item.excel.RowMapper; import org.springframework.batch.item.excel.support.rowset.RowSet; public class StudentExcelRowMapper implements RowMapper<StudentDTO> { @Override public StudentDTO mapRow(RowSet rowSet) throws Exception { StudentDTO student = new StudentDTO(); student.setName(rowSet.getColumnValue(0)); student.setEmailAddress(rowSet.getColumnValue(1)); student.setPurchasedPackage(rowSet.getColumnValue(2)); return student; } }
After you have created your custom row mapper, you have to make the following changes to the configuration of your ItemReader
bean:
- Ensure that the your
ItemReader
doesn't ignore the first line of the input data. - Replace the old
excelRowMapper()
method with a method that returns a newStudentExcelRowMapper
object.
After you have made these changes to the ExcelFileToDatabaseJobConfig
class, its source code looks as follows:
import org.springframework.batch.item.ItemReader; import org.springframework.batch.item.excel.RowMapper; import org.springframework.batch.item.excel.poi.PoiItemReader; import org.springframework.context.annotation.Bean; import org.springframework.context.annotation.Configuration; import org.springframework.core.io.ClassPathResource; @Configuration public class ExcelFileToDatabaseJobConfig { @Bean ItemReader<StudentDTO> excelStudentReader() { PoiItemReader<StudentDTO> reader = new PoiItemReader<>(); reader.setResource(new ClassPathResource("data/students.xlsx")); reader.setRowMapper(excelRowMapper()); return reader; } private RowMapper<StudentDTO> excelRowMapper() { return new StudentExcelRowMapper(); } }
Let's summarize what you learned from this blog post.
Summary
This blog post has taught you four things:
- If you want to read the input data of a Spring Batch job from an Excel spreadsheet, you have to add Spring Batch Excel and Apache POI dependencies to the classpath.
- If you want to read the input data of your batch job by using Spring Batch Excel and Apache POI, you have to use the
PoiItemReader
class. - You can map the rows of your spreadsheet into
T
objects by using theBeanWrapperRowMapper<T>
class as long as your Excel spreadsheet has a header row and the column names of the header row can be resolved into the field names of theT
class. - If your Excel spreadsheet doesn't have a header row or the column names of the header row cannot be resolved into the field names of the
T
class, you have to create a custom row mapper component that implements theRowMapper<T>
interface.
The next part of this tutorial describes how you can write the output data of your batch job to a CSV file.
P.S. You can get the example applications of this blog post from Github: Spring example and Spring Boot example.
Hi,
Could you give an example of passing the resource path dynamically.Instead of hardcoding the file's path in the reader.
Normally this would be dynamic, you could have a website where people upload files and they have to be processed.
Thanks Petri, I am facing the similar issue as described by Amos. Need to pass in a file dynamically rather than hard coding it. Help would be appreciated.
Thanks!
Please provide the sample code for the above example...
Thank you.
Your example is very good please send code for dynamic resources path to excel upload using spring batch.
Hi,
I don't know how you can use a "dynamic file name" and use Spring Batch Excel. If you write your own
ItemReader
, you can of course implement the file reading logic as you wish, but this means that you have to also write the code that reads the content of the input file.One option is that you could create your own
Resource
that can locate input file dynamically, but I am not sure if this possible and how it should be implemented.Hi,
I have an excel file which contains rowspan and colspan. Can you please share an example which we can read span data from excel file.
I tried the example to read excel file by using spring batch. I have kept only one xlsx file i.e. Student.xlsx file. For the first time it works fine but the second time when scheduler starts the job it is giving me exception 'java.lang.IllegalArgumentException: Sheet index (1) is out of range (0..0)'.
Mine xlsx file has only one sheet. May I know from where is it incrementing the sheet index?
Thanks.
Hmm. I think that you have just found a bug from my example. I will take a closer look at it.
Hey, have you fixed it? sorry I Ask but it happes to me too
Ah. I forgot to report my findings. I am sorry about that :(
In any way, it seems that this problem is caused by the Spring Batch Excel. The problem is that it doesn't reset the number of the current sheet after it has processed the input file. This means that when the job is run for the second time, it fails because the sheet cannot be opened (because it doesn't exist).
In other words, you can fix this problem by cloning the Spring Batch Excel extension and making the required change to the
AbstractExcelItemReader
class. After you have made the required change, you have to create new jars and use them instead of the jars that are provided by this example.Do you have an example where the Excel file is being submitted as Multipartfile.? Your example reads the excel file from the classpath and sets the resource on the Item reader. I am hoping for an example where the job is triggered when a user sends a request with the excel file payload as such open an input stream read the excel. If you we are reading the excel file as input stream, how do we set the resource on the reader bean?
Unfortunately I don't have an example that reads the Excel file from an HTTP request. However, you could save the uploaded file to a directory and read the file from the upload directory by using Spring Batch. Actually, I will add this example to my to-do list and I maybe I will implement it after I have released my testing course.
Have you had a chance to implement the uploading Excel file with HTTP request?
Hi,
Unfortunately I haven't had any time to write that example (I am still recording my testing course).
Did you get a chance to implement the uploading Excel file with HTTP request?I googled so many things but not finding any proper solution.
Hi,
I am sorry that it took me some time to answer to your comment. As you probably guessed, I am still recording my testing course. That being said, the course should be finally done after a few weeks, and I can concentrate on writing more content to my blog.
About your problem: I haven't been able to find a proper solution to it because Spring Batch doesn't provide a good support for reading data from files that are determined at runtime. I assume that it's possible to support this use case, but I just haven't found the solution yet.
How about if I want to read item from a particular sheet? how to do it in reader ya? Thank You.
I took a quick look at the source of the
PoiItemReader
andAbstractExcelItemReader
classes, and it seems that there is no way to set the opened sheet (or at least I couldn't find it). In other words, if you want to do this, yo have to make the required changes directly to the source code of Spring Batch Excel.The main classes of PIO have that option. but the Extension of spring batch doesnt. you just set the sheet you want using .getSheet(X)
I have an requirement where i need to read dynamic excel sheets and process same and write in db. All of the examples i have seen so far requires a dto to access the excel file.Is there anyway i can read excel sheet dynamically without dto ?
do you resolve this problem ?? i have the same problem
Hi,
You have to implement a custom
ItemReader
that reads the input file by using Apache POI. Unfortunately I cannot give you the exact steps because they depend from the structure of the input file.I have added spring batch excel dependencies to my build script but I am getting error : import org.springframework.batch.item.excel.* cannot be resolved.
Are you using Maven or Gradle? Also, did your IDE reload the dependencies of your project after you made changes to your build script?
if you like to help.
https://stackoverflow.com/questions/50219092/how-to-import-from-xls-to-mysql-database-using-springbatch
Hi This helped me a lot. Unfortunately facing some issues.
excelStudentReader works fine for the firsttime. When i make a call again with the same excel, getting below error
Caused by: java.lang.IllegalArgumentException: Sheet index (3) is out of range (0..2)
I tried changing the below line in PoiItemReader getSheet()
return new PoiSheet(this.workbook.getSheetAt(sheet)); to return new PoiSheet(this.workbook.getSheetAt(0));
This time it worked but not as expected.
Can you please look into it.
Hi,
This is a known bug of the Spring Batch Excel library. It seems that the original project is abandoned and someone has created a fork that fixes this problem (check the bug report). I am going to update my Spring Batch examples during next summer, and I am going to solve this problem when I do so.
Hi This helped me a lot. Unfortunately facing some issues.
excelStudentReader works fine for the firsttime. When i make a call again with the same excel, getting below error
Caused by: java.lang.IllegalArgumentException: Sheet index (3) is out of range (0..2)
I tried changing the below line in PoiItemReader getSheet()
return new PoiSheet(this.workbook.getSheetAt(sheet)); to return new PoiSheet(this.workbook.getSheetAt(0));
This time it worked but not as expected.
Can you please update this article? That would very kind of you. :)
Hi,
This is a known bug of the Spring Batch Excel library. Luckily, the Github issue (check the link) provides a workaround for this problem. Nevertheless, I think that it's a good idea to rewrite this blog post at some point. Unfortunately, I don't know when I have the time to do it.
When I run this code it works, but keeps going past where the data in the spreadsheet ends. Is there a way to set a row range or a trigger to have the code stop reading the file?
Hi,
As far as I know, this is a known bug of the Spring Batch Excel library. I recommend that you take a look at the linked Github issue because it has a workaround for this problem.
Hi Petri,
Any workaround for this Caused by: java.lang.IllegalArgumentException: Sheet index (3) is out of range (0..2)
when job is running for second time.how to specify the sheet number
Hi,
Your problem is a known bug of the Spring Batch Excel library. I recommend that you take a look at the linked Github issue because it has a workaround for this problem (check out the linked repository).
Is this really working ? BeanWrapperRowMapper is not mapping the excel row into a DTO object . I have implemented as mentioned but all the bean properties are NULL. Can you please help me to fix the issue
Dobyoi have code for multiple sheets
How to read the Excel file data with chunk in spring batch? As i am facing issue its not able to pick the record for next chunk.
getColumnValue() method not showing for me, i am using 0.1.0 version
Is there any difference between StreamingXlsxItemReader and PoiItemReader ?
I'm getting heap memory at AWS with Poi.
The README of the Spring Batch Excel extension states that:
In other words, if I would be in your shoes, I would test if the
StreamingXlsxItemReader
supports all features used in the processed spreadsheet. If it doesn't, then you have to split the input spreadsheet into smaller spreadsheets and read the data found from the smaller spreadsheets by using thePoiItemReader
class.Poi reader reads row which are not set (empty ..keeps reading rows )and then throws null pointer exception. How do i stop it from reading empty rows
Hi,
The support for this (skipping empty rows) was added in this commit which was committed after this blog post was written. Also, note that the Spring Batch Excel is now part of the Spring Batch Extensions project (the old project is no longer maintained).
I recommend that you clone the repository of the Spring Batch Extensions project and build the
spring-batch-excel
jar from the source. After you have replaced the old jar file with the new one, thePoiItemReader
should skip empty rows.