Spring Batch Tutorial: Reading Information From an Excel File

It is pretty easy to create a Spring Batch job that reads its input data from a CSV or an XML file because these file formats are supported out of the box.

However, if you want to read the input data of your batch job from a .XLS or .XLSX file that was created with Excel, you have to work a bit harder. This blog post helps you to solve that problem.

Let's get started.

If you are not familiar with Spring Batch, you should read the following blog posts before you continue reading this blog post:

Introduction to the Example Application

During this blog post you will learn to configure an ItemReader bean that can read the input data of your batch job from an Excel file. This file contains a student list that provides the following information for your batch job:

  • The name of the student.
  • The email address of the student.
  • The name of the purchased package.

When you read the student information from an Excel file, you have to transform that information into StudentDTO objects which are processed by your batch job. The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {
 
    private String emailAddress;
    private String name;
    private String purchasedPackage;
 
    public StudentDTO() {}
 
    public String getEmailAddress() {
        return emailAddress;
    }
 
    public String getName() {
        return name;
    }
 
    public String getPurchasedPackage() {
        return purchasedPackage;
    }
 
    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }
 
    public void setName(String name) {
        this.name = name;
    }
 
    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Before you can configure the ItemReader that reads student information from your Excel file, you have to add a few dependencies to the classpath.

Getting the Required Dependencies

If you want to read the input data of your Spring Batch job from an Excel document, you have to add the following dependencies to the classpath:

  • Spring Batch Excel is a Spring Batch extension that provides ItemReader implementations for Excel. Unfortunately at the moment the only way to get the required jar file is to build it from the source.
  • Apache POI provides a Java API which helps you to read (and create) Microsoft Office documents. It's an optional dependency of Spring Batch Excel, and you can use it for reading the input data of your batch job from .XLS and .XLSX documents.
I assume that most of you don't want to build Spring Batch Excel from the source. That's why I did it for you. Take a look at the example applications of this blog post: Spring example and Spring Boot example. These applications have working Gradle and Maven build scripts which use the Maven repository found from the repo directory.

Additional Reading:

After you have added the required dependencies to the classpath, you can finally configure the ItemReader that reads the student information from your Excel spreadsheet.

Reading Information From an Excel File

The students.xlsx file contains the student list of an online course. This file is found from the classpath and its full path is: data/students.xlsx. The content of this Excel spreadsheet looks as follows:

NAME			|EMAIL_ADDRESS				|PURCHASED_PACKAGE
Tony Tester		|tony.tester@gmail.com		|master
Nick Newbie		|nick.newbie@gmail.com		|starter
Ian Intermediate|ian.intermediate@gmail.com	|intermediate
Your spreadsheet doesn't contain pipe ('|') characters. This character is used here to separate the different cells from each other.

As you already know, you can provide the input data for your Spring batch job by configuring an ItemReader bean. Because you must read the input data of your batch job from an Excel document, you can configure the ItemReader bean by following these steps:

  1. Create an ExcelFileToDatabaseJobConfig class and annotate it with the @Configuration annotation. This class is the configuration class of your batch job, and it contains the beans that describe the flow of your batch job.
  2. Create a method that configures your ItemReader bean and ensure that the method returns an ItemReader<StudentDTO> object.
  3. Implement the created method by following these steps:
    1. Create a new PoiItemReader<StudentDTO> object.
    2. Ensure that the created reader ignores the header of your spreadsheet.
    3. Configure the created reader to read the student information from the data/students.xlsx file that's found from the classpath.
    4. Configure the reader to transform a student information row into a StudentDTO object with the BeanWrapperRowMapper class. This class populates the fields of the created StudentDTO object by using the column names given on the header row of your spreadsheet.
    5. Return the created PoiItemReader<StudentDTO> object.

The source code of the ExcelFileToDatabaseJobConfig class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.mapping.BeanWrapperRowMapper;
import org.springframework.batch.item.excel.poi.PoiItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class ExcelFileToDatabaseJobConfig {

    @Bean
    ItemReader<StudentDTO> excelStudentReader() {
        PoiItemReader<StudentDTO> reader = new PoiItemReader<>();
        reader.setLinesToSkip(1);
        reader.setResource(new ClassPathResource("data/students.xlsx"));
        reader.setRowMapper(excelRowMapper());
        return reader;
    }

    private RowMapper<StudentDTO> excelRowMapper() {
        BeanWrapperRowMapper<StudentDTO> rowMapper = new BeanWrapperRowMapper<>();
        rowMapper.setTargetType(StudentDTO.class);
        return rowMapper;
    }
}

This approach works as long as your Excel spreadsheet has a header row and the column names of the header row can be resolved into the field names of the StudentDTO class.

However, it's entirely possible that you have to read the input data from a spreadsheet that doesn't have a header row. If this is the case, you have to create a custom RowMapper that transforms the rows of your spreadsheet into StudentDTO objects.

You can create a custom RowMapper by following these steps:

  1. Create a StudentExcelRowMapper class.
  2. Implement the RowMapper<T> interface and pass the type of created object (StudentDTO) as a type parameter.
  3. Implement the T mapRow(RowSet rowSet) method of the RowMapper<T> interface by following these steps:
    1. Create a new StudentDTO object.
    2. Populate the field values of the created object. You can read the column values of the processed row by invoking the getColumnValue(int columnIndex) method of the RowSet interface. Also, you must remember that the index of the first column is 0.
    3. Return the created StudentDTO object.

The source code of the StudentExcelRowMapper class looks as follows:

import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.support.rowset.RowSet;

public class StudentExcelRowMapper implements RowMapper<StudentDTO> {

    @Override
    public StudentDTO mapRow(RowSet rowSet) throws Exception {
        StudentDTO student = new StudentDTO();

        student.setName(rowSet.getColumnValue(0));
        student.setEmailAddress(rowSet.getColumnValue(1));
        student.setPurchasedPackage(rowSet.getColumnValue(2));

        return student;
    }
}

After you have created your custom row mapper, you have to make the following changes to the configuration of your ItemReader bean:

  1. Ensure that the your ItemReader doesn't ignore the first line of the input data.
  2. Replace the old excelRowMapper() method with a method that returns a new StudentExcelRowMapper object.

After you have made these changes to the ExcelFileToDatabaseJobConfig class, its source code looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.poi.PoiItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class ExcelFileToDatabaseJobConfig {

    @Bean
    ItemReader<StudentDTO> excelStudentReader() {
        PoiItemReader<StudentDTO> reader = new PoiItemReader<>();
        reader.setResource(new ClassPathResource("data/students.xlsx"));
        reader.setRowMapper(excelRowMapper());
        return reader;
    }

	private RowMapper<StudentDTO> excelRowMapper() {
       return new StudentExcelRowMapper();
    }
}

Let's summarize what you learned from this blog post.

Summary

This blog post has taught you four things:

  • If you want to read the input data of a Spring Batch job from an Excel spreadsheet, you have to add Spring Batch Excel and Apache POI dependencies to the classpath.
  • If you want to read the input data of your batch job by using Spring Batch Excel and Apache POI, you have to use the PoiItemReader class.
  • You can map the rows of your spreadsheet into T objects by using the BeanWrapperRowMapper<T> class as long as your Excel spreadsheet has a header row and the column names of the header row can be resolved into the field names of the T class.
  • If your Excel spreadsheet doesn't have a header row or the column names of the header row cannot be resolved into the field names of the T class, you have to create a custom row mapper component that implements the RowMapper<T> interface.

The next part of this tutorial describes how you can write the output data of your batch job to a CSV file.

P.S. You can get the example applications of this blog post from Github: Spring example and Spring Boot example.

42 comments… add one
  • Amos Nov 28, 2016 @ 15:53

    Hi,
    Could you give an example of passing the resource path dynamically.Instead of hardcoding the file's path in the reader.
    Normally this would be dynamic, you could have a website where people upload files and they have to be processed.

    • gtvk Dec 28, 2016 @ 22:36

      Thanks Petri, I am facing the similar issue as described by Amos. Need to pass in a file dynamically rather than hard coding it. Help would be appreciated.

      Thanks!

      • kishore Feb 15, 2017 @ 12:54

        Please provide the sample code for the above example...

        Thank you.

      • Prakash Feb 17, 2017 @ 6:32

        Your example is very good please send code for dynamic resources path to excel upload using spring batch.

      • Petri Feb 17, 2017 @ 10:07

        Hi,

        I don't know how you can use a "dynamic file name" and use Spring Batch Excel. If you write your own ItemReader, you can of course implement the file reading logic as you wish, but this means that you have to also write the code that reads the content of the input file.

        One option is that you could create your own Resource that can locate input file dynamically, but I am not sure if this possible and how it should be implemented.

  • Praveen Apr 3, 2017 @ 16:07

    Hi,

    I have an excel file which contains rowspan and colspan. Can you please share an example which we can read span data from excel file.

  • Sujeet May 28, 2017 @ 21:26

    I tried the example to read excel file by using spring batch. I have kept only one xlsx file i.e. Student.xlsx file. For the first time it works fine but the second time when scheduler starts the job it is giving me exception 'java.lang.IllegalArgumentException: Sheet index (1) is out of range (0..0)'.
    Mine xlsx file has only one sheet. May I know from where is it incrementing the sheet index?

    Thanks.

    • Petri May 30, 2017 @ 22:35

      Hmm. I think that you have just found a bug from my example. I will take a closer look at it.

      • Daniel Henao Sep 9, 2017 @ 0:09

        Hey, have you fixed it? sorry I Ask but it happes to me too

        • Petri Sep 20, 2017 @ 23:25

          Ah. I forgot to report my findings. I am sorry about that :(

          In any way, it seems that this problem is caused by the Spring Batch Excel. The problem is that it doesn't reset the number of the current sheet after it has processed the input file. This means that when the job is run for the second time, it fails because the sheet cannot be opened (because it doesn't exist).

          In other words, you can fix this problem by cloning the Spring Batch Excel extension and making the required change to the AbstractExcelItemReader class. After you have made the required change, you have to create new jars and use them instead of the jars that are provided by this example.

  • Yohan Jul 20, 2017 @ 8:49

    Do you have an example where the Excel file is being submitted as Multipartfile.? Your example reads the excel file from the classpath and sets the resource on the Item reader. I am hoping for an example where the job is triggered when a user sends a request with the excel file payload as such open an input stream read the excel. If you we are reading the excel file as input stream, how do we set the resource on the reader bean?

    • Petri Jul 23, 2017 @ 21:36

      Unfortunately I don't have an example that reads the Excel file from an HTTP request. However, you could save the uploaded file to a directory and read the file from the upload directory by using Spring Batch. Actually, I will add this example to my to-do list and I maybe I will implement it after I have released my testing course.

      • Yohan Sep 6, 2017 @ 7:09

        Have you had a chance to implement the uploading Excel file with HTTP request?

        • Petri Sep 7, 2017 @ 9:15

          Hi,

          Unfortunately I haven't had any time to write that example (I am still recording my testing course).

          • Geek Feb 7, 2018 @ 14:42

            Did you get a chance to implement the uploading Excel file with HTTP request?I googled so many things but not finding any proper solution.

          • Petri Feb 11, 2018 @ 10:11

            Hi,

            I am sorry that it took me some time to answer to your comment. As you probably guessed, I am still recording my testing course. That being said, the course should be finally done after a few weeks, and I can concentrate on writing more content to my blog.

            About your problem: I haven't been able to find a proper solution to it because Spring Batch doesn't provide a good support for reading data from files that are determined at runtime. I assume that it's possible to support this use case, but I just haven't found the solution yet.

  • CHUA KWAN LIANG Aug 28, 2017 @ 10:52

    How about if I want to read item from a particular sheet? how to do it in reader ya? Thank You.

    • Petri Aug 29, 2017 @ 10:15

      I took a quick look at the source of the PoiItemReader and AbstractExcelItemReader classes, and it seems that there is no way to set the opened sheet (or at least I couldn't find it). In other words, if you want to do this, yo have to make the required changes directly to the source code of Spring Batch Excel.

    • Daniel Henao Sep 12, 2017 @ 0:47

      The main classes of PIO have that option. but the Extension of spring batch doesnt. you just set the sheet you want using .getSheet(X)

  • bala Jan 27, 2018 @ 9:36

    I have an requirement where i need to read dynamic excel sheets and process same and write in db. All of the examples i have seen so far requires a dto to access the excel file.Is there anyway i can read excel sheet dynamically without dto ?

    • mathieu Apr 4, 2018 @ 10:47

      do you resolve this problem ?? i have the same problem

      • Petri Apr 4, 2018 @ 10:58

        Hi,

        You have to implement a custom ItemReader that reads the input file by using Apache POI. Unfortunately I cannot give you the exact steps because they depend from the structure of the input file.

  • Naren Mar 7, 2018 @ 14:11

    I have added spring batch excel dependencies to my build script but I am getting error : import org.springframework.batch.item.excel.* cannot be resolved.

    • Petri Mar 12, 2018 @ 10:54

      Are you using Maven or Gradle? Also, did your IDE reload the dependencies of your project after you made changes to your build script?

  • Messi May 8, 2018 @ 8:56
  • joannes May 9, 2018 @ 13:08

    Hi This helped me a lot. Unfortunately facing some issues.

    excelStudentReader works fine for the firsttime. When i make a call again with the same excel, getting below error

    Caused by: java.lang.IllegalArgumentException: Sheet index (3) is out of range (0..2)

    I tried changing the below line in PoiItemReader getSheet()

    return new PoiSheet(this.workbook.getSheetAt(sheet)); to return new PoiSheet(this.workbook.getSheetAt(0));

    This time it worked but not as expected.

    Can you please look into it.

    • Petri May 9, 2018 @ 13:34

      Hi,

      This is a known bug of the Spring Batch Excel library. It seems that the original project is abandoned and someone has created a fork that fixes this problem (check the bug report). I am going to update my Spring Batch examples during next summer, and I am going to solve this problem when I do so.

  • MYJ Dec 9, 2018 @ 21:00

    Hi This helped me a lot. Unfortunately facing some issues.

    excelStudentReader works fine for the firsttime. When i make a call again with the same excel, getting below error

    Caused by: java.lang.IllegalArgumentException: Sheet index (3) is out of range (0..2)

    I tried changing the below line in PoiItemReader getSheet()

    return new PoiSheet(this.workbook.getSheetAt(sheet)); to return new PoiSheet(this.workbook.getSheetAt(0));

    This time it worked but not as expected.

    Can you please update this article? That would very kind of you. :)

    • Petri Dec 9, 2018 @ 21:34

      Hi,

      This is a known bug of the Spring Batch Excel library. Luckily, the Github issue (check the link) provides a workaround for this problem. Nevertheless, I think that it's a good idea to rewrite this blog post at some point. Unfortunately, I don't know when I have the time to do it.

  • DB Jun 13, 2019 @ 2:47

    When I run this code it works, but keeps going past where the data in the spreadsheet ends. Is there a way to set a row range or a trigger to have the code stop reading the file?

  • Nur Islam Nov 28, 2019 @ 14:11

    Hi Petri,

    Any workaround for this Caused by: java.lang.IllegalArgumentException: Sheet index (3) is out of range (0..2)

    when job is running for second time.how to specify the sheet number

  • Chandru Feb 18, 2020 @ 15:32

    Is this really working ? BeanWrapperRowMapper is not mapping the excel row into a DTO object . I have implemented as mentioned but all the bean properties are NULL. Can you please help me to fix the issue

  • Venkatesh Jan 5, 2021 @ 7:55

    Dobyoi have code for multiple sheets

  • Aarti Mar 23, 2021 @ 10:57

    How to read the Excel file data with chunk in spring batch? As i am facing issue its not able to pick the record for next chunk.

  • Praveen Sep 3, 2021 @ 12:47

    getColumnValue() method not showing for me, i am using 0.1.0 version

  • Iorran Oct 12, 2021 @ 1:00

    Is there any difference between StreamingXlsxItemReader and PoiItemReader ?

    I'm getting heap memory at AWS with Poi.

    • Petri Oct 12, 2021 @ 15:17

      The README of the Spring Batch Excel extension states that:

      To reduce the memory footprint the StreamingXlsxItemReader can be used, this will only keep the current row in memory and discard it afterward. Not everything is supported while streaming the XLSX file. It can be that formulas don’t get evaluated or lead to an error.

      In other words, if I would be in your shoes, I would test if the StreamingXlsxItemReader supports all features used in the processed spreadsheet. If it doesn't, then you have to split the input spreadsheet into smaller spreadsheets and read the data found from the smaller spreadsheets by using the PoiItemReader class.

  • Shariq Jan 12, 2022 @ 11:48

    Poi reader reads row which are not set (empty ..keeps reading rows )and then throws null pointer exception. How do i stop it from reading empty rows

    • Petri Jan 13, 2022 @ 19:29

      Hi,

      The support for this (skipping empty rows) was added in this commit which was committed after this blog post was written. Also, note that the Spring Batch Excel is now part of the Spring Batch Extensions project (the old project is no longer maintained).

      I recommend that you clone the repository of the Spring Batch Extensions project and build the spring-batch-excel jar from the source. After you have replaced the old jar file with the new one, the PoiItemReader should skip empty rows.

Leave a Reply