I released five new sample lessons from my Test With Spring course: Introduction to Spock Framework

Spring Batch Tutorial: Reading Information From an Excel File

It is pretty easy to create a Spring Batch job that reads its input data from a CSV or an XML file because these file formats are supported out of the box.

However, if we want to read the input data of our batch job from a .XLS or .XLSX file that was created with Excel, we have to work a bit harder. This blog post helps us to solve this problem.

Let’s get started.

If you are not familiar with Spring Batch, you should read the following blog posts before you continue reading this blog post:

Introduction to Our Example Application

During this tutorial we will implement several Spring Batch jobs that processes the student information of an online course. This time we need to create a Spring Batch job that can import student information from an Excel file. This file contains a student list that provides the following information for our application:

  • The name of the student.
  • The email address of the student.
  • The name of the purchased package.

When we read the student information from an Excel file, we have to transform that information into StudentDTO objects which are processed by our batch job. The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {
 
    private String emailAddress;
    private String name;
    private String purchasedPackage;
 
    public StudentDTO() {}
 
    public String getEmailAddress() {
        return emailAddress;
    }
 
    public String getName() {
        return name;
    }
 
    public String getPurchasedPackage() {
        return purchasedPackage;
    }
 
    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }
 
    public void setName(String name) {
        this.name = name;
    }
 
    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Before we can configure an ItemReader that reads student information from our Excel file, we have to add a few additional dependencies into our build script.

Getting the Required Dependencies

If we want to read the input data of our Spring Batch job from an Excel document, we have to add the following dependency declarations into our build script:

  • Spring Batch Excel is a Spring Batch extension that provides ItemReader implementations for Excel. Unfortunately at the moment the only way to get the required jar file is to build it from the source.
  • Apache POI provides a Java API for Microsoft Office documents. It is an optional dependency of Spring Batch Excel, and we can use it for reading input data from .XLS and .XLSX documents.
I assume that most of you don’t want to build Spring Batch Excel from the source. That is why I did it for you. Take a look at the example applications of this blog post: Spring example and Spring Boot example. These applications have working Gradle and Maven build scripts that use the Maven repository found from the repo directory.

Additional Reading:

After we have added the required dependency declarations into our build script, we can finally configure the ItemReader that can read the student information from our Excel spreadsheet.

Reading Information From an Excel File

The students.xlsx file contains the student list of our course. This file is found from the classpath and its full path is: data/students.xlsx. The content of this Excel spreadsheet looks as follows:

NAME			|EMAIL_ADDRESS				|PURCHASED_PACKAGE
Tony Tester		|tony.tester@gmail.com		|master
Nick Newbie		|nick.newbie@gmail.com		|starter
Ian Intermediate|ian.intermediate@gmail.com	|intermediate
Our spreadsheet does not contain pipe (‘|’) characters. This character is used here to separate the different cells from each other.

As we already know, we can provide input data for our Spring batch job by configuring an ItemReader bean. We can configure an ItemReader bean, which reads the student information from the students.xlsx file, by following these steps:

  1. Create an ExcelFileToDatabaseJobConfig class and annotate it with the @Configuration annotation. This class is the configuration class of our batch job, and it contains the beans that describe the flow of our batch job.
  2. Create a method that configures our ItemReader bean and ensure that the method returns an ItemReader<StudentDTO> object.
  3. Implement the created method by following these steps:
    1. Create a new PoiItemReader<StudentDTO> object.
    2. Ensure that the created reader ignores the header of our spreadsheet.
    3. Configure the created reader to read the student information from the data/students.xlsx file that is found from the classpath.
    4. Configure the reader to transform a student information row into a StudentDTO object with the BeanWrapperRowMapper class. This class populates the fields of the created StudentDTO object by using the column names given on header row of our spreadsheet.
    5. Return the created PoiItemReader<StudentDTO> object.

The source code of the ExcelFileToDatabaseJobConfig class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.mapping.BeanWrapperRowMapper;
import org.springframework.batch.item.excel.poi.PoiItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class ExcelFileToDatabaseJobConfig {

    @Bean
    ItemReader<StudentDTO> excelStudentReader() {
        PoiItemReader<StudentDTO> reader = new PoiItemReader<>();
        reader.setLinesToSkip(1);
        reader.setResource(new ClassPathResource("data/students.xlsx"));
        reader.setRowMapper(excelRowMapper());
        return reader;
    }

    private RowMapper<StudentDTO> excelRowMapper() {
        BeanWrapperRowMapper<StudentDTO> rowMapper = new BeanWrapperRowMapper<>();
        rowMapper.setTargetType(StudentDTO.class);
        return rowMapper;
    }
}

This approach works as long as our Excel spreadsheet has a header row and the column names of the header row can be resolved into the field names of the StudentDTO class.

However, it is entirely possible that we have to read the input data from a spreadsheet that doesn’t have a header row. If this is the case, we have to create a custom RowMapper that transforms the rows of our spreadsheet into StudentDTO objects.

We can create a custom RowMapper by following these steps:

  1. Create a StudentExcelRowMapper class.
  2. Implement the RowMapper<T> interface and pass the type of created object (StudentDTO) as a type parameter.
  3. Implement the T mapRow(RowSet rowSet) method of the RowMapper<T> interface by following these steps:
    1. Create a new StudentDTO object.
    2. Populate the field values of the created object. We can read the column values of the processed row by invoking the getColumnValue(int columnIndex) method of the RowSet interface. Also, we must remember that the index of the first column is 0.
    3. Return the created StudentDTO object.

The source code of the StudentExcelRowMapper class looks as follows:

import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.support.rowset.RowSet;

public class StudentExcelRowMapper implements RowMapper<StudentDTO> {

    @Override
    public StudentDTO mapRow(RowSet rowSet) throws Exception {
        StudentDTO student = new StudentDTO();

        student.setName(rowSet.getColumnValue(0));
        student.setEmailAddress(rowSet.getColumnValue(1));
        student.setPurchasedPackage(rowSet.getColumnValue(2));

        return student;
    }
}

After we have created our custom row mapper, we have to make the following changes to the configuration of our ItemReader bean:

  1. Ensure that the our ItemReader does not ignore the first line of the input data.
  2. Replace the old excelRowMapper() method with a method that returns a new StudentExcelRowMapper object.

After we have made these changes to the ExcelFileToDatabaseJobConfig class, its source code looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.poi.PoiItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class ExcelFileToDatabaseJobConfig {

    @Bean
    ItemReader<StudentDTO> excelStudentReader() {
        PoiItemReader<StudentDTO> reader = new PoiItemReader<>();
        reader.setResource(new ClassPathResource("data/students.xlsx"));
        reader.setRowMapper(excelRowMapper());
        return reader;
    }

	private RowMapper<StudentDTO> excelRowMapper() {
       return new StudentExcelRowMapper();
    }
}

Let’s summarize what we learned from this blog post.

Summary

This blog post has taught us four things:

  • If we want to read the input data of a Spring Batch job from an Excel spreadsheet, we have to add Spring Batch Excel and Apache POI dependencies into our build script.
  • If we want to read input data by using Spring Batch Excel and Apache POI, we have to use the PoiItemReader class.
  • We can map the rows of our spreadsheet into T objects by using the BeanWrapperRowMapper<T> class as long as our Excel spreadsheet has a header row and the column names of the header row can be resolved into the field names of the T class.
  • If our Excel spreadsheet doesn’t have a header row or the column names of the header row cannot resolved into the field names of the T class, we have to create a custom row mapper component that implements the RowMapper<T> interface.

P.S. You can get the example applications of this blog post from Github: Spring example and Spring Boot example.

About the Author

Petri Kainulainen is passionate about software development and continuous improvement. He is specialized in software development with the Spring Framework and is the author of Spring Data book.

About Petri Kainulainen →

18 comments… add one
  • Hi,
    Could you give an example of passing the resource path dynamically.Instead of hardcoding the file’s path in the reader.
    Normally this would be dynamic, you could have a website where people upload files and they have to be processed.

    Reply
    • Thanks Petri, I am facing the similar issue as described by Amos. Need to pass in a file dynamically rather than hard coding it. Help would be appreciated.

      Thanks!

      Reply
      • Please provide the sample code for the above example…

        Thank you.

        Reply
      • Your example is very good please send code for dynamic resources path to excel upload using spring batch.

        Reply
      • Hi,

        I don’t know how you can use a “dynamic file name” and use Spring Batch Excel. If you write your own ItemReader, you can of course implement the file reading logic as you wish, but this means that you have to also write the code that reads the content of the input file.

        One option is that you could create your own Resource that can locate input file dynamically, but I am not sure if this possible and how it should be implemented.

        Reply
  • Hi,

    I have an excel file which contains rowspan and colspan. Can you please share an example which we can read span data from excel file.

    Reply
  • I tried the example to read excel file by using spring batch. I have kept only one xlsx file i.e. Student.xlsx file. For the first time it works fine but the second time when scheduler starts the job it is giving me exception ‘java.lang.IllegalArgumentException: Sheet index (1) is out of range (0..0)’.
    Mine xlsx file has only one sheet. May I know from where is it incrementing the sheet index?

    Thanks.

    Reply
    • Hmm. I think that you have just found a bug from my example. I will take a closer look at it.

      Reply
      • Hey, have you fixed it? sorry I Ask but it happes to me too

        Reply
        • Ah. I forgot to report my findings. I am sorry about that :(

          In any way, it seems that this problem is caused by the Spring Batch Excel. The problem is that it doesn’t reset the number of the current sheet after it has processed the input file. This means that when the job is run for the second time, it fails because the sheet cannot be opened (because it doesn’t exist).

          In other words, you can fix this problem by cloning the Spring Batch Excel extension and making the required change to the AbstractExcelItemReader class. After you have made the required change, you have to create new jars and use them instead of the jars that are provided by this example.

          Reply
  • Do you have an example where the Excel file is being submitted as Multipartfile.? Your example reads the excel file from the classpath and sets the resource on the Item reader. I am hoping for an example where the job is triggered when a user sends a request with the excel file payload as such open an input stream read the excel. If you we are reading the excel file as input stream, how do we set the resource on the reader bean?

    Reply
    • Unfortunately I don’t have an example that reads the Excel file from an HTTP request. However, you could save the uploaded file to a directory and read the file from the upload directory by using Spring Batch. Actually, I will add this example to my to-do list and I maybe I will implement it after I have released my testing course.

      Reply
      • Have you had a chance to implement the uploading Excel file with HTTP request?

        Reply
        • Hi,

          Unfortunately I haven’t had any time to write that example (I am still recording my testing course).

          Reply
  • How about if I want to read item from a particular sheet? how to do it in reader ya? Thank You.

    Reply
    • I took a quick look at the source of the PoiItemReader and AbstractExcelItemReader classes, and it seems that there is no way to set the opened sheet (or at least I couldn’t find it). In other words, if you want to do this, yo have to make the required changes directly to the source code of Spring Batch Excel.

      Reply
    • The main classes of PIO have that option. but the Extension of spring batch doesnt. you just set the sheet you want using .getSheet(X)

      Reply

Leave a Comment