I released the starter package of my Test With Spring course. Take a look at the course >>

Spring Batch Tutorial: Reading Information From an Excel File

pink data concept

It is pretty easy to create a Spring Batch job that reads its input data from a CSV or an XML file because these file formats are supported out of the box.

However, if we want to read the input data of our batch job from a .XLS or .XLSX file that was created with Excel, we have to work a bit harder. This blog post helps us to solve this problem.

Let’s get started.

If you are not familiar with Spring Batch, you should read the following blog posts before you continue reading this blog post:

Introduction to Our Example Application

During this tutorial we will implement several Spring Batch jobs that processes the student information of an online course. This time we need to create a Spring Batch job that can import student information from an Excel file. This file contains a student list that provides the following information for our application:

  • The name of the student.
  • The email address of the student.
  • The name of the purchased package.

When we read the student information from an Excel file, we have to transform that information into StudentDTO objects which are processed by our batch job. The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {
 
    private String emailAddress;
    private String name;
    private String purchasedPackage;
 
    public StudentDTO() {}
 
    public String getEmailAddress() {
        return emailAddress;
    }
 
    public String getName() {
        return name;
    }
 
    public String getPurchasedPackage() {
        return purchasedPackage;
    }
 
    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }
 
    public void setName(String name) {
        this.name = name;
    }
 
    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Before we can configure an ItemReader that reads student information from our Excel file, we have to add a few additional dependencies into our build script.

Getting the Required Dependencies

If we want to read the input data of our Spring Batch job from an Excel document, we have to add the following dependency declarations into our build script:

  • Spring Batch Excel is a Spring Batch extension that provides ItemReader implementations for Excel. Unfortunately at the moment the only way to get the required jar file is to build it from the source.
  • Apache POI provides a Java API for Microsoft Office documents. It is an optional dependency of Spring Batch Excel, and we can use it for reading input data from .XLS and .XLSX documents.
I assume that most of you don’t want to build Spring Batch Excel from the source. That is why I did it for you. Take a look at the example applications of this blog post: Spring example and Spring Boot example. These applications have working Gradle and Maven build scripts that use the Maven repository found from the repo directory.

Additional Reading:

After we have added the required dependency declarations into our build script, we can finally configure the ItemReader that can read the student information from our Excel spreadsheet.

Reading Information From an Excel File

The students.xlsx file contains the student list of our course. This file is found from the classpath and its full path is: data/students.xlsx. The content of this Excel spreadsheet looks as follows:

NAME			|EMAIL_ADDRESS				|PURCHASED_PACKAGE
Tony Tester		|tony.tester@gmail.com		|master
Nick Newbie		|nick.newbie@gmail.com		|starter
Ian Intermediate|ian.intermediate@gmail.com	|intermediate
Our spreadsheet does not contain pipe (‘|’) characters. This character is used here to separate the different cells from each other.

As we already know, we can provide input data for our Spring batch job by configuring an ItemReader bean. We can configure an ItemReader bean, which reads the student information from the students.xlsx file, by following these steps:

  1. Create an ExcelFileToDatabaseJobConfig class and annotate it with the @Configuration annotation. This class is the configuration class of our batch job, and it contains the beans that describe the flow of our batch job.
  2. Create a method that configures our ItemReader bean and ensure that the method returns an ItemReader<StudentDTO> object.
  3. Implement the created method by following these steps:
    1. Create a new PoiItemReader<StudentDTO> object.
    2. Ensure that the created reader ignores the header of our spreadsheet.
    3. Configure the created reader to read the student information from the data/students.xlsx file that is found from the classpath.
    4. Configure the reader to transform a student information row into a StudentDTO object with the BeanWrapperRowMapper class. This class populates the fields of the created StudentDTO object by using the column names given on header row of our spreadsheet.
    5. Return the created PoiItemReader<StudentDTO> object.

The source code of the ExcelFileToDatabaseJobConfig class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.mapping.BeanWrapperRowMapper;
import org.springframework.batch.item.excel.poi.PoiItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.env.Environment;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class ExcelFileToDatabaseJobConfig {

    @Bean
    ItemReader<StudentDTO> excelStudentReader() {
        PoiItemReader<StudentDTO> reader = new PoiItemReader<>();
        reader.setLinesToSkip(1);
        reader.setResource(new ClassPathResource("data/students.xlsx"));
        reader.setRowMapper(excelRowMapper());
        return reader;
    }

    private RowMapper<StudentDTO> excelRowMapper() {
        BeanWrapperRowMapper<StudentDTO> rowMapper = new BeanWrapperRowMapper<>();
        rowMapper.setTargetType(StudentDTO.class);
        return rowMapper;
    }
}

This approach works as long as our Excel spreadsheet has a header row and the column names of the header row can be resolved into the field names of the StudentDTO class.

However, it is entirely possible that we have to read the input data from a spreadsheet that doesn’t have a header row. If this is the case, we have to create a custom RowMapper that transforms the rows of our spreadsheet into StudentDTO objects.

We can create a custom RowMapper by following these steps:

  1. Create a StudentExcelRowMapper class.
  2. Implement the RowMapper<T> interface and pass the type of created object (StudentDTO) as a type parameter.
  3. Implement the T mapRow(RowSet rowSet) method of the RowMapper<T> interface by following these steps:
    1. Create a new StudentDTO object.
    2. Populate the field values of the created object. We can read the column values of the processed row by invoking the getColumnValue(int columnIndex) method of the RowSet interface. Also, we must remember that the index of the first column is 0.
    3. Return the created StudentDTO object.

The source code of the StudentExcelRowMapper class looks as follows:

import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.support.rowset.RowSet;

public class StudentExcelRowMapper implements RowMapper<StudentDTO> {

    @Override
    public StudentDTO mapRow(RowSet rowSet) throws Exception {
        StudentDTO student = new StudentDTO();

        student.setName(rowSet.getColumnValue(0));
        student.setEmailAddress(rowSet.getColumnValue(1));
        student.setPurchasedPackage(rowSet.getColumnValue(2));

        return student;
    }
}

After we have created our custom row mapper, we have to make the following changes to the configuration of our ItemReader bean:

  1. Ensure that the our ItemReader does not ignore the first line of the input data.
  2. Replace the old excelRowMapper() method with a method that returns a new StudentExcelRowMapper object.

After we have made these changes to the ExcelFileToDatabaseJobConfig class, its source code looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.excel.RowMapper;
import org.springframework.batch.item.excel.poi.PoiItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class ExcelFileToDatabaseJobConfig {

    @Bean
    ItemReader<StudentDTO> excelStudentReader() {
        PoiItemReader<StudentDTO> reader = new PoiItemReader<>();
        reader.setResource(new ClassPathResource("data/students.xlsx"));
        reader.setRowMapper(excelRowMapper());
        return reader;
    }

	private RowMapper<StudentDTO> excelRowMapper() {
       return new StudentExcelRowMapper();
    }
}

Let’s summarize what we learned from this blog post.

Summary

This blog post has taught us four things:

  • If we want to read the input data of a Spring Batch job from an Excel spreadsheet, we have to add Spring Batch Excel and Apache POI dependencies into our build script.
  • If we want to read input data by using Spring Batch Excel and Apache POI, we have to use the PoiItemReader class.
  • We can map the rows of our spreadsheet into T objects by using the BeanWrapperRowMapper<T> class as long as our Excel spreadsheet has a header row and the column names of the header row can be resolved into the field names of the T class.
  • If our Excel spreadsheet doesn’t have a header row or the column names of the header row cannot resolved into the field names of the T class, we have to create a custom row mapper component that implements the RowMapper<T> interface.

P.S. You can get the example applications of this blog post from Github: Spring example and Spring Boot example.

About the Author

Petri Kainulainen is passionate about software development and continuous improvement. He is specialized in software development with the Spring Framework and is the author of Spring Data book.

About Petri Kainulainen →

3 comments… add one
  • Hi,
    Could you give an example of passing the resource path dynamically.Instead of hardcoding the file’s path in the reader.
    Normally this would be dynamic, you could have a website where people upload files and they have to be processed.

    Reply
    • Thanks Petri, I am facing the similar issue as described by Amos. Need to pass in a file dynamically rather than hard coding it. Help would be appreciated.

      Thanks!

      Reply

Leave a Comment