Spring Batch Tutorial: Reading Information From a File

The previous parts of my Spring Batch tutorial provided an introduction to Spring Batch and described how we can get Spring Batch dependencies by using either Maven or Gradle.

After we have downloaded the required dependencies, we can start writing Spring Batch jobs. The first thing that we have to do is to provide the input data for our batch job. This blog post helps us to read the input data from CSV and XML files.

Let’s start by taking a quick look at our example application.

Introduction to Our Example Application

During this tutorial we will implement several Spring Batch jobs that process the student information of an online course. Our first task is to create Spring Batch jobs that can import student information from CSV and XML files. These files contain a student list that provides the following information for our application:

  • The name of the student.
  • The email address of the student.
  • The name of the purchased package.

When we start writing a batch job, our first step is to provide input data for our batch job. In this case, we have to read student information from CSV and XML files and transform that information into StudentDTO objects which are processed by our batch job.

The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {

    private String emailAddress;
    private String name;
    private String purchasedPackage;

    public StudentDTO() {}

    public String getEmailAddress() {
        return emailAddress;
    }

    public String getName() {
        return name;
    }

    public String getPurchasedPackage() {
        return purchasedPackage;
    }

    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Let’s find out how we can read information from a CSV file.

Reading Information From a CSV File

The students.csv file contains the student list of our course. This file is found on the classpath and its full path is: data/students.csv. The content of the students.csv file looks as follows:

NAME;EMAIL_ADDRESS;PACKAGE
Tony Tester;tony.tester@gmail.com;master
Nick Newbie;nick.newbie@gmail.com;starter
Ian Intermediate;ian.intermediate@gmail.com;intermediate

We can provide the input data for our batch job by configuring an ItemReader bean. We can configure an ItemReader bean, which reads the student information from the students.csv file, by following these steps:

  1. Create a CsvFileToDatabaseJobConfig class and annotate the created class with the @Configuration annotation. This class is the configuration class of our batch job, and it contains the beans that describe the flow of our batch job.
  2. Create a method that configures our ItemReader bean and ensure that the method returns an ItemReader<StudentDTO> object.
  3. Implement the created method by following these steps:
    1. Create a new FlatFileItemReader<StudentDTO> object. This reader can read lines from the Resource that is configured by invoking the setResource() method.
    2. Configure the created reader to read student information lines from the data/students.csv file that is found on the classpath.
    3. Configure the created reader to ignore the header line of the CSV file.
    4. Configure the reader to transform the student information line into a StudentDTO object by using these steps:
      1. Split the student information line into tokens by using semicolon (;) as a delimiter character and configure the name of each token. The names of these tokens must match the field names of the StudentDTO class.
      2. Ensure that the mapper component creates a new StudentDTO object and sets its field values by using the created tokens.
    5. Return the created FlatFileItemReader<StudentDTO> object.

The source code of the CsvFileToDatabaseJobConfig class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class CsvFileToDatabaseJobConfig {

    @Bean
    ItemReader<StudentDTO> csvFileItemReader() {
        FlatFileItemReader<StudentDTO> csvFileReader = new FlatFileItemReader<>();
        csvFileReader.setResource(new ClassPathResource("data/students.csv"));
        // Skip the header line (NAME;EMAIL_ADDRESS;PACKAGE).
        csvFileReader.setLinesToSkip(1);

        // The line mapper turns each remaining line into a StudentDTO object.
        LineMapper<StudentDTO> studentLineMapper = createStudentLineMapper();
        csvFileReader.setLineMapper(studentLineMapper);

        return csvFileReader;
    }

    private LineMapper<StudentDTO> createStudentLineMapper() {
        DefaultLineMapper<StudentDTO> studentLineMapper = new DefaultLineMapper<>();

        LineTokenizer studentLineTokenizer = createStudentLineTokenizer();
        studentLineMapper.setLineTokenizer(studentLineTokenizer);

        FieldSetMapper<StudentDTO> studentInformationMapper = createStudentInformationMapper();
        studentLineMapper.setFieldSetMapper(studentInformationMapper);

        return studentLineMapper;
    }

    private LineTokenizer createStudentLineTokenizer() {
        DelimitedLineTokenizer studentLineTokenizer = new DelimitedLineTokenizer();
        studentLineTokenizer.setDelimiter(";");
        // The token names follow the column order of the CSV file.
        studentLineTokenizer.setNames(new String[]{"name", "emailAddress", "purchasedPackage"});
        return studentLineTokenizer;
    }

    private FieldSetMapper<StudentDTO> createStudentInformationMapper() {
        BeanWrapperFieldSetMapper<StudentDTO> studentInformationMapper = new BeanWrapperFieldSetMapper<>();
        studentInformationMapper.setTargetType(StudentDTO.class);
        return studentInformationMapper;
    }
}
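
To make the tokenizer and mapper steps more concrete, here is a plain-Java sketch of roughly what happens to a single line of the students.csv file. This is not how Spring Batch implements it. The CsvLineSketch class and its Student holder are hypothetical names that exist only for this illustration, and the sketch assumes that every line has exactly three semicolon-separated values.

```java
import java.lang.reflect.Method;

class CsvLineSketch {

    // Hypothetical stand-in for StudentDTO so that the sketch is self-contained.
    public static class Student {
        private String emailAddress;
        private String name;
        private String purchasedPackage;

        public String getEmailAddress() { return emailAddress; }
        public String getName() { return name; }
        public String getPurchasedPackage() { return purchasedPackage; }

        public void setEmailAddress(String emailAddress) { this.emailAddress = emailAddress; }
        public void setName(String name) { this.name = name; }
        public void setPurchasedPackage(String purchasedPackage) { this.purchasedPackage = purchasedPackage; }
    }

    // Token names in column order; each name must have a matching setter.
    private static final String[] TOKEN_NAMES = {"name", "emailAddress", "purchasedPackage"};

    static Student mapLine(String line) {
        // Step 1: split the line into tokens by using semicolon as the delimiter.
        String[] tokens = line.split(";", -1);
        if (tokens.length != TOKEN_NAMES.length) {
            throw new IllegalArgumentException(
                    "Expected " + TOKEN_NAMES.length + " tokens but got " + tokens.length);
        }

        // Step 2: create a new object and set each property from the token with the same name.
        Student student = new Student();
        for (int i = 0; i < TOKEN_NAMES.length; i++) {
            String tokenName = TOKEN_NAMES[i];
            String setterName = "set" + Character.toUpperCase(tokenName.charAt(0)) + tokenName.substring(1);
            try {
                Method setter = Student.class.getMethod(setterName, String.class);
                setter.invoke(student, tokens[i]);
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException("No usable setter for token: " + tokenName, e);
            }
        }
        return student;
    }

    public static void main(String[] args) {
        Student student = mapLine("Tony Tester;tony.tester@gmail.com;master");
        System.out.println(student.getName() + " bought the " + student.getPurchasedPackage() + " package");
    }
}
```

The sketch shows why the token names matter: if a token name has no matching setter, the value cannot be copied into the object.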

Let’s move on and find out how we can read information from an XML file.

Reading Information From an XML File

The students.xml file contains the student list of our course. This file is found on the classpath and its full path is: data/students.xml. The content of the students.xml file looks as follows:

<students>
    <student>
        <name>Tony Tester</name>
        <emailAddress>tony.tester@gmail.com</emailAddress>
        <purchasedPackage>master</purchasedPackage>
    </student>
    <student>
        <name>Nick Newbie</name>
        <emailAddress>nick.newbie@gmail.com</emailAddress>
        <purchasedPackage>starter</purchasedPackage>
    </student>
    <student>
        <name>Ian Intermediate</name>
        <emailAddress>ian.intermediate@gmail.com</emailAddress>
        <purchasedPackage>intermediate</purchasedPackage>
    </student>
</students>

We can provide the input data for our batch job by configuring an ItemReader bean. We can configure an ItemReader bean, which reads the student information from the students.xml file, by following these steps:

  1. Create an XmlFileToDatabaseJobConfig class and annotate the created class with the @Configuration annotation. This class is the configuration class of our batch job, and it contains the beans that describe the flow of our batch job.
  2. Create a method that configures our ItemReader bean and ensure that the method returns an ItemReader<StudentDTO> object.
  3. Implement the created method by following these steps:
    1. Create a new StaxEventItemReader<StudentDTO> object. This reader reads input data from an XML file by using StAX.
    2. Configure the created reader to read student information from the data/students.xml file that is found on the classpath.
    3. Configure the name of the XML element (student) that contains the information of a single student.
    4. Ensure that the reader transforms the processed XML fragment into a StudentDTO object by using JAXB2.
    5. Return the created StaxEventItemReader<StudentDTO> object.

The source code of the XmlFileToDatabaseJobConfig class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.xml.StaxEventItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

@Configuration
public class XmlFileToDatabaseJobConfig {

    @Bean
    ItemReader<StudentDTO> xmlFileItemReader() {
        StaxEventItemReader<StudentDTO> xmlFileReader = new StaxEventItemReader<>();
        xmlFileReader.setResource(new ClassPathResource("data/students.xml"));
        // Each <student> element is read as one item.
        xmlFileReader.setFragmentRootElementName("student");

        // JAXB2 unmarshals each <student> fragment into a StudentDTO object.
        Jaxb2Marshaller studentMarshaller = new Jaxb2Marshaller();
        studentMarshaller.setClassesToBeBound(StudentDTO.class);
        xmlFileReader.setUnmarshaller(studentMarshaller);

        return xmlFileReader;
    }
}

Before we can transform the student information read from the students.xml file into StudentDTO objects, we have to configure the name of the fragment root element in the StudentDTO class. We can do this by following these steps:

  1. Annotate the class with the @XmlRootElement annotation.
  2. Configure the name of the root element by setting the value of the @XmlRootElement annotation’s name attribute to: student.

The source code of the modified StudentDTO class looks as follows:

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="student")
public class StudentDTO {

    private String emailAddress;
    private String name;
    private String purchasedPackage;

    public StudentDTO() {}

    public String getEmailAddress() {
        return emailAddress;
    }

    public String getName() {
        return name;
    }

    public String getPurchasedPackage() {
        return purchasedPackage;
    }

    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Because the element names of the processed XML fragments are the same as the field names of the StudentDTO class, we don’t have to add additional annotations to the StudentDTO class. If this is not the case, or we need to use custom marshalling, we have to annotate our DTO class with JAXB annotations.
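
If the role of the fragment root element name feels abstract, the following plain-Java sketch might help. It uses the StAX API of the JDK to cut an XML document into fragments and collects the child elements of each fragment into a Map. It is only an illustration under a few assumptions: the StudentFragmentSketch class and its readFragments() method are hypothetical names, the sketch does not use JAXB, and it does not claim to mirror the internals of the StaxEventItemReader<T> class.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

class StudentFragmentSketch {

    // Returns one map (element name -> text) per fragment root element.
    static List<Map<String, String>> readFragments(String xml, String fragmentRootElementName) {
        List<Map<String, String>> fragments = new ArrayList<>();
        try {
            XMLStreamReader reader = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Map<String, String> current = null;
            while (reader.hasNext()) {
                int event = reader.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String elementName = reader.getLocalName();
                    if (elementName.equals(fragmentRootElementName)) {
                        // A new fragment starts here.
                        current = new LinkedHashMap<>();
                    } else if (current != null) {
                        // Child element of the fragment: read its text content.
                        current.put(elementName, reader.getElementText());
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && reader.getLocalName().equals(fragmentRootElementName)) {
                    // The fragment is complete.
                    fragments.add(current);
                    current = null;
                }
            }
            reader.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the XML document", e);
        }
        return fragments;
    }

    public static void main(String[] args) {
        String xml = "<students>"
                + "<student><name>Tony Tester</name><emailAddress>tony.tester@gmail.com</emailAddress>"
                + "<purchasedPackage>master</purchasedPackage></student>"
                + "</students>";
        System.out.println(readFragments(xml, "student"));
    }
}
```

The element names end up as the keys of each Map, which is the same information that JAXB needs when it matches elements to the fields of our DTO class.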

Let’s summarize what we learned from this blog post.

Summary

This blog post has taught us four things:

  • If we need to read lines from an input file, we can use the FlatFileItemReader<T> class.
  • The FlatFileItemReader<T> class transforms lines into objects by using a LineMapper<T>.
  • If we need to read information from an XML document, we can use the StaxEventItemReader<T> class.
  • The StaxEventItemReader<T> class transforms XML fragments into objects by using an Unmarshaller.

The next part of this tutorial describes how we can read information from a database.

P.S. You can get the example applications of this blog post from GitHub: Spring example and Spring Boot example.

About the Author

Petri Kainulainen is passionate about software development and continuous improvement. He specializes in software development with the Spring Framework and is the author of the Spring Data book.

12 comments
  • Being very (just a few days) new to Spring Batch, the first question I have is, once the classes described above are created, how do you run them to actually see the data defined in the XML file in the database? Thank you.

    • I will describe the required steps in my upcoming blog posts, but basically you have to follow these steps:

      1. Configure a Step that has an ItemReader and ItemWriter.
      2. Configure a Job that contains the created Step.
      3. Create a component that runs your Job.
      4. Ensure that the Spring container finds your component during classpath scan.

      For example, if you want to see the configuration of a Spring Batch job that reads information from a CSV file and writes it to a database, you should take a look at this package. Also, you might want to take a look at this resource page that contains links to other Spring Batch tutorials.

  • Thanks.

    How to read/write multiple txt files (which are not same) using spring batch with single field setmapper class?

  • How to simply read a .csv file and show it to console only

    • If you want to use Spring Batch, you need to create an ItemWriter that prints the information to console. However, you probably don’t need Spring Batch for this. If you want to use a simpler approach, take a look at this blog post.

  • 1. Do you have any samples with reading multiple FragmentRootElementName from XML file?
    2. I have schema xsd file for the xml file I read in Spring Batch. According to your example I have to annotate the class with the @XmlRootElement annotation. But I do not want to modify classes created during build process in the target folder.

    • Hi,

      Do you have any samples with reading multiple FragmentRootElementName from XML file?

      No :(

      I have schema xsd file for the xml file I read in Spring Batch. According to your example I have to annotate the class with the @XmlRootElement annotation. But I do not want to modify classes created during build process in the target folder.

      Yes. If I remove that annotation, the UnmarshallingFailureException is thrown:

      
      org.springframework.oxm.UnmarshallingFailureException: JAXB unmarshalling exception; 
      nested exception is javax.xml.bind.UnmarshalException
       - with linked exception:
      [com.sun.istack.internal.SAXParseException2; lineNumber: 2; columnNumber: 14; 
      unexpected element (uri:"", local:"student"). Expected elements are (none)]
      
      

      Can you configure the code generator to include these annotations?

  • Suppose StudentDTO has a field that is a Foreign Key? It seems hibernate wants to try to insert an object into the SQL statement?

    Reply
    • Are you trying to insert something to the database by using Hibernate? If so, you need to transform the StudentDTO object into an entity and persist it to the database. If the entity has relationships with other entities, you need to obtain references to these objects before you persist the created entity to the database.
