
Spring Batch Tutorial: Reading Information From a CSV File

The previous parts of my Spring Batch tutorial provided an introduction to Spring Batch and described how we can get the required dependencies by using either Maven or Gradle.

After we have downloaded the required dependencies, we can start writing Spring Batch jobs. The first thing that we have to do is to provide the input data for our batch job. This blog post helps us to read the input data from a CSV file. After we have read this blog post, we:

  • Can read the input data of our batch job from a CSV file.
  • Understand how we can transform a line read from a CSV file into a domain object.

Let’s start by taking a quick look at our example application.

This blog post assumes that we can get the required dependencies by using either Maven or Gradle.

Introduction to Our Example Application

In this blog post, we will read the input data of our batch job from a CSV file that contains the student information of an online course. To be more specific, the CSV file contains a student list that provides the following information to our batch job:

  • The name of the student.
  • The email address of the student.
  • The name of the purchased package.

The content of our input file looks as follows:

NAME;EMAIL_ADDRESS;PACKAGE
Tony Tester;tony.tester@gmail.com;master
Nick Newbie;nick.newbie@gmail.com;starter
Ian Intermediate;ian.intermediate@gmail.com;intermediate

The ItemReader which reads the student list from a CSV file must return StudentDTO objects. The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {

    private String emailAddress;
    private String name;
    private String purchasedPackage;

    public StudentDTO() {}

    public String getEmailAddress() {
        return emailAddress;
    }

    public String getName() {
        return name;
    }

    public String getPurchasedPackage() {
        return purchasedPackage;
    }

    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Next, we will find out how we can read the input data of our batch job from a CSV file.

Reading the Input Data From a CSV File

We can provide the input data for our batch job by configuring an ItemReader bean. Because we want to read the student information from a CSV file, we have to configure this bean by following these steps:

First, we have to create the configuration class that contains the beans which describe the flow of our batch job. The source code of our configuration class looks as follows:

import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {
}

Second, we have to write a private method that returns a LineMapper<StudentDTO> object. This object transforms a String object read from the source CSV file into a domain object. We can write this method by following these steps:

  1. Create a new DefaultLineMapper<StudentDTO> object.
  2. Create a new DelimitedLineTokenizer object. Ensure that the created object splits the student information line into tokens by using semicolon (;) as a delimiter character and configure the names of each token. The names of these tokens must match with the field names of the target class (StudentDTO).
  3. Ensure that the DefaultLineMapper<StudentDTO> object splits each row into tokens by using the created DelimitedLineTokenizer object.
  4. Create a new BeanWrapperFieldSetMapper<StudentDTO> object which maps the tokenized input data into a domain object by using bean property paths. Remember to ensure that the created object creates new StudentDTO objects.
  5. Ensure that the DefaultLineMapper<StudentDTO> object creates new StudentDTO objects by using the created BeanWrapperFieldSetMapper<StudentDTO> object.
  6. Return the created DefaultLineMapper<StudentDTO> object.

After we have written this method, the source code of our configuration class looks as follows:

import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {

    private LineMapper<StudentDTO> createStudentLineMapper() {
        DefaultLineMapper<StudentDTO> studentLineMapper = new DefaultLineMapper<>();

        LineTokenizer studentLineTokenizer = createStudentLineTokenizer();
        studentLineMapper.setLineTokenizer(studentLineTokenizer);

        FieldSetMapper<StudentDTO> studentInformationMapper =
                createStudentInformationMapper();
        studentLineMapper.setFieldSetMapper(studentInformationMapper);

        return studentLineMapper;
    }

    private LineTokenizer createStudentLineTokenizer() {
        DelimitedLineTokenizer studentLineTokenizer = new DelimitedLineTokenizer();
        studentLineTokenizer.setDelimiter(";");
        studentLineTokenizer.setNames(new String[]{
                "name", 
                "emailAddress", 
                "purchasedPackage"
        });
        return studentLineTokenizer;
    }

    private FieldSetMapper<StudentDTO> createStudentInformationMapper() {
        BeanWrapperFieldSetMapper<StudentDTO> studentInformationMapper =
                new BeanWrapperFieldSetMapper<>();
        studentInformationMapper.setTargetType(StudentDTO.class);
        return studentInformationMapper;
    }
}

Third, we have to create the method that configures our ItemReader bean and ensure that this method returns an ItemReader<StudentDTO> object. After we have created this method, we have to implement it by following these steps:

  1. Create a new FlatFileItemReader<StudentDTO> object. This reader can read lines from the specified Resource.
  2. Configure the location of the CSV file which contains the input data of our batch job. Because I wanted to create an example application that’s as easy to run as possible, I ensured that the input file (data/students.csv) of our batch job is found from the classpath.
  3. Ignore the header line of the CSV file.
  4. Configure the used LineMapper<StudentDTO> object which transforms a String object read from the CSV file into a domain object (StudentDTO).
  5. Return the created FlatItemReader<StudentDTO> object.

After we have configured our ItemReader bean, the source code of our configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class SpringBatchExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader() {
        FlatFileItemReader<StudentDTO> csvFileReader = new FlatFileItemReader<>();
        csvFileReader.setResource(new ClassPathResource("data/students.csv"));
        csvFileReader.setLinesToSkip(1);

        LineMapper<StudentDTO> studentLineMapper = createStudentLineMapper();
        csvFileReader.setLineMapper(studentLineMapper);

        return csvFileReader;
    }

    private LineMapper<StudentDTO> createStudentLineMapper() {
        DefaultLineMapper<StudentDTO> studentLineMapper = new DefaultLineMapper<>();

        LineTokenizer studentLineTokenizer = createStudentLineTokenizer();
        studentLineMapper.setLineTokenizer(studentLineTokenizer);

        FieldSetMapper<StudentDTO> studentInformationMapper =
                createStudentInformationMapper();
        studentLineMapper.setFieldSetMapper(studentInformationMapper);

        return studentLineMapper;
    }

    private LineTokenizer createStudentLineTokenizer() {
        DelimitedLineTokenizer studentLineTokenizer = new DelimitedLineTokenizer();
        studentLineTokenizer.setDelimiter(";");
        studentLineTokenizer.setNames(new String[]{
                "name",
                "emailAddress",
                "purchasedPackage"
        });
        return studentLineTokenizer;
    }

    private FieldSetMapper<StudentDTO> createStudentInformationMapper() {
        BeanWrapperFieldSetMapper<StudentDTO> studentInformationMapper =
                new BeanWrapperFieldSetMapper<>();
        studentInformationMapper.setTargetType(StudentDTO.class);
        return studentInformationMapper;
    }
}

We must configure the location of our input file by creating a new ClassPathResource object because I wanted to create an example application that’s as easy to run as possible. Typically, the input file of our batch job is found from the file system. This means that we can configure its location by creating a new FileSystemResource object.
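
For example, if the input file were read from the file system, our itemReader() bean method could look as follows. This is only a sketch: the file path used here is an invented example that we would replace with the real location of our input file:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.FlatFileItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.context.annotation.Bean;
import org.springframework.core.io.FileSystemResource;

@Bean
public ItemReader<StudentDTO> itemReader() {
    FlatFileItemReader<StudentDTO> csvFileReader = new FlatFileItemReader<>();

    //The file path is only an example. Replace it with the real location of the input file.
    csvFileReader.setResource(new FileSystemResource("/opt/batch-jobs/data/students.csv"));
    csvFileReader.setLinesToSkip(1);

    LineMapper<StudentDTO> studentLineMapper = createStudentLineMapper();
    csvFileReader.setLineMapper(studentLineMapper);

    return csvFileReader;
}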

We can now read the input data of our batch job from a CSV file. Let’s summarize what we learned from this blog post.

Summary

This blog post has taught us four things:

  • We can read data from a CSV file by using the FlatFileItemReader<T> class.
  • The FlatFileItemReader<T> class transforms lines read from the input file into domain objects by using a LineMapper<T> object.
  • The DelimitedLineTokenizer class can split the input data into tokens by using the specified delimiter character. Also, this class allows us to configure the field names which are used to populate the fields of the created domain object.
  • The BeanWrapperFieldSetMapper<T> class can transform the tokenized input data into a domain object by using bean property paths.

The next part of this tutorial describes how we can read the input data from an XML file.

P.S. You can get the example application from GitHub.

Comments

  • Being very (just a few days) new to Spring Batch, the first question I have is, once the classes described above are created, how do you run them to actually see the data defined in the XML file in the database? Thank you.

    • I will describe the required steps in my upcoming blog posts, but basically you have to follow these steps:

      1. Configure a Step that has an ItemReader and ItemWriter.
      2. Configure a Job that contains the created Step.
      3. Create a component that runs your Job.
      4. Ensure that the Spring container finds your component during classpath scan.

      For example, if you want to see the configuration of a Spring Batch job that reads information from a CSV file and writes it to a database, you should take a look at this package. Also, you might want to take a look at this resource page that contains links to other Spring Batch tutorials.
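
      For illustration, steps 1 and 2 could look something like this. This is just a sketch: the bean names and the chunk size are placeholders, the snippet assumes that an ItemWriter<StudentDTO> bean exists, and it uses the JobBuilderFactory and StepBuilderFactory based Java configuration:

      import org.springframework.batch.core.Job;
      import org.springframework.batch.core.Step;
      import org.springframework.batch.core.configuration.annotation.JobBuilderFactory;
      import org.springframework.batch.core.configuration.annotation.StepBuilderFactory;
      import org.springframework.batch.core.launch.support.RunIdIncrementer;
      import org.springframework.batch.item.ItemReader;
      import org.springframework.batch.item.ItemWriter;
      import org.springframework.context.annotation.Bean;

      @Bean
      public Step studentStep(StepBuilderFactory stepBuilderFactory,
                              ItemReader<StudentDTO> itemReader,
                              ItemWriter<StudentDTO> itemWriter) {
          //Reads and writes the student information in chunks of ten students.
          return stepBuilderFactory.get("studentStep")
                  .<StudentDTO, StudentDTO>chunk(10)
                  .reader(itemReader)
                  .writer(itemWriter)
                  .build();
      }

      @Bean
      public Job studentJob(JobBuilderFactory jobBuilderFactory, Step studentStep) {
          //Runs the step that processes the student information.
          return jobBuilderFactory.get("studentJob")
                  .incrementer(new RunIdIncrementer())
                  .start(studentStep)
                  .build();
      }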

  • Thanks.

    How to read/write multiple txt files (which are not same) using spring batch with single field setmapper class?

  • How to simply read a .csv file and show it to console only

    • If you want to use Spring Batch, you need to create an ItemWriter that prints the information to console. However, you probably don’t need Spring Batch for this. If you want to use a simpler approach, take a look at this blog post.
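
      For example, a minimal console writer could look something like this. The class name is just a placeholder, and the write() method signature shown here is the one used by Spring Batch 4 and older:

      import java.util.List;

      import org.springframework.batch.item.ItemWriter;

      public class ConsoleStudentWriter implements ItemWriter<StudentDTO> {

          @Override
          public void write(List<? extends StudentDTO> students) throws Exception {
              //Writes the name and email address of each student to the console.
              students.forEach(student -> System.out.println(
                      student.getName() + " - " + student.getEmailAddress()
              ));
          }
      }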

  • 1. Do you have any samples with reading multiple FragmentRootElementName from XML file?
    2. I have schema xsd file for the xml file I read in Spring Batch. According to your example I have to annotate the class with the @XmlRootElement annotation. But I do not want to modify classes created during build process in the target folder.

    • Hi,

      Do you have any samples with reading multiple FragmentRootElementName from XML file?

      No :(

      I have schema xsd file for the xml file I read in Spring Batch. According to your example I have to annotate the class with the @XmlRootElement annotation. But I do not want to modify classes created during build process in the target folder.

      Yes. If I remove that annotation, the UnmarshallingFailureException is thrown:

      org.springframework.oxm.UnmarshallingFailureException: JAXB unmarshalling exception;
      nested exception is javax.xml.bind.UnmarshalException
       - with linked exception:
      [com.sun.istack.internal.SAXParseException2; lineNumber: 2; columnNumber: 14;
      unexpected element (uri:"", local:"student"). Expected elements are (none)]

      Can you configure the code generator to include these annotations?

  • Suppose StudentDTO has a field that is a Foreign Key? It seems hibernate wants to try to insert an object into the SQL statement?

    • Are you trying to insert something to the database by using Hibernate? If so, you need to transform the StudentDTO object into an entity and persist it to the database. If the entity has relationships with other entities, you need to obtain references to these objects before you persist the created entity to the database.
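
      For example, the transformation could be done with an ItemProcessor that looks something like this. This is only a sketch, and the Student entity used here is a hypothetical class:

      import org.springframework.batch.item.ItemProcessor;

      public class StudentEntityProcessor implements ItemProcessor<StudentDTO, Student> {

          @Override
          public Student process(StudentDTO studentDTO) {
              //Copies the student information from the DTO to the (hypothetical) Student entity.
              Student student = new Student();
              student.setName(studentDTO.getName());
              student.setEmailAddress(studentDTO.getEmailAddress());
              student.setPurchasedPackage(studentDTO.getPurchasedPackage());
              return student;
          }
      }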

  • Hi Petri,

    I have downloaded the source code from Github. Please let me know how to run. Which is the first class that needs to be executed?

    • Hi,

      Each example has a README that explains how you can run the example application by using either Maven or Gradle. If you want to run the Spring example, you should take a look at this README. On the other hand, if you want to run the Spring Boot example, you should take a look at this README.

  • How do I first fetch a CSV file from any physical location, then stream this CSV file and read it in Spring Batch?
    If possible, could you explain this without using annotations also.

    • Hi,

      Unfortunately Spring Batch doesn’t have very good support for reading files from paths that are not known when the application context is started. That’s why people often end up writing custom tasklets which copy the input files to the path that is given to the used ItemReader.

      I think that you can solve your problem by following these steps:

      1. Write a tasklet that copies the input file to the known path.
      2. Create a step that invokes your tasklet and ensure that this step is run before the step that processes the input file.

      If you don’t know how you can create custom tasklets, you should take a look at this blog post. Also, if you have any other questions, don’t hesitate to ask them.
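
      For illustration, the tasklet mentioned in step 1 could look something like this. The class name and both file paths are just placeholders, and in a real job the source path would probably be given as a job parameter instead of being hard coded:

      import java.nio.file.Files;
      import java.nio.file.Path;
      import java.nio.file.Paths;
      import java.nio.file.StandardCopyOption;

      import org.springframework.batch.core.StepContribution;
      import org.springframework.batch.core.scope.context.ChunkContext;
      import org.springframework.batch.core.step.tasklet.Tasklet;
      import org.springframework.batch.repeat.RepeatStatus;

      public class CopyInputFileTasklet implements Tasklet {

          //Both paths are placeholders.
          private static final Path SOURCE_FILE = Paths.get("/incoming/students.csv");
          private static final Path TARGET_FILE = Paths.get("/batch-input/students.csv");

          @Override
          public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext)
                  throws Exception {
              //Copies the input file to the path that is known when the application context is started.
              Files.copy(SOURCE_FILE, TARGET_FILE, StandardCopyOption.REPLACE_EXISTING);
              return RepeatStatus.FINISHED;
          }
      }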

  • Hi
    What about nested elements? I mean, if a student is represented like this:

    Tony Tester

    Maths
    221

    Chemistry
    222

    tony.tester@gmail.com
    master

    How to retrieve courses? Is it possible to retrieve only the courses?
    Thanks

    • sorry xml is not supported, so here is my illustrated xml sample as I talked about in my previous post (without ), I hope It’s pretty clear…

      <student>
          <name>Tony Tester</name>
          <courses>
              <course>
                  <name>Maths</name>
                  <room>221</room>
              </course>
              <course>
                  <name>Chemistry</name>
                  <room>222</room>
              </course>
          </courses>
          <emailAddress>tony.tester@gmail.com</emailAddress>
          <purchasedPackage>master</purchasedPackage>
      </student>

      • Hi,

        I have to admit that I don’t know if it is possible to configure the StaxEventItemReader to read only the child elements of the root element. One option is to extend that class and make the required changes to the subclass. However, I would probably still read the entire object hierarchy and add the required filter logic to the subclass.

        By the way, if you have any additional questions, don’t hesitate to ask them.

  • Very nice article.
    Do you have any article about reading/writing txt files using Spring Batch or any Java approach?

    • Hi,

      Unfortunately I don’t have any such blog posts. However, I can help you to find one if you provide additional information about your use case. For example, it would be useful to know what kind of format is used by your input files.

      • OK, I’ll send you my input file. Can you send me any mail or FB id? It will be easy to send the file.

      • Petri, my use case is like this: for a single record (person) I have multiple data lines, and each line has its own identification code, such as:
        code[10] means details about the person
        code[20] means details about ID proof
        code[30] means the name of the zip file which has the image of the person; this file name will be in the details [10]
        Like this, suppose I have records of 100 persons:
        10|1|01|BR|01| | |02|22022017125698_1|Mr|syoso||tom
        20|1|A|B8623650|18-12-2020|01|02|
        30|1|22022017125698_1_PHOTO_1.jpg|02|02|BR
        30|1|22022017125698_1_POA_1.jpg|05|02|BR|
        10|2|01|BR|01|02|22022017125698_2|Mr|thoms
        20|2|A|A3654002|18-12-2020|01|02||
        30|2|22022017125698_2_PHOTO_1.jpg|02|02|BR
        30|2|22022017125698_2_POI_1.jpg|09|02|BR|

        • Hi,

          You can read your input data by using the technique described in this blog post. However, you must ensure that the created LineTokenizer object uses the ‘|’ character as a delimiter character. You can configure the delimiter character by passing the string: “|” as a method parameter when you invoke the setDelimiter() method of the DelimitedLineTokenizer class.
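
          In other words, after creating the DelimitedLineTokenizer, you would configure the delimiter like this (the token names would naturally have to match your own record format):

          DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
          //Splits each line into tokens by using the '|' character as a delimiter.
          lineTokenizer.setDelimiter("|");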

  • There is a lot going on in this class and it is not all written at once, so we are going to go through the code in steps. Visit GitHub to see the code in its entirety. As the Spring Batch documentation states, FlatFileItemReader will “read lines of data from a flat file that typically describe records with fields of data defined by fixed positions in the file or delimited by some special character (e.g. Comma)”.

  • Hi,

    Could you please suggest a file-by-file reader in Spring Batch?

    • Hi,

      What is the format of the source file?
