Spring Batch Tutorial: Reading Information From a CSV File

The previous parts of my Spring Batch tutorial provided an introduction to Spring Batch and described how you can get the required dependencies by using either Maven or Gradle.

After you have downloaded the required dependencies, you can start writing Spring Batch jobs. The first thing that you have to do is to provide the input data for your batch job. This blog post describes how you can read the input data from a CSV file.

After you have read this blog post, you:

  • Can read the input data of your batch job from a CSV file.
  • Understand how you can transform a line read from a CSV file into a domain object.

Let's start by taking a quick look at the example application.


Introduction to the Example Application

In this blog post, you will read the input data of your batch job from a CSV file which contains the student information of an online course. To be more specific, the CSV file contains a student list that provides the following information to your batch job:

  • The name of the student.
  • The email address of the student.
  • The name of the purchased package.

The content of your input file looks as follows:

NAME;EMAIL_ADDRESS;PACKAGE
Tony Tester;tony.tester@gmail.com;master
Nick Newbie;nick.newbie@gmail.com;starter
Ian Intermediate;ian.intermediate@gmail.com;intermediate

The ItemReader which reads the student list from a CSV file must return StudentDTO objects. The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {

    private String emailAddress;
    private String name;
    private String purchasedPackage;

    public StudentDTO() {}

    public String getEmailAddress() {
        return emailAddress;
    }

    public String getName() {
        return name;
    }

    public String getPurchasedPackage() {
        return purchasedPackage;
    }

    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Next, you will find out how you can read the input data of your batch job from a CSV file.

Reading the Input Data From a CSV File

You can provide the input data for your batch job by configuring an ItemReader bean. Because you must read the student information from a CSV file, you have to configure this bean by following these steps:

First, you have to create the configuration class that contains the beans which describe the flow of your batch job. The source code of your configuration class looks as follows:

import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {
}

Second, you have to write a private method that returns a LineMapper<StudentDTO> object. This object transforms a String object read from the source CSV file into a domain object. You can write this method by following these steps:

  1. Create a new DefaultLineMapper<StudentDTO> object.
  2. Create a new DelimitedLineTokenizer object. Ensure that the created object splits the student information line into tokens by using the semicolon (;) as the delimiter character, and configure the name of each token. The names of these tokens must match the property names of the target class (StudentDTO).
  3. Ensure that the DefaultLineMapper<StudentDTO> object splits each row into tokens by using the created DelimitedLineTokenizer object.
  4. Create a new BeanWrapperFieldSetMapper<StudentDTO> object which maps the tokenized input data into a domain object by using bean property paths. Remember to configure its target type (StudentDTO.class) so that the created object creates new StudentDTO objects.
  5. Ensure that the DefaultLineMapper<StudentDTO> object creates new StudentDTO objects by using the created BeanWrapperFieldSetMapper<StudentDTO> object.
  6. Return the created DefaultLineMapper<StudentDTO> object.

After you have written this method, the source code of your configuration class looks as follows:

import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {

    private LineMapper<StudentDTO> createStudentLineMapper() {
        DefaultLineMapper<StudentDTO> studentLineMapper = new DefaultLineMapper<>();

        LineTokenizer studentLineTokenizer = createStudentLineTokenizer();
        studentLineMapper.setLineTokenizer(studentLineTokenizer);

        FieldSetMapper<StudentDTO> studentInformationMapper =
                createStudentInformationMapper();
        studentLineMapper.setFieldSetMapper(studentInformationMapper);

        return studentLineMapper;
    }

    private LineTokenizer createStudentLineTokenizer() {
        DelimitedLineTokenizer studentLineTokenizer = new DelimitedLineTokenizer();
        studentLineTokenizer.setDelimiter(";");
        studentLineTokenizer.setNames(new String[]{
                "name", 
                "emailAddress", 
                "purchasedPackage"
        });
        return studentLineTokenizer;
    }

    private FieldSetMapper<StudentDTO> createStudentInformationMapper() {
        BeanWrapperFieldSetMapper<StudentDTO> studentInformationMapper =
                new BeanWrapperFieldSetMapper<>();
        studentInformationMapper.setTargetType(StudentDTO.class);
        return studentInformationMapper;
    }
}
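To make the division of labor between these two objects more concrete, here is a plain-Java sketch (no Spring Batch involved) of what happens to a single line: the tokenizer stage splits the line on the semicolon and pairs each token with a configured name, and the mapper stage turns the named tokens into a domain object. The StudentLineSketch class and its nested Student record are hypothetical stand-ins used only for illustration; this is not how DelimitedLineTokenizer or BeanWrapperFieldSetMapper are implemented (the real mapper, for example, populates StudentDTO through its setters by using bean property paths).

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical plain-Java illustration of "tokenize, then map".
// This is NOT Spring Batch's implementation.
final class StudentLineSketch {

    // Mirrors the token names passed to setNames() in the configuration above.
    static final String[] NAMES = {"name", "emailAddress", "purchasedPackage"};

    // Simplified stand-in for StudentDTO.
    record Student(String name, String emailAddress, String purchasedPackage) {}

    // Tokenizer stage: split on the literal ';' and pair each token with its name.
    static Map<String, String> tokenize(String line) {
        // ';' is not a regex metacharacter; -1 keeps trailing empty tokens.
        String[] tokens = line.split(";", -1);
        if (tokens.length != NAMES.length) {
            throw new IllegalArgumentException(
                    "Expected " + NAMES.length + " tokens but found " + tokens.length + ": " + line);
        }
        Map<String, String> namedTokens = new LinkedHashMap<>();
        for (int i = 0; i < NAMES.length; i++) {
            namedTokens.put(NAMES[i], tokens[i]);
        }
        return namedTokens;
    }

    // Mapper stage: build the domain object from the named tokens.
    static Student map(Map<String, String> namedTokens) {
        return new Student(
                namedTokens.get("name"),
                namedTokens.get("emailAddress"),
                namedTokens.get("purchasedPackage"));
    }

    // Both stages together, roughly what a line mapper is responsible for.
    static Student mapLine(String line) {
        return map(tokenize(line));
    }
}
```

For example, StudentLineSketch.mapLine("Nick Newbie;nick.newbie@gmail.com;starter") returns a Student whose purchasedPackage is "starter". Keeping the two stages separate is what lets you swap the delimiter or the target class independently.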

Third, you have to create a method that configures your ItemReader bean and ensure that this method returns an ItemReader<StudentDTO> object. After you have created this method, you have to implement it by following these steps:

  1. Create a new FlatFileItemReaderBuilder<StudentDTO> object. This builder creates FlatFileItemReader<StudentDTO> objects which read lines from the specified Resource.
  2. Configure the name of the created ItemReader.
  3. Configure the location of the CSV file which contains the input data of your batch job. Because I wanted to create an example application that's as easy to run as possible, I ensured that the input file of your batch job (data/students.csv) is loaded from the classpath.
  4. Ignore the header line of the CSV file.
  5. Configure the LineMapper<StudentDTO> object which transforms a String object read from the CSV file into a domain object (StudentDTO).
  6. Create a new FlatFileItemReader<StudentDTO> object and return the created object.

After you have configured your ItemReader bean, the source code of your configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class SpringBatchExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader() {
        LineMapper<StudentDTO> studentLineMapper = createStudentLineMapper();

        return new FlatFileItemReaderBuilder<StudentDTO>()
                .name("studentReader")
                .resource(new ClassPathResource("data/students.csv"))
                .linesToSkip(1)
                .lineMapper(studentLineMapper)
                .build();
    }

    private LineMapper<StudentDTO> createStudentLineMapper() {
        DefaultLineMapper<StudentDTO> studentLineMapper = new DefaultLineMapper<>();

        LineTokenizer studentLineTokenizer = createStudentLineTokenizer();
        studentLineMapper.setLineTokenizer(studentLineTokenizer);

        FieldSetMapper<StudentDTO> studentInformationMapper =
                createStudentInformationMapper();
        studentLineMapper.setFieldSetMapper(studentInformationMapper);

        return studentLineMapper;
    }

    private LineTokenizer createStudentLineTokenizer() {
        DelimitedLineTokenizer studentLineTokenizer = new DelimitedLineTokenizer();
        studentLineTokenizer.setDelimiter(";");
        studentLineTokenizer.setNames(new String[]{
                "name",
                "emailAddress",
                "purchasedPackage"
        });
        return studentLineTokenizer;
    }

    private FieldSetMapper<StudentDTO> createStudentInformationMapper() {
        BeanWrapperFieldSetMapper<StudentDTO> studentInformationMapper =
                new BeanWrapperFieldSetMapper<>();
        studentInformationMapper.setTargetType(StudentDTO.class);
        return studentInformationMapper;
    }
}

The example configures the location of the input file by creating a new ClassPathResource object only because I wanted to create an example application that's as easy to run as possible. Typically, the input file of your batch job is read from the file system, which means that you can configure its location by creating a new FileSystemResource object.
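If it helps to see what the configured reader does with an input file that lives on the file system, the following plain-Java sketch reads a file from a path, skips the header line (the equivalent of linesToSkip(1)), and maps every remaining line into an object. StudentFileSketch, its Student record, and the blank-line handling are assumptions made for this illustration only; FlatFileItemReader itself returns one item per read() call instead of loading the whole file into a list. In the Spring Batch configuration, the corresponding change would be to pass a FileSystemResource (for example, new FileSystemResource("/path/to/students.csv"), where the path is a placeholder) to the resource() method instead of a ClassPathResource.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

// Hypothetical plain-Java sketch of reading the student CSV from the file system.
// This is NOT the FlatFileItemReader implementation.
final class StudentFileSketch {

    record Student(String name, String emailAddress, String purchasedPackage) {}

    // linesToSkip plays the role of .linesToSkip(1) in the builder configuration.
    static List<Student> readStudents(Path csvFile, int linesToSkip) throws IOException {
        List<Student> students = new ArrayList<>();
        try (BufferedReader reader = Files.newBufferedReader(csvFile, StandardCharsets.UTF_8)) {
            String line;
            int lineNumber = 0;
            while ((line = reader.readLine()) != null) {
                lineNumber++;
                if (lineNumber <= linesToSkip) {
                    continue; // skip the header line(s)
                }
                if (line.isBlank()) {
                    continue; // assumption of this sketch: blank lines are ignored
                }
                String[] tokens = line.split(";", -1);
                if (tokens.length != 3) {
                    throw new IllegalArgumentException(
                            "Line " + lineNumber + " has " + tokens.length + " tokens: " + line);
                }
                students.add(new Student(tokens[0], tokens[1], tokens[2]));
            }
        }
        return students;
    }
}
```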


You can now read the input data of your batch job from a CSV file. Let's summarize what you learned from this blog post.

Summary

This blog post has taught you four things:

  • You can read data from a CSV file by using the FlatFileItemReader<T> class.
  • The FlatFileItemReader<T> class transforms lines read from the input file into domain objects by using a LineMapper<T> object.
  • The DelimitedLineTokenizer class can split the input data into tokens by using the specified delimiter character. Also, this class allows you to configure the field names which are used to populate the fields of the created domain object.
  • The BeanWrapperFieldSetMapper<T> class can transform the tokenized input data into a domain object by using bean property paths.

The next part of this tutorial describes how you can read the input data of your batch job from an XML file.

P.S. You can get the example application from GitHub.

32 comments
  • Charles Jun 24, 2016 @ 17:03

    Being very (just a few days) new to Spring Batch, the first question I have is, once the classes described above are created, how do you run them to actually see the data defined in the XML file in the database? Thank you.

    • Petri Jun 29, 2016 @ 10:37

      I will describe the required steps in my upcoming blog posts, but basically you have to follow these steps:

      1. Configure a Step that has an ItemReader and ItemWriter.
      2. Configure a Job that contains the created Step.
      3. Create a component that runs your Job.
      4. Ensure that the Spring container finds your component during classpath scan.

      For example, if you want to see the configuration of a Spring Batch job that reads information from a CSV file and writes it to a database, you should take a look at this package. Also, you might want to take a look at this resource page that contains links to other Spring Batch tutorials.

  • vk Aug 29, 2016 @ 5:49

    Thanks.

    How to read/write multiple txt files (which are not same) using spring batch with single field setmapper class?

  • noopur Sep 27, 2016 @ 14:13

    How to simply read a .csv file and show it to console only

    • Petri Sep 27, 2016 @ 18:16

      If you want to use Spring Batch, you need to create an ItemWriter that prints the information to console. However, you probably don't need Spring Batch for this. If you want to use a simpler approach, take a look at this blog post.

  • Michael Nov 20, 2016 @ 18:44

    1. Do you have any samples with reading multiple FragmentRootElementName from XML file?
    2. I have schema xsd file for the xml file I read in Spring Batch. According to your example I have to annotate the class with the @XmlRootElement annotation. But I do not want to modify classes created during build process in the target folder.

    • Petri Nov 23, 2016 @ 17:04

      Hi,

      Do you have any samples with reading multiple FragmentRootElementName from XML file?

      No :(

      I have schema xsd file for the xml file I read in Spring Batch. According to your example I have to annotate the class with the @XmlRootElement annotation. But I do not want to modify classes created during build process in the target folder.

      Yes. If I remove that annotation, the UnmarshallingFailureException is thrown:

      
      org.springframework.oxm.UnmarshallingFailureException: JAXB unmarshalling exception; 
      nested exception is javax.xml.bind.UnmarshalException
       - with linked exception:
      [com.sun.istack.internal.SAXParseException2; lineNumber: 2; columnNumber: 14; 
      unexpected element (uri:"", local:"student"). Expected elements are (none)]
      
      

      Can you configure the code generator to include these annotations?

  • Derek Dec 14, 2016 @ 18:10

    Suppose StudentDTO has a field that is a Foreign Key? It seems hibernate wants to try to insert an object into the SQL statement?

    • Petri Dec 15, 2016 @ 21:43

      Are you trying to insert something to the database by using Hibernate? If so, you need to transform the StudentDTO object into an entity and persist it to the database. If the entity has relationships with other entities, you need to obtain references to these objects before you persist the created entity to the database.

  • Rahul Aug 20, 2017 @ 5:54

    Hi Petri,

    I have downloaded the source code from Github. Please let me know how to run. Which is the first class that needs to be executed?

    • Petri Aug 22, 2017 @ 18:17

      Hi,

      Each example has a README that explains how you can run the example application by using either Maven or Gradle. If you want to run the Spring example, you should take a look at this README. On the other hand, if you want to run the Spring Boot example, you should take a look at this README.

  • osama Jan 10, 2018 @ 15:10

    How do i first fetch a csv file from any physical location then stream this csv file and read it in spring batch.
    if possible, could explain this without using annotation also.

    • Petri Jan 11, 2018 @ 8:36

      Hi,

      Unfortunately, Spring Batch doesn't have very good support for reading files from paths that are not known when the application context is started. That's why people often end up writing custom tasklets which copy the input files to the path that is given to the used ItemReader.

      I think that you can solve your problem by following these steps:

      1. Write a tasklet that copies the input file to the known path.
      2. Create a step that invokes your tasklet and ensure that this step is run before the step that processes the input file.

      If you don't know how you can create custom tasklets, you should take a look at this blog post. Also, if you have any other questions, don't hesitate to ask them.
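The heart of such a tasklet is a plain file copy. The sketch below shows only that part by using java.nio.file; InputFileStager and its stage() method are hypothetical names, and how the source path is obtained (for example, from a job parameter) is deliberately left out. In a real job, you would call something like this from the execute() method of your Tasklet implementation and configure the ItemReader with the fixed target path.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Hypothetical core of a "copy the input file" tasklet; the Tasklet and Step wiring is omitted.
final class InputFileStager {

    // Copies a file whose location is known only at runtime to the fixed path
    // that the ItemReader is configured with, replacing a file left by an earlier run.
    static Path stage(Path source, Path knownTarget) throws IOException {
        Path parent = knownTarget.getParent();
        if (parent != null) {
            Files.createDirectories(parent); // make sure the target directory exists
        }
        return Files.copy(source, knownTarget, StandardCopyOption.REPLACE_EXISTING);
    }
}
```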

  • JH Apr 30, 2018 @ 2:08

    Hi
    what about nested elements ? I mean if a student is represented like that:

    Tony Tester

    Maths
    221

    Chemistry
    222

    tony.tester@gmail.com
    master

    How to retrieves courses ? is it possible to retrieve only courses ?
    Thanks

    • JH Apr 30, 2018 @ 2:12

      sorry xml is not supported, so here is my illustrated xml sample as I talked about in my previous post (without <>), I hope It's pretty clear...

      student
      name Tony Tester name
      courses
      course
      name Maths name
      room 221 room
      course
      course
      name Chemistry name
      room 222 room
      course
      courses
      emailAddress tony.tester@gmail.com emailAddress
      purchasedPackage master purchasedPackage
      student

      • Petri May 2, 2018 @ 21:27

        Hi,

        I have to admit that I don't know if it is possible to configure the StaxEventItemReader to read only the child elements of the root element. One option is to extend that class and make the required changes to the subclass. However, I would probably still read the entire object hierarchy and add the required filter logic to the subclass.

        By the way, if you have any additional questions, don't hesitate to ask them.

  • VRadhe Feb 22, 2019 @ 11:18

    very nice article ,
    you have any article for read/write txt files using spring batch or any java approach

    • Petri Feb 22, 2019 @ 14:49

      Hi,

      Unfortunately I don't have any such blog posts. However, I can help you to find one if you provide additional information about your use case. For example, it would be useful to know what kind of format is used by your input files.

      • vRadhe Feb 24, 2019 @ 11:35

        ok, i'll send you my input file. can you send me any mail or fb id? it will be easy to send file

      • VRadhe Mar 4, 2019 @ 8:09

        Petri , My use case is like that: for single record(person) i have multiple data and each data have own identification code sush like
        code[10] means details about the person
        code[20] means details about id prof like
        code[30] means name of the zip file which having image of person this file name will be in details [10]
        like this supposed i have 100 persons of records
        10|1|01|BR|01| | |02|22022017125698_1|Mr|syoso||tom
        20|1|A|B8623650|18-12-2020|01|02|
        30|1|22022017125698_1_PHOTO_1.jpg|02|02|BR
        30|1|22022017125698_1_POA_1.jpg|05|02|BR|
        10|2|01|BR|01|02|22022017125698_2|Mr|thoms
        20|2|A|A3654002|18-12-2020|01|02||
        30|2|22022017125698_2_PHOTO_1.jpg|02|02|BR
        30|2|22022017125698_2_POI_1.jpg|09|02|BR|

        • Petri Mar 10, 2019 @ 20:48

          Hi,

          You can read your input data by using the technique described in this blog post. However, you must ensure that the created LineTokenizer object uses the '|' character as a delimiter character. You can configure the delimiter character by passing the string: "|" as a method parameter when you invoke the setDelimiter() method of the DelimitedLineTokenizer class.
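One detail to keep in mind if you ever split such lines by hand in plain Java instead of using DelimitedLineTokenizer: String.split() interprets its argument as a regular expression, and a bare "|" is the alternation operator, so it does not split on the pipe character. The sketch below (PipeRecordSketch is a hypothetical helper, not part of Spring Batch) quotes the delimiter and keeps trailing empty fields. Because the records in this example have different layouts depending on the type code in the first column, the sketch also exposes that code so that each record type could be handled separately.

```java
import java.util.regex.Pattern;

// Hypothetical plain-Java helper for parsing the pipe-delimited records above by hand.
final class PipeRecordSketch {

    // Pattern.quote() makes the pipe a literal delimiter instead of regex alternation.
    private static final Pattern PIPE = Pattern.compile(Pattern.quote("|"));

    // A limit of -1 keeps trailing empty fields, such as the one after a final '|'.
    static String[] split(String line) {
        return PIPE.split(line, -1);
    }

    // The first field identifies the record type (10, 20 or 30 in the sample data).
    static String recordType(String line) {
        return split(line)[0];
    }
}
```

For example, split("20|1|A|B8623650|18-12-2020|01|02|") returns eight fields, the last of which is an empty string.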

  • Lee_Kenneth_2020 Apr 6, 2020 @ 5:55

    There is a lot going on in this class and it is not all written at once, so we are going to go through the code in steps. Visit GitHub to see the code in its entirety. As the Spring Batch documentation states FlatFileIteamReader will “read lines of data from a flat file that typically describe records with fields of data defined by fixed positions in the file or delimited by some special character (e.g. Comma)”.

  • Anjan Sep 4, 2020 @ 11:08

    Hi ,

    could you please suggest in spring batch ,file by file reader

    • Petri Sep 10, 2020 @ 17:26

      Hi,

      What is the format of the source file?

  • Maheswar Jul 7, 2021 @ 10:34

    How to read a file which is available in remote location ?

  • Noor Aug 16, 2022 @ 14:50

    what are the Spring batch reader changes to be made read file content from Azure cloud ,instead of reading file from local path?

    • Petri Aug 21, 2022 @ 14:34

      You have to write a custom ItemReader which reads the file from Azure cloud. Unfortunately it's impossible to give a more specific answer because you didn't specify how we can access the file. For example, can we access it via HTTP?

  • bankar Jun 23, 2023 @ 16:58

    Hi ,
    I my use case we need to create object of three classes based on the parameter we get in first column
    Example we have below csv file -

    1,shubham,student
    1,ankit,teacher
    2,vidhya mandir,234 tea store near , nursely to secondary
    2,convent school , 934 anup nagar , only secondary
    3,23,fans ,98,4983,rupees

    if we have first column value as 1 we need to create student object
    if we have first column value as 2 we need to create school object
    if we have first column value as 3 we need to create item object

    how can we achieve this?
