Spring Batch Tutorial: Creating a Custom ItemReader

Spring Batch has good support for reading the input data of a batch job from different data sources such as files (CSV, XML) and databases.

However, it’s quite common that we have to read the input data from a data source that’s not supported out of the box. This means that we have to implement a component that reads the input data from our data source.

This blog post helps us to solve that problem. After we have read this blog post, we:

  • Understand how we can implement a custom ItemReader.
  • Know how we can configure the ItemReader bean which provides the input data for our batch job.

Let’s begin.

Creating a Custom ItemReader

We can create a custom ItemReader by following these steps:

First, we have to create a class that implements the ItemReader<T> interface and provide the type of the returned object as a type parameter.

Second, we have to implement the T read() method of the ItemReader<T> interface by following these rules:

  • The read() method returns an object that contains the information of the next item.
  • If the next item isn’t found, the read() method must return null.
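
The two rules above can be illustrated with a minimal sketch. To keep it runnable without Spring Batch on the classpath, it declares a local stand-in interface, SketchItemReader<T>, that mirrors the read() method of ItemReader<T>. The stand-in, the IteratorBackedReader class, and the readAll() helper are hypothetical names used only for illustration; a real project would implement org.springframework.batch.item.ItemReader instead.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Local stand-in for org.springframework.batch.item.ItemReader<T>. This is an
// assumption made so that the sketch compiles without Spring Batch.
interface SketchItemReader<T> {
    T read() throws Exception;
}

// Hypothetical reader that hands out the elements of an Iterator one at a time.
class IteratorBackedReader<T> implements SketchItemReader<T> {

    private final Iterator<T> source;

    IteratorBackedReader(Iterator<T> source) {
        this.source = source;
    }

    @Override
    public T read() {
        // Rule 1: return the next item. Rule 2: return null when nothing is left.
        return source.hasNext() ? source.next() : null;
    }
}

// Hypothetical caller that keeps reading until the reader returns null.
class ReaderContractSketch {

    static <T> List<T> readAll(SketchItemReader<T> reader) {
        List<T> items = new ArrayList<>();
        try {
            T item;
            while ((item = reader.read()) != null) {
                items.add(item);
            }
        } catch (Exception e) {
            throw new IllegalStateException("Reading failed", e);
        }
        return items;
    }
}
```

The caller never asks the reader how many items it has. It simply treats the first null as the end of the input, which is why a reader must not return null for any other reason.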

Let’s create a custom ItemReader that returns the student information of an online testing course as StudentDTO objects which are read from memory.

The StudentDTO class is a simple data transfer object, and its source code looks as follows:

public class StudentDTO {
  
    private String emailAddress;
    private String name;
    private String purchasedPackage;
  
    public StudentDTO() {}
  
    public String getEmailAddress() {
        return emailAddress;
    }
  
    public String getName() {
        return name;
    }
  
    public String getPurchasedPackage() {
        return purchasedPackage;
    }
  
    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }
  
    public void setName(String name) {
        this.name = name;
    }
  
    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

We can implement our ItemReader by following these steps:

First, we have to create a class that implements the ItemReader<T> interface and specify the type of the objects which are returned by the T read() method. After we have created this class, its source code looks as follows:

import org.springframework.batch.item.ItemReader;

public class InMemoryStudentReader implements ItemReader<StudentDTO> {
}

Second, we have to initialize the input data that’s returned by our ItemReader. We can initialize our input data by following these steps:

  1. Add a List<StudentDTO> field called studentData to our ItemReader class. This field contains the student information of the course.
  2. Add an int field called nextStudentIndex to our ItemReader class. This field contains the index of the next StudentDTO object that’s returned by our ItemReader.
  3. Add a private initialize() method to our ItemReader class. This method creates the student data and sets the index of the next student to 0.
  4. Create a constructor that invokes the initialize() method.

After we have initialized our input data, the source code of our ItemReader class looks as follows:

import org.springframework.batch.item.ItemReader;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InMemoryStudentReader implements ItemReader<StudentDTO> {

    private int nextStudentIndex;
    private List<StudentDTO> studentData;

    InMemoryStudentReader() {
        initialize();
    }

    private void initialize() {
        StudentDTO tony = new StudentDTO();
        tony.setEmailAddress("tony.tester@gmail.com");
        tony.setName("Tony Tester");
        tony.setPurchasedPackage("master");

        StudentDTO nick = new StudentDTO();
        nick.setEmailAddress("nick.newbie@gmail.com");
        nick.setName("Nick Newbie");
        nick.setPurchasedPackage("starter");

        StudentDTO ian = new StudentDTO();
        ian.setEmailAddress("ian.intermediate@gmail.com");
        ian.setName("Ian Intermediate");
        ian.setPurchasedPackage("intermediate");

        studentData = Collections.unmodifiableList(Arrays.asList(tony, nick, ian));
        nextStudentIndex = 0;
    }
}

Third, we have to implement the read() method of the ItemReader interface by following these rules:

  • If the next student is found, return the found StudentDTO object and increase the value of the nextStudentIndex field by 1.
  • If the next student isn’t found, set the value of the nextStudentIndex field to 0.
  • If the next student isn’t found, return null.

After we have implemented the read() method, the source code of our ItemReader class looks as follows:

import org.springframework.batch.item.ItemReader;

import java.util.Arrays;
import java.util.Collections;
import java.util.List;

public class InMemoryStudentReader implements ItemReader<StudentDTO> {

    private int nextStudentIndex;
    private List<StudentDTO> studentData;

    InMemoryStudentReader() {
        initialize();
    }

    private void initialize() {
        StudentDTO tony = new StudentDTO();
        tony.setEmailAddress("tony.tester@gmail.com");
        tony.setName("Tony Tester");
        tony.setPurchasedPackage("master");

        StudentDTO nick = new StudentDTO();
        nick.setEmailAddress("nick.newbie@gmail.com");
        nick.setName("Nick Newbie");
        nick.setPurchasedPackage("starter");

        StudentDTO ian = new StudentDTO();
        ian.setEmailAddress("ian.intermediate@gmail.com");
        ian.setName("Ian Intermediate");
        ian.setPurchasedPackage("intermediate");

        studentData = Collections.unmodifiableList(Arrays.asList(tony, nick, ian));
        nextStudentIndex = 0;
    }

    @Override
    public StudentDTO read() throws Exception {
        StudentDTO nextStudent = null;

        if (nextStudentIndex < studentData.size()) {
            nextStudent = studentData.get(nextStudentIndex);
            nextStudentIndex++;
        }
        else {
            nextStudentIndex = 0;
        }

        return nextStudent;
    }
}
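
The effect of resetting nextStudentIndex can be seen by calling read() more times than there are students. The sketch below is a condensed, hypothetical variant of the reader above: it uses the same index logic but returns only the student names, and it implements a local stand-in interface (SketchItemReader<T>) instead of the Spring Batch ItemReader<T> so that it runs without any dependencies. The InMemoryNameReader and ReadSequenceSketch names are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Local stand-in for org.springframework.batch.item.ItemReader<T>. This is an
// assumption made so that the sketch compiles without Spring Batch.
interface SketchItemReader<T> {
    T read() throws Exception;
}

// Hypothetical, condensed variant of InMemoryStudentReader that returns names only.
class InMemoryNameReader implements SketchItemReader<String> {

    private final List<String> names =
            Arrays.asList("Tony Tester", "Nick Newbie", "Ian Intermediate");
    private int nextIndex = 0;

    @Override
    public String read() {
        String next = null;

        if (nextIndex < names.size()) {
            next = names.get(nextIndex);
            nextIndex++;
        } else {
            // End of data: reset the index and signal the end by returning null.
            nextIndex = 0;
        }

        return next;
    }
}

class ReadSequenceSketch {

    // Calls read() five times on a reader that holds three names.
    static List<String> firstFiveReads() {
        InMemoryNameReader reader = new InMemoryNameReader();
        List<String> results = new ArrayList<>();
        for (int i = 0; i < 5; i++) {
            results.add(reader.read());
        }
        return results;
    }
}
```

Here firstFiveReads() returns the three names, then null, then "Tony Tester" again. The single null marks the end of the input, and because the index was reset, the same reader instance starts over from the first student if it is read again.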

After we have created our custom ItemReader class, we have to configure the ItemReader bean that provides the input data for our Spring Batch job. Next, we will find out how we can configure this bean.

Configuring the ItemReader Bean

We can configure our ItemReader bean by following these steps:

First, we have to create the configuration class that contains the beans which describe the flow of our batch job. The source code of our configuration class looks as follows:

import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {
}

Second, we have to create the method that configures our ItemReader bean. This method must return an ItemReader<StudentDTO> object. After we have created this method, the source code of our configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader() {

    }
}

Third, we have to ensure that the itemReader() method returns a new InMemoryStudentReader object. After we have implemented the itemReader() method, the source code of our configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader() {
        return new InMemoryStudentReader();
    }
}

We can now create a custom ItemReader class and we understand how we can configure an ItemReader bean which provides the input data for our batch job. Let’s summarize what we learned from this blog post.

Summary

This blog post has taught us four things:

  • We can create a custom ItemReader by implementing the ItemReader<T> interface.
  • When we implement the ItemReader<T> interface, we must provide the type of the returned object as a type parameter.
  • The T read() method of the ItemReader<T> interface must return the next T object. If the next object isn’t found, it must return null.
  • After we have created our custom ItemReader class, we have to configure the ItemReader bean that provides the input data for our Spring Batch job.

The next part of this tutorial describes how we can create a custom ItemReader that reads the input data of our batch job by using an external REST API.

P.S. You can get the example applications of this blog post from GitHub.

20 comments
  • thank you

    • You are welcome!

  • Hi Petri,

    I have come across a scenario where I have declared an exception as a skippable exception in order to skip the invalid records detected in the item processor. Could you please help me with a possible way to keep track of all the skipped records and, at the end, use a job execution listener to send an email that lists them?

    Any suggestions are appreciated.

    • Hi Naveen,

      You can use a SkipListener for this purpose. It seems that you should implement your own SkipListener that keeps track of the skipped records (for example, it can save them to database). When the job is finished, you can simply fetch the skipped records and send the email that contains the required information.

    • Hi Petri,
      I am using Spring Boot. My integration context XML file has an int:jdbc inbound channel where I give the database select query. I want Spring Batch to load records from multiple tables, process them, and write them to an XML output. Please give me some suggestions.

      • Hi,

        Take a look at this blog post. It explains how you can use the Spring Integration JDBC inbound channel adapter.

  • Hi there,

    I have tried this but am having problems when I run my code with an IllegalArgumentException – Assertion Failure – this argument is required; it must not be null.

    Any ideas what could be causing that?

    • Hi,

      Did you try to create a batch job that doesn’t have a reader and a writer?

  • I have a batch flow which has a reader and a writer. The reader is a JdbcCursorItemReader. The writer is a custom ItemWriter which has a FlatFileItemWriter property.

    My question is: if the reader does not fetch any rows, will the writer still execute?

    PS: In case the reader fetches 0 records, I have to write an empty file using the FlatFileItemWriter.

    • Hi,

      If the reader cannot find the next input record, its read() must return null. This means that the writer won’t write any lines to the file if the reader cannot find any records from the database. That being said, it does create the file and doesn’t delete it even if it is empty (as long as you use the default settings).

  • @Petri,
    What about the thread safety of the primitives used in the ItemReader (nextStudentIndex)?

    • Well, since you asked that question, I assume that you realized that the example is not thread safe. If you need to ensure that the primitives are thread safe, you need to replace the int variable (nextStudentIndex) with AtomicInteger.

  • Hi Petri,
    I want to create an ItemReader that reads data from a database. The query of this reader has a parameter that must be replaced with a value I get from the previous step (I was able to pass the value using the execution context).

    for example my query is “select some_field from some_table where id = X”
    and I got the X value from previous step. How to construct the JdbcCursorItemReader?
    I was thinking maybe I can use setPreparedStatementSetter() like in your github example
    and the value I can get it using execution context from previous step.
    What do you think?

    I’m working with 2 data sources (MySQL and PostgreSQL), so I’m thinking that maybe I create a first step which reads the value I need and passes that value (Long) to the next step. Then, in the second step, I will construct the reader using JdbcCursorItemReader and set the prepared statement.
    I was thinking maybe I need a custom ItemReader for this?

    • Hi,

      Like you said, if you need to pass information between steps, you can save this information to the job execution context.

      “How to construct the JdbcCursorItemReader?
      I was thinking maybe I can use setPreparedStatementSetter() like in your github example
      and the value I can get it using execution context from previous step.
      What do you think?”

      You can construct the JdbcCursorItemReader by following the instructions given in this blog post. Also, your idea sounds good to me.

      “I was thinking maybe I need a custom ItemReader for this?”

      You can read the required information and save it to the job execution context without writing a custom ItemReader (check the Spring Batch reference manual). However, you need to create a custom ItemReader that retrieves the information from the job execution context and finds the processed data by using this information.

      If you have any additional questions, don’t hesitate to ask them.

  • Hi Petri,

    I have this spring batch job that fetches data from a mongodb database and stores the data in a postgres database. The job is scheduled to run at intervals (30mins) so newly inserted documents to the mongodb are migrated to the postgres database.

    On the first run, records are migrated successfully, but subsequent triggers by the scheduler don’t migrate the new records. I’m using the MongoItemReader implementation in Spring Batch and I’ve validated that the queries are fine.

    Any idea what I could be doing wrong? I would appreciate your thoughts on this.
