Spring Batch Tutorial: Reading Information From an XML File

The previous part of my Spring Batch tutorial described how you can read information from a CSV file. This time you will learn to read the input data of your Spring Batch job from an XML file.

After you have read this blog post, you:

  • Can identify the dependencies which are required when you want to read the input data of your batch job from an XML file.
  • Can get the required dependencies with Maven and Gradle.
  • Understand how you can configure an ItemReader bean which reads the input data of your batch job from an XML file.

Let's start by taking a quick look at the example application.

Introduction to the Example Application

During this blog post you will read the input data of your batch job from an XML file which contains the student information of an online course. To be more specific, the XML file contains a student list that provides the following information to your batch job:

  • The name of the student.
  • The email address of the student.
  • The name of the purchased package.

The content of your input file looks as follows:

<students>
    <student>
        <name>Tony Tester</name>
        <emailAddress>tony.tester@gmail.com</emailAddress>
        <purchasedPackage>master</purchasedPackage>
    </student>
    <student>
        <name>Nick Newbie</name>
        <emailAddress>nick.newbie@gmail.com</emailAddress>
        <purchasedPackage>starter</purchasedPackage>
    </student>
    <student>
        <name>Ian Intermediate</name>
        <emailAddress>ian.intermediate@gmail.com</emailAddress>
        <purchasedPackage>intermediate</purchasedPackage>
    </student>
</students>

The ItemReader which reads the student list from an XML file must return StudentDTO objects. The StudentDTO class contains the information of a single student, and its source code looks as follows:

public class StudentDTO {

    private String emailAddress;
    private String name;
    private String purchasedPackage;

    public StudentDTO() {}

    public String getEmailAddress() {
        return emailAddress;
    }

    public String getName() {
        return name;
    }

    public String getPurchasedPackage() {
        return purchasedPackage;
    }

    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Next, you will get the required dependencies with Maven and Gradle.

Getting the Required Dependencies

Before you can read information from an XML file, you must get the following dependencies:

  • The spring-oxm dependency helps you to serialize objects to XML documents and deserialize XML documents to objects.
  • The jaxb-api dependency allows you to compile code which uses the JAXB API when you are using Java 11 or newer.
  • The jaxb-runtime dependency allows you to run an application which uses the JAXB API when you are using Java 11 or newer.

If you are using the dependency management of Spring Boot with Maven, you can get these dependencies by adding the following dependency declarations to the dependencies section of your POM file:

<dependency>
    <groupId>javax.xml.bind</groupId>
    <artifactId>jaxb-api</artifactId>
</dependency>
<dependency>
    <groupId>org.glassfish.jaxb</groupId>
    <artifactId>jaxb-runtime</artifactId>
    <scope>runtime</scope>
</dependency>
<dependency>
    <groupId>org.springframework</groupId>
    <artifactId>spring-oxm</artifactId>
</dependency>

If you are using the dependency management of Spring Boot with Gradle, you can get these dependencies by adding the following dependency declarations to your build.gradle file:

dependencies {
    implementation(
            'javax.xml.bind:jaxb-api',
            'org.springframework:spring-oxm'
    )
    runtimeOnly(
            'org.glassfish.jaxb:jaxb-runtime'
    )
}

Let's move on and find out how you can read the input data of your batch job from an XML file.

Reading the Input Data From an XML File

You can provide the input data for your batch job by configuring an ItemReader bean. Because you must read the student information from an XML file, you have to configure this bean by following these steps:

First, you have to create the configuration class that contains the beans which describe the flow of your batch job. The source code of your configuration class looks as follows:

import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {
}

Second, you have to create a method that configures your ItemReader bean and ensure that this method returns an ItemReader<StudentDTO> object. After you have created this method, the source code of your configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {

    @Bean
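    public ItemReader<StudentDTO> itemReader() {
        // Placeholder that keeps the class compilable; the reader is configured in the third step.
        throw new UnsupportedOperationException("The ItemReader isn't configured yet");
    }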
}

Third, you have to configure your ItemReader bean by following these steps:

  1. Create a new StaxEventItemReaderBuilder<StudentDTO> object. This builder creates StaxEventItemReader<StudentDTO> objects which read the input data from an XML file by using StAX (the Streaming API for XML).
  2. Configure the name of the ItemReader.
  3. Configure the location of the XML file which contains the input data of your batch job. Because I wanted to create an example application that's as easy to run as possible, I ensured that the input file (data/students.xml) of your batch job is found from the classpath.
  4. Configure the name of the XML element (student) that contains the information of a single student.
  5. Ensure that the StaxEventItemReader<StudentDTO> object transforms the processed XML fragment into a StudentDTO object by using JAXB2.
  6. Create a new StaxEventItemReader<StudentDTO> object and return the created object.
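
To make the idea behind the first and fourth steps more concrete, here is a minimal sketch of fragment-by-fragment streaming with StAX. It uses only the JDK's javax.xml.stream API and collects the child elements of every student fragment into a map instead of unmarshalling them with JAXB. The StudentFragmentSketch class and its readStudents() method are hypothetical names invented for this illustration. They aren't part of Spring Batch, and the sketch doesn't claim to mirror how the StaxEventItemReader class is implemented:

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.stream.XMLEventReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.events.XMLEvent;

// Hypothetical helper class; it isn't part of Spring Batch or the example application.
class StudentFragmentSketch {

    // Streams through the document and returns one map (element name -> text) per <student> fragment.
    static List<Map<String, String>> readStudents(String xml) {
        List<Map<String, String>> students = new ArrayList<>();
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            // The input doesn't need DTDs, so DTD support is switched off.
            factory.setProperty(XMLInputFactory.SUPPORT_DTD, false);
            XMLEventReader events = factory.createXMLEventReader(new StringReader(xml));

            Map<String, String> current = null;
            while (events.hasNext()) {
                XMLEvent event = events.nextEvent();
                if (event.isStartElement()) {
                    String element = event.asStartElement().getName().getLocalPart();
                    if (element.equals("student")) {
                        // The fragment root element starts a new item.
                        current = new LinkedHashMap<>();
                    } else if (current != null) {
                        // Child elements of a fragment are assumed to contain only text.
                        current.put(element, events.getElementText());
                    }
                } else if (event.isEndElement()
                        && event.asEndElement().getName().getLocalPart().equals("student")) {
                    // The fragment is complete, so one item is ready.
                    students.add(current);
                    current = null;
                }
            }
            events.close();
        } catch (XMLStreamException e) {
            throw new IllegalStateException("Could not parse the student list", e);
        }
        return students;
    }

    public static void main(String[] args) {
        String xml = "<students>"
                + "<student><name>Tony Tester</name>"
                + "<emailAddress>tony.tester@gmail.com</emailAddress>"
                + "<purchasedPackage>master</purchasedPackage></student>"
                + "</students>";
        // Prints one map per student fragment.
        for (Map<String, String> student : readStudents(xml)) {
            System.out.println(student);
        }
    }
}
```

The point of the sketch is that the parser never loads the whole document into memory. It walks through the events one by one and produces an item every time a student element closes, which is why a StAX-based reader needs to know the name of the fragment root element.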

After you have configured your ItemReader bean, the source code of your configuration class looks as follows:

import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.xml.builder.StaxEventItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

@Configuration
public class SpringBatchExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader() {
        Jaxb2Marshaller studentMarshaller = new Jaxb2Marshaller();
        studentMarshaller.setClassesToBeBound(StudentDTO.class);

        return new StaxEventItemReaderBuilder<StudentDTO>()
                .name("studentReader")
                .resource(new ClassPathResource("data/students.xml"))
                .addFragmentRootElements("student")
                .unmarshaller(studentMarshaller)
                .build();
    }
}

I configured the location of the input file by creating a new ClassPathResource object because I wanted to create an example application that's as easy to run as possible. Typically, the input file of a batch job is found from the file system. In that case, you can configure its location by creating a new FileSystemResource object.
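
If your input file is found from the file system, the configuration could look roughly like the following sketch. It is the same configuration class with the ClassPathResource object swapped for a FileSystemResource object. The path /path/to/students.xml is a hypothetical placeholder, not a location used by the example application:

```java
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.xml.builder.StaxEventItemReaderBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.FileSystemResource;
import org.springframework.oxm.jaxb.Jaxb2Marshaller;

@Configuration
public class SpringBatchExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader() {
        Jaxb2Marshaller studentMarshaller = new Jaxb2Marshaller();
        studentMarshaller.setClassesToBeBound(StudentDTO.class);

        return new StaxEventItemReaderBuilder<StudentDTO>()
                .name("studentReader")
                // Hypothetical path; replace it with the real location of your input file.
                .resource(new FileSystemResource("/path/to/students.xml"))
                .addFragmentRootElements("student")
                .unmarshaller(studentMarshaller)
                .build();
    }
}
```

Because the builder accepts any Spring Resource, the rest of the reader configuration stays the same no matter where the file is read from.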

Before your ItemReader bean can transform the student information read from the students.xml file into StudentDTO objects, you have to configure the name of the fragment root element in the StudentDTO class. You can do this by following these steps:

  1. Annotate the class with the @XmlRootElement annotation.
  2. Configure the name of the root element by setting the value of the @XmlRootElement annotation's name attribute to: 'student'.

After you have made this change to the StudentDTO class, its source code looks as follows:

import javax.xml.bind.annotation.XmlRootElement;

@XmlRootElement(name="student")
public class StudentDTO {

    private String emailAddress;
    private String name;
    private String purchasedPackage;

    public StudentDTO() {}

    public String getEmailAddress() {
        return emailAddress;
    }

    public String getName() {
        return name;
    }

    public String getPurchasedPackage() {
        return purchasedPackage;
    }

    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}

Because the element names of the processed XML fragments are the same as the property names of the StudentDTO class (by default, JAXB binds public getter and setter pairs), you don't have to add additional annotations to the StudentDTO class. If the element names aren't the same as the property names of the target class, or you have to use custom marshalling, you have to annotate your DTO class with JAXB annotations.
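
As an illustration of the second case, assume a hypothetical input file that uses an email element instead of the emailAddress element. The following sketch shows one way to map it with JAXB annotations. The element name email is an assumption made for this example and doesn't appear in the real students.xml file:

```java
import javax.xml.bind.annotation.XmlAccessType;
import javax.xml.bind.annotation.XmlAccessorType;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

// Field access is used so that the annotated field doesn't clash with its public getter and setter.
@XmlRootElement(name = "student")
@XmlAccessorType(XmlAccessType.FIELD)
public class StudentDTO {

    // Hypothetical element name: <email> instead of <emailAddress>.
    @XmlElement(name = "email")
    private String emailAddress;

    // These fields need no annotation because the element names match the field names.
    private String name;
    private String purchasedPackage;

    // The constructor, getters, and setters are the same as before and are omitted from this sketch.
}
```

Annotating the getter methods and keeping the default access type would work as well. Field access is used here only because it keeps the mapping next to the field declarations.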

You can now read the input data of your batch job from an XML file. Let's summarize what you learned from this blog post.

Summary

This blog post has taught you five things:

  • The spring-oxm dependency helps you to serialize objects to XML documents and deserialize XML documents to objects.
  • The jaxb-api dependency allows you to compile code which uses the JAXB API when you are using Java 11 or newer.
  • The jaxb-runtime dependency allows you to run an application which uses the JAXB API when you are using Java 11 or newer.
  • You can read the input data of your batch job from an XML file by using the StaxEventItemReader<T> class.
  • You must configure the name of the fragment root element by annotating your DTO class with the @XmlRootElement annotation.

The next part of my Spring Batch tutorial describes how you can read the input data of your batch job from a relational database.

P.S. You can get the example application of this blog post from GitHub.

2 comments
  • mat Nov 29, 2022 @ 8:28

    What is the exact location of the XML file? I have tried every trick I could find on the Internet to locate the file, and it seems to be impossible. The only thing that works is an absolute path, which is useless.

    • Petri Dec 8, 2022 @ 21:30

      As this example demonstrates, you can configure the location of the XML file by using a Spring Resource. I read the XML file from the classpath, but you can read it from the file system or read it from a URL. Also, you can naturally implement your own Resource.
