Spring Batch Tutorial: Introduction

Most enterprise applications rely heavily on batch jobs. They run during the night and do all the time-consuming tasks that cannot be done during business hours. These tasks are often critical to the business, and errors can cause serious damage (i.e. cost a lot of money).

That is why it is important that we write robust batch jobs that provide the correct output, are fault tolerant, and are as fast as possible. Spring Batch can help us to achieve these goals.

This blog post is the first part of my Spring Batch tutorial. It defines the term batch job, explains why we should use Spring Batch instead of writing our own batch jobs, and identifies the basic building blocks of a Spring Batch job.

Let’s start by defining the term batch job.

What Is a Batch Job?

A batch job is often defined as follows:

A batch job is a computer program or set of programs processed in batch mode. This means that a sequence of commands to be executed by the operating system is listed in a file (often called a batch file, command file, or shell script) and submitted for execution as a single unit.

However, this definition is not very pragmatic, and it doesn’t help us to understand what kind of batch jobs are required by a typical enterprise application. That is why I will provide my own definition:

A batch job reads input data, processes the input data, and writes the processed data to the configured output.

The following figure illustrates a simple batch job that fulfills my definition:

[Figure: a simple batch job]

As we can see, this batch job has only one step. This is perfectly fine if our batch job has only one logical task. For example, if we are implementing an import job that reads information from an input file and writes it to the database, our job has only one logical task.
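
To make the definition a bit more concrete, here is a minimal plain-Java sketch of such a one-step job. It does not use Spring Batch at all, and the class name SimpleBatchJob, the run() method, and the in-memory "input file" and "database" are assumptions that I made up for this illustration:

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.function.Function;

// Hypothetical one-step batch job: read items, process them, write them out.
class SimpleBatchJob {

    // Reads every item from the input, processes it, and writes the result.
    // Returns the number of items that were written.
    static <I, O> int run(Iterator<I> input, Function<I, O> processor, List<O> output) {
        int written = 0;
        while (input.hasNext()) {
            I item = input.next();                // read
            O processed = processor.apply(item);  // process
            output.add(processed);                // write
            written++;
        }
        return written;
    }

    public static void main(String[] args) {
        // The "input file" and the "database" are in-memory lists (an assumption
        // that keeps this sketch self-contained).
        List<String> inputFile = List.of("alice;30", "bob;25");
        List<String> database = new ArrayList<>();

        int count = run(inputFile.iterator(), line -> line.split(";")[0].toUpperCase(), database);

        System.out.println(count + " rows written: " + database); // 2 rows written: [ALICE, BOB]
    }
}
```

The whole job is one loop because it has only one logical task.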

However, most batch jobs have more than one logical task. For example, we might have to implement a batch job that imports information from an input file and creates an export file that is exported to other applications. In other words, our batch job has two logical tasks. This means that it has two steps as well.

It seems that I have to rewrite my definition. The final version is:

A batch job consists of one or more steps. Each step is responsible for completing one logical task. Every step reads input data, processes the input data, and writes the processed data to the configured output. If the batch job has more than one step, the output of a step is often used as the input of the next step.

The following figure illustrates a batch job that has two steps:

[Figure: a batch job that has two steps]
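
The same idea can be sketched in plain Java. The following example is only an illustration (TwoStepJob, runStep(), and runJob() are hypothetical names, and no framework is involved): the first step imports raw lines, and the second step uses the output of the first step as its input and creates the export lines.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical two-step job: the output of step 1 is the input of step 2.
class TwoStepJob {

    // A step reads every input item, processes it, and returns what it wrote.
    static <I, O> List<O> runStep(List<I> input, Function<I, O> processor) {
        List<O> output = new ArrayList<>();
        for (I item : input) {
            output.add(processor.apply(item));
        }
        return output;
    }

    // Step 1 imports the raw lines; step 2 turns the imported rows into export lines.
    static List<String> runJob(List<String> inputFile) {
        List<String[]> importedRows = runStep(inputFile, line -> line.split(";"));
        return runStep(importedRows, row -> row[0] + "=" + row[1]);
    }

    public static void main(String[] args) {
        System.out.println(runJob(List.of("alice;30", "bob;25"))); // [alice=30, bob=25]
    }
}
```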

We have now defined the term batch job. Let’s find out why we should implement our batch jobs by using Spring Batch.

How Can Spring Batch Help Us?

I have written a lot of batch jobs during my career and seen many batch jobs written by other developers. I have noticed that most non-trivial batch jobs (including mine), which don’t use any framework or library, suffer from these problems:

  • The code that implements the batch job is a mess. Because it has only one huge step, no one can really understand how the batch job works.
  • The batch job is slow because it does everything inside a HUGE transaction.
  • The batch job doesn’t have real error handling. If an error occurs during a batch job, the job simply fails. However, if we are lucky, the batch job might write an error message to a log file.
  • The batch job doesn’t report its final state. In other words, there is no easy way to figure out whether the batch job finished successfully.

We can (of course) fix every one of these problems. If we decide to follow this approach, we face two new problems:

  • We essentially have to create an in-house batch job framework, and it is extremely hard to get everything right the first time.
  • Creating an in-house batch job framework is a big task, and it takes time that we often don’t have. This means that we cannot fix the problems found in the first version of our batch job framework because we don’t have time to do it. That is why all in-house frameworks have their own oddities.

Luckily, we don’t have to implement our own batch job framework because Spring Batch solves all of these problems. It provides the following features:

  • It helps us to structure our code in a clean way by providing the infrastructure that is used to implement, configure, and run batch jobs.
  • It uses so-called chunk-oriented processing, where items are read and processed one by one, written as a chunk, and the transaction is committed once the chunk size is reached. In other words, it provides us an easy way to manage the size of our transactions.
  • It provides proper error handling. For example, we can skip items if an exception is thrown and configure retry logic that is used to determine whether our batch job should retry the failed operation. We can also configure the logic that is used to decide whether or not our transaction is rolled back.
  • It writes a comprehensive log to the database that it uses. This log contains the metadata of each job execution and step execution, and we can use it for troubleshooting purposes. We can access this information by using a database client or a graphical admin user interface.
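
The idea behind chunk-oriented processing and skipping can be illustrated with a small plain-Java sketch. Please note that this is not how Spring Batch implements these features internally. The names ChunkSketch, Result, and run() are hypothetical, a "commit" is simulated by storing a copy of the finished chunk, and every runtime exception is treated as skippable (in Spring Batch the skip and retry logic is configurable):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical illustration of chunk-sized commits and skipped items.
class ChunkSketch {

    // Collects what happened during one run of the step.
    static class Result {
        final List<List<Integer>> committedChunks = new ArrayList<>();
        int skipped = 0;
    }

    static Result run(List<String> input, int chunkSize, Function<String, Integer> processor) {
        Result result = new Result();
        List<Integer> chunk = new ArrayList<>();
        for (String item : input) {
            try {
                chunk.add(processor.apply(item)); // read and process one item at a time
            } catch (RuntimeException e) {
                result.skipped++;                 // skip the bad item instead of failing the job
                continue;
            }
            if (chunk.size() == chunkSize) {
                // The chunk is full: write it and "commit the transaction".
                result.committedChunks.add(List.copyOf(chunk));
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            // The last chunk can be smaller than the chunk size.
            result.committedChunks.add(List.copyOf(chunk));
        }
        return result;
    }

    public static void main(String[] args) {
        Result r = run(List.of("1", "2", "x", "3", "4", "5"), 2, Integer::parseInt);
        System.out.println(r.committedChunks + ", skipped=" + r.skipped);
        // [[1, 2], [3, 4], [5]], skipped=1
    }
}
```

Because the work is committed in small chunks, a failure no longer forces us to roll back one HUGE transaction, and a single invalid item doesn’t have to fail the whole job.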

We now know that Spring Batch can help us to solve the problems of handwritten batch jobs. Let’s move on and take a quick look at the anatomy of a Spring Batch job.

The Anatomy of a Spring Batch Job

A Spring Batch job consists of the following components:

  • The Job represents the Spring Batch job. Each job can have one or more steps.
  • The Step represents an independent logical task (e.g. importing information from an input file). Each step belongs to one job.
  • The ItemReader reads the input data and provides the found items one by one. An ItemReader belongs to one step and each step must have exactly one ItemReader.
  • The ItemProcessor transforms items, one item at a time, into a form that is understood by the ItemWriter. An ItemProcessor belongs to one step and each step can have at most one ItemProcessor.
  • The ItemWriter writes the information of the items to the output one chunk at a time. An ItemWriter belongs to one step and each step must have exactly one ItemWriter.

The following figure illustrates the relationships of these concepts:

[Figure: the relationships between the Job, Step, ItemReader, ItemProcessor, and ItemWriter]
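
These relationships can also be expressed as a small plain-Java model. The types below (JobAnatomySketch, SketchReader, SketchProcessor, SketchWriter, Step, and Job) are stand-ins that I invented for this illustration. They are not the Spring Batch interfaces, and they only mirror the roles and multiplicities described above:

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java stand-ins for the roles described above; NOT the Spring Batch types.
class JobAnatomySketch {

    interface SketchReader<T> { T read(); }                // returns null when the input is exhausted
    interface SketchProcessor<I, O> { O process(I item); }
    interface SketchWriter<T> { void write(List<T> items); }

    // A step has exactly one reader, exactly one writer, and at most one processor.
    static class Step<I, O> {
        private final SketchReader<I> reader;
        private final SketchProcessor<I, O> processor;
        private final SketchWriter<O> writer;

        Step(SketchReader<I> reader, SketchProcessor<I, O> processor, SketchWriter<O> writer) {
            this.reader = reader;
            this.processor = processor;
            this.writer = writer;
        }

        // A step without a processor hands every item to the writer unchanged.
        static <T> Step<T, T> withoutProcessor(SketchReader<T> reader, SketchWriter<T> writer) {
            return new Step<>(reader, item -> item, writer);
        }

        void execute() {
            List<O> items = new ArrayList<>();
            for (I item = reader.read(); item != null; item = reader.read()) {
                items.add(processor.process(item));
            }
            writer.write(items);
        }
    }

    // A job owns one or more steps and runs them in order.
    static class Job {
        private final List<Step<?, ?>> steps;

        Job(List<Step<?, ?>> steps) { this.steps = steps; }

        void run() {
            for (Step<?, ?> step : steps) {
                step.execute();
            }
        }
    }

    // Index-based reader, so items that an earlier step adds to the list are seen.
    static <T> SketchReader<T> readerOf(List<T> items) {
        int[] next = {0};
        return () -> next[0] < items.size() ? items.get(next[0]++) : null;
    }

    static List<String> runDemo() {
        List<String> staging = new ArrayList<>();
        List<String> output = new ArrayList<>();

        // Step 1 has a processor; step 2 has none and simply copies its input.
        Step<String, String> upperCase =
                new Step<>(readerOf(List.of("alice", "bob")), String::toUpperCase, staging::addAll);
        Step<String, String> copy = Step.withoutProcessor(readerOf(staging), output::addAll);

        new Job(List.of(upperCase, copy)).run();
        return output;
    }

    public static void main(String[] args) {
        System.out.println(runDemo()); // [ALICE, BOB]
    }
}
```

The withoutProcessor() factory method reflects the fact that a step doesn’t need a processor, while the reader and the writer are always required by the constructor.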

Let’s summarize what we learned from this blog post.

Summary

This blog post has taught us three things:

  • We learned how we can define the term batch job.
  • We learned that Spring Batch solves the problems that are often found in handwritten batch jobs.
  • We learned to identify the basic building blocks of a Spring Batch job.

The next part of this tutorial describes how we can get the required dependencies with Maven.

About the Author

Petri Kainulainen is passionate about software development and continuous improvement. He specializes in software development with the Spring Framework and is the author of the Spring Data book.

25 comments
  • Thanks Petri. Waiting for the next part.

    • You are welcome!

  • Simple and precise post. Thanks Perri.

    • Sorry for the typo. Thanks Petri.

      • You are welcome!

  • Does it work like a job queue or not?
    For example, I have 10 jobs to execute.
    I want to run 3 jobs in parallel, and when any of those 3 jobs finishes, the next job should be picked up from the job queue.

  • The article has explained the purpose of spring batch much simpler and easier.
    Thanks Petri.

    • You are welcome!

  • Petri,
    Thank you for the article. I was wondering if you have done any reporting on how the batch proceeded (success or failure, and where and how) using Spring Batch components?

    • Hi,

      Do you need to simply see the end status of a job execution (success or failure), the error message (if the job failed), and the failed step? If this information is enough, you can get it from the batch_job_execution and batch_step_execution database tables. Naturally this means that you have to write the code that shows it on the UI.

      On the other hand, if you need a “real-time” solution that sends a notification when a job or a step is finished, you could implement a custom JobExecutionListener or StepExecutionListener that sends a notification after a job or a step has finished.

      I hope that this helps. If you have any additional questions, don’t hesitate to ask them.

  • Thank you Petri

    • You are welcome :)

  • Nice post. Thanks
    Excuse me, there is a question that is bothering me, if I may. I ran the project https://spring.io/guides/gs/batch-processing/ which collects data from a file and writes it to a database. It is using chunks to do the magic. So here is my question: how can I get this done in parallel with chunks?
    Thanks.

    • Hi,

      Spring Batch does support parallel processing. That being said, it is a bit hard to say for sure if you can achieve your goals with it because every case is a bit different and I don’t know the requirements of your use case.

  • Nice, thx

    • You are welcome!

  • Hi Petri,

    Thank you so much for your tutorials. They are very helpful and I really love them.

  • Awesome!, You made It easy.. :) Thank you

    • You are welcome!

  • Awesome

    • Thanks!

  • Thanks for writing in simple language and with clarity. I have one doubt about the diagram that you have created. Why are you showing a 1 to 0..1 relationship between the Step and the ItemProcessor if we have one ItemProcessor per step?
    Thanks

    • Hi,

      A step can have one ItemProcessor, but you can also create a step that doesn’t have an ItemProcessor.
