Spring Batch Tutorial: Introduction

Author: Petri Kainulainen Published: January 17, 2016 43 comments

Most enterprise applications rely heavily on batch jobs. They run during the night and do all the time-consuming tasks that cannot be done during business hours. These tasks are often critical to the business, and errors can cause serious damage (i.e. cost a lot of money).

That's why it is important to write robust batch jobs that provide the correct output, are fault tolerant, and are as fast as possible. Spring Batch can help you to achieve these goals.

This blog post is the first part of my Spring Batch tutorial, and it provides a quick introduction to Spring Batch. After you have read this blog post, you:

Can define the term batch job.
Understand why you should use Spring Batch instead of writing your own batch jobs.
Can identify the basic building blocks of a Spring Batch job.

Let's start by defining the term batch job.

What Is a Batch Job?

A batch job is often defined as follows:

A batch job is a computer program or set of programs processed in batch mode. This means that a sequence of commands to be executed by the operating system is listed in a file (often called a batch file, command file, or shell script) and submitted for execution as a single unit.

However, this definition is not very pragmatic, and it doesn't help you to understand what kind of batch jobs are required by a typical enterprise application. That's why I will provide my own definition:

A batch job reads input data, processes the input data, and writes the processed data to the configured output.

The following figure illustrates a simple batch job that fulfills my definition:

[Figure: a simple batch job that has one step]

As you can see, this batch job has only one step. This is perfectly fine if your batch job has only one logical task. For example, if you are implementing an import job that reads information from an input file and writes it to the database, your job has only one logical task.

However, some batch jobs have more than one logical task. For example, you might have to implement a batch job that imports information from an input file and creates an export file that's exported to other applications. In other words, your batch job has two logical tasks. This means that it has two steps as well.

It seems that I have to rewrite my definition. The final version is:

A batch job consists of one or more steps. Each step is responsible for completing one logical task. Every step reads input data, processes the input data, and writes the processed data to the configured output. If the batch job has more than one step, the output of a step is often used as the input of the next step.

The following figure illustrates a batch job that has two steps:

[Figure: a batch job that has two steps]
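The final definition can also be sketched in plain Java. The following example is only a model of the idea and does not use Spring Batch: the class TwoStepJobSketch and the methods importStep, exportStep, and runJob are hypothetical names, and the "database" and the "export file" are simple in-memory lists.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

// Plain-Java model of a batch job that has two steps. It does not use Spring Batch;
// all names in this class are hypothetical.
class TwoStepJobSketch {

    // Step 1: reads the raw input lines, processes them (trim), and writes them
    // to an in-memory list that stands in for the database.
    static List<String> importStep(List<String> inputLines) {
        List<String> database = new ArrayList<>();
        for (String line : inputLines) {
            database.add(line.trim());
        }
        return database;
    }

    // Step 2: reads the rows written by step 1 and writes them as export lines.
    static List<String> exportStep(List<String> databaseRows) {
        List<String> exportFile = new ArrayList<>();
        for (String row : databaseRows) {
            exportFile.add("EXPORT;" + row);
        }
        return exportFile;
    }

    // The job runs its steps in order. The output of a step is the input of the next step.
    static List<String> runJob(List<String> inputLines) {
        List<Function<List<String>, List<String>>> steps = new ArrayList<>();
        steps.add(TwoStepJobSketch::importStep);
        steps.add(TwoStepJobSketch::exportStep);

        List<String> data = inputLines;
        for (Function<List<String>, List<String>> step : steps) {
            data = step.apply(data);
        }
        return data;
    }

    public static void main(String[] args) {
        System.out.println(runJob(Arrays.asList(" alice ", "bob")));
    }
}
```

Each step has one logical task, and the job is nothing more than the ordered list of its steps.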

I have now defined the term batch job. Let's find out why you should implement your batch jobs by using Spring Batch.

How Can Spring Batch Help Us?

I have written a lot of batch jobs during my career and seen many batch jobs written by other developers. I have noticed that most non-trivial batch jobs (including mine) that don't use any framework or library suffer from these problems:

The code that implements the batch job is a mess. Because it has only one huge step, no one can really understand how the batch job works.
The batch job is slow because it does everything inside a HUGE transaction.
The batch job doesn't have real error handling. If an error occurs during a batch job, the job simply fails. If you are lucky, the batch job might write an error message to a log file.
The batch job doesn't clean up the output data that's written to the configured output if it fails. This is a problem because you cannot trust the data that's produced by the batch job. In other words, you have to ensure (manually) that the output data of the batch job is correct. This is a waste of time.
The batch job doesn't report its final state. In other words, there is no easy way to figure out whether the batch job finished successfully.

You can (of course) fix every one of these problems. If you decide to follow this approach, you face two new problems:

You essentially have to create an in-house batch job framework, and it is extremely hard to get everything right the first time.
Creating an in-house batch job framework is a big task, and it takes time that you often don't have. This means that you cannot fix the problems found in the first version of your batch job framework because you don't have time to do it. That's why all in-house frameworks have their own oddities.

Luckily, you don't have to implement your own batch job framework because Spring Batch solves all of these problems. It provides the following features that help you to solve them:

It helps you to structure your code in a clean way by providing the infrastructure that's used to implement, configure, and run batch jobs.
It uses so-called chunk-oriented processing where items are read and processed one by one, and the transaction is committed when the number of processed items reaches the chunk size. In other words, it gives you an easy way to manage the size of your transactions.
It provides proper error handling. For example, you can skip items if an exception is thrown and configure retry logic that's used to determine whether your batch job should retry the failed operation. You can also configure the logic that's used to decide if your transaction should be rolled back.
It writes a comprehensive log to the database that it uses. This log contains the metadata of each job and step execution, and it's extremely useful if you have to troubleshoot a failed batch job. Because the log is written to a database, you can access it by using a database client.
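To make the error handling described in the list above more concrete, here is a minimal plain-Java sketch of the "retry the failed operation, then skip the item" idea. It is not Spring Batch code and it doesn't show how Spring Batch implements or configures these features: RetrySkipSketch, Outcome, run, and flakyOperation are hypothetical names that exist only in this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Plain-Java illustration of "retry, then skip". This is NOT the Spring Batch API;
// all names in this class are hypothetical.
class RetrySkipSketch {

    // Collects the items that were written and the items that were skipped.
    static class Outcome {
        final List<String> written = new ArrayList<>();
        final List<String> skipped = new ArrayList<>();
    }

    // Tries the operation up to maxAttempts times per item and skips the item
    // if every attempt throws an exception.
    static Outcome run(List<String> items, Function<String, String> operation, int maxAttempts) {
        Outcome outcome = new Outcome();
        for (String item : items) {
            boolean succeeded = false;
            for (int attempt = 1; attempt <= maxAttempts && !succeeded; attempt++) {
                try {
                    outcome.written.add(operation.apply(item));
                    succeeded = true;
                } catch (RuntimeException e) {
                    // The attempt failed; the loop retries until maxAttempts is reached.
                }
            }
            if (!succeeded) {
                // The retries are exhausted: skip the item and keep the job running.
                outcome.skipped.add(item);
            }
        }
        return outcome;
    }

    // Hypothetical operation: "flaky" fails on its first attempt, "bad" always fails.
    static Function<String, String> flakyOperation() {
        Map<String, Integer> attempts = new HashMap<>();
        return item -> {
            int attempt = attempts.merge(item, 1, Integer::sum);
            if (item.equals("bad") || (item.equals("flaky") && attempt == 1)) {
                throw new IllegalStateException("Failed to handle: " + item);
            }
            return item.toUpperCase();
        };
    }

    public static void main(String[] args) {
        Outcome outcome = run(Arrays.asList("ok", "flaky", "bad"), flakyOperation(), 3);
        System.out.println("written=" + outcome.written + ", skipped=" + outcome.skipped);
    }
}
```

The point of the sketch is that one bad item doesn't have to fail the whole job: a temporary failure is retried, and an item that keeps failing is recorded and skipped.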

You should now understand that Spring Batch solves the problems caused by handwritten batch jobs. Let's move on and take a quick look at the key components of a Spring Batch job.

The Key Components of a Spring Batch Job

A Spring Batch job consists of the following components:

The Job represents a single Spring Batch job. Each job can have one or more steps.
The Step represents an independent logical task (e.g. importing information from an input file). Each step belongs to one job.
The ItemReader reads the input data and provides the found items one by one. An ItemReader belongs to one step and each step must have one ItemReader.
The ItemProcessor transforms items, one item at a time, into a form that's understood by the ItemWriter. An ItemProcessor belongs to one step and each step can have one ItemProcessor.
The ItemWriter writes the information of the processed items to the output one chunk at a time. An ItemWriter belongs to one step and each step must have one ItemWriter.
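The roles of these components can be sketched with simplified stand-ins. The interfaces below only borrow the names of the real Spring Batch interfaces; the real ones are provided by the Spring Batch library and have different signatures (for example, they declare checked exceptions). ComponentSketch, runStep, and demo are hypothetical names, and the chunk size of 2 is an arbitrary choice.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Simplified stand-ins for the components of a step. These are assumptions made
// for illustration and are NOT the real Spring Batch interfaces.
class ComponentSketch {

    interface ItemReader<T> { T read(); }                  // returns null when the input is exhausted
    interface ItemProcessor<I, O> { O process(I item); }   // returning null filters the item out
    interface ItemWriter<T> { void write(List<T> chunk); } // receives the items of one chunk

    // A hypothetical step: reads and processes items one by one, and hands them
    // to the writer when the chunk size is met. Returns the number of written items.
    static <I, O> int runStep(ItemReader<I> reader, ItemProcessor<I, O> processor,
                              ItemWriter<O> writer, int chunkSize) {
        int written = 0;
        List<O> chunk = new ArrayList<>();
        I item;
        while ((item = reader.read()) != null) {
            O processed = processor.process(item);
            if (processed != null) {
                chunk.add(processed);
            }
            if (chunk.size() == chunkSize) {
                writer.write(new ArrayList<>(chunk)); // a real step would commit the transaction here
                written += chunk.size();
                chunk.clear();
            }
        }
        if (!chunk.isEmpty()) {
            writer.write(new ArrayList<>(chunk));     // the last chunk can be smaller than chunkSize
            written += chunk.size();
        }
        return written;
    }

    // Wires a reader, a processor, and a writer together and returns the written chunks.
    static List<List<Integer>> demo() {
        Iterator<String> lines = Arrays.asList("1", "2", "3").iterator();
        ItemReader<String> reader = () -> lines.hasNext() ? lines.next() : null;
        ItemProcessor<String, Integer> processor = Integer::valueOf;
        List<List<Integer>> output = new ArrayList<>();
        ItemWriter<Integer> writer = output::add;

        runStep(reader, processor, writer, 2);
        return output;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With three input items and a chunk size of 2, the writer is invoked twice: once with a full chunk and once with the remaining item.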

The following figure illustrates the relationships of these components:

You can now define the term batch job, you understand why you should use Spring Batch, and you can identify the key components of a Spring Batch job. Let's summarize what you learned from this blog post.

Summary

This blog post has taught you five things:

A batch job consists of one or more steps. Each step is responsible for completing one logical task. Every step reads input data, processes the input data, and writes the processed data to the configured output. If the batch job has more than one step, the output of a step is often used as the input of the next step.
You should use Spring Batch because it solves the problems caused by handwritten batch jobs.
A Spring Batch Job can have one or more steps.
A Step must have one ItemReader and one ItemWriter.
A Step can have one ItemProcessor.

The next part of this tutorial describes how you can get the required dependencies with Maven.

Comments

ArunM Jan 19, 2016 @ 7:07

Thanks Petri. Waiting for the next part.

- Petri Jan 19, 2016 @ 15:10
  
  You are welcome!
  
Loganathan Feb 1, 2016 @ 20:23

Simple and precise post. Thanks Perri.

- Loganathan Feb 1, 2016 @ 20:24
  
  Sorry for the typo. Thanks Petri.
  
  - Petri Feb 1, 2016 @ 21:34
    
    You are welcome!
    
mahesh May 19, 2016 @ 16:27

Does it work like a job queue or not?
For example, I have 10 jobs to execute.
I want to run 3 jobs in parallel, and when any of the 3 jobs finishes, the next job should be picked up from the job queue.

Gahana Jun 12, 2016 @ 9:41

The article has explained the purpose of spring batch much simpler and easier.
Thanks Petri.

- Petri Jun 13, 2016 @ 13:50
  
  You are welcome!
  
Zeigler White II Jun 23, 2016 @ 19:56

Petri,
Thank you for the article. I was wondering if you have done any reporting on how the batch proceeded (success or failure, and where and how) using Spring Batch components?

- Petri Jun 23, 2016 @ 23:41
  
  Hi,
  
  Do you need to simply see the end status of a job execution (success or failure), the error message (if the job failed), and the failed step? If this information is enough, you can get it from the batch_job_execution and batch_step_execution database tables. Naturally this means that you have to write the code that shows it on the UI.
  
  On the other hand, if you need a "real-time" solution that sends a notification when a job or a step is finished, you could implement a custom JobExecutionListener or StepExecutionListener that sends a notification after a job or a step has finished.
  
  I hope that this helps. If you have any additional questions, don't hesitate to ask them.
  
Anil Thakur Aug 1, 2016 @ 10:40

Thank you Petri

- Petri Aug 2, 2016 @ 13:51
  
  You are welcome :)
  
Anonymous Aug 2, 2016 @ 22:22

Nice post. Thanks
Excuse me, there is a question that has been bothering me, if I may: I ran the project https://spring.io/guides/gs/batch-processing/ which collects data from a file and writes it to a database. It uses chunks to do the magic. So here is my question: how can I do this in parallel with chunks?
Thanks.

- Petri Aug 3, 2016 @ 12:33
  
  Hi,
  
  Spring Batch does support parallel processing. That being said, it is a bit hard to say for sure if you can achieve your goals with it because every case is a bit different and I don't know the requirements of your use case.
  
Victor Oct 8, 2016 @ 15:47

Nice, thx

- Petri Oct 11, 2016 @ 19:52
  
  You are welcome!
  
Ricky Nov 14, 2016 @ 4:38

Hi Petri,

Thank you so much for your tutorials. They are very helpful and I really love them.

Naveen Jan 28, 2017 @ 20:44

Awesome!, You made It easy.. :) Thank you

- Petri Jan 28, 2017 @ 22:33
  
  You are welcome!
  
pintu singh Mar 9, 2017 @ 13:46

Awesome

- Petri Mar 9, 2017 @ 18:35
  
  Thanks!
  
jaya Mar 14, 2017 @ 8:40

thanks for writing in simple language and with clarity. have one doubt in the diagram that u have created. Why are u showing Step --- 1----0.1 --- Item processor, if we have one Item processor per step.
Thanks

- Petri Mar 16, 2017 @ 12:03
  
  Hi,
  
  A step can have one ItemProcessor, but you can also create a step that doesn't have an ItemProcessor.
  
krishnkant patidar Mar 31, 2017 @ 23:22

Hi Petri,

I have a demo.txt file containing 10000 lines. I want to read this file using batch partitioning. Is it possible to use grid-size=5, meaning that five threads are created to read a single file? Please suggest what I should do to meet this requirement.

- Petri Apr 19, 2017 @ 19:59
  
  Hi,
  
  Take a look at this StackOverflow answer. It explains the purpose of the grid-size configuration property.
  
kanaiya Apr 18, 2017 @ 16:26

This is best tutorial i have seen for Spring Job. looking for your next tutorial.

- Petri Apr 19, 2017 @ 20:00
  
  Thank you for your kind words. I really appreciate them.
  
Noumenon Sep 24, 2017 @ 21:17

Apart from the tutorial, I appreciate the clear argument for why you would want to use this. No point doing the tutorial if I can't convince my coworkers to try it.

- Petri Sep 28, 2017 @ 20:56
  
  Thank you for your kind words. I really appreciate them.
  
LanhTran Feb 28, 2019 @ 5:20

Thanks for your post, it help me so much understand instead of try to read spring batch document :D

Jon Apr 21, 2019 @ 7:54

Hi there,

How would you go about reading from a database (using mybatis) and then writing to an API? Any help would be appreciated. Thanks!

- Petri Apr 21, 2019 @ 13:04
  
  Hi,
  
  I have never used MyBatis, but there is a project called MyBatis-Spring that helps you to integrate MyBatis with Spring Batch. I took a look at its documentation, and it seems that all you have to do is to configure the item reader and item writer beans provided by MyBatis-Spring.
  
  Also, if you have any additional questions, don't hesitate to ask them!
  
Vishwanath Jun 14, 2019 @ 8:23

Looks like written from bottom of heart ! Lovely content.

- Petri Oct 22, 2019 @ 18:34
  
  Thank you for your kind words. I really appreciate them.
  
Manraj Oct 19, 2019 @ 14:40

Thank you for the post. I usually skipped some text or paragraphs in all posts, but your post is awesome and each and every text are import and useful for the engineers to answer the question of why, what and how.

- Petri Oct 22, 2019 @ 18:34
  
  You are welcome. I am happy to hear that this blog post was useful to you.
  
Fernando Oct 29, 2019 @ 17:13

Thank you!.. great post.

Scott T. Jan 16, 2020 @ 20:27

Great post. Will be reading the whole series. Thank you!

- Petri Jan 23, 2020 @ 20:11
  
  Thank you for your kind words. I really appreciate them.
  
jprakash May 23, 2020 @ 14:40

Could you provide solution for the scenario like in single flow of batch I need to read from file and write into DB and read from DB and write into file,
thanks

esha dupuguntla Nov 12, 2024 @ 17:43

This is excellent! Thank you so much, can't wait to read the next one.
