The previous parts of my Spring Batch tutorial provided an introduction to Spring Batch and described how you can get the required dependencies by using either Maven or Gradle.
After you have downloaded the required dependencies, you can start writing Spring Batch jobs. The first thing that you have to do is to provide the input data for your batch job. This blog post helps you read the input data from a CSV file.
After you have read this blog post, you:
- Can read the input data of your batch job from a CSV file.
- Understand how you can transform a line read from a CSV file into a domain object.
This blog post assumes that:

- You are familiar with Spring Batch.
- You can get the required dependencies with Maven or Gradle.

Let's start by taking a quick look at the example application.
Introduction to the Example Application
In this blog post, you will read the input data of your batch job from a CSV file which contains the student information of an online course. To be more specific, the CSV file contains a student list that provides the following information to your batch job:
- The name of the student.
- The email address of the student.
- The name of the purchased package.
The content of your input file looks as follows:

```
NAME;EMAIL_ADDRESS;PACKAGE
Tony Tester;tony.tester@gmail.com;master
Nick Newbie;nick.newbie@gmail.com;starter
Ian Intermediate;ian.intermediate@gmail.com;intermediate
```
The ItemReader which reads the student list from a CSV file must return StudentDTO objects. The StudentDTO class contains the information of a single student, and its source code looks as follows:
```java
public class StudentDTO {

    private String emailAddress;
    private String name;
    private String purchasedPackage;

    public StudentDTO() {}

    public String getEmailAddress() {
        return emailAddress;
    }

    public String getName() {
        return name;
    }

    public String getPurchasedPackage() {
        return purchasedPackage;
    }

    public void setEmailAddress(String emailAddress) {
        this.emailAddress = emailAddress;
    }

    public void setName(String name) {
        this.name = name;
    }

    public void setPurchasedPackage(String purchasedPackage) {
        this.purchasedPackage = purchasedPackage;
    }
}
```
Next, you will find out how you can read the input data of your batch job from a CSV file.
Reading the Input Data From a CSV File
You can provide the input data for your batch job by configuring an ItemReader bean. Because you must read the student information from a CSV file, you have to configure this bean by following these steps:
First, you have to create the configuration class that contains the beans which describe the flow of your batch job. The source code of your configuration class looks as follows:
```java
import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {
}
```
Second, you have to write a private method that returns a LineMapper<StudentDTO> object. This object transforms a String object read from the source CSV file into a domain object. You can write this method by following these steps:

- Create a new DefaultLineMapper<StudentDTO> object.
- Create a new DelimitedLineTokenizer object. Ensure that the created object splits the student information line into tokens by using the semicolon (;) as a delimiter character, and configure the names of the tokens. These names must match the field names of the target class (StudentDTO).
- Ensure that the DefaultLineMapper<StudentDTO> object splits each line into tokens by using the created DelimitedLineTokenizer object.
- Create a new BeanWrapperFieldSetMapper<StudentDTO> object which maps the tokenized input data into a domain object by using bean property paths. Remember to ensure that the created object creates new StudentDTO objects.
- Ensure that the DefaultLineMapper<StudentDTO> object creates new StudentDTO objects by using the created BeanWrapperFieldSetMapper<StudentDTO> object.
- Return the created DefaultLineMapper<StudentDTO> object.
After you have written this method, the source code of your configuration class looks as follows:
```java
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.context.annotation.Configuration;

@Configuration
public class SpringBatchExampleJobConfig {

    private LineMapper<StudentDTO> createStudentLineMapper() {
        DefaultLineMapper<StudentDTO> studentLineMapper = new DefaultLineMapper<>();

        LineTokenizer studentLineTokenizer = createStudentLineTokenizer();
        studentLineMapper.setLineTokenizer(studentLineTokenizer);

        FieldSetMapper<StudentDTO> studentInformationMapper = createStudentInformationMapper();
        studentLineMapper.setFieldSetMapper(studentInformationMapper);

        return studentLineMapper;
    }

    private LineTokenizer createStudentLineTokenizer() {
        DelimitedLineTokenizer studentLineTokenizer = new DelimitedLineTokenizer();

        studentLineTokenizer.setDelimiter(";");
        studentLineTokenizer.setNames(new String[]{
                "name",
                "emailAddress",
                "purchasedPackage"
        });

        return studentLineTokenizer;
    }

    private FieldSetMapper<StudentDTO> createStudentInformationMapper() {
        BeanWrapperFieldSetMapper<StudentDTO> studentInformationMapper = new BeanWrapperFieldSetMapper<>();
        studentInformationMapper.setTargetType(StudentDTO.class);
        return studentInformationMapper;
    }
}
```
Third, you have to create a method that configures your ItemReader bean and ensure that this method returns an ItemReader<StudentDTO> object. After you have created this method, you have to implement it by following these steps:

- Create a new FlatFileItemReaderBuilder<StudentDTO> object. This builder creates FlatFileItemReader<StudentDTO> objects which read lines from the specified Resource.
- Configure the name of the created ItemReader.
- Configure the location of the CSV file which contains the input data of your batch job. Because I wanted to create an example application that's as easy to run as possible, I ensured that the input file (data/students.csv) of your batch job is found from the classpath.
- Ignore the header line of the CSV file.
- Configure the used LineMapper<StudentDTO> object which transforms a String object read from the CSV file into a domain object (StudentDTO).
- Create a new FlatFileItemReader<StudentDTO> object and return the created object.
After you have configured your ItemReader bean, the source code of your configuration class looks as follows:
```java
import org.springframework.batch.item.ItemReader;
import org.springframework.batch.item.file.LineMapper;
import org.springframework.batch.item.file.builder.FlatFileItemReaderBuilder;
import org.springframework.batch.item.file.mapping.BeanWrapperFieldSetMapper;
import org.springframework.batch.item.file.mapping.DefaultLineMapper;
import org.springframework.batch.item.file.mapping.FieldSetMapper;
import org.springframework.batch.item.file.transform.DelimitedLineTokenizer;
import org.springframework.batch.item.file.transform.LineTokenizer;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.core.io.ClassPathResource;

@Configuration
public class SpringBatchExampleJobConfig {

    @Bean
    public ItemReader<StudentDTO> itemReader() {
        LineMapper<StudentDTO> studentLineMapper = createStudentLineMapper();

        return new FlatFileItemReaderBuilder<StudentDTO>()
                .name("studentReader")
                .resource(new ClassPathResource("data/students.csv"))
                .linesToSkip(1)
                .lineMapper(studentLineMapper)
                .build();
    }

    private LineMapper<StudentDTO> createStudentLineMapper() {
        DefaultLineMapper<StudentDTO> studentLineMapper = new DefaultLineMapper<>();

        LineTokenizer studentLineTokenizer = createStudentLineTokenizer();
        studentLineMapper.setLineTokenizer(studentLineTokenizer);

        FieldSetMapper<StudentDTO> studentInformationMapper = createStudentInformationMapper();
        studentLineMapper.setFieldSetMapper(studentInformationMapper);

        return studentLineMapper;
    }

    private LineTokenizer createStudentLineTokenizer() {
        DelimitedLineTokenizer studentLineTokenizer = new DelimitedLineTokenizer();

        studentLineTokenizer.setDelimiter(";");
        studentLineTokenizer.setNames(new String[]{
                "name",
                "emailAddress",
                "purchasedPackage"
        });

        return studentLineTokenizer;
    }

    private FieldSetMapper<StudentDTO> createStudentInformationMapper() {
        BeanWrapperFieldSetMapper<StudentDTO> studentInformationMapper = new BeanWrapperFieldSetMapper<>();
        studentInformationMapper.setTargetType(StudentDTO.class);
        return studentInformationMapper;
    }
}
```
I configured the location of the input file by creating a new ClassPathResource object because I wanted to create an example application that's as easy to run as possible. Typically, the input file of your batch job is found from the file system. This means that you can configure its location by creating a new FileSystemResource object.
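If you want to read the input file from the file system instead, only the resource configuration changes. Here is a minimal sketch that uses the FileSystemResource class (org.springframework.core.io.FileSystemResource) and a hypothetical file path:

```java
@Bean
public ItemReader<StudentDTO> itemReader() {
    LineMapper<StudentDTO> studentLineMapper = createStudentLineMapper();

    return new FlatFileItemReaderBuilder<StudentDTO>()
            .name("studentReader")
            // Read the input file from the file system. The path is only an example.
            .resource(new FileSystemResource("/data/batch/students.csv"))
            .linesToSkip(1)
            .lineMapper(studentLineMapper)
            .build();
}
```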
You can now read the input data of your batch job from a CSV file. Let's summarize what you learned from this blog post.
Summary
This blog post has taught you four things:
- You can read data from a CSV file by using the FlatFileItemReader<T> class.
- The FlatFileItemReader<T> class transforms lines read from the input file into domain objects by using a LineMapper<T> object.
- The DelimitedLineTokenizer class can split the input data into tokens by using the specified delimiter character. Also, this class allows you to configure the field names which are used to populate the fields of the created domain object.
- The BeanWrapperFieldSetMapper<T> class can transform the tokenized input data into a domain object by using bean property paths.
The next part of this tutorial describes how you can read the input data of your batch job from an XML file.
P.S. You can get the example application from GitHub.
Being very (just a few days) new to Spring Batch, the first question I have is, once the classes described above are created, how do you run them to actually see the data defined in the XML file in the database? Thank you.
I will describe the required steps in my upcoming blog posts, but basically you have to follow these steps:
- Create a Step that has an ItemReader and an ItemWriter.
- Create a Job that contains the created Step.
- Run the created Job.

For example, if you want to see the configuration of a Spring Batch job that reads information from a CSV file and writes it to a database, you should take a look at this package. Also, you might want to take a look at this resource page that contains links to other Spring Batch tutorials.
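To make this a bit more concrete, here is a minimal sketch of such a configuration. It assumes the Spring Batch 4 style StepBuilderFactory and JobBuilderFactory helpers, the ItemReader bean of this blog post, and a hypothetical ItemWriter<StudentDTO> bean that writes the students to a database. These bean methods would go into the same @Configuration class:

```java
@Bean
public Step csvToDatabaseStep(StepBuilderFactory stepBuilderFactory,
                              ItemReader<StudentDTO> itemReader,
                              ItemWriter<StudentDTO> itemWriter) {
    // Process the students in chunks of ten items.
    return stepBuilderFactory.get("csvToDatabaseStep")
            .<StudentDTO, StudentDTO>chunk(10)
            .reader(itemReader)
            .writer(itemWriter)
            .build();
}

@Bean
public Job csvToDatabaseJob(JobBuilderFactory jobBuilderFactory, Step csvToDatabaseStep) {
    // Run the created step when the job is launched.
    return jobBuilderFactory.get("csvToDatabaseJob")
            .incrementer(new RunIdIncrementer())
            .start(csvToDatabaseStep)
            .build();
}
```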
Thanks.
How to read/write multiple txt files (which are not the same) using Spring Batch with a single FieldSetMapper class?
Hi,
You can use the MultiResourceItemReader class for this purpose.
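For example, here is a minimal sketch that wraps the FlatFileItemReader of this blog post with a MultiResourceItemReader. The input file locations are hypothetical:

```java
@Bean
public MultiResourceItemReader<StudentDTO> multiResourceItemReader(
        FlatFileItemReader<StudentDTO> delegateReader) {
    MultiResourceItemReader<StudentDTO> multiResourceItemReader = new MultiResourceItemReader<>();

    // Hypothetical input files. Adjust the locations to match your own files.
    multiResourceItemReader.setResources(new Resource[] {
            new ClassPathResource("data/students-1.csv"),
            new ClassPathResource("data/students-2.csv")
    });

    // The delegate reader reads the lines of each individual file.
    multiResourceItemReader.setDelegate(delegateReader);

    return multiResourceItemReader;
}
```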
How to simply read a .csv file and show it to the console only?
If you want to use Spring Batch, you need to create an ItemWriter that prints the information to the console. However, you probably don't need Spring Batch for this. If you want to use a simpler approach, take a look at this blog post.

1. Do you have any samples with reading multiple FragmentRootElementName elements from an XML file?
2. I have a schema (XSD) file for the XML file I read in Spring Batch. According to your example, I have to annotate the class with the @XmlRootElement annotation. But I do not want to modify classes created during the build process in the target folder.
Hi,
1. No :(

2. Yes. If I remove that annotation, the UnmarshallingFailureException is thrown. Can you configure the code generator to include these annotations?
Suppose StudentDTO has a field that is a foreign key? It seems Hibernate wants to try to insert an object into the SQL statement?
Are you trying to insert something into the database by using Hibernate? If so, you need to transform the StudentDTO object into an entity and persist it to the database. If the entity has relationships with other entities, you need to obtain references to these objects before you persist the created entity to the database.

Hi Petri,
I have downloaded the source code from GitHub. Please let me know how to run it. Which is the first class that needs to be executed?
Hi,
Each example has a README that explains how you can run the example application by using either Maven or Gradle. If you want to run the Spring example, you should take a look at this README. On the other hand, if you want to run the Spring Boot example, you should take a look at this README.
How do I first fetch a CSV file from any physical location, then stream this CSV file and read it in Spring Batch?
If possible, could you explain this without using annotations as well?
Hi,
Unfortunately, Spring Batch doesn't have very good support for reading files from paths that are not known when the application context is started. That's why people often end up writing custom tasklets which copy the input files to the path that is given to the used ItemReader.

I think that you can solve your problem by following these steps:

- Create a custom tasklet that copies the input file to a path that is known when the application context is started (see the sketch after this answer).
- Configure your ItemReader bean to read the input file from that path.
If you don't know how you can create custom tasklets, you should take a look at this blog post. Also, if you have any other questions, don't hesitate to ask them.
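For example, a custom tasklet that copies the input file to a known location could look something like this minimal sketch (the source and target paths are hypothetical):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;

import org.springframework.batch.core.StepContribution;
import org.springframework.batch.core.scope.context.ChunkContext;
import org.springframework.batch.core.step.tasklet.Tasklet;
import org.springframework.batch.repeat.RepeatStatus;

public class CopyInputFileTasklet implements Tasklet {

    @Override
    public RepeatStatus execute(StepContribution contribution, ChunkContext chunkContext)
            throws Exception {
        // Hypothetical paths: copy the input file to the location that is
        // configured for the used ItemReader bean.
        Path source = Paths.get("/incoming/students.csv");
        Path target = Paths.get("/batch/input/students.csv");
        Files.copy(source, target, StandardCopyOption.REPLACE_EXISTING);

        // The tasklet is done after the file has been copied.
        return RepeatStatus.FINISHED;
    }
}
```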
Hi
What about nested elements? I mean, if a student is represented like this:

```xml
<student>
    <name>Tony Tester</name>
    <courses>
        <course>
            <name>Maths</name>
            <room>221</room>
        </course>
        <course>
            <name>Chemistry</name>
            <room>222</room>
        </course>
    </courses>
    <emailAddress>tony.tester@gmail.com</emailAddress>
    <purchasedPackage>master</purchasedPackage>
</student>
```
How to retrieve courses? Is it possible to retrieve only the courses?
Thanks
Hi,
I have to admit that I don't know if it is possible to configure the StaxEventItemReader to read only the child elements of the root element. One option is to extend that class and make the required changes to the subclass. However, I would probably still read the entire object hierarchy and add the required filter logic to the subclass.

By the way, if you have any additional questions, don't hesitate to ask them.
Very nice article. Do you have any article about reading/writing txt files using Spring Batch or any other Java approach?
Hi,
Unfortunately I don't have any such blog posts. However, I can help you to find one if you provide additional information about your use case. For example, it would be useful to know what kind of format is used by your input files.
OK, I'll send you my input file. Can you send me an email address or FB id? It will be easier to send the file that way.
Petri, my use case is like this: for a single record (person) I have multiple data lines, and each line has its own identification code, such as:

- code[10] means details about the person
- code[20] means details about the ID proof
- code[30] means the name of the zip file which contains the image of the person; this file name will be in the details of code[10]

For example, suppose I have 100 persons of records like this:
```
10|1|01|BR|01| | |02|22022017125698_1|Mr|syoso||tom
20|1|A|B8623650|18-12-2020|01|02|
30|1|22022017125698_1_PHOTO_1.jpg|02|02|BR
30|1|22022017125698_1_POA_1.jpg|05|02|BR|
10|2|01|BR|01|02|22022017125698_2|Mr|thoms
20|2|A|A3654002|18-12-2020|01|02||
30|2|22022017125698_2_PHOTO_1.jpg|02|02|BR
30|2|22022017125698_2_POI_1.jpg|09|02|BR|
```
Hi,
You can read your input data by using the technique described in this blog post. However, you must ensure that the created LineTokenizer object uses the '|' character as a delimiter character. You can configure the delimiter character by passing the string "|" as a method parameter when you invoke the setDelimiter() method of the DelimitedLineTokenizer class.
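In other words, only the tokenizer configuration changes. A minimal sketch:

```java
DelimitedLineTokenizer lineTokenizer = new DelimitedLineTokenizer();
// Split each line into tokens by using the '|' character as the delimiter.
lineTokenizer.setDelimiter("|");
```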
Hi,

Could you please suggest a file-by-file reader in Spring Batch?
Hi,
What is the format of the source file?
How to read a file which is available in a remote location?
Hi,
I would implement a batch job that has three steps:

- Download the file from the remote location to the local file system.
- Read the downloaded file and process its contents.
- Delete the downloaded file from the local file system.

Also, you should know that you can implement steps one and three by writing custom tasklets (see the sketch below).
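A minimal sketch of such a job configuration, assuming Spring Batch 4 style builders and hypothetical step beans named downloadFileStep, processFileStep, and deleteFileStep:

```java
@Bean
public Job remoteFileJob(JobBuilderFactory jobBuilderFactory,
                         Step downloadFileStep,
                         Step processFileStep,
                         Step deleteFileStep) {
    // Download the remote file, process it, and clean it up afterwards.
    return jobBuilderFactory.get("remoteFileJob")
            .start(downloadFileStep)
            .next(processFileStep)
            .next(deleteFileStep)
            .build();
}
```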
What are the Spring Batch reader changes to be made to read file content from the Azure cloud, instead of reading the file from a local path?
You have to write a custom ItemReader which reads the file from the Azure cloud. Unfortunately, it's impossible to give a more specific answer because you didn't specify how we can access the file. For example, can we access it via HTTP?

Hi,
In my use case we need to create objects of three classes based on the parameter we get in the first column.
For example, we have the CSV file below:

```
1,shubham,student
1,ankit,teacher
2,vidhya mandir,234 tea store near , nursely to secondary
2,convent school , 934 anup nagar , only secondary
3,23,fans ,98,4983,rupees
```
If the first column value is 1, we need to create a Student object.
If the first column value is 2, we need to create a School object.
If the first column value is 3, we need to create an Item object.

How can we achieve this?