Batch processing in JEE7 (JSR 352)

A new feature in JEE7 is batch processing, an implementation of JSR 352 based on the programming model of Spring Batch. Batch processing means executing a series of steps, sequentially or in parallel, without any user interaction. In this blog, we give a small overview of the capabilities of this new feature.
The code of this blog can be downloaded from here.
The zip file contains an Eclipse maven project in the folder JobAPI, and a jobapi.war file that can be deployed and tested.

The JSR 352 specification defines 2 different approaches to setting up a batch, which can also be combined:

The ‘chunk’ approach

This approach implements the common ETL pattern of read/process/write. A chunk is a set of data, a number of items, that is written in one ‘transaction’. With a chunk size of 3 items, you get the following execution:

  1. read item 1
  2. process item 1
  3. read item 2
  4. process item 2
  5. read item 3
  6. process item 3
  7. write items 1, 2, 3
  8. read item 4, etc…

The ‘batchlet’ approach

This approach gives you the freedom to do as you like. It’s just a piece of code, executed as part of a batch job.

The Code

For this example, I have used GlassFish 4.0 from here (there is no Eclipse plugin yet for 4.1). Download the full EE version, not the Web Profile, as the latter only contains a subset of the JEE7 spec. More info about the differences between the two can be found here.

As the batch process runs in the background, System.out.println is used for the output. If you’re using the GlassFish plugin in Eclipse, you’ll see the output in the console view. Otherwise, check the server log at ../glassfish4.0/glassfish/domains/domain1/logs/server.log

In the example, the reader generates a list of 10 Data objects. A Data object contains 2 operands and the result of a calculation on those 2 operands, which is computed in the processor. The writer prints out the result.
Let’s go over the different classes:

The Data.java object (just a value object)

public class Data {
	private Integer num1;
	private Integer num2;
	private Integer result;

	public Data(Integer num1, Integer num2) {
		this.num1 = num1;
		this.num2 = num2;
	}
	@Override
	public String toString() {
		// "en" is Dutch for "and"
		return num1+" en "+num2+(result != null ? " = "+result : "");
	}
	public Integer getResult() {
		return result;
	}
	public void setResult(Integer result) {
		this.result = result;
	}
	public Integer getNum1() {
		return num1;
	}
	public Integer getNum2() {
		return num2;
	}
}

The flow of a batch job is defined in a job XML, which is placed under the /META-INF/batch-jobs directory. We called ours myJob.xml.
As all classes are CDI beans with default names (annotated with @Named), we refer to them by their lowerCamelCase name. (Don’t forget to add a beans.xml file in the WEB-INF dir; a minimal one is shown below.)
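
A minimal beans.xml could look like this (a sketch; with CDI 1.1 a completely empty file works as well, and bean-discovery-mode="all" just makes the default discovery explicit):

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://xmlns.jcp.org/xml/ns/javaee"
	xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
	xsi:schemaLocation="http://xmlns.jcp.org/xml/ns/javaee
		http://xmlns.jcp.org/xml/ns/javaee/beans_1_1.xsd"
	bean-discovery-mode="all"/>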
We define 2 steps: the first one uses the chunk approach, executing the batch in chunks of 3 items; the second one uses the batchlet approach and just prints a message.
Note the “next” attribute on the first step, declaring which step runs after “step1”.

<job id="myJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0" >
	<step id="step1" next="step2">
		<chunk item-count="3">
 			<reader ref="myItemReader"/>
 			<processor ref="myItemProcessorPlus"/>
 			<writer ref="myItemWriter"/>
 		</chunk>
	</step>
	<step id="step2">
		<batchlet ref="myBatchlet"/>
	</step>
</job>

MyItemReader: the open method is executed once and fills the list; the readItem method reads 1 item at a time.

@Named
public class MyItemReader extends AbstractItemReader {
	private List<Data> listData = new ArrayList<Data>();
	int count = 0;

	@Override
	public void open(Serializable c) throws Exception {
		System.out.println("FILL LIST");
		Random randomGenerator = new Random();
		for (int idx = 1; idx <= 10; ++idx) {
			int randomInt1 = randomGenerator.nextInt(100);
			int randomInt2 = randomGenerator.nextInt(100);
			listData.add(new Data(randomInt1, randomInt2));
		}
	}

	@Override
	public Data readItem() throws Exception {
		try {
			Data data = listData.get(count++);
			System.out.println(Thread.currentThread().getId()+" | READ  "+data);
			return data;
		} catch (IndexOutOfBoundsException e) {
			// no more items: returning null signals the end of the input
			return null;
		}
	}
}

MyItemProcessorPlus: processes 1 item; it calculates the sum and puts it in the result.

@Named
public class MyItemProcessorPlus implements ItemProcessor {

	@Override
	public Data processItem(Object arg0) throws Exception {
		Data data = (Data)arg0;
		data.setResult(data.getNum1()+data.getNum2());
		System.out.println(Thread.currentThread().getId()+" | PROCESSING PLUS data: "+data.toString());
		return data;
	}
}

MyItemWriter: writes a chunk of items; in our case, it just prints them.

@Named
public class MyItemWriter extends AbstractItemWriter {
	// results are collected here, though this example does nothing further with them
	List<Data> resultList = new ArrayList<Data>();

	@Override
	public void writeItems(List<Object> arg0) throws Exception {
		System.out.println(Thread.currentThread().getId()+" | start write "+arg0.size()+" elements");
		for (Object o : arg0) {
			Data data = (Data)o;
			System.out.println(Thread.currentThread().getId()+" | WRITE  "+data.toString());
			resultList.add(data);
		}
		System.out.println(Thread.currentThread().getId()+" | end write ");
	}
}

MyBatchlet: the second step; it runs after all the chunks have been written.

@Named
public class MyBatchlet extends AbstractBatchlet {
	@Override
	public String process() throws Exception {
		System.out.println("BATCHLET : Processing something else in this batchlet....");
		// the returned String becomes the step's exit status; null is allowed
		return null;
	}
}

Now that we have our classes, we create a servlet to start the batch.
The “myJob” argument of the JobOperator’s start method refers to the name of the job XML file (without the .xml extension) in the META-INF/batch-jobs directory.
When the method is executed, the batch is started in the background and the servlet continues.

@WebServlet("/myRun")
public class RunBatch extends HttpServlet {
	@Override
	protected void doGet(HttpServletRequest req, HttpServletResponse resp)
			throws javax.servlet.ServletException, java.io.IOException {
		JobOperator jo = BatchRuntime.getJobOperator();
		// start is asynchronous and returns immediately
		jo.start("myJob", null);
		resp.getWriter().println("Job started, check console for output.");
	}
}
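
The start method also returns an execution id, which you can use to query the state of the running job. A minimal sketch, using only the standard javax.batch APIs (JobOperator, BatchRuntime, JobExecution):

JobOperator jo = BatchRuntime.getJobOperator();
long executionId = jo.start("myJob", null);
// the job runs asynchronously; query its current status via the execution id
JobExecution execution = jo.getJobExecution(executionId);
System.out.println("Batch status: " + execution.getBatchStatus());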

Now when we deploy our war file and go to http://localhost:8080/jobapi/myRun, we get the message “Job started, check console for output.” on the screen, while the output of the batch appears in the console/server.log:

Info: FILL LIST
Info: 175 | READ 42 en 80
Info: 175 | PROCESSING PLUS data: 42 en 80 = 122
Info: 175 | READ 47 en 41
Info: 175 | PROCESSING PLUS data: 47 en 41 = 88
Info: 175 | READ 74 en 6
Info: 175 | PROCESSING PLUS data: 74 en 6 = 80
Info: 175 | start write 3 elements
Info: 175 | WRITE 42 en 80 = 122
Info: 175 | WRITE 47 en 41 = 88
Info: 175 | WRITE 74 en 6 = 80
Info: 175 | end write
Info: 175 | READ 70 en 79
Info: 175 | PROCESSING PLUS data: 70 en 79 = 149
Info: 175 | READ 67 en 35
Info: 175 | PROCESSING PLUS data: 67 en 35 = 102
Info: 175 | READ 13 en 81
Info: 175 | PROCESSING PLUS data: 13 en 81 = 94
Info: 175 | start write 3 elements
Info: 175 | WRITE 70 en 79 = 149
Info: 175 | WRITE 67 en 35 = 102
Info: 175 | WRITE 13 en 81 = 94
Info: 175 | end write
Info: 175 | READ 89 en 89
Info: 175 | PROCESSING PLUS data: 89 en 89 = 178
Info: 175 | READ 24 en 29
Info: 175 | PROCESSING PLUS data: 24 en 29 = 53
Info: 175 | READ 20 en 18
Info: 175 | PROCESSING PLUS data: 20 en 18 = 38
Info: 175 | start write 3 elements
Info: 175 | WRITE 89 en 89 = 178
Info: 175 | WRITE 24 en 29 = 53
Info: 175 | WRITE 20 en 18 = 38
Info: 175 | end write
Info: 175 | READ 28 en 2
Info: 175 | PROCESSING PLUS data: 28 en 2 = 30
Info: 175 | start write 1 elements
Info: 175 | WRITE 28 en 2 = 30
Info: 175 | end write
Info: BATCHLET : Processing something else in this batchlet....

As you can see, there are 3 calls to read and process, and then 1 call to write, for every chunk of data.
When all 10 Data objects have been processed, the next step, the batchlet, is executed.

Parallel processing

The batch framework contains 2 techniques for processing data in parallel, in 2 or more separate threads. One of them is partitioning: a set of data is split into parts that are processed by multiple instances of the same step, each running on its own thread. The partition plan is defined in the job XML (or programmatically via a PartitionMapper), and each instance’s reader receives its partition properties. We won’t go into detail here, but a sketch is shown below.
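
For reference only, a partition plan could look roughly like this (a sketch; the start/end property names are hypothetical and would be injected into the reader with @BatchProperty):

<step id="partitionedStep">
	<chunk item-count="3">
		<reader ref="myItemReader"/>
		<processor ref="myItemProcessorPlus"/>
		<writer ref="myItemWriter"/>
	</chunk>
	<partition>
		<plan partitions="2" threads="2">
			<properties partition="0">
				<property name="start" value="1"/>
				<property name="end" value="5"/>
			</properties>
			<properties partition="1">
				<property name="start" value="6"/>
				<property name="end" value="10"/>
			</properties>
		</plan>
	</partition>
</step>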
The second technique lets you define steps that run in parallel before the job continues with the next step.

Let’s update our example for this second technique. Suppose that we want a second list of data, where we compute the product instead of the sum. We will create a new processor for that.

MyItemProcessorMultiply: processes 1 item; it calculates the product and puts it in the result.

@Named
public class MyItemProcessorMultiply implements ItemProcessor {

	@Override
	public Data processItem(Object arg0) throws Exception {
		Data data = (Data)arg0;
		data.setResult(data.getNum1()*data.getNum2());
		System.out.println(Thread.currentThread().getId()+" | PROCESSING MULTIPLY data: "+data.toString());
		return data;
	}
}

Now, in order to use this processor, we extend our myJob.xml: the “split” tag indicates which flows need to run in parallel. To demonstrate the difference, the product batch will run in chunks of 5, whereas the sum runs in chunks of 3.

<job id="myJob" xmlns="http://xmlns.jcp.org/xml/ns/javaee" version="1.0" >
	<split id="split1" next="step3">
		<flow id="flow1">
			<step id="step1">
				<chunk item-count="3">
		 			<reader ref="myItemReader"/>
		 			<processor ref="myItemProcessorPlus"/>
		 			<writer ref="myItemWriter"/>
		 		</chunk>
			</step>
		</flow>
 		<flow id="flow2">
			<step id="step2">
				<chunk item-count="5">
		 			<reader ref="myItemReader"/>
		 			<processor ref="myItemProcessorMultiply"/>
		 			<writer ref="myItemWriter"/>
		 		</chunk>
			</step>
		</flow>
	</split>
	<step id="step3">
		<batchlet ref="myBatchlet"/>
	</step>
</job>

Running our servlet now results in the following output:

Info: FILL LIST
Info: FILL LIST
Info: 231 | READ 27 en 72
Info: 230 | READ 55 en 72
Info: 231 | PROCESSING MULTIPLY data: 27 en 72 = 1944
Info: 231 | READ 44 en 53
Info: 231 | PROCESSING MULTIPLY data: 44 en 53 = 2332
Info: 231 | READ 31 en 14
Info: 231 | PROCESSING MULTIPLY data: 31 en 14 = 434
Info: 230 | PROCESSING PLUS data: 55 en 72 = 127
Info: 231 | READ 87 en 84
Info: 230 | READ 3 en 8
Info: 231 | PROCESSING MULTIPLY data: 87 en 84 = 7308
Info: 230 | PROCESSING PLUS data: 3 en 8 = 11
Info: 231 | READ 31 en 1
Info: 230 | READ 63 en 28
Info: 231 | PROCESSING MULTIPLY data: 31 en 1 = 31
Info: 230 | PROCESSING PLUS data: 63 en 28 = 91
Info: 230 | start write 3 elements
Info: 231 | start write 5 elements
Info: 231 | WRITE 27 en 72 = 1944
Info: 231 | WRITE 44 en 53 = 2332
Info: 231 | WRITE 31 en 14 = 434
Info: 231 | WRITE 87 en 84 = 7308
Info: 231 | WRITE 31 en 1 = 31
Info: 230 | WRITE 55 en 72 = 127
Info: 230 | WRITE 3 en 8 = 11
Info: 230 | WRITE 63 en 28 = 91
Info: 230 | end write
Info: 231 | end write
Info: 230 | READ 30 en 90
Info: 230 | PROCESSING PLUS data: 30 en 90 = 120
Info: 230 | READ 39 en 71
Info: 230 | PROCESSING PLUS data: 39 en 71 = 110
Info: 230 | READ 48 en 23
Info: 231 | READ 51 en 36
Info: 230 | PROCESSING PLUS data: 48 en 23 = 71
Info: 231 | PROCESSING MULTIPLY data: 51 en 36 = 1836
Info: 231 | READ 49 en 30
Info: 230 | start write 3 elements
Info: 231 | PROCESSING MULTIPLY data: 49 en 30 = 1470
Info: 230 | WRITE 30 en 90 = 120
Info: 230 | WRITE 39 en 71 = 110
Info: 231 | READ 82 en 80
Info: 230 | WRITE 48 en 23 = 71
Info: 230 | end write
Info: 231 | PROCESSING MULTIPLY data: 82 en 80 = 6560
Info: 231 | READ 26 en 87
Info: 231 | PROCESSING MULTIPLY data: 26 en 87 = 2262
Info: 231 | READ 49 en 69
Info: 231 | PROCESSING MULTIPLY data: 49 en 69 = 3381
Info: 231 | start write 5 elements
Info: 231 | WRITE 51 en 36 = 1836
Info: 231 | WRITE 49 en 30 = 1470
Info: 231 | WRITE 82 en 80 = 6560
Info: 231 | WRITE 26 en 87 = 2262
Info: 231 | WRITE 49 en 69 = 3381
Info: 231 | end write
Info: 230 | READ 26 en 15
Info: 230 | PROCESSING PLUS data: 26 en 15 = 41
Info: 230 | READ 53 en 31
Info: 230 | PROCESSING PLUS data: 53 en 31 = 84
Info: 230 | READ 99 en 60
Info: 230 | PROCESSING PLUS data: 99 en 60 = 159
Info: 230 | start write 3 elements
Info: 230 | WRITE 26 en 15 = 41
Info: 230 | WRITE 53 en 31 = 84
Info: 230 | WRITE 99 en 60 = 159
Info: 230 | end write
Info: 230 | READ 54 en 91
Info: 230 | PROCESSING PLUS data: 54 en 91 = 145
Info: 230 | start write 1 elements
Info: 230 | WRITE 54 en 91 = 145
Info: 230 | end write
Info: BATCHLET : Processing something else in this batchlet....

You can see there are now 2 thread IDs, 230 and 231: one for each parallel step.

This was just an introduction to how a batch can be set up in JEE7.
There are many other useful features that were not demonstrated in this blog.

Exception handling

  • You can continue reading/processing/writing after a configured (skippable) exception is thrown during the run.
  • You can configure how many times a certain exception may be thrown before the batch is aborted (see the sketch after this list).
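
Both options are configured on the chunk in the job XML. A sketch (the limits and exception classes chosen here are just examples):

<chunk item-count="3" skip-limit="5" retry-limit="3">
	<reader ref="myItemReader"/>
	<processor ref="myItemProcessorPlus"/>
	<writer ref="myItemWriter"/>
	<skippable-exception-classes>
		<include class="java.lang.ArithmeticException"/>
	</skippable-exception-classes>
	<retryable-exception-classes>
		<include class="java.io.IOException"/>
	</retryable-exception-classes>
</chunk>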

Listeners

You can write listeners that are executed (a small sketch follows the list):

  • before/after an item is read/processed/written,
  • when a skippable exception is thrown in the reader/processor/writer,
  • when a retry of a reader/processor/writer is executed.
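
As a minimal sketch, a read listener extends the standard AbstractItemReadListener and is registered in the job XML with a <listeners><listener ref="myReadListener"/></listeners> element inside the step. (MyReadListener is a hypothetical class, not part of the example project.)

@Named
public class MyReadListener extends AbstractItemReadListener {
	@Override
	public void afterRead(Object item) throws Exception {
		// invoked after every successful readItem call
		System.out.println("LISTENER | after read: " + item);
	}
}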

Checkpoints

You can add custom checkpoints from which the batch can be restarted after it was aborted.
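
The checkpoint data itself comes from the reader and writer. As a sketch, our MyItemReader could expose its current position by overriding checkpointInfo (part of the standard AbstractItemReader API); on a restart, the runtime passes the last checkpoint back into open:

@Override
public Serializable checkpointInfo() throws Exception {
	// persisted at each chunk boundary; on restart, open(Serializable)
	// receives this value so reading can resume at the right index
	return count;
}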

Conclusion

The new Batch API gives us a lot of possibilities for setting up a batch in an EE environment. The biggest advantage is probably that it works seamlessly with all the other EE components in our projects.

For instance, you can use your REST/WS clients to read data, your EJBs and JPA entities to process the data, and JMS to write results to your queues.

If you’re going to use this batch API, I hope you will share your experiences here, on this blog!