Batch Processing in Web Applications

September 26, 2008

I’m currently working on an application that allows an administrative user to upload an input file that contains thousands of records that need to be processed.

Our initial, naive solution was to parse the uploaded file into individual records, save each record to the database, and then send a JMS message containing the record’s primary key to a queue to kick off asynchronous processing of that record.

It worked great… until we tried to process a file with more than a few thousand records. Then the transaction that the file processing ran in started timing out. That’s a problem.

You see, our benchmark for file processing is 10k records, which is the expected size of the files we will start receiving once our system goes live. We’d also like some breathing room in processing capacity in case we see larger files in the wild. But, as it was, not only did we not have any breathing room, we had a giant boulder sitting on our chest crushing us.

What to do?

It was about this time that I reflected on the need for an entirely new approach, not a variation on our existing processing strategy. Spring Batch to the rescue.

The Spring Batch framework is a new addition to the Spring portfolio. The primary committers on the project have extensive experience producing batch processing solutions. It is designed primarily as a Java-based batch processing framework that uses traditional batch processing strategies. In other words, it wasn’t a natural fit for my needs: kicking off batch processing with dynamic input from within a web application.

Most of the collaborators in a Spring Batch configuration are stateful, so they don’t lend themselves to traditional singleton-based Spring context wiring. I originally tried defining these beans as prototypes, but one of the stateful collaborators needs to be injected TWICE into one of the components (once as a collaborator and once to register as a listener). With prototype scope, each injection point would receive a different object, and that wouldn’t work.
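To make the double-injection problem concrete, here is a minimal plain-Java sketch with no Spring involved. The `Supplier` stands in for a prototype-scoped bean definition, and `Step` and `StatefulReader` are hypothetical names invented for illustration; they are not Spring Batch classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Supplier;

public class PrototypeInjectionSketch {

    /** Hypothetical stateful collaborator: it counts the items it has read. */
    static class StatefulReader {
        int itemsRead = 0;

        void read() {
            itemsRead++;
        }

        // Listener callback: reports the state accumulated by THIS instance only.
        int onStepFinished() {
            return itemsRead;
        }
    }

    /** Hypothetical component that needs the same object at two injection points. */
    static class Step {
        final StatefulReader reader;
        final List<StatefulReader> listeners = new ArrayList<>();

        Step(StatefulReader reader, StatefulReader listener) {
            this.reader = reader;
            this.listeners.add(listener);
        }

        int run(int items) {
            for (int i = 0; i < items; i++) {
                reader.read();
            }
            return listeners.get(0).onStepFinished();
        }
    }

    /** Resolves both injection points from one bean definition, as a container would. */
    static int runWith(Supplier<StatefulReader> beanDefinition, int items) {
        Step step = new Step(beanDefinition.get(), beanDefinition.get());
        return step.run(items);
    }

    public static void main(String[] args) {
        // Prototype-like: every lookup creates a new object, so the listener never sees the reads.
        System.out.println("prototype listener saw: " + runWith(StatefulReader::new, 5));

        // One instance per job run: both injection points share the same state.
        StatefulReader shared = new StatefulReader();
        System.out.println("shared listener saw: " + runWith(() -> shared, 5));
    }
}
```

What the job needs is one instance per run that is shared by both injection points, which neither plain singleton scope (one instance forever) nor prototype scope (one instance per injection) provides.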

In the documentation and on the forums, the Spring Batch team has suggested that you just use a new ApplicationContext for each run of your batch job. All of the examples that I’ve found construct the ENTIRE application context for each run of the batch job(s). That won’t work in my case because my application context is used by the entire web application. I can’t refresh it (and the hundreds of beans it contains) just to breathe new life into a handful of stateful Spring Batch beans.

Enter the ClassPathXmlApplicationContextJobFactory. This class allows you to construct your batch job beans from an existing parent ApplicationContext and a subcontext bean definition file. The key is that every time you request a job from the factory, it constructs a brand new subcontext and wires your batch job beans to their singleton collaborators in the parent application context. Yippee!
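The idea of a fresh subcontext per request can be pictured with a toy registry in plain Java. This is only a sketch of the lookup behaviour, not how Spring implements it: `ToyContext`, `ToyJobFactory`, `ToyJob` and `SharedTokenizer` are hypothetical stand-ins made up for this example.

```java
import java.util.HashMap;
import java.util.Map;

public class ChildContextSketch {

    /** Toy bean registry: looks locally first, then falls back to its parent. */
    static class ToyContext {
        private final ToyContext parent;
        private final Map<String, Object> beans = new HashMap<>();

        ToyContext(ToyContext parent) {
            this.parent = parent;
        }

        void register(String name, Object bean) {
            beans.put(name, bean);
        }

        Object getBean(String name) {
            Object bean = beans.get(name);
            if (bean != null) {
                return bean;
            }
            if (parent != null) {
                return parent.getBean(name);
            }
            throw new IllegalArgumentException("No bean named " + name);
        }
    }

    /** Stateless collaborator, safe to keep as a singleton in the parent. */
    static class SharedTokenizer { }

    /** Stateful job: must not be reused across runs. */
    static class ToyJob {
        final SharedTokenizer tokenizer;
        int recordsProcessed = 0;

        ToyJob(SharedTokenizer tokenizer) {
            this.tokenizer = tokenizer;
        }
    }

    /** Builds a fresh child context, and therefore a fresh job, on every call. */
    static class ToyJobFactory {
        private final ToyContext parent;

        ToyJobFactory(ToyContext parent) {
            this.parent = parent;
        }

        ToyJob createJob() {
            ToyContext child = new ToyContext(parent);
            // The tokenizer is not defined in the child, so the lookup falls through to the parent.
            SharedTokenizer tokenizer = (SharedTokenizer) child.getBean("lineTokenizer");
            child.register("job", new ToyJob(tokenizer));
            return (ToyJob) child.getBean("job");
        }
    }

    public static void main(String[] args) {
        ToyContext parent = new ToyContext(null);
        parent.register("lineTokenizer", new SharedTokenizer());

        ToyJobFactory factory = new ToyJobFactory(parent);
        ToyJob first = factory.createJob();
        ToyJob second = factory.createJob();

        System.out.println("distinct jobs: " + (first != second));
        System.out.println("shared tokenizer: " + (first.tokenizer == second.tokenizer));
    }
}
```

Each call yields a new stateful job while the stateless collaborator is created once and reused, which is the split between parent and subcontext that the factory relies on.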

The only gripe I had about this JobFactory implementation was that it expected the properties to be injected as constructor arguments (including the parent application context) and I wasn’t aware of a way to pass the application context into a bean unless the bean implemented ApplicationContextAware. Hence, I created the following wrapper class that supports property injection making configuration a snap:

// Package names as of the Spring Batch 1.x line; adjust them for your version.
import org.springframework.batch.core.Job;
import org.springframework.batch.core.configuration.JobFactory;
import org.springframework.batch.core.configuration.support.ClassPathXmlApplicationContextJobFactory;
import org.springframework.beans.BeansException;
import org.springframework.beans.factory.InitializingBean;
import org.springframework.context.ApplicationContext;
import org.springframework.context.ApplicationContextAware;

public class ContextAwareJobFactory implements JobFactory, ApplicationContextAware, InitializingBean {
    private ClassPathXmlApplicationContextJobFactory delegate;
    /* The parent application context */
    private ApplicationContext applicationContext;
    /* The job bean name */
    private String beanName;
    /* resource path to subcontext spring config */
    private String subcontextPath;

    public void setBeanName(String beanName) {
        this.beanName = beanName;
    }

    public void setSubcontextPath(String subcontextPath) {
        this.subcontextPath = subcontextPath;
    }

    /* (non-Javadoc)
     * @see org.springframework.batch.core.configuration.JobFactory#createJob()
     */
    public Job createJob() {
        return delegate.createJob();
    }

    /* (non-Javadoc)
     * @see org.springframework.batch.core.configuration.JobFactory#getJobName()
     */
    public String getJobName() {
        return delegate.getJobName();
    }

    /* (non-Javadoc)
     * @see org.springframework.context.ApplicationContextAware#setApplicationContext(org.springframework.context.ApplicationContext)
     */
    public void setApplicationContext(ApplicationContext context) throws BeansException {
        this.applicationContext = context;
    }

    /* (non-Javadoc)
     * @see org.springframework.beans.factory.InitializingBean#afterPropertiesSet()
     */
    public void afterPropertiesSet() throws Exception {
        // Build the delegate only after Spring has injected all three values.
        delegate = new ClassPathXmlApplicationContextJobFactory(this.beanName, this.subcontextPath, this.applicationContext);
    }
}

Configuration is then very simple:

    <bean id="jobFactory" class="com.mybatch.ContextAwareJobFactory">
      <property name="beanName" value="myJob"/>
      <property name="subcontextPath" value="classpath:spring/mybatch-processing-prototype-beans.xml"/>
    </bean>

Note that all of the stateless beans that my batch process uses (e.g. FieldSetMappers, LineTokenizers, etc.) are stored in the parent context. Only beans that are stateful, or that are injected with stateful beans, are defined in the subcontext “prototype” beans file.

And that’s it! Presto, I’m able to process input files with thousands or tens of thousands of records with no problem. Spring Batch also supports chunking and restart, so if your batch job gets interrupted, it can be restarted and pick up where it left off.
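For readers new to the terms, chunking and restart can be sketched in plain Java as follows. This is a conceptual illustration only and does not use Spring Batch’s API: the `Checkpoint` class, the `run` method and the simulated failure are all assumptions made for the example, and a real framework would persist the checkpoint rather than hold it in memory.

```java
import java.util.ArrayList;
import java.util.List;

public class ChunkedRestartSketch {

    /** Stands in for persisted job state (hypothetical; kept in memory here). */
    static class Checkpoint {
        int nextIndex = 0;
    }

    /**
     * Processes records in chunks of chunkSize, starting at the checkpoint.
     * Each chunk is "committed" as a unit by appending it to the committed list
     * and advancing the checkpoint. If failAtIndex is non-negative, a failure is
     * simulated when that record is reached and the in-flight chunk is discarded.
     */
    static void run(List<String> records, int chunkSize, Checkpoint checkpoint,
                    List<String> committed, int failAtIndex) {
        while (checkpoint.nextIndex < records.size()) {
            int start = checkpoint.nextIndex;
            int end = Math.min(start + chunkSize, records.size());
            List<String> chunk = new ArrayList<>();
            for (int i = start; i < end; i++) {
                if (i == failAtIndex) {
                    throw new IllegalStateException("Simulated failure at record " + i);
                }
                chunk.add(records.get(i).toUpperCase());
            }
            committed.addAll(chunk);     // commit the whole chunk...
            checkpoint.nextIndex = end;  // ...and record progress along with it
        }
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 10; i++) {
            records.add("record-" + i);
        }

        Checkpoint checkpoint = new Checkpoint();
        List<String> committed = new ArrayList<>();
        try {
            run(records, 4, checkpoint, committed, 6); // fails inside the second chunk
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage() + "; committed so far: " + committed.size());
        }

        run(records, 4, checkpoint, committed, -1);    // restart resumes at record 4
        System.out.println("committed after restart: " + committed.size());
    }
}
```

Because work is committed one chunk at a time, no single transaction has to span the whole file, which is what removes the timeout we hit with the original approach.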

If you are Spring addicted and find yourself in need of a batch processing solution, I’d suggest that you give Spring Batch a long look.

7 Responses to “Batch Processing in Web Applications”

  1. raveman Says:

    Will you pay for Spring support now that it has a new licence?

  2. heuristicexception Says:

    It’s not really a matter of whether *I* will pay for support, the question is will my *clients* pay for support.

    I typically eschew frameworks, drivers, application servers, etc. that have licensing costs. I can’t see that changing for Spring, especially considering how HIGH their licensing costs are. Frankly, when we were paying this much for BEA WebLogic in the past, we were anxious to look for another alternative (e.g. JBoss). I’m not sure why SpringSource thinks we’ll go back.

    I suppose if we encountered a serious bug past the maintenance release window, we’d be forced to patch it internally until a new version was released. It wouldn’t be the first time I’ve had to do this with open source software.

  3. Ben Says:

    When I faced the same task we were replacing Perl scripts with a web application that our customers could use. So we built a workflow engine, a scheduling service, and a distributed processing framework. This was years before Spring Batch or Hadoop, but it is quite similar in design yet provides more advanced features. I’m always amazed at how simple concepts are rediscovered with great fanfare and how ugly the alternatives put in production are.

  4. heuristicexception Says:

    I try not to fall into the trap of the NIH anti-pattern. I’ll generally search quite a bit to find even a mediocre implementation that I can hack and reuse a fair portion of before I’ll start writing something like you describe from scratch.

    I don’t consider it as much “fanfare” as “psst, in case you were trying to build this yourself…”. I had similar experience with Apache MINA as I did with Spring Batch (needed to implement a couple of custom protocols and didn’t want to have to implement all the glue code myself if I didn’t have to). But that’s another BLOG entry….

  5. Ben Says:

    Sorry, yes, you weren’t as I described. I’m a bit.. annoyed as in the last few weeks Spring Batch and Hadoop are now “cool” for the new architects. As we describe the application, we got:
    – Q: Why aren’t we using XYZ instead?
    A: It didn’t exist then and we had a problem to solve.
    – Q: Why not port everything over and rewrite.
    A: XYZ lacks critical features, needs a lot of verification in production, and offers little extra benefit. We’d love to do it incrementally, though, and remove our own code. We aren’t selling XYZ, so we don’t want to maintain it either.
    – Q: Let’s just rewrite everything. You were all dumb for not choosing XYZ five years before it existed.
    A: Sure, but can we do it incrementally?
    – Q: No. I want it done in one release. Kill the old code.

    So moral is, I shouldn’t respond to blog postings on a bad week. 🙂

  6. Noel Says:

    Take a look at the smooks project (

  7. heuristicexception Says:

    We’ve used smooks for EDI to XML processing but that’s about the extent of it.

    I’ll read up on the large message processing capabilities (split, route) that it offers when I have some time. It looks interesting.
