Death of batch – long live real-time

Remember batch processing? All your vital business reports and reconciliations ran overnight when everyone had gone home; and finished, with a bit of luck, just before everyone arrived back for work in the morning. Well, it's been clear for 20 years that the day of batch processing is over: People work flexibly, 24x7, …


This topic is closed for new posts.
  1. Joe Cincotta

    BOD: Batch On Demand

    This whole event-driven thing is pretty much on the mark, methinks. But the critical issue with real time is that it often imposes huge overhead on the system when it's not entirely necessary. It's like the reverse of true batch - that is, it can actually reduce the overall efficiency of the system, slowing down legitimate users (like customers) trying to do legitimate things (like buying stuff!) while the system fritters away cycles generating a real-time report.

    One of the great things about batch on demand for BI on massive datasets is that it can give you snapshots of very detailed and complex information at regular intervals (e.g. daily). Then, if there is an emergency or some other intense need for quasi-real-time data, reports can be invoked on demand. Whilst the data is not truly real time, it does give the business a level of flexibility whilst attempting to provide the best allocation of processing and infrastructure resources.
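    A minimal Java sketch of the batch-on-demand idea described above: one report job, run on a fixed schedule to produce snapshots, with the same job callable on demand when fresher data is needed. The names `BatchOnDemand`, `Snapshot`, `runNow` and `schedule` are hypothetical, chosen for illustration only, and the report job is left as a plain `Supplier`.

```java
import java.time.Instant;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical sketch: a report job that runs on a schedule and on demand.
class BatchOnDemand<R> {

    /** One completed report run and the time it finished. */
    static final class Snapshot<R> {
        final R report;
        final Instant takenAt;

        Snapshot(R report, Instant takenAt) {
            this.report = report;
            this.takenAt = takenAt;
        }
    }

    private final Supplier<R> reportJob; // the expensive batch report
    private final AtomicReference<Snapshot<R>> latest = new AtomicReference<>();

    BatchOnDemand(Supplier<R> reportJob) {
        this.reportJob = reportJob;
    }

    /** Runs the report immediately; synchronized so runs never overlap. */
    synchronized Snapshot<R> runNow() {
        Snapshot<R> snapshot = new Snapshot<>(reportJob.get(), Instant.now());
        latest.set(snapshot);
        return snapshot;
    }

    /** Returns the most recent snapshot, or null if no run has finished yet. */
    Snapshot<R> latest() {
        return latest.get();
    }

    /** Schedules the regular (for example daily) snapshot run. */
    ScheduledFuture<?> schedule(ScheduledExecutorService scheduler, long period, TimeUnit unit) {
        return scheduler.scheduleAtFixedRate(this::runNow, 0, period, unit);
    }
}
```

    Readers of the report call `latest()` and accept data as old as the last run; only the emergency case pays for `runNow()`.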

    The real issue is how you architect either your real-time or batch process. Generally, complex reports across massive datasets either take 1. arseloads of time to run or 2. arseloads of processing power to run quickly. I have seen lots of strategies like partial caching (blended data warehousing mixed with real-time data sources) and loads of other approaches - but what's the easiest way to manage this resource consumption issue whilst keeping the business logic in your reports simple to understand? (Remember, report code is generally painful to read at the best of times.)

    At the end of the day, the hardware manufacturers have actually created something of a renaissance in batch processing due to their focus on multiple cores. Why? What's the easiest way to think about threading? Um, batch!

    Whilst there has always been an option to use multithreading in enterprise applications, both languages and hardware have made the use of multiple threads very attractive and easy to do. The beauty and power of multithreading with batch is that it separates the processing strategy from the code.

    We can have a thread priority set high enough to force a single processor affinity and redline it - or lower its priority so much that it will take the slops if the rest of the system is busy. No matter what we choose to do, the strategy gives two clear benefits: 1. It separates efficiency issues from code. 2. It allows real-time interaction with the processing pipeline through a management interface (e.g. JMX or MOM) to adjust how much processing power the batch process is taking.
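    A minimal Java sketch of the second benefit, assuming JMX as the management interface: the batch loop runs on a low-priority thread and reads a pause setting that an operator can change at runtime (for instance from JConsole). The names `BatchThrottleSketch`, `ThrottledBatch` and `PauseMillis` are hypothetical. Note that Java thread priority is only a hint to the scheduler, and standard Java has no processor-affinity API, so this sketch throttles by pausing between items rather than by pinning a core.

```java
import java.lang.management.ManagementFactory;
import java.util.concurrent.atomic.AtomicInteger;
import javax.management.MBeanServer;
import javax.management.ObjectName;

// Hypothetical sketch: a batch loop whose pace is adjustable over JMX.
class BatchThrottleSketch {

    /** Management view. Standard MBeans need a public interface named <Class>MBean. */
    public interface ThrottledBatchMBean {
        long getPauseMillis();
        void setPauseMillis(long pauseMillis);
        int getProcessed();
    }

    public static class ThrottledBatch implements ThrottledBatchMBean, Runnable {
        private final int totalItems;
        private volatile long pauseMillis; // written by JMX, read by the worker
        private final AtomicInteger processed = new AtomicInteger();

        public ThrottledBatch(int totalItems, long pauseMillis) {
            this.totalItems = totalItems;
            this.pauseMillis = pauseMillis;
        }

        @Override public long getPauseMillis() { return pauseMillis; }

        @Override public void setPauseMillis(long pauseMillis) {
            if (pauseMillis < 0) {
                throw new IllegalArgumentException("pauseMillis must be >= 0");
            }
            this.pauseMillis = pauseMillis;
        }

        @Override public int getProcessed() { return processed.get(); }

        @Override public void run() {
            try {
                while (processed.get() < totalItems) {
                    // Placeholder for the real report or reconciliation step.
                    processed.incrementAndGet();
                    long pause = pauseMillis; // re-read each pass to pick up changes
                    if (pause > 0) {
                        Thread.sleep(pause);
                    }
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // stop quietly when interrupted
            }
        }
    }

    /** Registers the batch with the platform MBean server and starts it at minimum priority. */
    static Thread start(ThrottledBatch batch, ObjectName name) throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        server.registerMBean(batch, name);
        Thread worker = new Thread(batch, "batch-worker");
        worker.setPriority(Thread.MIN_PRIORITY); // take the slops when the system is busy
        worker.setDaemon(true);
        worker.start();
        return worker;
    }
}
```

    The report logic inside `run()` never mentions priorities or pacing; those live in `start()` and in the `PauseMillis` attribute, which is the separation of efficiency concerns from code that the post argues for.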

    Even though I have concentrated on batch for reporting, there can be similar benefits for other data interchange requirements. Of course there are also many more complexities too - this is where a real time SOA shines!

  2. Anonymous Coward

    Real time vs Batch

    The main point to batch processing is to perform tasks that are not suitable for being performed on an individual basis.

    Yes, we can improve batch by designing it to co-exist with online/real time processes - hot backups rather than cold, processing that doesn't require a cessation of online access - but there will always be some tasks that are more suited to being performed as one sweep of the database.

    Such tasks include, say, billing runs and mass updates. The exact nature of the tasks will depend on the nature of the business, but there will always be tasks that make more sense to do in one big hit - as a batch.

    To expect "batch" to eventually die off is to fail to understand why it's done that way in the first place. That's not to say we shouldn't try to improve the process wherever possible - and this is often where batch processes are very slow to be improved.

  3. David Norfolk

    Good points

    Yes, there is a place for batch processing and, certainly, making everything "realtime" for the sake of it would be awfully expensive.

    But I didn't actually recommend this. I said "Your business intelligence (BI) must be based on appropriate latencies" - there's a continuum from real time to periodic batch and you pick the point which makes business and economic sense.

    It's all down to architecture. If you design for potential realtime response but don't implement it until the business can demonstrate a need, it shouldn't cost (much) more for now and certainly won't stop you running batch jobs designed to co-exist with online work.

    However, if you design for "real" batch - everything stops while you run ETL or whatever - you'll probably have built a cross for your own back.

    And, mind you, if you do the analysis and it tells you that traditional batch is the way to go, and that this decision will hold for all foreseeable technical advances, then who am I to argue? But I do think that remembering that "we can improve batch by designing it to co-exist with online/real time processes" is pretty important.

Biting the hand that feeds IT © 1998–2022