Batch processing has traditionally been used in many organizations to perform non-real-time, data-intensive tasks that run at specific frequencies and within predefined time windows, for example loading bulk customer data into a customer database or creating summary records for a data warehouse. The processing and performance requirements of these batch processes tend to differ from those of real-time, interactive processes. To address these requirements, organizations have invested in custom, homegrown batch processing solutions, which has resulted in a heterogeneous mix of technologies with no standardization. Applying the concepts of Service-Oriented Architecture (SOA) to batch processing enables organizations not only to standardize on implementation technologies and approaches but also to leverage the many benefits offered by an enterprise-wide SOA.
As Ronald Schmelzer of ZapThink aptly puts it in his note:
Service-Oriented Architecture (SOA) presents enterprises with the opportunity to expose information and processes as self-contained Services that can communicate and interoperate with each other in a standard, loosely coupled fashion. Although the common impression is that Services expose business processes, data processes, or application functionality, they are also well-suited for exposing the very processes that drive batch-oriented workloads. SOA enables the business to build flexible compositions of Services that implement either business or IT processes in a loosely coupled manner, which has important ramifications for IT service delivery, and the batch processes that are part of it.
However, there seems to be a common misconception in many enterprises about the applicability of service orientation to batch processes. At least twice in the past week I have heard that batch processes do not lend themselves well to SOA and that SOA is best suited for building dynamic, interactive, transactional applications. Batch processing remains a critical component of many data-intensive environments in today’s real-time enterprises, but I think the role and potential applications of batch processes have changed with the advent of SOA. Batch processes are evolving from static, stand-alone processes invoked at specific frequencies and time windows into essential, dynamic components of business-driven process flows that can be invoked on demand.
The following are some of the ways batch processes can be first-class citizens in an enterprise-wide service-oriented architecture:
- Batch Processes as Service Providers
  - Batch processes can be built and exposed as reusable services to be consumed on demand by other batch and/or real-time business processes and services.
- Batch Processes as Service Consumers
  - Batch processes can consume other enterprise services that provide business and/or utility functions as part of the core processing logic. For example, an ETL process that is bulk loading new customer data into an operational database can use a service that returns a unique customer ID for each customer from an authoritative data source.
- Batch Processes as Policy-Driven, Configurable Workflows
  - Modern batch processes can be designed, managed and executed much like their real-time counterparts, using configurable process flows and metadata to control the behavior, frequency, time window and sequence of execution.
- Event-Driven Invocation of Batch Processes
  - Batch processes no longer need to be invoked only on a fixed schedule by job schedulers and cron jobs. They can be part of a dynamic, event-driven SOA by responding to business events in addition to scheduled timer events. For example, the same batch process that creates summary records in a data warehouse can be invoked in response to a real-time change made by business users or in response to the arrival of a new batch feed.
- Separation of Business Logic and Rules as Reusable Components
  - Traditionally, the processing logic and rules are embedded in batch processes with very little reuse across processes, let alone across the enterprise. There is tremendous value in extracting the business rules and reusable business logic from the batch processes and exposing them as reusable components or services at the enterprise level, so that real-time and batch processes alike can use the same components or services irrespective of the mode of invocation.
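Two of the patterns above, a batch job consuming an enterprise service and the same job being invoked by either a timer event or a business event, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: `CustomerIdService`, `load_customers`, `EventBus` and the event names are invented for illustration, and the in-memory dispatcher merely stands in for whatever messaging or scheduling infrastructure an enterprise actually uses.

```python
from typing import Callable, Dict, List


class CustomerIdService:
    """Hypothetical stand-in for a service backed by an authoritative data source."""

    def __init__(self) -> None:
        self._next_id = 1000
        self._ids: Dict[str, int] = {}

    def get_customer_id(self, email: str) -> int:
        # Return the same ID for the same customer on every call.
        if email not in self._ids:
            self._ids[email] = self._next_id
            self._next_id += 1
        return self._ids[email]


def load_customers(records: List[dict], id_service: CustomerIdService) -> List[dict]:
    """Batch job as a service consumer: enrich each record with a customer ID."""
    return [{**r, "customer_id": id_service.get_customer_id(r["email"])} for r in records]


class EventBus:
    """Hypothetical in-memory dispatcher (assumption: real systems use middleware)."""

    def __init__(self) -> None:
        self._handlers: Dict[str, List[Callable[[dict], None]]] = {}

    def subscribe(self, event_type: str, handler: Callable[[dict], None]) -> None:
        self._handlers.setdefault(event_type, []).append(handler)

    def publish(self, event_type: str, payload: dict) -> int:
        # Invoke every handler registered for this event; return how many ran.
        handlers = self._handlers.get(event_type, [])
        for handler in handlers:
            handler(payload)
        return len(handlers)


id_service = CustomerIdService()
loaded: List[dict] = []


def run_load(payload: dict) -> None:
    loaded.extend(load_customers(payload["records"], id_service))


bus = EventBus()
bus.subscribe("timer.nightly", run_load)   # scheduled invocation
bus.subscribe("feed.arrived", run_load)    # business-event invocation

bus.publish("feed.arrived", {"records": [{"email": "a@example.com"}]})
bus.publish("timer.nightly", {"records": [{"email": "a@example.com"},
                                          {"email": "b@example.com"}]})
```

The point of the sketch is that `run_load` does not know or care what triggered it, and the ID lookup lives behind a service interface that a real-time process could call just as easily.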
Introducing batch processes into SOA does have its limitations and challenges, however. Not all batch processes are good candidates to become service providers or to consume services as part of their processing logic. The biggest potential downside of applying service orientation to batch processes is a decrease in performance: batch processes have historically processed large volumes of data in predefined overnight time windows, and introducing service calls into these processes could slow them down. Hence, the benefits of applying SOA to batch processes, such as increased reuse and agility, must be carefully weighed against the potential implications for performance. These concerns can be mitigated by laying a solid technical foundation that meets the rigorous performance requirements of batch processes and by using workload automation frameworks that enable an organization to adapt to changing processing needs without affecting the core functionality of batch processes. For example, batch frameworks such as Spring Batch, written in Java, provide the abstractions needed to introduce fault tolerance and scalability into batch processes for high-volume processing. In addition, depending on an organization's performance requirements, more advanced options can be considered, such as integrating with grid computing solutions to partition a batch job across a large number of processors.
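To make the ideas of partitioning and fault tolerance concrete, here is a rough Python sketch that splits a job into chunks, processes the chunks on a pool of workers, and retries a failed chunk a bounded number of times. It is not Spring Batch's API and does not model a grid; the function names (`partition`, `with_retry`, `run_partitioned`) and the default chunk size, worker count and retry limit are assumptions chosen for illustration.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence


def partition(items: Sequence[int], size: int) -> List[Sequence[int]]:
    """Split the input into fixed-size chunks that can be processed independently."""
    return [items[i:i + size] for i in range(0, len(items), size)]


def with_retry(task: Callable, chunk: Sequence[int], max_attempts: int = 3):
    """Re-run a failing chunk up to max_attempts times, then re-raise the error."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task(chunk)
        except Exception:
            if attempt == max_attempts:
                raise


def run_partitioned(items: Sequence[int], task: Callable,
                    chunk_size: int = 100, workers: int = 4) -> list:
    """Process all chunks on a worker pool and return per-chunk results in order."""
    chunks = partition(items, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # map() preserves chunk order, so results line up with the input.
        return list(pool.map(lambda c: with_retry(task, c), chunks))


# Example: sum the numbers 1..1000 in four chunks of 250.
totals = run_partitioned(list(range(1, 1001)), sum, chunk_size=250)
```

Threads are a reasonable fit when each chunk mostly waits on service calls or I/O; for CPU-bound work, a process pool or a distributed grid of the kind mentioned above would likely be the better choice.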
As I mentioned earlier, batch processing is a critical part of standard business operations for many organizations and that will not change any time soon. As organizations move toward modern architectures from legacy systems and processes, batch processing must be looked at from a new perspective and must be treated as an integral part of SOA and Enterprise Architecture in order to realize the true potential of an integrated, agile enterprise.