One of the differences between on-premise and cloud implementations of SAP Business Process Automation (BPA) is the way that High Availability (HA) is architected.
The use of database replication solves the database aspect, but database replication is almost a given nowadays.
It’s very rare that an application level HA solution needs to be designed, mainly because most applications come with an inbuilt HA capability.
Here we will discuss the HA solution that Aliter Consulting architected for Capita’s SAP BPA implementation, and how this was accomplished in the Microsoft Azure public cloud using a cloud-native solution common to many public cloud offerings.


About SAP Business Process Automation:

SAP Business Process Automation (BPA) is a partnership offering from SAP and Redwood, which can be deployed on-premise or in an IaaS scenario in the cloud.
It is also sold outside of SAP, but under the SAP software umbrella it is known as SAP BPA and comes pre-bundled with many SAP-specific connectivity options and capabilities.
In regular use, it functions as an enterprise scheduler capable of advanced batch job scheduling, but it also includes some very clever engineering for providing APIs and exposing functionality that allow it to be integrated with almost any other enterprise application.


BPA in Netweaver:

The SAP BPA software naturally sits on top of an SAP application stack.
In this case it is SAP Netweaver AS Java 7.50 (for how long, we don’t really know) that provides the application base, into which the SAP BPA Java components are deployed in the usual SAP way through the Software Provisioning Manager (SWPM).
The Netweaver layer provides the User Management Engine (UME) and also the connectivity capabilities for talking to other SAP systems through HTTP and RFC.
All the security and manageability of SAP Netweaver that we have come to expect is at play here. Plus, naturally, Netweaver is already certified for the cloud.
The interplay between BPA and Netweaver is so well done that an outsider would imagine SAP BPA was part and parcel of the SAP suite of products.


SAP Netweaver High Availability:

The Netweaver AS Java has an inbuilt clustering capability, which lends itself to easy HA deployments, including in the cloud.
Put simply, this means two installations of the application layer within the same SAP system. Each installation of the application layer within a system is called an instance.
Therefore, 2 instances of 1 system provide a HA setup.
That covers the Netweaver layer.
Both instances act independently, and both have equal, shared access to the underlying database, which is used as the software repository source for the Java component libraries and sources.
We can see that HA in a Netweaver setup really depends on the application layer.


BPA & Jobs:

With SAP BPA, the HA architecture needs to consider the following points:

  1. Database.
    This is the persistence layer for the executing jobs, which allows their state to survive application layer restarts.
  2. File system.
    The job logs are actually persisted on the file system. This is very clever and harks back to the trouble of storing Binary Large Objects (BLOBs) in database tables and the issues of managing table fragmentation. Databases (in the normal relational guise) are not ideal for unstructured BLOB storage.
  3. Jobs.
    The jobs themselves, within SAP BPA, need to be managed using some form of synchronised control plane.
    We use the word synchronised because we’re talking about Java thread singularity: each job must be controlled by exactly one thread at a time. Writing an application that performs well while managing hundreds of jobs across multiple threads is not trivial. In these sorts of applications, the available locking mechanisms usually rely on shared resources, such as a shared memory region or some sort of lock table, and the complexity only grows when you have to account for multiple application layers in a HA setup.

To provide a resilient, synchronised application, the BPA application removes complexity from the HA design by permitting only a single application instance to act as the master at any one time. Remember, we have a 1-to-1 relationship between BPA and Netweaver instances.
The other BPA instance is known as the slave, and while it is operational it does not execute any jobs.
The Redwood technical term for this master/slave setup is a “cluster”.
From SAP BPA v9, the default clustering mechanism is called “Redwood Messaging” (RWM).

We can say that the master is actually the job scheduler and the slave is purely for standby purposes.
Both underlying SAP Netweaver instances still have access to the shared database, and both are up and running from a pure Netweaver perspective, but only one of the BPA execution instances within the two Netweaver instances is able to process/execute the BPA jobs.
It is assumed (we don’t know for sure) that this BPA master/slave capability is provided at the database layer, which probably hosts a temporary table containing a single record designating the master BPA scheduler host. This lock table is most likely locked on a first-come, first-served basis.
The first SAP BPA scheduler instance to become operational is the first to lock the table and becomes the master (scheduler).
Conversely, when the lock table is destroyed (or the lock on the table is released), the slave is able to become the master.
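The lock-table behaviour described above is only an assumption, but it is easy to picture with a small model. The following Java sketch uses a ConcurrentHashMap with a single key as a stand-in for the assumed one-record lock table: putIfAbsent gives the first-come, first-served claim, and removing the entry plays the part of the released lock. The class names and the BPA_MASTER key are hypothetical, and this is not how Redwood Messaging (RWM) is actually implemented.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/**
 * Hypothetical model of a first-come, first-served master lock.
 * A single-key map stands in for the ASSUMED one-record lock table in the database;
 * this is NOT the actual Redwood Messaging (RWM) implementation.
 */
class MasterLockDemo {

    /** Stand-in for the assumed single-record lock table. */
    static final class MasterLockTable {
        private static final String LOCK_KEY = "BPA_MASTER"; // hypothetical record key
        private final ConcurrentMap<String, String> table = new ConcurrentHashMap<>();

        /** Atomically claims the lock; true if this instance is now (or already was) the master. */
        boolean tryBecomeMaster(String instanceId) {
            String holder = table.putIfAbsent(LOCK_KEY, instanceId);
            return holder == null || holder.equals(instanceId);
        }

        /** Releases the lock, but only if the caller actually holds it. */
        void release(String instanceId) {
            table.remove(LOCK_KEY, instanceId);
        }

        /** The current master's ID, or null if no instance holds the lock. */
        String currentMaster() {
            return table.get(LOCK_KEY);
        }
    }

    public static void main(String[] args) {
        MasterLockTable lock = new MasterLockTable();

        // The first instance to start claims the lock and becomes master.
        System.out.println("nw01 master? " + lock.tryBecomeMaster("nw01")); // true
        // The second instance finds the lock held and remains the slave.
        System.out.println("nw02 master? " + lock.tryBecomeMaster("nw02")); // false

        // Once the master goes away and its lock is released, the slave can take over.
        lock.release("nw01");
        System.out.println("nw02 master? " + lock.tryBecomeMaster("nw02")); // true
        System.out.println("current master: " + lock.currentMaster()); // nw02
    }
}
```

A real database lock would also need some way to clear a lock held by a crashed master (for example a lease or a session-bound lock), which this sketch deliberately leaves out.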

Now that we know how BPA provides for HA, with two application instances in a master/slave relationship, we can plan the HA strategy.


High Availability in the Cloud:

Usually, an on-premise HA solution is provided using operating system (O/S) level clustering of resources.
For SAP BPA, this would involve our two Netweaver application instances being installed locally on two servers.
Across both servers would be an O/S level cluster containing a single IP address resource, which would be the central entry-point to the SAP BPA access URL within the SAP Netweaver layer.
If one of the Netweaver instances fails, the clustered IP address is failed over to the other server in the cluster and SAP BPA becomes accessible (with the assumption that the RWM has already promoted the BPA slave to become master).
The same can be achieved without cluster software, using just DNS and automatically switching the DNS record to point to an IP address fixed to the secondary server.

In the cloud, we are limited to certain O/S cluster solutions. Sometimes this is due to the communication method of the cluster software itself, and sometimes it is due to the locking mechanism the cluster software uses to prevent or detect split-brain.
It is still perfectly possible to deploy an O/S level cluster, but in the case of SAP BPA we don’t actually need one just to manage 1 IP address resource; it would overcomplicate the solution.

Within the cloud, we can make use of the standard load balancer (LB) offerings.
These are paid-for, software-based services for balancing load across two or more virtual machines (VMs).
The load balancers are well integrated into the cloud vendor’s provisioning interface to allow easy administration and configuration.


BPA HA in the Cloud:

Instead of a standard O/S level cluster of 1 IP address resource, we will use a cloud hosted software load balancer (LB).
The LB itself is allocated an IP address, which can be mapped in DNS to a virtual hostname.
This provides our central entry-point hostname to the SAP BPA application.
To permit the LB to route traffic to the application servers of both our Netweaver instances, we simply add the two servers on which the instances are installed as members of the LB (in Azure, the LB’s backend pool).
We then configure the LB to link the entry-point to specific TCP/IP ports on both of the member servers.
For example, we will use port tcp/80 as our entry point port and route it through to tcp/50000 on both application instance servers (we assume Netweaver is listening on tcp/50000).

So that the LB knows when the application is running (when it is alive), we configure a health probe rule.
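The health probe rule is executed at defined intervals and checks that the target application is alive (responds successfully to a number of consecutive probes).
If the application is alive, it receives TCP/IP traffic from the LB, which spreads connections across all healthy members (Azure Load Balancer, for example, uses a hash-based distribution rather than strict round-robin).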
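The probe-and-route behaviour can be sketched as a small Java model. It maps the entry point tcp/80 to tcp/50000 on each member, only changes a member’s health state after a run of consecutive agreeing probe results, and hands out requests round-robin across the healthy members. The hostnames nw01/nw02 and the threshold of 2 are hypothetical, and round-robin is used purely for simplicity; a real cloud LB’s distribution algorithm and threshold semantics may differ.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Optional;

/** Simplified, hypothetical model of a cloud LB rule with a health probe. */
class LoadBalancerModel {

    static final int FRONTEND_PORT = 80;   // entry point on the LB
    static final int BACKEND_PORT = 50000; // assumed Netweaver HTTP port on each member

    private static final class Member {
        boolean healthy = false; // members start unhealthy until the probe proves them alive
        int streak = 0;          // consecutive probe results that disagree with the current state
    }

    private final int threshold; // consecutive results needed to flip a member's state
    private final Map<String, Member> members = new LinkedHashMap<>();
    private int nextIndex = 0;   // round-robin cursor

    LoadBalancerModel(List<String> hosts, int threshold) {
        this.threshold = threshold;
        for (String host : hosts) {
            members.put(host, new Member());
        }
    }

    /** Records one probe result; the state flips only after `threshold` consecutive contrary results. */
    void recordProbe(String host, boolean success) {
        Member m = members.get(host);
        if (m == null) {
            throw new IllegalArgumentException("not an LB member: " + host);
        }
        if (success == m.healthy) {
            m.streak = 0; // result agrees with the current state, so reset the streak
            return;
        }
        if (++m.streak >= threshold) {
            m.healthy = success;
            m.streak = 0;
        }
    }

    /** Picks the next healthy member round-robin and returns its backend target, if any. */
    Optional<String> route() {
        List<String> healthy = new ArrayList<>();
        members.forEach((h, m) -> {
            if (m.healthy) {
                healthy.add(h);
            }
        });
        if (healthy.isEmpty()) {
            return Optional.empty(); // no healthy member: the LB has nowhere to send traffic
        }
        // floorMod keeps the index valid even if the counter overflows.
        String host = healthy.get(Math.floorMod(nextIndex++, healthy.size()));
        return Optional.of(host + ":" + BACKEND_PORT);
    }

    public static void main(String[] args) {
        LoadBalancerModel lb = new LoadBalancerModel(List.of("nw01", "nw02"), 2);
        // Two consecutive successful probes on each member mark both healthy.
        for (int i = 0; i < 2; i++) {
            lb.recordProbe("nw01", true);
            lb.recordProbe("nw02", true);
        }
        for (int i = 0; i < 4; i++) {
            System.out.println("tcp/" + FRONTEND_PORT + " -> " + lb.route().orElse("none"));
        }
    }
}
```

With both members passing the probe, the printed targets alternate between nw01:50000 and nw02:50000.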

Health probes generally come in two forms: pure TCP or HTTP.
The TCP option means that the target server(s) are considered healthy (working & alive) if they respond positively to a TCP request to a specific TCP port on the member server(s). A positive response is actually the successful connection to the port (a full TCP/IP connection is established).
The HTTP option means that the target server(s) are considered healthy if they respond positively to a HTTP(s) GET request on a specific TCP port on the member server(s). A positive response is actually the reception of a HTTP 200 “OK” from the target server(s).
NOTE: HTTPS is available and functions the same as a HTTP health probe, but the SSL certificates are not validated.
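The two probe styles can be sketched in plain Java. tcpProbe treats a completed TCP connection as healthy, and httpProbe treats only an HTTP 200 response to a GET as healthy (redirects are not followed, so a 302 counts as unhealthy). The host, port and path used in main are placeholders. Note that Java’s default HTTPS handling does validate certificates, so this sketch does not reproduce the LB’s non-validating HTTPS probe.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.URI;

/** Minimal sketches of the two health probe styles. */
class HealthProbes {

    /** TCP probe: healthy if a full TCP connection to host:port is established within the timeout. */
    static boolean tcpProbe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            return false;
        }
    }

    /** HTTP probe: healthy only if a GET on the URL returns exactly HTTP 200 "OK". */
    static boolean httpProbe(String url, int timeoutMillis) {
        HttpURLConnection conn = null;
        try {
            conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
            conn.setRequestMethod("GET");
            conn.setConnectTimeout(timeoutMillis);
            conn.setReadTimeout(timeoutMillis);
            conn.setInstanceFollowRedirects(false); // a redirect is not a 200, so it counts as unhealthy
            return conn.getResponseCode() == HttpURLConnection.HTTP_OK;
        } catch (IOException | IllegalArgumentException e) {
            return false; // unreachable, timed out or malformed URL
        } finally {
            if (conn != null) {
                conn.disconnect();
            }
        }
    }

    public static void main(String[] args) {
        // Placeholder member host; tcp/50000 is the Netweaver port assumed in the article.
        String host = args.length > 0 ? args[0] : "localhost";
        System.out.println("TCP  probe: " + tcpProbe(host, 50000, 2000));
        System.out.println("HTTP probe: " + httpProbe("http://" + host + ":50000/", 2000));
    }
}
```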

In our dual Netweaver instance setup, we could configure tcp/50000 as the TCP health probe port (we don’t need the overhead of HTTP). However, since both Netweaver instances will be up and running in the HA scenario, both will respond to the health probes and both will therefore receive our TCP/IP traffic. The result would be inbound traffic (to the LB) bouncing between the two application servers.
In a regular Netweaver HA setup this may be acceptable, but as we know, only one of the BPA instances will be the master. The other BPA instance (the slave) will simply refuse to provide a valid response to requests at the BPA User Interface (UI) level or the scheduler level.
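A minimal model makes the difference between the two probe choices visible. It assumes the state described above: both Netweaver instances are up, nw01 is the BPA master and nw02 the slave (hostnames hypothetical). The TCP probe only sees that Netweaver is listening, while a BPA-aware probe (modelled here as a simple predicate) only passes on the master. Records require Java 16 or later.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

/** Hypothetical comparison of which members stay in LB rotation under each probe type. */
class ProbeChoiceDemo {

    /** Snapshot of one Netweaver instance in the dual-instance setup. */
    record Instance(String host, boolean netweaverUp, boolean bpaMaster) {}

    /** TCP probe on tcp/50000: passes whenever Netweaver is listening, whatever the BPA role. */
    static boolean tcpProbe(Instance i) {
        return i.netweaverUp();
    }

    /** BPA-aware HTTP probe: passes (HTTP 200) only on the BPA master; the slave would answer 503. */
    static boolean bpaProbe(Instance i) {
        return i.netweaverUp() && i.bpaMaster();
    }

    /** Hosts the LB would consider healthy under the given probe. */
    static List<String> healthyUnder(List<Instance> pool, Predicate<Instance> probe) {
        return pool.stream().filter(probe).map(Instance::host).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Instance> pool = List.of(
                new Instance("nw01", true, true),   // BPA master
                new Instance("nw02", true, false)); // BPA slave, Netweaver still up
        System.out.println("TCP probe routes to: " + healthyUnder(pool, ProbeChoiceDemo::tcpProbe));
        System.out.println("BPA probe routes to: " + healthyUnder(pool, ProbeChoiceDemo::bpaProbe));
    }
}
```

With this pool the TCP probe keeps both nw01 and nw02 in rotation, while the BPA-aware probe keeps only nw01, which is the behaviour a BPA-level health probe needs to provide.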

To enable a BPA level health probe, we need to use a BPA application level call, which means ideally a HTTP(s) health probe.
Remember, for a health probe to be considered healthy, we just need to return a HTTP 200 “OK”. Anything else means the health probe is not healthy and the target server will not receive traffic from the LB.
If you look through the SAP BPA administration manual, you will see that one of the licensable features of SAP BPA is the ability to create Extension Points.
An Extension Point is a custom SAP BPA code extension into which you can hook your own Java code. It is lower-level than the regular process definitions, because it doesn’t execute as a job inside the scheduler.
An Extension Point can sit just behind the J2EE request handler filter of the BPA application, which means it has the ability to directly interact with inbound HTTP calls, and if necessary immediately influence the HTTP return code.
Not only this, but since it sits behind the filter, it doesn’t actually require authentication in the usual BPA sense. It just needs a designated BPA username under which to execute its Java code payload, but it will not prompt for authentication.
This makes a BPA Extension Point the perfect solution for responding to a HTTP(s) health probe from our LB.


Mastering the BPA Extension Point:

The BPA Extension Point that is created needs to respond to a basic HTTP GET request.
We create the Extension Point and assign it a specific URL, which is a sub-URL of the standard BPA context URL.
We then insert our Extension Point code, which calls the BPA standard APIs to determine whether the current BPA instance is the master.
If the current BPA instance is the master, we return a HTTP 200 “OK”.
If the current BPA instance is not the master, we return a HTTP 503 “Service Unavailable”.
Using 503 is a good choice, because it is the same HTTP return code that Netweaver itself returns while it is starting up and the BPA application has not yet been started.
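The decision logic of the Extension Point can be sketched outside BPA. The Redwood Extension Point API is not reproduced here; instead, this Java sketch uses the JDK’s built-in com.sun.net.httpserver.HttpServer as a stand-in and represents the BPA “am I the master?” API call as a hypothetical BooleanSupplier. The /healthcheck path and port 8080 are also assumptions. The endpoint answers HTTP 200 on the master and HTTP 503 otherwise, matching the codes described above.

```java
import com.sun.net.httpserver.HttpExchange;
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

/** Stand-in for the health-check Extension Point; NOT the Redwood Extension Point API. */
class MasterHealthEndpoint {

    /** Maps the master check to the probe status: 200 "OK" for the master, 503 otherwise. */
    static int statusFor(BooleanSupplier isMaster) {
        return isMaster.getAsBoolean() ? 200 : 503;
    }

    /** Starts a tiny HTTP server exposing the health check at the given (hypothetical) path. */
    static HttpServer start(int port, String path, BooleanSupplier isMaster) throws IOException {
        HttpServer server = HttpServer.create(new InetSocketAddress(port), 0);
        // The master check is evaluated on every request, so a role change shows up on the next probe.
        server.createContext(path, exchange -> respond(exchange, statusFor(isMaster)));
        server.start();
        return server;
    }

    private static void respond(HttpExchange exchange, int status) throws IOException {
        byte[] body = (status == 200 ? "MASTER" : "NOT MASTER").getBytes(StandardCharsets.UTF_8);
        exchange.sendResponseHeaders(status, body.length);
        try (OutputStream out = exchange.getResponseBody()) {
            out.write(body);
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical stand-in for the BPA API call that reports whether this instance is master.
        AtomicBoolean master = new AtomicBoolean(args.length > 0 && args[0].equals("master"));
        start(8080, "/healthcheck", master::get);
        System.out.println("Serving http://localhost:8080/healthcheck with status " + statusFor(master::get));
        // The server keeps running until the process is stopped.
    }
}
```

Keeping statusFor separate from the HTTP plumbing makes the master-to-status mapping easy to carry over into whatever code hook the real Extension Point provides.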

The Extension Point is then deployed within the BPA application and will be immediately accessible from both Netweaver instances.
Manually calling the Extension Point URL from a web browser will return a HTTP 200 “OK” from the master BPA instance and a HTTP 503 “Service Unavailable” from the slave BPA instance.
Once the health probe rule is properly configured within the cloud LB, the rule will immediately start running and traffic will be routed from the LB to the master BPA instance.

Testing the Failover:

With your setup complete, only the testing remains to be done.

If base state 0 (BS-0) is that no SAP Netweaver instances are running in the BPA SAP system, then starting up only one instance means that instance will become master and be accessible through the LB hostname.
Starting up the second Netweaver instance will not affect the service availability and all traffic will continue to be routed to the first (master) BPA instance.

If base state 1 (BS-1) is that both SAP Netweaver instances are running in the BPA SAP system, then stopping/killing the current master BPA instance means the second instance will become master and be accessible through the LB hostname.
The old master instance will no longer receive traffic from the LB.
Re-starting the old master Netweaver instance will not affect the service availability and all traffic will continue to be routed to the second (new master) BPA instance.

If base state 2 (BS-2) is that both SAP Netweaver instances are running in the BPA SAP system, then stopping/killing both Netweaver application instances (or making the database layer unavailable) will result in no traffic being routed from the LB. SAP BPA will be completely unavailable.
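The three base states can be replayed against a toy model before touching real systems. This Java sketch is a hypothetical simulation, not a test harness for BPA: it tracks which Netweaver instances are running, a stand-in for the assumed master lock, and database availability, and reports which instances a master-aware LB health probe would send traffic to. The hostnames nw01/nw02 are placeholders.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical simulation of the BS-0, BS-1 and BS-2 failover tests. */
class FailoverScenarios {

    /** Toy model of one BPA SAP system: two Netweaver instances plus a shared database. */
    static final class BpaSystem {
        private final Map<String, Boolean> running = new LinkedHashMap<>(); // instance -> Netweaver up?
        private String master = null;      // stand-in for the assumed master lock record
        private boolean databaseUp = true;

        BpaSystem(String... instances) {
            for (String instance : instances) {
                running.put(instance, false);
            }
        }

        void start(String instance) {
            running.put(instance, true);
            claimFreeLock();
        }

        void stop(String instance) {
            running.put(instance, false);
            if (instance.equals(master)) {
                master = null; // the master's lock is released when it stops
            }
            claimFreeLock();
        }

        void stopDatabase() {
            databaseUp = false;
            master = null; // without the database there is no lock table to hold
        }

        /** If the lock is free and the database is up, the first running instance claims it. */
        private void claimFreeLock() {
            if (!databaseUp || master != null) {
                return;
            }
            for (Map.Entry<String, Boolean> e : running.entrySet()) {
                if (e.getValue()) {
                    master = e.getKey();
                    return;
                }
            }
        }

        /** Instances a master-aware health probe would mark healthy (HTTP 200). */
        List<String> lbTargets() {
            List<String> targets = new ArrayList<>();
            for (Map.Entry<String, Boolean> e : running.entrySet()) {
                if (databaseUp && e.getValue() && e.getKey().equals(master)) {
                    targets.add(e.getKey());
                }
            }
            return targets;
        }
    }

    public static void main(String[] args) {
        // BS-0: nothing running; start nw01, then nw02.
        BpaSystem bs0 = new BpaSystem("nw01", "nw02");
        bs0.start("nw01");
        System.out.println("BS-0 after starting nw01: " + bs0.lbTargets());
        bs0.start("nw02");
        System.out.println("BS-0 after starting nw02: " + bs0.lbTargets());

        // BS-1: both running with nw01 as master; kill the master, then restart it.
        BpaSystem bs1 = new BpaSystem("nw01", "nw02");
        bs1.start("nw01");
        bs1.start("nw02");
        bs1.stop("nw01");
        System.out.println("BS-1 after killing nw01: " + bs1.lbTargets());
        bs1.start("nw01");
        System.out.println("BS-1 after restarting nw01: " + bs1.lbTargets());

        // BS-2: both running; stop both instances.
        BpaSystem bs2 = new BpaSystem("nw01", "nw02");
        bs2.start("nw01");
        bs2.start("nw02");
        bs2.stop("nw01");
        bs2.stop("nw02");
        System.out.println("BS-2 after stopping both: " + bs2.lbTargets());
    }
}
```

The printed lines follow the checks above: BS-0 keeps traffic on nw01, BS-1 moves it to nw02 and keeps it there after nw01 returns, and BS-2 leaves the LB with no healthy target.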

See how Aliter Consulting can help you architect your new cloud hosted SAP landscape, to achieve minimal technology spread and spend.


Closing Points to Note:

The SAP BPA application uses an application layer high availability design which can be interfaced with a cloud hosted load balancer to provide application level failover capabilities.
The interface mechanism is HTTP, using the BPA Extension Point feature to respond to the load balancer’s health probes accordingly.


SAP Note 2005087 “SAP BPA V9” v1