oXya’s Good Practices to Deploy and Test HA Clusters

HA cluster

Today’s businesses are highly dependent on the reliability and availability of their mission-critical applications. Any downtime can have a significant impact on performance, High Availability (HA) cluster becomes key, making sure there’s no point of failure.

In this article, we will further explore what HA cluster is and how it improves the SLA.
We will then provide an overview of oXya’s tried-and-tested methodology for deploying and testing.

Improve Your SLA with High Availability

A high availability cluster typically refers to two systems located in the same region, but in different zones, replicating all the data and ensuring operational continuity in a finger snap if one system fails.

As we provide more availability, this architecture makes a significant difference in SLA; it increases from 99.9% to 99.95%. This means going from about 10 to 5 minutes of downtime per week!

Reducing the RTO has always a significant financial impact for the customer due to the redundancy of the cloud resources. oXya architects are helping the customers to balance the pros and cons of each HA cluster scenarios to find the right combination.

To make the most of this solution and for successful integration in the existing infrastructure, the customer must ensure that the solution meets specific requirements, such as load balancing, data scalability and geographical diversity.

An HA cluster should not be confused with a Disaster Recovery (DR) site. With an HA cluster, the failover is automated and performed within the same region. A DR failover is triggered manually and started in a different region.

Mastering HA Cluster Deployment: Our Turnkey Solution

HA cluster deployment is a complex task that not only needs technical expertise, but also requires following many steps based on the provider’s documentation. For example, deploying 2 SAP HANA instances in High Availability cluster on Amazon Web Services (AWS) is different than on Google Cloud (different architecture and configurations) and we need to follow the instruction from the different vendors (SAP, Red Hat or Suse, Google, AWS or Azure).

That’s why oXya has developed a turnkey solution to deploy HA clusters efficiently.

In fact, we have developed an internal tool for all cloud providers, whether it’s Google Cloud, Microsoft Azure, Amazon Web Services or oXya Cloud. The solutions we have created use automation tools, like Terraform and IaC Express, to minimize human error risks.

This proven methodology has been successful with multiple customers: our experienced team has fully mastered this process, as we will show in a moment with QA tests.

In the SAP context, we have several types of HA cluster available with specific reference architecture:

  • SAP HANA instances clusters
  • ASCS/ERS clusters,
  • SCS/ERS clusters (for JAVA stacks),
  • SAP ASE clusters,
  • SAP Webdispatcher clusters,
  • SAP Cloud Connector clusters,

We usually explore the architecture and configuration of these HA clusters during the intial project weeks with the oXya Technical Workshops.

SAP HA cluster example on AWS

SAP HA cluster example on Google Cloud

Quality Assurance Test for Resilient Configuration

Any misconfiguration of an HA cluster can have a negative impact on the application SLA. After the deployment, a quality assurance (QA) test is crucial to ensure the configuration is done properly and there are no potential delivery defects remaining. But above all, it allows us to demonstrate we have respected all the provider’s requirements and prevented problems with the SLA.

At oXya, we assign two different teams to the process: the project team, in charge of the deployment, and the delivery team, who tests the cluster itself and the SAP system on artificial data. At this stage, our delivery team conducts a series of tests, like failure, hard shutdown and graceful shutdown and ensure the cluster is properly failing over on the opposite node.


We could compare these tests to Netflix’s Monkey Testing, where a monkey is put inside a system to make trouble. By simulating the actions of the monkey, we’re able to check how the system reacts to different unexpected scenarios, and ensure the cluster is still up and running every time.

Our proactive testing approach ensures your configuration is resilient and compliant with SAP and the public cloud best practices.

Ensuring Continuous Operation with oXya Managed Services

After the QA tests, we use QA test delivery certificates to transfer the HA cluster from the project team to the steady state team. After deployment, good monitoring is crucial: we need to quickly identify any abnormalities to ensure the cluster is always operational.

With our design scripts on ITSM Cockpit, we ensure proactive monitoring to provide you with the best SLA protection. Moreover, HA clusters also facilitate near-Zero Downtime Maintenance (nZDT).

With oXya, you benefit from a proven methodology and extensive experience with many customers, who opted for oXya Managed Services after a successful collaboration. Make the first step to optimize your resilience: contact our team today here.

Share it now: