How to Leverage Automation to Accelerate and Secure Azure Infrastructure Deployment and Maintenance
Nov 30, 2022
Customer: Global Cosmetic Company
Project: Global S/4 system on Microsoft Azure
Technologies used: Terraform, Ansible/AWX, Azure
Project Goal: Deploy and maintain 200+ virtual machines on Azure for a Global Cosmetic Company
SAP-related Virtual Machines (VMs):
- 50 HANA database servers
- 15 clustered environments
- 50 ABAP servers
- 15 JAVA servers
The philosophy behind using automation is simple: if a task will be repeated multiple times, it makes sense to automate it to save time and reduce human error.
Automating deployments with Infrastructure as Code (IaC) can be tedious and requires a large initial investment of time. This investment is, however, recouped through the following benefits:
- Time saved on future similar tasks
- Homogeneous systems and configurations (no manual mistakes leading to unexpected errors)
- Easier server maintenance through automated tasks
- Better control of Azure resources through code than with a manual deployment
How did oXya use automation for this project?
In this blog, we will discuss two separate cases where automation is heavily used:
- Server Provisioning: How did oXya use automation to deploy and build 200 servers?
- Operating System Patching: Once the servers are built, how do we maintain them? How do we organize the patching in an automated manner for these servers?
Interview Questions (Part 1)
How long did it take to complete this project?
Jean: It was a 2-year project. I have been working on it full-time for 2 years now. However, within these 2 years, new requests and changes from the client have expanded the project scope significantly.
Building a server is relatively fast: about half an hour. What usually takes more time is settling the configuration details, such as determining the server name or sizing the Virtual Machine correctly in terms of CPU, RAM, and disk. These details constitute some of the input needed to leverage automation.
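To illustrate the kind of input mentioned above, here is a minimal Python sketch of a server request that is checked before it is handed to any automation. The field names, the allowed roles, and the alphanumeric naming rule are assumptions made for this example, not the actual inputs used on the project.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical set of server roles, chosen only for illustration.
ALLOWED_ROLES = {"hana", "abap", "java"}


@dataclass(frozen=True)
class ServerRequest:
    """Input describing one virtual machine to build (illustrative fields)."""
    name: str
    role: str
    cpu_cores: int
    ram_gb: int
    data_disk_gb: int


def validate(request: ServerRequest) -> List[str]:
    """Return a list of problems; an empty list means the input is usable."""
    problems = []
    # Assumed naming rule: non-empty and alphanumeric only.
    if not request.name or not request.name.isalnum():
        problems.append("name must be non-empty and alphanumeric")
    if request.role not in ALLOWED_ROLES:
        problems.append(f"unknown role: {request.role}")
    # Sizing values must all be positive numbers.
    for field_name in ("cpu_cores", "ram_gb", "data_disk_gb"):
        if getattr(request, field_name) <= 0:
            problems.append(f"{field_name} must be positive")
    return problems
```

Calling validate(ServerRequest("sapdb01", "hana", 32, 256, 1024)) returns an empty list, while a request with a missing name or a zero CPU count is rejected before any build starts. Catching these details early is where most of the half hour per server can otherwise be lost.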
After 2 years, we have built 200 servers for the client. However, this is just a small piece of the project.
These 200 servers are related to the SAP workload, but not solely. To give you some approximate figures, within these 200 servers, we have 50 HANA database servers, 15 clustered environments (i.e., 30 VMs), 50 SAP ABAP application servers, and 15 JAVA application servers. This is a brief overview of the SAP landscape we have for this customer.
The point I want to make here is that we have at least a dozen servers of the same type.
When you have so many servers of the same type, automation makes it possible to keep their configurations consistent.
What was the main goal of this project? Why did the client come to oXya for this specific project, and why did they want oXya to leverage automation?
Jean: This customer used to have their own regional IT team and regional SAP systems.
The goal for this project was to build one Global SAP landscape based on the new products S/4HANA and BW/4HANA, accessible by all users worldwide.
The main benefits of this project were:
- Simplify and reduce TCO of the application landscapes
- Bring more agility with Azure Cloud solutions
- Reduce the number of partners to maintain the solution
- Leverage new SAP solutions and Innovation
This customer already knew us because we had been working for them on their regional systems for EMEA. The client had high expectations in terms of automation and trusted us to be the right partner to conduct the implementation in an efficient and timely manner.
Can you describe how you used automation to deploy Azure Infrastructure?
Jean: For Azure deployments, the oXya Cloud team uses Terraform. Terraform is an IaC tool used primarily by DevOps teams to automate various infrastructure tasks.
Once the server is built, Ansible/AWX is used to perform post-deployment operations.
Post-op tasks can be as simple as:
- Time zone configuration on the server
- OS user creation for monitoring
- Mounting a disk
Ansible/AWX also allows the automation of more complex operations, such as installing HANA Databases or even an SAP NetWeaver engine.
Can you tell me more about the maintenance process?
Jean: After the creation of these servers, maintenance is the second most important task of this project where we leverage automation. Operating system patching is the most common maintenance we perform for all our customers’ systems. The patching consists of applying the latest updates provided by the operating system vendors. Frequent and strict patching has become a real security focus with the global increase in security threats and cyberattacks: all environments must have the latest security patches applied, and any subsequent zero-day vulnerabilities must be addressed with priority. This must happen every month without fail across all IT to ensure the safety of systems.
Our customer wanted their environments patched monthly, with the sandbox and development environments patched during the first week of the month. After validating that no issues arose from the patching, we perform the same activity on the quality and pre-production systems the following week. Once this iteration has passed, production patching follows the week after.
With this process occurring every month, this is a sizable maintenance task to conduct manually. Additionally, while we might be able to perform the non-production patching during business hours, production patching must always occur during a maintenance window that typically falls in the late evening or early morning hours of the weekend. This is, of course, due to the downtime required, which directly impacts the customer’s business.
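The monthly cadence described above can be sketched in a few lines of Python. This is only an illustration: it assumes that the "first week" starts on the first Monday of the month, and the wave names and function names are made up for the example rather than taken from the actual AWX setup.

```python
from datetime import date, timedelta

# Patching waves and their offset, in weeks, from the first wave.
WAVES = [
    ("sandbox/development", 0),
    ("quality/pre-production", 1),
    ("production", 2),
]


def first_monday(year: int, month: int) -> date:
    """Return the first Monday of the given month (assumed start of week 1)."""
    first = date(year, month, 1)
    # date.weekday() returns 0 for Monday, so this is the gap to the next Monday.
    return first + timedelta(days=(7 - first.weekday()) % 7)


def patch_schedule(year: int, month: int) -> dict:
    """Map each wave to the Monday that opens its patching week."""
    start = first_monday(year, month)
    return {wave: start + timedelta(weeks=offset) for wave, offset in WAVES}
```

For December 2022, patch_schedule(2022, 12) places the three waves in the weeks starting December 5, 12, and 19. The exact day and hour inside each week, such as a weekend window for production, would still be agreed with the customer.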
With automation in the picture, these tasks are automated using Ansible/AWX jobs and workflows. The complexity lies in the fact that SAP is quite a heavy piece of software with a large database, many connected users, and many jobs running simultaneously. With SAP, it is not possible to reboot by simply clicking a button: the software and the entire landscape must be gracefully shut down by following a defined shutdown and restart sequence, with the application first, the database second, and everything else last.
Another layer of complexity applies to systems running in clusters, typically used in production environments to ensure the high availability of SAP and its database. When a cluster node can no longer deliver the service, the other node immediately takes over. The Ansible/AWX workflow therefore disables the cluster at the beginning of the maintenance, followed by a proper shutdown of SAP and its DB before the patching and reboot process. After the reboot, the automation in place brings the system back: the DB and SAP are restarted, and the cluster is re-enabled.
To illustrate how far we can go with automation: our customer has a lot of PI/PO systems. The workflow can also stop the PI/PO channels ahead of time, before shutting down everything else, and then restart the channels at the very end of the process.
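The ordering described above can be summarized as a small Python sketch that builds the list of steps for one system. It is not the actual Ansible/AWX workflow: the step names are invented for the example, and the relative order of stopping the PI/PO channels and disabling the cluster, as well as the restart order after the reboot, are assumptions.

```python
from typing import List


def patch_workflow(clustered: bool, has_pi_po: bool) -> List[str]:
    """Return the ordered maintenance steps for one SAP system."""
    steps = []
    if has_pi_po:
        # Channels are stopped ahead of everything else.
        steps.append("stop PI/PO channels")
    if clustered:
        # Disable the cluster so it does not fail over during maintenance.
        steps.append("disable cluster")
    steps += [
        "stop SAP application",
        "stop database",
        "apply OS patches",
        "reboot",
        "start database",
        "start SAP application",
    ]
    if clustered:
        steps.append("re-enable cluster")
    if has_pi_po:
        # Channels are restarted at the very end of the process.
        steps.append("start PI/PO channels")
    return steps
```

A standalone development system yields only the six core steps, while a clustered production PI/PO system gets the channel and cluster steps wrapped around them. Keeping the sequence in one place is what lets the same workflow run unchanged across many systems of the same type.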
OS patching is a great example of how automation can help us maintain customers' systems more efficiently. As a result, the typical problem of maintenance effort growing exponentially with landscape size and complexity goes away.
The main benefits are always the same:
- Time savings
- Consistency and quality
We will discuss the project challenges and how oXya overcame them in the next blog. If you are interested in oXya’s approach to automation, we would love to hear from you. If you are in the U.S., please reach out to our U.S.-based team here at firstname.lastname@example.org. If you are in the EMEA region, please reach out to our headquarters team at email@example.com.