3 best practices for achieving high availability in cloud computing

Availability is a key part of service level agreements in cloud computing: it describes whether the infrastructure can keep operating even if a component fails. When availability is low, a company cannot access its data or applications and may lose revenue.

Availability addresses points of failure within systems, databases, and applications. High availability, sometimes referred to as HA, better protects businesses against disruptions and supports productivity and reliability.

Follow these three best practices for achieving high availability in cloud computing.

1. Determine how much uptime you need

Availability measures the share of time during which a system functions properly. A service level agreement (SLA) between a cloud service provider and a customer outlines the expected cloud availability and the consequences if the provider fails to meet it.

Large providers, such as AWS, Microsoft Azure, and Google Cloud, commonly offer SLAs of 99.9% uptime or higher for their paid services. At 99.9%, the provider promises its customers less than nine hours of downtime in a year (about 8.76 hours). The more nines in the figure, the less downtime the customer can expect to experience in a year.
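The relationship between an uptime percentage and yearly downtime is simple arithmetic. The sketch below is only an illustration: the function name is hypothetical, and it assumes a 365-day year.

```python
HOURS_PER_YEAR = 365 * 24  # 8,760 hours; assumes a non-leap year


def annual_downtime_hours(availability_percent: float) -> float:
    """Return the yearly downtime, in hours, permitted by an uptime percentage."""
    return HOURS_PER_YEAR * (1 - availability_percent / 100)


# Each additional nine cuts the permitted downtime by a factor of ten.
for sla in (99.0, 99.9, 99.99, 99.999):
    hours = annual_downtime_hours(sla)
    print(f"{sla}% uptime allows {hours:.2f} hours of downtime per year")
```

For 99.9%, this gives 8.76 hours, which matches the "less than nine hours" figure above.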

Application complexity can affect availability. For example, simple websites might see 99.9999% uptime (about 31.6 seconds of downtime each year) because there are very few points of failure. On the other hand, a more complex monolithic web application that has more components, such as caching servers or object storage, creates more points of failure and can make high availability difficult. Businesses can employ additional redundancies to ensure availability, but this increases costs.
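A rough way to see why more components mean lower availability: if every component must be up for the application to work, the individual availabilities multiply. Redundancy works in the opposite direction. The sketch below assumes that failures are independent, which is a simplification, and the 99.9% figures and function names are hypothetical.

```python
from math import prod


def serial_availability(components):
    """Availability of a chain in which every component must be up."""
    return prod(components)


def redundant_availability(availability, copies):
    """Availability of several independent copies when one working copy is enough."""
    return 1 - (1 - availability) ** copies


# Hypothetical figures: web tier, caching server, and object storage at 99.9% each.
chain = [0.999, 0.999, 0.999]
print(f"Three components in series: {serial_availability(chain):.4%}")
print(f"One component with two redundant copies: {redundant_availability(0.999, 2):.4%}")
```

Under these assumptions, three 99.9% components in series fall to roughly 99.7%, while duplicating a single 99.9% component raises it to about 99.9999%. That is the trade-off between added redundancy and added cost described above.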

The uptime required for an application largely depends on its importance. For example, users visiting the site of a lawn care e-commerce giant may be more forgiving of downtime than users of an emergency service provider. When negotiating an SLA with a cloud service provider, a company must weigh the consequences of downtime for its users and what it can afford. Not everything needs 99.999999% uptime.

2. Understand the main high availability components

High availability can cost a lot of time and money, but it's essential for mission-critical applications. The key is to apply the right amount of resources to each workload. Many tools can keep workloads accessible during internal or external disruptions, so organizations must match resources and availability requirements to each workload to balance reliability and performance against cost.

There are several components of a public cloud platform that organizations need to understand to weigh the benefits and costs of high availability:

Physical locations. Organizations achieve high availability by finding and eliminating single points of failure and distributing redundant instances across Availability Zones.

Networking. A good network connection is essential when transferring data between the cloud and local storage. Some workloads require dedicated connectivity.

Compute instances. In public clouds, servers take the form of compute instances. A cloud customer can organize these instances into clusters or create backup instances for failover, although the extra instances add cost.

Storage instances. Application data is kept in storage instances. Cloud storage services are typically highly available by design, which can remove the need for customers to set up their own replication. However, beware of storage becoming a single point of failure for applications.

Load balancing. Load balancing is how organizations distribute traffic across multiple compute instances so that no single instance is overloaded. Load balancers are often the first component to detect, report, and respond to an instance failure.

IP failover. When an instance fails, the failed instance’s IP address must be remapped to the alternate instance to redirect traffic.

Monitoring. In terms of SLAs, monitoring can help verify that actual uptime matches the agreed availability. It is also used to reveal availability issues and track usage of cloud resources.
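To illustrate how load balancing and failover fit together, here is a minimal in-memory sketch. It does not use any cloud provider's API; the class, method, and instance names are all hypothetical. The balancer hands out instances in round-robin order and skips any instance that a health check has marked as down.

```python
class LoadBalancer:
    """Toy round-robin balancer that routes only to healthy instances."""

    def __init__(self, instances):
        self.instances = list(instances)
        self.healthy = set(self.instances)
        self._next = 0  # position of the next round-robin candidate

    def mark_down(self, instance):
        """Record a failed health check so that traffic avoids the instance."""
        self.healthy.discard(instance)

    def mark_up(self, instance):
        """Return a recovered instance to the rotation."""
        if instance in self.instances:
            self.healthy.add(instance)

    def route(self):
        """Return the next healthy instance, trying each instance at most once."""
        for _ in range(len(self.instances)):
            candidate = self.instances[self._next % len(self.instances)]
            self._next += 1
            if candidate in self.healthy:
                return candidate
        raise RuntimeError("no healthy instances available")


lb = LoadBalancer(["instance-a", "instance-b", "instance-c"])
lb.mark_down("instance-b")  # simulate a failed health check
print([lb.route() for _ in range(4)])
```

With instance-b marked down, traffic alternates between instance-a and instance-c. A real load balancer would also run the health checks itself and report failures to a monitoring system.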

3. Assess application needs before adding HA

It’s easy to set up services like load balancing and IP addressing schemes in the cloud. But every application is different, and cloud users should assess their needs before applying high availability. Before adding high availability to an application, ask yourself these questions:

Does the workload benefit from high availability? High availability is not always the best solution in terms of cost and complexity. It is possible for an administrator to configure high availability for a workload that does not need it.

Does cloud high availability justify the cost? Consider the expected downtime and how users will react to it. Then determine the maximum allowable downtime and implement the right high availability policies to ensure this requirement is met. Monitoring and logging cloud uptime and downtime is one way to check whether performance is acceptable.
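One way to check logged downtime against a maximum allowable figure is sketched below. The outage records, the 30-day period, and the 99.9% target are hypothetical values chosen for illustration. In practice they would come from monitoring logs and the SLA.

```python
from datetime import datetime, timedelta


def measured_availability(period_start, period_end, outages):
    """Return the fraction of the period during which the service was up."""
    total_seconds = (period_end - period_start).total_seconds()
    down_seconds = sum((end - start).total_seconds() for start, end in outages)
    return 1 - down_seconds / total_seconds


# Hypothetical 30-day period with two logged outages (20 and 30 minutes).
start = datetime(2024, 1, 1)
end = start + timedelta(days=30)
outages = [
    (datetime(2024, 1, 5, 2, 0), datetime(2024, 1, 5, 2, 20)),
    (datetime(2024, 1, 19, 14, 0), datetime(2024, 1, 19, 14, 30)),
]

target = 0.999  # assumed availability target
actual = measured_availability(start, end, outages)
print(f"Measured availability: {actual:.4%}")
print(f"Target met: {actual >= target}")
```

In this example, 50 minutes of downtime in 30 days exceeds the roughly 43 minutes that a 99.9% target allows, so the target is missed.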

Is high availability being applied to the right assets? Determine what the organization’s goals are, such as peak performance and workload availability. Evaluate what is most valuable to the cloud workload and how availability requirements will benefit those goals.

Is high availability more complex than necessary? High availability comes from a wide range of technologies and procedures that can be used alone or combined. Evaluate whether there is a simpler way to get downtime protection that would cost less.

Is cloud high availability working as expected? Evaluate the high availability configuration to ensure that the deployment was successful. Test how it performs during disruptions caused by physical events, such as natural disasters. Audit the infrastructure to ensure that established requirements are met. If instances fail, they must recover within an acceptable time and without data loss, as specified in the SLA.

Sherry J. Basler