How to Manage Disaster Recovery in Cloud Computing

Post-pandemic, remote work has become the new norm and cloud adoption has received a boost. Naturally, organizations want to ensure that their cloud infrastructure continues to function even in the event of a disaster. Disasters can be natural, like floods and earthquakes, or man-made, like power outages or a given cloud provider’s data center being offline due to internal issues.

Accelerating cloud adoption has prompted organizations to evaluate their disaster recovery (DR) policies across AWS, Azure, Google Cloud Platform (GCP), and other public clouds. It is important to note that business continuity and disaster recovery strategies must complement each other for the smooth functioning of organizations.

One of the critical aspects of business continuity for organizations is preparing for technical disasters. Whether it’s hardware or software malfunctions, cyberattacks or natural disasters, it’s important to be vigilant in backing up data.

Data loss can have a significant financial impact on businesses, as well as a negative effect on their reputation due to a loss of customer trust. Therefore, careful preparation and building a roadmap to deal with future disasters is key to protecting a company’s long-term prospects.

A recent research report indicates that 54% of businesses have experienced a disruption lasting a full work day or longer in the past five years due to system failures. The same research points out that extended downtime can cost anywhere from $10,000 per hour for small businesses to over $5 million per hour for enterprises.

This has prompted organizations to adopt comprehensive disaster recovery plans. Awareness of the need for a well-defined disaster recovery plan has also increased in recent years.

What is cloud-based disaster recovery?

Cloud-based disaster recovery is a solution that helps organizations recover critical systems after a disaster. It also allows remote access to servers and systems in a secure virtual environment. Most companies spend 2-4% of their IT resources on disaster recovery planning, with some organizations devoting up to 25% of their IT spend to mitigating infrastructure risk.

Here is a simple cloud disaster recovery plan illustrated in eight steps to help your organization develop an effective disaster recovery strategy:

1. Know your infrastructure and the risks involved

All organizations need to assess their IT infrastructure, consisting of assets, equipment and data. It is also essential to determine where these assets, equipment and data are stored and what they are worth.

After assessing the assets and the risks involved such as natural disasters, data theft, and power outages (among others), organizations can design the disaster recovery plan to minimize the effects of these disasters.
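As a rough illustration of this step, the sketch below records each asset with its location, estimated value and risks, then ranks assets by a crude expected annual loss. The asset names, dollar values and probabilities are hypothetical placeholders, and multiplying value by probability is only one simple way to prioritize; a real assessment would use the organization's own figures and method.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Asset:
    name: str
    location: str              # where the asset is stored or hosted
    value_usd: float           # estimated worth of the asset (assumed figure)
    risks: Dict[str, float]    # risk name -> estimated yearly probability (0..1)

    def expected_annual_loss(self) -> float:
        """Crude exposure estimate: value times probability, summed over risks."""
        return sum(self.value_usd * p for p in self.risks.values())


def rank_by_exposure(assets: List[Asset]) -> List[Asset]:
    """Return assets ordered from highest to lowest expected annual loss."""
    return sorted(assets, key=lambda a: a.expected_annual_loss(), reverse=True)


# Hypothetical inventory used only to show the idea.
inventory = [
    Asset("web-servers", "public cloud region", 80_000,
          {"natural disaster": 0.01}),
    Asset("customer-db", "on-premises data center", 500_000,
          {"power outage": 0.05, "data theft": 0.02}),
]

for asset in rank_by_exposure(inventory):
    print(f"{asset.name} ({asset.location}): "
          f"about ${asset.expected_annual_loss():,.0f} per year at risk")
```

Assets at the top of the ranking are the natural candidates for the strictest recovery targets in the steps that follow.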

2. Conduct a business impact analysis

Business impact analysis helps an organization understand how its business operations would be limited after a disaster. Two parameters play a significant role in this assessment: Recovery Time Objective (RTO) and Recovery Point Objective (RPO).

RTO is the maximum time an application can remain offline before the interruption causes unacceptable harm to business operations. RPO is the maximum amount of data loss, measured in time, that an organization can tolerate from an application after a major disaster.
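To make the two parameters concrete, here is a minimal sketch that checks a backup schedule against an RPO and an estimated recovery time against an RTO. The function names and the sample durations are illustrative assumptions. The RPO check relies on the worst case: a failure just before the next backup loses roughly one full backup interval of data.

```python
from datetime import timedelta


def meets_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Worst case, a failure strikes just before the next backup runs,
    so up to one full backup interval of data is lost."""
    return backup_interval <= rpo


def meets_rto(estimated_recovery: timedelta, rto: timedelta) -> bool:
    """The time needed to bring the application back must fit within the RTO."""
    return estimated_recovery <= rto


# Hypothetical targets for one application.
rpo = timedelta(hours=4)
rto = timedelta(hours=1)

print("Nightly backups meet RPO:", meets_rpo(timedelta(hours=24), rpo))
print("Hourly backups meet RPO:", meets_rpo(timedelta(hours=1), rpo))
print("A 30-minute restore meets RTO:", meets_rto(timedelta(minutes=30), rto))
```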

3. Create a DR plan based on your RPO and RTO

After determining the RPO and RTO, the focus can be on designing a system that meets the organization’s DR goals. To put the DR plan into action, an organization can choose from the following options:

Backup and Restore

Pilot light approach

Hot standby approach

Multi-site approach

Multi-cloud approach

All of the aforementioned DR approaches can be explained by considering AWS’s cloud-based DR system.

  1. Backup and Restore: This is a disaster recovery mechanism that works by periodically taking backups of data. How often backups are taken depends on the RPO. For example, if the data in your database changes frequently, such as power consumption readings during peak hours, it needs a shorter RPO and therefore more frequent backups, while a largely static database can be managed with a longer RPO.
  2. Pilot light approach: This approach is often explained with the analogy of a gas heater. A gas heater keeps a small pilot flame burning that can ignite the whole heater when needed. Similarly, in the pilot light approach the cloud database server is always on and receives incremental backups, like the small flame in the analogy. The application server and caching replica environments are kept in standby mode and correspond to the rest of the heater. In the event of a disaster, the application and caching servers are activated and users are redirected to the cloud recovery environment via Elastic IP addresses.
  3. Hot standby approach: In this approach, whenever an on-premises data center goes down, multiple EC2 instances are used to scale up the application and cache environments to handle the production load. With the help of Amazon Route 53, traffic is redirected with almost no downtime.
  4. Multi-site approach: This technique is considered optimal. When a disaster occurs, all traffic directed to on-premises servers is redirected to the AWS Cloud, where multiple EC2 instances handle full production capacity.
  5. Multi-cloud approach: In this method, there is a primary cloud provider and a backup cloud provider. For example, if AWS is the primary, Azure might be used for DR. This ensures that whenever the primary cloud is down, your systems can still operate in the secondary cloud. Just a few days ago, the AWS cloud was down for a few hours and services from many sites like Netflix were down. Multi-cloud disaster recovery helps organizations withstand this type of outage.
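The standby and multi-cloud approaches above share one core decision: send traffic to the first environment that is still healthy. The sketch below shows only that decision logic, under stated assumptions. The site names and health-check functions are hypothetical stand-ins; in a real deployment the checks would probe actual endpoints, and the switch itself would be carried out by a DNS or routing service rather than by this function.

```python
from typing import Callable, Optional, Sequence, Tuple

# A site is a name plus a health check that returns True when the site is usable.
Site = Tuple[str, Callable[[], bool]]


def pick_active_site(sites: Sequence[Site]) -> Optional[str]:
    """Return the first healthy site in priority order, or None if all are down.

    A health check that raises an exception is treated as a failed check.
    """
    for name, is_healthy in sites:
        try:
            if is_healthy():
                return name
        except Exception:
            continue
    return None


# Hypothetical health checks standing in for real probes.
def primary_cloud_check() -> bool:
    return False  # simulate an outage at the primary provider


def secondary_cloud_check() -> bool:
    return True


sites = [
    ("primary-cloud", primary_cloud_check),
    ("secondary-cloud", secondary_cloud_check),
]

print("Traffic should go to:", pick_active_site(sites))
```

Keeping the sites in an ordered list means the same function covers a two-provider setup or a longer chain of fallbacks.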

4. Trust the right cloud partner

Once an appropriate disaster recovery plan has been considered, it is important to seek out a trusted cloud service provider who will assist you in its execution. Here are the factors to consider when choosing an ideal cloud service provider: reliability, speed of recovery, ease of installation and recovery, scalability and security compliance.
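One simple way to compare candidates on these factors is a weighted score. The sketch below is only an illustration: the provider names, the 1-to-5 ratings and the weights are all hypothetical, and each organization would choose its own.

```python
from typing import Dict

# The five factors named above, as dictionary keys.
FACTORS = (
    "reliability",
    "recovery_speed",
    "ease_of_installation_and_recovery",
    "scalability",
    "security_compliance",
)


def weighted_score(ratings: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted average of the ratings across all factors."""
    total_weight = sum(weights[f] for f in FACTORS)
    return sum(ratings[f] * weights[f] for f in FACTORS) / total_weight


# Hypothetical weights: this organization cares most about recovery speed.
weights = {
    "reliability": 2,
    "recovery_speed": 3,
    "ease_of_installation_and_recovery": 1,
    "scalability": 1,
    "security_compliance": 2,
}

# Hypothetical ratings on a 1 (poor) to 5 (excellent) scale.
candidates = {
    "Provider A": dict(zip(FACTORS, (5, 3, 4, 4, 5))),
    "Provider B": dict(zip(FACTORS, (4, 5, 3, 4, 4))),
}

for name, ratings in candidates.items():
    print(f"{name}: {weighted_score(ratings, weights):.2f}")
```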

5. Make sure the cloud disaster recovery infrastructure is in place

After consulting with a cloud disaster recovery partner, you can work with the vendor to bring the idea to life and create the DR framework. For trouble-free business operations, the DR setup must meet the RTO and RPO requirements.

6. Put your disaster recovery plan on paper

It is essential to establish a quality procedure or process flowchart with explicit instructions for everyone involved in disaster recovery. When disaster strikes, each person must be prepared to take responsibility for their role in the disaster recovery process.

7. Simulate real failures

You want to ensure that the disaster recovery plan can be executed in the event of an actual failure. In many cases, organizations realize that key components are missing only when a disaster actually occurs, and they are then unable to complete the plan. This is why many organizations simulate a failure.

For example, many companies shut down all the servers in a given data center and expect the disaster recovery plan to kick in and deal with the situation. If there are problems, the servers can be restarted, and the simulation will have exposed the shortcomings of the plan.
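A drill like this can be timed so that the result is compared directly with the RTO. The following is a minimal sketch, not a production tool: `run_drill`, `DrillResult` and the `FakeService` stand-in are hypothetical names introduced for illustration, and a real drill would shut down actual servers and poll a real health endpoint.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass
class DrillResult:
    recovered_within_rto: bool
    elapsed_seconds: float


def run_drill(trigger_failure: Callable[[], None],
              service_is_up: Callable[[], bool],
              rto_seconds: float,
              poll_seconds: float = 0.01) -> DrillResult:
    """Cause a failure, then poll until the service is back or the RTO is exceeded."""
    trigger_failure()
    start = time.monotonic()
    while not service_is_up():
        elapsed = time.monotonic() - start
        if elapsed > rto_seconds:
            return DrillResult(False, elapsed)
        time.sleep(poll_seconds)
    elapsed = time.monotonic() - start
    return DrillResult(elapsed <= rto_seconds, elapsed)


class FakeService:
    """Stand-in for a real system: recovers after a fixed number of health polls."""

    def __init__(self, polls_until_recovered: int) -> None:
        self.polls_until_recovered = polls_until_recovered
        self.failed = False
        self.polls = 0

    def fail(self) -> None:
        self.failed = True
        self.polls = 0

    def is_up(self) -> bool:
        if not self.failed:
            return True
        self.polls += 1
        return self.polls > self.polls_until_recovered


service = FakeService(polls_until_recovered=3)
result = run_drill(service.fail, service.is_up, rto_seconds=1.0)
print("Recovered within RTO:", result.recovered_within_rto)
```

Recording the elapsed time from each drill also gives a history that shows whether recovery is getting faster or slower as the environment changes.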

8. Revisit and test your disaster recovery plan often

The next step for the organization will be to test its disaster recovery plan to ensure there are no loopholes. Its reliability can only be analyzed after testing.

Since organizations undergo changes in terms of plans, people and management, it is essential to practice the disaster recovery plan after each change and be prepared for any crisis.

Choosing the right cloud partner is one of the primary responsibilities of every organization that wants to ensure disaster recovery best practices are implemented.

Edited by Affirunisa Kankudti

(Disclaimer: The views and opinions expressed in this article are those of the author and do not necessarily reflect the views of YourStory.)

Sherry J. Basler