What is disaster recovery, really?
The term "disaster recovery" has been bandied around the world of IT for decades. During that time it has meant many different and frequently contradictory things. To some people, disaster recovery means getting data systems up and running after something bad happens. To others, it means finding a way to keep things running until you can get back to your data center after something bad happens. And sometimes disaster recovery means getting an organization ready so that when something bad happens, you’re prepared.
You may notice a common theme in the idea of disaster recovery: something bad. That bad thing can range widely, from a burst pipe in the machine room to a terrorist attack to a natural disaster. You need to prepare for more than just the obvious.
Fortunately, while the preparation process differs depending on the nature of the disaster, most of the recovery is pretty much the same. Some aspects of disaster recovery have also become much easier. For instance, with the advent of the cloud, it’s now possible to recover using data that’s already in cloud storage.
What’s a disaster?
When you plan for disaster recovery, it’s important to decide what constitutes a disaster, versus what’s simply an inconvenience. In general, anything that significantly impairs day-to-day work can be considered a disaster, including a major power outage, a hurricane, a snowstorm taking down the roof of the server room, or terrorism.
“Significantly” depends on your organization and how it conducts business. Not everything bad is necessarily a disaster, and some events that are a disaster to you are not a disaster to others. For example, if your organization has several locations, each of which has access to corporate data and can conduct routine activities for the whole company, then you may encounter inconveniences but few disasters. On the other hand, if your organization has a single location, a single data center, and a minimal staff, then anything that causes a disruption might trigger a disaster recovery response.
But even then, not every event is necessarily a disaster. For example, a power outage might not matter much if your data center is equipped with on-site generating capacity and you have a way to get adequate fuel delivered. However, a minor power outage at another location could be a disaster if it takes out your network access and you don’t have a backup connection.
It’s worth noting that a disaster in this context does not necessarily mean widespread destruction, loss of life, or general catastrophe. What a disaster means to you is defined by what interferes with your operations to the point that it endangers your business and thus requires a disaster recovery response. A disaster recovery response is the set of actions your organization must take to continue operations in the face of an unforeseen event.
But unforeseen isn’t the same thing as unplanned. Even if you have no way to know what the future may hold in terms of bad things, you can still plan for them.
Such planning can take two forms. One is to plan for events that may keep your organization from functioning but would allow business to resume once the disaster conditions end. A flood may block access to your facility but leave it untouched, for example; the real problem isn’t so much recovery as it is business continuity. You still need to use your disaster planning so you can stay in business, but the parts of your plan that involve rebuilding your facility aren’t necessary.
However, there is the other form. Some disasters may damage or destroy all or part of your data center or computing facilities (such as employees’ desktop systems) and may result in significant downtime. While you hope this doesn’t happen, if it does, you need to have put your ducks in a row well in advance. For example, you need to set up procedures with your insurance carrier to expedite claims for rebuilding and ensure you have coverage for downtime and contingency operations ahead of time.
Likewise, if you have more than one business location, then you have a way to keep operating while you rebuild. However, you have to make sure that all functions, not just the critical ones, can be performed at the alternate location.
Finally, the type and extent of the disaster affecting your business matter. Consider again the flood that blocks the road leading to your facility but leaves the facility untouched. If your employees have access and the road is uncovered fairly quickly, the disaster will be limited. On the other hand, if the disaster prevents your employees from getting to work and makes working from home difficult or impossible, then the disaster requires a greater response.
But the nature of that response still depends on the nature of the disaster. Because of this, you need to know what kind of disaster it is, and how long your facilities may be unavailable.
As a result, you need to create a plan for any foreseeable event, from hurricanes to the zombie apocalypse, and document a separate plan for each eventuality. (Fortunately, most plans have a lot of overlap.) Then, if the disaster actually happens, you need to determine which plan to execute. In some cases, you may need to draw a response partly from one plan and partly from another.
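The idea of keeping separate but overlapping plans, and drawing a response from more than one of them, can be sketched in a few lines of Python. This is only an illustration: the plan names, the steps, and the `build_response` helper are hypothetical and are not taken from any standard or product.

```python
# Hypothetical plan library: each plan is an ordered list of response steps.
# Plan names and steps are illustrative assumptions, not a recommended set.
PLANS = {
    "loss_of_access": [
        "notify staff",
        "redirect phones to home workers",
        "enable remote data access",
    ],
    "loss_of_facility": [
        "notify staff",
        "contact insurance carrier",
        "activate alternate site",
        "enable remote data access",
    ],
}


def build_response(plan_names):
    """Merge the steps of several plans, keeping first-seen order and dropping repeats."""
    seen = set()
    response = []
    for name in plan_names:
        for step in PLANS[name]:
            if step not in seen:
                seen.add(step)
                response.append(step)
    return response


if __name__ == "__main__":
    for step in build_response(["loss_of_access", "loss_of_facility"]):
        print("-", step)
```

Because the shared steps appear only once in the merged response, the overlap between plans reduces the work rather than duplicating it.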
This all sounds overwhelming, and indeed it can be. Disaster preparation is a lot of work. However, while it does require a lot of organization, you don’t need to go it alone. Disaster recovery services can provide a turnkey disaster recovery implementation, and disaster recovery consultants are ready to provide as much or as little disaster recovery assistance as you want.
What’s disaster recovery?
Disaster recovery is the process of resuming operations for your organization following a disaster. This includes regaining access to data and the applications and communications you use to access and operate using that data.
Recovery may include finding ways for your employees to return to work, finding alternate work locations, establishing communications, and in many cases, providing everything from desks and computers to transportation. Ultimately, recovery includes resuming normal operations, which may mean building or gaining access to a new data center.
Disaster recovery starts with planning. When a disaster happens, you won’t have time to begin thinking about your response. Instead, your response should be planned for in advance, and it should reflect as closely as possible the nature and scope of the disaster. In addition, the plan must reflect your company’s operations so that you can tailor the response to meet its specific needs.
Because you can’t know the nature or extent of any given disaster in advance, you need to develop several plans. Such plans may be functional in nature, reflecting the range of bad-news possibilities—everything from loss of access to your facility to a total loss of the facility.
You may also choose to design your plan around foreseeable events. That might start with a hurricane plan if you’re located along the coast, or a severe weather plan in areas prone to massive snowfalls, severe storms, or wildfires.
Regardless of how you focus your plan, it must include several key components. These components include recovery of the facility, which may involve restoration of power and communications or even reconstruction, depending on the disaster. You should assume you need a facility to stay in operation, such as temporary quarters while you rebuild.
The following are useful resources for a starting point:
- U.S. Department of Homeland Security disaster recovery
- National Institute of Standards and Technology
- Emergency Response Plans
- Independent disaster recovery organizations
There also needs to be a communications recovery plan. Your organization will need to move data and voice communications to a new location while you’re out of operation at your primary facility. This may mean switching to a different facility with a backup phone system, or it may mean allowing employees to take calls at home.
Your employees are also a critical component of your plan. You need to plan for situations in which your staff is unable to travel to a work location. You may also need to plan for a significant loss of staff after some types of disasters such as a major hurricane. This can mean arranging to have your staff receive company phone calls and data access at their homes, arranging for transportation to a new work location, or finding temporary housing for your employees.
Unfortunately, some employees may not survive the disaster, or they may not be able to continue working after the disaster. The disaster plan must include support for survivors and support for new employees hired to take over for previous employees. You need job descriptions, work products, and files available for replacements when they come to work after the disaster.
What about business continuity?
Business continuity (sometimes known as continuity of operations) is related to disaster recovery, but it’s the means you use to keep your organization operational during and immediately after a disaster. As such, it’s part of your disaster recovery plan because it keeps your organization functioning while the disaster recovery plan takes place.
In many cases, business continuity involves the immediate steps you take after a disaster to keep your call center operating, your data systems able to process orders or provide customer support, and your staff able to operate, even if at a reduced level. For example, a continuity plan might enable employees to answer calls from their homes or to work from temporary quarters.
Business continuity also requires a plan. You need to know whether employees can receive phone calls and whether their home Internet access can support business use, and determine what sort of temporary quarters are available. In addition, you need to work with the company’s voice and data providers to develop a plan for switching service to a temporary location.
Remember that the arrangement is only temporary. Part of the continuity plan is a process by which to resume operations at your facility once it’s available or after it is rebuilt.
Why the cloud is important
A significant new factor in disaster recovery is cloud storage. It’s significant because cloud storage can simplify disaster recovery, and it can make real business continuity much easier to achieve. The reason is that much of the work of making your data available for recovery is already done.
Many businesses store at least part of their critical information in the cloud, either as a form of backup or because it helps the organization stay on top of changes. Either way, when business data is available in the cloud, it can also be used for disaster recovery.
But of course there’s more to disaster recovery than just having some of your data in cloud storage and some applications running on cloud servers. For your cloud data to be useful in recovery, all of it needs to be available in the cloud, except for archival material held in long-term storage. That way, when it’s time to get your data center back up and running, you have all of the data and applications you need.
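One way to check that rule is to keep an inventory of data sets and flag anything that is not archival and has no off-site copy. The Python below is a minimal sketch under that assumption. The `DataSet` fields, the `recovery_gaps` helper, and the sample entries are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class DataSet:
    name: str
    archival: bool  # held in long-term storage, so exempt from the check
    offsite: bool   # a current copy exists in the cloud or another off-site location


def recovery_gaps(inventory):
    """Return names of non-archival data sets that have no off-site copy."""
    return [d.name for d in inventory if not d.archival and not d.offsite]


if __name__ == "__main__":
    # Sample inventory; the entries are made up for illustration.
    inventory = [
        DataSet("orders", archival=False, offsite=True),
        DataSet("payroll", archival=False, offsite=False),
        DataSet("old tax records", archival=True, offsite=False),
    ]
    print(recovery_gaps(inventory))
```

Anything the function returns is data you would not have after losing the primary site, so an empty result is the goal.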
For business continuity, the need is more complex. In addition to having your organization’s data, you must also have the applications that use that data; you must have computers capable of running those applications; and you must have communications in place capable of supporting the applications in daily use.
However, as handy as having everything in the cloud may be, it’s not the right solution for every organization. For example, a public cloud provider might not be suitable for security reasons or because the latency of a connection to the cloud may impact performance. Instead, some other type of off-site storage may work better. This could mean having your private cloud replicated off-site, for example.
Likewise, an organization may choose remote mirroring of its critical data so that a single disaster doesn’t eliminate everything the company needs to operate. Perhaps the best example of such an arrangement was demonstrated by the financial services company Cantor Fitzgerald, which lost its headquarters and two-thirds of its employees in the 9/11 terrorist attacks. The company had been mirroring its data using a metropolitan fiber connection to another location in New York and then to its office in London. Despite the magnitude of this disaster, Cantor Fitzgerald was able to resume operations in a week.
Ultimately, the key isn’t necessarily the cloud, but rather making sure that critical data and applications are available in an off-site location. For many organizations, the public cloud is probably the easiest and least expensive means of accomplishing this. But it’s not the only way, and for some organizations, it’s not the best way.
Disaster recovery checklist: Lessons for leaders
Disaster recovery is highly detail-oriented, it’s complex, and it requires a team to put together all of the moving parts. Adding to the complexity, you need a detailed plan for each eventuality, and you need to practice. Here’s a checklist to start from:
- Make a plan. The master plan needs to include all of the various permutations of a disaster response that might possibly affect your organization. That plan needs to be written down and available to everyone in your organization, not just senior management. After all, if you don’t make it through the disaster, someone needs to be able to execute your plan.
- Train employees on the plan and the response. Just having the plan isn’t enough. You need to run simulations of your disaster response. Your employees need to have access to the plan, and they need to know what part they are to play in the recovery.
- Practice your recovery plan frequently. Practicing the recovery plan means real practice. It does not mean coffee and donuts in the conference room while you chat about the plan, even if the donuts are really tasty. The practice needs to include your recovery partners, ranging from your recovery consultant (if you have one) to the local fire department. Everyone needs to know their place in the plan in order to carry out the assignments.
- Create a checklist. You should create a checklist tailored for each person or management function so everyone can carry out their part of the plan. The checklists should be available to each person or role, with copies both online and on paper.
- Document your practice sessions. A practice session uncovers things you missed. Documenting the changes is how you make sure to correct what you missed the first time.
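The per-role checklists described above can be generated from a single task list, so the online and paper copies stay in sync. The following Python is a rough sketch only. The roles, the tasks, and the `checklists_by_role` and `render` helpers are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical task list: (role, task) pairs drawn from the written plan.
TASKS = [
    ("IT lead", "confirm off-site data is reachable"),
    ("Office manager", "contact staff with work location"),
    ("IT lead", "switch voice and data service to temporary site"),
]


def checklists_by_role(tasks):
    """Group tasks by role, preserving the order in which they appear in the plan."""
    grouped = defaultdict(list)
    for role, task in tasks:
        grouped[role].append(task)
    return dict(grouped)


def render(role, items):
    """Format one role's checklist as plain text suitable for printing."""
    lines = [f"Checklist: {role}"]
    lines += [f"[ ] {item}" for item in items]
    return "\n".join(lines)


if __name__ == "__main__":
    for role, items in checklists_by_role(TASKS).items():
        print(render(role, items))
        print()
```

Plain text is used here because it prints cleanly and can be posted online unchanged. Any format works as long as both copies come from the same source.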
Disaster recovery can be a daunting task, but it’s necessary for the survival of your organization. Break it down into small jobs, work with a team, and start with the parts that are the highest priority for your specific part of the organization.