Four Steps to a Successful DR Strategy

As often as not, disaster recovery expectations are unrealistic. Even with a well-thought-out disaster recovery (DR) plan, the ability of an organization to execute is suspect. The hope that DR can be executed to the defined RTO (recovery-time objective) and RPO (recovery-point objective) is often a dream rather than an expectation.

The reality is that if a disaster should occur, nothing short of Herculean efforts by the IT staff would be required to have the slightest chance of getting back online in any reasonable period of time, much less the targeted RTO. But, don’t give up hope. You may be able to salvage your organization (and maybe your job) if disaster strikes.

This article provides a disaster recovery reality check. Although it may be difficult, you can accomplish the DR vision, but you will need to understand four basic principles:

1. Establish the business requirements. Establishing defined requirements and expectations is vital to the success of any disaster recovery and business continuity initiative.

Disaster recovery focuses on how IT continues to operate, while business continuance focuses on how the business continues to make money (i.e., operate). The two should be highly interconnected.

It is important to document the business requirements and agree on what are “needs” and not “wants.” Documenting and recording the defined requirements and expectations facilitates the proper communication and “handshake” between the planning and execution groups.

Make sure your recovery goals are realistic. Typically, the recovery goals are defined by the metrics RTO and RPO: how long the company can stand to be down and how much data it can stand to lose.

It is key to make sure that you have the appropriate resources to match the requirements. Otherwise, you might want to modify requirements to match resources. If all you have is a single tape drive to recover a significant application, establish and document that it will take four days to recover.
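The single-tape-drive point can be checked with simple arithmetic before it is written into the plan. The sketch below is only an illustration: the function names, the 20 TB data volume, the 60 MB/s sustained drive throughput, and the 24-hour RTO are all assumed figures, not values from any particular environment. It also ignores tape mounts, verification, and application rebuild time, so the result is a lower bound.

```python
def estimate_restore_hours(data_tb: float, drive_mb_per_s: float, drives: int = 1) -> float:
    """Estimate the hours needed to stream data_tb (decimal terabytes)
    from `drives` devices running in parallel at drive_mb_per_s each."""
    if data_tb < 0 or drive_mb_per_s <= 0 or drives < 1:
        raise ValueError("data_tb must be >= 0, throughput > 0, drives >= 1")
    total_mb = data_tb * 1_000_000          # 1 TB = 1,000,000 MB (decimal units)
    seconds = total_mb / (drive_mb_per_s * drives)
    return seconds / 3600


def meets_rto(restore_hours: float, rto_hours: float) -> bool:
    """True when the estimated restore time fits inside the stated RTO."""
    return restore_hours <= rto_hours


if __name__ == "__main__":
    # Hypothetical figures: 20 TB application, one drive at 60 MB/s, 24-hour RTO.
    hours = estimate_restore_hours(data_tb=20, drive_mb_per_s=60)
    print(f"Estimated restore: {hours:.1f} hours (~{hours / 24:.1f} days)")
    print("Meets 24-hour RTO:", meets_rto(hours, rto_hours=24))
```

With these assumed inputs the estimate comes out just under four days, far outside a 24-hour RTO. Either the requirement or the resources would have to change.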

2. Deploy effective resources to meet the requirements. Effective resources are architected technology and trained personnel that are prepared for action and ready for deployment and operation.

From the requirements established in the first principle, the technology can be architected to meet these expectations. Failing this, the requirements should be modified, or additional resources should be acquired in accordance with Principle 4 (as discussed below).

The technology should be capable of achieving the desired results. If you are expecting to have a system up and running in a matter of moments hundreds of miles away, then tape may be the wrong choice of technology. Likewise, the technology should be configured appropriately.

Your staff should be trained to be ready in the case of a disaster. The specific actions and steps necessary should be clearly understood. I find it helpful if the groups expected to execute the plan are involved during the planning phase, since it’s the little things that often make or break a DR plan, such as, “How do I get to the relocation site if all the bridges/roads are closed?”

3. Ensure mature processes that succeed in making the resources effective. Developing mature management processes is critical to a successful disaster recovery plan. Mature management processes are well-established, documented tasks that are routinely reviewed. It is almost as simple as the old adage: Practice, practice, practice.

Keep the plan current. Constantly update the plan as factors change: new equipment, new staff, new applications, amount of data, etc. A perfect plan from two years ago probably won’t do you much good today. I don’t know of any environment in which some factor doesn’t change on a monthly basis.

Test the plan and get formal sign-off documentation indicating that the tests were successful. If they were not successful, capture what did not work, then turn right around and do it again until you do get it right.

The test is the opportunity to feel confident and either know that it will work or understand what needs to be fixed. Remember, at some point, you won’t get another chance.

4. Institute cost accountability that provides efficiency. Cost accountability ensures that the appropriate economics are matched to the business value of the data.

We spoke earlier (Principle 1) about the cost of requirements. It is essential that the cost of the requirement does not outweigh the financial benefits it bears.

Some people say that DR is just too expensive. If the costs of recovery outweigh the benefit of the recovery, maybe it is. Maybe it is cheaper to create some data from scratch.

The cost associated with the recovery from a disaster should be included in the total cost of ownership of the data. Only then can you tell whether the necessary expenditure is in balance with the financial benefit that the data will bring.

If an application can, in essence, generate revenue of $1,000 per hour and the DR costs run about $100 per hour, then the net revenue is really $900 per hour.

If, on the other hand, an application can generate revenue of $50 per hour and the DR costs run the same $100 per hour, then careful consideration should be given to the value of the application.
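The two cases above reduce to a single subtraction, which makes them easy to run across a whole portfolio of applications. The following is a minimal sketch, assuming revenue and DR cost have already been expressed per hour; the function names are hypothetical, and the only figures used are the ones from the two examples.

```python
def net_hourly_value(revenue_per_hour: float, dr_cost_per_hour: float) -> float:
    """Revenue an application generates per hour after its DR cost is charged."""
    return revenue_per_hour - dr_cost_per_hour


def dr_cost_is_covered(revenue_per_hour: float, dr_cost_per_hour: float) -> bool:
    """True when the application earns more per hour than its DR protection costs."""
    return net_hourly_value(revenue_per_hour, dr_cost_per_hour) > 0


if __name__ == "__main__":
    # (revenue per hour, DR cost per hour) for the two examples in the text.
    for revenue, dr_cost in [(1000, 100), (50, 100)]:
        net = net_hourly_value(revenue, dr_cost)
        verdict = "covered" if dr_cost_is_covered(revenue, dr_cost) else "review the requirement"
        print(f"revenue ${revenue}/h, DR ${dr_cost}/h -> net ${net}/h ({verdict})")
```

A negative result does not by itself mean the application should go unprotected. It signals that the requirement, or the value placed on the application, deserves a second look.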

The principles discussed above must balance one another. No principle can be successful without the others. What is the point of a process if there is not a set of objectives or business requirements it meets? What is the purpose of a resource, either personnel or technology, if it is not facilitating a process? And, of course, each principle has a financial aspect: nothing is free.

In closing, keep in mind that disasters come in all shapes and sizes; not all receive the press coverage of Hurricane Katrina. Servers fall to the floor; air conditioners fail; and multiple drives fail in a RAID configuration (it happens). The most common disaster is a disruption of power to the data center caused by a utility company digging in the area.

John Haight is a principal consultant at Glasshouse Technologies where he is responsible for providing storage strategies and technical solutions for key client engagements.