by Paddy Falls, CTO, Neverfail
We’ve all heard that “April showers bring May flowers.” But unfortunately, where Mother Nature is concerned, that’s not always the case. Too often, April showers spawn more threatening weather incidents such as severe storms and tornadoes. Once June rolls around, there is also hurricane season to contend with.
Business continuity and storm season
From a business continuity perspective, the start of storm season is a good time to think about how well your business is prepared in case of disaster. Of course, statistically, your organization is much more likely to be disrupted by the “little-d” disasters such as server crashes, network outages or even, yes, routine maintenance. But fair or not, it is the “Big-D” disasters such as tornadoes, hurricanes and earthquakes that make businesses sit up and take notice, and spur them to take action to ensure they are prepared, should the unthinkable strike.
If you don’t have a business continuity plan in place, put it on your immediate to-do list. If you already have one, think about the last time you updated it. If it was more than a year ago, it is probably time to update it, or at least revisit it to ensure its relevance.
Even if you last updated your business continuity plan six months ago, have you installed new servers since then, or made other significant changes to your network infrastructure, such as introducing virtualization? If you have, then you need to take another look at your plan to make sure you are adequately protected.
As you think about business continuity for your company, it’s important to keep the following in mind:
- Define priorities;
- Take a preemptive approach to disaster recovery (DR) planning;
- Ensure solutions can protect applications no matter the network environment; and
- Test the failover process.
Let’s take a look at these four areas in some more detail.
#1: Define your priorities in a disaster
Think about what business continuity means to your company. Does it mean getting all your data back if something goes wrong? Or does it mean ensuring that all of your business applications stay up and running? Too many disaster recovery plans are storage-centric, focusing only on data rather than on the applications the business depends on.
For those applications that are core to your business, what you really want is to be able to continue working as if nothing ever happened. At the other extreme, for some less critical back office services, it may be sufficient to not lose more than eight hours of data and have it back up within a day.
What this means is that you need to prioritize your applications in terms of both potential loss of data and the time it takes for them to become available again. The formal terms for these two business metrics are recovery point objective (RPO) and recovery time objective (RTO). An RPO of one hour means you could lose up to one hour’s worth of data for that application. An RTO of four hours means it could take up to four hours to make the application available to end users.
One reason for prioritizing your applications according to their RPOs and RTOs is that solutions cost more as RPOs and RTOs get lower. A typical rule of thumb is that 80 percent of business continuity investment is spent on the most critical 20 percent of applications.
One great way to define your priorities is to put together a “roundtable” of employees from different areas of your company. What is important to management? To accounting? To sales? To IT? You should focus on getting a measure of the potential financial loss to the business for every hour of data that is lost for an application, and every hour of downtime.
One of the reasons for having broad representation is that it’s not only direct costs such as lost revenue from sales, but also indirect costs such as loss of reputation that need to be considered.
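To make the roundtable's output concrete, here is a minimal Python sketch of how hourly loss estimates could be turned into a ranking of applications. The application names and every figure are hypothetical, chosen only for illustration, and each hourly figure is assumed to already combine direct and indirect costs.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class HourlyCost:
    """Roundtable estimates for one application, in a single currency."""
    application: str
    per_hour_of_lost_data: int  # direct + indirect cost of losing one hour of data
    per_hour_of_downtime: int   # direct + indirect cost of one hour offline


def incident_cost(cost: HourlyCost, hours_of_data_lost: int, hours_of_downtime: int) -> int:
    """Estimated total loss for one incident of the given size."""
    return (cost.per_hour_of_lost_data * hours_of_data_lost
            + cost.per_hour_of_downtime * hours_of_downtime)


# Hypothetical figures, for illustration only.
ESTIMATES = [
    HourlyCost("email", 500, 1_000),
    HourlyCost("payroll", 100, 200),
    HourlyCost("order entry", 2_000, 5_000),
]


def rank_by_exposure(estimates: List[HourlyCost],
                     hours_of_data_lost: int,
                     hours_of_downtime: int) -> List[HourlyCost]:
    """Sort applications from most to least costly for the same incident."""
    return sorted(
        estimates,
        key=lambda c: incident_cost(c, hours_of_data_lost, hours_of_downtime),
        reverse=True,
    )


if __name__ == "__main__":
    # Same incident for every application: 1 hour of data lost, 4 hours down.
    for cost in rank_by_exposure(ESTIMATES, 1, 4):
        print(cost.application, incident_cost(cost, 1, 4))
```

Comparing every application against the same incident keeps the ranking simple; a fuller analysis would likely weigh several incident sizes.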
The outcome of this analysis is a grouping of your business applications into tiers representing their business criticality. For each tier, different RTOs and RPOs define the level of protection required for any failure, including a disaster. For example:
- Tier-1: < 10 seconds RPO, < 5 minutes RTO
- Tier-2: < 15 minutes RPO, < 1 hour RTO
- Tier-3: < 1 hour RPO, < 4 hours RTO
- Tier-4: < 8 hours RPO, < 24 hours RTO
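As a rough illustration, the example tiers above can be written down as data and used to find the least demanding tier that still satisfies an application's tolerances. This is only a sketch: the thresholds are copied from the list above, while the function name, the application tolerances in the usage lines, and the choice of the least strict (and therefore, by the cost argument earlier, presumably cheapest) tier are assumptions made for the example.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import Optional


@dataclass(frozen=True)
class Tier:
    name: str
    rpo: timedelta  # the tier keeps data loss under this bound
    rto: timedelta  # the tier keeps downtime under this bound


# Thresholds copied from the example tiers above, strictest first.
TIERS = [
    Tier("Tier-1", timedelta(seconds=10), timedelta(minutes=5)),
    Tier("Tier-2", timedelta(minutes=15), timedelta(hours=1)),
    Tier("Tier-3", timedelta(hours=1), timedelta(hours=4)),
    Tier("Tier-4", timedelta(hours=8), timedelta(hours=24)),
]


def least_strict_tier(tolerable_data_loss: timedelta,
                      tolerable_downtime: timedelta) -> Optional[Tier]:
    """Return the least strict tier whose RPO and RTO both fit the tolerances.

    Returns None when even Tier-1 is not strict enough.
    """
    for tier in reversed(TIERS):  # try Tier-4 first, then work toward Tier-1
        if tier.rpo <= tolerable_data_loss and tier.rto <= tolerable_downtime:
            return tier
    return None


if __name__ == "__main__":
    # Hypothetical tolerances agreed at the roundtable.
    print(least_strict_tier(timedelta(minutes=30), timedelta(hours=2)))  # fits Tier-2
    print(least_strict_tier(timedelta(hours=8), timedelta(hours=24)))    # fits Tier-4
```

Note that an application needs both of its tolerances met, so a tight RPO can push it into a stricter tier even when its RTO is relaxed.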
#2: Take a preemptive approach to DR planning
Perhaps you don’t work in Tornado Alley or along the San Andreas Fault, so you think the chance of a “Big-D” disaster is extremely remote. While that may be the case, there is a very high chance that your business will be impacted several times a year by the “little-d” disasters such as software failure, network outages, server crashes or routine maintenance. The reality is you need protection from both the “Big-D” and “little-d” disasters, because either can be equally costly to your business.
#3: Ensure solutions work no matter the environment
Virtualization has taken off in the last couple of years. In fact, Gartner estimates that by 2014, 60 percent of server workloads will be virtualized. That’s a five-fold increase from just four years ago.
What this means for your business is that you need to make sure that your business continuity solutions work whether your environment is physical, virtual or a hybrid.
Many companies that are virtualizing still have physical servers for Tier-1 applications, due to issues of either vendor support or performance. Also, over time companies will tend to deploy different virtualization technologies. Some may be using VMware for some of their Tier-1 and Tier-2 applications, but then find that it’s simpler to use the Hyper-V virtualization that comes with their Windows servers for Tier-3 and/or Tier-4. If this is true for your company, it’s important that any DR solution covers physical servers and every virtual environment in the same comprehensive manner.
#4: Test the failover process
A disaster recovery solution is no good if it doesn’t work as planned in a disaster, or if your staff doesn’t know how to properly execute your DR plan. Remember all those fire drills you used to have in grade school? Your business needs to do the same with your disaster recovery plan and regularly test the failover process to ensure it works.
If your solution isn’t working as planned, you need to figure out why and how to get it fixed so you’re not caught off guard if something goes wrong. If the process takes too long, it’s important to find ways to shorten the recovery window. Every minute that your business is down means lost revenue and productivity, so you can’t afford to be down for more time than anticipated during a disaster.
A good rule of thumb is to test your disaster recovery plan quarterly in order to ensure that everything is working smoothly and you can execute it when you need to.
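One simple way to make a drill actionable is to record what was actually measured and compare it with the objectives set for that application. The sketch below assumes you time the failover and note how much data was missing afterwards; the class, the function, and the sample numbers are all hypothetical, and the objectives are treated as strict upper bounds, as in the example tiers.

```python
from dataclasses import dataclass
from datetime import timedelta
from typing import List


@dataclass(frozen=True)
class DrillResult:
    """Measurements taken during one failover test of one application."""
    application: str
    data_loss: timedelta      # age of the newest data that did not survive
    recovery_time: timedelta  # time until end users could work again


def drill_findings(result: DrillResult, rpo: timedelta, rto: timedelta) -> List[str]:
    """Return one message per missed objective; an empty list means the drill passed."""
    findings = []
    if result.data_loss >= rpo:
        findings.append(
            f"{result.application}: lost {result.data_loss} of data, objective is under {rpo}"
        )
    if result.recovery_time >= rto:
        findings.append(
            f"{result.application}: recovered in {result.recovery_time}, objective is under {rto}"
        )
    return findings


if __name__ == "__main__":
    # Hypothetical drill: a Tier-2 application that took 90 minutes to recover.
    result = DrillResult("email", timedelta(minutes=10), timedelta(minutes=90))
    for finding in drill_findings(result, rpo=timedelta(minutes=15), rto=timedelta(hours=1)):
        print(finding)
```

Keeping these results from one quarterly test to the next also shows whether the recovery window is shrinking or growing as the environment changes.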
The steps outlined above are critical for any company doing business continuity planning. While the chances of your business being impacted by a major disaster might be relatively small, the chances of your organization being down for some period of time are quite high, and such outages happen relatively frequently.
Business continuity planning should be one of the first initiatives any company undertakes. Much like insurance, you hope you’ll never have to deploy your plan, but you’ll certainly be happy it’s there when disasters, of whatever size, strike.
Paddy Falls joined Neverfail, a business continuity vendor, in May 2007, bringing 30 years of software industry experience. As group CTO, Falls is responsible for Neverfail’s technology development and product roadmap. Falls most recently served as the CTO and Co-Founder of UK-based iOra, a division of Corpora, which provides accelerated access and replication services to remote servers and laptops. Prior to iOra, Falls was a general manager at Novell’s European Development Center, where he led a team that built Novell Replication Services.