Managing the Costs of System Downtime

A million dollars an hour. That’s what IT system downtime costs American business, according to a 2002 keynote address by the META Group. It’s a stunning figure until one considers the degree to which modern business relies on IT systems. Much of the value of these systems derives from mountains of business intelligence stored in today’s sophisticated data warehouses.

Companies that once used their data warehouse only to support strategic business analysis and reporting have turned to enterprise data warehousing, where the warehouse supplies data to mission-critical business applications that support nearly every core business function. Sales and service, securities trading, supply chain management, call centers — in short, the full range of revenue-producing and customer service-related functions — all now rely on having the data they need at the fingertips of individuals throughout the enterprise.

The most common type of downtime is planned, which generally causes minimal impact to productivity and profitability as it can be scheduled in such a way as to be cost-effective. But when the system is unavailable for external reasons such as natural disasters and power outages, companies are at enormous financial risk. Lost revenue, reduced employee productivity, and regulatory penalties can all contribute to direct costs, to say nothing of the impact on customer satisfaction and the reputation of the business.

In that context, a million dollars an hour sounds about right, though it remains a disturbing number. It makes clear the need for companies to protect their critical business systems from events that cause system unavailability or unacceptable performance constraints.

The Three Core Challenges of Business Continuity

There are three main challenges about which every company that relies on a data warehouse should be concerned.

The first is system and data availability. Planning for availability is about all the things a company does to prevent an outage from occurring, thereby ensuring ongoing availability during events that might otherwise cause an outage. Because the stakes are so high, companies must ensure that when downtime of any kind occurs — either planned or unplanned — the elements of the system and the data that cannot afford to be down remain operational. Just as a standby generator can prevent the loss of power during an outage, multiple, redundant failover systems can help maintain system availability in the face of challenging environmental factors.

The second concern is system recoverability. The ability to recover rapidly from an outage caused by a natural disaster or other unplanned event is critical to minimizing the negative impact on the business and the resulting financial consequences.

If your company’s system is lost in a flood or a hurricane, how quickly can you recover access to your data and continue business operations? What are your customers’ expectations? Can they wait for weeks for your business to recover? Will they?

The third concern is performance continuity. Companies must maintain adequate customer service levels during component failures and during peak processing periods, which means ensuring that sufficient processing capability is available at those times.

If the system is “up” and running but critical applications are not functioning, can it effectively respond to the needs of the business? At many companies, when part of the system has an outage, performance throughout the rest of the system degrades. How many companies can afford to run their business at half of normal processing power for an extended length of time?

How much weight to give each of these concerns will differ from one company to the next. For example, if a business can withstand an outage of a few days — and if it does not depend heavily on the warehouse for core business functions — then perhaps a shared system in a recovery center in a remote location is the best solution.

Assessing the Needs

The key is to understand the cost of downtime and degraded performance to the organization. Numerous questions must be asked and answered, including:

How long can your business withstand an outage before revenue or another critical area is affected? Some companies may need system recoverability in what they call real-time — within minutes. Others need to be up within hours, and still others might be able to withstand an outage of several days or more.

Do all of your applications affect your core business functions equally? Or are there some clear priorities that you can map and feed into a solution alternative?

How complex is your computing environment?

What are the financial constraints you need to work within?

Solutions to these issues do exist, and each one carries its own risk factors and price tag. Deciding which to use is a process that should be undertaken only after a thorough risk-benefit analysis. Also consider whether the solution can be tailored to fit your particular needs.
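
One rough way to frame that risk-benefit analysis is to add each option's yearly price to the downtime losses it would still leave the business exposed to. The Python sketch below is only an illustration under stated assumptions: the option names, prices, recovery times, and outage frequency are hypothetical placeholders, not figures from any vendor or study, and a real analysis would also weigh degraded performance, reputation, and regulatory exposure.

```python
from dataclasses import dataclass


@dataclass
class RecoveryOption:
    """A candidate continuity solution (all values are hypothetical)."""
    name: str
    annual_cost: float      # assumed yearly price of the solution, in dollars
    recovery_hours: float   # assumed time to restore service after an outage


def expected_annual_loss(option, cost_per_hour, outages_per_year):
    """Downtime loss per year: hours down per outage x hourly cost x frequency."""
    return option.recovery_hours * cost_per_hour * outages_per_year


def total_annual_cost(option, cost_per_hour, outages_per_year):
    """Price of the solution plus the downtime loss it still leaves."""
    return option.annual_cost + expected_annual_loss(
        option, cost_per_hour, outages_per_year
    )


def cheapest_option(options, cost_per_hour, outages_per_year):
    """Pick the option with the lowest total annual cost."""
    return min(
        options,
        key=lambda o: total_annual_cost(o, cost_per_hour, outages_per_year),
    )


# Illustrative inputs only; none of these numbers come from the article.
OPTIONS = [
    RecoveryOption("shared off-site recovery center", 200_000, 72.0),
    RecoveryOption("redundant system", 2_000_000, 0.25),
]

if __name__ == "__main__":
    # Compare a business that loses $10,000 an hour with one that loses
    # $1,000,000 an hour, assuming one major outage every ten years.
    for hourly_cost in (10_000, 1_000_000):
        best = cheapest_option(OPTIONS, hourly_cost, outages_per_year=0.1)
        print(f"${hourly_cost:,}/hour of downtime -> {best.name}")
```

With these made-up inputs, the business that loses relatively little per hour comes out ahead with the shared recovery center, while the business losing a million dollars an hour is better served by the redundant system despite its higher price. The point is the shape of the comparison, not the specific numbers.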

Whether you choose an off-site recovery center, a redundant system, or a Back-up and Recovery (BAR) system, the cost of business downtime is an issue that cannot be ignored.

Bob Manning is marketing manager of Teradata, a division of NCR Corp.