Before Trouble Strikes

In 1992 Hurricane Andrew put 39 major data centers out of commission. And in 1993 the World Trade Center bombing caused 21 data centers to shut down. While you don’t like to think about it, every organization, regardless of its size, runs the risk of a major systems outage, such as a tornado demolishing a data center or a building fire destroying the facility and everything in it. A study by the University of Texas found that 85 percent of businesses depend totally or heavily on information technology systems to stay in business, and that a loss of those systems would cost businesses up to 40 percent of their daily revenues.

Disaster can strike at any time. In fact, there are more than 35 types of disasters, ranging from the most common, such as power outages, to the most catastrophic, such as earthquakes. In essence, a disaster includes any type of interruption of service that results from some force beyond the organization’s control. Disaster recovery provides systematic procedures for how to react to and how to recover from that ominous external or internal force. Disaster recovery planning, which complements business continuity and contingency planning, ensures the ability of the organization to function effectively if an unforeseen event severely disrupts normal operations.

The following checklist will help the key individuals in your organization prepare a disaster recovery plan. The objective is to restore all critical business functions, not just isolated functions such as the data center.

Gather Information

Organize the Project
A successful initiative of this magnitude requires support from the organization’s senior management, a dedicated disaster recovery team whose members have knowledge of critical business systems, and a well-thought-out planning and testing strategy.

Senior executives responsible for disaster recovery planning will perform the first two steps. The disaster recovery coordinator, working with the appropriate team leaders, should perform steps 3 to 7.

  1. Determine which senior executive(s) will have overall responsibility for disaster recovery.
  2. Have this executive appoint a disaster recovery coordinator.
  3. Appoint a disaster recovery team leader for each operational unit, such as server backup or telephone system.
  4. Convene the disaster recovery planning team and sub-teams as appropriate.
  5. Working with senior executives responsible for disaster recovery, the disaster recovery coordinator should identify the following:
    • Scope: the areas to be covered by the disaster recovery plan
    • Objectives: what the disaster recovery team is working towards and the course of action it intends to follow
    • Assumptions: what is being taken for granted or accepted as true without proof
  6. Set project timetable and draft project plan, including assignment of task responsibilities.
  7. Obtain senior management’s approval for scope, assumptions, and project plan.

Conduct Business Impact Analysis
The disaster recovery planning team should perform this step to identify which business departments, functions, or systems are most vulnerable to potential threats, what types of threat are possible, and what effect each identified threat would have on each vulnerable area within the organization.

  1. Identify functions, processes, and systems.
  2. Interview information systems support personnel.
  3. Interview business unit personnel.
  4. Analyze results to determine critical systems, applications, and business processes.
  5. Prepare an impact analysis of interruptions to critical systems.

Conduct Risk Assessment

The disaster recovery planning team should work with the organization’s technical and security personnel to determine the probability of each functional business unit’s critical systems becoming severely disrupted and to document the amount of risk the business unit can tolerate. For each critical system, work through the following steps:

  1. Review physical security, e.g., secure offices and off-hours building access.
  2. Review backup systems and data security.
  3. Review policies on personnel termination and transfer.
  4. Identify systems supporting mission critical functions.
  5. Identify vulnerabilities, such as physical attacks, or acts of God, such as floods.
  6. Assess probability of system failure or disruption.
  7. Prepare risk and security analysis.
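
One way to record the outcome of steps 6 and 7 is a simple probability-times-impact score per system, compared against the risk the business unit says it can accept. The sketch below is illustrative only: the `SystemRisk` class, the 1-to-5 impact scale, the sample systems, and the tolerance value are all assumptions, not part of the checklist.

```python
from dataclasses import dataclass


@dataclass
class SystemRisk:
    name: str
    probability: float  # assumed yearly likelihood of severe disruption, 0.0 to 1.0
    impact: int         # assumed business impact rating, 1 (minor) to 5 (severe)

    @property
    def score(self) -> float:
        # Probability times impact; a higher score means more exposure.
        return self.probability * self.impact


def rank_by_risk(systems):
    """Return systems ordered from highest to lowest risk score."""
    return sorted(systems, key=lambda s: s.score, reverse=True)


def exceeding_tolerance(systems, tolerance):
    """Return names of systems whose score is above the acceptable risk."""
    return [s.name for s in rank_by_risk(systems) if s.score > tolerance]


# Hypothetical sample data for illustration.
systems = [
    SystemRisk("payroll", probability=0.10, impact=5),
    SystemRisk("telephone system", probability=0.30, impact=3),
    SystemRisk("intranet wiki", probability=0.20, impact=1),
]
flagged = exceeding_tolerance(systems, tolerance=0.4)
print(flagged)
```

Systems that appear in `flagged` are the ones the risk and security analysis would need to address first.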

Develop Strategic Outline for Recovery

The steps outlined here provide all of the components necessary to perform a recovery. These steps will help pull together information about the operations of all systems, especially those owned or managed by non-technical managers with help from technical support personnel. Steps one through four mainly apply to functional business units that manage technology systems to process critical functions. The disaster recovery planning team and the functional business unit may wish to appoint other appropriate individuals to perform subsequent tasks.

  1. Assemble groups as appropriate for the following:
    • Hardware and operating systems
    • Communications
    • Applications
    • Facilities
    • Other critical functions and business processes as identified in the Business Impact Analysis step.
  2. For each system/process above, quantify the following processing requirements:
    • Light, normal, and heavy processing days
    • Transaction volumes
    • Dollar volume, if any
    • Estimated process time
    • Allowable delays (days, hours, minutes, etc.)
  3. Detail all the steps in your workflow for each critical business function. (For example, for payroll processing include each step that must be completed and the order in which to complete them.)
  4. Identify systems and applications.
    • Component name and technical identification if any
    • Type (online, batch process, script)
    • Frequency
    • Run time
    • Allowable delay (days, hours, minutes, etc.)
  5. Identify all vital records.
    • Name and description
    • Type (backup, original, master, history)
    • Where are they stored?
    • Source of item or record
    • Can the record be easily replaced by another source?
    • Backup and backup generation frequency
    • Number of backup generations available onsite and off-site
    • Location of backups
    • Media key, retention period, rotation cycle
    • Who is authorized to retrieve the backups?
  6. Identify the minimum requirements for, or replacements of, the critical function if a severe disruption occurred:
    • Type (server hardware, software, research materials, etc.)
    • Item name and description
    • Quantity required
    • Location of inventory, alternative, or off-site storage
    • Vendor/supplier
  7. Identify whether alternative methods of processing either exist or could be developed, quantifying the effect on processing (include manual processes).
  8. Identify person(s) who support the system or the application.
  9. Identify primary person to contact if system or application cannot function as normal.
  10. Identify secondary person to contact if system or application cannot function as normal.
  11. Identify all vendors associated with the system or application.
  12. Document business unit strategy during recovery (conceptually how will the unit function?).
  13. Quantify resources required for recovery by time frame.
  14. Develop and document recovery strategy, including priorities for recovering system/function components, and recovery schedule.
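
The allowable delays and run times collected in steps 2 and 4 can feed directly into the recovery priorities called for in step 14. The following is a minimal sketch, assuming a recovery order based only on allowable delay with run time as a tie-breaker; the `recovery_order` function and the sample components are hypothetical, and a real schedule would also weigh dependencies between systems.

```python
from datetime import timedelta


def recovery_order(components):
    """Order components so the shortest allowable delay comes first.

    `components` maps a component name to an (allowable_delay, run_time)
    pair of timedelta values. Ties are broken by longer run time first,
    since a long-running job leaves less slack.
    """
    return sorted(
        components,
        key=lambda name: (
            components[name][0],
            -components[name][1].total_seconds(),
        ),
    )


# Hypothetical sample data for illustration.
components = {
    "payroll batch": (timedelta(days=2), timedelta(hours=3)),
    "order entry (online)": (timedelta(hours=4), timedelta(0)),
    "month-end report script": (timedelta(days=7), timedelta(hours=1)),
    "email": (timedelta(hours=4), timedelta(minutes=30)),
}
schedule = recovery_order(components)
for position, name in enumerate(schedule, start=1):
    print(position, name)
```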

Review On-site and Off-Site Backup and Recovery Procedures
The disaster recovery planning team should perform this task to provide for a current backup of critical programs and data that can be used in the event of a disaster. To this end, the disaster recovery planning team can reduce downtime and speed recovery.

  1. Review current records (operating systems, code).
  2. Review current off-site storage facility or arrange for one.
  3. Review backup and off-site backup storage policy or create one.
  4. Present to functional business unit leader for approval.
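
Part of this review can be automated by comparing the vital records inventory against the backup policy. The sketch below assumes each record carries its last backup time, the interval the policy calls for, and the number of generations held off-site; the `VitalRecord` class, the `backup_gaps` function, and the sample records are illustrative assumptions rather than a prescribed format.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class VitalRecord:
    name: str
    last_backup: datetime
    backup_interval: timedelta  # how often policy says a backup is made
    offsite_generations: int    # backup generations held off-site


def backup_gaps(records, now):
    """Return (name, problem) pairs for records that fall short of policy."""
    problems = []
    for record in records:
        if now - record.last_backup > record.backup_interval:
            problems.append((record.name, "backup overdue"))
        if record.offsite_generations < 1:
            problems.append((record.name, "no off-site copy"))
    return problems


# Hypothetical sample data for illustration.
now = datetime(2024, 1, 10, 9, 0)
records = [
    VitalRecord("general ledger", datetime(2024, 1, 9, 23, 0), timedelta(days=1), 3),
    VitalRecord("source code", datetime(2024, 1, 2, 23, 0), timedelta(days=7), 0),
    VitalRecord("customer master", datetime(2024, 1, 1, 23, 0), timedelta(days=1), 2),
]
gaps = backup_gaps(records, now)
for name, problem in gaps:
    print(name, "-", problem)
```

An empty result suggests the inventory meets the stated policy; it does not prove the backups can actually be restored, which is what plan testing is for.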

Select Alternate Facility
The disaster recovery planning team should look for a location, other than the normal facility, that can be used to process data and/or conduct business in the event of a disaster.

  1. Determine resource requirements.
  2. Assess platform uniqueness of unit systems (Macintosh, IBM, Oracle, etc.).
  3. Identify alternative facilities.
  4. Review cost/benefit.
  5. Evaluate and make recommendation.
  6. Present to business unit leader for approval.
  7. Make selection.

Plan Development and Testing

Develop Recovery Plan
This document defines the resources, actions, tasks and data required to manage the recovery in the event of an interruption. The plan is designed to assist in restoring the business process within the stated recovery goals. The disaster recovery coordinator should perform these steps assisted by the disaster planning committee as needed.

  1. Objective: This may have been documented in the Information Gathering phase. Establish information for each business unit.
  2. Plan Assumptions
  3. Criteria for invoking the plan:
    • Document emergency response procedures to follow during and after a declared emergency for that business unit, including checking the building after the emergency before allowing individuals to enter.
    • Document procedures for assessing and declaring a state of emergency.
    • Document notification procedures for alerting all senior management executives, disaster recovery team members, and business unit executives.
    • Document notification procedures for alerting the business unit’s personnel to the alternate location.
  4. Role Responsibilities and Authority
    • Identify disaster recovery team and business unit personnel.
    • Recovery team description and charge
    • Recovery team staffing
    • Transportation schedules for media and teams
  5. Procedures for operating in contingency mode
    • Process descriptions
    • Minimum processing requirements
    • Determine categories for vital records
    • Identify location of vital records
    • Identify forms requirements
    • Document critical forms
    • Establish equipment descriptions
    • Document equipment — in the recovery site and in the business unit
    • Software descriptions
    • Software used in recovery and in production
    • Produce logical drawings of communication and data networks in the business unit
    • Produce logical drawings of communication and data networks during recovery
    • Vendor list
    • Review vendor restrictions
    • Miscellaneous inventory
    • Communications needs — production and in the recovery site
  6. Resource plan for operating in contingency mode
  7. Criteria for returning to normal operating mode
  8. Procedures for returning to normal operating mode
  9. Testing and Training
    • Document testing data
    • Complete disaster/disruption scenarios
    • Develop action plans for each scenario
  10. Plan Maintenance
    • Document maintenance review schedule (yearly, quarterly, etc.)
    • Maintenance review action plans
    • Maintenance review recovery teams
    • Maintenance review team activities
    • Maintenance review/revise tasks
    • Maintenance review/revise documentation
  11. Appendices for inclusion
    • Inventory and report forms
    • Maintenance forms
    • Hardware lists and serial numbers
    • Software lists and license numbers
    • Contact list for vendors
    • Contact list for all staff with home, work, cell phone, and pager numbers
    • Network schematic diagrams
    • Equipment room floor grid diagrams
    • Contract and maintenance agreements
    • Special operating instructions for sensitive equipment
    • Cellular telephone inventory and agreement

Test the Plan
Testing the plan enables the disaster recovery planning team to see how their recovery plan and procedures work in practice. It enables everyone to get a reasonable assurance that a plan will make the grade when it really counts — in an actual disaster.

  1. Develop test strategy.
  2. Develop test plans.
  3. Conduct tests.
  4. Modify the plan as necessary.

On-going Maintenance

Maintain the Plan

Disaster recovery plans can have a shelf life of between six and 12 months, depending on changes in the organization’s procedures, systems, and personnel. Having a program in place to maintain the plan will ensure that everyone, especially the disaster recovery planning team, will be ready if a real emergency occurs.

The senior management executive responsible for disaster recovery, assisted by the disaster recovery coordinator, should oversee this step:

  1. Review changes in the environment, technology, and procedures.
  2. Develop maintenance triggers and procedures.
  3. Submit changes for systems development procedures.
  4. Modify unit change management procedures.
  5. Produce plan updates and distribute.
  6. Establish periodic review and update procedures.
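
A simple maintenance trigger can be built on the six-to-12-month shelf life mentioned above. The sketch below assumes a plan becomes due for review after six months and is treated as stale after 12; the function names, the status labels, and the thresholds as defaults are illustrative assumptions that an organization would tune to its own rate of change.

```python
from datetime import date


def months_between(earlier, later):
    """Whole calendar months elapsed from `earlier` to `later`."""
    months = (later.year - earlier.year) * 12 + (later.month - earlier.month)
    if later.day < earlier.day:
        months -= 1  # the final month is not yet complete
    return months


def review_status(last_review, today, warn_after=6, expire_after=12):
    """Classify a plan as 'current', 'review due', or 'stale'."""
    age = months_between(last_review, today)
    if age >= expire_after:
        return "stale"
    if age >= warn_after:
        return "review due"
    return "current"


# Hypothetical dates for illustration.
last_review = date(2023, 1, 15)
print(review_status(last_review, date(2023, 5, 1)))
print(review_status(last_review, date(2023, 8, 20)))
print(review_status(last_review, date(2024, 1, 15)))
```

A date-based check like this only covers the passage of time; changes in systems, procedures, or personnel would still need their own triggers.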

Elizabeth M. Ferrarini is a free-lance writer based in Arlington, Mass. This story first appeared in CrossNodes.