7.11 Implement Disaster Recovery (DR) processes

  • Response
  • Personnel
  • Communications

There are two types of disasters – natural and man-made.  We should think about which natural disasters can occur in our area.  When creating a disaster recovery plan, we should ask local emergency responders for assistance because they have more experience with these events.

  • Natural Disasters Include

    • Earthquake

    • Flood or Tsunami.  A flood can happen when a dam fails or when there is a storm.

    • Storms

    • Hurricanes and Tornadoes

    • Fires

    • Volcanoes

    • Pandemics

  • Man-Made Disasters Include

    • Fires

    • Terrorism.  Sometimes an insurance policy won’t cover terrorism.

    • Bombings or Explosions

    • Power Outages.  Even if we have a UPS, the power outage might be long term, and/or we might not have fuel available to run the generator.

    • Network failure.  There should be redundant links so that internet connectivity continues if one of them fails.

    • Hardware or Software failure.  We should have redundant equipment.

    • Strikes/Labor Disruption.

    • Theft/Vandalism

Recovery Strategy

  • The business units that have the highest priority should recover first

  • We should identify the most critical business units and functions ahead of time

  • We might complete a Business Impact Analysis (BIA) that tells us what types of failures we might have and what they will cost – this could include loss of revenue, loss of equipment, damage to reputation, and loss of business.  Some impacts are irreparable and cannot be quantified, but it is still important to estimate the financial impact where we can so that we can weigh our responses.

  • We should create a list of business functions and the resources required to recover them.  We should also determine the amount of time it would take to recover and the Maximum Tolerable Outage (MTO).  How long can we withstand an outage?
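The list of business functions described above can be kept as simple structured data.  The following is a minimal sketch, not a prescribed method: the function names, hours, and dollar figures are hypothetical, and ordering by shortest MTO first is only one reasonable assumption about how to set priorities.

```python
from dataclasses import dataclass


@dataclass
class BusinessFunction:
    name: str
    mto_hours: float           # Maximum Tolerable Outage
    est_recovery_hours: float  # estimated time needed to recover
    hourly_loss: float         # estimated financial impact per hour of outage


def recovery_order(functions):
    """Shortest MTO first; ties go to the function with the larger hourly loss."""
    return sorted(functions, key=lambda f: (f.mto_hours, -f.hourly_loss))


def at_risk(functions):
    """Functions whose estimated recovery time exceeds their MTO."""
    return [f for f in functions if f.est_recovery_hours > f.mto_hours]


def estimated_loss(function):
    """Rough financial impact if recovery takes as long as estimated."""
    return function.est_recovery_hours * function.hourly_loss


# Hypothetical example data, for illustration only
functions = [
    BusinessFunction("Payroll", 72, 24, 500),
    BusinessFunction("Order processing", 4, 6, 10000),
    BusinessFunction("Customer support", 24, 8, 2000),
]

for f in recovery_order(functions):
    print(f.name, "MTO:", f.mto_hours, "h, estimated loss:", estimated_loss(f))

print("Cannot recover within MTO:", [f.name for f in at_risk(functions)])
```

Any function returned by at_risk needs a faster recovery method or more resources, because the current plan would not restore it before the outage becomes intolerable.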

Crisis Management

  • The first rule of crisis management is to not let people panic.  Keeping people calm is more important than resolving the crisis itself.

  • Who is going to notice the emergency first?  People on the front lines.  They should have the training to detect and respond to the emergency and to alert others.  Training is important because people forget what to do during a crisis and fall back on instinct.  If the training is repeated often enough, the response becomes muscle memory.

  • Emergency Communications – the organization must be able to communicate with the public and with its customers, so that they know that the organization is still operational.  Employees need to be able to communicate with their employer and get back to work. 

    We have to remember that natural disasters can disrupt the communications infrastructure.  We might set up a special website or hotline for employees to access during a crisis.

  • It helps to retain an experienced public relations firm.

Emergency Response

Some elements that the plan will have

  • When we recognize a disaster, what do we do?

  • Who will respond to the incident?

  • How much time do we have before we must evacuate or shut down the systems?

  • Are there checklists for managing the response?  Put the most important tasks at the top of the list.

  • Who should we notify about the disaster?

    • Customers

    • Vendors

    • Management

    • Government agencies

    • Insurance company
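A response checklist with the most important tasks at the top can be sketched as follows.  This is only an illustration: the tasks and their priority numbers are hypothetical, and a real checklist would come from the organization's own plan.

```python
from dataclasses import dataclass


@dataclass
class ChecklistItem:
    priority: int      # 1 = most important
    task: str
    done: bool = False


def ordered(items):
    """Return the checklist with the most important tasks first."""
    return sorted(items, key=lambda item: item.priority)


def next_task(items):
    """Return the most important task that is not finished, or None."""
    for item in ordered(items):
        if not item.done:
            return item
    return None


# Hypothetical checklist, entered out of order on purpose
checklist = [
    ChecklistItem(3, "Notify the insurance company"),
    ChecklistItem(1, "Account for all personnel"),
    ChecklistItem(2, "Notify management"),
]

first = next_task(checklist)
print("Do first:", first.task)

first.done = True
second = next_task(checklist)
print("Then:", second.task)
```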

Personnel

During a disaster, we must have a disaster response team in which each member has a specific role.  When developing the team, we should ask

  • What are all of the roles and what responsibility does each role have?  The decision-making authority should be clearly defined so that there is no confusion.

  • How do the different roles report to each other?

  • Is there a pathway for escalating issues?

  • Which person is assigned to each role?  Is there an alternate person for each role?

  • How can we ensure that each person is aware of their role beforehand? 

  • How can we ensure that each person has proper training to fulfill their role?

  • Are all the different departments and functions represented?

  • The health and safety of every person is paramount.  We might also need to ensure the health and safety of every employee’s family members.
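The questions about role assignments and alternates can be checked against a simple roster.  The sketch below is a minimal example under stated assumptions: the role names and people are hypothetical, and each role is assumed to have at most one alternate.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RoleAssignment:
    role: str
    primary: str
    alternate: Optional[str] = None


def roles_without_alternate(assignments):
    """Roles that would go unfilled if the primary person is unavailable."""
    return [a.role for a in assignments if not a.alternate]


def who_fills(assignment, unavailable):
    """Return the primary if available, else the alternate, else None."""
    for person in (assignment.primary, assignment.alternate):
        if person and person not in unavailable:
            return person
    return None


# Hypothetical roster, for illustration only
roster = [
    RoleAssignment("Incident commander", "Alice", "Bob"),
    RoleAssignment("Communications lead", "Carol"),
]

print("Missing alternates:", roles_without_alternate(roster))
print("Incident commander if Alice is out:", who_fills(roster[0], {"Alice"}))
print("Communications lead if Carol is out:", who_fills(roster[1], {"Carol"}))
```

Running a check like this ahead of time shows which roles still need an alternate before a disaster occurs.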

Assessment

  • We evaluate how bad the damage is.

  • We determine the best course of action to put things back to normal.  Do we have the time, money, and/or resources to fix everything?

  • Some questions that management will have

    • How much will it cost?

    • How long will it take to repair?

    • Is there any irreparable harm?  For example, data that is lost forever, or people who were injured or killed?

    • Do we need to bring in experts or external vendors?

Restoration

  • We move our operations to a hot, warm, or cold site if required and available.  Recovery means we are operational again (we might be running from a backup site).  We have a limited time to successfully implement the recovery and be operational or we will go out of business.

  • Restoration means we are back to normal.  We might go back to our old office or we might set up a new office.  As soon as we have recovered, we begin the restoration process.

  • The restoration process might be contingent on obtaining materials, personnel, or funding from our insurance company.

  • If the outage was caused by a natural disaster, and the disaster is ongoing, we need to wait until it subsides.

Training and Awareness

  • Every person in the organization should be aware of the disaster recovery process.

  • There should be some training given to each new employee, and refresher training given on a regular basis.

  • People who actively participate in the disaster response should have detailed training in accordance with their role.

Lessons Learned

Once we are back to normal, we need to have a meeting and ask some questions

  • What did we learn from the disaster?

  • What caused the disaster?

  • How can we prevent the disaster from happening again or mitigate its effects?

  • How can we detect the disaster earlier next time?

  • How can we respond to the disaster earlier next time?

  • What parts of our disaster recovery plan were effective?

  • What parts of our disaster recovery plan were ineffective or counterproductive? How can we improve our process?

  • Was the training provided to employees adequate?

  • Was the organization able to effectively communicate with its stakeholders?