1.8 Identify, analyze, and prioritize Business Continuity (BC) requirements
- Develop and document scope and plan
- Business Impact Analysis (BIA)
Business Continuity Planning or BCP means evaluating risks to our organization and attempting to reduce those risks. It allows us to continue operating in the event of a disaster.
There are four steps
- Scope and Planning
- Impact Assessment
- Continuity Planning
- Approval and Implementation
During Scope and Planning, we can think about
- Analysing the business to determine the areas that require protection
- Creating a team to develop BC policies
- Determining the resources available to protect the business
- Evaluating the laws and regulations that govern the BCP response
When we evaluate the business, we should think about
- Operational functions that provide products or services to the organization’s clients
- Support functions such as IT, HR, and legal
- Security teams and first responders
- Senior executives and board members
- Contractors and vendors
- Branch offices
Once we know the people and areas that we want to protect, then we can figure out how we are going to protect them. We should review this evaluation to ensure that we did not miss any areas. Of course, each person or department will insist that they are essential, and that they are the most important part of the business, but we need to be objective.
We should develop a BCP team that is made up of individuals from all areas and levels of the business. Why? Those people have specialized knowledge about their own operations that is necessary for a successful plan. The CEO knows how to run the business, but won’t know the specifics of each department.
Also, people on the front lines must be aware of the plan, so that they are prepared when it comes time to implement it. Who should be part of our team?
- People from each operational department
- People from each support department
- IT experts
- Cybersecurity experts
- Physical security experts
- Facilities management
- Legal department
- Human resources
- Public relations
Each person brings their own point of view, experiences, and biases. Obviously, each person will consider their role or department to be the most vital. That means that the team must have effective leadership. Each person on the team must participate effectively, especially senior executives. If they don’t take the process seriously, then nobody else will.
What resources do we need to perform BCP?
- Most resources are consumed during implementation, but we need some resources to plan BCP and select a team
- We need many resources when there is a disaster
- The most important resource is the people and their time. Pulling people off their existing jobs and having them work on BCP costs money and reduces productivity. That means we need to think about the cost of the plan and the cost of the labor to develop the plan and compare it against the potential benefit.
What are the legal requirements? Do we have to make a plan?
- We might be required by federal or state law to implement a BCP
- We might have a fiduciary duty to our shareholders or investors to implement the BCP
- Our organization might affect the life or safety of others – for example if we’re a healthcare provider, emergency response service, electrical power grid, bank, or pharmaceutical manufacturer, then we have a moral obligation to continue operating.
- We might have a contract or service level agreement to provide services to our clients – for example, we might provide internet or web hosting services. We would lose money if we didn’t provide the services at an acceptable level (and the loss could bankrupt us).
- Having a good BCP can help us gain new clients. A prospective client might check that we have a good BCP before purchasing services from us. They will want to make sure that they can depend on us no matter the circumstances.
- Lawyers should always be involved because they can help us verify the legal and regulatory requirements
A Business Impact Analysis or BIA
- identifies the resources that are critical to our organization
- identifies the threats that each resource faces
- determines the likelihood that each threat will occur
We can use the BIA to determine the types of measures that we should implement to protect our organization, and the areas that we should prioritize. How do we do that?
- We need to think about the support structures that we use every day. We can create a list of business processes (for example, IT, finance, HR, etc.) and then rank them by importance.
- Each team member can be assigned a business unit to rank – we should gather data from the entire business
- We then give each asset a value, called an Asset Value or AV
- Then we think about how much downtime we can tolerate for each asset before the business is harmed. This is called the Maximum Tolerable Downtime or MTD. We might also call it the Maximum Tolerable Outage or MTO.
- The second metric is called the Recovery Time Objective or RTO. This is the length of time in which we expect to recover the asset after it goes down.
- The third metric is the Recovery Point Objective or RPO. This is the amount of data loss that we can tolerate.
If we can tolerate a loss of up to one day’s worth of data, then we should create a backup every day. That means that at most, we will lose one day’s worth of data.
If we can tolerate a loss of up to one month’s worth of data, then we should create a backup every month. That means that at most, we will lose one month’s worth of data.
- When we know the MTD and the RTO, we can come up with a recovery plan. The time it takes for us to notice that an asset is down, plus the time it takes for us to put it back to normal, must be less than the Maximum Tolerable Downtime.
- For example
- The loss of power to our office will cause $1,000,000 in damage. That is the asset value.
- If our business can tolerate a downtime of 24 hours, that becomes our MTD.
- Now, if it would take 48 hours to restore power in an outage, we have an RTO of 48 hours.
- That is bad because our business would collapse in the event of a power outage. We can only tolerate a downtime of 24 hours.
- Our plan might be to add a power generator or some other type of equipment so that we can quickly restore power and change that RTO to less than 24 hours.
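The MTD, RTO, and RPO checks described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Asset` class, the function names, and the one-hour generator figure are hypothetical and are not part of any standard or of the plan itself.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical record of the BIA metrics for one asset."""
    name: str
    mtd_hours: float  # Maximum Tolerable Downtime
    rto_hours: float  # time we expect recovery to take


def recovery_gap_hours(asset: Asset) -> float:
    """Hours by which recovery overruns the MTD (0.0 if recovery fits)."""
    return max(0.0, asset.rto_hours - asset.mtd_hours)


def backup_meets_rpo(backup_interval_hours: float, rpo_hours: float) -> bool:
    """A backup schedule meets the RPO if backups are at least that frequent."""
    return backup_interval_hours <= rpo_hours


# The power example from the text: MTD of 24 hours, RTO of 48 hours.
office_power = Asset("office power", mtd_hours=24.0, rto_hours=48.0)
print(recovery_gap_hours(office_power))  # 24.0 hours over the limit

# Assumed figure: a generator restores power within 1 hour.
with_generator = Asset("office power", mtd_hours=24.0, rto_hours=1.0)
print(recovery_gap_hours(with_generator))  # 0.0, recovery now fits

# Daily backups satisfy a one-day RPO; monthly backups do not.
print(backup_meets_rpo(24.0, rpo_hours=24.0))       # True
print(backup_meets_rpo(24.0 * 30, rpo_hours=24.0))  # False
```

Any asset with a gap greater than zero is a candidate for additional controls, as in the generator example.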
There are two types of risks
- Natural risks like tornadoes and earthquakes
- Man-made risks like war, theft, fire, and network outages
We should make a list of all the risks faced by our organization.
Once we have a list of risks, we should estimate the likelihood that each risk will occur. This is called the Annualized Rate of Occurrence or ARO. How many times do we expect this risk to occur in a year? We can use historical data, expert judgement, and our experience to estimate this.
Finally, we should calculate the impact that each risk would have. How much damage will the risk cause if it occurs?
- The Exposure Factor (EF) is the percent damage that a risk will cause to the value of an asset. If the asset is worth $100 and $50 of damage is caused, then the EF is 50%.
- The Single Loss Expectancy or SLE is the dollar amount of damage that the risk will cause each time that it happens. In the above example, it is $50.
- The Annualized Loss Expectancy or ALE is the loss that we expect to see each year. We calculate ALE by multiplying SLE and ARO.
For example, if the SLE is $50 and the risk takes place once per year, then the total ALE is $50. If the SLE is $50 and the risk takes place twice per year, then the total ALE is $100. If the SLE is $50 and the risk takes place once every two years, then the total ALE is $25.
When we have an ARO that is less than one (the risk occurs less than once per year), we might need to factor it into our budget over multiple years.
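The SLE and ALE arithmetic above can be checked with a short Python sketch. The function names are illustrative only; the figures are the ones used in the examples above.

```python
def single_loss_expectancy(asset_value: float, exposure_factor: float) -> float:
    """SLE = AV x EF, with EF given as a fraction (0.5 means 50%)."""
    return asset_value * exposure_factor


def annualized_loss_expectancy(sle: float, aro: float) -> float:
    """ALE = SLE x ARO, with ARO given as occurrences per year."""
    return sle * aro


# Asset worth $100 with an exposure factor of 50%.
sle = single_loss_expectancy(100.0, 0.5)
print(sle)  # 50.0

# Once per year, twice per year, and once every two years.
for aro in (1.0, 2.0, 0.5):
    print(aro, annualized_loss_expectancy(sle, aro))
# 1.0 50.0
# 2.0 100.0
# 0.5 25.0
```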
We should think about the reputation and goodwill that would be harmed by a disaster and the negative publicity that would result from a disaster. Sometimes we can’t put a monetary value on these items.
Once we know where the harm is, we can allocate specific resources to the assets with the highest priorities. We might not have enough resources to protect us against all the risks.
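One possible way to turn this prioritization into something concrete is to rank risks by ALE and fund protections in that order until the budget runs out. The sketch below assumes a simple greedy rule; the `Risk` tuple, the mitigation costs, and the dollar figures are all hypothetical, and a real organization would also weigh the non-monetary harm mentioned above.

```python
from typing import List, NamedTuple, Tuple


class Risk(NamedTuple):
    name: str
    ale: float              # Annualized Loss Expectancy in dollars
    mitigation_cost: float  # assumed cost of protecting against the risk


def allocate(risks: List[Risk], budget: float) -> Tuple[List[str], float]:
    """Fund risks from highest ALE to lowest while the budget allows it."""
    funded = []
    remaining = budget
    for risk in sorted(risks, key=lambda r: r.ale, reverse=True):
        if risk.mitigation_cost <= remaining:
            funded.append(risk.name)
            remaining -= risk.mitigation_cost
    return funded, remaining


# Hypothetical figures, for illustration only.
risks = [
    Risk("flood", ale=10_000.0, mitigation_cost=30_000.0),
    Risk("power outage", ale=50_000.0, mitigation_cost=20_000.0),
    Risk("theft", ale=25_000.0, mitigation_cost=15_000.0),
]

print(allocate(risks, budget=40_000.0))
# (['power outage', 'theft'], 5000.0)
```

In this example the flood risk stays unfunded because the remaining budget cannot cover it, which illustrates that we might not be able to protect against every risk.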
The Strategy Development phase is when we take the ideas we developed during our analysis and develop a strategy. Remember that we cannot possibly prevent every possible risk. We need to think about how to protect against unacceptable risks. There are three types of assets
- People – we need to make sure that people are always safe even during an emergency. Once the people are safe, we can focus on having them return to operating the business. In the event of a disaster, the people need to have access to supplies and food.
- Buildings – if we need specific buildings and equipment to perform our role, like factories, call centers, or warehouses, then we need to protect them. We can think about ways to make our buildings strong enough to withstand disasters.
If we can’t make our building strong enough, we can find another building, called an alternate site, where we can operate in case our existing site is harmed.
- Infrastructure – we need to think about the equipment that we use such as computer servers. How do we protect them? We can physically harden them through redundant power, security systems, and fire suppression systems.
We can also provide alternative redundant systems.
Once we’ve figured out the design, we need to obtain approval from senior management, ideally from the most senior person possible, such as the CEO. This gives us the commitment of the organization and gives the plan credibility. It shows that the organization is taking the recovery plan seriously.
Once the plan is approved, we need to implement the plan by deploying the required resources.
Training is an important part. Each person who is involved will be trained in their role and might also need an idea of the bigger picture. At the minimum, each person in the company should receive a summary of the plan. This gives them assurance that the organization is prepared to protect them.
For every task in the plan, a specific person should be designated as the responsible person. We must make sure that each person has a backup person. We should evaluate the main person and the backup to ensure that they know how to perform their tasks.
The plan must be documented at the end. Why? We will have a specific document that can be referenced during an emergency so that we know exactly how to act. We also have a record that reminds us of the reasons for implementing the plan.
The goal of the plan is to make sure that the business continues to operate in the event of an emergency. What does the plan contain?
- A Statement of Importance, which tells the reader how important the plan is to the business. This statement might be a letter signed by the CEO.
- The Statement of Priorities tells us the list of business functions that must be given priority. This prevents fighting between departments in the event that the business is unable to provide resources to all of them.
- The Statement of Organizational Responsibility reminds us that the business is responsible for maintaining continuity
- The Statement of Urgency provides a timeframe for implementing the plan
- The Risk Assessment tells us the risks that we considered when developing the plan and their metrics (AV, EF, ARO, SLE, and ALE)
- The Risk Acceptance section provides a list of risks. For each risk, if it is acceptable, we write down the reasons why it is acceptable. If it is unacceptable, we write the process for preventing or managing the risk. When we say that a risk is “acceptable”, that means that we are aware of the risk, but aren’t doing anything about it, because the harm that the risk would cause is lower than the cost of preventing the risk, or because the likelihood of the risk occurring is so low.
- The Vital Records section tells us where the business records are stored and the types of records to be stored. We should think about what types of documentation are necessary to keep the business operating. We might have multiple copies of the vital records.
- The Emergency-Response Guidelines tell us the first steps we will take to respond to an emergency. That includes security and safety response, fire response, and notification. It also includes secondary response procedures and a list of people who must be notified in the event of an emergency.
- Maintenance procedures tell us how to maintain the documentation. We must continue to review and revise the Business Continuity Plan so that it remains valid. We should destroy any older versions of the BCP so that people don’t get confused or use the wrong one in an emergency.
- Testing procedures tell us how to test the plan regularly so that we know it is valid.
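The reasoning in the Risk Acceptance section above (accept a risk when the expected harm is lower than the cost of preventing it, or when it is very unlikely) can be sketched as a small decision rule. The function name, the idea of an annual control cost, and the `negligible_aro` threshold are assumptions made for illustration; each organization would set its own criteria.

```python
def risk_decision(
    ale: float,
    annual_control_cost: float,
    aro: float,
    negligible_aro: float = 0.01,  # assumed cutoff: less than once per 100 years
) -> str:
    """Return "accept" or "mitigate" for one risk.

    A risk is accepted when it is very unlikely, or when the expected
    yearly loss (ALE) is lower than the yearly cost of the control.
    """
    if aro < negligible_aro:
        return "accept"
    if ale < annual_control_cost:
        return "accept"
    return "mitigate"


# Expected loss of $25 per year versus a control costing $40 per year.
print(risk_decision(ale=25.0, annual_control_cost=40.0, aro=0.5))  # accept

# Expected loss of $100 per year versus the same $40 control.
print(risk_decision(ale=100.0, annual_control_cost=40.0, aro=2.0))  # mitigate
```

Whatever rule is used, the plan should still record the reason for each decision so that it can be revisited when the plan is reviewed.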