1.10 Understand and apply risk management concepts
- Identify threats and vulnerabilities
- Risk assessment/analysis
- Risk response
- Countermeasure selection and implementation
- Applicable types of controls (e.g., preventive, detective, corrective)
- Control assessments
- Monitoring and measurement
- Reporting
- Continuous improvement (risk maturity model)
- Risk frameworks
A risk is the possibility that something harmful will take place. Risk management is the process of identifying risks, determining their likelihood and the amount of damage that they could cause, and implementing strategies to reduce or avoid those risks.
The level of risk that an organization is willing to accept is called its risk appetite. The risk appetite depends on the company’s culture, budget, size, and industry. We can’t identify all risks and we can’t prevent all risks. But we can implement a strategy to reduce most risks.
The first step is Risk Analysis, where we
- Identify each risk
- Determine the amount of damage that it could cause. We need to know what everything is worth so that we can determine how much we will lose when it is damaged.
- Determine the likelihood that it will occur
- Determine methods that we could use to reduce the likelihood that the risk would occur or the damage that it would cause. These are called mitigation techniques.
- Determine the cost of each mitigation technique
An Asset is an item of value that must be protected. An asset might be something physical or something intangible, including
- Equipment
- Tools
- Vehicles
- Buildings
- Trade secrets
- Business processes
- Data
- Software licenses
- Good will/reputation
When an asset is lost or damaged, our organization will either collapse, be unable to conduct its business, lose money, or spend money to replace the asset.
Each asset has a Valuation, which is the cost of the asset. Sometimes, it is difficult to put a dollar value on an asset (such as intellectual property, goodwill, reputation, trade secrets, or physical assets that can’t be replaced – an expensive painting for example).
A threat is an action or event that could damage an asset.
- A threat could be something that we do or something that we neglect to do. For example, if we don’t lock the door to our warehouse and people come to steal from us, that is a threat. We neglected to lock the door.
- A threat could be big or small. A small threat could have big consequences.
- A threat might be accidental or on purpose.
- A threat could be natural (such as the weather) or caused by human error.
A vulnerability is a weakness in an asset that the threat uses to damage the asset.
Exposure means that the asset could be damaged by a threat. If the asset has a vulnerability and there is a threat that could use the vulnerability to damage the asset, then we say that there is an exposure. In the cybersecurity world, the technique or code that a threat actor uses to take advantage of a vulnerability is called an exploit.
For example, if your house isn’t hurricane proof (vulnerability) and there is a hurricane (threat), then the exposure is the possibility that the hurricane will damage your house.
If your cell phone isn’t waterproof and falls into the water and gets damaged, the lack of waterproofing is the vulnerability. The threat is it falling into the water. The cell phone is the asset. The cost to buy a new cell phone and copy the data is its valuation.
If your financial transaction isn’t encrypted and a hacker intercepts it and uses it to steal your money, the unencrypted communication is the vulnerability. The threat is the interception of your data and theft of your money. The asset is your money. The valuation of your asset is the balance of your bank account.
In other words, risk is the chance that a threat will exploit a vulnerability, resulting in damage to the asset. We can reduce risk by reducing or eliminating the threat, the vulnerability, or both.
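The relationship between assets, vulnerabilities, threats, and exposures can be sketched in a few lines of Python. This is only an illustration: the `Asset` and `Threat` classes, the `exposures` function, and the vulnerability labels are hypothetical names chosen for this example, and the dollar values are made up.

```python
from dataclasses import dataclass, field


@dataclass
class Asset:
    name: str
    valuation: float  # what the asset is worth to us (illustrative figure)
    vulnerabilities: set = field(default_factory=set)


@dataclass
class Threat:
    name: str
    exploits: set  # the vulnerabilities that this threat can take advantage of


def exposures(asset, threats):
    """Return (threat, vulnerability) pairs where a threat matches a weakness of the asset."""
    return [
        (threat.name, weakness)
        for threat in threats
        for weakness in sorted(asset.vulnerabilities & threat.exploits)
    ]


phone = Asset("cell phone", 800.0, {"not waterproof"})
house = Asset("house", 250_000.0, {"not hurricane proof"})
threats = [
    Threat("falls into water", {"not waterproof"}),
    Threat("hurricane", {"not hurricane proof"}),
]

print(exposures(phone, threats))  # [('falls into water', 'not waterproof')]
print(exposures(house, threats))  # [('hurricane', 'not hurricane proof')]
```

An asset with no matching pair has no exposure in this model, which mirrors the point that removing either the threat or the vulnerability reduces the risk.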
A safeguard is a tool that removes the vulnerability or that protects an asset from a threat. We need safeguards to reduce or remove risks.
When a risk takes place (when a threat exploits a vulnerability to damage an asset), we say that the risk has been realized. This is also known as an attack. A violation of our security policy is also an attack.
A breach is when a security feature has been bypassed. An intrusion is when an attack is used with a breach. That means that a threat actor has entered the organization’s secure system.
What are some threats?
- Viruses, and other types of malware
- Fraud, theft, and other crimes
- Hackers
- Environmental activists
- Rogue employees or contractors
- Natural disasters
- Poorly designed hardware or software
- Manufacturer defects
- Untrained staff
- User errors
- The government
- Temperature or humidity extremes
- Power surges
- Changes to the law or policies
- Social engineering
A single person cannot come up with a comprehensive list of risks. Instead, multiple people from different departments and roles should contribute to the risk assessment. This allows for a diverse set of opinions.
Once we have a list of threats, we can perform a risk assessment. The Risk Assessment might be handled by specialists in the field or by management, but ultimately, management is responsible for understanding the risks and the outcome of the risk assessment.
We can’t eliminate all risks, so instead we have to think about which risks we can accept, and which risks we can’t accept. We do this by understanding the consequences of each type of risk.
A Quantitative Risk Assessment assigns a monetary value to the loss of an asset. The result is that we have a dollar value for each risk. We can ask: How much money will we lose if the risk is realized?
We can follow these steps
- Perform an inventory of all our assets
- Assign an Asset Value to each asset (AV)
- Create a list of threats for each asset
- For each threat, we must calculate the Exposure Factor (EF) and Single Loss Expectancy (SLE)
- For each threat, we calculate the likelihood that the threat is realized in one year. We call this the Annualized Rate of Occurrence (ARO)
- We calculate the overall loss for each threat. We call this the Annualized Loss Expectancy (ALE).
- We find a way to mitigate each threat and calculate a revised ARO and ALE. We call these countermeasures or mitigation strategies. We might have multiple strategies for each threat.
- We perform a cost/benefit analysis for each mitigation strategy and select the best one.
The Exposure Factor or EF is the percentage loss that we would experience if an asset were damaged by a risk.
For example, if an asset is worth $100 and $50 worth of damage results, then the EF is 50%.
The cost (valuation) of the asset might be determined from
- The cost to purchase the asset
- The cost to maintain the asset
- The value of the asset to its owners/users/competitors/marketplace
- The market value of the asset if we tried to sell it
- The cost to replace the asset
- The benefit derived from the asset over its lifetime
The Single Loss Expectancy (SLE) is the cost of a single occurrence of a risk against an asset. It is the Asset Value multiplied by the Exposure Factor (SLE = AV x EF).
If the asset is worth $100, and the EF is 50%, then the SLE is $50. That is, we expect to lose $50 in value when the threat occurs.
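The EF and SLE arithmetic from the $100 example can be written as two small functions. This is a minimal sketch; the function names are assumptions made for illustration, not part of any standard tool.

```python
def exposure_factor(asset_value, loss):
    """EF: the fraction of the asset's value lost in one incident (0.0 to 1.0)."""
    if asset_value <= 0:
        raise ValueError("asset_value must be positive")
    return loss / asset_value


def single_loss_expectancy(asset_value, ef):
    """SLE = AV x EF: the expected loss from one occurrence of the threat."""
    if not 0.0 <= ef <= 1.0:
        raise ValueError("ef must be between 0.0 and 1.0")
    return asset_value * ef


ef = exposure_factor(asset_value=100.0, loss=50.0)
sle = single_loss_expectancy(asset_value=100.0, ef=ef)
print(f"EF = {ef:.0%}, SLE = ${sle:.2f}")  # EF = 50%, SLE = $50.00
```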
Annualized Rate of Occurrence or ARO is the expected frequency that a risk will occur in a year. If the threat is expected to occur once per ten years, then the ARO is 0.1. If the threat is expected to occur once per year, then the ARO is 1. If a threat occurs more than once per year, then the ARO might be higher than 1.
Annualized Loss Expectancy or ALE is the yearly cost of the threat. It is the SLE multiplied by the ARO (ALE = SLE x ARO). For example, if the SLE is $50 and we expect to have 10 losses per year, then the total ALE is $50 x 10 = $500. We would expect to lose $500 per year.
If the SLE is $50 and we expect to have one threat every ten years, then the total ALE is $50 x 0.1 = $5. That means that we would expect to lose $5 per year. We wouldn’t actually lose $5 per year – we would lose $50 all at once when the threat takes place. But from an accounting perspective, we need an annualized rate so that we can weigh our response.
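The two ALE examples above (ten losses per year, and one loss every ten years) can be reproduced with a short sketch. The function names are hypothetical and chosen only for this illustration.

```python
def annualized_rate_of_occurrence(occurrences, years):
    """ARO: how many times we expect the threat to occur per year."""
    if years <= 0:
        raise ValueError("years must be positive")
    return occurrences / years


def annualized_loss_expectancy(sle, aro):
    """ALE = SLE x ARO: the expected yearly cost of the threat."""
    return sle * aro


SLE = 50.0

frequent = annualized_loss_expectancy(SLE, annualized_rate_of_occurrence(10, 1))
rare = annualized_loss_expectancy(SLE, annualized_rate_of_occurrence(1, 10))

print(f"Ten losses per year: ${frequent:.2f} per year")      # $500.00 per year
print(f"One loss every ten years: ${rare:.2f} per year")     # $5.00 per year
```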
It is not easy to calculate the EF, SLE, ARO, and ALE for each threat, especially in a large organization. In practice, organizations often use specialized software to estimate and track each threat.
A Qualitative Risk Assessment is the process of predicting the impact of a risk by relying on expert opinions and experience. A qualitative risk assessment is subjective.
There are many Qualitative techniques
- Brainstorming
- Delphi technique
- Storyboarding
- Focus Groups
- Questionnaires
- Checklists
- Interviews
A Scenario is a story about a threat taking place that we can use to role play. The Scenario tells us what the threat is and what effect it would have on one of our assets. We give the Scenario to a number of people and have them prepare responses, which include the threat level and potential losses. We can take the responses from the different participants to develop a detailed risk analysis. Having more participants generally makes the result more reliable.
The Delphi technique is another method for achieving consensus. We gather a group of people and invite them to respond anonymously to a scenario. If we don’t have a consensus, then we take the results and present them to the group. The members each produce another anonymous response. We continue this process until we reach a consensus.
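The loop structure of the Delphi technique can be sketched as follows. This is a toy model under loud assumptions: real participants use judgment, whereas here a hypothetical `revise` function stands in for each person rethinking their answer, the responses are treated as numeric loss estimates, and "consensus" is assumed to mean that all estimates fall within a chosen tolerance.

```python
from statistics import median


def delphi_rounds(estimates, revise, tolerance, max_rounds=10):
    """Run anonymous rounds until the estimates agree to within `tolerance`.

    Returns (consensus_value, round_number), or (None, max_rounds) if the
    group has not converged after max_rounds rounds.
    """
    for round_number in range(1, max_rounds + 1):
        if max(estimates) - min(estimates) <= tolerance:
            return median(estimates), round_number
        # No consensus: share the anonymous summary and collect new responses.
        summary = median(estimates)
        estimates = [revise(own, summary) for own in estimates]
    return None, max_rounds


def move_halfway(own, summary):
    # Hypothetical behavior: each participant moves halfway toward the group median.
    return (own + summary) / 2


loss_estimates = [1000.0, 4000.0, 9000.0]  # made-up loss estimates for one scenario
consensus, rounds = delphi_rounds(loss_estimates, move_halfway, tolerance=500.0)
print(consensus, rounds)  # 4000.0 5
```

Only the summary is shared between rounds, never who gave which answer, which is what keeps the responses anonymous.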
For any risk, we can do the following
- Accept the risk – do nothing. The cost of safeguarding against the risk is more than the cost of the risk. If we choose to accept a risk, we must document the reasons for accepting it. The level of risk that an organization is willing to take is called the Risk Appetite.
- Reduce the risk – take action to reduce the damage that the risk will cause or the likelihood that it will occur.
- Transfer the risk – move the risk to another person or party. For example, insurance allows us to transfer the cost of the risk to the insurance company.
- Deter the risk – take measures that make the risk less likely to occur. Risk deterrence usually is effective on hackers and criminal threats. For example, a security camera is likely to deter a theft.
- Avoid the risk – engage in alternative activities where the risk is not present. For example, joining a meeting via phone instead of driving to the meeting avoids the risk of getting into a car accident.
- Reject the risk – do not accept the risk. This means that we deny that the risk will happen, or ignore it. For example, a risk that has an ARO of 0.01 (once per hundred years) might be rejected. Rejecting a risk is different from accepting it, because an accepted risk is analyzed and documented.
Once we have implemented the countermeasure, the residual risk is the risk that is left over. If we choose to live with a residual risk, it is usually because the cost of protecting against it is more than the cost of the risk itself.
The Total Risk = Threats * Vulnerabilities * Asset Value
The Residual Risk = Total Risk – Controls Gap
We must repeat the Risk Management process often. The risk assessment represents a point in time. It is the risk profile of the organization at the time that the assessment was conducted.
A safeguard is something that mitigates the risk. The safeguard might reduce the likelihood that a threat can damage the asset. It reduces the ARO but keeps the EF the same, because if the threat penetrates the safeguard, it will still cause the same damage to the asset.
Or it might reduce the damage that occurs when the threat is realized. It keeps the ARO the same but reduces the EF.
Or it might reduce both the likelihood that the threat can damage the asset and the amount of damage. It reduces both the ARO and the EF.
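The three kinds of safeguard can be compared by recomputing the ALE with a lower ARO, a lower EF, or both. The sketch below uses made-up numbers (an asset worth $100, an EF of 50%, and an ARO of 10 before any safeguard); the `ale` helper is a hypothetical name.

```python
def ale(asset_value, exposure_factor, aro):
    """ALE = AV x EF x ARO (that is, SLE x ARO)."""
    return asset_value * exposure_factor * aro


AV = 100.0  # illustrative asset value

scenarios = {
    "no safeguard": ale(AV, 0.5, 10),
    "lower ARO, same EF": ale(AV, 0.5, 2),   # threat succeeds less often
    "same ARO, lower EF": ale(AV, 0.1, 10),  # each incident does less damage
    "lower ARO and EF": ale(AV, 0.1, 2),     # both effects combined
}

for name, yearly_loss in scenarios.items():
    print(f"{name}: ${yearly_loss:.2f} per year")
# no safeguard: $500.00 per year
# lower ARO, same EF: $100.00 per year
# same ARO, lower EF: $100.00 per year
# lower ARO and EF: $20.00 per year
```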
When we are deciding whether to install a safeguard, we can make two calculations. First, we calculate the ALE before the safeguard, and then we calculate the ALE after the safeguard.
For example, if the ALE is $1000/year before the safeguard, and $100/year after the safeguard, then the safeguard will provide $900 of protection per year. The safeguard is worth using if it costs less than $900 per year.
If the safeguard costs us $4500 up front, then it would pay for itself after five years ($900 per year x 5 years = $4500). If the safeguard has a lifespan of two years, then it would not be cost effective (because it costs $2250/year, which is more than the $900/year of protection that it provides).
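The payback reasoning above can be checked with a short sketch. It assumes, for simplicity, that the upfront price is the only cost of the safeguard (no maintenance or labor); the function names are hypothetical.

```python
def annual_benefit(ale_before, ale_after):
    """Protection provided per year: the drop in ALE caused by the safeguard."""
    return ale_before - ale_after


def payback_years(upfront_cost, benefit_per_year):
    """How many years of benefit it takes to recover the upfront cost."""
    if benefit_per_year <= 0:
        raise ValueError("the safeguard provides no yearly benefit")
    return upfront_cost / benefit_per_year


def is_cost_effective(upfront_cost, lifespan_years, benefit_per_year):
    """True if the upfront cost spread over the lifespan is no more than the yearly benefit."""
    return upfront_cost / lifespan_years <= benefit_per_year


benefit = annual_benefit(ale_before=1000.0, ale_after=100.0)  # 900.0 per year

print(payback_years(4500.0, benefit))         # 5.0
print(is_cost_effective(4500.0, 2, benefit))  # False: 2250 per year > 900 per year
print(is_cost_effective(4500.0, 10, benefit)) # True: 450 per year <= 900 per year
```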
Even if the safeguard is not cost effective, maybe we are required to use it to comply with a law or regulation. For example, a safeguard that protects human life might be required no matter the cost.
We should also think about the possible damage to the company’s reputation, which might be more than the value of the asset.
The cost of the safeguard comes from
- The upfront cost to purchase it
- The maintenance cost
- The operating cost
- The labor cost for employees to install, monitor, maintain, and repair it
We call this a Cost Benefit Analysis:
Value of the Safeguard = ALE before the Safeguard – ALE after the Safeguard – ACS (Annual Cost of the Safeguard)
We might have several safeguards to choose from. We might select the safeguard that provides the best value, or we might select the safeguard using other factors. We might use a combination of safeguards.
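As a rough sketch of how the cost/benefit formula can rank several candidate safeguards, the code below computes the value of each one and picks the highest. The safeguard names and all of the figures are invented for illustration, and `safeguard_value` is a hypothetical helper.

```python
def safeguard_value(ale_before, ale_after, acs):
    """Value = ALE before - ALE after - ACS (annual cost of the safeguard)."""
    return ale_before - ale_after - acs


ALE_BEFORE = 1000.0

# Hypothetical candidates: name -> (ALE after the safeguard, annual cost of the safeguard)
candidates = {
    "door lock": (100.0, 300.0),
    "alarm system": (400.0, 200.0),
    "security guard": (50.0, 1200.0),
}

values = {
    name: safeguard_value(ALE_BEFORE, ale_after, acs)
    for name, (ale_after, acs) in candidates.items()
}
best = max(values, key=values.get)

print(values)  # {'door lock': 600.0, 'alarm system': 400.0, 'security guard': -250.0}
print(best)    # door lock
```

A negative value means that the safeguard costs more per year than the loss it prevents. It might still be chosen for other reasons, such as a legal requirement.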
Some factors we might consider
- The cost of the safeguard must be less than the damage caused to the asset
- The safeguard should make the cost of an attack larger than the benefit obtained from the attack
- The safeguard must protect a real asset that has a threat
- The safeguard should not provide protection through secrecy or obscurity
- We should be able to test and verify the safeguard
- The safeguard should not depend on other devices or technology
- The safeguard should not require human intervention
- We should be able to implement safeguards that fail safe or provide redundancies
- A safeguard can be physical, logical, or administrative
A control is a mechanism that is used to manage a behavior. There are different types of controls. We might use a control to protect our assets. Controls discourage, prevent, detect, or correct unsafe, illegal, or undesired behaviors. From a safety perspective, the best control is one that physically removes the hazard.
All controls can be bypassed. There should always be an administrative control, which provides legal consequences for violating or damaging a Technical or Physical control. Undesired behavior is a risk, and the use of a control reduces the organization’s risk.
NIST Special Publication 800-53 revision 4 lists hundreds of controls in 18 families and is an excellent reference. We will not cover these controls in detail.
A deterrent control is a method that discourages a behavior. For example, a user could be fired for sharing sensitive data. The deterrent control does not prevent the user from engaging in the activity, but it makes the consequences of that activity discouraging. The organization should consider the benefit that an undesired behavior will bring to the perpetrator and implement consequences that are greater than the benefit. Deterrents do not work well by themselves because there are always people who do not expect to get caught.
A preventative control is one that stops a user from engaging in a specific behavior. For example, elevator doors close and lock when the car is moving so that people do not fall into the shaft. It is physically impossible to open an elevator door while it is moving (take my word for it). Encryption prevents an eavesdropper from reading your confidential conversation.
People will try to break preventative controls. People try to pick locks and break windows all the time. The cost of the preventative control must be weighed against the asset that it is supposed to protect. A more expensive control takes more effort to bypass.
Preventative controls can be installed in layers. For example, a locked server room, inside a locked building, behind a fence with a locked gate has three layers of preventative controls. Even if one layer fails (the thief breaks the gate or the administrator leaves the server room unlocked, for example), the other layers will continue to protect the asset.
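One way to see why layers help is to estimate the chance that every layer is bypassed. The sketch below assumes that each layer fails independently of the others, which is a simplification (a skilled thief may defeat all three locks the same way), and the probabilities are invented purely for illustration.

```python
import math


def chance_all_layers_fail(bypass_probabilities):
    """Probability that every layer is bypassed, ASSUMING the layers fail independently."""
    probabilities = list(bypass_probabilities)
    for p in probabilities:
        if not 0.0 <= p <= 1.0:
            raise ValueError("each probability must be between 0.0 and 1.0")
    return math.prod(probabilities)


# Hypothetical chance that an intruder gets past each layer
layers = {
    "locked gate": 0.5,
    "locked building": 0.2,
    "locked server room": 0.1,
}

print(f"Gate only: {chance_all_layers_fail([layers['locked gate']]):.3f}")   # 0.500
print(f"All three layers: {chance_all_layers_fail(layers.values()):.3f}")    # 0.010
```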
A Detective control only detects undesired behavior. It does not deter or prevent the behavior. It is useful when the organization wants to monitor behaviors. A detective control allows an organization to respond to undesired behavior.
The organization may follow up with individuals who engaged in the undesired behavior. The organization may have many violators and may want to monitor trends to better address the problem. Or the cost of a preventative control might be too expensive.
For example, the city installs a camera at an intersection to catch speeding motorists, who are later fined. Drivers who speed too often lose their licenses. The camera does not stop people from speeding. The fine could also be considered a deterrent control.
An alarm with a siren and a motion sensor is a better example of a detective control. If an intruder passes by the motion sensor, the alarm is triggered. The alarm does not prevent the intruder from trespassing, but it may alert a security guard of the violation so that he can respond and apprehend the individual.
If the intruder knows about the presence of the alarm, he may be reluctant to trespass. Thus, an alarm could also be a deterrent. Most detective controls are also deterrents.
A corrective control is one that reverses a behavior. For example, a door with a spring-loaded hinge is a corrective control. If a user leaves the door open, the hinge will automatically close it.
A corrective control may reverse the behavior quickly or slowly. A backup of a storage appliance is a corrective control. If the storage appliance fails, the data can be restored from backup.
A recovery control is a more advanced corrective control. It can include disaster recovery systems and redundant systems.
A directive control directs or confines the actions of a person so that they comply with the organization’s rules. It includes security policies, monitoring, and procedures.
A compensating control is an alternative that stands in for another control. If the actual control is not available, or if the organization is not able to implement the original control because of a legitimate technical or business restriction, then the organization will implement a compensating control, which
- Meets the original intent of the requirement
- Provides similar levels of control as the original requirement
- Does not cause additional risk to the organization
If the organization is unable to implement a valid control, then they may need to stop the activity.
A fire suppression system is an example of a compensating control. It won’t deter, prevent, or detect the fire, but it will reduce the damage that the fire causes (and create a flood in the process).
Another example is a rescue plan for a person working in a confined space. Confined spaces are dangerous because there is a potential for high levels of toxic gas build up, a lack of oxygen, and/or an explosion. Confined spaces exist in manholes, sewers, oil wells, mines, and many other places. Sometimes work must be performed in these places. By law, when an organization sends a person into a confined space, a dedicated rescue team must be standing by to pull him out should the conditions warrant it. The organization could not prevent the risky conditions, so they created a compensating control. If they could not assemble a rescue team (the control), they would not be able to send a worker into the confined space.
A technical control is also known as a logical control. The technical control does not physically prevent a person from engaging in a behavior, but it prevents him through hardware or software.
A technical control can be bypassed if it contains a security vulnerability. It should be backed up by an administrative control. When a technical control is operating correctly, it can be as strong as, or stronger than, a physical control. If you store sensitive data on a hard drive, and then encrypt that hard drive with a tool such as BitLocker (which uses the AES algorithm), and then store the hard drive in a safe, you have used a physical control and a technical control. A thief might be able to break the safe, but he is very unlikely to defeat the encryption.
Technical controls include access control systems, firewalls, access control lists, and malware detectors.
An administrative control is one that is established in policy. It is not physical.
For example, an employee could be fired if they violate a policy.
A physical control physically prevents a user from engaging in a behavior. For example, storing sensitive data in a locked filing cabinet would prevent a user from accessing or sharing sensitive data stored within.
A physical control can be bypassed if there is enough brute force. The physical control should be backed up by an administrative control so that there are consequences.
A Security Control Assessment or SCA is an evaluation of the security infrastructure against a baseline. We use the SCA to verify that the controls are effective. A specific policy for implementing an SCA is defined by NIST 800-53A, Guide for Assessing the Security Controls in Federal Information Systems.
If we can’t measure or verify that a security control is working, then we can’t assume that it is working.
A Risk Framework or Risk Management Framework (RMF) is a guideline for how our organization will monitor risks, assess risks, and resolve risks.
An important RMF is defined by NIST Special Publication 800-37.
The NIST RMF provides
- Categorization
- Security control selection, implementation, and assessment and monitoring
- Real-time risk management
- Robust continuous monitoring processes
- Data for executives to make decisions based on risk
- Ability to integrate security information into the enterprise data systems
- Establishment of responsibility within the organization
- Accountability for security controls
- A link between the information system and the organization level
- A process to categorize risk management information
- A process to select, implement, assess, authorize, and monitor a set of baseline security controls
ISO 31000 provides guidelines for risk management
- The framework should be customized to the organization
- Stakeholders should be included
- The framework should be comprehensive
- The framework should be integrated into the organization’s activities
- Risk management should be dynamic – it should respond to changes
- We should identify the limitations of any information we use to evaluate risks
- We should consider culture and human factors when conducting risk management
- We should strive to continually improve our risk management
COBIT or Control Objectives for Information and Related Technology is a risk management framework. It helps us align our IT goals with our business goals.
Risk IT is another risk management framework. It has three domains
- How to develop a risk management governance
- How to evaluate risks
- How to respond to risks