2.1 Explain the importance of security concepts in an enterprise environment

  • Configuration Management
    • Diagrams
    • Baseline Configuration
    • Standard Naming Conventions
    • Internet Protocol (IP) Schema
  • Data Sovereignty
  • Data Protection
    • Data Loss Prevention (DLP)
    • Masking
    • Encryption
    • At Rest
    • In Transit/Motion
    • In Processing
    • Tokenization
    • Rights Management
  • Geographical Considerations
  • Response and Recovery Controls
  • Secure Sockets Layer (SSL) / Transport Layer Security (TLS) Inspection
  • Hashing
  • API Considerations
  • Site Resiliency
    • Hot Site
    • Cold Site
    • Warm Site
  • Deception and Disruption
    • Honeypots
    • Honeyfiles
    • Honeynets
    • Fake Telemetry
    • DNS Sinkhole

Configuration Management

Change management is the process for managing changes to assets and systems.  Change management is also known as configuration management (configuration in the engineering sense, not the computer sense). 

Consider a petrochemical refinery with a million valves, pipes, sensors, and wires.  Each item in the plant is documented so that if it fails or requires maintenance, the refinery knows exactly what is being replaced and where it is.  Imagine having to drill into a pipeline without knowing what it was carrying.  Imagine that somebody replaced a water pipeline with a flammable gas pipeline but forgot to document the change, and another worker later performed work on that pipeline without the correct tools or safety equipment because he assumed it was full of water.  This is an overly simplified example, of course, because pipelines are labeled with their contents.  But people have drilled into pipelines that were full of steam or gas, even though the documentation said that those pipelines were empty.  We need to prevent this.

In a large organization, a single employee cannot make a change on his own.  He must seek approval from a committee known as the Change Control Board (CCB).  The CCB decides whether a change is approved or denied.  If it is approved, the CCB ensures that the employee who performs the change does so in accordance with the organization’s policies.  When the change is complete, the CCB documents it.

The PMBOK (Project Management Body of Knowledge), which underlies the PMP (Project Management Professional) certification, covers Risk Management and Change Management in greater depth.

In an IT environment, change management applies to network hardware configuration, switch configuration, security policies, the physical location of infrastructure, and many other items.

We need to be able to keep track of the configuration of each device.  We need to have a framework for requesting, approving, and implementing a change to any configuration.  That could include a centralized application or database.  That way, when we want to replace, upgrade, or troubleshoot a device, it is well documented.

Configuration management should be supplemented with diagrams that show where each device is physically located and how the devices are logically connected.

The Baseline Configuration shows us the way that a device is normally configured.  We can compare the baseline configuration against the current configuration to determine whether any changes have been made.  We might have a different baseline for each point in time. 

When we want to make a change to the configuration, we are making the change against the baseline.  People who are responsible for approving the change compare it against the baseline to see what will be affected.  The change may affect only one device or all the devices.  When it affects all the devices, then we might be creating a new baseline.
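
As a minimal sketch of how a baseline comparison might be automated (the file names, and the assumption that configurations are stored as plain text, are mine), Python’s standard difflib module can report exactly what changed:

    import difflib

    def compare_to_baseline(baseline_path, current_path):
        """Print a unified diff of the current configuration against the baseline."""
        with open(baseline_path) as f:
            baseline = f.readlines()
        with open(current_path) as f:
            current = f.readlines()
        diff = difflib.unified_diff(baseline, current,
                                    fromfile="baseline", tofile="current")
        for line in diff:
            print(line, end="")

    # Hypothetical file names:
    # compare_to_baseline("rtr-edm-001-baseline.cfg", "rtr-edm-001-running.cfg")

If the diff is empty, the device still matches its baseline; any output represents an undocumented change that should be investigated.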

We should give each device a standard name so that we can identify it.  The name should be the device’s hostname and should also be physically labelled on the device.  By reading the name, we should know:

  • What type of device we are looking at, whether it is a router, a switch, a server, a laptop, a desktop, a wireless access point, or a camera

  • The physical location of the device, if we have multiple locations

  • Some other information as deemed necessary to give the device a unique name or to further identify it.

For example, an organization with offices in Edmonton and Calgary names its routers:

  • rtr-edm-001

  • rtr-cal-001

“rtr” tells us that the device is a router, “edm” tells us that the router is in Edmonton, and “cal” tells us that the router is in Calgary.  If we add a second router in Edmonton, we might give it the name rtr-edm-002.

And names its switches:

  • sw-core-edm-001

  • sw-access-edm-001

  • sw-core-cal-001

  • sw-access-cal-001

  • sw-access-cal-002

“sw” tells us that the device is a switch, “core” tells us that it is a core switch, “access” tells us that it is an access switch, and again “edm” and “cal” tell us where it is.  We might use the device’s serial number in place of the “001” or “002”; either approach is acceptable.
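
A standard naming convention also pays off in automation.  Here is a minimal sketch (using the hypothetical Edmonton/Calgary names above) that builds and parses these hostnames:

    DEVICE_TYPES = {"rtr": "router", "sw": "switch", "ap": "access point"}
    SITES = {"edm": "Edmonton", "cal": "Calgary"}

    def make_name(device_type, site, number, role=None):
        """Build a hostname such as rtr-edm-001 or sw-core-edm-001."""
        middle = f"{role}-{site}" if role else site
        return f"{device_type}-{middle}-{number:03d}"

    def parse_name(hostname):
        """Split a hostname back into its components."""
        parts = hostname.split("-")
        if len(parts) == 4:                  # switches carry an extra role field
            device_type, role, site, number = parts
        else:
            device_type, site, number = parts
            role = None
        return DEVICE_TYPES[device_type], role, SITES[site], int(number)

    print(make_name("rtr", "edm", 2))        # rtr-edm-002
    print(parse_name("sw-core-cal-001"))     # ('switch', 'core', 'Calgary', 1)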

We should also have a clearly defined Internet Protocol (IP) address scheme.  Before we create the scheme, we should think about:

  • The quantity and type of each device that requires an IP address (a router, a switch, a server, a laptop, a desktop, a wireless access point, or a camera)

  • How many devices we will add in the future

  • The physical location of each device

  • The number of VLANs (logical network segments) required

  • Whether DHCP is required

  • What IP address range we are working with

If we have only one physical office, then we should divide the network into as many VLANs as are required.  We select an IP address range for the entire network and then subdivide it based on the number of VLANs that are present.  We should make sure that each VLAN has enough IP addresses to accommodate all current and future devices.

If we have many physical offices on a WAN, then we should divide the network into pieces, one for each office.  Within each office, we should divide the network into as many VLANs as are required.  We select an IP address range for the entire network, subdivide it based on the number of offices, and then subdivide it again based on the number of VLANs in each office.  We should make sure that each VLAN has enough IP addresses to accommodate all current and future devices.

We should keep the VLANs consistent across multiple offices.

Consider the following example

  • I have two offices – one in Edmonton and one in Calgary

  • I decide to create four VLANs in each office

    • VLAN 8 for security cameras

    • VLAN 10 for management of network devices

    • VLAN 15 for telephones

    • VLAN 22 for computers

  • I have chosen a Class A Private Network with an IP address range of 10.0.0.0 to 10.255.255.255.  I won’t get into the details of subnetting because it is beyond the scope of this book. 

    • Edmonton’s network is given the range 10.0.0.0 to 10.0.255.255

    • Calgary’s network is given the range of 10.1.0.0 to 10.1.255.255

    • That gives us a range of over 65,000 IP addresses for each office, which is plenty for future growth

    • It also gives us the ability to create 254 more networks such as 10.2.0.0 to 10.2.255.255, each of which has over 65,000 IP addresses in it.  That means we can add 254 more offices without worrying about running out of IP addresses.

    • Within Edmonton’s network, I decide that

      • VLAN 8 will use the range of 10.0.8.0 to 10.0.8.255

        • The first security camera will have an IP address 10.0.8.1

        • The second security camera will have an IP address of 10.0.8.2, and so on

        • These addresses will be assigned statically

        • The gateway IP address will be 10.0.8.254

      • VLAN 10 will use the range of 10.0.10.0 to 10.0.10.255

        • The first router will have an IP address 10.0.10.1. 

        • We will put the routers in the range of 10.0.10.1 to 10.0.10.10.  An office won’t have more than one or two routers.

        • The first switch will have an IP address of 10.0.10.11

        • The second switch will have an IP address of 10.0.10.12

        • The switch IP address range can be 10.0.10.11 to 10.0.10.50

        • The first access point will have an IP address of 10.0.10.51

        • The access point IP address range will be 10.0.10.51 to 10.0.10.100

        • The first server will have an IP address of 10.0.10.101

        • The server IP address range will be 10.0.10.101 to 10.0.10.150

        • These addresses will be assigned statically

        • The gateway IP address will be 10.0.10.254

      • VLAN 15 will use the range of 10.0.15.0 to 10.0.15.255

        • These are telephones and will receive their addresses over DHCP

        • The DHCP server will be 10.0.15.254 (same as the gateway)

        • The gateway IP address will be 10.0.15.254

        • We have enough addresses for 254 devices

      • VLAN 22 will use the range of 10.0.22.0 to 10.0.22.255

        • These are computers and will receive their addresses over DHCP

        • The DHCP server will be 10.0.22.254 (same as the gateway)

        • The gateway IP address will be 10.0.22.254

        • We have enough addresses for 254 devices

    • Within Calgary’s network, I decide that

      • VLAN 8 will use the range of 10.1.8.0 to 10.1.8.255

      • VLAN 10 will use the range of 10.1.10.0 to 10.1.10.255

      • VLAN 15 will use the range of 10.1.15.0 to 10.1.15.255

      • VLAN 22 will use the range of 10.1.22.0 to 10.1.22.255

      • I would create the same IP address scheme for the Calgary office as I did for the Edmonton office.  As they say in some math textbooks, the Calgary office is left as an exercise for the reader.
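
This scheme maps naturally onto code.  Here is a minimal sketch using Python’s standard ipaddress module (the office and VLAN numbers are the ones from the example above; the rest is an assumption):

    import ipaddress

    OFFICES = {"edm": 0, "cal": 1}      # second octet assigned to each office
    VLANS = {8: "cameras", 10: "management", 15: "telephones", 22: "computers"}

    def vlan_subnet(office, vlan):
        """Return the /24 used by a VLAN at an office, e.g. 10.0.8.0/24."""
        return ipaddress.ip_network(f"10.{OFFICES[office]}.{vlan}.0/24")

    for vlan, purpose in VLANS.items():
        net = vlan_subnet("cal", vlan)
        gateway = net[254]              # by convention here, the gateway is .254
        print(f"VLAN {vlan:2} ({purpose}): {net}, gateway {gateway}")

Generating the plan from code keeps the Edmonton and Calgary schemes consistent by construction.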

Data Protection

How do we protect our data?

  • DLP or Data Loss Prevention (also known as Data Leak Prevention) is a technique used to prevent data from leaving an organization.  Data leaks can be accidental or deliberate.  Data leaves an organization in three ways:

    • Electronically.

      • A user can attach sensitive data to an e-mail.  For example, a user can accidentally e-mail sensitive customer data to the wrong person.

      • A user can upload sensitive data to a file sharing website or blog.

    • Physically

      • A user can take physical copies of data (such as documents, blueprints, charts, etc.) from the organization.

      • A user can copy data onto a USB drive and take it out of the organization.

      • A user can photograph sensitive data with a cellular telephone.

    • Intellectually. 

      • Most of the data leaves the organization through the brains of the employees.  Data can include client lists, trade secrets, and other intellectual property.

A Data Leak Prevention appliance is a physical network device that scans outgoing network transmissions and prevents data leaks.

  • The appliance is designed to recognize patterns within the data, such as credit card numbers (which have 16 digits) or phone numbers (which have 10 digits).  A minimal detection sketch follows this list.

    • The appliance may have advanced heuristics to analyse the context of each data transmission, including the contents, the sender, and the recipient, to determine if the data can be sent.

    • The appliance may block the transmission, allow the transmission, or trigger a manual review.

    • When the data being transmitted is encrypted between the end user’s computer and an external network, the DLP appliance will not be able to read it.  An organization will typically have full, unencrypted access to the e-mail accounts of its own users, but will not be able to filter encrypted traffic to external services (such as Gmail or file sharing websites).  These types of websites should be blocked.

    • When combined with a SIEM, we can use artificial intelligence to detect users who are accessing too much data or data that is not required for their role.
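
As a minimal sketch of that pattern matching (a real appliance is far more sophisticated), the following looks for 16-digit sequences and keeps only those that pass the Luhn checksum, which all real credit card numbers satisfy:

    import re

    def luhn_valid(digits):
        """Return True if the digit string passes the Luhn checksum."""
        total = 0
        for i, ch in enumerate(reversed(digits)):
            d = int(ch)
            if i % 2 == 1:          # double every second digit from the right
                d *= 2
                if d > 9:
                    d -= 9
            total += d
        return total % 10 == 0

    def find_card_numbers(text):
        """Find 16-digit runs (optionally spaced or dashed) that pass Luhn."""
        candidates = re.findall(r"\b(?:\d[ -]?){15}\d\b", text)
        return [c for c in candidates if luhn_valid(re.sub(r"[ -]", "", c))]

    print(find_card_numbers("order 4111 1111 1111 1111 shipped"))   # a test number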

The use of USB keys should be prohibited.

  • A USB drive allows a user to copy data from a computer and take it out of the organization.

  • USB drives can contain viruses, including firmware viruses that cannot be detected by antivirus programs.

  • At a minimum, an organization should force all users to encrypt their USB drives before being permitted to copy any data onto them.

A DLP appliance will only detect patterns in data leaving the network.  An organization can take the following additional preventative measures:

  • A document control system, which logs each time a user views, edits, or prints a document.  This will not prevent a user from taking a sensitive document but can aid in detecting a leak after the fact.  It will also deter a user from copying many documents if he knows he is being monitored.  An organization can monitor and detect users who are opening or printing large quantities of documents or viewing documents that do not relate to their job duties.

  • Digital Rights Management (DRM), which can permit users access to only the documents that they require to perform their job.  DRM can also prevent users from editing or printing documents.

  • Prohibiting users from bringing cell phones to work.  Cell phones can be used to copy sensitive data.

  • Searching users before they leave work.

  • Masking.  Data Masking or Data Obfuscation is when we modify sensitive data to hide its true contents.  It allows a person to work with the data without seeing too much of it.  That reduces the risk that too much data will be stored in the person’s brain.

    For example, say somebody working in the tax office is processing your return.  When the return is received, it is scanned and stored in a database.  The tax examiner needs to see all the dollar amounts that you reported, but they don’t necessarily need to know your name, address, or social insurance number (social security number).  Thus, the unnecessary data is obfuscated.

    There are many other schemes for hiding data

    • Substitution.  With substitution we are trying to disguise the data but keep it looking realistic.  If we have a database that contains customer personal information, we might change the people’s names to fake names that look real.  We might create a list of fake names and use a script to replace the real names with fake ones.

    • Shuffling.  Shuffling data means rearranging the data within a column.  If we have a list of customers and how much money each one spent, we can shuffle the customer names.  Now each customer has the wrong dollar value associated with their name.

    • Variance.  We can modify numerical data so that it is similar to the original value but not exact. 

      For example, if we have a database of people and their dates of birth, we can add a random number of days to each date of birth.  Now we still know each person’s approximate age, but we don’t know their true date of birth.

      In another example, we have a database of people and their salaries.  Instead of reporting their exact salaries, we can change each one to a range.  For example, if a person makes $86,434 per year, we can report that he makes between $80,000 and $90,000 per year.
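
    A minimal sketch of the shuffling and variance schemes just described (the sample records are invented):

      import random

      records = [("Alice", 86434), ("Bob", 52000), ("Carol", 97100)]

      # Shuffling: rearrange one column so names no longer match their salaries.
      names = [name for name, _ in records]
      random.shuffle(names)
      print(list(zip(names, (salary for _, salary in records))))

      # Variance: replace each exact salary with a $10,000 range.
      def to_range(salary, width=10_000):
          low = (salary // width) * width
          return f"${low:,} to ${low + width:,}"

      print([(name, to_range(salary)) for name, salary in records])
      # ('Alice', '$80,000 to $90,000'), ...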

  • Encryption.  There are different ways and scenarios to encrypt data.  At any time, data is either at rest (being stored), in transit, or in processing.  Data should always be encrypted.

  • At Rest.  At rest, data should be encrypted.  We encrypt data at rest because there is a risk that the storage medium can be stolen.  When stolen, the data can be read.

    We first select an encryption algorithm and create an encryption key.  When the data is written to the storage medium, the algorithm is used to encrypt the data.  We may break up our storage medium into different partitions, and we may use a different key to encrypt the data in each partition.

    The keys are stored in a secure location.  Access to a key is controlled and logged.  In other words, when you attempt to access the data, the system checks whether you have the right to access the data.  If you do, then the system obtains the corresponding key and decrypts the data.  You – the user – will probably never see the actual key.

    Encryption at rest is available for all major cloud storage services.  (A minimal code sketch of encryption at rest follows the In Processing bullet below.)

  • In Transit.  In transit, data should be encrypted.  When transporting data, the sender and receiver should agree on an encryption method and generate a key.  The sender encrypts the data and the receiver decrypts it.

    As soon as the receiver decrypts the data, it re-encrypts it with a new algorithm and key appropriate for storage.

  • In Processing.  In processing (also known as in use), data should be encrypted.  In practice, this is more difficult to implement than encryption at rest or in transit.  It attempts to encrypt data that is stored in RAM or in a CPU cache.

    It is important to secure data in use because it could contain encryption keys and personal information.  If the RAM is quickly frozen and removed from the computer, its contents can be read (a cold boot attack).  This would allow a hacker to extract encryption keys and other types of information. 

    The technology is still developing, but encrypted RAM does exist.  Intel Total Memory Encryption (Intel TME) is a feature available on some processors that encrypts data in memory.
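
    As promised above, here is a minimal sketch of encryption at rest (assuming the third-party cryptography package; key handling is deliberately simplified, since a real system keeps keys in a secure key store as described above):

      from cryptography.fernet import Fernet   # pip install cryptography

      key = Fernet.generate_key()       # in production, fetch this from a key vault
      cipher = Fernet(key)

      # Encrypt before writing to the storage medium (or before transmission)...
      ciphertext = cipher.encrypt(b"customer record: Alice, card ending 1111")

      # ...and decrypt only after the access check succeeds.
      print(cipher.decrypt(ciphertext).decode())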

  • Tokenization.  Data tokenization is the process of replacing a sensitive piece of data with a non-sensitive piece of data that uniquely identifies it.  The token must match the original data’s type and length.  If we replace a piece of data with another of the wrong length or type, then we might cause an error in the database that stores it.

    One common use of a token is to process a credit card transaction.

    • We want to use our credit card at a merchant through a payment app, but we don’t want to give them the credit card number

    • The credit card processor gives us a new credit card number.  This is our token.  The token has the same length as a real credit card number (16 digits), and it is linked to our real credit card number.

    • We go to the store and pay with a credit card.  We give the merchant the new credit card number.

    • The merchant processes the transaction with the new credit card number (the token).  When the credit card processor’s server sees this number, it understands that it is a token linked to our real credit card.

    • We can destroy the token after the transaction.
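
    A minimal sketch of this flow (the vault is a plain dictionary here; a real tokenization service keeps the mapping in hardened infrastructure):

      import secrets

      vault = {}   # token -> real card number, held only by the processor

      def tokenize(card_number):
          """Issue a 16-digit token that stands in for the real number."""
          token = "".join(secrets.choice("0123456789") for _ in range(16))
          vault[token] = card_number
          return token

      def detokenize(token):
          """Only the processor can map the token back to the real number."""
          return vault.pop(token)        # pop: the token is destroyed after use

      token = tokenize("4111111111111111")
      print("merchant sees:", token)
      print("processor charges:", detokenize(token))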

  • Rights Management.  Rights management is a concept that identifies:

    • Who created the data?  This is known as the data owner.  The data owner doesn’t always have the right to delete or modify the data after they have created it.

    • Who has custody of the data?  The data custodian enforces the encryption, transport, and storage of the data.

    • Who can access the data?  We may enforce access to the data through an access control list or group policy.

There are three main ways to develop or implement encryption that are guaranteed to fail:

  • Developing a cryptographic algorithm or writing an implementation of an algorithm yourself.  It takes years to develop, test, and come to trust a cryptographic algorithm, and even then, many cryptographic algorithms are eventually exploited.  Almost no organization has the technical capability to develop its own cryptographic algorithm securely.

    Cryptographic algorithms use complicated math.  They are supposed to rely on random numbers, but computers are not capable of generating truly random numbers.  Instead, implementations contain pseudorandom number generators (code that attempts to generate numbers as randomly as possible).

    Random number generation is a frequent point of exploitation in an implementation.  These exploits are not usually detectable until the algorithm has been in use for many years and patterns in the encrypted data emerge.  At that point, the algorithm is known to be flawed and all the data encrypted by it becomes exploitable.  (A short sketch contrasting weak and strong random number generation follows this list.)

    Keep in mind that hackers (and government agencies) intercept and store encrypted data, waiting for a time when the algorithm becomes exploitable. 

  • Using a proprietary algorithm.  A proprietary algorithm is one whose inner workings are kept secret.  A proprietary algorithm must not be used because it is never possible to verify whether it is functioning properly.  In addition, the manufacturer of a proprietary algorithm may have inserted an undetectable backdoor.

  • Using a weak algorithm.  A weak algorithm is one that was previously accepted but is no longer considered secure. 

    As computing power increases, it becomes possible to crack algorithms with keys of longer and longer lengths.  Algorithms that were once considered uncrackable are now easily exploitable.

    Eventually (due to advances in computing power), every form of encryption used today will be cracked, and the data that was encrypted will be exploited.  In theory, many of the forms of encryption in use today won’t be cracked for at least 100 years, at which point, the data protected by them will be considered worthless.

    A strong algorithm can be implemented weakly.  For example, the algorithm could be incorporated into a software program that does not randomly generate keys or that uses the same keys over and over.  Each person who uses the software program encrypts their data with the same key.
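
To illustrate the random number problem: Python’s random module is a predictable pseudorandom generator and must never be used for keys, while the secrets module draws from the operating system’s cryptographically secure source.  A minimal sketch:

    import random
    import secrets

    # WRONG: the Mersenne Twister is fully predictable once its state is known,
    # and a guessable seed makes the "key" reproducible by anyone.
    random.seed(1234)
    weak_key = random.getrandbits(256).to_bytes(32, "big")

    # BETTER: secrets uses the OS CSPRNG (e.g. /dev/urandom).
    strong_key = secrets.token_bytes(32)

    print(weak_key.hex())
    print(strong_key.hex())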

How do we prevent these failures?

  • One should assume that all encrypted data (and unencrypted data) in transit is being intercepted and stored forever.

  • Select an algorithm that is well known to be secure and open-source.

  • Ensure that the implementation of the algorithm is also secure and open-source.

  • The longer the key, the harder it is to crack the data encrypted by the algorithm.  At the same time, the longer the key, the longer it takes to encrypt the data.  It is important to select a key length that balances those two concerns.  Think about how much time it will take a hacker to decrypt the data (1 year, 10 years, 100 years?) and whether the data will still have value at that time.

Data Sovereignty and Geographical Considerations

Where should data be stored?  Consider an organization that backs up its data to removable tapes or hard disk drive cartridges.  Should they store the tapes in their office?  Of course not.  If the office burns down, the servers will be destroyed, and the tapes will also be destroyed. 

The tapes should be stored “off site”. There are several options

  • Storage service such as Iron Mountain

  • Another office location (where the organization has multiple offices)

  • A bank safe deposit box

How far away should the data be stored?  If there is a data loss, the organization must retrieve the backup, bring it back to the office, connect it to its equipment, and restore the data.  The further away the data is stored, the longer it takes to bring it back to the office, and the longer the organization is without its data.

If the data is stored too close, there is a risk that a natural disaster will destroy (or make inaccessible) both the office and the data.  Therefore, the organization may choose to store their data in another state.  For example, an organization with an office in Miami, Florida may send their data to an office in New York, New York.  If they kept their data in Miami, both the office and the data could be affected by a hurricane.

If the data can be backed up over a fast, dedicated internet connection, then the time it would take to retrieve a physical backup is no longer relevant.  An electronic backup may be more advantageous than a physical one.  Examples of electronic backup services include Amazon Glacier and Carbonite.

If the data is sent to a storage service, then the organization must consider

  • How much will it cost to store the data?

  • How long will it take to retrieve the data?  One hour?  One day?  One week?

  • Can the storage service be trusted with the data, or does the data need to be encrypted?  Data should always be encrypted before being sent to a third party, so this question should not even need to be asked.

Consider that an organization in Florida sends its data to New York or California.  Florida does not have strong privacy laws, but California and New York do.  The organization must be aware that its data will be subject to the laws of the jurisdiction where it stores its data:

  • Data must be stored in accordance with the privacy laws of the state that it is stored in

  • Data could be subject to disclosure (for example, a state court in California could demand production of the data in response to a civil or criminal subpoena, which would have no effect if the data remained in Florida)

  • The organization may be required to seek consent from the users whose data is moved to another country

  • We may need to store the data in the same state or country where it was collected, or seek explicit permission from its owner to store it elsewhere

It is difficult enough to move data from one state or province to another, but it is even more difficult (or even legally impossible) to move data from one country to another.

Data sovereignty is the idea that data should be subject to the laws of the country in which it is stored, and that people should have the right to determine where their data is stored.

Microsoft Corp. v. United States (In the Matter of a Warrant to Search a Certain E‐Mail Account Controlled and Maintained by Microsoft Corporation), heard by the United States Court of Appeals for the Second Circuit, was an important case regarding data sovereignty:

  • When a user signs up for a Microsoft service, such as Outlook e-mail, Microsoft creates an account for that user

  • Microsoft stores the user’s basic data (username, password, billing information, etc.) on a server in the United States

  • However, Microsoft stores the bulk of the user’s data (e-mails, photographs, etc.) on a server geographically closest to the user (this server could be in Canada, the United States, Ireland, etc.).  By storing the data in a geographic location closest to the user, network latency is reduced.

  • In this case, a user who was suspected of drug trafficking signed up for an e-mail account, and Microsoft’s servers automatically chose to store the data in Ireland.

  • A United States Magistrate Judge in the Southern District of New York issued a search warrant ordering Microsoft to hand over the e-mails for this user. 

    • Recall from earlier that e-mails must not be disclosed in response to a subpoena (only a search warrant under the Stored Communications Act or SCA).

    • In practice, search warrants issued under the SCA are not executed by force.  An investigating agency serves the warrant on the service provider via hand delivery or fax.  The service provider then electronically discloses the e-mails requested through the warrant.

    • So why does the SCA require a search warrant?  It is because e-mails are considered highly private and not subject to disclosure through a subpoena.  A government agency can obtain a subpoena easily (not much proof is required), but a search warrant requires a higher legal standard (more proof is required). 

      When Congress wrote the law, they didn’t want to define a new legal standard for obtaining e-mails, so they used the search warrant standard, which is well defined in case law.

  • Microsoft refused to comply with the warrant

    • Microsoft said that the e-mails were stored in Ireland and therefore, a search warrant could not be used to obtain the e-mails (since the government could not physically enforce a search warrant outside of the United States)

    • The court disagreed, considering that the SCA warrant was “subpoena like” in practice, and that Microsoft technically had control over the e-mails stored in Ireland

  • Microsoft appealed to the Circuit Court of Appeals

    • The court agreed with Microsoft that e-mails stored outside the United States cannot be disclosed due to a warrant under the SCA

    • The court said that laws, in general, apply only inside the territory of the United States and that the focus of the SCA was to protect the privacy of users, which is why it required the use of a search warrant to obtain e-mails

    • The government of Ireland stated that it could provide the e-mails to the United States Department of Justice through a request under MLAT (Mutual Legal Assistance Treaty) and that the e-mails should only be disclosed to the government of Ireland

  • The case was appealed to the Supreme Court

    • During the appeal, the Clarifying Lawful Overseas Use of Data Act or CLOUD Act was passed

    • The CLOUD Act amended the SCA to require the production of e-mails stored overseas but under the control of US-based companies

    • The CLOUD Act allows the executive branch to enter into data sharing agreements with other countries

    • Since the CLOUD Act rendered the appeal moot, the case was dismissed by the Supreme Court

    • Microsoft had to disclose the e-mails under the new CLOUD Act

Secure Sockets Layer (SSL) / Transport Layer Security (TLS) Inspection

SSL inspection intercepts encrypted communications on our network, decrypts them, and inspects the contents.  It is also known as Deep SSL Inspection or Full SSL Inspection.  It uses a device that can decrypt the SSL and TLS communications, which could be a firewall, router, or separate device.

Remember that encryption on the internet happens end to end.  When you visit a website, your computer and the server hosting the website agree on an encryption key.  Nobody in between can break the encryption: not anybody on your network, not anybody in your office, not anybody at the ISP.

If a hacker wants to infiltrate our network without us knowing, he might send some malware, but do so through an encrypted communication with one of our users.  Since the malware is encrypted, a firewall won’t be able to understand the contents.  Because encryption is now so widely encouraged, over 80% of web traffic is encrypted.  So, we need a way to see through it.  And we can.

So, if this encryption is so great, how can we crack it and read the encrypted contents so easily?  What happens is that the firewall acts as a man in the middle.  When you visit a website:

  • Your computer agrees to an encryption scheme with the firewall, thinking that the firewall is the website’s server

  • Your firewall agrees to an encryption scheme with the website, tricking it into thinking that it is actually your computer

  • Your firewall passes data between itself and the website and between itself and your computer

  • Data between the firewall and the website is encrypted

  • Data between your computer and the firewall is encrypted

  • The firewall can read all the data passing through it while it decrypts and encrypts it

For this scheme to function, your firewall must be configured with an SSL certificate.  Then your administrator must configure your computer to trust the certificate (this can be done automatically).  When you visit a secure website, your firewall substitutes the website’s certificate for its own.

If you brought your computer from home and connected it to your corporate network, the firewall’s decryption scheme would fail.  Your home computer doesn’t trust the firewall’s certificate.  If the firewall substituted a website’s certificate for its own, you would see an error message.
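
One way to notice that a middlebox is substituting certificates is to compare the certificate a server presents against a fingerprint (pin) recorded in advance.  A minimal sketch using Python’s standard library (the pinned value is a placeholder you would fill in ahead of time):

    import hashlib
    import ssl

    PINNED_SHA256 = "0000...replace-with-the-known-fingerprint"

    def cert_fingerprint(host, port=443):
        """Fetch the server certificate and return its SHA-256 fingerprint."""
        pem = ssl.get_server_certificate((host, port))
        der = ssl.PEM_cert_to_DER_cert(pem)
        return hashlib.sha256(der).hexdigest()

    if cert_fingerprint("example.com") != PINNED_SHA256:
        print("certificate does not match the pin: possible SSL inspection")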

We should be careful about the quantity and type of data that we are decrypting because there is a risk that an administrator could see sensitive user data.  For example, we may choose not to decrypt traffic between users and trusted banking or healthcare websites.

Decrypting data also increases the workload of the firewall.  We must be sure that we have enough capacity to decrypt all the data that passes through our network.

Hashing

I mentioned hashing in earlier parts of the book.  A hash is a one-way mathematical function.  We use hash functions when storing passwords and other types of sensitive data. 

Since it is difficult or impossible to reverse a hash, if the hashed data is compromised, a hacker won’t be able to use it to recover the original data.
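
A minimal sketch of password hashing with Python’s standard library (scrypt is one of several acceptable choices; the salt size and cost parameters here are illustrative):

    import hashlib
    import hmac
    import secrets

    def hash_password(password):
        """Return (salt, digest); store both, never the password itself."""
        salt = secrets.token_bytes(16)
        digest = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
        return salt, digest

    def verify_password(password, salt, digest):
        candidate = hashlib.scrypt(password.encode(), salt=salt, n=2**14, r=8, p=1)
        return hmac.compare_digest(candidate, digest)   # constant-time compare

    salt, digest = hash_password("correct horse battery staple")
    print(verify_password("correct horse battery staple", salt, digest))   # True
    print(verify_password("wrong guess", salt, digest))                    # False

The unique salt for each password ensures that two users with the same password produce different hashes.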

API Considerations

I mentioned APIs (Application Programming Interfaces) earlier as well.  We should take care to protect our APIs:

  • Only allow access to APIs via encrypted methods

  • Strictly enforce permissions regarding the types of data that each user can read, write, or modify

  • Use one-time tokens to prevent hackers from replaying the contents of an API communication
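
A minimal sketch of the one-time token idea (the shared secret and message format are assumptions): the client signs each request together with a unique nonce, and the server rejects any nonce it has already seen.

    import hashlib
    import hmac
    import secrets

    SHARED_SECRET = b"provisioned-out-of-band"    # assumption for this sketch
    seen_nonces = set()                           # server-side replay cache

    def sign_request(body):
        nonce = secrets.token_hex(16)             # one-time value per request
        sig = hmac.new(SHARED_SECRET, nonce.encode() + body, hashlib.sha256).hexdigest()
        return nonce, sig

    def verify_request(body, nonce, sig):
        if nonce in seen_nonces:                  # replayed request: reject
            return False
        expected = hmac.new(SHARED_SECRET, nonce.encode() + body, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, sig):
            return False
        seen_nonces.add(nonce)
        return True

    nonce, sig = sign_request(b'{"action": "read"}')
    print(verify_request(b'{"action": "read"}', nonce, sig))   # True
    print(verify_request(b'{"action": "read"}', nonce, sig))   # False (replay)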

Site Resiliency

A Recovery Site is a location that a company can use to resume operations when their main site is harmed.  The recovery site might be an office, a factory, or a data center.  It contains all the technology and equipment that the company requires to resume operations should their existing facilities be damaged or inaccessible.

An organization must weigh the cost and benefit of the type of recovery site they will operate.  A hot site allows an organization to resume operations immediately (without a cost to its business) but is more expensive.  A cold site forces an organization to wait to resume operations (at a substantial cost) but is much cheaper.

There are three types of recovery sites

  • A hot site is a site that is continually running.  With the use of a hot site, an organization has multiple locations that are operating and staffed.  For example, an insurance company may have a call center in New Jersey, a call center in Florida, and a call center in California.  The insurance company staffs all three centers 24/7.  If the California call center is affected by an earthquake, the insurance company diverts calls to New Jersey and Florida, and operations are not disrupted.

    In the case of a data center, the organization will maintain data centers in multiple geographic locations.  These data centers are connected to each other over WAN links.  Data is replicated across multiple data centers, so that damage to one data center does not compromise the data.  For example, an insurance company stores customer data in data centers in California, Utah, and Virginia.  The Virginia data center is hit by a tornado, but all the data has been replicated to the other two centers.  The organization and its customers can continue accessing their data.

    A hot site is expensive to maintain.  In the example of the insurance company, they can staff the three sites cost-effectively.  A smaller organization (such as a restaurant or warehouse) that operates out of a single location may not find it cost-effective to operate a second site.

  • A cold site is a location that contains no staff or equipment.  An organization hit with a disaster must send employees to the cold site, bring in supplies, and configure equipment.  The cold site does not contain any data; the organization must restore its data from backup.

    A cold site is cheaper to operate than a hot site.  In the event of a disaster, the cold site can be used to operate the business.  The cold site may be an empty office, an abandoned warehouse, or a trailer.

    Companies such as Regus provide immediate short-term office space in the event of a disaster.

  • A warm site is a compromise between a cold site and a hot site.  A warm site may contain some hardware and preconfigured equipment.  The organization may need to bring in staff and/or specialized equipment for the warm site to become operational.  The warm site may contain copies of data, but they will not be current.

Deception and Disruption

In addition to all the defensive techniques I mentioned, we can (and should) take an offensive approach against bad actors.

A Honeypot is a network device that appears to be vulnerable but is in fact designed to detect hackers.  A network security administrator creates a honeypot to identify hackers and/or to distract them from legitimate network resources.  A honeypot allows an organization to understand the motives behind the attacks (which can be used to better protect network and other resources), and the type and sophistication of the hackers.

There are several types of honeypots

  • Pure honeypot – a production system with a monitoring device on the network interface. The pure honeypot pretends that it is a legitimate machine.  The pure honeypot may be detected by some hackers.

  • High interaction honeypot – runs on a physical or virtual machine and imitates many different production systems.  The high interaction honeypot consumes a substantial amount of resources due to its sophistication.  When run on a virtual machine, the honeypot can be quickly regenerated.

  • Low interaction honeypot – simulates only necessary services, allowing more honeypots to operate with fewer resources.  Low interaction honeypots may be detected by some hackers.

  • SPAM honeypot – spammers locate servers that use open relays (an open relay is an e-mail server that allows an unauthenticated user to send e-mail) and use them to send e-mails.  The spammer will attempt to send test messages through the SPAM honeypot; if successful, he will continue to send e-mail through it.  The SPAM honeypot tricks the spammer into thinking that his e-mails were successfully delivered, while detecting the messages and identifying the spammer.

A honeyfile is a fake file that we place on a shared drive.  A legitimate user will not access the honeyfile because it serves no legitimate purpose, but a hacker who is stealing data will.  Once the honeyfile is accessed, an alarm is triggered, and we can determine who accessed the file.
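
A minimal sketch of a honeyfile monitor (it polls the file’s last-access time, which assumes the filesystem records access times; production tools hook into the operating system’s audit log instead):

    import os
    import time

    HONEYFILE = "passwords-backup.xlsx"     # enticing name, no legitimate use

    def watch(path, interval=5):
        open(path, "a").close()             # create the decoy if it is missing
        baseline = os.stat(path).st_atime
        while True:
            time.sleep(interval)
            atime = os.stat(path).st_atime
            if atime != baseline:
                print(f"ALERT: honeyfile {path} was accessed")
                baseline = atime

    # watch(HONEYFILE)   # runs until interrupted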

A honeynet takes the honeypot one step further.  A honeynet is an entire fake network, complete with fake servers, user devices, and file shares.  Each of the fake devices may itself be a honeypot, and the file shares may contain honeyfiles.  A full high interaction honeynet may take a long time to create but can capture many hackers.

Earlier I mentioned how newer malware detection programs use machine learning to detect the malware.  If hackers could control the data that the program used to learn, then they could manipulate it into allowing back doors and malicious software. 

Well, if we set up a honeypot with a machine learning algorithm, the hackers would be tempted to feed it fake data to trick it.  We could capture the fake data, known as fake telemetry, and feed it to our real machine learning algorithm.  This would make the real algorithm even smarter because it would be able to ignore fake data.

Finally, we can set up a DNS sinkhole.  Remember that DNS converts a domain name into an IP address.  Some types of malware force a user’s computer to visit specific websites either to show advertisements or to upload data; this happens in the background without the user’s consent or awareness.

In order to visit the site, the user’s computer must first contact a DNS server to obtain the correct IP address.  A DNS sinkhole is a DNS server that deliberately returns false data.  If we create a DNS sinkhole, populate it with domain names that only malware-infected computers will request, and return the IP address of a monitoring device on our network, we can capture the traffic from infected computers.
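
A minimal sketch of the sinkhole idea (the domain names and monitoring address are invented; a real sinkhole is implemented in the DNS server itself, for example with a response policy zone): known-malicious names resolve to an internal monitoring host, and everything else resolves normally.

    import socket

    MONITOR_IP = "10.0.10.200"                          # internal capture host
    SINKHOLED = {"malware-c2.example", "ads-beacon.example"}

    def resolve(name):
        """Return the sinkhole address for bad names, real DNS otherwise."""
        if name in SINKHOLED:
            print(f"sinkholed lookup (likely infected host): {name}")
            return MONITOR_IP
        return socket.gethostbyname(name)

    print(resolve("malware-c2.example"))    # 10.0.10.200, and an alert is logged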