4.3 Given a scenario, implement workstation backup and recovery methods

  • Backup and Recovery
    • Full
    • Incremental
    • Differential
    • Synthetic
  • Backup Testing
    • Frequency
  • Backup Rotation Schemes
    • On Site vs Off Site
    • Grandfather-Father-Son (GFS)
    • 3-2-1 Backup Rule

It is important to back up data regularly.  A large organization may have an individual or group dedicated to maintaining backups.  Good practice includes the following

  • Back up all data regularly (incremental and full back ups)

  • Verify that the data has been backed up

  • Retain a copy of the backed-up data on site and retain a copy off site (in case of a natural disaster)

When planning a backup strategy, consider whether it allows the organization to resume normal operations, and how quickly.  The speed of recovery should be weighed against the cost of the backup strategy.

This section covers four main types of backups (Full, Differential, Incremental, and Synthetic) as well as Snapshots.  The type of backup affects the way that data is backed up and the way that data is restored.

A Full backup is a backup of the entire set of data.  The first time a backup is performed, it must be a full backup.  An organization may perform a full backup once per week or once per month, or at some other interval.  A Bare Metal backup is a full backup of a logical drive, including the operating system.  A Bare Metal backup can be used to restore the system’s operating system and applications, whereas a normal full backup may contain only user-generated data.

A Differential Backup is a backup of the data that has changed since the last full backup.  The organization must be careful to ensure that it is able to accurately keep track of data that has changed.

An Incremental Backup is a backup of the data that has changed since the most recent backup of any kind, whether that was a Full Backup or an Incremental Backup.  Why use Incremental or Differential backups?  Which is better?  How do they work?

For example, an organization performs Full and Differential backups.  They perform

  • A full back up on Monday (all the data is backed up)
  • A differential back up on Tuesday (the data that was changed between Monday and Tuesday is backed up)
  • A differential back up on Wednesday (the data that was changed between Monday and Wednesday is backed up)
  • A differential back up on Thursday (the data that was changed between Monday and Thursday is backed up)
  • A differential back up on Friday (the data that was changed between Monday and Friday is backed up)

If the organization performs Full and Incremental backups, then they perform

  • A full back up on Monday (all the data is backed up)
  • An incremental back up on Tuesday (the data that was changed between Monday and Tuesday is backed up)
  • An incremental back up on Wednesday (the data that was changed between Tuesday and Wednesday is backed up)
  • An incremental back up on Thursday (the data that was changed between Wednesday and Thursday is backed up)
  • An incremental back up on Friday (the data that was changed between Thursday and Friday is backed up)
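The two schedules above can be compared with a small sketch.  The file names and the change log are made up for illustration; the only assumption is that a full backup ran on Monday and one backup runs each following day.

```python
from typing import Dict, Set

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri"]

# Hypothetical change log: files modified on each day after Monday's full backup.
CHANGES: Dict[str, Set[str]] = {
    "Tue": {"a.txt"},
    "Wed": {"b.txt"},
    "Thu": {"a.txt", "c.txt"},
    "Fri": {"d.txt"},
}


def differential_contents(day: str) -> Set[str]:
    """Everything changed since the last full backup (Monday)."""
    result: Set[str] = set()
    for d in DAYS[1 : DAYS.index(day) + 1]:
        result |= CHANGES.get(d, set())
    return result


def incremental_contents(day: str) -> Set[str]:
    """Only what changed since the previous day's backup."""
    return set(CHANGES.get(day, set()))


if __name__ == "__main__":
    for day in DAYS[1:]:
        print(day,
              "differential:", sorted(differential_contents(day)),
              "incremental:", sorted(incremental_contents(day)))
```

In this sketch the differential set grows every day until the next full backup, while each incremental set stays as small as that day's changes.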

An incremental backup generates less data than a differential backup, and is faster to perform, but it is faster to restore data from a differential backup.  If the organization uses differential backups and experiences data loss on Thursday (after Thursday’s back up has run)

  • It must restore the data from Monday’s full back up
  • Then it must restore the data from Thursday’s differential back up

If the organization uses incremental backups and experiences data loss on Thursday (after Thursday’s back up has run)

  • It must restore the data from Monday’s full back up
  • Then it must restore the data from Tuesday’s incremental back up
  • Then it must restore the data from Wednesday’s incremental back up
  • Then it must restore the data from Thursday’s incremental back up

Notice that in every process, the full backup must first be restored.  With differential backups, only the most recent differential backup must then be restored.  With incremental backups, all the incremental backups created after the full backup must be restored, in order.  An incremental backup takes less time to create than a differential backup, but the restore takes longer.
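The restore order can be sketched as a function that picks which backups to apply.  This is a simplified illustration, not a real backup tool: the `(day, kind)` tuples and the `restore_chain` name are hypothetical, and the sketch assumes a schedule uses either differentials or incrementals after a full backup, not both.

```python
from typing import List, Tuple

Backup = Tuple[str, str]  # (day, kind); kind is "full", "differential", or "incremental"


def restore_chain(backups: List[Backup]) -> List[Backup]:
    """Return the backups to restore, in order, from a list sorted oldest first."""
    full_positions = [i for i, (_, kind) in enumerate(backups) if kind == "full"]
    if not full_positions:
        raise ValueError("no full backup available; nothing can be restored")

    last_full = full_positions[-1]
    chain = [backups[last_full]]          # the full backup always comes first
    later = backups[last_full + 1:]
    kinds = {kind for _, kind in later}

    if not later:
        return chain
    if kinds == {"differential"}:
        chain.append(later[-1])           # only the most recent differential
    elif kinds == {"incremental"}:
        chain.extend(later)               # every incremental, oldest first
    else:
        raise ValueError("mixed schedules are outside the scope of this sketch")
    return chain


if __name__ == "__main__":
    diff_week = [("Mon", "full"), ("Tue", "differential"),
                 ("Wed", "differential"), ("Thu", "differential")]
    inc_week = [("Mon", "full"), ("Tue", "incremental"),
                ("Wed", "incremental"), ("Thu", "incremental")]
    print(restore_chain(diff_week))   # two restore steps
    print(restore_chain(inc_week))    # four restore steps
```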

If the organization creates a full backup each week and an incremental backup on every other day, then it would restore at most six incremental backups.  If it creates a full backup each month, then it would have to restore up to thirty incremental backups.

Why use a combination of full and incremental backups?  Why not perform a full backup every day?  A full backup may take a long time to run and take up a large amount of space.  If a full backup takes 28 hours to run, then it is impossible to complete one every day.

What if the organization maintains 10,000TB of data, but only changes approximately 100TB per week?  Should the organization generate 70,000TB of backup data every week (seven daily full backups of 10,000TB each)?  If the backup location is in the cloud, then the organization will need to pay for 70,000TB of storage and bandwidth each week.
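The arithmetic behind this comparison can be written out as a rough sketch.  The function name is hypothetical, and the full-plus-incremental figure assumes that each changed terabyte is captured by exactly one incremental backup during the week, which is a simplification.

```python
from typing import Dict


def weekly_backup_volume_tb(total_tb: int, weekly_change_tb: int,
                            days_per_week: int = 7) -> Dict[str, int]:
    """Compare the backup data generated per week by two strategies."""
    return {
        # A full copy of everything, every day.
        "daily_full": total_tb * days_per_week,
        # One full copy per week, plus incrementals covering the week's changes
        # (assumption: each changed TB lands in exactly one incremental).
        "weekly_full_plus_incrementals": total_tb + weekly_change_tb,
    }


if __name__ == "__main__":
    print(weekly_backup_volume_tb(total_tb=10_000, weekly_change_tb=100))
    # {'daily_full': 70000, 'weekly_full_plus_incrementals': 10100}
```

Under these assumptions, daily full backups produce roughly seven times as much data as one weekly full backup with incrementals.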

Somebody figured out that if we took a full backup on Monday, and then a series of incremental backups on Tuesday, Wednesday, Thursday, and Friday, we could run this data through an algorithm to generate what looks like a full backup, as if we ran it on Friday (or any other day).  We call this a synthetic backup.  When we don’t have time to run full backups, we can just run incremental backups forever and use them to generate full backups that we can restore in the event of data loss.
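A minimal sketch of the idea, assuming each backup is modeled as a dictionary from file path to file content and that an incremental records a deleted file as `None`.  Real backup products work on blocks or archive formats and differ in detail; the `synthesize_full` name is hypothetical.

```python
from typing import Dict, List, Optional

Incremental = Dict[str, Optional[str]]  # path -> new content; None marks a deletion


def synthesize_full(full: Dict[str, str],
                    incrementals: List[Incremental]) -> Dict[str, str]:
    """Merge a full backup with later incrementals (oldest first) into a new full."""
    synthetic = dict(full)                 # start from a copy of the last full backup
    for incremental in incrementals:
        for path, content in incremental.items():
            if content is None:
                synthetic.pop(path, None)  # file was deleted after the full backup
            else:
                synthetic[path] = content  # file was added or changed
    return synthetic


if __name__ == "__main__":
    monday_full = {"a.txt": "v1", "b.txt": "v1"}
    tue_to_thu = [{"a.txt": "v2"}, {"c.txt": "v1"}, {"b.txt": None}]
    print(synthesize_full(monday_full, tue_to_thu))
    # {'a.txt': 'v2', 'c.txt': 'v1'}
```

The result matches what a full backup taken on Thursday would have contained, without reading the production data again.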

A snapshot is an image of a virtual machine or a disk.  A snapshot allows an organization to restore a server or application to a previous state in the event of a hardware failure or corruption of the software.  The benefits of a snapshot

  • A server can be restored to an exact state, which could include its operating system, applications, configuration, and data (even while running).

  • It would otherwise take hours or days to restore a server to its original state, especially if the application installers are no longer available, or if the installation process was not documented

  • If a user makes changes to the system that cause damage or undesired operation, the system can be restored to a working state.  We can take a snapshot before making changes to the system.

It may not always be possible to take a snapshot.  A hypervisor can take a snapshot of a live virtualized system while it is running, but it may not be possible to image a physical system without shutting it down (which could affect operations).

How often does an organization need to perform a back up?  The organization must weigh the cost of the back up against the cost of the potential data loss, and the time that it will take to restore the data.

  • If the back up is performed daily, the organization could risk losing a day’s worth of data.

  • If the back up is performed weekly (say on a Monday), and data loss occurs on a Friday, the organization could lose all the data generated between Monday and Friday.

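  • If the back up is performed in real time (i.e., replicated to another site), then the organization loses little or no data when the primary site fails, but replication is expensive.  Replication also copies deletions and corruption to the other site, so it does not replace point-in-time backups.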
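The amount of data at risk is simply the time between the last successful backup and the failure.  The sketch below illustrates that with made-up dates and times; the `data_at_risk` name is hypothetical.

```python
from datetime import datetime, timedelta


def data_at_risk(last_backup: datetime, failure: datetime) -> timedelta:
    """Time span of work that is lost: everything after the last backup."""
    if failure < last_backup:
        raise ValueError("failure occurred before the last backup")
    return failure - last_backup


if __name__ == "__main__":
    failure = datetime(2024, 1, 5, 17, 0)            # a Friday afternoon

    weekly = datetime(2024, 1, 1, 1, 0)              # last weekly backup, Monday 01:00
    daily = datetime(2024, 1, 5, 1, 0)               # last daily backup, Friday 01:00

    print("weekly schedule:", data_at_risk(weekly, failure))   # 4 days, 16:00:00
    print("daily schedule: ", data_at_risk(daily, failure))    # 16:00:00
```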

What are all the methods that we can use to back up our data?

  • Cloud.  There are many services, including Amazon S3 and Amazon Glacier.  Backups can be configured to run automatically.

  • Replication over SAN (Storage Area Network).  When an organization has multiple locations, the SAN can replicate the data to each location.  This is good for massive volumes of data.

  • NAS (Network Attached Storage).  This is good for medium-sized volumes of data (up to roughly 10 TB).

  • Disk Cartridge.  A disk cartridge is like a removable hard drive that you can store.  Disk cartridge backups are good for small volumes of data.

  • Removable Disk (USB Drive).  You can connect a USB drive and back up the data manually.

We don’t need to have the same backup strategy for the entire organization.  Some data may be more valuable than other data.  We can also archive old data that we keep for historical purposes but rarely or never access.  Think about the following

  • How much money will the organization lose if the data is lost?

  • How much time (in hours or days) can the organization wait before having the data restored?  How much money will the organization lose per hour or per day?

  • What is the volume of data to be backed up?

  • Based on this information

    • We know how much we can afford to spend on the data back up

    • We know how much data needs to be backed up in GB or TB or PB

    • We know how quickly we need to restore our data.  The time to restore the data is the time to bring the backup medium to the facility plus the time to complete the restoration process.

      • If the back up is in the cloud, then we can calculate the bandwidth we require

      • If the back up is at a storage vendor, then we can calculate the maximum distance of the storage location

      • We can determine how often to run the back up and whether we can use incremental or differential back ups

      • We can decide whether the back up is online or offline.  An online back up is one that is physically or logically connected to the system.  An offline back up is one that is on a storage medium such as a magnetic tape, or a hard disk cartridge.

        It is usually faster to restore data from an online system because the data is already accessible.  We just need to copy it.  The offline back up must be physically connected to the system and then copied.  If it is offsite, then it must first be brought to site, and then connected.

  • What is the organization’s risk appetite?  Does the organization like to spend money to avoid risks?  Or does it like to save money and take big risks?

    • If the organization has a high risk appetite, then it may not want to spend the money on multiple backups.

    • A common strategy is called 3-2-1: we keep at least three copies of the data, on two types of media, with at least one copy off site.  One way to apply it, which keeps more copies than the minimum

      • We have one copy of our data in production (this is the live data)

      • We keep two copies of our data back up in the cloud, each in a separate region

      • We keep two physical copies of our back up.  Each one should be on a separate type of medium.

      • One physical copy should be stored on site and one physical copy should be stored off site (either at another office or at a vendor like Iron Mountain)

        • If we use a Storage Area Network (SAN) with physical replication, then that might be considered the off-site physical copy

  • We must rotate our physical storage media.  There are different techniques designed to reduce the total number of media in use (to reduce costs).  A common rotation scheme is called Grandfather-Father-Son

    • The grandfather is the monthly back up.  We have a set of media for monthly back ups that we rotate.  We might keep the last twelve months of grandfather back ups.

    • The father is the weekly back up.  We have a set of media for weekly back ups that we rotate.  We might keep eight weeks of father back ups.

    • The son is the daily back up.  We have a set of media for daily back ups that we rotate.  We might keep thirty days of son back ups.

    • In this example, we have back ups stretching back one year, but we need to keep only 50 pieces of back up media (12 + 8 + 30).
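The Grandfather-Father-Son example can be sketched in a few lines.  The retention counts come from the example above; the rule for deciding which tier a given day belongs to (monthly on the first of the month, weekly on Sundays, daily otherwise) is an assumption made for illustration, since organizations choose their own schedule.

```python
from datetime import date
from typing import Dict

# Retention counts from the example: 12 monthly, 8 weekly, 30 daily media.
RETENTION: Dict[str, int] = {"grandfather": 12, "father": 8, "son": 30}


def gfs_tier(day: date) -> str:
    """Pick the media set for a given day (assumed schedule, see lead-in)."""
    if day.day == 1:
        return "grandfather"      # monthly backup on the first of the month
    if day.weekday() == 6:        # Monday is 0, so 6 is Sunday
        return "father"           # weekly backup
    return "son"                  # daily backup


def media_needed(retention: Dict[str, int] = RETENTION) -> int:
    """Total pieces of media in rotation across all three tiers."""
    return sum(retention.values())


if __name__ == "__main__":
    print(gfs_tier(date(2024, 3, 1)))   # grandfather
    print(gfs_tier(date(2024, 3, 3)))   # father (a Sunday)
    print(gfs_tier(date(2024, 3, 4)))   # son
    print(media_needed())               # 50
```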