2.8 Summarize the basics of cryptographic concepts.

2.8 Summarize the basics of cryptographic concepts

Digital Signatures
Key Length
Key Stretching
Salting
Hashing
Key Exchange
Elliptic-Curve Cryptography
Perfect Forward Secrecy
Quantum
- Communications
- Computing
Post-Quantum
Ephemeral
Modes of Operation
- Authenticated
- Unauthenticated
- Counter
Blockchain
- Public Ledgers
Cipher Suites
- Stream
- Block
Symmetric vs Asymmetric
Lightweight Cryptography
Steganography
- Audio
- Video
- Image
Homomorphic Encryption
Common Use Cases
- Low Power Devices
- Low Latency
- High Resiliency
- Supporting Confidentiality
- Supporting Integrity
- Supporting Obfuscation
- Supporting Authentication
- Supporting Non-Repudiation
Limitations
- Speed
- Size
- Weak Keys
- Time
- Longevity
- Predictability
- Reuse
- Entropy
- Computational Overheads
- Resource vs Security Constraints

Consider two parties, Alice and Bob, who need to communicate privately. Carol also needs to communicate privately with Bob. Eve is attempting to spy on them.

The plaintext is the information that has not been encrypted. The ciphertext is information that has been encrypted.

An encryption algorithm works in conjunction with a key. Think of an algorithm like a lock on a door. There are many models of locks. Even if you and your neighbor have the same model lock, you will have different keys.

Vulnerabilities

The key is too short. In encryption, somebody can guess the key and decrypt the text. In the real world, somebody can guess the key and 3D print a new one to open the lock.
The lock is too weak. In encryption, somebody can break the algorithm even without knowing the key. In the real world, somebody can pick the lock because the key only has a couple of teeth.
There is a backdoor. In encryption, the system has a security vulnerability unrelated to the lock. For example, a person installed a keylogger on the computer where the encryption takes place. In the real world, somebody left a back door or window unlocked.

Digital Signature

A digital signature proves that a specific person signed a specific message. A valid signature proves that the message was created by the signer and that it was not modified in transit.

An electronic signature is not legally accepted in all countries. Some documents or transactions may require a “wet signature” in ink.

A digital signature has the following

A key generation algorithm
A signing algorithm
A verification algorithm

As before, Alice would like to provide Bob with a signed document.

Alice selects a private key
Alice uses the private key to create a public key, which is published and available
Alice uses the private key to sign the document
For the key to be valid, it must be impossible to generate a signature without knowing the private key
Alice sends the signed document to Bob
Bob retrieves Alice’s public key and uses it to verify her signature

Typically, Alice will not sign the entire document. She will generate a hash of the document and then sign the hash.

It is more efficient to calculate a signature on a hash than on the entire document
The document may be too large to sign, in which case it must be broken into separate portions, each of which must be signed separately
The signature algorithm may not be able to sign/perform calculations on the entire content of the document, especially if it contains attachments, photographs, or other types of data. Some signature algorithms only work on text.

How to forge a digital signature

Choose a random signature
Use the verification algorithm to generate a different message that could correspond to the signature (or the hash that could correspond to the signature) – a collision. This is practically impossible when the users are using a strong hash.
There are three types of forgery
- Existential forgery
  - Create a message and signature pair that was not created by the legitimate signer
  - This is considered the weakest hacking method. An algorithm that prevents existential forgery is practically unforgeable.
- Selective forgery
  - Create a message and signature pair for a pre-selected message
- Universal forgery
  - Create a valid signature for any message
  - This is the strongest hacking method

Other ways to forge a digital signature – What You See is What You Sign

The security of the computing platform is more important than the security of the key
A user is signing what he sees on the screen, but an untrusted application can display a different document than what is saved
Therefore, the user’s signature can be applied to a fake document. We can create an app that shows the user one document but applies his digital signature to another.
The user is not actually inspecting the bits that are signed (or may not inspect the document after it has been saved but before transmitting it)

Benefits of a digital signature

A user can prove that the document was signed by the person who claimed to sign it
A user can prove that the document was not changed during transmission
The signer cannot later deny that they signed the document

The digital signature algorithm

The algorithm must be secure
The users must properly use their signatures
The private key must be kept private
- Place the key on a smart card or token
- Only use your digital signature with trusted applications
The owner of the public key must be verifiable. That means there must be some way to trust that the public key belongs to the owner. Typically, to obtain a trusted digital signature, you must buy it from a company that verifies your identity. The issuer signs your signature with their signature, attesting that your signature does in fact belong to you.

There are many laws related to signatures. Originally, a facsimile of a signed document was accepted like the original, but an e-mailed/scanned document was not. Governments have passed laws requiring the acceptance of digital/electronic signature.

For example, Title 21 CFR Part 11 in the United States requires the government to accept electronic signatures.

An electronic signature could be a printed name or an image of a signature. It is not necessarily a digital signature, nor is it necessarily cryptographically secure.

The Electronic Signatures in Global and National Commerce Act or ESIGN Act allows electronic signatures to be used for transactions that affect interstate or foreign commerce. Consumers have the right to opt out of an electronic signature requirement.

Key Stretching

Key Stretching is a technique to make a weak key stronger. A human might choose a short password. The hacker would use a brute force attack to break the password (type in every possible password until he gets the right one).

The key is made longer by being inserted into an algorithm. The algorithm outputs a stretched key (known as the enhanced key). The enhanced key cannot be broken via brute force even if the original password could.

The hacker would have to attempt to brute force the enhanced key. The hacker would not be able to brute force the original password because he would have to calculate the enhanced key from each one first, which would consume more computing power than he has.

There are several key stretching (password hashing) algorithms

BCRYPT
- Used by many Linux distributions
- The password is converted to a 184-bit value
- The function iterates thousands or even millions of times
- BCRYPT adds a 128-bit salt
PDKDF2
- Password-Based Key Derivation Function 2
- Superseded PBDKF1, which could produce keys up to 160 bits long
- The function iterates multiple times, between 1000 and 100,000 times
- PDKF2 is easier to brute force than BCRYPT because it uses very little RAM; a specific processor or GPU can be tuned to brute force a PDFK2

In the forward mode, running an algorithm millions of cycles to stretch a password does not occupy much computing power (a few seconds). When attempting to brute force the algorithm in reverse, a hacker’s computer is slowed down by a factor of 1000 to 1,000,000 and will not be successful.

Implementation vs Algorithm Selection

The type of algorithm selected must be based on

The types of data being encrypted
The crypto service provider
- A software library that provides cryptographic functions
- Instead of manually writing code to implement a cryptographic function in an application (which may introduce security vulnerabilities), a developer can incorporate a pre-existing library
- The library takes a plaintext input and outputs a ciphertext output or vice versa. The developer does not need to worry about the internal function of the library.
- The library can be replaced when newer or more secure libraries are made available
Crypto module
- A crypto module is a physical device that performs encryption and decryption
- The module may have specially designed hardware that is specifically suited to perform the encryption and decryption functions
- An example is an AES engine in the Apple iPhone. This engine is a physical processor that performs encryption and decryption, and no other function. It is specially designed to remove encryption and decryption workloads from the iPhone’s main processor.
- When designing a device, we might physically install a crypto module

Diffusion

Consider a text that needs to be encrypted with a cryptographic key. The input is plaintext and the result is a cyphertext. It should be difficult for the ciphertext to be reversed into plaintext by an eavesdropper.

In diffusion, the relationship between the plaintext and the cyphertext is not clear or cannot be determined easily. If a single bit in the plaintext is changed, then on average, half of the bits in the cyphertext should change. If a single bit in the cyphertext is changed, then on average, half of the bits in the plaintext should change. When an algorithm has strong diffusion, then it is difficult to reverse.

Confusion

In confusion, the key and the text are not related, or at least, do not appear to be related. It is difficult to determine the key from the cyphertext. If a bit in the key is changed, most of the cyphertext is changed.

Salt

Recall that it is bad security practice to store passwords in plain text. Passwords are typically hashed, and the hash is stored (the hash is not reversible).

But a hacker could generate a dictionary of passwords (common and uncommon) and calculate the corresponding hash for each one. This is known as a rainbow table. The hacker could then steal a hash and look up the corresponding password for each one.

Rainbow tables are readily available on the internet for passwords up to eight characters (every possible combination!) and rainbow tables of even longer passwords can be computed.

To prevent the use of rainbow table attacks, modern password hash functions incorporate a ‘salt’. The salt is a random set of characters appended to the end of each password before the hash is calculated. The hash and the salt are stored in plain text. If the hash database is compromised, the hacker would have to regenerate each rainbow table incorporating the salt into every password to make any sense of it.

Nonce

A Nonce is a bit of data added to a cryptographic algorithm to prevent replay attacks.

Consider that Alice is communicating with Bob

Bob wants to verify Alice’s identity
Bob requests Alice’s password
Alice provides her password, but may hash it first
Eve intercepts the password or the hash
Eve wants to impersonate Alice, so she sends the intercepted password or the hash to Bob later
This is known as a replay attack

If Alice adds a nonce to the password and sends it to Bob, then when Even intercepts and attempts to resend the same data, Bob will notice that the nonce is the same and will not accept the message. Eve would need to somehow intercept Alice’s message and send it to Bob in a way that is faster than Alice’s method so that Bob receives Eve’s message before he receives Alice’s.

A nonce should not be reused or not reused often (the nonce should be much larger than the number of times it is expected to be reused).

A nonce can be randomly generated or sequential.

A sequential nonce guarantees that it will not be reused.
A sequential nonce prevents replay attacks when it is verified that the nonce is incrementing
The best nonce has a random portion and a sequential portion. Bob might first send Alice some data that she incorporates into the nonce.

An Initialization Vector is like a nonce, except that it cannot be sequential. The Initialization Vector is not good for replay attacks because it could be faked.

Hashing

A Hash function converts a piece of data from one format into another. A Hash function is typically one-way. The goal of the hash function

Convert a piece of data from a variable length to a fixed length. An encryption algorithm may require an input of a fixed length. For example, an encryption algorithm may require a key that is 32-bits long, but users can select passwords of different lengths. The user’s password can be converted into a hash that fits into the algorithm.
Hide the original data. Original, plaintext passwords should not be stored. A user password can be converted into a hash that can be stored. If the hash is compromised, then the password will not be compromised. Many website databases are compromised, but if the user passwords are hashed and salted, then the original passwords are not determined. This only applies to a non-reversible hash.

Since the hash is not reversible, then multiple inputs may result in the same hash.

There are many hash functions. Attributes of a good hash function

Consider the mathematical space of the hash output. A 32-bit hash for example is 32 characters long. If the hash is only letters and numbers, then the hash would have 32⁶⁶ possible values.

The number of possible password input combinations is always infinite.

A hash should be continuous along the entire space, and there should be an equal probability of generating each output value.
Be impractical to guess two inputs that produce the same output. Given an input m₁, it should be difficult to find a different input m₂ such that hash(m₁) = hash(m₂). This is known as weak collision resistance.

It should be difficult to find two different messages m₁ and m₂ such that hash(m₁) = hash(m₂). This is known as strong collision resistance.
A hash should be efficient. There is a trade-off between how fast a hash can be searched and how much disk storage space it occupies. A large hash index can be searched quickly but occupies more space on the disk.

It is better to have a faster hash function over a slow one.
Deterministic. The hash must always return the same output for a given input. There shouldn’t be any randomness in the hash algorithm.
Not practical to be able to predict the input that created a given output. Given a hash value h it should be difficult to find any message m such that h = hash(m). This is known as pre-image resistance.
A small change to the input should result in a large change to the output. It can be assumed if two strings appear similar and have the same hash output, then they are identical.
A hash should be difficult
- Mathematically speaking, difficult means that it is not solvable in asymptotic polynomial time
- Cryptographically speaking, difficult means that it is secure against any person for as long as the system must be operational

The use of the hash

Message integrity. When a message is first sent, a hash of the message is computed by the sender. The hash is shared with the recipient. The recipient can calculate the hash on the message received. If the calculated hash matches the original hash, the recipient knows that the message did not changed during transmission.
Digital Signature. A hash is calculated over the entire message that needs to be signed. The hash is then signed. A calculation to produce a digital signature is complicated, and it would save computing power to calculate the signature on the hash instead of on the entire message.
Password. Passwords are stored as hashes instead of plain text. As discussed earlier, when a user account is created, the password is hashed, and the hash is stored. During a log in attempt, the hash of the entered password is calculated and compared to the original hash. If they are the same, then the user is authenticated. A hash allows a service provider to avoid storing the password in plaintext.
Proof-of-work. Proof-of-work is a system that is used to prevent denial of service attacks and to prevent automated systems from accessing websites. The system that is attempting to access a resource is presented with a difficult mathematical equation that involves calculating a hash. If the hash is correct, then the system is provided access. The equation must be difficult to calculate but easy to verify.

The machine requiring access is forced to use some of its computing resources before gaining access. If the machine is operated by a legitimate human user, then the calculation represents a minor inconvenience of a few seconds. If the machine is a bot making thousands of requests, the calculation would consume its resources.

Proof-of-work is also used by Bitcoin mining.
File identifier. A hash of a file can be calculated to uniquely identify the file. In a file system with millions of files, each file can be retrieved by providing its hash. File hashing is used by online file sharing systems.

The hash functions that are commonly used

MD5
- Produces a 128-bit hash
- MD5 was used to calculate checksums for files that have been transferred over the internet. It would verify that the file that was sent was also the one that was received. MD5 was also used to store passwords.
- MD5 is not secure because it is easy to find a collision and should not be used.
SHA-1
- Produces a 160-bit hash
- Used by the government to protect sensitive, unclassified material
- Computing power to find a collision can be obtained for less than $100,000
- In the process of being replaced by SHA-2
SHA-2
- Produces a 256-bit or 512-bit hash
- Used by Bitcoin
- SHA-2 is still in use and considered secure
SHA-3
- Produces a 256-bit or 512-bit hash
- SHA-3 is not designed to replace SHA-2 as SHA-2 is still considered secure
- Instead, SHA-3 is designed as an alternative that is available should SHA-2 be compromised in the future (and it will be!)
- SHA-3 uses a sponge construction. Data is absorbed, transformed, and then pushed out.
Blake, Blake2, and Blake3
- Blake functions were developed in various cryptographic competitions
- Blake2 is faster and more secure than SHA-2. It is faster and similarly secure to SHA-3.

MD5 converts inputs of different lengths into hashes that are 128 bits long
- The hashed value is also known as a message digest
HMAC
- HMAC is also known as hash-based message authentication code
- HMAC is a framework, not an encryption algorithm; any hash function can be used (such as SHA-1)
- It is used to authenticate a message and verify its integrity. It does not encrypt a message.
- Procedure
  - A secret key is obtained and used to generate two keys, an inner key and an outer key
  - The message is broken into blocks which are hashed
  - The first hash produces an inner hash using the message and the inner key
  - The second hash produces a final HMAC code from the first hash and the second key
  - The message is sent along with the hash
  - A recipient will calculate the hash again using the secret key and compare it with the hash that was received. If the hashes match, then the recipient can confirm that the correct message was received.
RIPEMD
- RIPEMD stands for Race Integrity Primitives Evaluation Message Digest
- RIPE Message Digest is a family of hash functions (RIPEMD, RIPEMD-128, RIPEMD-160, RIPEMD-256, and RIPEMD-320), each of which produces a different hash length
- The original RIPEMD produced a 128-bit hash but was affected by collisions
- RIPEMD-160 is the most common

Obfuscation

Recall that an obfuscation algorithm does not actually encrypt the plaintext, it merely hides it. There are several obfuscation algorithms in place, but an obfuscation algorithm should never be used by itself.

. XOR is a cipher that is also known as modulus 2 addition.

Collision

As mentioned previously, a collision is when two input values produce the same hash value. For any given hash value, there is a limited set of outputs but an infinite set of inputs. Therefore, mathematically, there are an infinite set of inputs that can result in each output.

By definition, a collision is when plaintexts m₁ and m₂ exist such that hash(m₁) = hash(m₂).

Consider that Alice created a document and plans to send it to Bob. Alice previously hashed the document and signed the hash. Mallory wants to trick Bob. Mallory will create a different document, which has an identical hash to Alice’s document and attach Alice’s signature to it, and then send it to Bob.

It would be difficult to create a meaningful document that also has the same hash as the original. Mallory may need to try millions or billions of different combinations of documents until she finds one that matches Alice’s hash. If the hash function was weak (if diffusion or confusion did not apply), Mallory would be able to tweak the document until it matched the hash.

Modifications to a collision attack

Birthday Attack
- Mallory wants Bob to sign Document B (the fraudulent contract)
- She first creates Document A (the legitimate contract)
- Mallory creates different versions of Document A, all of which look like the original, but are slightly different. The changes may include white space, commas, and other hidden formatting.
- Mallory then creates different versions of Document B
- She calculates the hash for each version of Document A and Document B until she finds a version of Document A that has the same hash as a version of Document B
- She presents this version of Document A for Bob to sign
- She then copies Bob’s signature onto the version of Document B with the same hash
- Bob can protect himself by
  - Keeping a copy of the document he signed
  - Making random changes to the document before he signs it
  - Using a large enough signature algorithm such that the hash calculations become impossible
Chosen Prefix Collision
- A hacker would append two prefixes, p₁, p₂ to the documents m₁ and m₂ (which have different hashes), such that the resulting documents have the same hash
- Mathematically speaking, such that hash(p₁ ∥ m₁) = hash(p₂ ∥ m₂)
Conditional Formatting
- Word and PDF documents can employ conditional formatting where certain text can display or not display depending on the location of the file, the date/time, or other attributes
- A hacker can insert hidden content into the document
- The victim would sign the document while the text is hidden, and then in the future, the hidden text would appear

Symmetric Algorithms

A symmetric algorithm is one that uses the same cryptographic key to encrypt and decrypt text. The keys may be identical or easily transformed from an encryption key to a decryption key and represent a shared secret between the two parties attempting to communicate.

A symmetric algorithm is less convenient than an asymmetric algorithm (public key cryptography) because both parties must have access to the same key.

Alice and Bob need to communicate. Alice and Bob agree on a shared secret (the key), but how can Alice share the key with Bob without anybody seeing it? She must put it on a USB drive and physically give it to Bob, or she must send it over the internet. If Alice encrypts the key with another key, then she would have to share that key as well.

If Carol needs to communicate with Bob, but Carol is thousands of miles away, and has never met Bob, then both Carol and Bob must exchange their shared secret over the internet.

Bob must use a different key with Carol than he did with Alice. If not, then Alice could intercept and decrypt a message between Carol and Bob. If Bob needs to communicate with many people, then Bob needs many keys.

Thus, the symmetric algorithm can be compromised regardless of the length of the key or the strength of the algorithm. On the internet, it is difficult to distribute a symmetric key with another party.

Key Exchange

A Key Exchange is a way for two people to exchange keys. Alice and Bob want to talk securely. Alice and Bob need to exchange encryption keys over the internet, but the internet is not secure. How can Alice and Bob securely exchange their keys, so that the further communication between them is encrypted? How can they be sure that the key is not stolen or intercepted?

There are several algorithms

Diffie-Helman Key Exchange. This allows two people to exchange keys even when there is an eavesdropper who sees all their communications.
Public Key Infrastructure. A Certification Authority (CA) holds the public keys for all users. A limitation of PKI is that there is no way to verify that the central authority has provided the key corresponding to the user with whom you want to speak to (a malicious authority could substitute their own key so that they can intercept your communications).
Web of Trust or PGP. PGP allows two users to directly exchange keys without the use of a CA.
Quantum Key Exchange. Quantum Key Exchange uses the quantum theory that an observation/measurement of an object in a quantum state modifies the object. If an eavesdropper observes the key exchange, the key will be modified, and the parties exchanging the key will be notified.

Elliptic-Curve Cryptography

An elliptic curve is a curve on a plane (2-dimensional) over a specific field, with a single point at infinity. I would try to draw it but it is infinite, so we would run out of paper.

In summary, if you pick two points on an elliptic curve and add them together, the result will be a third point that is on the curve. Elliptic curve cryptography does not consume much computing power; therefore, it is well suited for mobile devices.

The curve is defined by the equation

We can use the mathematical properties of this curve to encrypt our data. There are many algorithms that use elliptic curves

The Elliptic Curve Diffie–Hellman (ECDH) key agreement scheme
The Elliptic Curve Integrated Encryption Scheme (ECIES), also known as Elliptic Curve Augmented Encryption Scheme or simply the Elliptic Curve Encryption Scheme
The Elliptic Curve Digital Signature Algorithm (ECDSA)
The Edwards-curve Digital Signature Algorithm (EdDSA)

To implement the curve

The users must agree on the domain (the shape of the curve). In general, the requirement is to select several points on the curve
- P, the field size
- a and b, which define the equation
- G, the base of the curve
- n, such that nG = O, where n is the O is the point at infinity of the curve
- h where h = (1/n)|E(Fp)| , and where h is an integer between 1 and 4
The users can choose a random elliptic curve and calculate the above points, choose the points randomly and then generate a curve, or use a standard curve. It is faster to use a standard curve because calculations are complicated and time-consuming
- FIPS PUB 186-4 lists several standard curves for use by government encryption methods
Key size
- The field should be twice the size of the encryption key size. A 256-bit key requires a curve that is 2⁵¹²
Some curves can be calculated faster than others
- Projective coordinates can be used to add different points on a curve without using inversion
- Fast reduction can be accomplished if p is a pseudo-Mersenne prime, where p is approximately 2^d, or p = d² -1

Security risks

Side channel attack
- In Elliptic Curve Cryptography, a different operation is used to calculate addition and multiplication, each of which uses different amounts of power. A hacker could monitor the processor to detect the power consumption and timing to determine the curve parameters.
- A side channel attack can be prevented by using an Edwards curve. The Edwards curve allows addition and multiplication in the same operation.
Backdoor
- Some have alleged backdoors are present in some of the number generators used to create elliptic curves. If you are lazy and choose a curve from a preselected list, then you may be subject to a backdoor.
- It may be better to generate your own curve, if you can verify that it is secure

Perfect Forward Secrecy

In perfect forward secrecy, a session encryption key will not be compromised, even if the server private key is compromised.

Recall that Alice and Bob are still talking. Alice has a public key and a private key. Each time that Alice and Bob talk, they each use their private key to generate a session key. The session key is used only for that conversation, and then discarded. If a session key is compromised, only the data secured by that session key is compromised.

With forward secrecy, if a future key is compromised, then data secured in previous sessions is not compromised.

Consider Alice and Bob

Alice and Bob generate an asymmetric public and private key pair
They exchange the keys
Alice and Bob generate a session key from the asymmetric key
Alice symmetrically encrypts a message using the session key
Bob decrypts Alice’s message with the session key
Each time Alice and Bob need to send a new message, they create a new session key
If a session key is compromised, only the message encrypted by that message is compromised. Previous messages and future messages cannot be decrypted with the compromised session key.

An ephemeral key is one that is used for a short period of time. The key may be used only once or multiple times within a session. When used only once, an ephemeral key provides Perfect Forward Secrecy.

Random/Pseudo-Random Number Generation

A true Random Number Generator does not exist, because computers are logical devices. However, random number generation is important for encryption algorithms. Many algorithms rely on the use of random numbers. If a hacker were able to predict the numbers that are generated, he could compromise the algorithm.

A true Random Number Generator can be created by measuring physical/natural processes, such as thermal noise, radio noise, or radioactivity. The observation of quantum data is considered the best way to collect random data. A tool that does this may be known as a Hardware Random Number Generator, but the hardware device that observes the random data may introduce bias into the collection method.

A Pseudorandom number generator (PRNG) is a computing algorithm that generates random numbers. The numbers that a PRNG produces are not truly random because they are generated from a seed value. Many PRNGs have failed.

A cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG) is one that can be used in cryptography. To be a valid CSPRNG

The CSPRNG should pass the “next-bit test”. Given any number of bits of a randomly generated sequence, there should not be an algorithm that can predict the next bit (or any other future bit) with a success rate of at least 50%
If the CSPRNG’s current state is compromised, it should be impossible to predict any future or previous states.

Most PRNGs will fail the above tests. Even though a PRNG appears to be random, reverse-engineering and statistical analysis will be able to crack it.

Some papers reported that the NSA placed a kleptographic backdoor in the Dual_EC_DRBG PRNG, which was published as NIST SP 800-90A. RSA received a $10 million payment from the NSA to continue using the Dual_EC_DRBG PRNG.

There are several good CPRNGs (they were considered secure at the time of writing, but nobody knows if they will stay secure forever)

Yarrow algorithm
ChaCha20 algorithm
Fortuna algorithm
Microsoft’s CrytpGenRandom function

The CPRNGs that have been standardized are

NIST SP 800-90A Rev.1 (Dual_EC_DRBG PRNG)
ANSI X9.17-1985 Appendix C
ANSI X9.62-2005, Annex D

It is likely that backdoors have been introduced into many other PRNGs and CSPRNG. Therefore, the most secure approach is to create a CSPRNG that observes a random, natural process, and try to remove biases from the measurement.

Quantum Computing

Moore’s law says that computing power doubles every two years. When we develop algorithms, we are hoping that the data stays secure for at least 100 years. If we consider that computing power doubles every two years, then we should think about how long it will take a computer invented 99 years from now to crack it.

Computers use 0’s and 1’s. If there are 8 bits in a byte. That means that every bit could have one of two values. A quantum computer uses the properties of physics to perform calculations. In a way, it is like that each bit can be both 0 and 1 at the same time, which sounds weird, but the physics behind it is complicated. Or a bit can be in more than one place at the same time. Quantum computers use bits called qubits.

A quantum computer can be used to model a complicated math problem such as

Cracking an algorithm
Factoring a large number
Solving a logistics problem
Modeling the shape of a protein
Machine learning
Search problems

A quantum computer can be thousands or even millions of times faster than a normal computer.

Right now, quantum computers are very expensive and very fragile.

Eventually, quantum computers will exist that can break any modern encryption algorithm. But somebody will develop a quantum encryption algorithm. Which brings me to the next point.

Under quantum theory, we cannot measure the location of an object without changing it. If Alice and Bob want to communicate, Alice and Bob set up a quantum channel and a classical channel. Alice sends Bob a key through the quantum channel. If the error rate is high, it is likely that Eve eavesdropped on the qubits, and changed some of them.

If not, then Alice and Bob can use the key to send encrypted data over the classical channel.

Post-Quantum Cryptography

We have an idea of how quantum computers work or are supposed to work. Right now, the focus is on developing algorithms that will survive when everybody has a quantum computer. There are none. The approach is, can we find a math problem that a quantum computer can’t solve quickly and then model an encryption algorithm after it?

Ideas so far

Lattice-Based Cryptography
Multi Variable Cryptography – there is an idea that an equation with multiple variables can’t be solved quickly, but so far all current equations have failed
Hash-Based Cryptography
Code-Based Cryptography
Supersingular Elliptic Curve Isogeny Cryptography
Symmetric Key Quantum Resistance – AES and SNOW 3G algorithms are resistant to quantum computers if we use a large enough key

Security Through Obscurity

Security through obscurity is a concept where the security architecture is hidden or camouflaged.

Security through obscurity is not a good security mechanism. It is like hiding a key under a doormat. The underlying system components were built by humans, and those humans may leave the organization with knowledge of how it works and can later compromise the system.

Modes of Operation

Let’s introduce a concept called a decryption oracle. The oracle is a computer that always runs a decryption a piece of data. A hacker might try to gain some valuable information about our encryption algorithm, by sending our decryption oracle some garbage data, and analysing the results. This is called unauthenticated encryption.

We can avoid this by creating a system that will only decrypt authenticated messages. When a message is encrypted, we add a MAC, or Message Authentication Code to it. The hacker’s messages will not have the MAC. The best way is to calculate the MAC from the ciphertext and send them together. This is called authenticated encryption.

We can also introduce a counter. Each time we encrypt some text, we increment the number. We also encrypt the counter and send it with the ciphertext. If a hacker tries to send some encrypted data, he won’t know what the counter is.

Blockchain

A blockchain is a decentralized public ledger. You might think of it as bitcoin, but it has many other applications.

Each block in the block chain represents a transaction and contains a timestamp and a hash of the block that came before it. If we want to verify that a block is valid, we can compute its hash and check the value in the next block. We can also check the digital signature of the block.

The blockchain is stored on many different servers, which run by volunteers and receive updates via peer-to-peer. When a new block is added to a chain on one server, then the other servers receive the update and update their own block chains. If a new block is added to a chain on two servers at the same time, then the algorithm has a way to decide which new block is valid. Once more than half of the servers agree on the next block, then all of the servers consider it to be valid.

There are many different blockchain algorithms.

The benefits of blockchain

There is no single point of failure since the ledger is distributed across multiple systems
The system is open
The system can’t be hacked because the data is distributed. If a hacker was able to modify the ledger on one server (and give himself a large account balance for examples), the other servers would just disagree. A private blockchain controlled by one organization can be hacked.

Stream vs Block

A Stream Cipher uses two algorithms

An encryption algorithm
A decryption algorithm

A stream cipher is necessary when the length of the plaintext is not known, such as a Wi-Fi connection or a live video.

When presented with a plaintext, the algorithm encrypts or decrypts the plaintext one bit at a time. The encryption algorithm and decryption algorithms must be in sync. Some stream ciphers can automatically resync.

The key flows in a stream, and the plaintext flows in a stream, and these two streams combine to result in a ciphertext.

For the Stream to be secure

The stream cipher must not reuse keys. If two ciphertexts are sent, which are encrypted with the same key, then the two messages can be compared to identify the key.

E(A) xor E(B) = (A xor C) xor (B xor C) = A xor B xor C xor C = A xor B
The keystream must have a large enough period (non-repeating portion)
It must not be possible to identify the key from the keystream
The key stream must not be distinguishable from random noise

A Block Cipher uses two algorithms

An encryption algorithm
A decryption algorithm

A block cipher encrypts plaintext of a specific length (the length of the block). If the plaintext is longer than the block, it must be broken into fragments that are the length of the block. The algorithm is then applied to each fragment. If the last portion plaintext is shorter than the block, then it must be padded.

There are a number of block ciphers

Iterated Block Cipher
Substitution-Permutation Network
Feistel Cipher
Lai-Massey Cipher

Security risks

Brute-force Attack. By obtaining multiple blocks of ciphertexts and comparing them, the key can be determined. The larger the block, the harder it is to determine the key. But the larger the block, the more inefficient the algorithm.
Integral Cryptanalysis. Integral Cryptanalysis compares multiple sets of plaintext blocks, where most of the value of the blocks are identical.
Linear Cryptanalysis. In Linear Cryptanalysis, equations of plaintext, cyphertext, and key bits are created. These equations are used with pairs of plaintext/ciphertext to obtain the key bits.
Differential Cryptanalysis. Differential Cryptanalysis is where the hacker obtains sets of plaintexts and ciphertexts. The hacker then compares the plaintext and ciphertext to determine the key.

Cipher Modes

Since no random generator is truly random, biases are introduced into the encrypted data. The cipher mode attempts to hide the biases. Cipher modes apply to block ciphers.

There are several modes, but the two that are recommended are CBC and CTR.

ECB – Electronic Code Book. With ECB, the plaintext is cut into blocks. Each block is encrypted separately and decrypted separately. This is the weakest form of encryption because repeating patterns are introduced into the blocks.
CBC – Cipher Block Chaining. The plaintext is cut into blocks. An initialization vector is added to the first block. Each plaintext block is XORed with the previous ciphertext blocks before being encrypted. Therefore, each block depends upon the value of all the previous blocks,.

CBC is the most common method. It is slow because encryption and decryption must take place sequentially. If there is a large amount of plaintext or ciphertext, it cannot be encrypted/decrypted in parallel with a powerful processor.
PCBC – Propagating Cipher Block Chaining. PCBC is the same as CBC except that each plaintext block is XORed with the previous plaintext block and ciphertext block before being encrypted.
CFB – Cipher Feedback. In CFB, the block cipher is converted into a self-synchronizing stream cipher. A single byte must be encrypted at a time.
OFB – Output Feedback. In OFB, a block cipher is converted into a stream cipher. The keystream blocks are XORed with the plaintext blocks to obtain ciphertext.
CTR – Counter Mode. In CTR, the block cipher is converted into a stream cipher. Each keystream block is generated by encrypting the value of a counter. Blocks can be encrypted in parallel.

Symmetric Algorithms

There are many symmetric algorithms in use, as mentioned previously.

AES (Advanced Encryption Standard)
- Used worldwide by many organizations and by the United States government
- 192 and 256-bit keys can be used to protect Top Secret data
- Replaced DES
- The algorithm uses 10, 12, or 14 rounds of encryption to convert the plaintext to ciphertext
- The algorithm uses a 2D matrix which shifts the rows and columns to generate the ciphertext
- There are several side-channel attacks that can be used to break AES, but no known ways to cryptographically break it
DES (Data Encryption Standard)
- The algorithm is a block cipher
- The key length is 56 bits (64 bits with 8 bits for parity)
- DES uses a mathematical function known as the Feistel function, which uses expansions and permutations
- Can be broken
  - Brute force attacks on a 56-bit key length can be broken in 48 hours with a set of computers that cost less than $10,000
  - Differential cryptanalysis can break it quickly
  - Linear cryptanalysis can be used if some plaintexts are known
3DES (Triple Data Encryption Algorithm)
- There are three options for implementing 3DES
  - 3DES uses three independent keys. Each key is 56-bits, but the effective key size is 112-bits.
  - 3DES uses two independent keys. Key One and Key Two are independent, and Key Three is the same as Key One. The effective key size is 112-bits.
  - 3DES uses three identical keys. This option is no longer considered secure.
- 3DES can be broken with a Meet in the Middle Attack.
  - Consider that the method for encrypting a file is to operate an encryption algorithm with a key
  - For added security, the encryption algorithm can be run multiple times, each time with a different key
  - The number of attempts required to break the algorithm would increase exponentially with each iteration of the algorithm.
  - If intermediate results of the encryption method are stored, then a Meet in the Middle Attack can be used to find keys that match a forward function from the plaintext to the intermediate and from the ciphertext to the intermediate
RC4
- RC4 or Rivest Cipher 4 was a stream cipher
- It was simple and fast
- RC1, RC2, RC3, RC5, and RC6 were also created
- The RC4 cipher should not be used because of the many security vulnerabilities
  - The RC4 produces a biased output; since the distribution of the output is not uniform, with enough ciphertext the original text can be determined
  - Fluhrer, Mantin and Shamir found that the first few bytes of the ciphertext are not random. An analysis of many messages can be used to find the key.
Blowfish.
- Blowfish is a public domain algorithm
- Blowfish was a block cipher with a key length that could range from 32-bits to 448-bits
- Blowfish used a 64-bit block size, which could be broken in a birthday attack
- Blowfish is still secure in most applications, but it is recommended to switch to Twofish.
Twofish
- Twofish is a public domain algorithm that succeeded Blowfish
- Twofish used 128-bit block sizes
- It has not been broken, but then again, it is not widely used, so very few efforts have been made to break it

Asymmetric Algorithms

An asymmetric algorithm is also known as public-key cryptography. It involves two keys: a public key and a private key. The public key is used to encrypt data and the private key is used to decrypt data. Only the private key must be kept private.

Here is how it works. Bob generates a private key. He uses the private key to generate a public key. He publishes the public key on the internet. Alice needs to send Bob a message. She finds Bob’s public key. She uses Bob’s public key to encrypt a message. She sends the message to Bob and Bob uses his private key to decrypt the message.

There are many asymmetric algorithms, including RSA, Diffie-Hellman, and DSA.

There are multiple asymmetric algorithms available

RSA (Rivest–Shamir–Adleman)
- (m^e)^d = m (mod n) is the equation that represents it
- How the RSA key is generated
  - Choose two prime numbers p and q. These prime numbers are private
  - Compute n = pq, where n becomes the modulus. n is the part of the public key
  - Calculate λ(n), which is kept secret.
  - Choose an integer e such that 1 < e < λ(n) and gcd(e, λ(n)) = 1, e becomes part of the public key
  - Calculate d as d ≡ e−1 (mod λ(n)), where d is kept private
  - Thus, n and e are public, and p, q, and d are private
  - The effect is that it is easy to multiply two prime numbers (private key) but difficult to factor the product of two large prime numbers (public key)
  - The public key can be distributed, and the private key is kept secret.
- RSA Security Risks
  - The RSA key can be cracked if the prime numbers p and q are small
  - RSA can be cracked if a message is sent to multiple recipients who have the same e, but different p, q, and n values. The Chinese Remainder Theorem can be used.
  - The multiplicative property applies. m₁^em₂^e ≡ (m₁m₂)^e (mod n). A hacker can ask a private key holder to decrypt a ciphertext where c’ = cr^e (mod n). The hacker can use it to determine the value of c = m^e (mod n).
  - Coppersmith’s attack on RSA
  - Faulty Key Generation. If d is not large enough, and p is between q and 2q, then d can be calculated from the public key
  - Random Number Generation. If the p and q are not selected at random, then the p and q can be guessed. Many embedded applications have the values of p and q hardcoded.
DSA (Digital Signature Algorithm)
- DSA uses a public key and a private key. The private key is used to generate a digital signature.
- The key is generated from several parameters
  - Cryptographic Hash Function, H (typically SHA-1 or SHA-2)
  - Key Length L
  - Modulus length N such that N < L and N < |H|
  - N-bit prime q and L-bit prime p, such that p-1 is some multiple of q
  - Random integer h from {2 to p -2}
  - Calculate g from
  - The result is p, q, and g. All users of the system receive these parameters.
  - For each user, choose integer x from the set {1 to q – 1}
  - Calculate y = g^x mod p
  - X becomes the private key and y is the public key
- A user can sign their message with x by calculating
  - R = g^k mod p
  - s := (k-¹⁽H(m) + xr)) mod q
  - The signature is (r, s)
- A user can verify that the signature is valid by using y
  - Verify that 0 < r < q and 0 < s < q
  - Calculate w := s^-1mod q
  - Calculate u₁ := H(m) w mod q
  - Calculate u₂ := r w mod q
  - Calculate v := (g^u1 y^u2 mod p) mod q
  - If v = r, then the signature is valid
Diffie-Hellman Groups
- Diffie-Hellman is an algorithm for exchanging keys
- How it works
  - Alice and Bob select a modulus and base
  - Alice chooses a secret integer a, and calculates A = g^a mod p, which she sends to Bob
  - Bob chooses a secret integer b, and calculates B = g^b mod p, which he sends to Alice
  - Alice calculates s = B^a mod p
  - Bob calculates s = A^b mod p
  - Alice and Bob have the same result, which is the shared secret, but neither of them had to share their secret.
- Diffie-Hellman can work with more than two parties
- Diffie-Hellman is secure against eavesdroppers
- Ephemeral Diffie-Hellman (DHE) can also be used
  - If Eve sat in between Alice and Bob, she could perform a Man in the Middle attack during the key generation
  - Eve would intercept the values that Alice and Bob sent to each other and change them. In other words, Eve pretends that she is Bob to Alice and Alice to Bob. Bob thinks he is speaking with Alice, but he is talking to Eve, and Alice thinks she is talking to Bob, but she is talking to Eve.
  - Eve could cause Alice and Bob to generate different secrets
  - Then Eve would decrypt the messages from Alice, encrypt them, and send them to Bob, and decrypt the messages from Bob, encrypt them, and send them to Alice
  - To prevent this kind of attack, a different key is generated for each session
  - Each time Alice and Bob connect, they generate a new set of keys. This is known as Ephemeral Diffie-Hellman
- Elliptic-Curve Diffie-Hellman (ECDH) is another key generation method
  - ECDH works with elliptic curve keys. It is more secure than the standard Diffie-Hellman.
Elliptic Curve
- We mentioned Elliptic Curve cryptography earlier
PGP/GPG
- PGP or Pretty Good Privacy and GPG is Gnu Privacy Guard
- PGP uses a web of trust
- Key features
  - Fingerprint. A public key can be shortened to a short fingerprint. A user can publish a fingerprint of their key.
  - Compatibility. As PGP changes, new features are added. Users who communicate over PGP must ensure that they are using the same version.
  - Confidentiality. PGP can be used to send confidential messages.
    - The sender generates a one-time symmetric key, known as the session key
    - The sender sends the message and the session key to the recipient
    - To protect the session key, the sender encrypts it with the recipient’s public key
  - Digital Signature
    - PGP can be used to prove that a message has not been modified after it was sent
    - PGP can be used to prove that a specific person sent the message
    - Like other functions, PGP calculates a hash of the message and then digitally signs the hash
  - Certificates
    - Public keys must be distributed in a way so that it is possible to verify that a given public key belongs to a specific individual. This is known as the web of trust.
    - A public key may be digitally signed by a third party to confirm that it belongs to a specific person.
    - A trust signature can be used to create a certification authority. The trust signature belongs to a person who can sign other keys.
    - A certificate can be revoked if it becomes compromised. A certificate can also contain an expiry date
- Security Quality
  - There is currently no known method of breaking PGP
  - PGP is subject to regular updates, but none of the algorithms in use have been broken
- History
  - PGP was developed by Phil Zimmermann in 1991
  - Exporting munitions without a license is illegal
  - Encryption technology with keys greater than 40 bits was considered a munition
  - Therefore, by putting the PGP source code on the internet, Phil Zimmermann exported a weapon without a license
  - He was never charged, but he was subject to a three-year criminal investigation
  - In Junger v. Daley, the United States Circuit Court of Appeals for the Sixth Circuit ruled that software source code was protected by the first amendment. In other words, code is speech.
  - In Bernstein v. United States, the United States Circuit Court of Appeals for the Ninth Circuit ruled that software source code was protected by the first amendment. The opinion was withdrawn, and the case was dismissed due to a technicality.
- PGP itself is not easy to implement. PGP software applications for enterprise are sold by the PGP Corporation, which produces products such as PGP Desktop, PGP E-mail, and PGP Whole Disk Encryption.
- An Open Source PGP JavaScript library is available to implement in web applications
- GPG or GNU Privacy Guard is an application that uses PGP
  - RSA, ElGamal, DSA, 3DES, IDEA, Blowfish, Twofish, AES, MD5, and SHA-1 are all supported algorithms

Asymmetric algorithms can be compromised by a man-in-the-middle attack (as discussed earlier).

If the communication is encrypted and uses public key cryptography (such as Apple iMessage), a man-in-the-middle attack is more difficult. Users encrypt messages with public keys (which they obtain from a central directory). If Alice wants to send a message to Bob, she obtains Bob’s public key from Apple, encrypts the message, and sends it to Bob. Bob uses his private key (which only he knows) to decrypt the message.

If Eve can intercept the message

She generates her own public and private keys
She hacks into the central directory and changes Bob’s public key to her own. She copies Bob’s public key
When Alice queries the directory, she receives what she thinks is Bob’s public key (but is in fact Eve’s public key)
Alice encrypts the message with Eve’s public key and sends the message to Eve (thinking that she is sending it to Bob and using Bob’s public key)
Eve decrypts the message, reads it, and then encrypts it with Bob’s public key
Eve sends the message to Bob
Bob receives the message, thinking it came from Alice and decrypts it with his own private key

Eve does the same thing with Alice’s public key so that she can intercept messages that Bob is sending to Alice

Steganography

As mentioned previously, Encryption converts plaintext data so that it can appear in the open without being read.

A Steganography Tool does not encrypt data but hides it instead. If an eavesdropper located the hidden data, he or she would be able to read it. For added security, the hidden data should also be encrypted.

Steganography is used when encryption or encryption alone cannot protect the data. For example, a user is in a country where transmitting encrypted data is illegal or would raise suspicions but needs to send a secure message.

The steganography carrier

The carrier is the medium carrying the hidden message. It could be a phone call, an e-mail, or a file transmission. We commonly hide messages in audio, video, or images. The carrier is modified to carry the hidden message. It is important that
- The carrier is much larger than the message. If the message increases the size of the carrier substantially, then eavesdroppers will notice that the carrier has been modified.
- The carrier must be unique. If the carrier is publicly available, then the modified carrier can be compared against the original.
Data can be split among multiple carriers
Carrier engine. A carrier engine uses an algorithm to inject hidden data into legitimate data. The types of algorithms include adaptive substitution, frequency space manipulation, injection, generation, and metadata substitution.

Encryption converts plaintext data so that it can appear in the open without being read.

Homomorphic Encryption

Homomorphic Encryption is an idea where we can perform calculations on the encrypted data without decrypting it first. When we decrypt the data, the result of the computation is the same as if we had performed the calculation on the unencrypted data.

Homomorphic Encryption allows us to protect user privacy. If the data at rest is encrypted, then it reduces computing power because we don’t need to decrypt the data first. It also means that an untrusted party can perform computations on our data without seeing the contents.

It has been applied to multiple encryption algorithms. There are two types of homomorphic encryption

Common Use Cases

Common use cases of encryption

Low Power Devices
- Phones and other mobile devices have limited computing power. Computing power consumes battery life. By creating more efficient encryption algorithms, these devices can be secured without consuming substantial system resources or slowing them down.
Low Latency
- A low latency application is one where the data must be transmitted and received in almost real time. An example is a live two-way video stream. If the encryption and decryption processes take a substantial amount of time, the video stream will experience latency and the user experience will suffer.
- A stream cipher can encrypt and decrypt a low latency application.
High Resiliency
- A high resiliency system continues to function after experiencing a disruption
- Algorithms must be capable of operating even when parts of the ciphertext are lost
Supporting Confidentiality
- Confidentiality is where the data that is sent is only seen by the sender and the recipient
- Encryption algorithms keep the data confidential
Supporting Integrity
- Integrity ensures that the data is not modified while in transit
- A hash function protects data integrity
Supporting Obfuscation
- An obfuscated object is one that is hidden so that it is not readily visible
- Obfuscation is not a secure method of protecting data but may be used when other options are limited
Supporting Authentication
- Authentication verifies that the person who claimed to send the message is the person who sent the message; that is, that the identity of the sender can be verified
Supporting Non-Repudiation
- Non-repudiation ensures that the sender cannot later deny sending the message
- If a message is digitally signed, it is assumed that it was sent by the signer
- A user should not rely on a message that does not support non-repudiation

Limitations to Encryption

When we choose an algorithm, we must take the following into consideration

Speed. Can the application perform adequately, or will the encryption algorithm slow it down? The speed depends on the algorithm, the key length, the number of iterations, and the processor power.
Size. What is the size of the block that is being encrypted?
Weak Keys. We should choose a key that is long enough so that it cannot be easily broken.
Time. How long will it take to encrypt or decrypt a file?
Longevity. How long can the algorithm be used for? How long until computing power that can decrypt the ciphertext is developed?
Predictability. Is the algorithm random, or predictable? If a hacker gathers enough encrypted data, can be perform a frequency analysis to determine the result?
Reuse. Can the keys be reused, or should we select a new session key each time?
Entropy. Entropy is a randomness that the system collects for use in the crypto function. Entropy is usually collected from natural processes such as the weather. If the entropy isn’t random enough, then it can be predicted, and it makes the algorithm weak.
Computational Overheads. How much processing power does the algorithm consume?
Weak/Deprecated Algorithms. A Weak Algorithm should never be used. A Weak Algorithm is one that can easily be broken. A software developer should take careful to encode algorithms in a way that allows for their quick replacement when security vulnerabilities are discovered. In fact, an algorithm should be changed before they are deprecated (once new, stronger, proven algorithms are available).

A deprecated algorithm is one that has been officially replaced with a newer one.

Examples of weak algorithms

MD5
- SHA-1
- 1024-bit RSA
- 160-bit elliptic curves
- DES
- Various homemade encryption algorithms. Do not make your own algorithm.
Resource vs Security Constraints. The organization must weigh the resource requirements of each available algorithm against its benefits

In general, the more secure the algorithm, the longer it will take to encrypt and decrypt the data

We want to encrypt data in a way that protects it for a long time, but we also don’t want to impact the user experience, and we do not want to spend money on expensive encryption hardware. We must find a balance.