3.6 Select and determine cryptographic solutions

  • Cryptographic life cycle (e.g., key management, algorithm selection)
  • Cryptographic methods (e.g., symmetric, asymmetric, elliptic curves)
  • Public Key Infrastructure (PKI)
  • Key management practices
  • Digital signatures
  • Non-repudiation
  • Integrity (e.g., hashing)

When we implement encryption, the type of algorithm selected must be based on

  • The types of data being encrypted

  • The crypto service provider

    • The crypto service provider is a software library that provides cryptographic functions

    • Instead of manually writing code to implement a cryptographic function in an application (which may introduce security vulnerabilities and take forever), a developer can incorporate a pre-existing library.  The library is the engine that performs the encryption and decryption; we just plug it in.

    • The library takes a plaintext input and outputs a ciphertext output or vice versa.  The developer does not need to worry about the internal function of the library.

    • The library can be replaced when newer or more secure libraries are made available, without having to modify the rest of the application.

  • Crypto module

    • A crypto module is a physical device that performs encryption and decryption

    • The module may have specially designed hardware that is specifically suited to perform the encryption and decryption functions

    • An example is an AES engine in the Apple iPhone.  This engine is a physical processor that performs encryption and decryption, and no other function.  It is specially designed to remove encryption and decryption workloads from the iPhone’s main processor.

An encryption algorithm is like a lock on your door.  There are many types of locks.  Some are good, and some are bad.  Even if you and your neighbor have the same brand of lock, you won’t have the same keys.  An algorithm is the same thing. 

You must choose a key to make the algorithm work.  Each user who implements the algorithm hopefully chooses a different key.  We should be careful to manage keys securely.  Keys must be generated randomly.  A key that is not generated by a truly random process can be guessed.

A symmetric algorithm is one that uses the same cryptographic key to encrypt and decrypt text.  The keys may be identical or easily transformed from an encryption key to a decryption key and represent a shared secret between the two parties attempting to communicate.

A symmetric algorithm is less convenient than an asymmetric algorithm (public key cryptography) because both parties must have access to the same key.

Alice and Bob need to communicate.  Alice and Bob agree on a shared secret (the key), but how can Alice share the key with Bob without anybody seeing it?  She must put it on a USB drive and physically give it to Bob, or she must send it over the internet, or she must write it down on a piece of paper and give it to Bob.  If Alice encrypts the key with another key, then she would have to share that key as well.

If Carol needs to communicate with Bob, but Carol is thousands of miles away, and has never met Bob, then both Carol and Bob must exchange their shared secret over the internet.

Bob must use a different key with Carol than he did with Alice.  If not, then Alice could decrypt a message between Carol and Bob.  If Bob needs to communicate with many people, then Bob needs many keys.  Let’s say that Bob is a bank with hundreds of thousands of customers.  Bob needs a separate key for every customer.

Thus, the symmetric algorithm can be compromised regardless of the length of the key or the strength of the algorithm.  On the internet, it is difficult to distribute a symmetric key with another party.  What is to stop people from seeing Bob’s key the first time that he shares it?

An asymmetric algorithm is also known as public-key cryptography.  It involves two keys: a public key and a private key.  The public key is used to encrypt data and the private key is used to decrypt data.  Only the private key must be kept private.

With an asymmetric algorithm, Bob generates a private key.  He uses the private key to generate a public key.  He publishes the public key on the internet.  Alice needs to send Bob a message.  She finds Bob’s public key.  She uses Bob’s public key to encrypt a message.  She sends the message to Bob and Bob uses his private key to decrypt the message.

Alice does the same thing so that she can receive encrypted messages from Bob.  Alice generates a private key.  She uses the private key to generate a public key.  She publishes the public key on the internet.  Bob finds Alice’s public key.  He uses Alice’s public key to encrypt a message.  He sends the message to Alice and Alice uses her private key to decrypt the message.

There are many asymmetric algorithms, including RSA, Diffie-Hellman, and DSA.

The algorithm can be compromised by a man-in-the-middle attack (as discussed earlier).

  • If the communication is encrypted and uses public key cryptography (such as Apple iMessage), a man-in-the-middle attack is more difficult.  Users encrypt messages with public keys (which they obtain from a central directory).  If Alice wants to send a message to Bob, she obtains Bob’s public key from Apple, encrypts the message, and sends it to Bob.  Bob uses his private key (which only he knows) to decrypt the message.  Eve can intercept the message

    • She generates her own public and private keys

    • She hacks into the central directory and changes Bob’s public key to her own.  She copies Bob’s public key

    • When Alice queries the directory, she receives what she thinks is Bob’s public key (but is in fact Eve’s public key)

    • Alice encrypts the message with Eve’s public key and sends the message to Eve (thinking that she is sending it to Bob)

    • Eve decrypts the message, reads it, and then encrypts it with Bob’s public key

    • Eve sends the message to Bob

    • Bob receives the message, thinking it came from Alice and decrypts it with his own private key

    • Eve does the same thing with Alice’s public key so that she can intercept messages that Bob is sending to Alice

A Key Exchange is a way for two people to exchange keys.  Alice and Bob want to talk securely.  Alice and Bob need to exchange encryption keys over the internet, but the internet is not secure.  How can Alice and Bob securely exchange their keys, so that the further communication between them is encrypted?  How can they be sure that the key is not stolen or intercepted?

There are several algorithms

  • Diffie-Helman Key Exchange.  This allows two people to exchange keys even when there is an eavesdropper who sees all their communications.

  • Public Key Infrastructure.  A certification authority (CA) holds the public keys for all users.  A limitation of PKI is that there is no way to verify that the central authority has provided the key corresponding to the user with whom you want to speak to (a malicious authority could substitute their own key so that they can intercept your communications).

  • Web of Trust or PGP.  PGP allows two users to directly exchange keys without the use of a CA.

  • Quantum Key Exchange.  Quantum Key Exchange uses the quantum theory that an observation/measurement of an object in a quantum state modifies the object.  If an eavesdropper observes the key exchange, the key will be modified, and the parties exchanging the key will be notified.

A digital signature proves that a specific person signed a specific message.  A valid signature proves that the message was created by the signer and that it was not modified in transit.

An electronic signature is not legally accepted in all countries.  Some documents or transactions may require a “wet signature” in ink.

A digital signature has the following

  • A key generation algorithm

  • A signing algorithm

  • A verification algorithm

As before, Alice would like to provide Bob with a signed document. 

  • Alice selects a private key

  • Alice uses the private key to create a public key, which is published and available

  • Alice uses the private key to sign the document

  • For the key to be valid, it must be impossible to generate a signature without knowing the private key

  • Alice sends the signed document to Bob

  • Bob retrieves Alice’s public key and uses it to verify her signature.  Bob uses the verification algorithm.

Typically, Alice will not sign the entire document.  She will generate a hash (a mathematical representation) of the document and then sign the hash.  Mathematically speaking, there are a finite number of hashes, but an infinite number of potential documents (messages).

  • It is more efficient to calculate a signature on a hash than on the entire document

  • The document may be too large to sign, in which case it must be broken into separate portions, each of which must be signed separately

  • The signature algorithm may not be able to sign/perform calculations on the entire content of the document, especially if it contains attachments, photographs, or other types of data.  Some signature algorithms only work on text.

How to forge a digital signature

  • Choose a random signature

  • Use the verification algorithm to generate a different message that could correspond to the signature (or the hash that could correspond to the signature) – this is called a collision.  This is practically impossible when the users are using a strong hash.

  • There are three types of forgery

    • Existential forgery

      • Create a message and signature pair that was not created by the legitimate signer.  Since there is a finite number of hashes but an infinite number of documents, we might be able to randomly generate a document that matches a hash.

      • This is considered the weakest hacking method.  An algorithm that prevents existential forgery is practically unforgeable.  The message we find might not make any sense or might not be useful.

    • Selective forgery

      • Create a message and signature pair for a pre-selected message

    • Universal forgery

      • Create a valid signature for any message

      • This is the strongest hacking method

Other ways to forge a digital signature – What You See is What You Sign

  • The security of the computing platform is more important than the security of the key

  • A user is signing what he sees on the screen, but an untrusted application can display a different document than what is saved

  • Therefore, the user’s signature can be applied to a fake document if a malicious person takes control of his computer

  • The user is not actually inspecting the bits that are signed (or may not inspect the document after it has been saved)

Benefits of a digital signature

  • A user can prove that the document was signed by the person who claimed to sign it

  • A user can prove that the document was not changed during transmission

  • The signer cannot later deny that they signed the document

The digital signature algorithm

  • The algorithm must be secure

  • The users must properly use their signatures

  • The private key must be kept private

    • Place the key on a smart card

    • Only use your digital signature with trusted applications

  • The owner of the public key must be verifiable

There are many laws related to signatures.  Originally, a facsimile of a signed document was accepted like the original, but an e-mailed/scanned document was not.  Governments have passed laws requiring the acceptance of digital/electronic signature.

For example, Title 21 CFR Part 11 requires the US Federal Government to accept electronic signatures. 

An electronic signature could be a printed name or an image of a signature.  It is not necessarily a digital signature, nor is it necessarily cryptographically secure.

The Electronic Signatures in Global and National Commerce Act or ESIGN Act allows electronic signatures to be used for transactions that affect interstate or foreign commerce.  Consumers have the right to opt out of an electronic signature requirement.

Non-repudiation ensures that the sender cannot later deny sending the message.  If a message is digitally signed, it is assumed that it was sent by the signer. A user should not rely on a message that does not support non-repudiation.

A Hash function converts a piece of data from one format into another.  A Hash function is typically one-way.  The goal of the hash function

  • Convert a piece of data from a variable length to a fixed length.  An encryption algorithm may require an input of a fixed length.  For example, an encryption algorithm may require a key that is 32-bits long, but users can select passwords of different lengths.  The user’s password can be converted into a hash that fits into the algorithm.

  • Hide the original data.  Original, plaintext passwords should not be stored.  A user password can be converted into a hash that can be stored.  If the hash is compromised, then the password will not be compromised.  Many website databases are compromised, but if the user passwords are hashed, then the original passwords are not determined.  This only applies to a non-reversible hash.

    Since the hash is not reversible, then multiple inputs may result in the same hash.

There are many hash functions.  Attributes of a good hash function

  • Consider the mathematical space of the hash output.  A 32-bit hash for example is 32 characters long.  If the hash is only letters and numbers, then the hash would have 3266 possible values. 

    The number of possible password input combinations is always infinite.

    A hash should be continuous along the entire space, and there should be an equal probability of generating each output value.

  • Be impractical to guess two inputs that produce the same output.  Given an input m1, it should be difficult to find a different input m2 such that hash(m1) = hash(m2).  This is known as weak collision resistance.

    It should be difficult to find two different messages m1 and m2 such that hash(m1) = hash(m2).  This is known as strong collision resistance. 

  • A hash should be efficient.  There is a trade-off between how fast a hash can be searched and how much disk storage space it occupies.  A large hash index can be searched quickly but occupies more space on the disk. 

    It is better to have a faster hash function over a slow one.

  • Deterministic.  The hash must always return the same output for a given input.

  • Not practical to be able to predict the input that created a given output.  Given a hash value h it should be difficult to find any message m such that h = hash(m).  This is known as pre-image resistance.

  • A small change to the input should result in a large change to the output.  It can be assumed if two strings appear similar and have the same hash output, then they are identical.

  • A hash should be difficult

    • Mathematically speaking, difficult means that it is not solvable in asymptotic polynomial time

    • Cryptographically speaking, difficult means that it is secure against any person for as long as the system must be operational

The use of the hash

  • Message integrity.  When a message is first sent, a hash of the message is computed by the sender.  The hash is shared with the recipient.  The recipient can calculate the hash on the message received.  If the calculated hash matches the original hash, the recipient knows that the message did not changed during transmission.

  • Digital Signature.  A hash is calculated over the entire message that needs to be signed.  The hash is then signed.  A calculation to produce a digital signature is complicated, and it would save computing power to calculate the signature on the hash instead of on the entire message.

  • Password.  Passwords are stored as hashes instead of plain text.  As discussed earlier, when a user account is created, the password is hashed, and the hash is stored.  During a log in attempt, the hash of the entered password is calculated and compared to the original hash.  If they are the same, then the user is authenticated.  A hash allows a service provider to avoid storing the password in plaintext.

  • Proof-of-work.  Proof-of-work is a system that is used to prevent denial of service attacks and to prevent automated systems from accessing websites.  The system that is attempting to access a resource is presented with a difficult mathematical equation that involves calculating a hash.  If the hash is correct, then the system is provided access.  The equation must be difficult to calculate but easy to verify.

    The machine requiring access is forced to use some of its computing resources before gaining access.  If the machine is operated by a legitimate human user, then the calculation represents a minor inconvenience of a few seconds.  If the machine is a bot making thousands of requests, the calculation would consume its resources.

    Proof-of-work is also used by Bitcoin mining.

  • File identifier.  A hash of a file can be calculated to uniquely identify the file.  In a file system with millions of files, each file can be retrieved by providing its hash.  File hashing is used by online file sharing systems.

The hash functions that are commonly used

  • MD5

    • Produces a 128-bit hash

    • MD5 was used to calculate checksums for files that have been transferred over the internet.  It would verify that the file that was sent was also the one that was received.  MD5 was also used to store passwords.

    • MD5 is not secure because it is easy to find a collision and should not be used.

  • SHA-1

    • Produces a 160-bit hash

    • Used by the government to protect sensitive, unclassified material

    • Computing power to find a collision can be obtained for less than $100,000

    • In the process of being replaced by SHA-2

  • SHA-2

    • Produces a 256-bit or 512-bit hash

    • Used by Bitcoin

    • SHA-2 is still in use and considered secure

  • SHA-3

    • Produces a 256-bit or 512-bit hash

    • SHA-3 is not designed to replace SHA-2 as SHA-2 is still considered secure

    • Instead, SHA-3 is designed as an alternative that is available should SHA-2 be compromised in the future (and it will be!)

    • SHA-3 uses a sponge construction.  Data is absorbed, transformed, and then pushed out.

  • Blake, Blake2, and Blake3

    • Blake functions were developed in various cryptographic competitions

    • Blake2 is faster and more secure than SHA-2.  It is faster and similarly secure to SHA-3.

An elliptic curve is a curve on a plane (2-dimensional) over a specific field, with a single point at infinity.  I would try to draw it, but it is infinite, so we would run out of paper. 

In summary, if you pick two points on an elliptic curve and add them together, the result will be a third point that is on the curve.  Elliptic curve cryptography does not consume much computing power; therefore, it is well suited for mobile devices.

The curve is defined by the equation

We can use the mathematical properties of this curve to encrypt our data.  There are many algorithms that use elliptic curves

  • The Elliptic Curve Diffie–Hellman (ECDH) key agreement scheme

  • The Elliptic Curve Integrated Encryption Scheme (ECIES), also known as Elliptic Curve Augmented Encryption Scheme or simply the Elliptic Curve Encryption Scheme

  • The Elliptic Curve Digital Signature Algorithm (ECDSA)

  • The Edwards-curve Digital Signature Algorithm (EdDSA)

To implement the curve

  • The users must agree on the domain (the shape of the curve).  In general, the requirement is to select several points on the curve

    • P, the field size

    • a and b, which define the equation

    • G, the base of the curve

    • n, such that nG = O, where n is the O is the point at infinity of the curve

    • h where , and where h is an integer between 1 and 4

  • The users can choose a random elliptic curve and calculate the above points, choose the points randomly and then generate a curve, or use a standard curve.  It is faster to use a standard curve because calculations are complicated and time-consuming

    • FIPS PUB 186-4 lists several standard curves for use by government encryption methods

  • Key size

    • The field should be twice the size of the encryption key size.  A 256-bit key requires a curve that is 2512

  • Some curves can be calculated faster than others

    • Projective coordinates can be used to add different points on a curve without using inversion

    • Fast reduction can be accomplished if p is a pseudo-Mersenne prime, where p is approximately 2d, or p = d2 -1

Security risks

  • Side channel attack

    • In Elliptic Curve Cryptography, a different operation is used to calculate addition and multiplication, each of which uses different amounts of power.  A hacker could monitor the processor to detect the power consumption and timing to determine the curve parameters.

    • A side channel attack can be prevented by using an Edwards curve.  The Edwards curve allows addition and multiplication in the same operation.

  • Backdoor

    • Some have alleged backdoors are present in some of the number generators used to create elliptic curves.  If you are lazy and choose a curve from a preselected list, then you may be subject to a backdoor.

      It may be better to generate your own curve, if you can verify that it is secure

In perfect forward secrecy, a session encryption key will not be compromised, even if the server private key is compromised.

Recall that Alice and Bob are still talking.  Alice has a public key and a private key.  Each time that Alice and Bob talk, they each use their private key to generate a session key.  The session key is used only for that conversation, and then discarded.  If a session key is compromised, only the data secured by that session key is compromised.

With forward secrecy, if a future key is compromised, then data secured in previous sessions is not compromised.

Consider Alice and Bob

  • Alice and Bob generate an asymmetric public and private key pair

  • They exchange the keys

  • Alice and Bob generate a session key from the asymmetric key

  • Alice symmetrically encrypts a message using the session key

  • Bob decrypts Alice’s message with the session key

  • Each time Alice and Bob need to send a new message, they create a new session key

  • If a session key is compromised, only the message encrypted by that message is compromised.  Previous messages and future messages cannot be decrypted with the compromised session key.

An ephemeral key is one that is used for a short period of time.  The key may be used only once or multiple times within a session.  When used only once, an ephemeral key provides Perfect Forward Secrecy.

Random/Pseudo-Random Number Generation

A true Random Number Generator does not exist, because computers are logical devices.  However, random number generation is important for encryption algorithms.  Many algorithms rely on the use of random numbers.  If a hacker were able to predict the numbers that are generated, he could compromise the algorithm.

A true Random Number Generator can be created by measuring physical/natural processes, such as thermal noise, radio noise, or radioactivity.  The observation of quantum data is considered the best way to collect random data.  A tool that does this may be known as a Hardware Random Number Generator, but the hardware device that observes the random data may introduce bias into the collection method.

A Pseudorandom number generator (PRNG) is a computing algorithm that generates random numbers.  The numbers that a PRNG produces are not truly random because they are generated from a seed value.  Many PRNGs have failed.

A cryptographically secure pseudorandom number generator (CSPRNG) or cryptographic pseudorandom number generator (CPRNG) is one that can be used in cryptography.  To be a valid CSPRNG

  • The CSPRNG should pass the “next-bit test”.  Given any number of bits of a randomly generated sequence, there should not be an algorithm that can predict the next bit (or any other future bit) with a success rate of at least 50%

  • If the CSPRNG’s current state is compromised, it should be impossible to predict any future or previous states.

Most PRNGs will fail the above tests.  Even though a PRNG appears to be random, reverse-engineering and statistical analysis will be able to crack it.

Some papers reported that the NSA placed a kleptographic backdoor in the Dual_EC_DRBG PRNG, which was published as NIST SP 800-90A.  RSA received a $10 million payment from the NSA to continue using the Dual_EC_DRBG PRNG.

There are several good CPRNGs (they were considered secure at the time of writing, but nobody knows if they will stay secure forever)

  • Yarrow algorithm

  • ChaCha20 algorithm

  • Fortuna algorithm

  • Microsoft’s CrytpGenRandom function

The CPRNGs that have been standardized are

  • NIST SP 800-90A Rev.1 (Dual_EC_DRBG PRNG)

  • ANSI X9.17-1985 Appendix C

  • ANSI X9.62-2005, Annex D

It is likely that backdoors have been introduced into many other PRNGs and CSPRNG.  Therefore, the most secure approach is to create a CSPRNG that observes a random, natural process, and try to remove biases from the measurement.

Quantum Computing

Moore’s law says that computing power doubles every two years.  When we develop algorithms, we are hoping that the data stays secure for at least 100 years.  If we consider that computing power doubles every two years, then we should think about how long it will take a computer invented 99 years from now to crack it.

Computers use 0’s and 1’s.  There are 8 bits in a byte.  That means that every bit could have one of two values.  A quantum computer uses the properties of physics to perform calculations.  In a way, it is like that each bit can be both 0 and 1 at the same time, which sounds weird, but the physics behind it is complicated.  Or a bit can be in more than one place at the same time.  Quantum computers use bits called qubits.

A quantum computer can be used to model a complicated math problem such as

  • Cracking an algorithm

  • Factoring a large number

  • Solving a logistics problem

  • Modeling the shape of a protein

  • Machine learning

  • Search problems

A quantum computer can be thousands or even millions of times faster than a normal computer when given the right problem to solve.

Right now, quantum computers are very expensive and very fragile.

Eventually, quantum computers will exist that can break any modern encryption algorithm.  But somebody will develop a quantum encryption algorithm.  Which brings me to the next point.

Under quantum theory, we cannot measure the location of an object without changing it.  If Alice and Bob want to communicate, Alice and Bob set up a quantum channel and a classical channel.  Alice sends Bob a key through the quantum channel.  If the error rate is high, it is likely that Eve eavesdropped on the qubits, and changed some of them. 

If not, then Alice and Bob can use the key to send encrypted data over the classical channel.

Post-Quantum Cryptography

We have an idea of how quantum computers work or are supposed to work.  Right now, the focus is on developing algorithms that will survive when everybody has a quantum computer.  There are none.  The approach is, can we find a math problem that a quantum computer can’t solve quickly and then model an encryption algorithm after it?

Ideas so far

  • Lattice-Based Cryptography

  • Multi Variable Cryptography – there is an idea that an equation with multiple variables can’t be solved quickly, but so far, all current equations have failed

  • Hash-Based Cryptography

  • Code-Based Cryptography

  • Supersingular Elliptic Curve Isogeny Cryptography

  • Symmetric Key Quantum Resistance – AES and SNOW 3G algorithms are resistant to quantum computers if we use a large enough key