Cryptanalysis 101

Cryptanalysis is the science and art of attacking cryptosystems. In most cases the term specifically refers to finding flaws in encryption algorithms. The proper design of ciphers cannot be accomplished without having knowledge of the attacks against them. I don't think any other security field requires that the defender be this good at attacking. New cipher designs must be subjected to cryptanalysis by both the author and other researchers before being considered for use. This of course is not always true and many product employ what is called homegrown crypto (attacks that non-cryptographers naively create as an afterthought). Anyway, the ultimate goal of an attack is to recover the key being used to encrypt the target information.

Brute Force vs. "Breaks"

There is one method of key recovery which is always available: brute force. This means trying every possible key and using it to decrypt some collected ciphertext. Eventually you will hit the correct key and you win. Now what does "eventually" mean? Well, the worst-case scenario is that the correct key will be the last one you test. On average, the key will be found at the half-way mark. The number of possible keys is called the keyspace and the amount of information (bits) that is takes to represent a key inside this keyspace is called the key length. Cryptanalysts try to find breaks in the system/cipher. Although this term is a general one (similar to "hack" but specific to crypto), it pretty much means a trick to discover the key with greater probability and/or in less computational work than brute force. This includes identifying key candidates that are more likely than others and also eliminating key candidates that are less likely than others. The less work that needs to be done to recover the key using this trick is one aspect of what makes some breaks more impressive than others.

Classifying Attacks

Now that we have defined "breaking" a cipher in pretty wide terms (i.e: find a problem), it's time to broadly classify some cryptanalytic attacks. One method of categorizing them is by asking "What power/information is needed to perform the attack?" In all of these situations, the same key is expected to be used throughout and the goal to to recover it. Here are a few of the more common types of attacks:

Frequency Analysis

Cryptogram puzzles are solved for enjoyment and the method used against them is usually some form of frequency analysis. This is the act of using known statistical information and patterns about the plaintext to determine it. In cryptograms, each letter of the alphabet is encrypted to another letter. This table of letter-letter translations is what makes up the key. Because the letters are simply converted and nothing is scrambled, the cipher is left open to this sort of analysis; all we need is that ciphertext. If the attacker knows that the language used is English, for example, there are a great many patterns that can be searched for. Classic frequency analysis involves tallying up each letter in the collected ciphertext and comparing the percentages against the English language averages. If the letter "M" is most common then it is resonable to guess that "E"-->"M" in the cipher because E is the most common letter in the English language. These sorts of clues can be bounced off each other to derive the key and the original plaintext. The more collected ciphertext the attacker has, the better this will work. As the amount of information increases, its statistical profile will draw closer and closer to that of English (for example). This sort of thing can also be applied to groups of characters ("TH" is a very common combination in English for example). The example frequency analysis image above was performed on the first three sentences of this paragraph turned into a cryptogram. As you can see, the English language is very predictable with regard to letter frequency and this can exploited in some situations to break ciphers.

Toy Ciphers and Practicing

As we learn more about cryptanalysis and read papers about attacks, we'll need a way to actually learn what is read. I like to create toy ciphers to try out attacks or develop new ones. These are encryption algorithms that are invented for the purpose of being broken. You can just string some cipher components together in a way that makes sense for the attack you'd like to try. By starting simple and not just copying whatever cipher is broken in the paper you're reading, you can learn a lot more. Also, more practice can be had by applying techniques against weakened variants of real-world ciphers. I also find it helpful to shrink the block and key sizes down a lot when first trying something new. This allows some results to be worked out by hand with pen/paper and can aid in troubleshooting code. The paper "Self-Study Course in Block Cipher Cryptanalysis" by Bruce Scheneir describes some weakened cipher variants for beginners and makes a good starting point.

Linear and Differential Cryptanalysis

Here I will briefly describe (only an overview) two of the most revolutionary analysis methods for cryptography in recent years. A great deal of new research stems from them and understanding the methods is important. I urge the reader to take their time as they read more into these subjects. Ask yourself questions "why is that?", "what does that mean?", "how can i test that out?". I'm a firm believer that passively reading something is useless compared to actively reading it. Differential cryptanalysis is an attack published by Eli Biham and Adi Shamir in 1990. It was discovered earlier by both IBM (1974) and the NSA (who knows?) but kept secret. It is a chosen-plaintext attack that involves choosing plaintexts in pairs with a particular XOR difference and looking for a corresponding XOR difference in the pairs of ciphertext produced. Linear cryptanalysis was discovered and published by Mitsuru Matsui in 1992 as an attack on FEAL. It is a known-plaintext attack that builds a linear approximation of the cipher (using XOR operations on various bits) and then compares the expression to the collected plaintext to estimate the likely keys.