Cryptography: Frequency Analysis

A cipher is a method of encoding a message using a certain key; the encoded result is called the ciphertext. The most common and basic ciphers use letter substitution, where each letter of the message is replaced by a different, fixed letter. For example, a message may be encoded so that each letter is replaced by the letter three places after it in the alphabet (“a” becomes “d”, “b” becomes “e” and so on, wrapping around so that “x” becomes “a”). This creates a jumble of letters that appears to be indecipherable.
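The three-place shift described above can be sketched in a few lines of Python. This is only a minimal illustration: the function names `shift_encrypt` and `shift_decrypt` are made up for this example, and the sketch assumes a 26-letter lowercase alphabet with everything else (spaces, punctuation) left untouched.

```python
import string

ALPHABET = string.ascii_lowercase  # assumed alphabet: a-z only


def shift_encrypt(plaintext: str, shift: int = 3) -> str:
    """Replace each letter with the one `shift` places later, wrapping at z."""
    result = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            result.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            result.append(ch)  # leave spaces and punctuation as they are
    return "".join(result)


def shift_decrypt(ciphertext: str, shift: int = 3) -> str:
    """Undo the shift by moving each letter the same distance backwards."""
    return shift_encrypt(ciphertext, -shift)


print(shift_encrypt("attack at dawn"))   # dwwdfn dw gdzq
print(shift_decrypt("dwwdfn dw gdzq"))   # attack at dawn
```

Here the "key" is simply the number 3: anyone who knows it can reverse the shift.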

However, the characteristics of substitution ciphers make them some of the easiest types of encryption to break. Because each ciphertext letter always stands for the same plaintext letter, once the key is cracked (i.e. which letter stands for which), the message and any future messages encrypted with the same key can be read. The most important tools in breaking substitution ciphers are pattern recognition and frequency analysis.

Frequency analysis relies on the fact that every language uses certain letters more often than others. In English, the most frequently used letters, in order, are: E, T, A, O, I, N, S, H, R, D, L, U (realistically, only E, T, A and O stand out clearly; the rest are too close in frequency to be reliable or useful on their own).
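Counting letters is the first step of any frequency analysis, and it is easy to automate. The sketch below is one possible way to do it; the function name `letter_frequencies` and the sample sentence are invented for illustration, and only the letters A to Z are counted.

```python
from collections import Counter


def letter_frequencies(text: str) -> list[tuple[str, float]]:
    """Return (letter, share of all letters) pairs, most common first."""
    letters = [ch for ch in text.upper() if "A" <= ch <= "Z"]
    counts = Counter(letters)
    total = len(letters)
    return [(letter, count / total) for letter, count in counts.most_common()]


sample = "meet me at the tree near the river"  # made-up sample text
for letter, share in letter_frequencies(sample)[:3]:
    print(f"{letter}: {share:.0%}")
```

On a short sentence like this the ranking is noisy, which is why the technique works best on long messages.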

For example, if Eve intercepts a long, encrypted message that she suspects to be a simple substitution cipher, she will first analyse the text for the most common letter, bigram (two-letter sequence) and trigram (three-letter sequence). If she finds that I is the most common single letter, XL the most common bigram and XLI the most common trigram, she can conclude with considerable confidence that I=e, X=t and L=h (“th” and “the” are the most common bigram and trigram in English respectively). Once she substitutes these letters into the ciphertext, patterns soon begin to emerge. Eve may notice words such as “thCt” and deduce that C=a, or find familiar words and fill in the blanks in the key. The discovery of each letter reveals more patterns, and this snowball effect quickly breaks the code.
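Eve's procedure above can be roughly sketched as two small helpers: one that counts bigrams or trigrams, and one that fills a partly known key into the ciphertext. The names `ngram_counts` and `apply_partial_key` and the tiny ciphertext are hypothetical, chosen only to mirror the example; known letters are shown in lowercase and unsolved ones stay in uppercase, as in “thCt”.

```python
from collections import Counter


def ngram_counts(ciphertext: str, n: int) -> Counter:
    """Count every run of n consecutive letters inside each word."""
    counts = Counter()
    for word in ciphertext.upper().split():
        letters = "".join(ch for ch in word if "A" <= ch <= "Z")
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts


def apply_partial_key(ciphertext: str, key: dict[str, str]) -> str:
    """Swap in the letters guessed so far; unknown letters stay uppercase."""
    return "".join(key.get(ch, ch) for ch in ciphertext.upper())


ciphertext = "XLI XLCX XLI"  # made-up fragment for illustration
print(ngram_counts(ciphertext, 2).most_common(1))  # [('XL', 3)]
print(ngram_counts(ciphertext, 3).most_common(1))  # [('XLI', 2)]

guessed = {"X": "t", "L": "h", "I": "e"}  # Eve's first three guesses
print(apply_partial_key(ciphertext, guessed))  # the thCt the
```

Each new guess is added to the dictionary and the text is printed again, which is exactly the fill-in-the-blanks loop described above.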

Frequency analysis is extremely useful as it can be used to attack any simple substitution cipher, even one that uses symbols instead of letters. For example, in Sir Arthur Conan Doyle’s Sherlock Holmes tale The Adventure of the Dancing Men, Sherlock Holmes uses frequency analysis to interpret a cryptogram made up of a string of hieroglyph-like figures of dancing men.

To overcome this weakness in substitution ciphers, cryptographers have devised stronger encryption methods such as polyalphabetic substitution, where several substitution alphabets are used in turn (e.g. chosen from a grid of shifted alphabets, also called a tabula recta).
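One well-known polyalphabetic scheme built on the tabula recta is the Vigenère cipher, which the text above does not name; it is used here only as an illustrative assumption. In this sketch a keyword picks which row of the grid encrypts each letter. The function name `vigenere_encrypt` is invented for the example, and the keyword is assumed to be non-empty and made of letters only.

```python
import string

ALPHABET = string.ascii_uppercase
# Tabula recta: 26 rows, each the alphabet shifted one place further left.
TABULA_RECTA = [ALPHABET[i:] + ALPHABET[:i] for i in range(26)]


def vigenere_encrypt(plaintext: str, keyword: str) -> str:
    """Encrypt letters using the grid row chosen by the current keyword letter."""
    keyword = keyword.upper()  # assumed: non-empty, letters A-Z only
    result = []
    key_pos = 0
    for ch in plaintext.upper():
        if ch in ALPHABET:
            row = ALPHABET.index(keyword[key_pos % len(keyword)])
            result.append(TABULA_RECTA[row][ALPHABET.index(ch)])
            key_pos += 1  # only advance the keyword on letters
        else:
            result.append(ch)
    return "".join(result)


print(vigenere_encrypt("ATTACKATDAWN", "LEMON"))  # LXFOPVEFRNHR
```

Note that the two Ts in “ATTACK” become X and F: the same plaintext letter no longer maps to a single ciphertext letter, so simple letter counts stop lining up with the frequencies of the language.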