Posted in Science & Nature

Cryptography: Book Cipher

So far, the three ciphers introduced could all easily be cracked using frequency analysis and the Kasiski examination. Is there a cipher that is easy to implement yet difficult for a beginner cryptanalyst to break? An extremely popular and surprisingly powerful cipher is the book cipher. Essentially, the book cipher replaces a keyword with an entire book. Instead of replacing a letter with another letter or symbol in a systematic, mathematical way (such as a set shift number or a tabula recta), the book cipher replaces letters with numbers that refer to a certain position within a book. As the message is very hard to decode without knowing which book was used, it is a secure way of enciphering a message, provided that both parties have an identical copy of the book and the choice of book stays secret.

There are many variations of the book cipher. The most popular type is giving a page number, with the first letter of the page being the plaintext letter. A variant of this is giving a set of three numbers for every letter: the page number, the line number and the word number, then taking the first letter of that word (or just two numbers: page and line, then taking the first letter of the line). Ironically, this may be less secure at times, as the groups of numbers may reveal that it is a book cipher. Moreover, doing this for each letter makes the enciphering and deciphering process incredibly long and arduous.
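As a rough illustration of the three-number variant, here is a minimal Python sketch. The tiny `BOOK`, and the function names `build_index`, `encipher` and `decipher`, are made up for this example; a real key text would be an actual book that both parties own.

```python
import random

# Hypothetical key text: a list of pages, each page a list of lines.
BOOK = [
    ["the quick brown fox", "jumps over a lazy dog"],   # page 1
    ["every good boy does fine", "send help now"],      # page 2
]

def build_index(book):
    """Map each letter to every (page, line, word) whose word starts with it."""
    index = {}
    for p, page in enumerate(book, start=1):
        for l, line in enumerate(page, start=1):
            for w, word in enumerate(line.split(), start=1):
                index.setdefault(word[0].lower(), []).append((p, l, w))
    return index

def encipher(plaintext, book, rng=random):
    """Replace each letter with a randomly chosen matching coordinate.

    Raises KeyError if no word in the book starts with a needed letter.
    """
    index = build_index(book)
    return [rng.choice(index[ch]) for ch in plaintext.lower() if ch.isalpha()]

def decipher(coords, book):
    """Look up each (page, line, word) and take the first letter of the word."""
    return "".join(book[p - 1][l - 1].split()[w - 1][0].lower()
                   for p, l, w in coords)

secret = encipher("bed", BOOK)
print(secret)
print(decipher(secret, BOOK))
```

Picking a random coordinate among all the words that start with the right letter means the same plaintext letter is not always written with the same numbers, which is part of what makes frequency analysis harder here.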

A shortcut method is to refer to a whole word within a page (using the three-number coordinates described above), so that one set of numbers stands for an entire word and the ciphertext becomes much shorter. Although this method is much easier in practice, it poses the challenge of finding a book that includes all the words in the plaintext, which may be difficult if the code is for military or espionage purposes.
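The word-level shortcut, and its main drawback, can be sketched in Python as follows. Again, `BOOK` and the function names are hypothetical and only serve to illustrate the idea.

```python
# Hypothetical key text: a list of pages, each page a list of lines.
BOOK = [
    ["meet me at the old bridge", "send help now"],   # page 1
    ["the agent will arrive at dawn"],                # page 2
]

def build_word_index(book):
    """Map each word to the first (page, line, word) position where it occurs."""
    index = {}
    for p, page in enumerate(book, start=1):
        for l, line in enumerate(page, start=1):
            for w, word in enumerate(line.split(), start=1):
                index.setdefault(word.lower(), (p, l, w))
    return index

def encipher_words(message, book):
    """Replace each word of the message with its coordinates in the book."""
    index = build_word_index(book)
    words = message.lower().split()
    missing = [word for word in words if word not in index]
    if missing:
        # The weakness described above: the book must contain every word.
        raise ValueError(f"key text lacks these words: {missing}")
    return [index[word] for word in words]

def decipher_words(coords, book):
    """Look up each (page, line, word) coordinate and join the words."""
    return " ".join(book[p - 1][l - 1].split()[w - 1] for p, l, w in coords)

secret = encipher_words("meet the agent at dawn", BOOK)
print(secret)
print(decipher_words(secret, BOOK))
```

A message containing a word such as "submarine" would raise an error with this small key text, which is exactly why a book with a very large vocabulary is needed.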

Because of this, and the fact that both parties (or everyone in the ring) need identical editions of the book without standing out too much, the most commonly used books are the dictionary (typically a famous version such as the Oxford Dictionary) or the Bible (again, a standard version is used). These books are not only good because they contain a massive vocabulary, but they are also inconspicuous when carried around in enemy territory.

The book cipher is a very difficult code to crack for most people without advanced cryptanalysis training. Thus, the easiest way to crack it is to deduce which book is the key text. There are numerous ways to do this, but one way would be to cross-match the books owned by two known spies until common books are found. In the setting of spies in a foreign country, a book such as a traveller’s guide or a phrasebook can be considered a likely candidate, as it can be carried around easily while containing many words. Ergo, the secret behind cracking the book cipher is less about cryptography and more about the science of deduction.

Posted in Science & Nature

Cryptography: Frequency Analysis

A cipher is a method of encoding a message using a certain key. The most common and basic type of cipher uses letter substitution, where each letter stands for a different letter. For example, the message may be encoded so that each letter is replaced by the letter three places after it in the alphabet (e.g. if a=0, b=1… then “a” becomes “d”, “b” becomes “e” etc., wrapping around at the end so that “x” becomes “a”). This creates a jumble of letters that appears to be indecipherable.
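The shift-by-three example can be written as a short Python sketch. The function names `shift_encrypt` and `shift_decrypt` are chosen for this illustration, and the code assumes that only the 26 lowercase English letters are shifted while everything else is left untouched.

```python
import string

ALPHABET = string.ascii_lowercase  # a=0, b=1, ..., z=25

def shift_encrypt(plaintext, shift=3):
    """Replace each letter with the one `shift` places later, wrapping around."""
    out = []
    for ch in plaintext.lower():
        if ch in ALPHABET:
            out.append(ALPHABET[(ALPHABET.index(ch) + shift) % 26])
        else:
            out.append(ch)  # keep spaces and punctuation as they are
    return "".join(out)

def shift_decrypt(ciphertext, shift=3):
    """Undo the shift by moving each letter back the same number of places."""
    return shift_encrypt(ciphertext, -shift)

print(shift_encrypt("attack at dawn"))   # dwwdfn dw gdzq
print(shift_decrypt("dwwdfn dw gdzq"))   # attack at dawn
```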

However, the characteristics of substitution ciphers make them the easiest type of encryption to break. As each letter always stands for the same letter, once the key is cracked (i.e. which letter stands for which), the message and any future messages using the same key can be read. The most important tools in breaking substitution ciphers are pattern recognition and frequency analysis.

Frequency analysis relies on the fact that every language has certain letters that are used more often than others. In the English language, the most used letters, in order, are: E, T, A, O, I, N, S, H, R, D, L, U (realistically, only E, T, A, O stand out clearly, and the rest are much less reliable in frequency analysis, especially on short messages).

For example, if Eve intercepts a long, encrypted message that she suspects to be a simple substitution cipher, she will first analyse the text for the most common letter, bigram (two-letter sequence) and trigram (three-letter sequence). If she finds that I is the most common single letter, XL the most common bigram and XLI the most common trigram, she can infer with reasonable confidence that I=e, X=t and L=h (“th” and “the” are the most common bigram and trigram respectively). Once she substitutes these letters into the ciphertext, she will soon discover that certain patterns arise. Eve may notice words such as “thCt” and deduce that C=a, or find familiar words and fill in the blanks in the key. The discovery of each letter leads to more patterns, and this snowball effect quickly breaks the code.
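Eve's first step, counting letters, bigrams and trigrams, can be sketched in Python. The helper `ngram_counts` and the short ciphertext are invented for this illustration; a real attack would need a much longer message for the counts to be meaningful.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count n-letter sequences inside each word, ignoring case and non-letters."""
    counts = Counter()
    for word in text.upper().split():
        letters = "".join(ch for ch in word if ch.isalpha())
        for i in range(len(letters) - n + 1):
            counts[letters[i:i + n]] += 1
    return counts

# Hypothetical intercepted message (made up for this example).
ciphertext = "XLI QIR QIX XLI SXLIV QIR XLIVI"

for n, label in [(1, "letters"), (2, "bigrams"), (3, "trigrams")]:
    print(label, ngram_counts(ciphertext, n).most_common(3))
```

On this tiny sample, I is the most frequent letter and XLI the most frequent trigram, which fits the guesses I=e, X=t and L=h.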

Frequency analysis is extremely useful as it can be used to attack any simple substitution cipher, even one that does not use letters. For example, in Sir Arthur Conan Doyle’s Sherlock Holmes tale The Adventure of the Dancing Men, Sherlock Holmes uses frequency analysis to interpret a cryptogram showing a string of hieroglyphs depicting dancing men.

To address this weakness in substitution ciphers, many cryptographers have devised better encryption methods such as polyalphabetic substitution, where several cipher alphabets are used in turn (e.g. by using a grid of shifted alphabets, also called a tabula recta).