Posted in History & Literature

Shibboleth

We are often corrected by others (as much as we correct others) on the proper pronunciation of words. Luckily, improper pronunciation is rarely consequential (other than sparking debates such as how to pronounce the word “gif”). However, on numerous occasions throughout history, this was not the case.

During World War II, American soldiers in the Pacific Theatre came up with a questionable way of detecting enemy soldiers pretending to be allies to sneak into bases. If a suspicious person were to approach a checkpoint claiming to be an American or Filipino soldier, the sentry would ask them to say a certain word. The word was “lollapalooza”, an American colloquialism for something that is exceptional and extraordinary. The basis for this test was that Japanese speakers tend to pronounce the English letter “l” as “r” due to the difference between the two languages. Therefore, if the person were to repeat back “rorra-” they would be immediately shot.

This seems like a highly inaccurate method. What if they were an American soldier who had a bad head cold, or a lisp? But this type of racial profiling by the way someone pronounces a certain word has been commonly used throughout history to filter out people of certain races. Lollapalooza is an example of a shibboleth – a word that can distinguish people of a certain race by their inability to properly pronounce it.

The word comes from the Biblical story of the Ephraimites. When the Gileadites were invaded by the Ephraimites, they fought back and repelled the Ephraimites, who tried to retreat by crossing the River Jordan. The Gileadites planned ahead by securing the river so that they could capture the Ephraimites. They ordered each person crossing the river to say the word “shibboleth”. Because the Ephraimites’ dialect did not include a way to pronounce the “sh” sound, they would repeat back “sibboleth” and were killed on the spot.

Unfortunately, shibboleths have typically been used to identify members of a certain race so that they could be massacred. Nowadays, shibboleths are used in a more light-hearted manner. For example, New Zealanders and Australians mock each other over how the other pronounces the words “fish and chips”. Because New Zealanders pronounce the “i” with a shorter sound, Australians tease that they say “fush and chups”. On the other hand, New Zealanders mock Australians for their long “i” sounds that make it sound as if they are saying “feesh and cheeps”.

Posted in Science & Nature

Cryptography: Book Cipher

So far, the three ciphers introduced could all easily be cracked using frequency analysis and the Kasiski examination. Is there a cipher that is easy to implement yet difficult to break for a beginner cryptanalyst? An extremely popular and surprisingly powerful cipher is the book cipher. Essentially, the book cipher replaces a keyword with an entire book. Instead of replacing a letter with another letter or symbol in a systematic and mathematical way (such as a set shift number or using a tabula recta), the book cipher replaces letters with numbers that refer to a certain position in the text of a book. As the only way to decode the message is to have the book, it is an extremely secure way of enciphering a message, provided that both parties have an identical copy of the book.

There are many variations of the book cipher. The most popular type is giving a page number, with the first letter of the page being the plaintext letter. A variant of this is giving a set of three numbers for every letter: the page number, the line number and the word number (or just two: page and line, then take the first letter). Ironically, this may be less secure at times as it may reveal that it is a book cipher. In either case, doing this for each letter makes the enciphering and deciphering process incredibly long and arduous.

A shortcut method is to refer to a word within a page (using the three-number set coordinates method described above) to shorten the ciphertext. Although this method is much easier in practice, it poses the challenge of finding a book that includes all the words in the plaintext, which may be difficult if the code is for military or espionage purposes.
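As a rough illustration, the word-based shortcut can be sketched in Python. The tiny two-page BOOK and the function names below are made up for this example; a real keytext would be an actual printed book that both parties own.

```python
# Hypothetical toy "book": a list of pages, each page a list of lines.
BOOK = [
    ["the quick brown fox", "jumps over the lazy dog"],
    ["we attack the fort", "at dawn or at dusk"],
]

def build_index(book):
    """Map each word to the first (page, line, word) place it appears, counting from 1."""
    index = {}
    for p, page in enumerate(book, start=1):
        for l, line in enumerate(page, start=1):
            for w, word in enumerate(line.split(), start=1):
                index.setdefault(word, (p, l, w))
    return index

def encode(plaintext, book):
    """Replace every word with its coordinates; raises KeyError if the book lacks a word."""
    index = build_index(book)
    return [index[word] for word in plaintext.lower().split()]

def decode(coords, book):
    """Look each (page, line, word) coordinate back up in the book."""
    return " ".join(book[p - 1][l - 1].split()[w - 1] for p, l, w in coords)

coords = encode("attack at dawn", BOOK)
print(coords)                # [(2, 1, 2), (2, 2, 1), (2, 2, 2)]
print(decode(coords, BOOK))  # attack at dawn
```

The KeyError raised for a missing word is exactly the practical problem described above: the book has to contain every word of the plaintext.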

Because of this, and the fact that both parties (or everyone in the ring) need identical versions of the book while not standing out too much, the most common books used are the dictionary (typically a famous version such as the Oxford Dictionary) or the Bible (again, a standard version is used). These books are not only good because they incorporate a massive vocabulary, but they are also inconspicuous while being carried around in enemy territory.

The book cipher is a very difficult code to crack for most people without advanced cryptanalysis training. Thus, the easiest way to crack it is to deduce which book is the keytext. There are numerous ways to do this, but one way would be to cross-match the books of two known spies until common books are found. In the setting of spies in a foreign country, a book such as a traveller’s guide or phrasebook dictionary can be considered a likely target as it can be carried around easily while containing many words. Ergo, the secret behind cracking the book cipher is less about cryptography and more about using the science of deduction.

Posted in Science & Nature

Cryptography: Kasiski Examination

The Kasiski examination can be used to attack polyalphabetic substitution ciphers such as the Vigenère cipher, revealing the keyword that was used to encrypt the message. Before this method was devised by Friedrich Kasiski in 1863, the Vigenère cipher was considered “indecipherable” as there was no simple way to figure out the encryption unless the keyword was known. But with the Kasiski examination, even the Vigenère cipher is not safe anymore.

The Kasiski examination is based on the fact that, if the keyword is n letters long, every nth letter of the message is encoded with the same shift. Simply put, if the ciphertext is written out in n columns, each column can be treated as a single monoalphabetic substitution cipher that can be broken with frequency analysis. Ergo, all the cryptanalyst needs to know to reduce the Vigenère cipher to a set of Caesar ciphers is the length of the keyword.
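To make this concrete, here is a small Python sketch of splitting a ciphertext into its n groups. The function name split_columns is just a label chosen for this example.

```python
def split_columns(ciphertext, n):
    """Group letters by position modulo n: group i holds letters i, i+n, i+2n, ..."""
    return [ciphertext[i::n] for i in range(n)]

# With an assumed keyword length of 3, every third letter shares the same shift.
print(split_columns("ABCDEFGHIJ", 3))  # ['ADGJ', 'BEH', 'CFI']
```

Each of the resulting strings was encrypted with a single keyword letter, so each one can be attacked on its own.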

To find the length of the keyword, look for a string of repeated text in the ciphertext (make sure it is at least three letters long). The distance between two occurrences of the same string is likely to be a multiple of the length of the keyword. The distance is defined as the number of characters from just after the last letter of the first occurrence up to and including the last letter of the second occurrence (e.g. “abcdefxyzxyzxyzabcdef” -> “abcdef” is repeated -> distance is “xyzxyzxyzabcdef”, which is 15 letters). The reason this works is that if there is a repeated string in the plaintext and the distance between the two occurrences is a multiple of the keyword length, the keyword letters will line up in the same way and there will be a repeated string in the ciphertext also. If the distance is not a multiple of the keyword length, the keyword letters will not line up, so the ciphertext will be different even though the plaintext repeats.
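The search for repeats is tedious by hand, so here is a minimal Python sketch of it. The function name repeat_distances and its length parameter are choices made for this example only. The distance is measured from the start of one occurrence to the start of the next, which gives the same number as the definition above.

```python
from collections import defaultdict

def repeat_distances(ciphertext, length=3):
    """Return {repeated string: [distances between consecutive occurrences]}."""
    positions = defaultdict(list)
    for i in range(len(ciphertext) - length + 1):
        positions[ciphertext[i:i + length]].append(i)
    return {
        seq: [b - a for a, b in zip(starts, starts[1:])]
        for seq, starts in positions.items()
        if len(starts) > 1
    }

found = repeat_distances("abcdefxyzxyzxyzabcdef", length=6)
print(found["abcdef"])  # [15]
```

Not every repeat is meaningful: in this toy string “xyzxyz” also repeats, purely because of how the string was built. In a real ciphertext such coincidences happen too, which is why several distances should be collected before drawing a conclusion.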

It is useful to record the distance between each set of repeated strings to find the greatest common factor. The number that is a factor of the most distances (e.g. 6 is a factor of 6, 12, 18…) is most likely the length of the keyword. Once the length of the keyword is found, then every nth letter must have been encrypted using the same letter of the keyword. Thus, by recording every nth letter in one string, you can obtain what is essentially a Caesar cipher. The Caesar cipher is then attacked using frequency analysis. Once a few of these strings (from different positions in the ciphertext) are solved, the keyword can be revealed by checking the shift key against a tabula recta (e.g. if a certain string of nth letters is found to have been shifted 3 letters each, then the corresponding letter in the keyword must be “D”, which shifts every plaintext letter by 3 in the Vigenère cipher). When the keyword is deduced, every message encrypted using that keyword can easily be decoded.
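These last steps can be sketched in Python as well. All three function names are invented for this example, the list of distances is made up, and guess_shift rests on the assumption that the most common letter in a string stands for “e”, which is only reliable when the string is long.

```python
from collections import Counter

def likely_key_length(distances, max_len=20):
    """Pick the largest length (2..max_len) that divides the greatest number of distances."""
    counts = {n: sum(d % n == 0 for d in distances) for n in range(2, max_len + 1)}
    best = max(counts.values())
    return max(n for n, c in counts.items() if c == best)

def guess_shift(column):
    """Assume the most frequent ciphertext letter in this string stands for 'e'."""
    most_common = Counter(column.upper()).most_common(1)[0][0]
    return (ord(most_common) - ord("E")) % 26

def key_letter(shift):
    """Convert a shift into the keyword letter: 0 -> A, 3 -> D, and so on."""
    return chr(ord("A") + shift % 26)

# Four distances share the factor 6; the stray 7 is a coincidental repeat.
print(likely_key_length([12, 18, 30, 42, 7]))  # 6
print(key_letter(guess_shift("HHKHWH")))       # D
```

Counting factors rather than taking a strict greatest common factor means that a single coincidental repeat (the 7 above) does not spoil the result.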

Although the Kasiski examination appears to be complex, trying it out reveals how simple the process is. Thus, it is useful to try encrypting a message using the Vigenère cipher and then working out the keyword using the Kasiski examination. Much like frequency analysis, it is an extremely useful tool when you need to break a secret code.

Posted in Science & Nature

Cryptography: Vigenere Cipher

It has thus been proven that the Caesar cipher, the pigpen cipher and any simple substitution cipher can be easily broken using frequency analysis. The basis for this is that each letter or symbol can only represent a single letter, meaning that letter frequencies (e, t, a, o…) carry over directly into the ciphertext. Ergo, by making each letter represent more than one letter, the letter frequencies can be masked and an additional level of security can be added to the cipher. This is called polyalphabetic substitution and it is the basis for a type of cipher known as the Vigenère cipher.

The cipher was first conceived in 1553 by Giovan Battista Bellaso and has been improved since. It is famous for being rather simple to use despite being difficult to decipher at a beginner’s level. This trait earned the cipher the nickname “le chiffre indéchiffrable”, which is French for “the indecipherable cipher”.

The Vigenère cipher can be thought of as a stack of Caesar ciphers (essentially a cipher within a cipher), where each letter is shifted by a variable key (in a normal Caesar shift, every letter is shifted by the same key). This is achieved by the implementation of a keyword and a table called a tabula recta. A tabula recta is simply a grid made from 26 rows of the alphabet, each row of which is made by shifting the previous one to the left. This table essentially shows all the possible outcomes of a Caesar shift.
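For those who would rather generate the table than draw it, a tabula recta takes only a few lines of Python. This is just a sketch, and the function name is made up for this example.

```python
import string

def tabula_recta():
    """Build 26 rows of the alphabet, each shifted one place left of the row above."""
    alphabet = string.ascii_uppercase
    return [alphabet[i:] + alphabet[:i] for i in range(26)]

table = tabula_recta()
for row in table[:3]:
    print(row)
# ABCDEFGHIJKLMNOPQRSTUVWXYZ
# BCDEFGHIJKLMNOPQRSTUVWXYZA
# CDEFGHIJKLMNOPQRSTUVWXYZAB
```

Row number k of the table (counting from 0) is exactly the alphabet after a Caesar shift of k.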

Now, let us try encoding a message using the Vigenère cipher. The message “attack at dawn” is encoded using the keyword “nothing”. Ideally, there should be no repeating letters in the keyword for the sake of security. Therefore, if there are any repeating letters, just remove the repeated letters (e.g. “crocodile” -> “crodile”). (To keep this example simple, “nothing” is used as it is, even though the letter “n” repeats.) First, repeat the keyword until it matches the number of letters of the message (e.g. “attackatdawn” is aligned with “nothingnothi”). Then, use the tabula recta to encrypt the message. The rule of thumb is “key-row, message-column”, meaning that the row of the tabula recta starting with the letter of the key is matched against the column starting with the respective letter of the message. To take the first letter as an example, the key letter is “n” and the message letter is “a”. The letter where the “n” row and the “a” column meet is “N”. If this rule is followed for each letter, the encrypted message becomes: “NHMHKXGGRTDV”. Although it takes some effort to find each letter on the table, the message becomes “indecipherable” to a beginner cryptanalyst as frequency analysis becomes useless. For example, the repeating letter “H” can represent either “t” or “a”. The longer the keyword is, the more secure the Vigenère cipher becomes.
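The same steps can be sketched in Python. Instead of looking each letter up in the table, this version adds the position of the key letter to the position of the message letter (with a=0) and wraps around at 26, which gives the same result as “key-row, message-column”. The function names are made up for this example, and spaces are simply dropped.

```python
import string

ALPHABET = string.ascii_lowercase

def _apply(text, keyword, direction):
    """Shift each letter by its keyword letter (+1 to encrypt, -1 to decrypt)."""
    letters = [c for c in text.lower() if c in ALPHABET]  # drop spaces and punctuation
    result = []
    for i, c in enumerate(letters):
        shift = ALPHABET.index(keyword[i % len(keyword)].lower())
        result.append(ALPHABET[(ALPHABET.index(c) + direction * shift) % 26])
    return "".join(result)

def vigenere_encrypt(message, keyword):
    return _apply(message, keyword, +1).upper()

def vigenere_decrypt(ciphertext, keyword):
    return _apply(ciphertext, keyword, -1)

print(vigenere_encrypt("attack at dawn", "nothing"))  # NHMHKXGGRTDV
print(vigenere_decrypt("NHMHKXGGRTDV", "nothing"))    # attackatdawn
```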

However, the Vigenère cipher is not indecipherable. Next, we will look at a cryptanalysis method called the Kasiski examination that attacks a polyalphabetic cipher such as the Vigenère cipher to gain access to the keyword.

Posted in Science & Nature

Cryptography: Pigpen Cipher

Another well-known substitution cipher is the “pigpen cipher” or “Freemason’s cipher”. As the name suggests, it was often used by Freemasons to encrypt their messages. However, as time has passed, it has become so well-known that it is not a very secure cipher at all.

The pigpen cipher does not substitute the letter for another letter, but instead uses a symbol that is derived from a grid-shaped key. The key is made of two 3×3 grids (#) (one without dots, one with dots) and two 2×2 grids (X) (one without dots, one with dots). The letters are filled in systematically so that each shape represents a certain letter (e.g. v=s, >=t, <=u, ^=v).

The cipher has many variations that attempt to throw off an attacker by rearranging the order of the grids or the letters. Thus, even if a cunning attacker picks up on the fact that the cipher is a pigpen cipher, they may use the wrong key and get a completely wrong message. Nonetheless, it is a useful skill to recognise the unique symbols of the pigpen cipher as it is a popular cipher used commonly in puzzles.

As with any substitution cipher, frequency analysis and pattern recognition are the key to cracking the pigpen cipher.

Posted in Science & Nature

Cryptography: Frequency Analysis

A cipher is a message that has been encoded using a certain key. The most common and basic type of cipher is encrypted using letter substitution, where each letter represents a different, respective letter. For example, the message may be encoded so that each letter is replaced by the letter three places after it in the alphabet (e.g. if a=0, b=1… “a” becomes “d”, “b” becomes “e” etc.). This creates a jumble of letters that appears to be indecipherable.
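A shift like this is easy to try out in Python. The sketch below is only an illustration, and the function name caesar is chosen for this example; it works on lowercase letters and leaves spaces alone.

```python
import string

def caesar(text, shift):
    """Replace each letter with the one `shift` places later, wrapping from z to a."""
    lower = string.ascii_lowercase
    k = shift % 26
    shifted = lower[k:] + lower[:k]
    return text.lower().translate(str.maketrans(lower, shifted))

secret = caesar("attack at dawn", 3)
print(secret)              # dwwdfn dw gdzq
print(caesar(secret, -3))  # attack at dawn
```

Shifting by the negative of the key undoes the encryption.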

However, the characteristics of substitution ciphers make them the most decipherable type of encryption. As each letter can only represent one other letter, once the key is cracked (i.e. what letter is what), the message and any future messages can be read. The most important tools in decrypting substitution ciphers are pattern recognition and frequency analysis.

Frequency analysis relies on the fact that every language has certain letters that are more used than others. In the English language, the letters that are most used, in order, are: E, T, A, O, I, N, S, H, R, D, L, U (realistically, only E, T, A, O are significant and the rest are neither reliable nor useful in frequency analysis).

For example, if Eve intercepted a long, encrypted message that she suspects to be a simple substitution cipher, she will first analyse the text for the most common letter, bigram (two-letter sequence) and trigram (three-letter sequence). If she found that I is the most common single letter, XL the most common bigram and XLI the most common trigram, she can ascertain with considerable accuracy that I=e, X=t and L=h (“th” and “the” are the most common bigram and trigram respectively). Once she substitutes these letters into the cipher, she will soon discover that certain patterns arise. Eve may notice words such as “thCt” and deduce that C=a, or find familiar words and fill in the blanks in the key. The discovery of each letter leads to more patterns, and this snowball effect easily breaks the code.
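Eve’s first step, the counting, is easy to hand over to a computer. Below is a minimal Python sketch; the function name ngram_counts is made up for this example, and the short ciphertext is “the cat and the hat” with every letter moved four places, which matches I=e, X=t and L=h above. Spaces are removed before counting, so some sequences run across word boundaries.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive letters, ignoring spaces and punctuation."""
    letters = "".join(c for c in text.upper() if c.isalpha())
    return Counter(letters[i:i + n] for i in range(len(letters) - n + 1))

ciphertext = "XLI GEX ERH XLI LEX"
print(ngram_counts(ciphertext, 3).most_common(1))  # [('XLI', 2)]
print(ngram_counts(ciphertext, 1)["X"])            # 4
```

A message this short is far too small for reliable statistics (here X, which stands for “t”, is the most common letter rather than I), which is why Eve needs a long message.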

Frequency analysis is extremely useful as it can be used to attack any simple substitution cipher, even one that does not use letters. For example, in Sir Arthur Conan Doyle’s Sherlock Holmes tale The Adventure of the Dancing Men, Sherlock Holmes uses frequency analysis to interpret a cryptogram showing a string of hieroglyphs depicting dancing men.

To address this weakness in substitution ciphers, many cryptographers have devised better encryption methods such as polyalphabetic substitution, where several alphabets are used (e.g. a grid of shifted alphabets, also called a tabula recta).

Posted in Life & Happiness

What Cannot Be Seen

Voici mon secret. Il est très simple: on ne voit bien qu’avec le cœur. L’essentiel est invisible pour les yeux.

“Here is my secret. It is very simple. It is only with the heart that one can see rightly. What is important is invisible to the eye.”

(from The Little Prince by Antoine de Saint-Exupéry)

In The Little Prince there is a story that goes like this. At the age of six the protagonist, after seeing a picture of a boa constrictor swallowing an animal in a book, draws “Drawing Number One”:

[Image: Drawing Number One]

He showed it to adults and asked if the drawing frightened them. They replied: “Why should any one be frightened by a hat?”

Drawing Number One was not a drawing of a hat, but a boa constrictor digesting an elephant. However, adults could not understand the true meaning of the drawing. So the protagonist drew Drawing Number Two which showed the elephant inside the boa constrictor. The adults advised him to put aside drawings of things like boa constrictors and elephants that could not be seen and instead take interest in geography, history, arithmetic and grammar. That is why he gave up his dream of becoming an artist.

[Image: Drawing Number Two]

Even after growing up and becoming a pilot, he sometimes showed people Drawing Number One and asked what they saw. But they only saw the hat, never the elephant or the boa constrictor. Unlike when he was young, he did not try to explain the true meaning of the drawing and instead brought up “adult topics” like golf and politics.

Money, status, beauty… Things that can be seen can fool you just as an optical illusion does. Things that are invisible such as the mind, creativity, understanding and love are the only things that can truly bring you satisfaction. So how can we look for things that we cannot see? We can infer that the wind blows from the rustling of leaves. If the leaves are not rustling, it means the wind is not blowing. To find like-minded people, a “password” needs to be used, just as we used leaves to find wind.

The Encyclopaedia of Absolute and Relative Knowledge is like a “password” to me, just like Drawing Number One was for the protagonist of The Little Prince. It is something that can be used to see if the other person shares my way of thinking and beliefs. Whether it be a book, a picture or a quote, such a password that represents you as a whole can be very useful in finding a true companion. If they ask you why you write such a thing, what the meaning of the picture is or why you respect the quote, then they are not the one you were looking for. The person you are desperately looking for will never ask “why” but instead respond with honest curiosity. Yes, the correct answer to the password is an expression of childlike curiosity. Upon seeing that person’s pure smile, you will know: that that person can decode your password, that they understand you, that they will accept your everything. That person is the one you have been looking for to accompany you for the rest of your life.