Sektu: Old Cryptography and Entropy

Friday, August 6, 2021

Old Cryptography and Entropy

In my last two posts, I first suggested that Voynich Manuscript 81R might contain a poem in Latin dactylic hexameter, but then I argued that the lines only convey about half of the information necessary to encode such a poem. In this post I'll try to reconcile those two arguments by showing that a late medieval/early Renaissance cipher system could have produced this effect.

The pages of the VM have been carbon-dated to between 1404 and 1438. If the text is not a hoax, and it was written within a century or so of the production of the vellum, then what cryptographic techniques might the author plausibly have known, and how would they impact the total bits per line of an enciphered poem?

According to David Kahn's The Code-Breakers, the following methods might have been available to someone in Europe during that period. For most of these, I have created simulations using the Aeneid as a plain text, and measured the effect on bits per line using the formula for P_bc from my last post.

Writing backwards (0.2% increase)
Substituting dots for vowels (28.5% decrease)
Foreign alphabets (little or no change, depending on how well the foreign alphabet maps to the plaintext alphabet)
Simple substitution (no change)
Writing in consonants only (45.6% - 49% decrease, depending on whether v and j are treated as vowels)
Figurate expressions (impractical to test, but likely to increase bits per line)
Exotic alphabets (no change, same as simple substitution)
Shorthand (impractical to test, but likely to decrease bits per line)
Abbreviations (impractical to test, but certain to decrease bits per line)
Word substitutions (did not test, but likely to cause moderate increase or decrease to bits per line)
Homophones for vowels (increase bits per line, but the exact difference depends on the number of homophones per vowel. With two homophones for each vowel, there was a 19.5% increase)
Nulls (increase bits per line, but the exact difference depends on the number of distinct nulls used and the number of nulls inserted per line)
Homophones for consonants (increase bits per line, but the exact difference depends on the number of homophones per consonant)
Nomenclators (impact depends on the type of nomenclator. I tested with a large nomenclator and got a 44.5% decrease in bits per line)

If 81R contains a poem in Latin dactylic hexameter, then it appears the encoding system caused something like a 47.9% decrease in the number of bits per line. Only two of the encoding methods above have a similar effect:

Writing in consonants only
Using a large nomenclator

The first of these options is intriguing, because removing the vowels from a Latin text causes a significant number of lexical collisions, especially if v and j are treated as vowels. If this is one of the steps in the Voynich cipher process, then the appearance of repeated sequences like daiin daiin daiin in the VM could result from sequences like ita ut tu, ut vitae tuae, etc.

That, of course, cannot be the only story here. If the VM is written in a cipher that removes all of the vowels, then it must also be written in a cipher that encodes single Latin consonants as strings of multiple Voynich letters in order to account for the length of Voynichese words. This must also be done in a way that increases the lengths of words without significantly increasing the number of bits per line.

I think this is quite possible to do. In my next post I'll try to demonstrate this with a proof-of-concept cipher that creates a cipher text like the VM from a Latin plaintext.

5 comments:

Nick PellingAugust 6, 2021 at 1:23 PM
Hi Brian,

David Kahn tried really hard, but his book only really offers a narrow window onto 15th century ciphers and codes.

For example, Alberti reports a long conversation he had with a transposition cipher expert: but our net knowledge of transposition (strewing) ciphers in use in the 15th century is close to zero.

There are plenty of good reasons to think there is something quite atypical about Voynichese: to me, it feels like an elegant marriage between Milanese praxis and Tuscan brainpower - but I would say that, wouldn't I? ;-p

Cheers, Nick
ReplyDelete
Replies
Nick PellingAugust 8, 2021 at 12:35 PM
The problem is that we have no idea how transposition ciphers were used in the 15th century, because we have zero examples to work with. Which is precisely why Alberti's report of a conversation with a highly skilled transposition cipher expert is so annoying. :-(

The low-level transposition cipher I mentioned in Curse was the one used by Filarete in his treatise on architecture, where he transposed a load of names (e.g. Galeazzo -> Zogalia, Averlino -> Nolivera, etc). Something as simple as that may not affect the entropy hugely? A century later this was mentioned as a Florentine schoolboy game (basically like Pig Latin), so I get the impression that transposition was more of a Florentine big-brain thing than an empirical Milanese thing.
ReplyDelete
Replies
Nick PellingAugust 14, 2021 at 2:56 AM
Medieval cryptographers and code-breakers had no useful concept of entropy, or indeed just about anything in our modern toolbox.

As I wrote in "Fifteenth Century Revisited" ( https://www.academia.edu/33813775/Fifteenth_Century_Cryptography_Revisited ), even the 'homophone trick' seem to have emerged primarily as a technique for visually concealing the tell-tale vowels at the end of Italian words, and then moving on Latin. With all that in mind, the suggestion put forward by David Kahn that homophones were introduced to flatten out the stats looks a lot like a spurious back-projection.

My overall position is therefore that Voynich-era cryptography was much more about concealment than transformation. As such, the way Voynichese works makes almost no sense... unless you accept that its words are brutally abbreviated (and also that the text was written more to remind than to encode), at which point all the binomial word length observations start to make sense.
ReplyDelete
Replies

Add comment