Thursday, August 26, 2021

A Voynich-like Code

I've just read an article titled "The Linguistics of the Voynich Manuscript" by Claire L. Bowern and Luke Lindemann, which summarizes previous scholarship on the manuscript and concludes that "the character-level metrics show Voynichese to be unusual, while the word- and line-level metrics show it to be regular natural language."

Reading the article reminded me to finish this post, which I started several weeks ago. Here, I'll outline a cipher built on ideas from my previous posts. I believe a late medieval or early Renaissance scholar might plausibly have created it, and I think it would produce some of the features of the Voynich manuscript.

I'll walk through the cipher steps with an English phrase and a Latin phrase: Can you read these words? Potesne legere haec verba?

Step 1: Remove the vowels from all of the words. This is what causes the words of the cipher text to carry less information than they would in the source language. In this example I'm treating the letter v as a vowel in Latin, but w as a consonant in English, because these are the historical conventions in these languages.

English: cn y rd ths wrds?

Latin: ptsn lgr hc rb?
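Step 1 can be sketched in a few lines of Python. This is only a minimal illustration: the function name `devowel` and the `VOWELS` table are my own, and the vowel sets simply follow the convention described above (v counts as a vowel in Latin; w and y are kept as consonants in English).

```python
# Vowel sets per language, following the convention in this post.
# Assumption: v is dropped in Latin; w and y are kept in English.
VOWELS = {
    "english": set("aeiou"),
    "latin": set("aeiouv"),
}


def devowel(text, language):
    """Lowercase the text and drop every vowel for the given language."""
    vowels = VOWELS[language]
    stripped = "".join(ch for ch in text.lower() if ch not in vowels)
    # Re-join on single spaces in case a word consisted only of vowels.
    return " ".join(stripped.split())


print(devowel("Can you read these words?", "english"))  # cn y rd ths wrds?
print(devowel("Potesne legere haec verba?", "latin"))   # ptsn lgr hc rb?
```

Note that a word made up entirely of vowels would vanish under this sketch, which is one reason reversing the step may be hard.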

It is an open question for me whether it is feasible to reverse this step in Latin. In English I know it is reasonable if you are familiar with the general content of the text, because a similar approach was used to create mnemonics describing Masonic ceremonies.


Step 2: Encipher each word using a substitution cipher that replaces each letter with a syllable, with special syllables reserved for the last letters in each word, to create the appearance of an inflected language. This is what creates the low second-order character entropy.

In this case, I have created the key using the first and last syllables from polysyllabic words at the beginning of Virgil's Aeneid. I haven't bothered to create a complete key; it only covers the letters needed for this example.


A partial key

Using this key, the two example sentences become:

English: viam tum prono vepria liprocaa?

Latin: favelaam otrogus prirum proma?
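Here is a rough Python sketch of Step 2. The two tables are not a copy of the key itself: they are a partial key worked backwards from the two enciphered sentences above, so treat them as an assumption. The names `INNER`, `FINAL`, `encipher_word` and `encipher` are mine, and the input is expected to be devoweled already, with punctuation removed.

```python
# Syllables for letters that are NOT the last letter of a word
# (partial key, inferred from the worked examples in this post).
INNER = {
    "c": "vi", "t": "ve", "h": "pri", "w": "li", "r": "pro",
    "d": "ca", "p": "fa", "s": "la", "l": "o", "g": "tro",
}

# Syllables reserved for the LAST letter of a word, which give the
# cipher text its inflected look (also inferred from the examples).
FINAL = {
    "n": "am", "y": "tum", "d": "no", "s": "a",
    "r": "gus", "c": "rum", "b": "ma",
}


def encipher_word(word):
    """Replace each letter with a syllable; the last letter uses FINAL."""
    if not word:
        return ""
    body = "".join(INNER[ch] for ch in word[:-1])
    return body + FINAL[word[-1]]


def encipher(text):
    """Encipher a devoweled, punctuation-free sentence word by word."""
    return " ".join(encipher_word(word) for word in text.split())


print(encipher("cn y rd ths wrds"))  # viam tum prono vepria liprocaa
print(encipher("ptsn lgr hc rb"))    # favelaam otrogus prirum proma
```

Because the key is partial, any letter outside these tables raises a `KeyError`; a complete key would need an inner and a final syllable for every consonant.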

One of the neat things about this cipher approach is that one could hypothetically train oneself to speak the cipher. 

Step 3: Write the cipher in a secret alphabet. This changes very little about the cipher, and might be considered more of a cultural requirement of the era.

To be clear, I don't think the Voynich cipher worked in exactly this way. For example, the frequency of daiin in the Currier A pages is almost exactly the frequency of t (representing et, ut, te, tu, etc.) in a long devoweled Latin text, but it isn't clear how daiin could be used to encode initial, medial or final t in other, longer words. If the underlying language of the VM is Latin, and it is encoded using a system like this, then it is likely that there is some additional complexity in step 2. For example, there might be a set of words (like daiin, chol, chor) that encode single letters, then another set of prefixes and suffixes to encode letters in longer words.
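The frequency comparison mentioned above can be sketched as follows. This is only an illustration of the measurement, not the analysis behind the figures in this post: the helper names are mine, and the sample sentence is a toy string made up for the example rather than real corpus data.

```python
from collections import Counter

LATIN_VOWELS = set("aeiouv")  # assumption: v treated as a vowel, as in Step 1


def devowel_latin(text):
    """Lowercase the text and drop Latin vowels (including v)."""
    stripped = "".join(ch for ch in text.lower() if ch not in LATIN_VOWELS)
    return " ".join(stripped.split())


def word_frequency(text, target):
    """Return the share of whitespace-separated words equal to target."""
    words = text.split()
    if not words:
        return 0.0
    return Counter(words)[target] / len(words)


# Toy sample only; a real comparison would need a long Latin text.
sample = "et in terra pax ut te videam"
devoweled = devowel_latin(sample)
print(devoweled)                       # t n trr px t t dm
print(word_frequency(devoweled, "t"))  # 3 of 7 words
```

The same `word_frequency` function could be run over a transcription of the Currier A pages with `daiin` as the target to get the other side of the comparison.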

2 comments:

  1. This is not hugely dissimilar from what I suggested in Curse 2006.

    The 15th century saw non-Tironian abbreviation flourish in note-taking, so that Latin scribes were taught to sharply contract (contractio) and shorten (abbreviatio) words for tachygraphy. This is basically the kind of small-group private shorthand you're talking about here.

    Combine that with a verbose cipher that "fattens" the shorthand out again, and you're on the road to the right kind of cipher text.

    My suspicion has been that the ar/or/al/ol 'subcipher' originated as a way of disguising the repetitive visual nature of Roman numbers (e.g. XXVIII), for which you would probably predict a date of around 1410-1430.

    But what is missing here is a corpus of the kind of contractive shorthand circa 1420-1460 that I've read about many times but have never actually seen. Just so you know!

    Replies
    1. I clearly need to make another attempt to get hold of Curse! I tried to buy a used copy online recently, but the seller failed to deliver it. I'll check on Amazon today.

      I think your idea of looking at contractions and abbreviations makes more sense culturally than my idea of simply removing all of the vowels.

      While there may not be a corpus of these texts, there is a Lexicon Abbreviaturarum, and a 1901 edition of it is available for download from Google Books. With a bit of work and the Latin ISE corpus, maybe I could create something that simulates a corpus of text written in abbreviated form.

      I've been wondering how the VM would represent Roman numerals. Do you expand on this idea of using ar/or/al/ol in Curse?
