Tuesday, October 26, 2021

Are there nulls in the VM?

Renaissance cryptographers used nulls to break up repeated sequences of characters and to alter the frequency statistics of a text. Did the author of the VM do anything similar? If so, how could we detect it?

Null characters increase the entropy of a text. The more randomly a null character is employed, the greater the entropy it adds to the text. For example, the Latin text Historia rerum in partibus transmarinis gestarum written by William of Tyre conveys an average of 2.711 bits per character when measured using third-order entropy. If we randomly insert a null character "@" into the text at an average interval of every five characters, the entropy increases to 3.044 bits per character.
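To make the experiment concrete, here is a minimal Python sketch of the measurement and the random insertion. It rests on assumptions the post does not spell out: "third-order entropy" is taken to mean the conditional entropy of a character given the two characters before it (trigram block entropy minus bigram block entropy), and nulls are inserted after each character with probability 1/5. The function names are hypothetical, and this is not necessarily the code behind the figures quoted above.

```python
import math
import random
from collections import Counter


def block_entropy(text: str, n: int) -> float:
    """Shannon entropy, in bits, of the distribution of n-grams in text."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return -sum(c / total * math.log2(c / total) for c in grams.values())


def third_order_entropy(text: str) -> float:
    """Assumed definition: H(trigrams) - H(bigrams), in bits per character."""
    return block_entropy(text, 3) - block_entropy(text, 2)


def insert_nulls(text: str, null: str = "@", interval: int = 5, seed: int = 0) -> str:
    """Insert `null` after each character with probability 1/interval,
    so nulls land on average once every `interval` characters."""
    rng = random.Random(seed)  # fixed seed so the experiment is repeatable
    out = []
    for ch in text:
        out.append(ch)
        if rng.random() < 1 / interval:
            out.append(null)
    return "".join(out)


if __name__ == "__main__":
    # Toy stand-in for a real corpus; substitute the full text of a book.
    plain = "arma virumque cano troiae qui primus ab oris " * 200
    with_nulls = insert_nulls(plain)
    print("plain:     ", round(third_order_entropy(plain), 3))
    print("with nulls:", round(third_order_entropy(with_nulls), 3))
```

On a real corpus the absolute numbers will depend on how the text is normalized (case, punctuation, spaces), so only the direction of the change should be expected to match.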

In cases like this, it seems like we ought to be able to ferret out the null character by identifying the character which, when removed, causes a significant drop in the entropy of the text.

The procedure we'll use is:

1. Calculate the average bits per character of a text which we think may be a cipher. Call this value B_null.

2. Make a copy of the cipher text and remove from the copy every occurrence of a character C which we think might be a null.

3. Calculate the average bits per character of the text with C removed. Call this value B_C.

4. Calculate the "nullitude" of the character C using N_C = B_C / B_null.
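The steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the exact code used for the charts: the entropy measure is assumed to be conditional entropy of the given order (third order by default), and `conditional_entropy`, `nullitude` and `rank_candidates` are hypothetical names. A candidate may be a single character or a short sequence.

```python
import math
from collections import Counter


def conditional_entropy(text: str, order: int = 3) -> float:
    """Bits per character given the previous (order - 1) characters.
    Assumed definition: H(order-grams) - H((order - 1)-grams)."""
    def block(n: int) -> float:
        grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
        total = sum(grams.values())
        return -sum(c / total * math.log2(c / total) for c in grams.values())
    return block(order) - block(order - 1)


def nullitude(text: str, candidate: str, order: int = 3) -> float:
    """Entropy with the candidate removed, divided by the baseline entropy.
    Values well below 1 mean the candidate behaves like a null."""
    b_null = conditional_entropy(text, order)                     # step 1
    b_c = conditional_entropy(text.replace(candidate, ""), order)  # steps 2-3
    return b_c / b_null  # step 4; undefined if the baseline entropy is zero


def rank_candidates(text: str, candidates=None, order: int = 3):
    """Return (candidate, nullitude) pairs, most null-like first."""
    if candidates is None:
        candidates = sorted(set(text))  # default: every distinct character
    scores = {c: nullitude(text, c, order) for c in candidates}
    return sorted(scores.items(), key=lambda kv: kv[1])
```

Calling `rank_candidates(text)` on a transcription puts the most null-like character first; passing an explicit list lets you test multi-character sequences as well.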

First, we should know what the results look like with a text that does not contain nulls. Here are the nullitude values for characters in the Historia when no null character is inserted:

Here we do not see a significant drop below the value of 1, indicating that no single character, when removed from the text, causes a noticeable decrease in entropy. This is what we expected to see.

Now, looking at a copy of the Historia into which a null "@" has been inserted randomly about every five characters, we see a different result:


Here the null character stands out clearly, causing a significant drop in the entropy of the text once it is removed. Again, this is what we expect.

We can apply the same test to the Voynich Manuscript. Here is what we find with Currier A:


The result here is interesting because, at the left side of the chart, we see that removal of the character "e" causes a drop in entropy. It isn't a huge drop, but it does stand out from the rest of the characters. This suggests the possibility that Currier A words like cheol and cheor might be synonyms of the more common words chol and chor.

Though "e" looks slightly nullish in Currier A, we do not see the same phenomenon in Currier B:


In Currier B the distribution is more like the Latin plaintext above, with no particular character looking more like a null than the others.

2 comments:

  1. As you'd expect, I've thought a lot about nulls in the Voynich. Which order entropy did you calculate? The point about nulls is that they disrupt the predictive entropy, i.e. the ability to predict the next letter.

  2. I used third-order entropy for these nullitude graphs. A few things I didn't explicitly call out in this blog post: 1) For Voynichese I tested individual letters as well as common sequences like ee, eee, ch, sh, ii, iii. 2) I treated the word divider as a character, represented as a "." at the far right of each graph. 3) If a null character's placement in the text is governed by context (e.g. a null is always inserted before a vowel), then removing it doesn't cause a huge drop in entropy. However, in this case you could say you don't have a null character at all, but a set of biliteral homophones.
