Monday, July 26, 2021

Entropy in Voynichese

It has often been observed that Voynich characters have relatively low entropy (cf. this discussion on René Zandbergen's site). This is a serious problem for the proposal I made in my last post, where I suggested that page 81R of the Voynich Manuscript might contain a poem in Latin dactylic hexameter.

Suppose you calculate the bits of information conveyed by a character c of a text T using a formula like the following:

$$S_c = \frac{\ln f_T - \ln f_c}{\ln 2}$$

where

$S_c$ is the number of bits conveyed by the single character $c$

$f_T$ is the number of characters in the text

$f_c$ is the number of times the character $c$ appears in the text
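As a quick check of the formula, take some hypothetical numbers: a text of 1,000 characters in which c appears 125 times. The measure is simply the base-2 logarithm of how rare the character is, $S_c = -\log_2(f_c / f_T)$:

$$S_c = \frac{\ln 1000 - \ln 125}{\ln 2} = \frac{\ln 8}{\ln 2} = \log_2 8 = 3 \text{ bits}$$

So a character that makes up one eighth of the text conveys 3 bits, and rarer characters convey more.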

Summing this quantity over the characters in each line, we find that the lines on 81R carry, on average, 121.4 bits of information. In contrast, lines of the Aeneid carry an average of 156.1 bits. This is a real problem, and it becomes even more severe if you look at the incremental information conveyed by the second character in a pair, that is, by a character c appearing immediately after a character b:
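A minimal Python sketch of the per-line calculation, under stated assumptions: character frequencies are counted over the whole text being measured (all lines of 81R, or all lines of the Aeneid), every character in a line is counted (whether spaces or punctuation were included is not stated here), and a line's information is the sum of the single-character values over its characters. The names `char_bits` and `average_line_bits` are hypothetical.

```python
import math
from collections import Counter


def char_bits(text: str) -> dict[str, float]:
    """Bits conveyed by each distinct character: S_c = (ln f_T - ln f_c) / ln 2."""
    counts = Counter(text)  # f_c for every character c
    total = len(text)       # f_T, the number of characters in the text
    return {c: (math.log(total) - math.log(n)) / math.log(2) for c, n in counts.items()}


def average_line_bits(lines: list[str]) -> float:
    """Average over lines of the summed S_c values of each line's characters.

    Assumption: frequencies come from all lines joined together.
    """
    bits = char_bits("".join(lines))
    totals = [sum(bits[c] for c in line) for line in lines]
    return sum(totals) / len(totals)
```

For example, `average_line_bits(["abab", "aabb"])` returns 4.0: "a" and "b" each make up half the text, so each conveys exactly 1 bit, and each line has four characters.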

$$P_{bc} = \frac{\ln f_b - \ln f_{bc}}{\ln 2}$$

where

$P_{bc}$ is the number of bits conveyed by the character $c$ when it appears in the pair $bc$

$f_b$ is the number of times the character $b$ appears in the text

$f_{bc}$ is the number of times the sequence $bc$ appears in the text, that is, the number of times the character $c$ appears immediately after $b$.
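The pair measure can be sketched the same way. This is an illustration under assumptions and may differ from the procedure behind the figures in this post: pairs are counted only within a line (never across a line break), and because the first character of a line has no predecessor, it is scored with the single-character measure instead. Both choices, and the names `pair_bits` and `average_line_pair_bits`, are hypothetical.

```python
import math
from collections import Counter

LOG2 = math.log(2)


def pair_bits(lines: list[str]) -> dict[tuple[str, str], float]:
    """P_bc = (ln f_b - ln f_bc) / ln 2 for every adjacent pair (b, c).

    Assumption: pairs are counted within lines only.
    """
    singles = Counter("".join(lines))      # f_b
    pairs: Counter = Counter()
    for line in lines:
        pairs.update(zip(line, line[1:]))  # f_bc: c immediately after b
    return {(b, c): (math.log(singles[b]) - math.log(n)) / LOG2
            for (b, c), n in pairs.items()}


def average_line_pair_bits(lines: list[str]) -> float:
    """Average per-line total when each character after the first is scored by P_bc.

    Hypothetical choice: the first character of a line is scored with S_c.
    """
    text = "".join(lines)
    singles = Counter(text)
    p = pair_bits(lines)
    totals = []
    for line in lines:
        if not line:
            totals.append(0.0)
            continue
        first = (math.log(len(text)) - math.log(singles[line[0]])) / LOG2
        totals.append(first + sum(p[(b, c)] for b, c in zip(line, line[1:])))
    return sum(totals) / len(totals)
```

With `["qu", "qu"]`, each line scores 1 bit for its leading "q" and 0 bits for the "u" that follows it, so the average is 1.0.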

This second approach to measuring information tells us, for example, that the character "u" in a Latin text conveys no additional information when it follows "q". Because "q" is always followed by "u", the frequency of "qu" is the same as the frequency of "q", so the numerator is zero and the number of bits is likewise zero.
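The "qu" case can be checked directly on the opening words of the Aeneid (lowercased, punctuation removed). The helper `bits_after` is hypothetical; it relies on `str.count`, which counts non-overlapping occurrences, so it is only exact for pairs of two different characters such as "qu".

```python
import math


def bits_after(text: str, b: str, c: str) -> float:
    """P_bc for a single pair, counted directly with str.count."""
    f_b = text.count(b)
    f_bc = text.count(b + c)  # non-overlapping count: exact only when b != c
    return (math.log(f_b) - math.log(f_bc)) / math.log(2)


latin = "arma virumque cano troiae qui primus ab oris"
print(bits_after(latin, "q", "u"))  # 0.0, since both q's are followed by u
```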

When you apply this measure to the lines on 81R and the Aeneid, the average amount of information conveyed by the lines of 81R drops to 66.9 bits, while the information conveyed in the average line of the Aeneid drops only to 128.5 bits.

This is a serious challenge to the idea that the plaintext on 81R is a Latin poem in dactylic hexameter, because it suggests that these lines simply don't contain enough information to encode such a poem. In my next post I will look at historically and culturally plausible enciphering schemes that could produce this effect.

2 comments:

  1. Hi Brian!
    I can't find the results of your Codex Rohonc transcription. You have done a great job; too bad you keep it to yourself. Maybe I don't know how to search? In any case, I wish you all the best with your work!

    1. I can't find it either! At some point while moving from one server to another, I misplaced it. I still have the code to generate it, though, so I'll rebuild it and post it somewhere. (Maybe on GitHub, so it's easier for other people to work with.)
