Saturday, November 8, 2014

Relative frequency of initials and finals

In a previous post, I argued that we could use the presence or absence of hyphens at the end of a line to generate some basic statistics about word initials and word finals. At the time I was thinking of using this information to divide the text into words, but over the last few busy months I have been thinking about another use for this data.
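As a rough sketch of how those statistics could be gathered, the Python below counts line-edge initials and finals. It assumes a hypothetical transcription format that is not taken from the earlier post: one manuscript line per string, glyph names separated by spaces, and a lone "-" token at the end of a line when a word is broken across the line break. The function name edge_counts is invented for illustration.

```python
from collections import Counter

def edge_counts(lines):
    """Count glyphs that must be word initials or word finals.

    Assumed (hypothetical) format: glyphs separated by spaces, with a
    trailing "-" token when the line ends in a hyphen.
    """
    initials, finals = Counter(), Counter()
    prev_hyphen = False  # did the previous line end in a hyphen?
    for raw in lines:
        glyphs = raw.split()
        hyphen = bool(glyphs) and glyphs[-1] == "-"
        if hyphen:
            glyphs = glyphs[:-1]
        if not glyphs:
            continue  # skip blank lines without changing state
        # A line that follows an unhyphenated line starts a new word.
        if not prev_hyphen:
            initials[glyphs[0]] += 1
        # A line with no hyphen ends on a complete word.
        if not hyphen:
            finals[glyphs[-1]] += 1
        prev_hyphen = hyphen
    return initials, finals
```

Only glyphs at the edges of lines are counted, so the result is a sample of initials and finals rather than a full census.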

In most (or all?) languages, the frequency ranking of initials differs somewhat from the rankings of finals and medials. For example, in Latin, the letter t occurs nearly four times more often at the end of a word than at the beginning, whereas u occurs about 3.5 times more frequently as an initial than as a final.

Rohoncian is no different from known languages in that respect. For example, the glyph D occurs 8.5 times more frequently as a final than as an initial. If Rohoncian is a known real language, then the differences between the frequency rankings in initial, medial and final positions could be used to help narrow it down.
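A figure like "8.5 times more frequently as a final" can be expressed as a ratio of two shares. The sketch below is one way to compute it, assuming the ratio is meant as a glyph's share of all finals divided by its share of all initials (the post does not say whether raw counts or shares were used). The function name and the toy counts in the usage note are invented, not real Rohoncian data.

```python
def final_to_initial_ratio(initials, finals):
    """Return {glyph: share among finals / share among initials}.

    A value above 1 means the glyph leans toward word-final position;
    below 1 means it leans toward word-initial position.
    """
    total_i = sum(initials.values())
    total_f = sum(finals.values())
    if total_i == 0 or total_f == 0:
        raise ValueError("need at least one initial and one final")
    ratios = {}
    for glyph in set(initials) | set(finals):
        share_i = initials.get(glyph, 0) / total_i
        share_f = finals.get(glyph, 0) / total_f
        # A glyph never seen as an initial gets an infinite ratio.
        ratios[glyph] = share_f / share_i if share_i else float("inf")
    return ratios
```

With toy counts such as initials {"C": 6, "D": 2, "O": 2} and finals {"C": 3, "D": 5, "O": 2}, D comes out final-leaning and C initial-leaning.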

For example, using the three most common glyphs in Rohoncian, we could construct a kind of litmus test. The relative frequencies of those glyphs are:


If we wanted to test the theory that Rohoncian is Latin and those three glyphs are alphabetic, then we would match them up to the three most common Latin letters:


In broad terms, this correspondence seems to work out well. C and e are both ranked first in overall frequency, and both are somewhat more frequent as initials. Similarly, D and t share the third position and are significantly more frequent as finals than as initials.
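The comparison above can be made mechanical. The sketch below classifies each final-to-initial ratio as leaning final, initial or neutral, and checks whether a proposed glyph-to-letter mapping pairs symbols that lean the same way. The function names and the 1.5 cutoff are arbitrary assumptions for illustration. The ratios in the usage note are the approximate ones quoted in this post.

```python
def leaning(ratio, cutoff=1.5):
    """Classify a final-to-initial ratio; the cutoff is an arbitrary choice."""
    if ratio >= cutoff:
        return "final"
    if ratio <= 1 / cutoff:
        return "initial"
    return "neutral"

def mapping_agrees(glyph_ratios, letter_ratios, mapping):
    """For each glyph -> letter pair, report whether both lean the same way."""
    return {
        glyph: leaning(glyph_ratios[glyph]) == leaning(letter_ratios[letter])
        for glyph, letter in mapping.items()
    }
```

For instance, with D at 8.5, Latin t at roughly 4 and Latin u at roughly 1/3.5, the mapping D to t agrees while D to u does not. A check this coarse can only rule mappings out. It cannot confirm one.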

The main problem with this, as far as earlier proposals go, is that two of the evangelists have names ending in D (i.e. CO IH D and XDC D). However, this was already a problem, because it seems to work best to read those names as Luke (or Mark) and Matthew, and it is not clear what those names have in common that would lead them to be written with the same final.

On the positive side, I had previously proposed reading the word K O A D CX as "nights". If this is the word noctes, then the D falls in the right place, and CX could be read es. (The glyph CX looks like C, but with a dot).

Part of me wonders what we would get if we looked at initials, medials and finals in the Voynich manuscript. But that carcass has been picked over by smarter minds than mine, and yielded almost nothing.

