Saturday, November 27, 2021

A mistake in the VM?

One of the notable things about the VM is that there are no obvious mistakes. The scribe(s) do not appear to have erased, scraped out or overlined any text.

On f105r, however, there are four words written in an odd location. I will argue that these were accidentally omitted from the end of the first line of the third paragraph, and that the mistake was caught before the scribe had finished, so the words were written in above the line. These words make up line 10 in the Landini-Stolfi transliteration, and fall between the second paragraph and the third one.


On René Zandbergen's site the description for this page has the following note: "There is a break between the third and fourth paragraph, and it appears as if the end of the third paragraphs was written above it." I think the "break" referred to here is the fact that the ink of the fourth paragraph is slightly fainter than the third, and the letters are neater and smaller, suggesting that paragraphs 1-3 were written in one sitting, and paragraphs 4 and onward were written later.

Observations

We can observe the following things about the physical appearance of these words:
  • They are set lower than the last line of the second paragraph, though there was ample space to place them in line with it, suggesting that they are not intended to be part of paragraph 2.
  • The gallows letters of the first line of the third paragraph are interposed between the oddball words, suggesting that the oddball words were written after the first line of paragraph 3 was completed, and written around the gallows letters.
  • The color and shape of the letters in these words are similar to those in paragraphs 1-3, so the words were not obviously written later or in a different hand.
The words have the following statistical properties:
  • sairy appears elsewhere only as the last word of the first line of a paragraph
  • ore does not appear elsewhere
  • daiindy appears once in Currier A, as part of a label, and twice in Currier B (including this instance), in neither case at the end of a line
  • ytam appears in both Currier A and B, usually, but not always, at the end of a line
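Positional counts like the ones above are easy to get wrong by hand. Here is a minimal sketch of how they might be tallied. It assumes the transliteration has already been parsed into pages, paragraphs, lines and words. The nested-list layout and the function name `position_profile` are hypothetical, not part of any existing tool, and the toy input uses placeholder words rather than real VM data.

```python
from collections import Counter

# Hypothetical layout: a page is a list of paragraphs, a paragraph is a list
# of lines, and a line is a list of transliterated words.
Line = list[str]
Paragraph = list[Line]
Page = list[Paragraph]


def position_profile(pages: list[Page], word: str) -> Counter:
    """Tally where `word` occurs: in total, at the end of any line, and at
    the end of the first line of a paragraph."""
    counts = Counter()
    for page in pages:
        for paragraph in page:
            for line_no, line in enumerate(paragraph):
                for i, w in enumerate(line):
                    if w != word:
                        continue
                    counts["total"] += 1
                    if i == len(line) - 1:
                        counts["line_final"] += 1
                        if line_no == 0:
                            counts["first_line_final"] += 1
    return counts


# Toy input with placeholder words (not real VM data): one page, two paragraphs.
toy = [[[["w1", "sairy"], ["ytam", "w2"]], [["w3", "ytam"]]]]
print(position_profile(toy, "ytam"))
```

In the toy input, ytam occurs twice, and only one of those occurrences is line-final (on a paragraph's first line). A real run would need the full transliteration and a decision about how to treat labels and uncertain readings.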
Conclusions

The color and style of the letters, together with their placement, suggest that they belong to paragraph 3, and they were written in at roughly the same time that the other lines of paragraph 3 were written. The statistical properties of the words suggest that they belong to the end of a line, but which line?
  • Line 11: This line ends in dyaiin, which is a word not found elsewhere
  • Line 12: This line ends in ry, which elsewhere is only a line-final word
  • Line 13: This line ends with ot, which elsewhere does not appear at the end of a line. There is blank space at the end of the paragraph, sufficient to write about two words.
Given that line 12 ends in a word which is elsewhere only line-final, and that line 13 has enough blank space at its end for at least two of the oddball words (so a scribe restoring an omission there would likely have used that space), the best explanation is that these words belong at the end of line 11, the line immediately below them.

I have seen omissions like this in manuscripts in the past, and the cause is often that the eye skips from one word to a later similar word. In this case, perhaps the scribe's eye skipped from sairy to yaiir, which starts line 12. That opens up two possibilities:
  • If the text was enciphered first on a wax tablet (or something similar) and then copied to the vellum, and the eye-skip occurred during the copying process, then the line breaks on the wax tablet were not the same as the line breaks on the vellum.
  • If the eye-skip occurred during the encipherment process, then the plaintext for sairy could be similar to (or even identical to) the plaintext for yaiir.

Saturday, November 20, 2021

Word Transposition in the Voynich Manuscript

This is a follow-on to my last post, but I don't want to bother recapping the argument from last time, so I'll just start fresh and go in a different direction.

Summary
While a text in an unknown language may look random, there are two "forces" that govern the appearance of words in the text. One of those "forces" is absent in the Voynich Manuscript, and I think that indicates that a word transposition step has taken place.

Argument
The following graph shows, for each distance measured in words, how likely a word is to recur at that distance after it has occurred once.


For example, in a medieval Latin prose text (red line), once a word has appeared there is an extremely low chance (0.03%) that the very next word is the same word. That chance rises to almost 1% six words later, then slowly drops off to 0.76% at 30 words later and roughly 0.6% at 100 words later.

A similar phenomenon can be seen in the early modern French novel Pantagruel (blue line).

This curve could be described as the product of the interaction of two "forces":
  1. Strong repulsive force: Instances of the same word have a very low likelihood of being found in very close vicinity to each other. Perhaps languages are naturally structured in a way that avoids close repetition.
  2. Weak attractive force: Instances of the same word have a higher likelihood of being found in the same broad area of the text as each other. Intuitively it seems like this should not apply to high-frequency words with low semantic content (articles, prepositions, etc.), since there is no reason for these to be grouped together in the same area of a text. Instead, this ought to apply primarily to lower-frequency words with high semantic content, since these words will be tied to the topic of discourse, and will therefore be clustered in areas of the text where the topic relates to their semantic domain. (I should have proved this out, but I didn't.)
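The curve itself, and the untested prediction in point 2, can be checked with a few lines of code. The sketch below is one plausible way to compute such a curve, not necessarily how the graph above was produced. For each distance d, it estimates the probability that the word d positions after a given word is the same word. The optional `keep` filter and the `split_by_frequency` helper (with its arbitrary `top_n` cutoff) are hypothetical additions for testing whether the weak attractive force comes mainly from lower-frequency words.

```python
from collections import Counter


def recurrence_curve(tokens, max_dist=100, keep=None):
    """For d = 1..max_dist, estimate P(tokens[i + d] == tokens[i]).

    If `keep` is given, only positions i whose word is in `keep` are counted,
    so the curve can be computed for a chosen subset of words.
    """
    curve = []
    for d in range(1, max_dist + 1):
        hits = trials = 0
        for i in range(len(tokens) - d):
            if keep is not None and tokens[i] not in keep:
                continue
            trials += 1
            if tokens[i] == tokens[i + d]:
                hits += 1
        curve.append(hits / trials if trials else 0.0)
    return curve


def split_by_frequency(tokens, top_n=50):
    """Hypothetical split: the top_n most frequent word types vs. the rest."""
    ranked = [w for w, _ in Counter(tokens).most_common()]
    return set(ranked[:top_n]), set(ranked[top_n:])


tokens = "a b a b a b".split()
print(recurrence_curve(tokens, max_dist=2))  # [0.0, 1.0]
```

On a real text, one would compare `recurrence_curve(tokens, keep=low)` with `recurrence_curve(tokens, keep=high)` after `high, low = split_by_frequency(tokens)`. If point 2 is right, the low-frequency curve should rise clearly above its long-distance level at moderate distances, while the high-frequency curve should be much flatter. That is a prediction, not a result.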

Interestingly, Latin syllables (orange line) respond to the same strong repulsive force as words, but not the weak attractive one. This makes sense if the weak attractive force relates to semantic content, because syllables themselves have no semantic content and are therefore not tied to the topic of the text. Instead, with Latin syllables we see a strong tendency for syllables not to repeat in close vicinity to each other, but then the curve just rises to a plateau.

So what do we see in the VM?


Words in the Voynich Manuscript show the effects of the weak attractive force much as Latin words do. This suggests that they have a semantic component and that some kind of topicalization is going on. However, the VM shows no evidence of the strong repulsive force. What could cause that?

The strong repulsive force works over a very short distance, generally less than five words. If words were shuffled around so they were separated from their neighbors by a distance of five words or more, then this would conceal the effect of the strong repulsive force.
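A quick way to see this effect is to start from a toy text that never repeats a word at distance 1, then shuffle its words inside fixed-size blocks. The sketch assumes a plain random shuffle within blocks of a hypothetical `block_size`. This is only a stand-in for whatever transposition the VM might use. Words stay in the same broad region of the text, so clustering on scales larger than a block is largely kept, but the short-distance gap disappears.

```python
import random


def shuffle_within_blocks(tokens, block_size=40, seed=0):
    """Hypothetical word transposition: randomly permute the words inside each
    consecutive block of `block_size` words, keeping the blocks in order."""
    rng = random.Random(seed)
    out = []
    for start in range(0, len(tokens), block_size):
        block = tokens[start:start + block_size]  # slicing makes a copy
        rng.shuffle(block)
        out.extend(block)
    return out


def repeat_rate(tokens, d):
    """Fraction of positions i where tokens[i + d] == tokens[i]."""
    pairs = len(tokens) - d
    return sum(tokens[i] == tokens[i + d] for i in range(pairs)) / pairs


# Toy text with a perfect "strong repulsive force": no word ever follows itself.
text = ["a", "b", "c"] * 1000
mixed = shuffle_within_blocks(text)
print(repeat_rate(text, 1), repeat_rate(mixed, 1))
```

The first value is 0.0. After shuffling, the distance-1 repeat rate is no longer zero and lands near what chance would give for three equally common words, even though every block still contains exactly the same words as before.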

In other words, perhaps there is a transposition step in the VM cipher, operating on words. This could solve a lot of problems.

For example, such a transposition could also explain why the text does not exhibit line-breaking features that make it clear whether it runs right-to-left or left-to-right.

It might also explain why the last lines of some paragraphs (especially in the Currier A sections) have gaps in them. Perhaps these gaps are slots that were simply not filled in by the transposition algorithm.


Indeed, if we suppose that the transposition works on the level of the paragraph, then that could explain why so many paragraphs begin with a word containing an ornate gallows letter. If the transposition algorithm resets for each paragraph, then the reader would need a visual cue to indicate where to start over again.
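To make the idea concrete, here is one hypothetical paragraph-level transposition. It was chosen only because it happens to produce both effects; nothing in the VM is known to use it. Words are written down the columns of a grid in an order fixed by a key, and then read across the rows as lines, so most plaintext neighbours end up a full line width apart. The columns filled first get one extra row, so the unused slots (`None`) can only fall in the last line, at positions set by the key. Because the grid is rebuilt for every paragraph, a reader would need to know where each paragraph starts. The key, `grid_transpose` and `grid_restore` are assumptions for illustration.

```python
import math


def grid_transpose(words, key):
    """Lay out one paragraph with a hypothetical keyed columnar transposition.

    key is a permutation of range(width): key[k] is the k-th column to fill.
    Words go down each column in turn; the rows are then read as lines.
    """
    width = len(key)
    rows = math.ceil(len(words) / width)
    full_cols = len(words) % width or width  # columns that get every row
    grid = [[None] * width for _ in range(rows)]
    it = iter(words)
    for rank, col in enumerate(key):
        height = rows if rank < full_cols else rows - 1
        for r in range(height):
            grid[r][col] = next(it)
    return grid


def grid_restore(grid, key):
    """Invert grid_transpose by reading the columns back in key order."""
    words = []
    for col in key:
        words.extend(row[col] for row in grid if row[col] is not None)
    return words


paragraph = "w1 w2 w3 w4 w5 w6 w7 w8".split()
for line in grid_transpose(paragraph, key=[0, 2, 1]):
    print(" ".join(w or "___" for w in line))
# w1 w7 w4
# w2 w8 w5
# w3 ___ w6
```

The width of 3 keeps the toy readable. With a realistic line width of eight or more words, most plaintext neighbours would be separated by at least that many words, which is more than the roughly five-word range of the strong repulsive force.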

I can even imagine algorithms that could produce the phenomenon of head words, body words and tail words, though this is a bit more of a stretch since that would mean there is some connection between what a word is and where the transposition algorithm puts it.

Lastly, this could explain why the VM has no punctuation. In manuscripts of this era, punctuation was common (though not universal). Punctuation marks stand between words in connected linear text, so if the words were shuffled by some kind of transposition algorithm, it might no longer be clear where the punctuation marks should go.

So...how would one prove or disprove the existence of word transposition in the VM?