Sunday, September 19, 2021

I was Wrong about F81R
How Line Breaks and Word Breaks Behave in Currier A and B

This is going to be a long and boring post, so here's the summary:
  • Line breaks in the VM do not act like line breaks in a natural text, in that they do not provide evidence of whether the text runs left-to-right or right-to-left.
  • Word breaks in Currier A act like word breaks in a natural text, but in Currier B they do not.
  • Since my analysis of F81R as a poem was based on the assumption that line breaks and word breaks were natural, yet they turn out not to be natural at all, there remains nothing to support the idea that this page contains a poem.
Here are the tests I conducted whose results led me to that conclusion.

1. Direction of Text

Question: Text in the VM is laid out on the page in a way that suggests left-to-right text, but does the content of the text support that? How do we know the layout of the text isn't intentionally misleading?

Test: In a traditional European text, line breaks are governed by the width of the text column, and have an arbitrary relationship to the underlying text. Therefore we should expect that high frequency pairs of words W1 and W2 will occasionally be broken across lines, so W1 will appear on one end of one line and W2 will appear on the other end of the next line. If W1 appears at the right end of one line and W2 appears at the left end of the next line, then the text behaves like a left-to-right text. If they appear on the left and right ends, respectively, then it behaves like a right-to-left text.
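A minimal sketch of how such a count could be made is below. It is not the script used for the figures in this post. It assumes that words are separated by spaces, that repeated pairs are collected from adjacent words within a line, and that pairs are taken in reading order under whichever direction is being tested. The function name `straddling_pairs` and its parameters are made up for illustration.

```python
from collections import Counter

def straddling_pairs(lines, right_to_left=False, min_count=2):
    """Count line breaks that split a repeated word pair.

    Each line is read in the hypothesised direction. A pair (W1, W2)
    counts as repeated if it occurs at least min_count times inside lines.
    A break is counted when W1 is the last word read on one line and W2
    is the first word read on the next.
    """
    rows = [line.split() for line in lines]
    # Skip empty lines; reverse word order when testing right-to-left.
    rows = [r[::-1] if right_to_left else r for r in rows if r]

    pair_counts = Counter()
    for words in rows:
        pair_counts.update(zip(words, words[1:]))
    repeated = {pair for pair, n in pair_counts.items() if n >= min_count}

    # Compare the end of each line with the start of the following line.
    return sum((prev[-1], nxt[0]) in repeated
               for prev, nxt in zip(rows, rows[1:]))

sample = ["the cat sat on the", "cat and the cat ran"]
ltr = straddling_pairs(sample)                      # left-to-right evidence
rtl = straddling_pairs(sample, right_to_left=True)  # right-to-left evidence
```

Running the function once for each direction gives the two counts that are compared in the results below. Skipping empty lines means paragraph boundaries are treated like ordinary line breaks, which is a simplification.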

Demonstration: I applied the test to De natura rerum ad Sisebutum regem liber, by Isidorus Hispalensis Episcopus, which is roughly the size of the Currier A section of the VM. The sample text contained 366 distinct pairs of words that were repeated at least twice, for a total of 966 instances of repeated pairs. In 68 cases a pair was found broken across lines in a way that indicated left-to-right text, and in 13 cases it was found broken in a way that indicated right-to-left text.

Conclusion: With more than five times as many left-to-right breaks, the evidence pointed strongly to a left-to-right text, as was expected.

Currier A: I found 491 distinct pairs repeated at least twice, for a total of 1389 instances of repeated pairs. In 32 cases a pair was broken across lines in a way that indicated left-to-right text, and in 47 cases it was found broken in a way that indicated right-to-left text.

Conclusion: The number of left-to-right breaks is not significantly different from the number of right-to-left breaks. This is not obviously a natural text running in either direction.

Currier B: I found 1701 distinct pairs repeated at least twice, for a total of 5313 instances of repeated pairs. In 69 cases a pair was broken across lines in a way that indicated left-to-right text, and in 94 cases it was found broken in a way that indicated right-to-left text.

Conclusion: The number of left-to-right breaks is not significantly different from the number of right-to-left breaks. This is not obviously a natural text running in either direction.
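The post does not say how "significantly different" was judged. One common way to compare two such counts is an exact two-sided sign test, which asks how surprising the split would be if each break were equally likely to fall either way. The sketch below is only an illustration of that idea, with a made-up function name `sign_test_p`; it is not necessarily the check behind the conclusions above.

```python
from math import comb

def sign_test_p(ltr, rtl):
    """Exact two-sided p-value for a 50/50 split of ltr + rtl breaks."""
    n = ltr + rtl
    k = min(ltr, rtl)
    # Probability of a split at least this lopsided in one direction.
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    # Double for the two-sided test, capped at 1.
    return min(1.0, 2 * tail)

# Counts reported in this post (left-to-right, right-to-left).
for label, ltr, rtl in [("Isidore", 68, 13),
                        ("Currier A", 32, 47),
                        ("Currier B", 69, 94)]:
    print(label, sign_test_p(ltr, rtl))
```

A small p-value suggests the breaks favour one direction; a large one is consistent with no preference. Where to draw the line is a judgment call.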

2. Word Breaks

Question: Text in the VM appears to be broken into words by spaces, but do these spaces really act like word breaks within the text?

Test: Word breaks should divide the text into a relatively productive lexicon. A productive lexicon is one that can produce the text in question with a relatively small number of words used at relatively high frequencies. We should find that true word breaks divide the text into a productive lexicon better than any other character in the text.

Treat each character in the text as a potential word-break character and measure the frequency of the most frequent word in the resulting lexicon. Use that frequency as a proxy for the productivity of the lexicon. If the word break character results in the most productive lexicon, then it acts like a true word break.
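The procedure can be sketched roughly as follows. This is an illustration rather than the code behind the scores in this post, and it makes some assumptions the description leaves open: line breaks are folded into spaces, empty fragments are discarded, and when some other character is used as the delimiter, spaces stay inside the fragments as ordinary characters. The names `top_word_frequency` and `rank_delimiters` are hypothetical.

```python
from collections import Counter

def top_word_frequency(text, delimiter):
    """Frequency of the most common 'word' when text is split on delimiter."""
    tokens = [t for t in text.split(delimiter) if t]
    if not tokens:
        return 0
    return Counter(tokens).most_common(1)[0][1]

def rank_delimiters(text):
    """Score every character as a candidate word break, best first."""
    # Assumption: collapse all whitespace, including line breaks, to spaces.
    flat = " ".join(text.split())
    scores = {ch: top_word_frequency(flat, ch) for ch in set(flat)}
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

sample = "the cat and the dog and the bird"
best_char, best_score = rank_delimiters(sample)[0]
```

In the toy sample the space comes out on top because it is the only character that isolates the repeated word "the". On a real text, the interesting question is how far ahead of the runner-up the space lands.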

Demonstration: I applied the test to De natura rerum ad Sisebutum regem liber. The word break character resulted in a score of 320, while the next best character (s) resulted in a score of 184.

Conclusion: The lexicon created by the word break character is about 1.7 times as productive as the next best candidate. The word break character acts like a true word break, as expected.

Currier A: The word break character resulted in a score of 512, while the next best character (o) resulted in a score of 266.

Conclusion: The lexicon created by the word break character in Currier A is nearly twice as productive as the next best candidate. The word break character acts like a true word break.

Currier B: The word break character resulted in a score of 499, but the character producing the most productive lexicon was actually 'e', which yielded a score of 514. The character 'a' was third in rank, with a score of 482.

Conclusion: The lexicon created by the word break character in Currier B is no more productive than the lexicons created by other high-frequency characters, and is slightly less productive than the one created by 'e'. In Currier B, the word break character does not act like a word break.

2 comments:

  1. Test 1 relies on the first letter of line-initial words being genuine, but there is good reason to doubt that this is the case.

    Another (possibly more reliable) test is to look at how often EVA m appears as the last letter of a paragraph.

    As for your second test, I think there's good reason to suspect that ee is (like ch and sh) a single glyph, which would make splitting lines there potentially misleading.

    Replies
    1. Interesting! I was thinking that the ends of lines might be filled out with nulls, and that the scribe had neglected to carry out this step in F81R. I think the next thing I do might be to look at the frequencies of words relative to their positions in lines. (I finally managed to get your book, by the way. Wow!)
