Tuesday, February 12, 2019

Word Breaks in the Voynich Manuscript

I have been reworking my word break algorithm, and it is now much more accurate than it was before. For example, in the Vulgate Genesis, I get word breaks like the following:

IN PRINCIPIOCREAVIT DEUS CÆLUM ET TE RRAM TERRAAUTEM ERAT
INANIS ET VACUAET TE NEBRÆ ERANT SUPER FACIEM ABYSSIET
SPIRIT US DEI FEREBAT UR SUPER AQUASDIXIT QUE DEUS FIATLUXET
FACTAEST LUXET VIDIT DEUS LUCEM QUOD ESSET BONA

In this sample text there were 40 genuine word breaks, and my algorithm correctly identified 31 of them, with four false positives and eight genuine breaks missed.
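The counting described above can be sketched as a comparison of break positions. This is only an illustration of the scoring, not the word break algorithm itself, which the post does not describe. The function names `break_positions` and `score` are hypothetical, and the short sample is just the opening words of the Vulgate passage.

```python
def break_positions(spaced):
    """Return the offsets in the unspaced letter stream where a break falls."""
    positions = set()
    offset = 0
    words = spaced.split()
    for word in words[:-1]:  # no break is counted after the final word
        offset += len(word)
        positions.add(offset)
    return positions


def score(reference, predicted):
    """Compare predicted breaks against the genuine breaks of the reference."""
    # Both texts must contain the same letters once whitespace is removed.
    if "".join(reference.split()) != "".join(predicted.split()):
        raise ValueError("reference and predicted texts differ in their letters")
    ref = break_positions(reference)
    pred = break_positions(predicted)
    return {
        "genuine": len(ref),
        "hits": len(ref & pred),             # genuine breaks that were found
        "false_positives": len(pred - ref),  # breaks placed inside a word
        "missed": len(ref - pred),           # genuine breaks not found
    }


reference = "IN PRINCIPIO CREAVIT DEUS CÆLUM ET TERRAM"
predicted = "IN PRINCIPIOCREAVIT DEUS CÆLUM ET TE RRAM"
print(score(reference, predicted))
# {'genuine': 6, 'hits': 5, 'false_positives': 1, 'missed': 1}
```

Here the break between PRINCIPIO and CREAVIT is missed, and the break inside TERRAM is a false positive.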

Similarly with the King James Genesis:

INTHE BEGINNING GOD CREATED THE HEAVENAND THE EARTH 
AND THE EARTH WASWITHOUT FORMAND VOIDAND DARK NESS 
WASUPON THE FACEOF THE DEEPAND THE SPIRIT 
OFGODMOVED UPON THE FACEOF THE WATERS AND 

Here the sample text contained 41 genuine word breaks, and the algorithm correctly identified 29 of them, with one false positive and 11 genuine breaks missed.

So what about the Voynich Manuscript? Do the word breaks identified by my algorithm match up to the spaces in the manuscript? Here are the first four lines of the VM in EVA transcription:

fachy sy kal ar ataiin sholshory cthresy kor sholdy
sory ckhar or y kairchtaiin sharasecthar cthar dan
sy aiirsheky ory kaiin shodcthoary cthesdar aiin sy
soiin oteey oteosrol oty cthiar daiin okaiin or okan

In this case, the VM had 39 apparent word breaks, and my algorithm identified 28 of them with five false positives.

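From this, it appears that word breaks in the Voynich Manuscript behave like word breaks in the other sample texts: the algorithm found about 72% of the manuscript's spaces, compared with about 78% of the genuine breaks in the Vulgate and about 71% in the King James.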
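To put the three samples side by side, the reported counts can be turned into rates. The sketch below uses only the figures quoted above (total breaks, breaks correctly identified, false positives). The terms recall and precision and the helper name `rates` are my additions, and for the Voynich Manuscript the "genuine" breaks are simply the spaces in the transcription.

```python
# Counts as reported above: (total breaks, correctly identified, false positives)
reported = {
    "Vulgate Genesis": (40, 31, 4),
    "King James Genesis": (41, 29, 1),
    "Voynich Manuscript": (39, 28, 5),
}


def rates(total, hits, false_positives):
    """Return (recall, precision) for one sample."""
    recall = hits / total                        # share of real breaks found
    precision = hits / (hits + false_positives)  # share of proposed breaks that are real
    return recall, precision


for name, counts in reported.items():
    recall, precision = rates(*counts)
    print(f"{name}: recall {recall:.1%}, precision {precision:.1%}")
# Vulgate Genesis: recall 77.5%, precision 88.6%
# King James Genesis: recall 70.7%, precision 96.7%
# Voynich Manuscript: recall 71.8%, precision 84.8%
```

With samples this small, each rate shifts by two or three percentage points for every break counted differently, so the comparison is only rough.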
