One way to do this would be to create a network showing the relationships between words within a cipher text, and try to find the best match between that network and a similar network for a known text.
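As a rough illustration of what "best match" could mean here, the sketch below scores a candidate mapping from cipher words to known words by how well the pairwise closeness values line up, and then tries every possible mapping. Everything in it is an assumption made for illustration: the names `mismatch` and `best_match`, the use of squared differences as the score, and the representation of each network as a dictionary keyed by word pairs. Exhaustive search is only workable for a handful of words, since the number of mappings grows factorially.

```python
from itertools import permutations

def mismatch(cipher_sim, known_sim, mapping):
    """Sum of squared differences between paired closeness scores.

    cipher_sim and known_sim map frozenset({word1, word2}) -> closeness.
    mapping maps each cipher word to a candidate known word.
    """
    total = 0.0
    words = list(mapping)
    for i, a in enumerate(words):
        for b in words[i + 1:]:
            c = cipher_sim[frozenset((a, b))]
            k = known_sim[frozenset((mapping[a], mapping[b]))]
            total += (c - k) ** 2
    return total

def best_match(cipher_sim, known_sim, cipher_words, known_words):
    """Try every assignment of known words to cipher words (hypothetical brute force)."""
    best_score, best_mapping = None, None
    for perm in permutations(known_words, len(cipher_words)):
        mapping = dict(zip(cipher_words, perm))
        score = mismatch(cipher_sim, known_sim, mapping)
        if best_score is None or score < best_score:
            best_score, best_mapping = score, mapping
    return best_mapping
```

For realistic vocabulary sizes the exhaustive loop would have to give way to some kind of heuristic search, which is not attempted in this sketch.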
I am trying these ideas out with chapters 81 and 87 of Melville's Moby Dick. You can see the basic idea if you look at the closest relationships between the top 20 words of each chapter:
Closest relationships between top 20 words in Chapter 81
Closest relationships between top 20 words in Chapter 87
You can see that in both chapters there is a little island of nouns and pronouns (whale, whales, it, he), and another little island of determiners (a, the, his). Most prepositions are connected to other prepositions, and the demonstratives "that" and "this" are connected to each other. The underlying data holds much more information about how close each relationship is, but these graphs show only each word's closest links.
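Here is a minimal sketch of one way a network like this could be built. It is not necessarily the measure behind the graphs above: it assumes that "closeness" means two words tend to appear next to the same neighbours, and it measures that with cosine similarity over counts of left and right neighbours. The function names (`tokenize`, `context_vectors`, `cosine`, `closest_relationships`) are made up for this example.

```python
from collections import Counter
import math
import re

def tokenize(text):
    """Lowercase the text and keep runs of letters and apostrophes as words."""
    return re.findall(r"[a-z']+", text.lower())

def context_vectors(tokens, top_n=20):
    """For each of the top_n most frequent words, count its left and right neighbours."""
    top = [w for w, _ in Counter(tokens).most_common(top_n)]
    vectors = {w: Counter() for w in top}
    for i, w in enumerate(tokens):
        if w not in vectors:
            continue
        if i > 0:
            vectors[w][("L", tokens[i - 1])] += 1
        if i + 1 < len(tokens):
            vectors[w][("R", tokens[i + 1])] += 1
    return vectors

def cosine(a, b):
    """Cosine similarity between two sparse count vectors (assumed closeness measure)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def closest_relationships(text, top_n=20):
    """Map each top word to the other top word whose contexts are most similar."""
    vectors = context_vectors(tokenize(text), top_n)
    edges = {}
    for w in vectors:
        scored = [(cosine(vectors[w], vectors[o]), o) for o in vectors if o != w]
        if scored:
            edges[w] = max(scored)[1]
    return edges
```

Calling `closest_relationships(chapter_text)` on the text of a chapter returns one edge per top word. Because the measure only looks at neighbouring words and never at the identity of the word itself, it could be applied to a cipher text in the same way, provided the cipher preserves word boundaries.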
The main problem I will run into is processing time. Luckily, in the quiet years since I last wrote about these things, I have learned to use cloud computing. It will just be a big job, and it needs to be planned out carefully.