A couple of weeks ago, I started taking an online course in cryptography on Coursera, taught by Dan Boneh of Stanford. (It's excellent, by the way.) I've been a little slow to post anything new, but I wanted to finish an incomplete thought.
Why should we care about mapping semantic relationships in a text? Well, I think it could be a valuable tool for deciphering an unknown language.
First, what I've been calling "semantic similarity" is really contextual similarity; it just happens to coincide with semantic similarity for content words (nouns, verbs, adjectives, adverbs). For function words, it is really functional similarity, and for phonemes (which I don't think I've mentioned before), it correlates with phonemic similarity.
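To make "contextual similarity" concrete, here's a minimal sketch (my own toy illustration, not the method actually used for the maps in this post): each symbol is represented by counts of the symbols appearing immediately around it, and two symbols are compared by the cosine of their count vectors. Symbols that occur in similar contexts get a cosine near 1.

```python
def context_vectors(text, window=1):
    """For each symbol, count which symbols appear within `window`
    positions of it; these count vectors are the symbol's 'context'."""
    symbols = sorted(set(text))
    index = {s: i for i, s in enumerate(symbols)}
    vecs = {s: [0] * len(symbols) for s in symbols}
    for i, s in enumerate(text):
        for j in range(max(0, i - window), min(len(text), i + window + 1)):
            if j != i:
                vecs[s][index[text[j]]] += 1
    return vecs

def cosine(u, v):
    """Cosine similarity between two count vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = sum(a * a for a in u) ** 0.5
    nv = sum(b * b for b in v) ** 0.5
    return dot / (nu * nv) if nu and nv else 0.0

# Toy text with # marking word boundaries, as in the post.
text = "#the#cat#sat#on#the#mat#"
vecs = context_vectors(text)

# 'c', 's', and 'm' all occur in the frame "#_at", so their
# context vectors should be nearly identical (cosine close to 1).
print(cosine(vecs["c"], vecs["s"]))
```

A real run would of course use a whole book rather than one sentence, and a wider window; the point is only that "similar" here means "occurs next to the same things."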
Here's an example of a map generated from the alphabetic letters in the book of Genesis in King James English:
The most obvious feature of this map is that all of the vowels (and the word-boundary marker #) sit off on their own, separated from all of the consonants. If we had a text in an unknown script in which both vowels and consonants were written, we could probably identify the vowels this way.
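The maps in this post presumably come from richer context statistics, but a classic stand-alone technique in the same spirit is Sukhotin's algorithm, which picks out probable vowels purely from letter-adjacency counts, on the premise that vowels and consonants tend to alternate. A minimal sketch, with a toy word list of my own:

```python
def sukhotin(words):
    """Sukhotin's algorithm: repeatedly declare the letter with the
    largest remaining adjacency sum a vowel, then remove its
    contribution from the other letters' sums."""
    letters = sorted(set("".join(words)))
    # Symmetric adjacency counts; same-letter pairs are ignored.
    f = {a: {b: 0 for b in letters} for a in letters}
    for w in words:
        for x, y in zip(w, w[1:]):
            if x != y:
                f[x][y] += 1
                f[y][x] += 1
    sums = {a: sum(f[a].values()) for a in letters}
    vowels = set()
    while True:
        rest = [l for l in letters if l not in vowels]
        if not rest:
            break
        c = max(rest, key=lambda l: sums[l])
        if sums[c] <= 0:
            break
        vowels.add(c)
        for l in rest:
            if l != c:
                sums[l] -= 2 * f[l][c]
    return vowels

words = ["the", "cat", "sat", "mat", "hat", "rat"]
print(sorted(sukhotin(words)))  # → ['a', 'e']
```

On a sample this tiny the algorithm is fragile, but on book-length text it separates vowels from consonants surprisingly well, which is one concrete way the clustering seen in the map above could be exploited.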
There are finer distinctions that I may get into one day when I have more time, but the main point here is that contextual similarities of this type measure something valuable from a decipherment perspective. I believe it should be possible to get the general shape of the phonology of an alphabetic writing system, to separate out prepositions or postpositions (if they exist), and to eventually distinguish parts of speech.