I have written some code that maps one network onto another, searching for the best match between the topologies of the two networks.
It can run for a very long time, depending on the size of the networks. Using the top 20 words from chapters 81 and 87 of Moby Dick, and running it for 8.5 minutes, I get a fairly good initial set of matches:
The groups in this graph represent the matches established between words in the two chapters. For seven words (the, for, and, of, was, with, this) I got exact matches. The remaining groups were still incorrectly matched at the end of the 8.5-minute run.
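As a rough illustration of the kind of search involved, here is a minimal Python sketch. It is not the code that produced the graph above. It assumes each network is built from directed edges between consecutive top-N words, that both networks have the same number of nodes, and that the search is a simple swap-based hill climb. The names `word_network`, `score`, and `match_networks` are made up for this example.

```python
import random
from collections import Counter


def word_network(words, top_n):
    """Return the top_n most frequent words and the directed edges
    between consecutive words that are both in that top_n set."""
    common = [w for w, _ in Counter(words).most_common(top_n)]
    keep = set(common)
    edges = {(a, b) for a, b in zip(words, words[1:]) if a in keep and b in keep}
    return common, edges


def score(mapping, edges_a, edges_b):
    """Count the edges of network A that land on an edge of network B."""
    return sum((mapping[a], mapping[b]) in edges_b for a, b in edges_a)


def match_networks(nodes_a, edges_a, nodes_b, edges_b, steps=10000, seed=0):
    """Hill climbing over one-to-one mappings (assumed approach):
    swap the targets of two nodes and keep the swap if the score does not drop.
    Assumes both node lists have the same length (at least 2)."""
    rng = random.Random(seed)
    targets = list(nodes_b)
    rng.shuffle(targets)
    mapping = dict(zip(nodes_a, targets))
    best = score(mapping, edges_a, edges_b)
    for _ in range(steps):
        x, y = rng.sample(nodes_a, 2)
        mapping[x], mapping[y] = mapping[y], mapping[x]
        new = score(mapping, edges_a, edges_b)
        if new >= best:
            best = new
        else:
            # Undo the swap because it made the match worse.
            mapping[x], mapping[y] = mapping[y], mapping[x]
    return mapping, best
```

With two lists of words, say `chapter_a` and `chapter_b`, one would call `word_network(chapter_a, 20)` and `word_network(chapter_b, 20)`, then pass the resulting nodes and edges to `match_networks`. A search like this can stall in a local optimum, which is one plausible reason a run could end with some groups still mismatched.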
Even data of this quality could be useful. Suppose you have a text that must be written in some known language, but you don't know which one. You could try all of the likely languages and generate a set of possible matches between words in your text and words in the candidate languages.