Tuesday, July 22, 2014

Zipf's Law in the Rohonc Codex

I've added a column showing frequencies to the catalog of glyphs that accompanies my in-progress transcription of the Rohonc Codex.
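For anyone who wants to build a frequency column like this, here is a minimal Python sketch. It assumes, hypothetically, that the transcription is stored as plain text with glyph codes (such as C, CO or CX1Q, as in the tables below) separated by whitespace. The real transcription format may differ, and the helper name `glyph_frequencies` is my own illustration.

```python
from collections import Counter


def glyph_frequencies(transcription: str) -> list[tuple[str, int]]:
    """Count glyph codes in a whitespace-separated transcription.

    Hypothetical format: each glyph is written as a code such as "C",
    "CO" or "CX1Q", with glyphs separated by spaces or newlines.
    """
    counts = Counter(transcription.split())
    # most_common() returns (glyph, count) pairs, most frequent first
    return counts.most_common()


sample = "C I D C CO N C I"
for glyph, freq in glyph_frequencies(sample):
    print(glyph, freq)
```

Counting raw codes keeps every variant separate, which makes it easy to merge variants later if they turn out to be the same glyph.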

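Of course, the first thing one wants to do with glyph frequencies is to see whether Rohoncian obeys Zipf's Law. Zipf's Law says that an item's frequency is roughly inversely proportional to its frequency rank, so frequency × rank should stay roughly constant. At first blush, Rohoncian seems not to obey it, because the top ten glyphs have the following distribution: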

Rank  Glyph   Frequency   Frequency × Rank
1     C       4952        4952
2     I       4902        9804
3     D       3816        11448
4     CO      2983        11932
5     N       2588        12940
6     O       2572        15432
7     H       1657        11599
8     IX      1538        12304
9     CX      1402        12618
10    CX1Q    899         8990
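The Frequency × Rank column is easy to reproduce. This Python sketch ranks glyphs by descending frequency and multiplies each frequency by its rank; under Zipf's Law the products should come out roughly equal. The counts are the ten from the table above, and the helper name `zipf_products` is just for illustration.

```python
def zipf_products(freqs: dict[str, int]) -> list[tuple[int, str, int, int]]:
    """Return (rank, glyph, frequency, frequency * rank), most frequent first."""
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    return [
        (rank, glyph, freq, freq * rank)
        for rank, (glyph, freq) in enumerate(ranked, start=1)
    ]


top_ten = {
    "C": 4952, "I": 4902, "D": 3816, "CO": 2983, "N": 2588,
    "O": 2572, "H": 1657, "IX": 1538, "CX": 1402, "CX1Q": 899,
}

for rank, glyph, freq, product in zipf_products(top_ten):
    print(f"{rank:>2}  {glyph:<5} {freq:>5} {product:>6}")
```

The printed products match the table: they range from 4,952 for C up to 15,432 for O, roughly a threefold spread.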

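However, this distribution supports something I had already suspected: CO, C and CX are probably the same glyph. In particular, C, the most frequent glyph, has a frequency × rank of only 4,952, less than half that of most of the glyphs below it, which is what one would expect if some of its occurrences had been recorded under other codes. I separated them in my transcription because I decided it would be easier to merge glyphs later if needed. But I suspected that they might be the same, because they are apparently interchangeable in the Holy Noun.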

If CO, C and CX are merged, then the distribution appears as follows:

Rank  Glyph        Frequency   Frequency × Rank
1     C, CO, CX    9337        9337
2     I            4902        9804
3     D            3816        11448
4     N            2588        10352
5     O            2572        12860
6     H            1657        9942
7     IX           1538        10766
8     CX1Q         899         7192
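Merging variants before ranking can be sketched the same way. This Python sketch maps each variant code to a canonical label (here C, CO and CX all map to "C, CO, CX"), sums their counts and re-ranks. The `merge_variants` helper and the alias-dictionary format are hypothetical, for illustration only.

```python
from collections import Counter


def merge_variants(freqs: dict[str, int], aliases: dict[str, str]) -> dict[str, int]:
    """Sum the counts of glyphs that map to the same canonical label.

    Glyphs missing from `aliases` keep their own label.
    """
    merged = Counter()
    for glyph, freq in freqs.items():
        merged[aliases.get(glyph, glyph)] += freq
    return dict(merged)


def zipf_products(freqs: dict[str, int]) -> list[tuple[int, str, int, int]]:
    """Return (rank, glyph, frequency, frequency * rank), most frequent first."""
    ranked = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    return [(rank, glyph, freq, freq * rank)
            for rank, (glyph, freq) in enumerate(ranked, start=1)]


top_ten = {
    "C": 4952, "I": 4902, "D": 3816, "CO": 2983, "N": 2588,
    "O": 2572, "H": 1657, "IX": 1538, "CX": 1402, "CX1Q": 899,
}
aliases = {"C": "C, CO, CX", "CO": "C, CO, CX", "CX": "C, CO, CX"}

for row in zipf_products(merge_variants(top_ten, aliases)):
    print(*row)
```

Because only the original top-ten counts are fed in, the merged list has eight entries; the glyphs that would fill ranks 9 and 10 come from further down the full catalog.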

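It's still not perfect, but it is much closer to what Zipf's Law predicts: for the top seven glyphs, frequency × rank now stays between 9,337 and 12,860, whereas before merging it ranged from 4,952 to 15,432.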
