Of course, the first thing one wants to do with glyph frequencies is to see if Rohoncian obeys Zipf's Law. At first blush, it would seem not: Zipf's Law predicts that frequency times rank should stay roughly constant, but that product varies widely across the top ten glyphs:
Glyph | Frequency | Frequency * Rank |
C | 4952 | 4952 |
I | 4902 | 9804 |
D | 3816 | 11448 |
CO | 2983 | 11932 |
N | 2588 | 12940 |
O | 2572 | 15432 |
H | 1657 | 11599 |
IX | 1538 | 12304 |
CX | 1402 | 12618 |
CX1Q | 899 | 8990 |
However, this distribution supports something I have suspected already: CO, C and CX are probably the same glyph. I separated them in my transcription because I decided it would be easier to merge glyphs later, but I suspected that they might be the same because they appear to be interchangeable in the Holy Noun.
If CO, C and CX are merged, then the distribution appears as follows:
Glyph | Frequency | Frequency * Rank |
C, CO, CX | 9337 | 9337 |
I | 4902 | 9804 |
D | 3816 | 11448 |
N | 2588 | 10352 |
O | 2572 | 12860 |
H | 1657 | 9942 |
IX | 1538 | 10766 |
CX1Q | 899 | 7192 |
It's still not perfect, but the products now fall within a much narrower range, which is much closer to a normal Zipf distribution.
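For anyone who wants to check the arithmetic, here is a minimal Python sketch of the merge and the frequency-times-rank calculation. The counts are copied from the first table. The function names (`merge_glyphs`, `rank_products`, `spread`) are hypothetical, and the max-to-min ratio used as a measure of "closeness" is only an assumption made for illustration, not a standard goodness-of-fit test. It also looks only at the top ten glyphs, so it is a rough check at best.

```python
# Counts copied from the first table (top ten glyphs only).
counts = {
    "C": 4952, "I": 4902, "D": 3816, "CO": 2983, "N": 2588,
    "O": 2572, "H": 1657, "IX": 1538, "CX": 1402, "CX1Q": 899,
}

def merge_glyphs(freqs, variants, label):
    """Sum the counts of `variants` into one entry named `label`."""
    merged = {g: f for g, f in freqs.items() if g not in variants}
    merged[label] = sum(freqs[g] for g in variants)
    return merged

def rank_products(freqs):
    """Return (glyph, frequency, frequency * rank) rows, most frequent first."""
    ordered = sorted(freqs.items(), key=lambda item: item[1], reverse=True)
    return [(g, f, f * rank) for rank, (g, f) in enumerate(ordered, start=1)]

def spread(rows):
    """Ratio of the largest to the smallest product (assumed measure).

    A perfectly constant frequency * rank would give 1.0.
    """
    products = [product for _, _, product in rows]
    return max(products) / min(products)

merged = merge_glyphs(counts, {"C", "CO", "CX"}, "C, CO, CX")
before = rank_products(counts)
after = rank_products(merged)

for glyph, freq, product in after:
    print(f"{glyph} | {freq} | {product}")

print(f"spread before merging: {spread(before):.2f}")
print(f"spread after merging:  {spread(after):.2f}")
```

A smaller spread after merging is consistent with the idea that C, CO and CX are variants of one glyph, though it does not prove it.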