## Friday, June 21, 2013

### A measure of the distance of phonemic change

It is common in the software world to use a metric called Levenshtein distance to measure how different two strings are. The metric is the smallest number of single-character insertions, deletions and substitutions needed to change one string into the other. I am wondering whether a variation on this theme could be used to measure the magnitude of a phonemic change.
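For reference, here is a standard dynamic-programming version of the metric in Python. The function name `levenshtein` is just an illustrative choice. It keeps only one row of the edit table at a time, and every insertion, deletion and substitution costs 1.

```python
def levenshtein(source: str, target: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions."""
    # previous[j] is the distance between the source prefix handled so far and target[:j]
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]  # turning source[:i] into "" takes i deletions
        for j, t_char in enumerate(target, start=1):
            current.append(min(
                previous[j] + 1,                       # delete s_char
                current[j - 1] + 1,                    # insert t_char
                previous[j - 1] + (s_char != t_char),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))  # 3
# Word pairs from the sample below: each differs by a single substitution.
for a, b in [("huku", "buku"), ("toho", "tovo"), ("aka", "ama")]:
    print(a, b, levenshtein(a, b))       # prints 1 for every pair
```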

For example, say you have a proto-language L0 whose inventory includes /p/ and /f/. In child language L1, the two merge through the change p > f, and a second change, f > h, then gives grandchild language L11. Meanwhile, in a parallel branch, p > b occurs in child language L2, and f > v / V_V in grandchild language L21.

Now suppose you have the following samples from L11 and L21:

L11   huku, apple < L0 *puku
L11   toho, sand < L0 *tofo
L11   aka, turtle < L0 *aka

L21   buku, apple < L0 *puku
L21   tovo, sand < L0 *tofo
L21   ama, turtle < L0 *ama 'crocodile'
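As a sanity check, the sample forms can be regenerated by applying each branch's changes to the proto-forms in chronological order. The Python sketch below does that with regular expressions. It assumes the vowels are a, e, i, o, u, and it assumes that the L21 form tovo comes from intervocalic voicing f > v / V_V. The names `apply_change` and `derive` are hypothetical. The turtle words come from different proto-forms, so each branch is given its own list.

```python
import re

VOWELS = "aeiou"  # assumed vowel inventory for this toy language family

def apply_change(word, target, replacement, env=None):
    """Rewrite every target as replacement; env="V_V" restricts it to intervocalic position."""
    if env == "V_V":
        pattern = f"(?<=[{VOWELS}]){re.escape(target)}(?=[{VOWELS}])"
    else:
        pattern = re.escape(target)
    return re.sub(pattern, replacement, word)

# Ordered (target, replacement, environment) changes for each line of descent.
L11_CHANGES = [("p", "f", None), ("f", "h", None)]   # L0 > L1 > L11
L21_CHANGES = [("p", "b", None), ("f", "v", "V_V")]  # L0 > L2 > L21 (f > v assumed)

def derive(proto, changes):
    """Apply the changes in order, so an earlier change can feed a later one."""
    word = proto
    for target, replacement, env in changes:
        word = apply_change(word, target, replacement, env)
    return word

for proto in ["puku", "tofo", "aka"]:
    print("L11", derive(proto, L11_CHANGES), "< *" + proto)
for proto in ["puku", "tofo", "ama"]:
    print("L21", derive(proto, L21_CHANGES), "< *" + proto)
```

Because the changes are applied in order, p > f feeds f > h in the L11 branch, which is how *puku ends up as huku.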

What I'm looking for is a measure that says the "apple" and "sand" pairs are close but the "turtle" pair is not. By Levenshtein distance, all three pairs are equally far apart (one substitution each), yet you would have to jump through many more hoops to propose a proto-form that accounts for aka and ama as cognates.

If you had a network of plausible sound changes, you could chart the shortest path from one sound to the other and score it by the number of sounds along the chain, like this:

h < f < p > b = 4
h < f > v = 3
m < mb < mp < mk > nk > k = 6
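A shortest-path search over such a network can reproduce these scores. The Python sketch below is one way to do it, under a few assumptions. The network contains only the changes listed in the three chains above. Each change is stored as an undirected edge, because comparing two descendants means walking up to a shared ancestor and back down. A path is scored by the number of sounds on it, as in the chart. `CHANGES`, `build_graph` and `chain_length` are hypothetical names.

```python
from collections import deque

# Each (older, newer) pair is one sound change taken from the chains above.
CHANGES = [
    ("p", "f"), ("f", "h"), ("p", "b"), ("f", "v"),
    ("mk", "mp"), ("mp", "mb"), ("mb", "m"), ("mk", "nk"), ("nk", "k"),
]

def build_graph(changes):
    """Adjacency sets, ignoring the direction of each change."""
    graph = {}
    for older, newer in changes:
        graph.setdefault(older, set()).add(newer)
        graph.setdefault(newer, set()).add(older)
    return graph

def chain_length(graph, start, goal):
    """Number of sounds on the shortest chain linking start and goal, or None if unconnected."""
    if start == goal:
        return 1
    seen = {start}
    queue = deque([(start, 1)])  # breadth-first: (sound, sounds on the chain so far)
    while queue:
        sound, length = queue.popleft()
        for neighbor in graph.get(sound, ()):
            if neighbor == goal:
                return length + 1
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, length + 1))
    return None

graph = build_graph(CHANGES)
for a, b in [("h", "b"), ("h", "v"), ("m", "k")]:
    print(f"{a} ~ {b}: {chain_length(graph, a, b)}")  # 4, 3 and 6, matching the chart
```

Breadth-first search is enough here because every change costs the same. If changes were weighted, for example by how common they are cross-linguistically, Dijkstra's algorithm would be needed instead. Ignoring the direction of change is also a simplification. A stricter version would require both chains to descend from a common ancestor.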

If you had a large enough vocabulary sample, you could score each proposed path of change in terms of how often it could explain the observed evidence, then eliminate those that were excessively implausible or inconsistent with other changes.
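The scoring step could start with a simple count of how many word pairs attest each sound correspondence. Here is a rough Python sketch of that idea. It assumes the words are already aligned segment by segment, with one character per sound, and it uses a hypothetical threshold `min_pairs`. It shows only one possible reading of "how often it could explain the observed evidence", not a complete method.

```python
from collections import Counter

def correspondences(word_a, word_b):
    """Sound pairs that differ at the same position.

    Assumes the two words are already aligned and of equal length; a real
    tool would need a proper alignment step first.
    """
    if len(word_a) != len(word_b):
        raise ValueError("this sketch only handles pre-aligned words of equal length")
    return {(a, b) for a, b in zip(word_a, word_b) if a != b}

def tally(word_pairs):
    """Count how many word pairs attest each sound correspondence."""
    counts = Counter()
    for word_a, word_b in word_pairs:
        counts.update(correspondences(word_a, word_b))
    return counts

def supported(counts, min_pairs=2):
    """Keep correspondences attested in at least min_pairs pairs (hypothetical threshold)."""
    return {corr for corr, n in counts.items() if n >= min_pairs}

SAMPLE = [("huku", "buku"), ("toho", "tovo"), ("aka", "ama")]
counts = tally(SAMPLE)
for (a, b), n in counts.most_common():
    print(f"{a}:{b} attested in {n} pair(s)")
print(supported(counts))  # set(): three pairs are too few to confirm anything
```

With only the three sample pairs, each correspondence is attested once, so nothing passes the threshold. This is why the approach depends on a large vocabulary sample.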