Maurizio Serva - Università dell'Aquila #
Automated languages phylogeny from Levenshtein distance	#
The 
idea that the distance among pairs of languages can be 
evaluated from lexical differences seems to have its roots in 
the work of the French explorer Dumont D'Urville. He collected 
comparative word lists of various languages during his 
voyages aboard the Astrolabe from 1826 to 1829 and, in his 
work about the geographical division of the Pacific, he 
proposed a method to measure the degree of relation between 
languages.  The method used by modern glottochronology, 
developed by Morris Swadesh in the 1950s, measures distances 
from the percentage of shared cognates, which are words with a 
common historical origin. The weak point of this method is 
that subjective judgment plays a significant role. In fact, even 
when cognacy decisions are made by trained and experienced 
linguists, they typically vary from one author to another.  
Recently, we have proposed a new automated method which is 
motivated by an analogy with genetics. The new approach has 
several advantages: first, it avoids subjectivity; second, 
its results can be replicated by other scholars, provided 
that the database is the same; third, it requires no 
specific expertise in linguistics; and last, but surely 
not least, it allows for a rapid 
comparison of a very large number of languages. The distance 
between two languages is defined by considering a renormalized 
Levenshtein distance between pairs of words with the same 
meaning and averaging over the words contained in a list. The 
renormalization, which takes into account the length of the 
words, plays a crucial role, and no sensible results can be 
found without it.  In this paper we give a short review of our 
automated method and illustrate it with the Indo-European 
family and the cluster of Malagasy dialects, showing in both 
cases that it is able to uncover important new aspects of the 
relationships among languages.
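
The distance described above can be illustrated with a minimal Python sketch. The abstract states only that the renormalization takes word length into account; dividing by the length of the longer word is an assumption made here for illustration, and the function names (`levenshtein`, `normalized_distance`, `language_distance`) are hypothetical rather than taken from the paper.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length (assumed normalization)."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 0.0  # two empty strings are treated as identical
    return levenshtein(a, b) / longest


def language_distance(words_a: list[str], words_b: list[str]) -> float:
    """Average normalized distance over two word lists aligned by meaning."""
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and of equal length")
    total = sum(normalized_distance(x, y) for x, y in zip(words_a, words_b))
    return total / len(words_a)
```

With this normalization every word pair contributes a value between 0 and 1, so long words do not dominate the average simply because they leave more room for edits. The two lists are assumed to be aligned so that entries at the same position share a meaning.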