Maurizio Serva - Università dell'Aquila

# Automated languages phylogeny from Levenshtein distance

The idea that the distance between pairs of languages can be evaluated from lexical differences seems to have its roots in the work of the French explorer Dumont d'Urville. He collected comparative word lists of various languages during his voyages aboard the Astrolabe from 1826 to 1829 and, in his work on the geographical division of the Pacific, he proposed a method to measure the degree of relationship between languages. The method used by modern glottochronology, developed by Morris Swadesh in the 1950s, measures distances from the percentage of shared cognates, which are words with a common historical origin. The weak point of this method is that subjective judgment plays a significant role: even when cognacy decisions are made by trained and experienced linguists, they typically vary from author to author.

Recently, we have proposed a new automated method motivated by an analogy with genetics. The new approach has several advantages: first, it avoids subjectivity; second, results can be replicated by other scholars, provided the same database is used; third, it does not require specific expertise in linguistics; and last, but surely not least, it allows a rapid comparison of a very large number of languages. The distance between two languages is defined by computing a renormalized Levenshtein distance between pairs of words with the same meaning and averaging over the words contained in a list. The renormalization, which takes into account the length of the words, plays a crucial role: without it, the method does not yield sensible results. In this paper we give a short review of our automated method and illustrate it with the Indo-European family and the cluster of Malagasy dialects, showing in both cases that it is able to uncover important new aspects of the relationships among languages.
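The language distance described above can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it assumes the renormalization divides each word-pair Levenshtein distance by the length of the longer of the two words, and that the two word lists are aligned by meaning (the i-th entry of each list has the same meaning). The function names `levenshtein`, `normalized_distance` and `language_distance` are hypothetical.

```python
from typing import Sequence


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if characters match)
            ))
        prev = curr
    return prev[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance divided by the longer word's length (assumed renormalization)."""
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def language_distance(list1: Sequence[str], list2: Sequence[str]) -> float:
    """Average normalized distance over word pairs with the same meaning (same index)."""
    if len(list1) != len(list2) or not list1:
        raise ValueError("word lists must be non-empty and aligned by meaning")
    return sum(normalized_distance(a, b) for a, b in zip(list1, list2)) / len(list1)


# Toy example (orthographic forms, not taken from the paper's database):
italian = ["acqua", "notte", "mano"]
spanish = ["agua", "noche", "mano"]
print(round(language_distance(italian, spanish), 3))  # → 0.267
```

Dividing by the longer word keeps every word-pair distance between 0 and 1, so short and long words contribute on a comparable scale to the average; a raw Levenshtein count would let long words dominate.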