Linguistic distance


Linguistic distance is a measure of how different one language or dialect is from another. Although linguists lack a uniform approach to quantifying it, they use the concept in a variety of contexts, such as the learning of additional languages, historical linguistics, language-based conflicts and the effects of language differences on trade.

Measures

Lexicostatistics

The measures proposed for linguistic distance reflect varying understandings of the term itself. One approach is based on mutual intelligibility, i.e. the ability of speakers of one language to understand the other language. On this view, the greater the linguistic distance, the lower the level of mutual intelligibility.
Because cognate words play an important role in mutual intelligibility between languages, they figure prominently in such analyses. The higher the percentage of cognate words shared by two languages, the lower their linguistic distance. Likewise, the greater the degree of grammatical and lexical relatedness, the lower the linguistic distance. For example, the Hindustani word pānch is grammatically identical and lexically similar to its cognate panj in Punjabi and Persian, and it is grammatically identical to, though lexically dissimilar from, Greek pent- and English five. As another example, English dish and German Tisch 'table' are lexically similar but grammatically dissimilar. Cognates in related languages can even be identical in form but semantically distinct, such as caldo and largo, which mean 'hot' and 'wide' respectively in Italian but 'broth, soup' and 'long' in Spanish.
Distances between languages can be calculated statistically by comparing the vocabulary of each language as a whole. In technical terms, the quantity calculated is the Levenshtein distance. On this basis, one study compared both Afrikaans and West Frisian with Dutch to see which was closer to Dutch. It determined that Dutch and Afrikaans were considerably closer to each other than Dutch and West Frisian.
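The Levenshtein distance between two words is the minimum number of single-character insertions, deletions or substitutions needed to turn one into the other. The following is a minimal sketch of how such a word-level distance might be averaged over a list of paired words. It is not the procedure of the study mentioned above: the function names, the normalization by the length of the longer word, and the use of ordinary spellings rather than phonetic transcriptions are all assumptions made for illustration.

```python
from typing import Iterable, Tuple


def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions or
    substitutions needed to turn string a into string b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ""
        for j, ch_b in enumerate(b, start=1):
            cost = 0 if ch_a == ch_b else 1
            current.append(min(
                previous[j] + 1,         # delete ch_a
                current[j - 1] + 1,      # insert ch_b
                previous[j - 1] + cost,  # substitute, or match at no cost
            ))
        previous = current
    return previous[-1]


def normalized_distance(a: str, b: str) -> float:
    """Levenshtein distance scaled to the range 0..1.

    Dividing by the length of the longer word is an assumption of this
    sketch; other normalizations are possible.
    """
    longest = max(len(a), len(b))
    return levenshtein(a, b) / longest if longest else 0.0


def mean_normalized_distance(pairs: Iterable[Tuple[str, str]]) -> float:
    """Average normalized distance over paired words from two languages.

    `pairs` is a hypothetical, hand-aligned word list supplied by the caller.
    """
    distances = [normalized_distance(a, b) for a, b in pairs]
    if not distances:
        raise ValueError("at least one word pair is required")
    return sum(distances) / len(distances)
```

Applied to the examples in the text, `levenshtein("dish", "tisch")` is 2 (one substitution and one insertion), and `levenshtein("five", "pent")` is 4. The pairs `("caldo", "caldo")` and `("largo", "largo")` yield a mean distance of 0.0, which shows that a purely form-based measure of this kind does not register the semantic divergence between the Italian and Spanish words.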
However, lexicostatistical methods, which are based on retentions from a common proto-language rather than on innovations, are problematic for a number of reasons, so some linguists argue that they cannot be relied upon when tracing a phylogenetic tree. Unusual innovativeness or conservativeness of a language can distort linguistic distance and the assumed separation date, examples being the Romani language and the East Baltic languages respectively. On the one hand, continued adjacency of closely related languages after their separation can make some loanwords 'invisible', so that from a lexicostatistical point of view these languages appear less distant than they actually are. On the other hand, strong foreign influence on languages that have spread far from their homeland can make them share fewer inherited words than would otherwise be expected.

Other internal aspects

Besides cognates, similarities of syntax and written form are often measured.
To overcome the aforementioned problems of lexicostatistical methods, Donald Ringe, Tandy Warnow and Luay Nakhleh developed a complex phylogenetic method relying on phonological and morphological innovations in the 2000s.

Language learning

A 2005 paper by economists Barry Chiswick and Paul Miller proposed a metric for linguistic distance based on empirical observations of how rapidly speakers of a given language gained proficiency in another when immersed in a society that overwhelmingly communicated in the latter language. The study examined the speed of English language acquisition among immigrants of various linguistic backgrounds in the United States and Canada.