A Colorful Map Visualizes the Lexical Distances Between Europe’s Languages: 54 Languages Spoken by 670 Million People

Stephen F. Steinbach, a resident of Vienna and a “cartography, language and travel enthusiast, with an engineering background,” is not a linguist. Steinbach, who runs the site Alternative Transport, seems much more interested in mapping and transportation than morphology and etymology. But he has made a contribution to a linguistic concept called “lexical distance” with the map you see above, a colorful 2015 visualization of European languages, grouped together in clusters according to their subfamilies (Italic-Romance, Baltic, Slavic, Germanic, etc.; see a much larger version here).

Straight and arc­ing lines span the rel­a­tive dis­tance these lan­guages have pre­sum­ably trav­eled from each oth­er. Sol­id lines between lan­guages rep­re­sent a very close prox­im­i­ty, dashed lines of dif­fer­ent thick­ness­es show more dis­tance, and thin dot­ted lines tra­verse the great­est expans­es.

Hun­gar­i­an and Ukrain­ian, for exam­ple, have a lex­i­cal dis­tance score of 90, where Pol­ish and Ukrain­ian, both Slav­ic lan­guages, are only 30 degrees from each oth­er. “The map shows the lan­guage fam­i­lies that cov­er the con­ti­nent,” writes Big Think, “large, famil­iar ones like Ger­man­ic, Ital­ic-Romance and Slav­ic, small­er ones like Celtic, Baltic and Ural­ic; out­liers like Semit­ic and Tur­kic; and isolates—orphan lan­guages, with­out a fam­i­ly: Alban­ian and Greek.”  (Tech­ni­cal­ly, mod­ern Greek does have a family—Hellenic—though it is the only sur­viv­ing mem­ber.)

As we might expect from this subset of the durable Indo-European schema, the languages within each clustered group occupy the shortest distance from each other, with some exceptions. Romanian, for example, is slightly closer to Albanian than it is to French, its Romance cousin. The Slavic languages Russian and Polish seem to have traveled a bit farther apart than Polish has from the Baltic language of Lithuanian. What does this mean, exactly? According to the measure of “lexical distance” proposed by Ukrainian linguist Konstantin Tishchenko, it means that closer languages might be more mutually intelligible, at least from a lexical standpoint, since they may share more cognates (words similar in sound and meaning) and borrowings.

Gas­ton Ümlaut, the han­dle of a lin­guist on the Stack Exchange Lin­guis­tics beta, cau­tions that the con­cept of “lex­i­cal dis­tance” may be “pret­ty use­less” giv­en that the com­par­isons also include false cognates—words that sound or look sim­i­lar but have no rela­tion­ship to each oth­er. These could account for some seem­ing incon­sis­ten­cies. (Ümlaut admits he has not read the orig­i­nal arti­cle, writ­ten in Russ­ian. If you are able, you can find it online in the book Metathe­o­ry of Lin­guis­tics, here.) Stein­bach has respond­ed in the same thread.

More recently, the idea received a much more trenchant critique. Steinbach clarified that the theory, and the map, only compare written words and not syntax or speech. “It has nothing to do with grammar, syntax, rhythm or other important features that are important for intelligibility,” he writes. “It also compares a small list of words and not the entire vocabulary of one language to another.” This explanation does cast doubt on whether “lexical distance” is a meaningful concept. I’ll leave it to the linguists to decide. (Steinbach reached out to Tishchenko but has yet to receive a reply.)
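For readers who want a feel for what comparing "a small list of words" could look like in practice, here is a minimal Python sketch. It is not Tishchenko's method or Steinbach's: it assumes a purely spelling-based stand-in (normalized edit distance averaged over paired words and scaled to 0-100), and the function names `edit_distance` and `lexical_distance` are hypothetical. Real lexical-distance scores depend on cognate judgments that a character comparison cannot make.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed with one rolling row of the DP table."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1]


def lexical_distance(words_a, words_b):
    """Hypothetical score: mean normalized edit distance, scaled to 0-100.

    Assumption: the two lists are aligned by meaning, so words_a[i] and
    words_b[i] translate the same concept.
    """
    if len(words_a) != len(words_b) or not words_a:
        raise ValueError("word lists must be non-empty and the same length")
    total = 0.0
    for a, b in zip(words_a, words_b):
        longest = max(len(a), len(b))
        # Two empty strings are treated as identical (distance 0).
        total += edit_distance(a, b) / longest if longest else 0.0
    return 100 * total / len(words_a)


# One-word toy list: "school" in Irish (scoil) and Scottish Gaelic (sgoil).
score = lexical_distance(["scoil"], ["sgoil"])
print(round(score, 1))
```

A sketch like this also shows why the critique lands: the score changes with spelling conventions alone, and a one-word list says nothing about the rest of the vocabulary.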

Tishchenko’s original “lexical distance” map, further up, drawn in 1997, gets the idea across with minimal fuss, but it leaves much to be desired graphically. (A large, hand-drawn color version improves upon the printed map.) Steinbach took his version from a 2008 English-language adaptation made by Teresa Elms (above). In his blog post here, he explains all of the changes he made to Elms’s and Tishchenko’s designs. These include adjusting the size of the “bubbles” to proportionally represent the number of speakers of each language. Steinbach also added several languages, as well as “gravestones” for the dead Anatolian and Tocharian branches. In all, his map shows “54 languages, representing 670 million people.” He adds, vaguely, that “it checks out.”


After post­ing his Lex­i­cal Dis­tance Map, Stein­bach pro­posed a “3D” ver­sion, with the added dimen­sion of time. (See his pre­lim­i­nary sketch above.) The maps are intrigu­ing, the the­o­ry of “lex­i­cal dis­tance” an inter­est­ing one, but we should bear in mind, as Stein­bach writes, that he is “no lin­guist,” and that this idea is hard­ly an ortho­dox one with­in the dis­ci­pline.

via Big Think

Relat­ed Con­tent:

The Tree of Lan­guages Illus­trat­ed in a Big, Beau­ti­ful Info­graph­ic

How Lan­guages Evolve: Explained in a Win­ning TED-Ed Ani­ma­tion

Speak­ing in Whis­tles: The Whis­tled Lan­guage of Oax­a­ca, Mex­i­co       

Josh Jones is a writer and musi­cian based in Durham, NC. Fol­low him at @jdmagness



Comments (13)
  • StevieC says:

    I’ll throw in the fact that Scot­tish Gael­ic and Irish Gael­ic are not the same lan­guage. All of the Irish speak­ers in my Scot­tish Gael­ic class­es do not think it is the same.

  • Valeria says:

    The only issue is that Konstantin Tishchenko is a Ukrainian linguist, not Russian. Even the screenshot of the book page with the map in this article is from the Ukrainian book.

  • Josh Jones says:

    Cor­rect­ed, thank you.

  • Nik says:

    The map should include Iranian and Indian languages (at least the northern part), maybe ancient and modern. The language group is “Indo-European”. It seems to me that lexical distance between Slavic and Indo-Aryan languages might be smaller than between Slavic and Romance or Germanic languages.

  • Michael KILLIAN says:

    Scot­tish and Irish Gael­ic are close­ly relat­ed. I can fol­low much of Scot­tish Gael­ic on TV even though my Irish Gaeilge is rusty. Irish Gael­ic as spo­ken in Done­gal is cer­tain­ly much clos­er to Scot­tish Gael­ic than to Gael­ic spo­ken in West or South of Ire­land. ‘Same lan­guage’ is a very rel­a­tive term. Strict lan­guages were cod­i­fied by nation-states’ nation­al acad­e­mies and cen­tral gov­ern­ments. But it seems clear to me that among Celtic lan­guages, there are com­mon ele­ments between Scot­tish and Irish Gael­ic and between Bre­ton and Welsh. Bre­ton or Welsh speak­ers find learn­ing the oth­er lan­guage of this pair much eas­i­er than tack­ling Gael­ic.

  • Daniel says:

    Metathe­o­ry of Lin­guis­tics is writ­ten in Ukrain­ian, not Russ­ian. And the name of lin­guist is Kos­tiantyn Tyshchenko, not Kon­stan­tin Tishchenko, because Kon­stan­tin is a Russ­ian name, not Ukrain­ian.

  • Steven says:

    There is mon­u­men­tal dis­con­nect in the west­ern world with respect to his­tor­i­cal ori­gin of the names Rus’(~Ukraine), Rus­sia, Grand Tar­taria, Hord, or what they rep­re­sent. The afore­men­tioned epony­mous are well know in the post Sovi­et republics that man­aged to escape the “Union” and there is almost uni­ver­sal agree­ment among them about real his­to­ry and the nature of Krem­lins pseu­do-his­to­ry that it uses as a weapon of war.

    “Russ­ian” is not an organ­ic lan­guage, but a force­ful­ly imposed Old-Bul­gar­i­an (Sec­ond Slav­ic trans­la­tion of the Bible*) over all of the sub­jects that were under Krem­lins** con­trol. The orig­i­nal pol­i­cy was imple­ment­ed by Peter the Great in ~1700 and the fol­low­ing czars kept up the pol­i­cy in order to reuni­fy the old ter­ri­to­ries of the Horde under one “Chris­t­ian” lan­guage. Anoth­er rea­son for specif­i­cal­ly adapt­ing a Slav­ic lan­guage, could have been Krem­lins plan of expan­sion into Slav­ic ter­ri­to­ries and appro­pri­a­tion of their history/culture. Orig­i­nal lan­guages of Grand Tar­taria encom­passed almost whole Finno-Ural­ic fam­i­ly, and to a less­er degree, Turik. Moscow was just one of the cen­ters for col­lect­ing trib­ute which was sent to the Khans in Crimea, Kazan, Astra­han, and Kasi­mov. This was the case since the time of Genghis Khan and until ~1700. Moscow’s rulers them selves were Turik Khans of less­er rank, how­ev­er, today only their Chris­t­ian names are shown in mod­ern his­to­ry books. Since the Khans in the Horde allowed Chris­tian­i­ty to coex­ist with Islam, it was a com­mon prac­tice for the Khans to have a Chris­t­ian name to sup­ple­ment their offi­cial name which can be eas­i­ly seen in Turik records.

    Also, at the time of USSR, the dis­tri­b­u­tion of lan­guages could have been altered even more when Krem­lin killed off 10s of mil­lions of peo­ple (most­ly in Ukraine and Khaz­ak­stan) and force­ful­ly relo­cat­ed Ural­ic peo­ple into new­ly depop­u­lat­ed areas. Fur­ther­more, Krem­lin had almost 300 year long effort of destroy­ing the orig­i­nal cul­tures, writ­ings, and lan­guages of the indige­nous peo­ple (includ­ing Finno-Ural­ic peo­ple of Moscow), in order to achieve a uni­form Empire under the new (Old-Bul­gar­i­an) lan­guage.

    All of these fac­tors have to be tak­en into account by any lin­guist if he hopes to ever recon­struct the actu­al lin­guis­tic his­to­ry of PIE. For those that under­stand Russ­ian, there is a good resource on YouTube chan­nel “История Руси”. Eng­lish speak­ers, instead of rely­ing on Krem­lins pro­pa­gan­da, can con­tact actu­al his­to­ri­ans in the Baltic or Slav­ic states.

    *First Slav­ic trans­la­tion of the Bible was done in Moravia using Glagolic alpha­bet, the sec­ond trans­la­tion known as Old-Bul­gar­i­an was an adap­ta­tion of Glagolic ver­sion to new Cyril­ic alpha­bet and the Bul­gar­i­an lan­guage. It should be not­ed for lin­guis­tic pur­pos­es, Bul­gar­i­ans are Turik peo­ple that raid­ed and set­tled Macedonia/Rus’, lat­er they adapt­ed the sur­round­ing Slav­ic lan­guage.

    **Krem­lin means Fortress in Turik while Moscow means swamps or dirty water in Finnish.

  • Fulka says:

    Where did Putin touch you?

  • Vladimir says:

    Ukrain­ian sci­ence is strong­ly influ­enced by Ukrain­ian nation­al­ism. The idea fix of Ukrain­ian nation­al­ism is to prove that Ukraine has noth­ing to do with Rus­sia. Not sur­pris­ing­ly, in this scheme, pub­lished by the Ukrain­ian researcher Tishchenko, Bul­gar­i­an is the clos­est lan­guage to Russ­ian. From the point of view of con­ven­tion­al sci­en­tif­ic data, this is absurd.

    My native lan­guage is Russ­ian. Bul­gar­i­an is the same as Chi­nese for me. I almost do not under­stand Bul­gar­i­an words and do not catch what the con­ver­sa­tion is about. When I come across a famil­iar word, it does­n’t mean what I think. In Ukrain­ian, I know half of the words and in gen­er­al I under­stand spo­ken lan­guage, although I have nev­er lived in Ukraine.

  • Mariana says:

    Vladimir I don’t know how come you don’t under­stand Bul­gar­i­an but even I can under­stand Bul­gar­i­an not too bad and it’s most­ly because I know Russ­ian and of course oth­er Slav­ic lan­guages are help­ful. Regard­ing Rus­sians under­stand­ing Ukrain­ian sounds fun­ny. If I had a dime for every sin­gle time when I met a Russ­ian who could­n’t under­stand a word from me speak­ing Ukrain­ian and ask­ing to repeat in Russ­ian instead — I would be a mil­lion­aire.

  • Alexander Bojantchev says:

    Why are ukraini­ans in gen­er­al so aller­gic to the con­cept of dialect con­tin­u­um? No lan­guages are homoge­nous and clear­ly lex­i­cal­ly demar­cat­ed, espe­cial­ly if they were with­in the bor­der of a multi­na­tion­al empire such as the Russ­ian, Ottoman or Aus­tro­hun­gar­i­an empires allow­ing for increased inter­ac­tion. Dif­fer­ent ukrain­ian dialects have dif­fer­ent lex­i­cal dis­tance to Russ­ian. Same as dialect con­tin­u­ums of Langue d’Oil and Langue D’oc, Ser­bocroa­t­ian and Bul­gar­i­an, etc. Mod­ern aca­d­e­m­ic ukrain­ian has a lot more polonisms and slo­vak cog­nates com­pared to many east­ern or cen­tral vari­eties of ukrain­ian, which is nor­mal since, the cen­ter of ukrain­ian nation­al­ism is Gali­cia.

  • Kevin Griffin says:

    It’s inter­est­ing that a dashed line, imply­ing greater than 25% dif­fer­ence in vocab­u­lary, links Irish and Scot­tish Gael­ic.

    Is this because they have slight­ly dif­fer­ent spelling con­ven­tions? E.g., Irish for school is scoil but it is spelled sgoil in Scot­tish Gael­ic.

    Also, Irish has short­ened spelling for many words where Scot­tish Gael­ic retains an un-short­ened spelling. E.g., leni­tion is séimhiú (un-short­ened spelling séimhi­ughadh) in Irish and sèimheachadh in Scot­tish Gael­ic. They’re all the same word with the same mean­ing but slight­ly dif­fer­ent spellings.

  • Yuriy says:

    Josh, thanks for this sum­ma­ry. Exact­ly what I was search­ing for and it popped up as the sec­ond link on Google.

    In addition to a previous comment, please note that your “original article, written in Russian” links to an article in Ukrainian.


Open Culture was founded by Dan Colman.