In 2011, The Guardian published an article about how the last two speakers of Ayapaneco would not speak to each other, bringing attention to the notion of dying languages. Ayapaneco is just one of many languages that are vulnerable to or at risk of extinction.
In order to raise awareness about the language extinction phenomenon, UNESCO publishes and updates a list of endangered languages. UNESCO rates the vitality of a language based on demographic criteria like "number of speakers", as well as socio-political factors like "political repression" and "available educational material".
By geographically exploring the data from the UNESCO list, accompanied by more detailed descriptions of the history of the languages, we wish to communicate the story of the endangered languages of Europe. Furthermore, we seek to visualize how politics might influence language vitality by encoding within the map information about the number of official languages spoken in the individual countries.
The purpose of this project is to introduce people with a non-linguistic background to this issue by creating a visualization that supports exploring the data through a simple, user-friendly, interactive experience.
In the map, each circle represents a language. Hovering over a circle shows which language it represents, along with some information about the language, and highlights the countries in which the language is spoken. Clicking a circle changes the color of the countries in which its language is spoken. Only one circle can be selected at a time.
In the legend, features can be toggled on and off to encode more or less information in the circles.
The languages and their corresponding information can also be scrolled through using the table below the map. As with the circles, the languages can be hovered over and clicked to display their location and the countries they are spoken in on the map. Furthermore, three of the columns can be sorted in ascending or descending order using the arrows next to the column headers.
The dataset visualized is obtained from Kaggle, which provides a CSV file of endangered languages published by UNESCO. The dataset consists of a list of endangered languages across the world with the following features: name, countries where the language is spoken, country codes, number of speakers, geolocation, and degree of endangerment. The degree of endangerment ranges from vulnerable to extinct.
Additional information is obtained from Wikipedia: an official language count for each country and a description of each language, both of which are directly encoded into the visualization. It should be noted that the latitude and longitude coordinates provided in the dataset approximate the geographical epicenter of where each language is spoken, and as such the points should be considered a purposive sample.
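The data preparation described above can be sketched in Python. This is only an illustration under stated assumptions, not the project's actual pipeline: the column names, the `OFFICIAL_LANGUAGE_COUNT` lookup, and the `load_languages` helper are hypothetical, and the two rows are placeholder values rather than real entries from the UNESCO list.

```python
import csv
import io

# Placeholder rows with assumed column names; not real UNESCO entries.
SAMPLE_CSV = """\
name,countries,country_codes,speakers,latitude,longitude,endangerment
Language A,Country X,XXX,1200,50.0,10.0,Vulnerable
Language B,"Country X, Country Y","XXX, YYY",40,47.5,8.2,Critically endangered
"""

# Hypothetical lookup standing in for the counts collected from Wikipedia.
OFFICIAL_LANGUAGE_COUNT = {"XXX": 1, "YYY": 4}


def load_languages(text, official_counts):
    """Parse the language rows and attach an official language count per country."""
    records = []
    for row in csv.DictReader(io.StringIO(text)):
        # A language may be spoken in several countries, so split the code list.
        codes = [code.strip() for code in row["country_codes"].split(",")]
        records.append({
            "name": row["name"],
            "country_codes": codes,
            "speakers": int(row["speakers"]),
            # One approximate point per language, as in the dataset.
            "location": (float(row["latitude"]), float(row["longitude"])),
            "endangerment": row["endangerment"],
            "official_counts": {code: official_counts.get(code) for code in codes},
        })
    return records


languages = load_languages(SAMPLE_CSV, OFFICIAL_LANGUAGE_COUNT)
```

Keeping the country codes as a list makes it explicit that a single point on the map can belong to more than one country, which matters for the limitations discussed later.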
From the visualization, we can observe different tendencies. If we toggle the speaker and endangerment features, we find that languages with more speakers are generally less endangered. However, the converse is not necessarily true: among languages with few speakers we do find brightly colored green dots, which indicate that they are not severely endangered.
Furthermore, if we toggle the number of official languages per country, we see that languages spoken in countries with a higher number of official languages tend to be less endangered. For example, if we look at Germany, which has a high number of official languages, we can observe that most of the points are green irrespective of how many speakers there are.
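One way to check the tendency between speaker counts and endangerment outside the map is to compare the median number of speakers per degree of endangerment. The sketch below is an assumption about how such a check could look: the `median_speakers_by_degree` helper is hypothetical and the records are placeholder values, not figures from the dataset. The degree labels follow UNESCO's categories.

```python
from collections import defaultdict
from statistics import median

# UNESCO degrees of endangerment, ordered from least to most severe.
DEGREES = [
    "Vulnerable",
    "Definitely endangered",
    "Severely endangered",
    "Critically endangered",
    "Extinct",
]


def median_speakers_by_degree(records):
    """Return (degree, median speakers) pairs in order of increasing severity."""
    groups = defaultdict(list)
    for degree, speakers in records:
        groups[degree].append(speakers)
    # Skip degrees that do not occur in the input.
    return [(degree, median(groups[degree])) for degree in DEGREES if degree in groups]


# Placeholder (degree, speakers) pairs, not real data.
records = [
    ("Vulnerable", 50000),
    ("Vulnerable", 300),
    ("Severely endangered", 2000),
    ("Critically endangered", 20),
    ("Critically endangered", 60),
]

result = median_speakers_by_degree(records)
for degree, speakers in result:
    print(f"{degree}: {speakers}")
```

The median is used here rather than the mean because a few languages with very large speaker counts would otherwise dominate each group. Note that the placeholder record with 300 speakers in the "Vulnerable" group mirrors the point made above: a small speaker count does not by itself imply severe endangerment.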
However, as we briefly discussed above, the point data is not truly representative of where the language is spoken. This means that the observable correlation between a point on the map (language) and the country where it is located is not necessarily precise.
To address the problem of languages being associated with more than one country, it would be more suitable to represent the distribution of endangerment versus the number of official languages by viewing languages as polygons instead of dots. However, for the purpose of exploring the data, being able to pinpoint each language using one coordinate gives a better overview of the distribution of the languages.