5 Dec 2022
06:30 UTC

Linguistic repositories as assets

Linguistic description requires a large amount of linguistic data to describe processes, varieties, and languages, to validate linguistic theories, and even to define a standard grammar. For minority or non-hegemonic languages, good-quality authentic linguistic data is not always available, nor is it always easily accessible. Where such data does exist, it is often organized into unsystematic repositories, that is, linguistic data collections.
An analogy with seed vaults shows why this process should be professionalized. The Global Seed Vault lies within the Arctic Circle, in Norway’s Svalbard islands. It is a structure dug into the ground, with chambers at 100 m that keep the seeds at −18 °C, and it is designed for long-term storage. As in a bank, specimens are deposited; wheat, rice, and potatoes are the species with the most specimens. A seed bank can be used to restore food security in the face of threats such as catastrophes, wars, or even destruction caused by poverty. Because of the war in Syria, for example, the seed bank in Aleppo requested the withdrawal of its seeds from the Global Seed Vault. Seed vaults may also be regarded as a heritage of crop genetic diversity, almost a biblical Noah’s ark. Just as the vault holds seeds of species that meet humanity’s demand for food, it also stores species across their evolutionary spectrum.
The same logic can be applied to languages. Ethnologue currently counts close to 7,000 documented living languages in the world. However, languages are being lost in a process called "language death", and each language that dies is a culture that is lost. In a process of dominance that erases cultures, a small number of languages, the hegemonic languages, account for a large share of the world’s speakers. This process resembles a Tower of Babel in reverse. Contrary to the biblical myth, however, homogeneity is not a positive outcome for languages. Language is a heritage that gives access to the cultural assets of peoples and nations.
The analogy with the need to save seeds is the starting point for reflecting on language and on the transformation of linguistic data collections into linguistic repositories. Proposing a standardized, professional language repository to host the linguistic data collections arising from the reported projects, in accordance with the principles of the Open Science movement, involves challenges that must be overcome. With regard to the sustainability of projects that build linguistic documentation repositories, partnerships with the information technology field, or even with private companies, could reduce problems of obsolescence and data safeguarding by promoting data circulation and the automation of analysis through natural language processing algorithms. Together, these planning actions may help promote the longevity of linguistic documentation repositories.