Manana Tandaschwili
Frankfurt University
Head of the Department for Caucasian Languages
Frankfurt a/M, Germany
Georgian Language in the Context of the New Paradigm of Science
(New Classification of Languages and Modern Status of the Georgian Language)
Abstract
The 21st century has brought significant changes to various fields of science, largely due to the rapid development of information technologies. The philosophy of science has evolved, introducing new challenges for languages. While the traditional classification paradigm included genetic, typological, and relational classifications, a new paradigm has emerged. This new paradigm categorizes languages according to different parameters: 1) the legal status of the language, 2) the viability of the language and its areas of use, and 3) the degree of operation of languages in the digital age. According to this new classification, languages are now categorized into high-resource languages (HRL) and low-resource languages (LRL). Consequently, the traditional Language Atlas, which reflects the genealogical classification or distribution area of languages, has been replaced by a new Language Atlas that shows the percentage of digital resources available for natural language processing (NLP) in relation to the number of languages spoken worldwide.
In both Georgian and international scientific literature and portals (Nancy et al., 2017), the Georgian language is classified as a low-resource language (LRL). This presentation examines whether that classification is appropriate for Georgian in the context of the new scientific paradigm.
According to the theoretical literature, LRLs can be understood as less studied, resource-scarce, less computerized, less privileged, and less commonly taught languages (Cieri et al., 2016:4543-44; Magueresse et al., 2020:1). Artificial intelligence experts define LRLs as languages with a limited amount of linguistic data and resources for natural language processing (NLP) tasks. In the context of NLP and machine learning, having “low resources” typically means a lack of annotated texts, linguistic databases, or other resources necessary to train and develop effective language models. Reasons a language might be considered low-resource include a minimal digital presence, a lack of annotated datasets, underrepresentation in academic research, or insufficient computational resources.
Given these definitions and the fact that both large databases and advanced language technologies are available for Georgian (Tandashvili & Kamarauli, 2021:87 ff.), I argue that the term “Low Resource Language” does not adequately apply to Georgian, in terms of either content or context.
In my opinion, the third parameter of the new language classification (the degree of operation of languages in the digital age) should be revised. Specifically, a multi-level system should be developed instead of the current binary classification (HRL vs. LRL) to better reflect the functionality of languages today. I propose the following intermediate levels:
Keywords: Digital Kartvelology, Georgian Language, Low Resource Languages