User:Rantakaulio/GSoC2021Proposal

Finnish, Olonets-Karelian and Karelian lexicon development

The three languages targeted by this application are closely related Balto-Finnic languages spoken in geographical proximity to one another. Finnish is a large majority language with very advanced NLP infrastructure, whereas Olonets-Karelian and Karelian represent two orthographies within the Eastern Finnic dialect continuum. Both Olonets-Karelian and Karelian are used in writing and have some linguistic resources, such as Universal Dependencies treebanks, but resources for them remain scarce. One of the current infrastructure problems is this imbalance: some languages and language pairs are much better covered than others. This application aims to bring the three closely related language pairs to comparable levels of coverage.