User:Rantakaulio/GSoC2021Proposal

From Apertium
Revision as of 11:36, 13 April 2021 by Rantakaulio

Finnish, Olonets-Karelian and Karelian lexicon development

The three languages targeted by this application are closely related Balto-Finnic languages spoken in geographical proximity to one another. Finnish is a large majority language with very advanced NLP infrastructure, whereas Olonets-Karelian and Karelian represent two orthographies within the Eastern Finnic dialect continuum. Both Olonets-Karelian and Karelian are used in writing and have some linguistic resources, such as Universal Dependencies treebanks, but these resources remain very scarce. One current infrastructure problem is imbalance: some languages and language pairs are much better covered than others. The proposed application aims to bring the three language pairs formed by these closely related languages (Finnish–Olonets-Karelian, Finnish–Karelian and Olonets-Karelian–Karelian) to comparable levels of coverage.