Turkic MT Improvements GSoC2019 report
The aim of this project was to improve the following Apertium language pairs: tur->uig, uzb->tur, kir->tur, tat->tur.
My commits can be found [here]. You can also download my work as a [zip file].
Corpora and Coverage
|Tur-Uig||53505239 words, 82.3% cov||178233 words, 93.0% cov|
|Uzb-Tur||12730161 words, 80.8% cov||184447 words, 81.1% cov|
|Kir-Tur||11435418 words, 82.5% cov||184808 words, 92.0% cov|
|Tat-Tur||--||178220 words, 91.4% cov|
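The coverage figures above are the share of corpus tokens for which the morphological analyser returns at least one analysis. As a rough illustration of how such a number can be computed, here is a minimal sketch that counts lexical units in Apertium stream format, where an unknown word is output with an analysis starting with `*`. The function name `coverage` and the sample string (including its tags) are made up for illustration, and the sketch ignores escaped characters in the stream, so it is not the exact script used for this report.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# (simplified: escaped characters such as \/ are not handled in this sketch)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the share of lexical units that have at least one known analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # The analyser marks an unknown word by prefixing its analysis with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output: three known words and one unknown word.
sample = (
    "^ev/ev<n><nom>$ ^xyzzy/*xyzzy$ "
    "^ve/ve<cnjcoo>$ ^geldi/gel<v><iv><past><p3><sg>$"
)
print(f"{coverage(sample):.1%}")  # 3 of 4 units are known
```

In practice the input would be the output of the pair's morphological analyser run over the whole corpus rather than a hand-written string.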
Apertium uses Constraint Grammar (CG) to disambiguate the lemma and morphological analysis of each word, so that the word can be translated correctly into the target language.
Lexical selection rules then determine which translation of a given lemma is chosen in which context.