Turkic MT Improvements GSoC2019 report
The aim of this project was to improve the following Apertium language pairs: tur->uig, uzb->tur, kir->tur, tat->tur.
My commits can be found [here]. You can also download my work as a [zip file].
Corpora and Coverage
|Tur-Uig||53505239 words, 82.3% cov||178233 words, 93.0% cov|
|Uzb-Tur||12730161 words, 80.8% cov||184447 words, 81.1% cov|
|Kir-Tur||11435418 words, 82.5% cov||184808 words, 92.0% cov|
|Tat-Tur||--||178220 words, 91.4% cov|
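The coverage figures above are the share of corpus tokens for which the morphological analyser returns at least one analysis. As a minimal sketch of how such a number can be computed, the snippet below counts lexical units in Apertium stream format, where an unknown token has an analysis beginning with `*`. The function name `coverage` and the sample string are assumptions made for illustration; the tags in the sample are not taken from the actual analysers, and this is not necessarily the script used to produce the table.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        # The analyser marks an unknown token by prefixing it with '*'.
        if not match.group(2).startswith("*"):
            known += 1
    return known / total if total else 0.0


# Invented sample: four tokens, one of them unknown to the analyser.
sample = "^ev/ev<n><nom>$ ^xyzzy/*xyzzy$ ^geldi/gel<v><iv><past><p3><sg>$ ^./.<sent>$"
print(f"{coverage(sample):.1%}")  # 75.0%
```

In practice the input would be the analyser's output for a whole corpus rather than a hand-written string.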
Before a word can be translated correctly into the target language, Apertium has to pick the correct lemma and morphological analysis from the candidates the analyser offers. It uses Constraint Grammar (CG) for this disambiguation step. Currently Uyghur has about 45 CG rules for disambiguation.
Lexical selection determines which translation of a given lemma is chosen in which context. Currently uig-tur has 35 lexsel rules.
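The idea behind a lexical selection rule can be sketched in a few lines of Python. This is only a toy model and not Apertium's rule format or implementation: the class `LexselRule`, the function `select_translation`, and the placeholder lemmas (`SRC`, `CTX`, `TGT_DEFAULT`, `TGT_SPECIAL`) are all hypothetical. Each rule says that a source lemma takes a particular translation when a given lemma follows it, and a default translation is used otherwise.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LexselRule:
    source_lemma: str  # lemma the rule applies to
    next_lemma: str    # lemma that must directly follow it
    translation: str   # translation chosen when the context matches


def select_translation(lemmas, index, rules, defaults):
    """Pick a translation for lemmas[index] based on the following lemma."""
    current = lemmas[index]
    following = lemmas[index + 1] if index + 1 < len(lemmas) else None
    for rule in rules:
        if rule.source_lemma == current and rule.next_lemma == following:
            return rule.translation
    # No rule matched: fall back to the default translation.
    return defaults[current]


# Placeholder data, not real Turkic lemmas.
rules = [LexselRule("SRC", "CTX", "TGT_SPECIAL")]
defaults = {"SRC": "TGT_DEFAULT"}

print(select_translation(["SRC", "CTX"], 0, rules, defaults))    # TGT_SPECIAL
print(select_translation(["SRC", "OTHER"], 0, rules, defaults))  # TGT_DEFAULT
```

Real rules can match on richer context, such as tags or words further away, but the principle of a context-triggered choice with a default fallback is the same.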