Turkic MT Improvements GSoC2019 report
The aim of this project was to improve the following language pairs of Apertium: tur->uig, uzb->tur, kir->tur, tat->tur.
My commits can be found at the links below. You can also download my work as a [zip file].
https://github.com/apertium/apertium-tur-uzb/commits?author=koguzhan
https://github.com/apertium/apertium-tur/commits?author=koguzhan
https://github.com/apertium/apertium-uzb/commits?author=koguzhan
https://github.com/apertium/apertium-uig-tur/commits?author=koguzhan
https://github.com/apertium/apertium-uig/commits?author=koguzhan
https://github.com/apertium/apertium-tur-tat/commits?author=koguzhan
https://github.com/apertium/apertium-tat/commits?author=koguzhan
https://github.com/apertium/apertium-tur-kir/commits?author=koguzhan
https://github.com/apertium/apertium-kir/commits?author=koguzhan
Corpora and Coverage
|Tur-Uig||53505239 words, 82.3% cov||178233 words, 93.0% cov|
|Uzb-Tur||12730161 words, 80.8% cov||184447 words, 81.1% cov|
|Kir-Tur||11435418 words, 82.5% cov||184808 words, 92.0% cov|
|Tat-Tur||5792382 words, 86.4% cov||178220 words, 91.4% cov|
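For readers unfamiliar with the coverage figures above, the following is a minimal sketch of how naive coverage can be computed from morphological analyser output. It assumes the standard Apertium stream format, in which each token appears as `^surface/analysis$` and an unknown word is marked with a leading `*`. The function name `naive_coverage`, the sample sentence and its tags are illustrative assumptions, not taken from the project; escaped characters in the stream are ignored for brevity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \^ or \/ are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that received at least one analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        total += 1
        analyses = match.group(2)
        # An unknown word has a single "analysis" that starts with '*'.
        if not analyses.startswith("*"):
            known += 1
    return known / total if total else 0.0


# Hypothetical analyser output: three known tokens and one unknown token.
sample = (
    "^kitap/kitap<n><nom>$ ^xyzzy/*xyzzy$ "
    "^ev/ev<n><nom>$ ^geldi/gel<v><iv><past><p3><sg>$"
)
print(f"{naive_coverage(sample):.1%}")  # prints 75.0%
```

Naive coverage only says that a token received some analysis; it does not say that the analysis is the right one.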
Apertium uses Constraint Grammar (CG) to choose the correct lemma and morphological analysis of each word in context, so that the word is translated correctly into the target language.
Lexical selection determines which translation of a given lemma is chosen in which context.
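The idea behind lexical selection can be illustrated with a toy sketch. This is not Apertium's rule format or implementation: the rule table, the default table and all lemma names (`SRC`, `CTX_A`, `TGT_A`, and so on) are hypothetical placeholders, and the context is simplified to the immediately neighbouring lemmas.

```python
from typing import Dict, List, Tuple

# Hypothetical rules: (source lemma, neighbouring lemma) -> preferred translation.
RULES: Dict[Tuple[str, str], str] = {
    ("SRC", "CTX_A"): "TGT_A",
    ("SRC", "CTX_B"): "TGT_B",
}

# Hypothetical default translations, used when no rule matches.
DEFAULTS: Dict[str, str] = {"SRC": "TGT_DEFAULT"}


def select_translation(lemmas: List[str], index: int) -> str:
    """Pick a translation for lemmas[index] based on its left and right neighbours."""
    lemma = lemmas[index]
    neighbours = []
    if index > 0:
        neighbours.append(lemmas[index - 1])
    if index + 1 < len(lemmas):
        neighbours.append(lemmas[index + 1])
    # The first rule whose context lemma is adjacent to the word wins.
    for neighbour in neighbours:
        translation = RULES.get((lemma, neighbour))
        if translation is not None:
            return translation
    # Fall back to the default translation, or leave the lemma unchanged.
    return DEFAULTS.get(lemma, lemma)


print(select_translation(["CTX_A", "SRC"], 1))  # prints TGT_A
print(select_translation(["SRC", "OTHER"], 0))  # prints TGT_DEFAULT
```

The same source lemma receives a different translation depending on its neighbours, with a default covering the contexts no rule mentions.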