Turkic MT Improvements GSoC2019 report
The aim of this project was to improve the following Apertium language pairs: tur->uig, uzb->tur, kir->tur and tat->tur.
Commits
My commits can be found in each of the following repositories:
Tur-Uzb, Tur, Uzb, Uig-Tur, Uig, Tur-Tat, Tat, Tur-Kir, Kir
Transfer
Transfer rules were written for tur->uig and uzb->tur, using regression tests. They can be found here: Uighur and Uzbek.
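Regression tests of this kind pair a source sentence with its expected translation, and a rule change is typically accepted only if the pipeline still reproduces the expected outputs. The following is a generic harness sketching that check, not the Apertium project's own test script. `run_regression`, `Failure` and the toy `translate` stand-in are hypothetical names; a real setup would call the actual translation pipeline (for example through the `apertium` command) instead of the toy lookup.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Failure:
    """One regression case whose output did not match the expectation."""
    source: str
    expected: str
    actual: str


def run_regression(cases: Iterable[tuple[str, str]],
                   translate: Callable[[str], str]) -> tuple[int, list[Failure]]:
    """Translate each source sentence and compare it with the expected output.

    Returns the number of passing cases and the list of failures.
    Leading and trailing whitespace is ignored when comparing.
    """
    passed = 0
    failures: list[Failure] = []
    for source, expected in cases:
        actual = translate(source)
        if actual.strip() == expected.strip():
            passed += 1
        else:
            failures.append(Failure(source, expected, actual))
    return passed, failures


# Toy stand-in for the real translation pipeline, for illustration only.
toy = {"src-1": "tgt-1", "src-2": "tgt-2-wrong"}
cases = [("src-1", "tgt-1"), ("src-2", "tgt-2")]
passed, failures = run_regression(cases, lambda s: toy.get(s, ""))
print(f"{passed}/{len(cases)} passed")
for f in failures:
    print(f"FAIL {f.source!r}: expected {f.expected!r}, got {f.actual!r}")
```

Passing the translator in as a function keeps the harness independent of how the pipeline is invoked, so the same check can be reused for each language pair.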
Corpora and Coverage
Pair | Wiki (words) | Wiki (coverage) | Bible (words) | Bible (coverage) |
---|---|---|---|---|
Tur-Uig | 53,505,239 | 82.3% | 178,233 | 93.0% |
Uzb-Tur | 12,730,161 | 80.8% | 184,447 | 83.5% |
Kir-Tur | 11,435,418 | 82.5% | 184,808 | 92.0% |
Tat-Tur | 5,792,382 | 86.4% | 178,220 | 91.4% |
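The report does not say exactly how coverage was measured. A common Apertium measure is naive coverage: the share of corpus tokens that the morphological analyser recognises, where lttoolbox marks an unknown word with an asterisk (e.g. `^foo/*foo$`). The sketch below assumes that definition and takes analyser output in Apertium stream format (such as the output of `lt-proc`) as a string; the function name `naive_coverage` is hypothetical, and escape handling is simplified.

```python
import re

# Matches one lexical unit in Apertium stream format, e.g. ^kitap/kitap<n>$.
# Backslash-escaped characters (such as \$ or \/) are allowed inside a unit.
LU_PATTERN = re.compile(r"\^((?:\\.|[^\\$])*)\$")


def naive_coverage(stream: str) -> tuple[int, int, float]:
    """Return (known, total, coverage) for analyser output in stream format.

    A unit counts as unknown when its first analysis starts with '*',
    the marker lttoolbox uses for words missing from the dictionary.
    Units with no analysis at all are also counted as unknown.
    """
    known = total = 0
    for match in LU_PATTERN.finditer(stream):
        unit = match.group(1)
        # Split off the surface form at the first '/' not preceded by a backslash
        # (simplification: an escaped backslash right before '/' is not handled).
        parts = re.split(r"(?<!\\)/", unit, maxsplit=1)
        total += 1
        if len(parts) == 2 and not parts[1].startswith("*"):
            known += 1
    coverage = known / total if total else 0.0
    return known, total, coverage


sample = "^Bu/bu<prn><dem>$ ^bir/bir<num>$ ^kitapx/*kitapx$^./.<sent>$"
known, total, cov = naive_coverage(sample)
print(f"{known}/{total} = {cov:.1%}")  # prints "3/4 = 75.0%"
```

Naive coverage only checks that some analysis exists, not that the right one is found, so it is an upper bound on what the analyser actually gets right.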
Future Plans
The Uzbek lexicon still needs to be improved. Analysis of Uzbek can be problematic because of the language's unusual alphabet and its non-standard forms, and this also needs further work. More lexical selection, disambiguation and transfer rules are needed to achieve higher translation quality on all pairs.