Turkic MT Improvements GSoC2019 report
Revision as of 15:34, 20 August 2019
The aim of this project was to improve the following Apertium language pairs: tur->uig, uzb->tur, kir->tur, tat->tur.
Commits
My commits can be found [here]. You can also download my work as a [zip file].
Transfer
Transfer rules for tur->uig and uzb->tur were written with the help of regression tests. They can be found here: Uighur and Uzbek.
Corpora and Coverage
Corpus | Words | Coverage |
---|---|---|
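
Coverage figures of this kind are usually understood as the share of corpus tokens that receive at least one analysis. The sketch below shows one naive way to compute that from analyser output in the Apertium stream format, where an unknown word is marked with `*` (as in `^xyzzy/*xyzzy$`). The function name, the sample sentence and its tags are illustrative assumptions, not taken from the project, and escaped characters in the stream are ignored for simplicity.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped characters such as \/ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that have a non-unknown analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An analysis beginning with '*' means the analyser did not know the word.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output: three known words and one unknown word.
sample = (
    "^kitap/kitap<n><nom>$ ^okudum/oku<v><tv><past><p1><sg>$ "
    "^xyzzy/*xyzzy$ ^ev/ev<n><nom>$"
)
print(f"{naive_coverage(sample):.2%}")
```

In practice the input would be the output of the pair's morphological analyser run over the whole corpus rather than a hard-coded string.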
Disambiguation
Apertium uses Constraint Grammar (CG) to disambiguate each word's lemma and morphological analysis so that it can be translated correctly into the target language. Uyghur currently has about 45 CG disambiguation rules.
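To illustrate the idea behind a CG rule, here is a toy Python analogue of a `SELECT` rule with a single positional context. It is a sketch only, not the CG engine and not one of the Uyghur rules: the `Reading` class, the `select` function, the tags and the Turkish example (`yüz`, which can be a noun, a numeral or a verb) are all assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Reading:
    lemma: str
    tags: frozenset


def select(cohorts, target_tag, offset, context_tag):
    """Toy SELECT: keep only readings with target_tag when the cohort at
    the given relative offset has some reading carrying context_tag."""
    out = []
    for i, readings in enumerate(cohorts):
        j = i + offset
        context_ok = 0 <= j < len(cohorts) and any(
            context_tag in r.tags for r in cohorts[j]
        )
        matching = [r for r in readings if target_tag in r.tags]
        # Apply the rule only if it leaves at least one reading behind.
        if context_ok and matching:
            out.append(matching)
        else:
            out.append(list(readings))
    return out


# Hypothetical ambiguous input: "yüz kitap" (numeral + noun).
cohorts = [
    [
        Reading("yüz", frozenset({"n"})),
        Reading("yüz", frozenset({"num"})),
        Reading("yüz", frozenset({"v", "imp"})),
    ],
    [Reading("kitap", frozenset({"n"}))],
]

# Roughly: select the numeral reading if the next word can be a noun.
result = select(cohorts, "num", 1, "n")
print(result[0])
```

Real CG rules are written in the CG-3 rule language and support much richer contexts; the point here is only that a rule narrows a set of candidate readings based on neighbouring words.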
Lexical Selection
Lexical selection determines which translation of a given lemma is chosen in which context. The uig-tur pair currently has 35 lexical selection (lexsel) rules.
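The mechanism can be sketched in a few lines of Python: a rule names a source lemma, a neighbouring lemma that acts as context, and the translation to prefer when that context is present; otherwise a default translation is used. Everything here is hypothetical: the rule table, the function name and the example (Turkish `dil`, with English glosses standing in for target-language lemmas) are assumptions for illustration and are not taken from the uig-tur rules.

```python
# Each rule: (source lemma, relative offset of context word,
#             context lemma, translation to select).
# English glosses stand in for target-language lemmas (assumption).
RULES = [
    ("dil", 1, "öğren", "language"),
]

# Translation used when no rule matches.
DEFAULTS = {"dil": "tongue"}


def select_translation(lemmas, i):
    """Pick a translation for lemmas[i] using the first matching rule."""
    lemma = lemmas[i]
    for source, offset, context, translation in RULES:
        j = i + offset
        if source == lemma and 0 <= j < len(lemmas) and lemmas[j] == context:
            return translation
    # No rule fired: fall back to the default, or leave the lemma untouched.
    return DEFAULTS.get(lemma, lemma)


print(select_translation(["dil", "öğren"], 0))
print(select_translation(["dil", "acı"], 0))
```

Actual lexsel rules are written in Apertium's XML rule format and match on lemmas and tags rather than bare strings.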