Turkic MT Improvements GSoC2019 report


Revision as of 10:43, 25 August 2019 by Oğuz (talk | contribs)


The aim of this project was to improve the following Apertium language pairs: tur->uig, uzb->tur, kir->tur, tat->tur.

Contents

1 Commits
2 Transfer
3 Corpora and Coverage
4 Disambiguation
5 Lexical Selection

Commits

My commits can be found in the repositories listed below. You can also download my work as a zip file.

https://github.com/apertium/apertium-tur-uzb/commits?author=koguzhan
https://github.com/apertium/apertium-tur/commits?author=koguzhan
https://github.com/apertium/apertium-uzb/commits?author=koguzhan
https://github.com/apertium/apertium-uig-tur/commits?author=koguzhan
https://github.com/apertium/apertium-uig/commits?author=koguzhan
https://github.com/apertium/apertium-tur-tat/commits?author=koguzhan
https://github.com/apertium/apertium-tat/commits?author=koguzhan
https://github.com/apertium/apertium-tur-kir/commits?author=koguzhan
https://github.com/apertium/apertium-kir/commits?author=koguzhan

Transfer

Transfer rules were written for tur->uig and uzb->tur, guided by regression tests. They can be found here: Uighur and Uzbek.

Corpora and Coverage

Language pair	Wiki	Bible
Tur-Uig	53505239 words, 82.3% cov	178233 words, 93.0% cov
Uzb-Tur	12730161 words, 80.8% cov	184447 words, 81.1% cov
Kir-Tur	11435418 words, 82.5% cov	184808 words, 92.0% cov
Tat-Tur	5792382 words, 86.4% cov	178220 words, 91.4% cov
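
As a rough illustration of how a coverage figure like those in the table can be computed, the sketch below counts the share of tokens that a morphological analyser recognises. It assumes the analyser output is in Apertium stream format, where an unknown word carries a leading asterisk in its analysis. The function name coverage, the regular expression and the sample string (including its tags) are made up for illustration, and escaped characters in the stream are not handled. It is not the script that produced the numbers above.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped '/' or '$' inside the unit)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that received at least one analysis."""
    total = 0
    known = 0
    for _surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        total += 1
        # An analysis that starts with '*' marks a word unknown to the analyser.
        if not analyses.startswith("*"):
            known += 1
    return known / total if total else 0.0


# Hypothetical analyser output: four tokens, one of them unknown.
sample = (
    "^ev/ev<n><nom>$ ^xyzzy/*xyzzy$ "
    "^ve/ve<cnjcoo>$ ^geldi/gel<v><iv><past><p3><sg>$"
)
print(f"{coverage(sample):.1%}")  # prints 75.0%
```

In the sample, three of the four tokens have an analysis, so the function reports 75.0% coverage.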

Disambiguation

Apertium uses Constraint Grammar (CG) for disambiguation: CG rules select the correct lemma and morphological analysis of each word so that it can be translated correctly into the target language.
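
The idea behind a CG rule can be sketched in a few lines of Python. This is a toy model, not CG syntax and not one of the rules written for this project: each word has a list of candidate readings, and a REMOVE-style rule discards readings with a given tag when the previous word unambiguously carries a context tag. As in CG, the last remaining reading of a word is never removed. The function name, the tag names and the example words are assumptions chosen for illustration.

```python
from typing import List, Set

Reading = Set[str]       # lemma plus tags, e.g. {"yüz", "n"}
Cohort = List[Reading]   # all candidate readings of one word


def remove_if_preceded_by(
    cohorts: List[Cohort], target_tag: str, context_tag: str
) -> List[Cohort]:
    """Toy REMOVE-style rule.

    Drop readings containing target_tag when every reading of the
    previous word (in the original input) contains context_tag.
    """
    result: List[Cohort] = []
    for i, cohort in enumerate(cohorts):
        previous = cohorts[i - 1] if i > 0 else None
        applies = previous is not None and all(context_tag in r for r in previous)
        if applies:
            kept = [r for r in cohort if target_tag not in r]
            # Never remove the last reading of a word.
            if kept:
                cohort = kept
        result.append(cohort)
    return result


# Hypothetical input: "yüz" is ambiguous between a noun and a verb reading.
sentence = [
    [{"bir", "det"}],
    [{"yüz", "n"}, {"yüz", "v", "imp"}],
]
disambiguated = remove_if_preceded_by(sentence, target_tag="v", context_tag="det")
print(disambiguated[1])
```

After the rule is applied, only the noun reading of the second word remains.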

Lexical Selection

Lexical selection determines which translation of a given lemma is chosen in which context.
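
A minimal sketch of context-dependent selection follows. It does not use Apertium's rule format and does not reproduce any rule from this project: each rule says "if this source lemma has this lemma as an immediate neighbour, choose this translation", and a default translation is used otherwise. The function name, the rule table and the English glosses are hypothetical and serve only as an example.

```python
from typing import Dict, List, Tuple

# Hypothetical data: default translations and (source, neighbour, translation) rules.
DEFAULTS: Dict[str, str] = {"yüz": "face"}
RULES: List[Tuple[str, str, str]] = [
    ("yüz", "iki", "hundred"),
    ("yüz", "üç", "hundred"),
]


def select_translation(
    lemmas: List[str],
    index: int,
    defaults: Dict[str, str],
    rules: List[Tuple[str, str, str]],
) -> str:
    """Pick a translation for lemmas[index] based on its immediate neighbours."""
    lemma = lemmas[index]
    # Immediate left and right neighbours, where they exist.
    neighbours = set(lemmas[max(0, index - 1):index] + lemmas[index + 1:index + 2])
    for source, context, target in rules:
        if source == lemma and context in neighbours:
            return target
    # No rule matched: fall back to the default, or leave the lemma unchanged.
    return defaults.get(lemma, lemma)


print(select_translation(["iki", "yüz"], 1, DEFAULTS, RULES))   # prints hundred
print(select_translation(["onun", "yüz"], 1, DEFAULTS, RULES))  # prints face
```

The first matching rule wins, so more specific rules would have to be listed before more general ones in this toy version.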
