GSoC 2018 proposal draft to create and develop a Turkmen-Turkish translation pair.
Name: Özge Kılıç
Time zone: UTC+3
Why is it that you are interested in Apertium?
I'm a student of English Translation & Interpreting. My intention is to get a master's degree in Linguistics, and I think this is an excellent start.
Proposal: Turkmen-Turkish MT
Which of the published tasks are you interested in? What do you plan to do?
My plan is to adopt an unreleased language pair, tuk-tur. I'll be working on it to bring it up to release quality, which will involve writing and refining rules for transfer and lexical selection that will result in a valid text in the target language.
Why should Google and Apertium sponsor it?
Since few Turkmen-Turkish resources exist, this machine translation pair is a great opportunity for the two nations to understand each other.
Facilitating MT of a text from Turkmen to Turkish.
Adding words to bidix, coverage up to 50%
Adding words to bidix, get coverage to around 80%
Begin CG for TUK
POS tagging/constraint grammar
Get CG rules up to 100, ~50% disambiguation
Creation of an Annotated Corpus
Plan by Weeks
1. 30% coverage
2. Basic CG
3. 40% coverage
5. 50% coverage
6. Transfer, lexical selection, 65% coverage
7. CG, 80% coverage
8. Transfer, lexsel, 84% coverage
10. CG, Transfer
11. Transfer, lexsel, 86% coverage
12. Transfer, 88% coverage
13. Preparing text for annotation
14-16. Annotating the Turkmen corpus, 90% coverage
I will facilitate the translation of a Turkmen text of about 500 words into Turkish.
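The coverage targets in the plan above can be tracked with a small script. The following is a minimal sketch, assuming "coverage" means the share of tokens that receive at least one analysis and that the analyser output is in Apertium stream format, where an unknown word appears as ^word/*word$. The function name, the sample sentence and the tags in it are invented for illustration and are not taken from the tuk-tur pair.

```python
import re

# A lexical unit in Apertium stream format looks like ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed: str) -> float:
    """Return the fraction of lexical units that have a known analysis.

    An analysis starting with '*' marks a word the analyser did not recognise.
    Punctuation is counted as a token here, which is a simplification.
    """
    units = LEXICAL_UNIT.findall(analysed)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output; the tags are illustrative only.
sample = "^kitap/kitap<n><nom>$ ^xyzzy/*xyzzy$ ^geldi/gel<v><iv><past><p3><sg>$ ^./.<sent>$"
print(naive_coverage(sample))
```

Running this on the analysed version of a fixed corpus each week would give one comparable number per milestone.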
WER comparable to other inter-Turkic/Romance pairs. Data for machine-learned disambiguation.
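For reference, WER (word error rate) can be computed as the word-level edit distance between the system output and a reference translation, divided by the number of words in the reference. The sketch below assumes whitespace tokenisation and equal cost for insertions, deletions and substitutions; the function name and the example strings are placeholders, not data from the project.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution or match
        prev = curr
    return prev[-1] / len(ref)


# Placeholder strings: one substitution and one deletion against 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

A post-edited version of the roughly 500-word test text could serve as the reference when comparing against other pairs.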
Summer Obligations and Commitments
I'll be busy with my finals in the first week of June but I'll be free at other times.
I'm a 3rd-year student of English & Turkish Translation & Interpreting at Marmara University. I'm a native speaker of Turkish. I have taken Russian classes.