Difference between revisions of "User:Ozgay"

From Apertium

Revision as of 19:28, 7 April 2019

GSoC 2018 proposal draft to create and develop Turkmen-Turkish translation pair.

Personal Information

Name: Özge Kılıç

E-mail: ozgekilic.9@gmail.com

IRC: ozgay

Time zone: UTC+3


Why is it that you are interested in Apertium?

I'm a student of English Translation & Interpreting. My intention is to get a master's degree in Linguistics, and I think this is an excellent start.

Proposal: Turkmen-Turkish MT

Which of the published tasks are you interested in? What do you plan to do?

My plan is to adopt an unreleased language pair, tuk-tur, and bring it up to release quality. This will involve writing and refining transfer and lexical selection rules so that the output is valid text in the target language.


Why should Google and Apertium sponsor it?



Resources


Wikipedia


Work Plan

-Post-application period:

Facilitating MT of a text from Turkmen to Turkish.


-Community-bonding period:

bidix words, up to 50%


-Month 1:

Writing scripts

Adding words to bidix, get coverage to around 80%

Chunking

Transfer rules

Begin CG for UIG


-Month 2:

POS tagging/constraint grammar

Transfer rules

Get CG rules up to 100, ~50% disambiguation

>90% coverage


-Month 3:

Creation of an Annotated Corpus


Plan by Weeks

1. 30% coverage

2. Basic CG

3. 40% coverage

4. Transfer

5. 50% coverage

6. Transfer, lexical selection, 65% coverage

7. CG, 80% coverage

8. Transfer, lexsel, 84% coverage

9. Transfer

10. CG, Transfer

11. Transfer, lexsel, 86% coverage

12. Transfer, 88% coverage

13. Preparing text for annotation

14-16. Annotating the Turkmen corpus, 90% coverage
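
The coverage targets above could be tracked with a small script. The following is only a minimal sketch: it assumes the corpus has already been run through the morphological analyser and is in Apertium stream format, where each token looks like ^surface/analysis$ and an unknown word has an analysis that starts with an asterisk. The function name naive_coverage is my own, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis[/analysis...]$
# (assumption: no escaped ^, / or $ characters inside the unit)
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that have at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown words are marked with a leading asterisk, e.g. ^bbb/*bbb$
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


if __name__ == "__main__":
    # Placeholder tokens, not real Turkmen analyses.
    sample = "^aaa/aaa<n>$ ^bbb/*bbb$ ^ccc/ccc<v>$ ^ddd/ddd<n>/ddd<v>$"
    print(f"coverage: {naive_coverage(sample):.0%}")
```

Running this on the same corpus at the end of each week would give a number to compare against the plan.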


Coding Challenge

I'm facilitating the translation of a Turkmen text of about 500 words into Turkish.


Deliverables

WER comparable to other inter-Turkic/Romance pairs. Data for machine-learned disambiguation.
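
For reference, WER (word error rate) is the word-level edit distance between the machine translation and a reference translation, divided by the number of words in the reference. Below is a minimal sketch of that calculation, assuming plain whitespace tokenisation. The function name word_error_rate is hypothetical, and this is not the evaluation tool the project itself uses.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the processed part of ref and hyp[:j]
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words to match an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder words: one substitution and one deletion over four reference words.
    print(word_error_rate("a b c d", "a x c"))
```

A lower value is better, and the value can exceed 1.0 when the hypothesis is much longer than the reference.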


Summer Obligations and Commitments

I'll be busy with my finals in the first week of June but I'll be free at other times.


Qualification

I'm a 3rd-year student of English & Turkish Translation & Interpreting at Marmara University. I'm a native speaker of Turkish. I have taken Russian classes.