User:Ozgay

Latest revision as of 21:05, 8 April 2019

GSoC 2018 proposal draft to create and develop a Turkmen-Turkish translation pair.

Personal Information

Name: Özge Kılıç

E-mail: ozgekilic.9@gmail.com

IRC: ozgay

Time zone: UTC+3


Why is it that you are interested in Apertium?

I'm a student of English Translation & Interpreting. I intend to pursue a master's degree in Linguistics, and I think this is an excellent start.

Proposal: Turkmen-Turkish MT

Which of the published tasks are you interested in? What do you plan to do?

My plan is to adopt an unreleased language pair, tuk-tur. I'll be working on it to bring it up to release quality, which will involve writing and refining rules for transfer and lexical selection that will result in a valid text in the target language.


Why should Google and Apertium sponsor it?

Because few Turkmen-Turkish translation resources exist, this machine translation pair is a great opportunity for the two nations to understand each other.

Resources


Wikipedia

Türkmence Sözlük

Work Plan

Plan by Weeks


1. 50% coverage

2. Basic CG

3. 60% coverage

4. Transfer

5. 70% coverage

6. Transfer, lexical selection, 80% coverage

7. CG, 83% coverage

8. Transfer, lexsel, 86% coverage

9. Transfer

10. CG, Transfer

11. Transfer, lexsel, 89% coverage

12. Transfer, 92% coverage

13. Preparing text for annotation

14-16. Annotating the Turkmen corpus, 95% coverage
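
One common way to read the coverage targets above is as naive coverage: the share of tokens in a corpus for which the morphological analyser returns at least one analysis. The following is a minimal sketch in Python, assuming the analyser output is in Apertium stream format, where an unknown word appears as `^word/*word$`. The function name `naive_coverage` and the sample tags are made up for illustration and are not taken from the tuk-tur pair; escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (assumption: no escaped "/" or "$" inside the unit)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of lexical units that have at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # The analyser marks an unknown word by prefixing its analysis with "*".
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Illustrative input only: the tags below are hypothetical.
sample = "^men/men<prn><pers>$ ^kitap/kitap<n><nom>$ ^okaýaryn/*okaýaryn$"
print(f"{naive_coverage(sample):.0%}")
```

In this sample, two of the three lexical units have an analysis, so the third counts against coverage.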

Coding Challenge

I'm facilitating the translation of a Turkmen text of about 500 words into Turkish.


Deliverables

Word error rate (WER) comparable to other inter-Turkic/Romance pairs. Data for machine-learned disambiguation.
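
For reference, WER is the word-level edit distance between the system output and a reference translation, divided by the number of words in the reference. The sketch below is a plain Python illustration of that definition, not the evaluation script used by the project; the function name `word_error_rate` and the whitespace tokenisation are assumptions.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Placeholder tokens: one substitution and one deletion against 4 reference words.
print(word_error_rate("a b c d", "a x c"))
```

Because insertions count as errors, WER can exceed 1.0 when the hypothesis is much longer than the reference.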


Summer Obligations and Commitments

I'll be busy with my finals in the first week of June, but I'll be free at other times.


Qualification

I'm a 3rd-year student of English & Turkish Translation & Interpreting at Marmara University. I'm a native speaker of Turkish. I have taken Russian classes.