User:Oğuz/GSoC 2018

Latest revision as of 20:28, 25 March 2018

GSoC 2018 proposal draft to create and develop Uyghur-Turkish translation pair.

Personal Information

Name: Oğuzhan Kuyrukçu

E-mail: kuyrukcuoguz@gmail.com

Phone number: +905414785653

IRC: oguz

Time zone: UTC+3


Why is it that you are interested in Apertium?

I'm a student of linguistics and I recently took up an interest in machine translation. When I found out about Apertium, I decided to put my knowledge of Turkish and Uyghur to use through the resources Apertium provides.


Proposal: Uyghur-Turkish MT

Which of the published tasks are you interested in? What do you plan to do?

My plan is to adopt an unreleased language pair, uig-tur. I'll be working on it to bring it up to release quality, which will involve writing and refining rules for transfer and lexical selection that will result in a valid text in the target language.


Why should Google and Apertium sponsor it?

An extensive Uyghur-Turkish machine translator has yet to be built, and most research in Turkology is done through Turkish rather than through other Turkic languages such as Uyghur. A machine translator would therefore enable those working in Turkology and related fields to study Uyghur texts through Turkish. Furthermore, cultural contact between the Turkish and Uyghur populations is increasing with migration, and these populations can use this tool to familiarize themselves with each other's culture.


Resources

E. N. Necip, Uyghurche-Turkche Lughet

Rıdvan Öztürk, Yeni Uygur Türkçesi Grameri

Wikipedia

Uyghur-English-Mandarin dictionary[1]


Work Plan

-Post-application period:

Facilitating MT of a text from Uyghur to Turkish.


-Community-bonding period:

bidix words, up to 50%


-Month 1:

Writing scripts

Adding words to bidix, getting coverage to around 80%

Chunking

Transfer rules

Begin CG for UIG


-Month 2:

POS tagging/constraint grammar

Transfer rules

Get CG rules up to 100, ~50% disambiguation

>90% coverage


-Month 3:

Creation of an Annotated Corpus


Plan by Weeks

1. 30% coverage

2. Basic CG

3. 40% coverage

4. Transfer

5. 50% coverage

6. Transfer, lexical selection, 65% coverage

7. CG, 80% coverage

8. Transfer, lexsel, 84% coverage

9. Transfer

10. CG, Transfer

11. Transfer, lexsel, 86% coverage

12. Transfer, 88% coverage

13. Preparing text for annotation

14-16. Annotating the Uyghur corpus, 90% coverage
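
The coverage targets in the weekly plan could be tracked with a small script. The following is only a minimal sketch, assuming that "coverage" means naive coverage (the share of tokens that receive at least one analysis) and that the analyser output is in the Apertium stream format, where each token appears as `^surface/analysis$` and an unknown word is marked with a leading `*`. The function name `naive_coverage` and the sample line, including its tags, are made up for illustration, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped characters such as \/ or \$ are ignored.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # The analyser marks an unknown token by prefixing the analysis with '*'.
    known = sum(1 for _surface, analyses in units
                if not analyses.startswith("*"))
    return known / len(units)


if __name__ == "__main__":
    # Illustrative sample only; the tags are not taken from the real uig analyser.
    sample = "^men/men<prn>$ ^kitab/kitab<n>$ ^xyz/*xyz$ ^oqudum/oqu<v>$"
    print(f"naive coverage: {naive_coverage(sample):.0%}")
```

Running the same script over a fixed corpus after each week would make the percentages in the plan directly comparable.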


Coding Challenge

I'm facilitating the translation of a 563-word Uyghur text into Turkish.


Deliverables

WER comparable to other inter-Turkic/Romance pairs. Data for machine-learned disambiguation.
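
For reference, WER (word error rate) is commonly computed as the word-level edit distance between the system output and a reference translation, divided by the number of words in the reference. The sketch below illustrates that definition under simple assumptions (whitespace tokenisation, equal cost for substitutions, insertions and deletions); it is not necessarily the evaluation script used for this pair, and the name `word_error_rate` is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words seen so far
    # and the first j words of the hypothesis.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # word missing from hypothesis
                            curr[j - 1] + 1,      # extra word in hypothesis
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder strings; real use would compare MT output with a post-edited text.
    print(word_error_rate("a b c d", "a x c"))
```

A lower value means fewer word edits are needed to turn the machine output into the reference.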


Summer Obligations and Commitments

I'll be busy with my finals in the last weeks of May, but I'll be free at other times.


Qualification

I'm a second-year student of linguistics at Boğaziçi University. I'm a native speaker of Turkish and I've taken Uyghur courses as part of my education.