User:Eirien/Proposal
Contact Info
Name: Sardana Ivanova
E-mail: i.sardana.n@gmail.com
IRC: Eirien
Github: https://github.com/varie
Location: Yakutsk, Russia
Timezone: UTC/GMT+9
Why is it you are interested in machine translation?
Since I have degrees in both Philology and Information Technologies, my background naturally led me to computational linguistics, and to machine translation in particular. I have experience in Japanese translation and interpretation as well as in programming, so machine translation is a topic where I can combine both skills.
Why is it that you are interested in Apertium?
I first heard about Apertium from Roman Yangarber, a researcher at the University of Helsinki, during my exchange semester there. We were interested in adding the Sakha language to Revita, a language learning system they are developing. For that purpose we needed a Sakha morphological analyser, and Roman introduced Apertium to me. “Surprised” does not fully convey my amazement at the time: someone was developing a Sakha morphological analyser, and so far away from the place where Sakha is commonly spoken. We could not find a morphological analyser that would fulfil all the requirements of Revita [1], so we decided to try to enhance Apertium’s Sakha morphological transducer. I then thought that adding a language pair in which one of the languages is Sakha would be a very good contribution to the development of the Sakha language.
Which of the published tasks are you interested in? What do you plan to do?
Title: Kazakh-Sakha translator
I plan to add a new language pair, Kazakh-Sakha.
Why Google and Apertium should sponsor it?
There are around 450,000 speakers of the Sakha (Yakut) language in the world. Sakha is considered a vulnerable language. Supporting such languages so that they do not disappear is very important.
How and who it will benefit in society
It would benefit society as a whole by preserving linguistic diversity and supporting a vulnerable language, and it would benefit Kazakh and Sakha speakers in particular. As far as I know, there is no machine translation system that translates to or from Sakha. Creating a language pair in which one language is Sakha would greatly support the language and lead to further development of Sakha machine translation.
Workplan
Post Application Period
Finish coding challenge
Get to know Apertium better: reading documentation and experimenting
Community Bonding Period
Discuss strategies and details with mentor
Improve knowledge of Kazakh
Week 1
Add nouns and adjectives to bilingual dictionary
Week 2
Write transfer rules for nouns and adjectives
Week 3
Add verbs and other parts of speech to bilingual dictionary
Write transfer rules for verbs
Week 4
Run tests
Update documentation
Prepare for the first evaluation
Deliverable 1: Bilingual dictionary, basic transfer rules
Week 5
Even up nouns and adjectives
Week 6
Even up verbs and other parts of speech
Week 7
Extend bilingual dictionary
Week 8
Run tests
Update documentation
Prepare for the second evaluation
Deliverable 2: Improved bilingual dictionary, transfer rules
Week 9
Week 10
Extend bilingual dictionary
Add multiwords
Work on transfer rules
Week 11
Run final tests
Fix issues
Week 12
Brush up the project and documentation
Prepare for final evaluation
Skills
I have a Master’s degree in Philology (Japanese language and literature) and a Bachelor’s degree in Fundamental Informatics and Information Technologies. I am currently a second-year Master’s student pursuing a degree in Fundamental Informatics and Information Technologies at North-Eastern Federal University, Yakutsk, Russia.
In addition, I have one year of working experience with C++ at a game development company.
Languages: Sakha (native), Russian (native), English (advanced), Japanese (intermediate), Finnish (elementary)
Programming skills: C++, Python, C, C#, Java
Apertium is my first shy step into open-source.
Non-Summer-of-Code plans for the Summer
I will defend my thesis in June, but I do not know the exact dates yet, so for some week in June I will be unable to spend more than 10-15 hours on the project. For the rest of the summer I can spend 40 hours a week.
Coding challenge
https://github.com/varie/apertium-kaz-sah (in progress)