User:Eirien/Proposal2018
Contact Info
Name: Sardana Ivanova
E-mail: i.sardana.n@gmail.com
IRC: Eirien
Github: https://github.com/varie
Location: Helsinki, Finland
Timezone: UTC/GMT+3
Why is it you are interested in machine translation?
Since I have degrees in both Philology and Information Technologies, my background naturally led me to Computational Linguistics and to machine translation in particular. I have experience in Japanese translation and interpretation as well as in programming, so machine translation is a topic where I can combine both skills.
Why is it that you are interested in Apertium?
I first heard about Apertium from Roman Yangarber, a researcher at the University of Helsinki, during my exchange semester there. We were interested in adding the Sakha language to Revita, a language learning system which they are developing. For that purpose we needed a Sakha morphological analyser, and Roman introduced Apertium to me. Saying “I was surprised” does not fully convey my amazement at that time: someone was developing a Sakha morphological analyser, and somewhere so far away from the place where Sakha is commonly spoken. We couldn’t find a morphological analyser that would fulfill all the requirements of Revita [1], so we decided to try to enhance Apertium’s Sakha morphological transducer. And then I thought that adding a language pair in which one of the languages is Sakha would be a very good contribution to the development of the Sakha language.
Which of the published tasks are you interested in? What do you plan to do?
Title: Apertium translation pair for Kazakh and Sakha
I plan to develop the Apertium translation pair for Kazakh and Sakha, which is currently at a very early stage of development.
Why should Google and Apertium sponsor it?
There are around 450 000 speakers of the Sakha (Yakut) language in the world. Sakha is considered a vulnerable language. Supporting such languages so that they don’t disappear is very important.
How and whom will it benefit in society?
It would benefit society as a whole by preserving diversity and supporting vulnerable languages, and Kazakh and Sakha speakers in particular. As far as I know, there is no machine translation system that translates to or from Sakha. Creating a language pair in which one language is Sakha would greatly support the Sakha language and lead to further development of machine translation for it.
Workplan
Post Application Period
Finish coding challenge
Get to know Apertium better: read documentation and experiment
Community Bonding Period
Discuss strategies and details with mentor
Improve knowledge of Kazakh
Week 1
Add nouns and adjectives to bilingual dictionary
Week 2
Write transfer rules for nouns and adjectives
Week 3
Add verbs and other parts of speech to bilingual dictionary
Write transfer rules for verbs
Week 4
Run tests
Update documentation
Prepare for the first evaluation
Deliverable 1: Bilingual dictionary, basic transfer rules
Week 5
Even up nouns and adjectives
Week 6
Even up verbs and other parts of speech
Week 7
Extend bilingual dictionary
Week 8
Run tests
Update documentation
Prepare for the second evaluation
Deliverable 2: Improved bilingual dictionary, transfer rules
Week 9
Week 10
Extend bilingual dictionary
Add multiwords
Work on transfer rules
Week 11
Run final tests
Fix issues
Week 12
Brush up the project and documentation
Prepare for final evaluation
Skills
I have a Master’s degree in Philology (Japanese language and literature) and a Master’s degree in Fundamental Informatics and Information Technologies. Currently I am a first-year Computer Science PhD student at Saint Petersburg State University, Saint Petersburg, Russia, and a visiting researcher in the Department of Computer Science at the University of Helsinki.
In addition, I have one year of C++ work experience at a game development company.
Languages: Sakha (native), Russian (native), English (advanced), Japanese (intermediate), Finnish (elementary)
Programming skills: C++, Python, C, C#, Java
Apertium is my first shy step into open source.
Non-Summer-of-Code plans for the Summer
Possibly a one-week holiday for the Ozora music festival (30 July - 5 August), during which I will be unable to spend more than 10-15 hours on the project. For the rest of the summer I will be able to dedicate at least 30 hours a week to the project.
Coding challenge
https://github.com/apertium/apertium-kaz-sah - coding challenge started for last year's GSoC, where I added words to the dictionary. I didn’t manage to submit my proposal on time last year, so I would like to try again this year :D. I will update it as soon as I figure out how to write transfer rules.