User:Eirien/Proposal

From Apertium
Revision as of 16:49, 3 April 2017 by Eirien


Contact Info

Name: Sardana Ivanova

E-mail: i.sardana.n@gmail.com

IRC: Eirien

Github: https://github.com/varie

Location: Yakutsk, Russia

Timezone: UTC/GMT+9

Why is it that you are interested in machine translation?

Since I have degrees in both Philology and Information Technologies, my background naturally led me to computational linguistics, and to machine translation in particular. I have experience in Japanese translation and interpretation as well as in programming, so machine translation is a field where I can combine both skills.

Why is it that you are interested in Apertium?

I first heard about Apertium from Roman Yangarber, a researcher at the University of Helsinki, during my exchange semester there. We were interested in adding the Sakha language to Revita, a language-learning system that they are developing. For that purpose we needed a Sakha morphological analyser, and Roman introduced Apertium to me. Saying “I was surprised” does not fully convey my amazement at the time: someone was developing a Sakha morphological analyser, and somewhere so far away from where Sakha is commonly spoken. We couldn’t find a morphological analyser that would fulfil all the requirements of Revita [1], so we decided to try to enhance Apertium’s Sakha morphological transducer. Then I thought that adding a language pair in which one of the languages is Sakha would be a very good contribution to the development of the Sakha language.

Which of the published tasks are you interested in? What do you plan to do?

Title: Kazakh-Sakha translator

I plan to add a new language pair, Kazakh-Sakha.

Why should Google and Apertium sponsor it?

There are around 450 000 speakers of the Sakha (Yakut) language in the world. Sakha is considered a vulnerable language. Supporting languages so that they do not disappear is very important.

How and whom will it benefit in society?

It would benefit society as a whole by preserving linguistic diversity and supporting a vulnerable language, and it would benefit Kazakh and Sakha speakers in particular. As far as I know, there is no machine translation system that translates to or from Sakha. Creating a language pair in which one language is Sakha would greatly support the language and lead to further development of Sakha machine translation.

Workplan

Post Application Period

Finish coding challenge

Get to know Apertium better: reading documentation and experimenting

Community Bonding Period

Discuss strategies and details with mentor

Improve knowledge of Kazakh

Week 1

Add nouns and adjectives to bilingual dictionary

Week 2

Write transfer rules for nouns and adjectives

Week 3

Add verbs and other parts of speech to bilingual dictionary

Write transfer rules for verbs

Week 4

Run tests

Update documentation

Prepare for the first evaluation

Deliverable 1: Bilingual dictionary, basic transfer rules

Week 5

Even up nouns and adjectives

Week 6

Even up verbs and other parts of speech

Week 7

Extend bilingual dictionary

Week 8

Run tests

Update documentation

Prepare for the second evaluation

Deliverable 2: Improved bilingual dictionary, transfer rules

Week 9

Week 10

Extend bilingual dictionary

Add multiwords

Work on transfer rules

Week 11

Run final tests

Fix issues

Week 12

Brush up the project and documentation

Prepare for final evaluation

Skills

I have a Master’s degree in Philology (Japanese language and literature) and a Bachelor’s degree in Fundamental Informatics and Information Technologies. I am currently a second-year Master’s student pursuing a degree in Fundamental Informatics and Information Technologies at North-Eastern Federal University, Yakutsk, Russia.

In addition, I have one year of working experience in C++ at a game development company.

Languages: Sakha (native), Russian (native), English (advanced), Japanese (intermediate), Finnish (elementary)

Programming skills: C++, Python, C, C#, Java

Apertium is my first shy step into open source.

Non-Summer-of-Code plans for the Summer

I will defend my thesis in June, but I don’t know the exact dates yet, so during one week in June I will be unable to spend more than 10-15 hours on the project. For the rest of the summer I can spend 40 hours a week.

Coding challenge

https://github.com/varie/apertium-kaz-sah (in progress)

References

1. https://revita.cs.helsinki.fi/