User:Eirien/Proposal2018

From Apertium
Revision as of 11:45, 27 March 2018 by Eirien


Contact Info

Name: Sardana Ivanova

E-mail: i.sardana.n@gmail.com

IRC: Eirien

Github: https://github.com/varie

Location: Helsinki, Finland

Timezone: UTC/GMT+3

Why is it you are interested in machine translation?

Since I have degrees in both Philology and Information Technologies, my background naturally led me to Computational Linguistics, and to machine translation in particular. I have experience in Japanese translation and interpretation as well as in programming, so machine translation is a topic where I can combine both skills.

Why is it that you are interested in Apertium?

I first heard about Apertium from Roman Yangarber, a researcher at the University of Helsinki, during my exchange semester there. We were interested in adding the Sakha language to Revita, a language learning system which they are developing. For that purpose we needed a Sakha morphological analyser, and Roman introduced Apertium to me. Saying “I was surprised” does not fully convey my amazement at that time: someone was developing a Sakha morphological analyser, and so far away from the place where Sakha is commonly spoken. We couldn’t find a morphological analyser that would fulfil all the requirements of Revita [1], so we decided to try to enhance Apertium’s Sakha morphological transducer. I then thought that adding a language pair in which one of the languages is Sakha would be a very good contribution to the development of the Sakha language.

Which of the published tasks are you interested in? What do you plan to do?

Title: Apertium translation pair for Kazakh and Sakha

I plan to develop the Apertium translation pair for the Kazakh and Sakha languages, which is currently at a very early stage of development.

Why should Google and Apertium sponsor it?

There are around 450 000 speakers of Sakha (Yakut) in the world, and Sakha is considered a vulnerable language. Supporting such languages so that they don’t disappear is very important.

How and whom will it benefit in society?

It would benefit society as a whole by preserving diversity through support for vulnerable languages, and Kazakh and Sakha speakers in particular. As far as I know, there is no machine translation system that translates to or from Sakha. Creating a language pair in which one language is Sakha would greatly support the Sakha language and lead to further development of Sakha machine translation.

Workplan

Post Application Period

Finish coding challenge

Get to know Apertium better: reading documentation and experimenting

Community Bonding Period

Discuss strategies and details with mentor

Improve knowledge of Kazakh

Week 1

Add nouns and adjectives to bilingual dictionary

Week 2

Write transfer rules for nouns and adjectives

Week 3

Add verbs and other parts of speech to bilingual dictionary

Write transfer rules for verbs

Week 4

Run tests

Update documentation

Prepare for the first evaluation

Deliverable 1: Bilingual dictionary, basic transfer rules

Week 5

Even up nouns and adjectives

Week 6

Even up verbs and other parts of speech

Week 7

Extend bilingual dictionary

Week 8

Run tests

Update documentation

Prepare for the second evaluation

Deliverable 2: Improved bilingual dictionary, transfer rules

Week 9

Week 10

Extend bilingual dictionary

Add multiwords

Work on transfer rules

Week 11

Run final tests

Fix issues

Week 12

Brush up the project and documentation

Prepare for final evaluation

Skills

I have a Master’s degree in Philology (Japanese language and literature) and a Master’s degree in Fundamental Informatics and Information Technologies. I am currently a first-year Computer Science PhD student at Saint Petersburg State University, Saint Petersburg, Russia, and a visiting researcher in the Department of Computer Science at the University of Helsinki.

In addition, I have one year of C++ working experience at a game development company.

Languages: Sakha (native), Russian (native), English (advanced), Japanese (intermediate), Finnish (elementary)

Programming skills: C++, Python, C, C#, Java

Apertium is my first shy step into open source.

Non-Summer-of-Code plans for the Summer

Possibly a one-week holiday for the Ozora music festival (30 July - 5 August), during which I will be unable to spend more than 10-15 hours on the project. For the rest of the summer I will be able to dedicate at least 30 hours a week to the project.

Coding challenge

https://github.com/apertium/apertium-kaz-sah - the coding challenge I started for last year’s GSoC, where I added words to the dictionary. I didn’t manage to upload my proposal on time last year, so I would like to try again this year :D. I will update it as soon as I figure out how to write transfer rules.

References

1. https://revita.cs.helsinki.fi/