User:Bibaeva/proposal

From Apertium
Revision as of 00:49, 3 April 2017 by Bibaeva


Contact information

Name: Maria Bibaeva
E-mail: melisanushk@gmail.com
IRC: melisan
Phone: 89030131839
Place: Moscow, Russia (UTC+3)
Github: https://github.com/mbibaeva

Why is it you are interested in machine translation?

Machine translation is one of the major tasks of modern computational linguistics. It is a well-developed but still imperfect field, and I want not only to learn more about it but also to see exactly how it works. Such experience would be extremely useful for me as a computational linguist.

Why is it that you are interested in Apertium?

As a computational linguist, I find it both interesting and advantageous to contribute to the development of a language tool like Apertium. Another reason is that Apertium works with minor languages, which gives me an opportunity to use not only my programming skills but also my knowledge of minor languages such as Moksha and Hill Mari.

Reasons why Google and Apertium should sponsor it

There is only one Uralic language that Apertium works with right now, and even though it is paired with a language of a different family, it would be good to add other languages of this family, so that there are at least several monolingual dictionaries of Uralic languages.

Skills

Programming and computer skills: Python 3, HTML, R, JS
Languages: Russian (native), English (advanced), German (intermediate), French (intermediate), Japanese (beginner), Moksha (as an object of research), Hill Mari (as an object of research)
Useful courses: Natural Language Processing, Theory of Computation, Language Diversity, Lexicography, Formal Semantics

The Task

I think that the best task for me would be to adopt an unreleased language pair, particularly the Moksha-Russian language pair (mdf-rus), but I could also work with Erzya (myv) or Hill Mari (mrj).

Work plan

Post-application period:
Learn more about Apertium and its tools, install Linux and get used to it, and get acquainted with the code of other Apertium bilingual dictionaries.

Summer
Week 1: determine the frequency of both Russian and Moksha words using corpora, and adapt an existing Moksha dictionary for use.
Week 2-3: create the dictionary for the minor language in my pair.
Week 4: start creating the dictionary for Russian.

Deliverable 1: a proper monolingual dictionary.

Week 5: continue working on the Russian dictionary.
Week 6-7: start working on the bilingual dictionary and create noun transfer.
Week 7-8: work on verb transfer.

Deliverable 2: two proper monolingual dictionaries and part of the bilingual dictionary.

Week 9-10: finish the verb transfer and proceed to pronouns and postpositions.
Week 11: testing, fixing, adding whatever needs to be added.
Week 12: final debug, documentation and cleaning up the code.
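
As an illustration of the Week 1 step (determining word frequencies from corpora), here is a minimal Python 3 sketch. It is only a rough outline under stated assumptions: the function names word_frequencies and frequency_list are hypothetical, the tokenisation rule (runs of letters, optionally joined by hyphens) is a simplification, and the sample sentence is made up. Real corpora would need proper loading and probably lemmatisation.

```python
import re
from collections import Counter

# Assumption: a "word" is a run of Unicode letters, optionally hyphenated.
# This is a simplification and not a rule taken from Apertium.
TOKEN_RE = re.compile(r"[^\W\d_]+(?:-[^\W\d_]+)*")


def word_frequencies(text):
    """Count lower-cased word tokens in a string (hypothetical helper)."""
    tokens = TOKEN_RE.findall(text.lower())
    return Counter(tokens)


def frequency_list(text, limit=None):
    """Return (word, count) pairs, most frequent first, ties in alphabetical order."""
    counts = word_frequencies(text)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return ranked if limit is None else ranked[:limit]


if __name__ == "__main__":
    # Made-up sample text standing in for a corpus.
    sample = "Кошка спит, кошка ест, собака спит"
    for word, count in frequency_list(sample):
        print(word, count)
```

The most frequent words from such a list could then be given priority when filling the monolingual dictionaries.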

Non-Summer-of-Code plans for the Summer

I have several exams in June. I will try to take them early; if I do not succeed, they might take about 3 hours a day out of my working time. I am free for the rest of the summer, so I should be able to devote 45-50 hours per week to the task.