User:Vyhuholl/GSoC Proposal 2018: Esperanto and Russian
Name
Olga Pichuzhkina
Contact information
E-mail address: olga-p-98@mail.ru
Location: Russia
IRC nick: vyhuholl
GitHub profile: vyhuholl
Why am I interested in Apertium?
I am a linguist and I am interested in computational linguistics and natural language processing.
Which of the published tasks am I interested in? What do I plan to do?
I'm interested in adopting an unreleased language pair (Esperanto-Russian).
Reasons why Google and Apertium should sponsor it?
There is no Esperanto-Russian translator in Apertium yet.
Work Plan
Coding challenge
- I have installed the prerequisites for macOS.
- Bootstrapped the new language pair (epo-rus) with the existing rus monodix.
- Created an (incomplete) epo monodix with morphology based on this online dictionary (https://www.esperanto.mv.ru/Vortaro/), containing 30 654 words. Note: I added the morphological markup with a Python script, so some words may be marked incorrectly. For example, the pronoun mi (I) ends in -i, like verb infinitives, and so was marked as a verb.
- Created an Esperanto-Russian .dix file based on this Russian-Esperanto .dix file (https://github.com/apertium/apertium-eo-ru/blob/master/apertium-ru-eo.ru-eo.dix), containing 30 654 words.
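The mis-tagging of mi mentioned in the note above could be avoided by checking a list of closed-class words before applying any suffix rule. The following is only a minimal sketch of that idea, not the script actually used: the function name `guess_pos`, the tag names, and the (incomplete) word list are all assumptions made for illustration.

```python
# Hypothetical sketch: guess a part-of-speech tag for an Esperanto lemma.
# The names, tags, and word list are illustrative assumptions, not the
# script that was actually used to build the monodix.

# Closed-class words are looked up first, so that words such as "mi"
# are not caught by the suffix rules below. This list is incomplete.
CLOSED_CLASS = {
    "mi": "prn", "vi": "prn", "li": "prn", "ŝi": "prn", "ĝi": "prn",
    "ni": "prn", "ili": "prn", "oni": "prn", "si": "prn",
    "la": "det", "kaj": "cnjcoo",
}

# Regular Esperanto endings: -o nouns, -a adjectives, -e adverbs,
# -i verb infinitives.
SUFFIX_TAGS = [("o", "n"), ("a", "adj"), ("e", "adv"), ("i", "vblex")]


def guess_pos(lemma):
    """Return a guessed tag for the lemma, or None if no rule applies."""
    word = lemma.lower()
    if word in CLOSED_CLASS:
        return CLOSED_CLASS[word]
    for suffix, tag in SUFFIX_TAGS:
        if word.endswith(suffix):
            return tag
    return None
```

With this ordering, `guess_pos("mi")` gives a pronoun tag while `guess_pos("paroli")` still gives a verb tag. Words that are not in the list and happen to end in one of these letters would still need manual review.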
Week Plan
- Week 1. Continue working on monodix, cleaning data, removing incorrect morphological markup.
- Week 2. Continue working on monodix, cleaning data, removing incorrect morphological markup.
- Week 3. Implementation of transfer rules from Esperanto to Russian.
- Week 4. Implementation of transfer rules from Esperanto to Russian, evaluation.
- Week 5. Implementation of transfer rules from Esperanto to Russian.
- Week 6. Implementation of transfer rules from Esperanto to Russian.
- Week 7. Constraint grammar design.
- Week 8. Constraint grammar design, evaluation.
- Week 9. testvoc.
- Week 10. testvoc.
- Week 11. testvoc.
- Week 12. Final evaluation and completion of documentation.
Skills & Qualifications
I am a second-year linguistics student at the Higher School of Economics (Moscow, Russia). I know Python and R, have basic knowledge of NLP, and am interested in machine learning. Here is an example of my code: https://github.com/vyhuholl/homeworks/blob/master/additional_hw/additional_hw.py (a web page in Old Church Slavonic, made with Flask). I know Esperanto and have experience using it. For this project, it is also relevant that I am familiar with XML and macOS.
My non-Summer-of-Code plans for the Summer
From May 14th to June 30th I will also have classes (21 hours a week). From May 14th to May 31st I will also be working (3 hours a week). For the rest of the summer, I have no non-Summer-of-Code plans.