User:Vyhuholl/GSoC Proposal 2018: Esperanto and Russian


Name

Olga Pichuzhkina

Contact information

E-mail address: olga-p-98@mail.ru

Location: Russia

IRC nick: vyhuholl

GitHub profile: vyhuholl

Why am I interested in Apertium?

I am a linguist and I am interested in computational linguistics and natural language processing.

Which of the published tasks am I interested in? What do I plan to do?

I'm interested in adopting an unreleased language pair (Esperanto-Russian).

Reasons why Google and Apertium should sponsor it?

There is currently no Esperanto-Russian translator in Apertium.

Work Plan

Coding challenge

  • I’ve installed the prerequisites for Mac OS.
  • Bootstrapped the new language pair (epo-rus) with the existing rus monodix.
  • Created an (incomplete) epo monodix with morphology based on this online dictionary (https://www.esperanto.mv.ru/Vortaro/), containing 30 654 words. Note: I used a Python script to add the morphological markup, so some words may be marked incorrectly. For example, the pronoun mi (I) ends in -i, like verb infinitives, and so was marked as a verb.
  • Created an Esperanto-Russian .dix file, containing 30 654 words, based on the existing Russian-Esperanto .dix file (https://github.com/apertium/apertium-eo-ru/blob/master/apertium-ru-eo.ru-eo.dix).
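The mis-tagging mentioned in the note on the monodix (mi marked as a verb) comes from assigning a part of speech by word ending alone. Below is a minimal sketch of one possible fix, not the script actually used: look the word up in a closed-class exception list first and fall back to the ending only afterwards. The function name `guess_tag` and the short exception list are hypothetical and only illustrative. The tag names (n, adj, adv, vblex, prn, num, pr) are assumed to follow common Apertium usage.

```python
# Hypothetical helper: guess a part-of-speech tag for an Esperanto lemma.
# Assumption: tag names follow common Apertium conventions.

# Closed-class words whose ending would otherwise mislead the suffix rule.
# This is a small illustrative sample, not a complete list.
CLOSED_CLASS = {
    "mi": "prn", "vi": "prn", "li": "prn", "ŝi": "prn", "ĝi": "prn",
    "ni": "prn", "ili": "prn", "oni": "prn", "si": "prn",
    "tri": "num",   # numeral "three"
    "pri": "pr",    # preposition "about"
    "pli": "adv",   # adverb "more"
}

# Regular Esperanto dictionary-form endings, checked in order.
SUFFIX_TAGS = (
    ("o", "n"),      # nouns
    ("a", "adj"),    # adjectives
    ("e", "adv"),    # derived adverbs
    ("i", "vblex"),  # verb infinitives
)


def guess_tag(lemma):
    """Return a guessed tag for the lemma, or None if no rule applies."""
    word = lemma.lower()
    # Exceptions take priority over the ending-based guess.
    if word in CLOSED_CLASS:
        return CLOSED_CLASS[word]
    for suffix, tag in SUFFIX_TAGS:
        # Require a stem before the ending, so a bare "o" or "i" is skipped.
        if word.endswith(suffix) and len(word) > len(suffix):
            return tag
    return None
```

With this ordering, `guess_tag("mi")` gives a pronoun tag while `guess_tag("paroli")` still gives a verb tag. Words that match neither the exception list nor a regular ending return `None` and can be set aside for manual review.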

Week Plan

  • Week 1. Continue working on monodix, cleaning data, removing incorrect morphological markup.
  • Week 2. Continue working on monodix, cleaning data, removing incorrect morphological markup.
  • Week 3. Implementation of transfer rules from Esperanto to Russian.
  • Week 4. Implementation of transfer rules from Esperanto to Russian, evaluation.
  • Week 5. Implementation of transfer rules from Esperanto to Russian.
  • Week 6. Implementation of transfer rules from Esperanto to Russian.
  • Week 7. Constraint grammar design.
  • Week 8. Constraint grammar design, evaluation.
  • Week 9. testvoc.
  • Week 10. testvoc.
  • Week 11. testvoc.
  • Week 12. Final evaluation and completion of documentation.

Skills & Qualifications

I am a second-year student of linguistics at the Higher School of Economics (Moscow, Russia). I know Python and R, have basic knowledge of NLP, and am interested in machine learning. Here is an example of my code: https://github.com/vyhuholl/homeworks/blob/master/additional_hw/additional_hw.py (a webpage in Old Church Slavonic, made with Flask). I know Esperanto and have experience using it. I am also familiar with XML and Mac OS, both of which are relevant to this project.

My non-Summer-of-Code plans for the Summer

From May 14th to June 30th I will also have classes (21 hours a week). From May 14th to May 31st I will also be working (3 hours a week). For the rest of the summer, I have no plans other than Summer of Code.