User:Denis Rakhman/proposal

From Apertium
Revision as of 14:01, 3 April 2017 by Denis Rakhman (talk | contribs)

Contact information

Name: Denis Rakhman
E-mail: drahman2@mail.ru
IRC: Denis_Rakhman
Phone number: 8-968-815-43-81
Location: Moscow

Why am I interested in machine translation?

It is obvious that machine translation is one of the main areas of computational linguistics. The usefulness of a good machine translator can hardly be overestimated.
But that is not what excites me about machine translation.
When I knew nothing about either theoretical or computational linguistics, I never thought of natural languages as sets of rules. In fact, I did, but in my mind the rules were invented by a group of very smart people in heavy glasses. It was a shock to me to realize that linguistic rules are no less strict than physical ones. I thought: "Wow! Maybe language can be modelled as an algorithm?" And then I was told about NLP and, in particular, about machine translation.
Machine translation is one of the few areas in NLP that deals not only with the structure of a particular language but also with language typology. That means it involves a larger share of linguistic theory than other NLP problems, which also attracts me.

Why am I interested in Apertium?

The main thing that attracts me to Apertium is its interest in minority languages. This area is both very interesting to me and very important for society. Minority languages are often endangered, and the fact that a language is not only being described by linguists but also used in machine translation can encourage its speakers and help give it a new life.
I am also personally interested in machine translation for minority languages. Firstly, it is machine translation. Secondly, minority languages (for example, Hill Mari) are a very important part of our university's research and, in particular, of my own research activity.
Apertium also has an extremely friendly community, and this attracts me even more.

The task

I would like to work with Hill Mari, for example on the Hill Mari - Russian language pair. Other tasks (for example, ones related to Chukchi) are also possible.

Why should Google and Apertium sponsor it and what social benefits can it bring?

The purpose of this work is to create a mrj-rus transducer. It will be a complete product that anyone will be able to use for any purpose.
Moreover, Hill Mari is one of the official languages of the Mari El Republic. That means that, besides the social benefits described above, such a translator can be useful for local schools, libraries, etc.

Work plan:

  • Finish the coding challenge as soon as possible
  • Community bonding period:

Skills, knowledge and experience

    I am currently a 3rd-year bachelor's student at the Linguistic Department of NRU HSE, Moscow.
    Knowledge:
    Programming:
  • Python 3
    Linguistics:
  • both functional and formal approaches to syntax
  • morphology
  • phonetics
  • lexical semantics
  • language typology
    Languages:
  • Russian (native)
  • English (advanced)
  • Italian (intermediate)
  • French (intermediate)
    Skills:
    Programming:
  • Python 3, pymorphy2 (a morphological analyser for Russian)
  • HTML, CSS
    Linguistics:
  • grammar description during field work, glossing, analysis of older grammar descriptions and theories
    Experience:
    Coding:
  • extraction of distant verb arguments in coordinate clauses
    Linguistics:
  • purpose clauses in Hill Mari (field research)