Difference between revisions of "User:Denis Rakhman/proposal"

From Apertium
[[Category:GSoC 2017 Student Proposals]]

Latest revision as of 14:30, 3 April 2017

Contact information

Name: Denis Rakhman
E-mail: drahman2@mail.ru
IRC: Denis_Rakhman
Phone number: 8-968-815-43-81
Location: Moscow

Why am I interested in machine translation?

Machine translation is clearly one of the main areas of computational linguistics, and the usefulness of a good machine translator can hardly be overstated.
But that is not what excites me about machine translation.
Before I knew anything about theoretical or computational linguistics, I never thought of natural languages as sets of rules. Or rather, I did, but I imagined that the rules had been invented by a group of very smart people in heavy glasses. It was a shock to realize that linguistic rules are no less strict than physical ones. I thought: "Wow! Maybe language can be modelled as an algorithm?" And then I was told about NLP and, in particular, about machine translation.
Machine translation is one of the few areas of NLP that deal not only with the structure of a particular language but also with language typology. Compared with other NLP problems, it therefore involves a larger share of linguistic theory, which also attracts me.

Why am I interested in Apertium?

The main thing that attracts me to Apertium is its interest in minority languages. This area is both very interesting to me and very important for society. Minority languages are often endangered, and the fact that a language is not only described by linguists but also used in machine translation can encourage its speakers and help give it a new life.
I am also personally interested in machine translation for minority languages. Firstly, it is machine translation. Secondly, minority languages (for example, Hill Mari) are a very important part of the research at our university and, in particular, of my own.
Apertium also has an extremely friendly community, which attracts me even more.

The task

I would like to work on Hill Mari, for example on the Hill Mari - Russian language pair.

Why should Google and Apertium sponsor it, and what social benefits can it bring?
The purpose of this work is to create an mrj-rus transducer. It will be a complete product that anyone can use for any purpose.
Moreover, Hill Mari is one of the official languages of the Mari El Republic. This means that, besides the social benefits described above, such a translator could be useful for local schools, libraries, etc.

Work plan:
-1. Finish the coding challenge as soon as possible
0. Community bonding period:
  • Get better acquainted with Apertium
  • Clarify the work plan
  • Find relevant literature
  • Obtain a corpus of Hill Mari texts and deepen my knowledge of Hill Mari grammar
1. First work period (May 30 - June 30):
  • Create Hill Mari dictionaries
  • Create rules for Hill Mari grammar
  • Do other work related to Hill Mari and its corpus
2. Second work period (June 30 - July 28):
  • Keep working on Hill Mari
  • Start working on Russian and connecting it with Hill Mari
3. Third work period (July 28 - August 29):
  • Keep working on the language pair
  • Improve the results and finish the project
  • Write the documentation

Skills, knowledge and experience

I am currently a third-year bachelor's student at the Linguistics Department of the NRU HSE, Moscow.

Knowledge:
Programming:
  • Python 3
Linguistics:
  • both functional and formal approaches to syntax
  • morphology
  • phonetics
  • lexical semantics
  • language typology
Languages:
  • Russian (native)
  • English (advanced)
  • Italian (intermediate)
  • French (intermediate)

Skills:
Programming:
  • Python 3, pymorphy2 (a morphological analyser for Russian)
  • HTML, CSS
Linguistics:
  • grammar description during fieldwork, glossing, analysis of older grammar descriptions and theories

Experience:
Coding:
  • extraction of distant verb arguments in coordinate clauses (a Python program and its theoretical basis)
Linguistics:
  • purpose clauses in Hill Mari (field research)

Non-GSoC summer plans

At the end of May I will be finishing my third-year term project. At the end of June I will have exams (for one or two weeks). During those periods I will not be able to work as much as usual.
At the end of July I will do Hill Mari fieldwork for approximately two weeks. I will not be able to do my GSoC work during this period.