User:Edgeandpearl/proposal

From Apertium

Latest revision as of 20:18, 2 June 2017

Contacts

Marina Kustova
marinakoustova@gmail.com
Github, SourceForge, IRC: edgeandpearl
Moscow (GMT+3)

Why?

Why machine translation?

I am now finishing my bachelor’s degree in theoretical and computational linguistics at HSE, Moscow. Though my studies have focused mostly on the former, I am very interested in the latter as well. Machine translation appeals to me both as an exciting research topic and as a very useful and promising line of work.

Why Apertium?

I first learned about Apertium from a friend of mine who participated in GSoC last year. Even though I haven’t worked with rule-based machine translation before, I was fascinated by the possibility of contributing to such a project.
What impresses me most is that Apertium works with minority languages. Apart from that, rule-based machine translation means that, when adopting a language pair, one has to work directly with the structure of the languages. As a linguist, I find working on Apertium extremely appealing for both of these reasons.

The task

Which of the published tasks are you interested in?

Adopting the Faroese -> Norwegian (Bokmål) language pair.

Why should Google and Apertium sponsor it?

As a result of my work, a prototype of a free, open-source Faroese-Norwegian translator will be created. No machine translation system for this language pair exists so far, so it will be the first one.

Work plan

Post-application period

  • Diving into Apertium documentation and manuals,
  • improving my knowledge of Faroese and Norwegian,
  • working on a translation of the Story (http://www.unilang.org/ulrview.php?res=416,400).

During the post-application period, the plan below will become more detailed as I become more familiar with the task.

Community bonding period

  • Exploring and evaluating the available resources for Faroese.

Work period

An up-to-date workplan for the project can be found at http://wiki.apertium.org/wiki/Faroese_and_Norwegian/Workplan; it supersedes the preliminary plan below.

  • Weeks 1-2: expanding the Faroese dictionary
  • Week 3: expanding the Norwegian Bokmål dictionary, if necessary
  • Week 4: starting to compile the bilingual dictionary

Deliverable #1

  • Week 5: continuing to compile the bilingual dictionary
  • Week 6: writing lexical selection rules
  • Weeks 7-8: writing transfer rules

Deliverable #2

  • Week 9: checking the validity of the rules written so far
  • Weeks 10-11: testing and evaluating the whole pair, making minor fixes
  • Week 12: cleaning up the code, writing documentation

Project completed! A working prototype of the Faroese-Norwegian pair is ready.
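This is a rough illustration of what compiling the bilingual dictionary involves. Apertium bilingual dictionaries are XML files in which each `<e>` entry pairs a source-language lemma (`<l>`) with a target-language lemma (`<r>`). Each lemma is followed by grammatical tags (`<s n="..."/>`). The sketch below assumes that layout and builds one such entry with Python's standard library. Several details are hypothetical choices for illustration, not taken from the actual pair: the helper name `bidix_entry`, the example pair Faroese *hestur* / Bokmål *hest* ("horse"), and the single noun tag `n`.

```python
import xml.etree.ElementTree as ET


def bidix_entry(left_lemma, right_lemma, tags):
    """Build one Apertium-style bidix <e> element (hypothetical helper).

    Assumes the <e><p><l>...</l><r>...</r></p></e> layout, with the same
    list of <s n="..."/> tags appended to both the left and right sides.
    """
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    for side, lemma in (("l", left_lemma), ("r", right_lemma)):
        node = ET.SubElement(p, side)
        node.text = lemma
        for tag in tags:
            # Each grammatical tag becomes an empty <s n="tag"/> element.
            ET.SubElement(node, "s", n=tag)
    return e


# Illustrative pair only: Faroese "hestur" -> Bokmål "hest", tagged as a noun.
entry = bidix_entry("hestur", "hest", ["n"])
print(ET.tostring(entry, encoding="unicode"))
```

A script like this could help turn a word list into bidix entries. A real pair is likely to need more tags per entry (for example, gender) than this sketch shows.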

Skills and qualifications

By summer 2017 I will have graduated with a bachelor’s degree in theoretical and applied linguistics from NRU HSE, Moscow.
Languages: Russian (native), English (advanced), French (intermediate), German (elementary), Mandarin (elementary), Norwegian Bokmål (elementary).
Programming skills: Python, R, bash.
Other computer skills: HTML+CSS.

Non-GSoC summer plans

I will be defending my thesis in June, so in the first week I will probably be able to spend no more than 10-15 hours on the task.
For the rest of the summer, I plan to work full-time, up to 50 hours a week.
In July and the first half of August I will be staying at my parents’ place, which is in GMT+10. This will not affect my work apart from the time difference.