- 1 Contacts
- 2 Why?
- 3 The task
- 4 Skills and qualifications
- 5 Non-GSoC summer plans
Github, SourceForge, IRC: edgeandpearl
Why machine translation?
I am now finishing my bachelor’s degree in theoretical and computational linguistics at HSE, Moscow. Although most of my studies focused on the theoretical side of the specialization, I am very interested in the computational side as well. Machine translation seems to me both an exciting research topic and a very useful and promising line of work.
I first learned about Apertium from a friend of mine who participated in GSoC last year. Even though I haven’t worked with rule-based machine translation before, I was fascinated by the possibility of contributing to a project like this.
What impresses me most is that Apertium works with minority languages. Apart from that, the mechanism of rule-based machine translation means that, when adopting a language pair, one has to deal with language structure. Both factors make me, as a linguist, extremely interested in working for Apertium.
Which of the published tasks are you interested in?
Adopting the Faroese -> Norwegian (Bokmål) pair.
Why should Google and Apertium sponsor it?
The result of my work will be a prototype of a free, open-source Faroese-Norwegian translator. It will be the first machine translation system for this language pair so far.
- Diving into Apertium documentation and manuals,
- improving my knowledge of Faroese and Norwegian,
- working on translation of the Story.
During the post-application period, the following plan will become more detailed as I become more familiar with the task.
Community bonding period
- Exploring and evaluating the available resources for Faroese.
A relevant workplan for the project can be found here.
- Weeks 1-2: expanding the Faroese dictionary
- Week 3: expanding the Norwegian Bokmål dictionary, if necessary
- Week 4: start compiling the bilingual dictionary
- Week 5: continue compiling the bilingual dictionary
- Week 6: writing lexical selection rules
- Weeks 7-8: writing transfer rules
- Week 9: checking the validity of the rules written
- Weeks 10-11: evaluating and testing the whole thing, adding minor fixes
- Week 12: cleaning up the code, writing documentation
Project completed! A prototype of the Faroese-Norwegian pair has been brought into existence.
Skills and qualifications
By summer 2017 I will have graduated with a bachelor’s degree in theoretical and applied linguistics from NRU HSE, Moscow.
Languages: Russian (native), English (advanced), French (intermediate), German (elementary), Mandarin (elementary), Norwegian Bokmål (elementary).
Programming skills: Python, R, bash.
Other computer skills: HTML+CSS.
Non-GSoC summer plans
I’m defending my thesis in June, so in the very first week I will probably be unable to spend more than 10-15 hours on the task.
For the rest of the time, I plan to work full-time, up to 50 hours a week.
In July and the first half of August I will be staying at my parents' place. This will not affect my work, though the time zone there is GMT+10.