Difference between revisions of "User:Kvld/Proposal"

From Apertium
m (dictionaries)
 
(12 intermediate revisions by 2 users not shown)
Line 3: Line 3:
 
*'''E-mail:''' kiryukhinv(at)gmail.com
 
*'''E-mail:''' kiryukhinv(at)gmail.com
 
*'''IRC:''' kvld
 
*'''IRC:''' kvld
  +
*'''SourceForge:''' kvld
   
 
==Why is it you are interested in machine translation?==
 
==Why is it you are interested in machine translation?==
Machine translation involves linguistics and programming, in which I am interested in. It's very important area that enables people to have information about a variety of things from different parts of world across a language barrier. Special value MT has for lesser-known languages, when human translators are not available.
+
Machine translation involves linguistics and programming, which I am interested in. It is a very important area that enables people to have information about a variety of things from different parts of world across a language barrier. MT has special value for lesser-known languages, when human translators are frequently not available.
   
 
==Why is it that you are interested in the Apertium project?==
 
==Why is it that you are interested in the Apertium project?==
I had completed a few small tasks for Apertium during Google Code-In 2012. During that Google Code-In I was a high school student and now that I'm over 18 and study at the university I can dedicate time for more significant contribution.
+
I completed some tasks for Apertium during Google Code-In 2012. During that Google Code-In I was a high school student and now that I'm over 18 and studying at a university, I can dedicate time for more significant contribution.
   
 
==Which of the published tasks are you interested in? What do you plan to do?==
 
==Which of the published tasks are you interested in? What do you plan to do?==
Line 15: Line 16:
   
 
===Reasons why Google and Apertium should sponsor it===
 
===Reasons why Google and Apertium should sponsor it===
Currently Apertium has no any language pairs with Belarusian. My plan is complete the ''bel-ru'' pair and bring it to the release quality.
+
Currently Apertium has no language pairs with Belarusian. My plan is complete the ''bel-rus'' pair and bring it to the release quality. If you will give me chance, I think I have all to achieve this goal.
   
 
===A description of how and who it will benefit in society===
 
===A description of how and who it will benefit in society===
Line 23: Line 24:
 
===Work plan===
 
===Work plan===
 
'''Community bonding period:'''
 
'''Community bonding period:'''
* Getting closer with Apertium,
+
* Getting more familiar with Apertium,
* Reading available documentation and studiyng the existing pairs,
+
* Reading available documentation and studying the existing pairs,
* Finding any language resourses and dictionaries which can be used,
+
* Finding any language resources and dictionaries which can be used,
** [https://ru.wiktionary.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%8F:%D0%91%D0%B5%D0%BB%D0%BE%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9_%D1%8F%D0%B7%D1%8B%D0%BA Belarusian section at the Russian Wiktionary] – CC-BY-SA and GDFL<ref>[https://en.wiktionary.org/wiki/Wiktionary:Copyrights Wiktionary – Copyrights]</ref>
+
** [https://ru.wiktionary.org/wiki/%D0%9A%D0%B0%D1%82%D0%B5%D0%B3%D0%BE%D1%80%D0%B8%D1%8F:%D0%91%D0%B5%D0%BB%D0%BE%D1%80%D1%83%D1%81%D1%81%D0%BA%D0%B8%D0%B9_%D1%8F%D0%B7%D1%8B%D0%BA Belarusian section at the Russian Wiktionary]
** [http://www.belmova.org/ BelMova Dictionary] — CC-BY-NS-SA
 
 
** Various dictionaries from [http://www.slounik.org/ slounik.org] available in StarDict format
 
** Various dictionaries from [http://www.slounik.org/ slounik.org] available in StarDict format
  +
** Grammar database from [http://bnkorpus.info/ bnkorpus.info]
* Checking the existing Belarusian files in Incubator.
+
* Checking the existing Belarusian files in Incubator,
  +
* Adding prepositions, interjections, particles, conjunctions.
   
 
'''Work period:'''
 
'''Work period:'''
Line 38: Line 40:
 
|-
 
|-
 
| Week 1
 
| Week 1
  +
| Add and check nouns.
| Write parsers for dictionaries and transform parsed data to Apertium dictionary formats.
 
 
|-
 
|-
 
| Week 2
 
| Week 2
| Add and check nouns, pronouns and numerals.
+
| Add and check nouns and proper nouns.<br>Evaluation.
 
|-
 
|-
 
| Week 3
 
| Week 3
| Add and check nouns, pronouns and numerals.
+
| Add and check pronouns and numerals.
 
|-
 
|-
 
| Week 4
 
| Week 4
| Add and check conjunctions.<br>Add necessary bel-ru transfer rules.
+
| Add and check adverbs.<br>Add necessary bel-rus transfer rules.<br>Evaluation.
 
|-
 
|-
 
| '''Deliverable #1'''
 
| '''Deliverable #1'''
| Updated bel monodix, bel-ru bidix and some bel-ru transfer rules.
+
| Updated bel monodix, bel-ru bidix and some bel-rus transfer rules.
 
|-
 
|-
| Week 5
+
| Week 5 (''midterm'')
| Add and check adverbs.
+
| Add and check adjectives.
 
|-
 
|-
 
| Week 6
 
| Week 6
| Add and check verbs.
+
| Add and check adjectives.<br>Evaluation.
 
|-
 
|-
| Week 7 (''midterm'')
+
| Week 7
| Add and check verbs. Start adding adjectives.
+
| Add and check adjectives. Start adding verbs.
 
|-
 
|-
 
| Week 8
 
| Week 8
| Add and check adjectives.<br>Add bel-ru transfer rules.
+
| Add and check verbs.<br>Add bel-rus transfer rules.<br>Evaluation.
 
|-
 
|-
 
| '''Deliverable #2'''
 
| '''Deliverable #2'''
| Almost finished bel monodix, bel-ru bidix and bel-ru transfer rules.
+
| Almost finished bel monodix, bel-rus bidix and bel-rus transfer rules.
 
|-
 
|-
 
| Week 9
 
| Week 9
  +
| Add and check verbs.
| Extend word coverage.<br>Adjust transfer rules as necessary.<br>Run testvoc.
 
 
|-
 
|-
 
| Week 10
 
| Week 10
| Extend word coverage.<br>Adjust transfer rules as necessary.<br>Run testvoc.
+
| Extend word coverage.<br>Adjust transfer rules as necessary.<br>Run testvoc.<br>Evaluation.
 
|-
 
|-
 
| Week 11
 
| Week 11
Line 77: Line 79:
 
|-
 
|-
 
| Week 12
 
| Week 12
 
| Extend word coverage.<br>Adjust transfer rules as necessary.<br>Run testvoc.
| Write documentation and cleanup code.
 
  +
|-
  +
| Week 13
  +
| Cleanup code. Last fixes.
 
|-
 
|-
 
| '''Projection completed'''
 
| '''Projection completed'''
 
| Finished language pair.
 
| Finished language pair.
 
|}
 
|}
  +
  +
There is more detailed work plan [[User:Kvld/Proposal/Workplan|here]].
   
 
==List your skills and give evidence of your qualifications==
 
==List your skills and give evidence of your qualifications==
I'm currently a 2nd year bachelor student in Saint Petersburg University ITMO (Russia).<br>
+
I'm currently a 2nd year bachelor student in Computer Science at Saint Petersburg University ITMO (Russia).<br>
 
Languages: Russian (Native), Belarusian (Good), basic knowledge in Polish.<br>
 
Languages: Russian (Native), Belarusian (Good), basic knowledge in Polish.<br>
Programming skills: C, C++, Java, Python and some scripting languages. Basic knowledge in Machine Learning.
+
Programming skills: C, C++, Java, Python and some scripting languages. Basic knowledge in Machine Learning.<br><br>
   
  +
I also completed the following as part of the coding challenge:
==List any non-Summer-of-Code plans you have for the Summer==
 
  +
* Created [https://svn.code.sf.net/p/apertium/svn/incubator/apertium-bel apertium-bel] and [https://svn.code.sf.net/p/apertium/svn/incubator/apertium-bel-rus apertium-bel-rus].
I have no non-GSoC plans for the summer and I can spend about 50 hours a week on task.
 
  +
* Created translator for [https://svn.code.sf.net/p/apertium/svn/incubator/apertium-bel-rus/texts test text].
  +
* Added nouns to monodix (about 40k words).
  +
* Added some nouns to bidix (about 11k entries).
  +
* Current WER (test text, on 23 March):
  +
** 28.81% (bel -> rus)
  +
** 12.43% (rus -> bel)
   
 
==List any non-Summer-of-Code plans you have for the Summer==
==References==
 
 
I have no non-GSoC plans for the summer and I can spend about 40-50 hours a week on task.
<references/>
 

Latest revision as of 23:48, 24 March 2016

Contact information

  • Name: Vladislav Kiryukhin
  • E-mail: kiryukhinv(at)gmail.com
  • IRC: kvld
  • SourceForge: kvld

Why is it you are interested in machine translation?

Machine translation combines linguistics and programming, both of which interest me. It is a very important area that lets people access information from different parts of the world across language barriers. MT is especially valuable for lesser-known languages, for which human translators are frequently not available.

Why is it that you are interested in the Apertium project?

I completed some tasks for Apertium during Google Code-In 2012. I was a high school student then; now that I'm over 18 and studying at a university, I can dedicate time to a more significant contribution.

Which of the published tasks are you interested in? What do you plan to do?

Title

New Belarusian <-> Russian language pair

Reasons why Google and Apertium should sponsor it

Currently Apertium has no language pairs with Belarusian. My plan is to complete the bel-rus pair and bring it to release quality. If you give me the chance, I think I have everything needed to achieve this goal.

A description of how and who it will benefit in society

Completing this project will provide a free and open-source translation system from Belarusian to Russian. Both languages are official in Belarus, and automating translation may help people in many situations and save a lot of time.
Also, Belarusian is marked as "vulnerable" on the UNESCO list of endangered languages, and any project in Belarusian may help to popularize it.

Work plan

Community bonding period:

  • Getting more familiar with Apertium,
  • Reading available documentation and studying the existing pairs,
  • Finding any language resources and dictionaries which can be used,
  • Checking the existing Belarusian files in Incubator,
  • Adding prepositions, interjections, particles, conjunctions.

Work period:

Week               Target
Week 1             Add and check nouns.
Week 2             Add and check nouns and proper nouns. Evaluation.
Week 3             Add and check pronouns and numerals.
Week 4             Add and check adverbs. Add necessary bel-rus transfer rules. Evaluation.
Deliverable #1     Updated bel monodix, bel-rus bidix and some bel-rus transfer rules.
Week 5 (midterm)   Add and check adjectives.
Week 6             Add and check adjectives. Evaluation.
Week 7             Add and check adjectives. Start adding verbs.
Week 8             Add and check verbs. Add bel-rus transfer rules. Evaluation.
Deliverable #2     Almost finished bel monodix, bel-rus bidix and bel-rus transfer rules.
Week 9             Add and check verbs.
Week 10            Extend word coverage. Adjust transfer rules as necessary. Run testvoc. Evaluation.
Week 11            Extend word coverage. Adjust transfer rules as necessary. Run testvoc.
Week 12            Extend word coverage. Adjust transfer rules as necessary. Run testvoc.
Week 13            Clean up code. Last fixes.
Project completed  Finished language pair.

A more detailed work plan is available at User:Kvld/Proposal/Workplan.

List your skills and give evidence of your qualifications

I'm currently a second-year bachelor's student in Computer Science at Saint Petersburg University ITMO (Russia).
Languages: Russian (native), Belarusian (good), basic knowledge of Polish.
Programming skills: C, C++, Java, Python and some scripting languages. Basic knowledge of machine learning.

I also completed the following as part of the coding challenge:

  • Created apertium-bel and apertium-bel-rus.
  • Created a translator for the test text.
  • Added nouns to the monodix (about 40k words).
  • Added some nouns to the bidix (about 11k entries).
  • Current WER (test text, on 23 March):
    • 28.81% (bel -> rus)
    • 12.43% (rus -> bel)
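
For readers unfamiliar with the metric, WER (word error rate) is commonly computed as the word-level edit distance between the system output and a reference translation, divided by the number of words in the reference. The sketch below illustrates that definition only; the function name `word_error_rate` and the whitespace tokenization are assumptions made for illustration, and it is not necessarily how the figures above were obtained.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Illustrative sketch: tokenization is plain whitespace splitting,
    which is an assumption and may differ from the tool actually used.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] = edit distance between the reference words seen so far
    # (initially none) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1      # reference word missing from hypothesis
            insertion = current[j - 1] + 1  # extra word in hypothesis
            current.append(min(substitution, deletion, insertion))
        previous = current

    return previous[-1] / len(ref)
```

For example, `word_error_rate("a b c d", "a x c")` gives 0.5: one substitution and one deletion over four reference words. A lower value means the output is closer to the reference.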

List any non-Summer-of-Code plans you have for the Summer

I have no non-GSoC plans for the summer and can spend about 40-50 hours a week on the task.