Difference between revisions of "User:Ljmocic/GSoC 2016 proposal"

 
Latest revision as of 01:50, 20 March 2016

Contact information

Name: Ljubiša Moćić

E-mail address: ljubisa.mocic[at]gmail.com

IRC: ljmocic

SourceForge: lmocic


Why is it you are interested in machine translation?

I am interested in machine translation because I have found many of its applications very useful to my community and me. The most important is removing language barriers, which is one of the best things machine translation can provide. Machine translation is also a very complex area: development is not easy and requires a lot of time spent developing, updating and refining. It is clearly challenging, and that is what makes it so interesting. My interest grew further once I discovered how many ways there are to make machine translation even more useful by combining it with artificial intelligence, robotics and natural language processing.

Why is it that you are interested in the Apertium project?

I am interested in the Apertium project for many reasons. The first is accuracy. Many projects try to produce very accurate machine translation, and most fall short. Apertium has an advantage here, mainly because it focuses on quality over quantity. Even though the complexity under the hood is high, it delivers quality translations. Open source is, of course, one of the main reasons why Apertium is amazing. I plan to use Apertium in my future research, so extending Apertium's library of language pairs would be very useful to me. I have also worked with Apertium during Google Code-in, and I liked the team and the atmosphere.

Which of the published tasks are you interested in? What do you plan to do?

Title

Adopt an unreleased language pair.


Reasons why Google and Apertium should sponsor it.

It should be sponsored because there is no existing high-quality machine translation tool for Serbo-Croatian to Russian, and this project would create one with the help of Apertium and Google. The language pair would be useful to the wider public in Serbia, Croatia and Russia, and also in Montenegro and Bosnia and Herzegovina because of the similarity between their languages.

How and who it will benefit in society.

Besides benefiting those who are learning Serbo-Croatian or Russian (sh-ru, in either direction), I would introduce my professors and colleagues to this project. In particular, it would benefit students who wish to delve deeper into machine translation. The pair would also serve a potential audience of about 170 million people (Russia, Croatia, Serbia, Bosnia, Montenegro). The documentation produced could be useful for future development.

Work plan

Before the coding period of GSoC begins, I will focus on:

  • Connecting with the community.
  • Exploring and understanding the Apertium development environment.
  • Researching more about machine learning.
  • Improving my knowledge of the hbs-rus language pair.

Week 1:

  • Finish coding challenge, run testvoc

Week 2:

  • Write lexical selection rules, write transfer rules

Week 3:

  • Adding more nouns, verbs, pronouns

Week 4:

  • Adding more nouns, adjectives

Deliverable #1: Extended dictionary, added/improved lexical/transfer rules.

Week 5:

  • Adding more adverbs, verbs

Week 6:

  • Continue extending hbs-rus bilingual dictionary

Week 7:

  • Add/improve transfer rules, extend word coverage

Week 8:

  • Cleaning up, run testvoc

Deliverable #2: Dictionary extended to trunk level, higher word coverage.

Week 9:

  • Add/adjust rules as necessary, extend word coverage

Week 10:

  • Perform thorough testing

Week 11:

  • Writing wiki pages

Week 12:

  • Cleaning up, last minute fixes.

Deliverable #3: Language pair (release quality) and documentation.
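
The plan above refers several times to extending word coverage. As a rough illustration only, the sketch below computes a naive coverage figure: the share of word tokens in a text that appear in a known-word list. The function name `naive_coverage`, the toy word list and the sample sentence are hypothetical and are not part of Apertium or of the hbs-rus dictionaries; in Apertium itself, coverage is usually measured by running the pair's analyser over a corpus and counting the tokens it leaves unknown.

```python
import re


def naive_coverage(text, known_words):
    """Return the fraction of word tokens in `text` found in `known_words`."""
    # In Python 3, \w+ matches Unicode letters, so Latin and Cyrillic both work.
    tokens = [t.lower() for t in re.findall(r"\w+", text)]
    if not tokens:
        return 0.0
    known = sum(1 for t in tokens if t in known_words)
    return known / len(tokens)


# Hypothetical toy data, not taken from the real dictionaries.
known = {"ja", "sam", "student"}
sample = "Ja sam dobar student"
print(naive_coverage(sample, known))  # 3 of 4 tokens are known -> 0.75
```

Tracking a figure like this from week to week would give a simple way to check whether the dictionary work is moving coverage in the right direction.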

List your skills and give evidence of your qualifications.

Education: I am in the second year of a Bachelor's degree in Computer Science and Engineering at the Faculty of Technical Sciences, University of Novi Sad.

Languages: I cannot say whether Serbian or Russian is my native language, because I have spoken both for as long as I can remember. I know Croatian, Bosnian and Montenegrin at a good level, mainly due to their similarity. Besides these Slavic languages, I have learned German to B1 level.

Open Source: I gained experience working on this language pair during Google Code-in. I have contributed to Apertium, Amarok, openSUSE, SurveyMonkey and SymPy.

Programming languages:

  • Most used: Python, C/C++.
  • Experience through projects: Bash, HTML, CSS, XML, MATLAB/Octave.
  • Basic familiarity: Java, JavaScript, Assembler, VHDL.

List any non-Summer-of-Code plans you have for the Summer

Exams at my faculty are scheduled from June 12th to July 15th. During that period I will probably have to spend less time on the project, but I will make up those hours. I plan to work at least 30 hours per week on average.