Difference between revisions of "User:Ljmocic/GSoC 2016 proposal"

Revision as of 15:50, 12 March 2016

Contact information

Name: Ljubiša Moćić

Email: ljubisa.mocic@gmail.com
IRC: ljmocic (freenode.net)
Twitter: @ljmocic
SourceForge: lmocic


Why is it you are interested in machine translation?

I am interested in machine translation because many of its applications have proved very useful to my community and me. The first is removing language barriers, which is one of the best things machine translation can provide. Machine translation is also a very complex area: development is not easy, and it requires a lot of time spent developing, updating and refining. It is clearly challenging, but that is what makes it so interesting. My interest grew further once I learned how much more useful it becomes when combined with artificial intelligence, robotics and natural language processing.

Why is it that you are interested in the Apertium project?

I am interested in the Apertium project for many reasons. The first is accuracy. While many projects try to produce very accurate machine translation, most fall short. Apertium has an advantage here, mainly because it focuses on quality over quantity. Even though the complexity “under the hood” is high, it delivers quality translation. Open source is, of course, one of the main reasons why Apertium is amazing. I plan to use Apertium in my future research, so extending its library of language pairs would be very useful to me. I have also worked with Apertium during Google Code-in, and I liked the team and the atmosphere.

Which of the published tasks are you interested in? What do you plan to do?

Adopt an unreleased language pair: Serbo-Croatian to Russian (hbs-rus).


Reasons why Google and Apertium should sponsor it.

It should be sponsored because no high-quality machine translation tool exists for Serbo-Croatian to Russian, and this project would create one with the help of Apertium and Google. The language pair would be useful to the wider public in Serbia, Croatia and Russia, and also in Montenegro and Bosnia and Herzegovina because of the similarity between the languages.

How and who it will benefit in society.

Besides benefiting those who are learning Serbo-Croatian or Russian (in either direction), I would introduce my professors and colleagues to this project. In particular, it would benefit students who wish to delve deeper into the subject of machine translation. It would also serve 170 million people (Russia, Croatia, Serbia, Bosnia, Montenegro). The project may also produce useful documentation for future development.

Work plan

Before the coding period of GSoC begins, I will focus on:
- Connecting with the community.
- Exploring and understanding the Apertium development environment.
- Researching more about machine learning.
- Enhancing my knowledge related to the hbs-rus language pair.

Week 1: Finish coding challenge, run testvoc
Week 2: Write lexical selection rules, write transfer rules
Week 3: Add more nouns, verbs, pronouns
Week 4: Add more nouns, adjectives

Deliverable 1: Extended dictionary, added/improved lexical/transfer rules.

Week 5: Add more adverbs, verbs
Week 6: Continue extending hbs-rus bilingual dictionary
Week 7: Add/improve transfer rules, extend word coverage
Week 8: Clean up, run testvoc

Deliverable 2: Extended dictionary to trunk level, higher level word coverage.

Week 9: Add/adjust rules as necessary, extend word coverage
Week 10: Perform thorough testing
Week 11: Write wiki pages
Week 12: Clean up, last-minute fixes

Deliverable 3: Language pair (release quality) and documentation.
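
The word coverage goals in the plan above could be tracked with a small script. The following is only a minimal sketch in Python, not part of Apertium: the names `LEXICAL_UNIT` and `naive_coverage` are hypothetical, and the sample text with its tags is made up for illustration. It relies on the Apertium stream format, in which each lexical unit is written as `^surface/analysis$` and an unknown word gets an analysis beginning with `*`. Escaped characters inside lexical units are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (hypothetical helper; escaped characters are ignored for simplicity)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text):
    """Return the share of lexical units that have at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An analysis starting with "*" marks a word missing from the dictionary.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Made-up analyser output: two known words and one unknown word.
sample = "^kuća/kuća<n><f><sg><nom>$ ^je/biti<vbser><pres><p3><sg>$ ^xyzzy/*xyzzy$"
print(naive_coverage(sample))
```

Running this over the analyser output for a fixed corpus after each deliverable would give a rough number to compare from week to week.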

List your skills and give evidence of your qualifications.

Education: I am in the 2nd year of a Bachelor’s degree in Computer Science and Engineering at the Faculty of Technical Sciences, University of Novi Sad.

Languages: I cannot say whether Serbian or Russian is my native language, because I have spoken both for as long as I can remember. I know Croatian, Bosnian and Montenegrin at a good level, mainly due to their similarity. Besides these Slavic languages, I have learned German to B1 level.

Open Source: I gained experience working on this language pair during Google Code-in. I have contributed to Apertium, Amarok, openSUSE, SurveyMonkey and SymPy.

Programming languages: Most used: Python, C/C++. Experience through projects: Bash, HTML, CSS, XML, MATLAB/Octave. Basic familiarity: Java, JavaScript, Assembler, VHDL.


List any non-Summer-of-Code plans you have for the Summer

Exams at my faculty are scheduled from June 12th to July 15th. During that period I will probably have to spend less time on the project, but I will make up those hours. I plan to work at least 30 hours per week on average.