User:Ljmocic/GSoC 2016 proposal

From Apertium
Revision as of 15:57, 12 March 2016 by Francis Tyers (talk | contribs) (Francis Tyers moved page Ljmocic GSoC 2016 proposal to User:Ljmocic/GSoC 2016 proposal without leaving a redirect)

Contact information

Name: Ljubiša Moćić

E-mail address: ljubisa.mocic[at]gmail.com

IRC: ljmocic

SourceForge: lmocic


Why is it you are interested in machine translation?

I am interested in machine translation because I have found many of its applications very useful to my community and to me. The first is removing language barriers, which is one of the best things machine translation can provide. Machine translation is also a very complex area: development is not easy, and it requires a lot of time spent developing, updating and refining. It is clearly challenging, but that is what makes it so interesting. My interest grew after I discovered how much more useful it becomes when combined with artificial intelligence, robotics and natural language processing.

Why is it that you are interested in the Apertium project?

I am interested in the Apertium project for many reasons. The first is accuracy. While many projects try to produce very accurate machine translation, most fail at it. Apertium has an advantage, mainly because it focuses on quality over quantity. Even though the complexity “under the hood” is high, it delivers quality translations. Of course, being open source is one of the main reasons why Apertium is amazing. I plan to use Apertium in my future research, so extending Apertium’s library of language pairs would be very useful. I have also worked with Apertium during Google Code-in, and I liked the team and the atmosphere.

Which of the published tasks are you interested in? What do you plan to do?

Title

Adopt an unreleased language pair.


Reasons why Google and Apertium should sponsor it.

It should be sponsored because there is no existing high-quality machine translation tool from Serbo-Croatian to Russian, and this language pair would provide one with the help of Apertium and Google. It would be useful to the wider public in Serbia, Croatia and Russia (and also in Montenegro and Bosnia and Herzegovina, because of the similarity between the languages).

How and who it will benefit in society.

Besides benefiting those who are learning sh-ru in one direction or the other, I would introduce my professors and colleagues to this project. In particular, it would benefit students who wish to delve deeper into the subject of machine translation. It could also benefit about 170 million people (Russia, Croatia, Serbia, Bosnia, Montenegro). The work may also yield useful documentation for future development.

Work plan

Before the coding period of GSoC begins, I will focus on:

  • Connecting with the community

  • Exploring and understanding the Apertium development environment

  • Researching more about machine learning

  • Improving my knowledge of the hbs-rus language pair

Week 1:

  • Finish coding challenge, run testvoc

Week 2:

  • Write lexical selection rules, write transfer rules

Week 3:

  • Adding more nouns, verbs, pronouns

Week 4:

  • Adding more nouns, adjectives

Deliverable #1: Extended dictionary, added/improved lexical/transfer rules.

Week 5:

  • Adding more adverbs, verbs

Week 6:

  • Continue extending hbs-rus bilingual dictionary

Week 7:

  • Add/improve transfer rules, extend word coverage

Week 8:

  • Cleaning up, run testvoc

Deliverable #2: Dictionary extended to trunk level, higher word coverage.

Week 9:

  • Add/adjust rules as necessary, extend word coverage

Week 10:

  • Perform thorough testing

Week 11:

  • Writing wiki pages

Week 12:

  • Cleaning up, last minute fixes.

Deliverable #3: Language pair (release quality) and documentation.

List your skills and give evidence of your qualifications.

Education: I am in the second year of a Bachelor’s degree in Computer Science and Engineering at the Faculty of Technical Sciences, University of Novi Sad.

Languages: I can’t say whether Serbian or Russian is my native language, because I have spoken both for as long as I can remember. I know Croatian, Bosnian and Montenegrin at a good level, mainly due to their similarity. Besides these Slavic languages, I have learned German to B1 level.

Open Source: I have experience working on this language pair from my time in Google Code-in. I have contributed to Apertium, Amarok, openSUSE, SurveyMonkey and SymPy.

Programming languages: Most used: Python, C/C++. Experience through projects: Bash, HTML, CSS, XML, Matlab/Octave. Basic familiarity: Java, JavaScript, Assembler, VHDL.


List any non-Summer-of-Code plans you have for the Summer

Exams at my faculty are scheduled to take place from June 12th to July 15th, and during that period I will probably have to spend less time working on the project, but I will make up those hours. I plan to work at least 30 hours per week on average.