User:Ljmocic/GSoC 2016 proposal

From Apertium

Contact information

Name: Ljubiša Moćić

E-mail: ljubisa.mocic@gmail.com
IRC: ljmocic (freenode.net)
Twitter: @ljmocic
SourceForge: lmocic


Why is it you are interested in machine translation?

I am interested in machine translation because I have found many of its applications very useful to me and my community. The first is removing language barriers, which is one of the best things machine translation can provide. Machine translation is also a very complex area: development is not easy, and it requires a lot of time spent developing, updating and refining. It is clearly challenging, and that is what makes it so interesting. My interest grew further after I discovered how much more useful machine translation becomes when it is combined with artificial intelligence, robotics and natural language processing.

Why is it that you are interested in the Apertium project?

I am interested in the Apertium project for many reasons. The first is accuracy. While many projects try to produce very accurate machine translation, most fail at it. Apertium stands out, mainly because it focuses on quality over quantity: even though the complexity "under the hood" is high, it delivers quality translations. Of course, being open source is one of the main reasons why Apertium is amazing. I plan to use Apertium in my research in the future, so it would be very useful to extend Apertium's collection of language pairs. I have also worked with Apertium during Google Code-in, and I liked the team and the atmosphere.

Which of the published tasks are you interested in? What do you plan to do?

Adopt an unreleased language pair.


Reasons why Google and Apertium should sponsor it.

It should be sponsored because there is no existing high-quality machine translation tool from Serbo-Croatian to Russian, and this language pair would provide one with the help of Apertium and Google. The pair would be useful to the wide public in Serbia, Croatia and Russia (and also in Montenegro and Bosnia and Herzegovina, because of the similarity between the languages).

How and who it will benefit in society.

Besides benefiting people who are learning sh-ru in either direction, I would introduce my professors and colleagues to this project. In particular, it would benefit students who wish to delve deeper into the subject of machine translation. More broadly, it could serve about 170 million people (in Russia, Croatia, Serbia, Bosnia and Montenegro). The project could also produce useful documentation for future development.

Work plan

Before the GSoC coding period begins, I will focus on:
- Connecting with the community.
- Exploring and understanding the Apertium development environment.
- Researching more about machine learning.
- Enhancing my knowledge related to the hbs-rus language pair.

Week 1: Finish the coding challenge, run testvoc
Week 2: Write lexical selection rules, write transfer rules
Week 3: Add more nouns, verbs and pronouns
Week 4: Add more nouns and adjectives

Deliverable 1: Extended dictionary; added or improved lexical selection and transfer rules.

Week 5: Add more adverbs and verbs
Week 6: Continue extending the hbs-rus bilingual dictionary
Week 7: Add/improve transfer rules, extend word coverage
Week 8: Clean up, run testvoc

Deliverable 2: Dictionary extended to trunk level; higher word coverage.

Week 9: Add/adjust rules as necessary, extend word coverage
Week 10: Perform thorough testing
Week 11: Write wiki pages
Week 12: Clean up, last-minute fixes

Deliverable 3: Release-quality language pair and documentation.

List your skills and give evidence of your qualifications.

Education: I am in the second year of a Bachelor's degree in Computer Science and Engineering at the Faculty of Technical Sciences, University of Novi Sad.
Languages: I cannot name just one of Serbian and Russian as my native language, because I have spoken both for as long as I can remember. I know Croatian, Bosnian and Montenegrin well, mainly due to their similarity. Besides these Slavic languages, I have learned German to B1 level.
Open Source: I have experience working on this language pair from Google Code-in. I have contributed to Apertium, Amarok, openSUSE, SurveyMonkey and SymPy.
Programming languages: Most used: Python, C/C++. Experience through projects: Bash, HTML, CSS, XML, Matlab/Octave. Basic familiarity: Java, JavaScript, Assembler, VHDL.


List any non-Summer-of-Code plans you have for the Summer

Exams at my faculty are scheduled from June 12th to July 15th. During that period I will probably have to spend less time on the project, but I will make up for those hours. I plan to work at least 30 hours per week on average.