User:Quirille/GSOC proposal 2013

Contact information

Name: Krylov Kirill

Email: knp...@gmail.com

IRC: quirille

Other contact information can be provided to the mentor.

Why is it you are interested in machine translation?

I am very interested in both linguistics and computer science, which are the main constituents of machine translation. At school I had ten years of in-depth courses in English and Russian. They were among my favorite subjects, and I examined many linguistic issues (not only those concerning Russian and English). Although at university I mostly study programming and computer science, I have kept up my passion for linguistics. I find the fields of natural language processing and machine translation very attractive and promising, and I want to specialize in them.

Why is it that you are interested in the Apertium project?

The Apertium project could give me the opportunity to work in the field of machine translation. In addition, Apertium is open source, which I find a very interesting approach to software development. Apertium also has many open tasks that would be exciting to implement.

Which of the published tasks are you interested in? What do you plan to do?

Title

Ukrainian-Russian language pair for unidirectional translation from Ukrainian to Russian

Reasons why Google and Apertium should sponsor it

Currently Apertium has no release-quality language pair that includes Russian, and there is an incomplete Ukrainian-Russian language pair in the incubator. It should be brought to release quality. There are also uncoordinated morphological and morphophonological files for Russian scattered across different directories; they should be consolidated.

A description of how and who it will benefit in society

Completing this task will provide a free and open-source translation system from Ukrainian to Russian. It will help to support language diversity in Russia and Ukraine. Ukrainian and Russian are the two most widely spoken languages in Ukraine, so automating translation between them will save a lot of time. Having this translation pair may also broaden contacts between Russian-speaking and Ukrainian-speaking people.

Work plan

Community bonding period (May 27 - June 16):

  • Getting familiar with the Apertium tools and community
  • Finding language resources for Ukrainian and Russian
  • Studying testvocing
  • Studying the existing ru-uk monodices, bidix and transfer rules

Work period (June 17 - September 15):

Week 1:

  • Start extending the Ukrainian monodix to the size of the Russian one, adding new entries to the bidix and adding the necessary uk-ru transfer rules.
  • Check and add conjunctions and prepositions to uk monodix

Week 2:

  • Check and add adverbs to uk monodix

Week 3:

  • Check and add numerals to uk monodix

Week 4:

  • Check and add pronouns and determiners to uk monodix

Deliverable #1: updated uk monodix, ru-uk bidix and ru-uk transfer rules

Week 5:

  • Check and add nouns to uk monodix

Week 6:

  • Check and add nouns to uk monodix

Week 7 (Midterm July 29 - August 2):

  • Check and add adjectives to uk monodix

Deliverable #2: updated uk monodix, ru-uk bidix and ru-uk transfer rules

Week 8:

  • Check and add adjectives to uk monodix

Week 9:

  • Check and add verbs to uk monodix

Week 10:

  • Check and add verbs to uk monodix

Deliverable #3: finished uk monodix, ru-uk bidix and ru-uk transfer rules

Week 11:

  • Testing

Week 12:

  • Testing

Week 13:

  • Testing

Project completion (September 16 - September 23):

  • Tidying up, releasing

Final evaluation (September 23 - September 27)

List your skills and give evidence of your qualifications

I am in the 4th (penultimate) year of the spetsialist (специалист, a Russian degree between Bachelor's and Master's) degree in Computer Science and Engineering at the Institute of Management and Information Technologies of the Saint Petersburg State Polytechnical University (Russia).

I am a native speaker of Russian. As Ukrainian is close to Russian, I can understand it. I am also able to work out the morphological and syntactic peculiarities of Ukrainian.

Programming skills: C, C++, C# and .NET, Matlab, Python, git. I am ready to learn Perl (if necessary).

At the institute I took courses in Machine Learning and Automata Theory. I think this knowledge will help me to understand Apertium more deeply, especially finite-state transducers. I have also done some NLP-related work during my studies. As a course paper for the Machine Learning discipline I wrote a text attribution program in Matlab based on the Bag of Words approach and machine learning algorithms (using the libraries randomforest-matlab by Abhishek Jaiantilal and libsvm). As a course paper for the Machine Vision discipline I wrote a C# program for image classification based on the Bag of Words model and the SVM algorithm (using EmguCV, a C# wrapper of OpenCV).

During the last year I worked at Mallenom Systems, a company attached to our institute, as a tester on two projects: the traffic simulation system Road Manager and the software suite Automated Rolling Stock Car Identification System (ARSCIS). This job gave me teamwork skills and experience with git, a great tool, and helped me to look at the programmers’ job “from the other side of the barricade”.

List any non-Summer-of-Code plans you have for the Summer

I have no non-GSoC plans for the summer and can contribute 30 to 40 hours a week. However, I have exams at the institute from the 3rd of June until the 21st of June, and the next term starts on the 2nd of September. To compensate, I will start the community bonding period earlier.