User:Quirille/GSOC proposal 2013


Contact information

Name: Krylov Kirill

E-mail address: knpnvv[at]gmail.com

IRC: quirille

Other contact information can be provided to the mentor.

Why is it you are interested in machine translation?

I am very interested in both linguistics and computer science, which are the main constituents of machine translation. At school I took in-depth courses in English and Russian for 10 years. They were among my favorite subjects, and I explored many linguistic issues (not only concerning Russian and English). Although at university I mostly study programming and computer science, I have kept up my passion for linguistics. I find the fields of natural language processing and machine translation very attractive and promising, and I want to specialize in them.

Why is it that you are interested in the Apertium project?

The Apertium project could give me the opportunity to work in the field of machine translation. In addition, Apertium is open source, which is a very interesting approach to software development. Apertium also has many tasks that would be exciting to implement.

Which of the published tasks are you interested in? What do you plan to do?

Title

Ukrainian-Russian language pair for unidirectional translation from Ukrainian to Russian

Reasons why Google and Apertium should sponsor it

Currently Apertium has no release-quality language pair involving Russian, and there is an unfinished Ukrainian-Russian language pair in the incubator. It should be brought to release quality. There are also uncoordinated morphological and morphophonological files for Russian in different directories; they should be consolidated.

A description of how and who it will benefit in society

Completing this task will provide a free and open-source translation system from Ukrainian to Russian. It will help support language diversity in Russia and Ukraine. Ukrainian and Russian are the two most widely spoken languages in Ukraine, so automating translation will save a lot of time. This translation pair may also broaden contacts between Russian-speaking and Ukrainian-speaking people.

Work plan

Community bonding period (May 27 - June 16):

  • Getting familiar with the Apertium tools and community
  • Finding the language resources for Ukrainian and Russian
  • Studying testvocing
  • Studying the existing ru-uk monodices, bidix and transfer rules
  • Studying the existing Russian monodices

Work Period (June 17 - September 15)

Week 1:

  • Start working on ru&uk monodices

Week 2:

  • Continue working on ru&uk monodices

Week 3:

  • Continue working on ru&uk monodices

Week 4:

  • Checking the ru&uk monodices

Deliverable #1: updated ru&uk monodices, coordinated ru monodices

Week 5:

  • Start working on ru-uk bidix

Week 6:

  • Continue working on ru-uk bidix

Week 7 (Midterm July 29 - August 2):

  • Checking the ru-uk bidix

Deliverable #2: updated bidix

Week 8:

  • Start working on ru-uk transfer rules

Week 9:

  • Continue working on ru-uk transfer rules

Week 10:

  • Continue working on ru-uk transfer rules

Week 11:

  • Checking the ru-uk transfer rules

Deliverable #3: updated ru-uk transfer rules

Week 12:

  • testvocing

Week 13:

  • testvocing

Project completion (September 16 - September 23)

Final evaluation (September 23 - September 27)

List your skills and give evidence of your qualifications

I am in the 4th (penultimate) year of the spetsialist (специалист, a Russian degree between Bachelor's and Master's) degree in Computer Science and Engineering at the Institute of Management and Information Technologies of the Saint Petersburg State Polytechnical University (Russia).

I am a native speaker of Russian. As Ukrainian is close to Russian, I can understand it. I am also able to work out the morphological and syntactic peculiarities of Ukrainian.

Programming skills: C, C++, C# and .NET, Matlab, Python, git. I am ready to learn Perl (if necessary).

At the institute I took courses in Machine Learning and Automata Theory. I think this knowledge will help me understand Apertium more deeply, especially finite-state transducers. I have also done some NLP-related work during my studies. As a course paper for the Machine Learning course I wrote a text attribution program in Matlab based on the Bag of Words approach and machine learning algorithms (using the libraries randomforest-matlab by Abhishek Jaiantilal and libsvm). As a course paper for the Machine Vision course I wrote a C# program for image classification based on the Bag of Words model and the SVM algorithm (using EmguCV, a C# wrapper for OpenCV).

During the last year I worked at Mallenom Systems, a company attached to our institute, as a tester on 2 projects: the traffic simulation system Road Manager and the software suite Automated rolling stock car identification system (ARSCIS). This job gave me teamwork skills and knowledge of such a great tool as git, and helped me look at the programmers’ job “from the other side of the barricade”.

List any non-Summer-of-Code plans you have for the Summer

I have no non-GSoC plans for the summer and can contribute 30 to 40 hours a week. However, I have exams at the institute from the 3rd of June until the 21st of June, and the next term starts on the 2nd of September. So I will start the community bonding period earlier.