User:Surajkawade/GSOC proposal: Marathi and English

Name

Suraj Kawade

Contact information

IRC nick : develover

E-mail : suraj.kawade@gmail.com / suraj.kawade@hotmail.com

Phone no : +918983005859 / +919404943130

skype username : yesiamsuraj

Why are you interested in machine translation?

I am interested in linguistics and I love programming, so machine translation is a magnet for me! The world is culturally diverse, and languages are both barriers and gateways to its cultures. I read on Wikipedia that "There are between 6000 and 7000 languages currently spoken, and that between 50-90% of those will have become extinct by the year 2100". I was shocked, but I don't want to feel helpless. Though humans speak in different tongues, they express the same things! So why shouldn't I follow my curiosity to learn how these languages are related and how they differ? And why shouldn't I help society hold on to the gift its ancestors gave it? Everything is going digital and moving fast, and so is the field of NLP. MT plays a large part in it, and I want to be a part of it, however small.

Why are you interested in the Apertium project?

The best things in the world are free (as in 'freedom')! Open Source is free and Apertium is Open Source, so by the law of transitivity Apertium is the best thing. If I say I do not want languages dying in front of my eyes, I should help prevent it, and that is how I found Apertium. I think Apertium is a community of knowledgeable, inspiring people who are really enthusiastic about a common cause and, most importantly, love what they do and do what they love. (I figured this out while talking to them in the IRC channel.) Just as importantly, to "do" something to preserve a language with Apertium, you need very few resources at the beginning, which is really helpful, less hectic and hence encouraging. Apertium uses rule-based translation methods rather than plain dictionary-based lookup, so it works with the meanings of words and not just the words themselves, which brings it closer to how humans translate.

Why should Google and Apertium sponsor it?

On learning that nothing of release quality has been done in Apertium for Marathi, I decided to work on it. Marathi is written in the Devanagari script, and Apertium has yet to release a pair containing a Devanagari-script language (most of them are in the incubator). Doing extensive work and bringing the Marathi-English pair to release quality will also encourage work on the other Devanagari languages in the incubator.

How and whom will it benefit in society?

Marathi is the 19th most spoken language in the world and an official language of the state of Maharashtra. Though Marathi has a rich literature and a glorious history, there are no reliable, good-quality translation solutions available as of now. Even Google Translate does not provide Marathi translation. It has been observed that Marathi-speaking students struggle more in learning English than other Indian students who have, to some extent, digital tools available to them. With the explosion of tablets and smartphones and the easy availability of the Internet in India, people are using social-networking sites a lot, have started blogging and are reading news online. In such times, not having a good Marathi-English translation tool is an inconvenience. Creating a machine translation tool with Apertium will not only serve this need but also benefit a large community.

Which of the published tasks are you interested in? What do you plan to do?

As there is no Marathi-English pair in Apertium so far, I am starting to work on it from scratch. There is a Marathi-Hindi bilingual dictionary in the incubator, but I don't know how complete it is. I will check whether it can help my project. Driven by my interest and enthusiasm, I am going to try to bring the Marathi-English pair to release quality.

Work plan

Coding challenge

Installation

  • I set up a new Ubuntu machine in VirtualBox for the Apertium installation.
  • firespeaker helped me to install the system on my machine.
  • After installing Apertium and lttoolbox, I decided to install the en-es language pair.
  • Initially I got lots of errors and permission problems, but with the help of firespeaker I succeeded in installing the system.

Getting Started

  • Then I was introduced to spectie in the IRC channel. He gave me links to the documentation on "How to start a new language pair". He also gave me links for studying the basic structural and functional elements of the Apertium system and how it works.
  • spectie gave me a document (a story) in English to translate into Marathi. I completed it and sent it to him.
  • spectie created a basic Marathi-English system for me, and since I was already familiar with the symbols and terminology (from reading the documentation), I understood it quickly. He also added some words to the monolingual and bilingual dictionaries and gave me a list of words to add myself. Initially I felt I was working too slowly, but after adding more words I understood the mechanism and I am comfortable with it now.
  • spectie told me how to pull, make changes and commit them. I did it successfully.

Community Bonding Period

  • I first joined the #apertium IRC channel, where I got introduced to fellow members of the community.
  • Then I joined the mailing list and created a user account on the Apertium wiki.
  • Community members in the IRC channel (especially firespeaker) gave me proper directions on what I should read and how to install the Apertium system on my machine.
  • I got a real push and encouragement from spectie, who helped me to create a basic mar-eng pair and helped me through the coding challenge.
  • I have been learning a lot from the IRC channel and the available documentation.
  • I have become so positive about Apertium that even if my project gets rejected I will keep working on my Marathi-English pair, because I am working 'for' my people 'with' good people!
  • I am planning to learn as much as possible before the actual coding commences, so that I 'bleed' less in the 'war' and win it.

Week Plan

WEEK DATE PLANS
Week 01 06.17-06.23
Week 02 06.24-06.30
Week 03 07.01-07.07
Week 04 07.08-07.14
Deliverable #1
Week 05 07.15-07.21
Week 06 07.22-07.28
Week 07 07.29-08.04
Week 08 08.05-08.11
Deliverable #2
Week 09 08.12-08.18
Week 10 08.19-08.25
Week 11 08.26-09.01
Week 12 09.02-09.08
Deliverable #3

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12

List your skills and give evidence of your qualifications

My non-Summer-of-Code plans for the Summer

I will be somewhat busy with university exams until they finish on the 10th of June, after which I have no other commitments. I will be available for about 40 hours a week for the entire project period. On Saturdays and Sundays I can put in extra hours.