User:Surajkawade/GSOC proposal: Marathi and English

Revision as of 17:20, 2 May 2013

Name

Suraj Kawade

Contact information

IRC nick : develover

E-mail : suraj.kawade@gmail.com / suraj.kawade@hotmail.com

Phone no : +918983005859 / +919404943130

Skype username : yesiamsuraj

Why are you interested in machine translation?

I am interested in linguistics and I love programming, so machine translation is a magnet for me! The world is culturally diverse, and languages are both barriers and gateways to its cultures. I have read on Wikipedia: "There are between 6000 and 7000 languages currently spoken, and that between 50-90% of those will have become extinct by the year 2100". I was shocked, but I don't want to feel helpless. Though humans speak in different tongues, they express the same things! So why shouldn't I follow my curiosity to learn how these languages are related and how they differ? And why shouldn't I help society keep from erasing the gift its ancestors left it? Everything is going digital and moving fast, and so is the field of NLP. MT plays a large part in it, and I want to be a part of it, however small.

Why are you interested in the Apertium project?

The best things in the world are free (as in 'freedom')! Open Source is free, and Apertium is Open Source, so by the law of transitivity Apertium is one of the best things. If I say I do not want languages dying in front of my eyes, I should help prevent it, and that is how I found Apertium. I think Apertium is a community of knowledgeable, inspiring people who are enthusiastic about a common cause and, most importantly, love what they do and do what they love. (I figured this out while talking to them in the IRC channel.) Just as importantly, to actually do something to preserve a language with Apertium, you need few resources at the beginning, which makes the work less hectic and more encouraging. Apertium uses rule-based translation methods rather than purely dictionary-based ones, so it works with the meanings of words and not just the words themselves, which brings it closer to how humans translate.

Why should Google and Apertium sponsor it?

When I learned that nothing of release quality exists in Apertium for Marathi, I decided to work on it. Marathi is written in the Devanagari script, and Apertium has yet to release a pair containing a Devanagari-script language (most of them are in the incubator). Doing extensive work and bringing the Marathi-English pair to release quality will also encourage work on the other Devanagari-script languages in the incubator.

How will it benefit society, and whom?

Marathi is the 19th most spoken language in the world and an official language of the state of Maharashtra. Though Marathi has a rich literature and a glorious history, no reliable, good-quality translation solution is available as of now. Even Google Translate does not provide Marathi translation services. It is observed that Marathi-speaking students struggle more in learning English than other Indian students, who, to some extent, have digital tools available to them. With the tablet and smartphone explosion and the easy availability of the Internet in India, people are using social-networking sites a lot, have started blogging and are reading news online. In such times, not having a good Marathi-English translation tool is a real inconvenience. Creating such a tool using machine translation with Apertium will not only serve this need but also benefit a large community.

Which of the published tasks are you interested in? What do you plan to do?

As there is no Marathi-English pair in Apertium so far, I am starting to work on it from scratch. There is a Marathi-Hindi bilingual dictionary in the incubator, but I don't know how complete it is. I will check whether it can help my project. My aim is to bring the Marathi-English pair to release quality.

Work plan

Coding challenge

Installation

  • I set up a new Ubuntu machine in VirtualBox for the Apertium installation.
  • firespeaker helped me install the system on my machine.
  • After installing Apertium and lttoolbox, I decided to install the en-es language pair.
  • Initially I got lots of errors and problems regarding permissions, but with firespeaker's help I succeeded in installing the system.

Getting Started

  • Then I was introduced to spectie in the IRC channel. He gave me links to the documentation on "How to start a new language pair", and to material on the basic structural and functional elements of the Apertium system and how it works.
  • spectie gave me an English document (a story) to translate into Marathi. I completed it and sent it to him.
  • spectie created a basic Marathi-English system for me, and since I was already familiar with the symbols and terminology (from reading the documentation), I understood it quickly. He also added some words to the monolingual and bilingual dictionaries and gave me a list of words to add myself. Initially I felt I was working too slowly, but after adding more words I understood the mechanism and I am comfortable with it now.
  • spectie told me how to pull, make changes and commit them. I did it successfully.

Community Bonding Period

  • I first joined the #apertium IRC channel, where I was introduced to fellow members of the community.
  • Then I joined the mailing list and created a user account on the Apertium wiki.
  • Community members on IRC (especially firespeaker) gave me good directions on what I should read and how to install the Apertium system on my machine.
  • I got a real push and encouragement from spectie, who helped me create a basic mar-eng pair and helped me through the coding challenge.
  • I have been learning a lot from the IRC channel and the available documentation.
  • I have become so positive about Apertium that even if my project gets rejected I will keep working on my Marathi-English pair, because I am working 'for' my people 'with' good people!
  • I am planning to learn as much as possible before the actual coding commences, so that I 'bleed' less in the 'war' and win it.

Week Plan

WEEK    | DATE           | PLANS
Week 01 | Jun 17-Jun 23  |
Week 02 | Jun 24-Jun 30  |
Week 03 | Jul 1-Jul 7    |
Week 04 | Jul 8-Jul 14   |
Deliverable #1
Week 05 | Jul 15-Jul 21  |
Week 06 | Jul 22-Jul 28  |
Week 07 | Jul 29-Aug 4   |
Week 08 | Aug 5-Aug 11   |
Deliverable #2
Week 09 | Aug 12-Aug 18  |
Week 10 | Aug 19-Aug 25  |
Week 11 | Aug 26-Sept 1  |
Week 12 | Sept 2-Sept 8  |
Deliverable #3

List your skills and give evidence of your qualifications

My non-Summer-of-Code plans for the Summer

I will be somewhat busy with university exams until they finish on the 10th of June, after which I have no other commitments. I will be available for about 40 hours a week for the entire project period, and I can put in extra hours on Saturdays and Sundays.