User:Surajkawade/GSOC proposal: Marathi and English

From Apertium

Revision as of 14:59, 2 May 2013

Name

Suraj Kawade

Contact information

IRC nick: develover

E-mail: suraj.kawade@gmail.com / suraj.kawade@hotmail.com

Phone no.: +918983005859 / +919404943130

Skype username: yesiamsuraj

Blog: http://develover.wordpress.com/

Why are you interested in machine translation?

As I am interested in linguistics and love programming, machine translation is a magnet for me! The world is culturally diverse, and languages are both barriers to and gateways into these cultures. I have read (on Wikipedia) that "There are between 6000 and 7000 languages currently spoken, and that between 50-90% of those will have become extinct by the year 2100". I was shocked, but I don't want to feel helpless. Though humans speak in different tongues, they express the same things. So why shouldn't I follow my curiosity and learn how these languages are related and how they differ? And why shouldn't I help people keep the gift their ancestors gave them? Everything is going digital, and fast, and so is the field of NLP. MT plays a large part in it, and I want to be a part of it, however small.

Why are you interested in the Apertium project?

The best things in the world are free (as in 'freedom')! Open Source is free and Apertium is Open Source, so, by the law of commutativity, Apertium is the best thing. If I say I do not want languages dying in front of my eyes, I should help prevent it, and that is how I found Apertium. Apertium is a community of knowledgeable, inspiring people who are really enthusiastic about a common cause; most importantly, they love what they do, and vice versa. (I figured this out while talking to them in the IRC channel.) Also, to "do" something for preserving a language with Apertium, you need relatively few resources at the beginning, which makes getting started less hectic and hence encouraging. Apertium uses rule-based translation rather than simple word-for-word dictionary lookup: its rules operate on the analysed form of each word (its lemma and grammatical tags), not just on the surface word, which brings it closer to how humans translate.

Why should Google and Apertium sponsor it?

Since nothing of release quality exists in Apertium for Marathi, I decided to work on it. Marathi is written in the Devanagari script, and Apertium has yet to release a pair containing a Devanagari-script language (most of them are in the incubator). Doing extensive work and bringing the Marathi-English pair to release quality would also encourage further development of the Devanagari-script pairs currently in the incubator.

Whom will it benefit in society, and how?

Marathi is the 19th most spoken language in the world and an official language of the state of Maharashtra. Though Marathi has a rich literature and a glorious history, there are no reliable, high-quality translation solutions available as of now. Even Google Translate does not provide Marathi translation. It has been observed that Marathi-speaking students struggle more with learning English than other Indian students, who to some extent have digital tools available to them. With the tablet and smartphone explosion and the easy availability of the Internet in India, people use social-networking sites a lot, have started blogging, and read news online. In such times, not having a good Marathi-English translation tool is a real inconvenience. Creating a machine translation tool with Apertium will not only serve this need but also benefit a large community.

Which of the published tasks are you interested in? What do you plan to do?

As there is no Marathi-English pair in Apertium so far, I am starting work on it from scratch. There is a Marathi-Hindi bilingual dictionary in the incubator, but I don't know how complete it is; I will check whether it can help my project. My interest and enthusiasm drive my goal: to bring the Marathi-English pair to release quality.

Work plan

Coding challenge

Installation

  • I set up a new Ubuntu virtual machine in VirtualBox for the Apertium installation.
  • firespeaker helped me install the system on my machine.
  • After installing Apertium and lttoolbox, I decided to install the en-es language pair.
  • Initially I got lots of errors and permission problems, but with firespeaker's help I succeeded in installing the system.

Getting Started

  • Then I was introduced to spectie in the IRC channel. He gave me links to the documentation on "How to start a new language pair", as well as links for studying the basic structural and functional elements of the Apertium system and how it works.
  • spectie gave me a document (a story) in English to translate into Marathi. I completed it and sent it to him.
  • spectie created a basic Marathi-English system for me, and since I was already familiar with the symbols and terminology from reading the documentation, I understood it quickly. He also added some words to the monolingual and bilingual dictionaries and gave me a list of words to add myself. Initially I felt I was going too slowly, but after adding more words I got the hang of the mechanism, and I am comfortable with it now.
  • spectie showed me how to pull, make changes, and commit them. I did it successfully.
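To make "adding words to the dictionaries" concrete: Apertium dictionaries are XML `.dix` files. A monolingual entry links a lemma's stem to an inflection paradigm (`<e lm="..."><i>stem</i><par n="..."/></e>`), and a bilingual entry pairs a left-side (here Marathi) lemma with its right-side (English) translation plus grammatical tags (`<e><p><l>...</l><r>...</r></p></e>`). The Python sketch below only illustrates the shape of those entries under stated assumptions: the word घर ("house"), the `n` tag usage and the paradigm name `घर__n` are hypothetical examples, not entries taken from the actual Marathi-English dictionaries, and in practice entries are normally typed directly into the `.dix` files.

```python
import xml.etree.ElementTree as ET


def _side(tag, lemma, tags):
    """Build an <l> or <r> element: the lemma text followed by <s n="..."/> tags."""
    side = ET.Element(tag)
    side.text = lemma
    for t in tags:
        ET.SubElement(side, "s", n=t)
    return side


def bidix_entry(mr_lemma, en_lemma, tags):
    """Bilingual entry: Marathi lemma on the left, English lemma on the right.

    Both sides carry the same tags in this simplified sketch.
    """
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    p.append(_side("l", mr_lemma, tags))
    p.append(_side("r", en_lemma, tags))
    return e


def monodix_entry(lemma, stem, paradigm):
    """Monolingual entry: a stem attached to a (hypothetical) inflection paradigm."""
    e = ET.Element("e", lm=lemma)
    i = ET.SubElement(e, "i")
    i.text = stem
    ET.SubElement(e, "par", n=paradigm)
    return e


if __name__ == "__main__":
    # Hypothetical example word: घर ("house"), tagged as a noun.
    print(ET.tostring(bidix_entry("घर", "house", ["n"]), encoding="unicode"))
    # <e><p><l>घर<s n="n" /></l><r>house<s n="n" /></r></p></e>
    print(ET.tostring(monodix_entry("घर", "घर", "घर__n"), encoding="unicode"))
    # <e lm="घर"><i>घर</i><par n="घर__n" /></e>
```

ElementTree writes empty elements as `<s n="n" />` with a space before the slash; Apertium's XML parsing treats this the same as `<s n="n"/>`.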

Community Bonding Period

Week 1

Week 2

Week 3

Week 4

Week 5

Week 6

Week 7

Week 8

Week 9

Week 10

Week 11

Week 12

List your skills and give evidence of your qualifications

My non-Summer-of-Code plans for the Summer