User:Surajkawade/GSOC proposal: Marathi and English
Contents
- 1 Name
- 2 Contact information
- 3 Why are you interested in machine translation?
- 4 Why are you interested in the Apertium project?
- 5 Why Google and Apertium should sponsor it?
- 6 How and who it will benefit in society?
- 7 Which of the published tasks are you interested in? What do you plan to do?
- 8 Work plan
- 9 List your skills and give evidence of your qualifications
- 10 My non-Summer-of-Code plans for the Summer
Name
Suraj Kawade
Contact information
IRC nick : develover
E-mail : suraj.kawade@gmail.com / suraj.kawade@hotmail.com
Phone no : +918983005859 / +919404943130
Skype username : yesiamsuraj
Why are you interested in machine translation?
I am interested in linguistics and I love programming, so machine translation is a magnet for me! The world is culturally diverse, and languages are both barriers and gateways to its cultures. I have read (on Wikipedia) that "There are between 6000 and 7000 languages currently spoken, and that between 50-90% of those will have become extinct by the year 2100". I was shocked, but I don't want to feel helpless. Though humans speak in different tongues, they express the same things! So why shouldn't I follow my curiosity to learn how these languages are related and how they differ, and help society not to lose the gift its ancestors gave it? Everything is going digital and moving fast, and so is the field of NLP. MT plays a large part in it, and I want to be a part of it, however small.
Why are you interested in the Apertium project?
The best things in the world are free (as in 'freedom')! Open source is free and Apertium is open source, so by the law of transitivity Apertium is one of the best things. If I say I do not want languages dying in front of my eyes, I should help to prevent it, and that is how I found Apertium. I think Apertium is a community of knowledgeable, inspiring people who are really enthusiastic about a common cause and, most importantly, love what they do. (I figured this out while talking to them in the IRC channel.) Just as importantly, to "do" something to preserve a language with Apertium you need few resources at the beginning, which makes the work less hectic and hence encouraging. Apertium uses rule-based translation rather than plain dictionary lookup, so it works with the grammatical analysis of words and not just their surface forms, which brings it closer to how humans translate.
Why Google and Apertium should sponsor it?
On learning that nothing of release quality has been done in Apertium for Marathi, I decided to work on it. Marathi is written in the Devanagari script, and Apertium has yet to release a pair containing a Devanagari-script language (most of them are in the incubator). Doing extensive work and bringing the Marathi-English pair to release quality will also encourage work on the other Devanagari-script languages in the incubator.
How and who it will benefit in society?
Marathi is the 19th most spoken language in the world and an official language of the state of Maharashtra. Though Marathi has a rich literature and a glorious history, there is no reliable, good-quality translation solution available as of now. Even Google Translate does not provide Marathi translation. It is observed that Marathi-speaking students struggle more in learning English than other Indian students, who, to some extent, have digital tools available to them. Due to the tablet and smartphone explosion and the easy availability of the Internet in India, people are using social-networking sites a lot, have started blogging and are reading news online. In such times, not having a good Marathi-English translation tool is a real inconvenience. Creating such a tool using machine translation with Apertium will not only serve this need but also benefit a large community.
Which of the published tasks are you interested in? What do you plan to do?
As there is no Marathi-English pair in Apertium so far, I am starting to work on it from scratch. There is a Marathi-Hindi bilingual dictionary in the incubator, but I don't know how complete it is. I will check whether it can help me in my project. With my interest and enthusiasm, I am going to try to bring the Marathi-English pair to release quality.
Work plan
Coding challenge
Installation
- I set up a new Ubuntu machine in VirtualBox for the Apertium installation.
- firespeaker helped me to install the system on my machine.
- After installing Apertium and lttoolbox, I decided to install the en-es language pair.
- Initially I got lots of errors and problems regarding permissions, but with the help of firespeaker I succeeded in installing the system.
Getting Started
- Then I was introduced to spectie in the IRC channel. He gave me links to the documentation on "How to start a new language pair". He also gave me links for studying the basic structural and functional elements of the Apertium system and how it works.
- spectie gave me a document (a story) in English to translate into Marathi. I completed it and sent it to him.
- spectie created a basic Marathi-English system for me, and since I was familiar with the symbols and terminology (through reading the documentation), I understood it quickly. He also added some words to the monolingual and bilingual dictionaries and gave me a list of words to add by myself. Initially I felt I was doing it too slowly, but after adding more words I understood the mechanism and I am comfortable with it now.
- spectie told me how to pull, make changes and commit the changes. I did it successfully.
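To illustrate the kind of dictionary work described above, here is a minimal Python sketch of what one bilingual dictionary entry looks like. It assumes the usual Apertium bilingual dictionary layout, where an `<e>` element holds a `<p>` pair with a left (`<l>`) and right (`<r>`) side and `<s n="..."/>` symbols for tags. The helper names `make_bidix_entry` and `render`, the example word pair and the single noun tag `n` are chosen only for illustration; they are not taken from the actual Marathi-English dictionaries, and multi-word lemmas are not handled.

```python
import xml.etree.ElementTree as ET


def make_bidix_entry(left, right, tags):
    """Build one <e> element pairing a left-side lemma with a right-side lemma.

    Hypothetical helper: both sides receive the same list of tag symbols.
    """
    e = ET.Element("e")
    p = ET.SubElement(e, "p")
    for side, lemma in (("l", left), ("r", right)):
        node = ET.SubElement(p, side)
        node.text = lemma
        # Each tag becomes an empty <s n="..."/> symbol after the lemma.
        for tag in tags:
            ET.SubElement(node, "s", {"n": tag})
    return e


def render(entry):
    """Serialise an entry to a string for pasting into a dictionary file."""
    return ET.tostring(entry, encoding="unicode")


# Illustrative pair: the Marathi noun "घर" and the English noun "house".
print(render(make_bidix_entry("घर", "house", ["n"])))
```

In practice entries like this are usually typed by hand in the dictionary file; generating them from a word list can save time when a long list of words needs to be added, but the tags for each word still have to be checked manually.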