User:Amanmehta/Application


Contact details


Name: Aman Mehta
E-mail: amanmehta1997@gmail.com
Svn: aman-mehta
IRC-nick: amanmehta
Mobile: +91 8329139961
Timezone: UTC+05:30
Github link: https://github.com/amanmehta-maniac
I stay online on IRC most of the time so that I am easily reachable.

Interest in machine translation

I am passionate about computers, and the automation of tasks such as translation fascinates me. What catches my interest is the core problem: translating a text from one language to another cannot be solved by simply substituting words. The idea of building a translation system and automating translation intrigues me. Because MT gives people the opportunity to access knowledge in multiple languages, it can play a pivotal role in the mission of education for all, not only in India but across the globe. It upholds the idea that knowledge should be free and accessible to all, which I strongly believe in too. Working on and around machine translation serves both my interest and my motivation.

Published tasks of interest and project goals

I plan to “Adopt an unreleased language pair”, or to be precise, three language pairs: mar-hin, guj-hin and mar-guj. The mar-hin and guj-hin pairs are in the incubator, and the mar-guj pair is still unreleased. My goal is to bring both the incubator pair mar-hin and the unreleased pair mar-guj to release quality. I also plan to expand the dictionaries for the guj-hin pair and to improve its coverage and WER (word error rate) as far as possible.

Interest in Apertium

Given my interest in machine translation, I decided to contribute to Apertium, and I enjoy doing so. I developed my interest in the Apertium project over the last couple of months, during which I spent my time resolving a few svn bugs and improving the mar-hin pair. Apertium is, at present, one of the best open-source machine translation platforms. Spending my summer working on this platform would give me the opportunity to contribute to an area that fascinates me.

Reasons for Google and Apertium to sponsor

The mar-hin and mar-guj pairs can be brought to production quality without much effort because of their lexical similarities. I am well acquainted with Apertium and with the language pairs I am proposing to work on. The odds of finding a polyglot who could add these pairs to Apertium in a single summer are probably low. If successful, this would add a couple more language pairs to Apertium, which would triple the number of Indian language pairs. The release of these pairs could also help Apertium expand to pairs for many other Indian languages. It has been ~2 months since I joined Apertium, and I am very familiar with it. I have fixed quite a few bugs on svn. I have been working on the mar-hin pair for around a month and have successfully added coverage for adverbs and adjectives by scraping <avy> tags, so I have a good grasp of what is needed to bring this pair to release quality. For detailed information about the tasks I have completed, refer to the section “Tasks completed till date”.

Who it will benefit in society and how

Who?

Over 70 million Marathi speakers
Over 50 million Gujarati speakers
People living in a state where their native language is not the local one, e.g. a Gujarati speaker in Maharashtra (like myself)

How?

A translator available to help learn languages
Access to Hindi information
Hindi media/newspapers
Improved coverage of Hindi books to Marathi and Gujarati and vice versa
Eventually helping people of different native languages to share space and reduce the communication gap