User:Agneet42/proposal
Contact Info
Name: Agneet Chatterjee
E-mail: agneet257@gmail.com
IRC: agneet42
Location: India
Timezone: UTC+05:30
Why is it that you are interested in machine translation?
"Because language plays such a fundamental part in connecting each of us as thinking creatures with the world around us, the subtle nuances of language (which are different even in similar tongues, say the Latin-derived Spanish and Portuguese) actually shape how we think about the world. Learning something of how somebody else speaks from a foreign country actually helps you to understand their mindset a little." I am interested in machine translation for two main reasons. First, I believe that in this age of information exchange, one of the biggest challenges is sharing and understanding knowledge across languages. This is where machine translation comes into the picture, and it interests me because it serves exactly this unifying purpose. Second, I have a deep-rooted interest and prior experience in Natural Language Processing, and I hope to make a difference in the field of machine translation.
Why is it that you are interested in the Apertium project?
Apertium is a free/open-source machine translation platform, which means that developers from all over the world can join and work on new language pairs to improve translation. Apertium is built as a Unix pipeline of separate modules. This makes quick diagnosis and debugging easy and lets me insert additional modules between existing ones, for example using HFST (Helsinki Finite-State Technology) for morphological analysis. Furthermore, Apertium follows the rule-based machine translation (RBMT) approach, which does not require parallel (bilingual) corpora. This makes it possible to build translation systems for languages that have no texts in common, or even no digitized data at all. RBMT is also largely domain-independent: rules are usually written without a particular domain in mind, so the vast majority of them "just work" in every domain, and only a few domain-specific cases may need dedicated rules.
Which of the published tasks are you interested in?
Adopting the Hindi<->Bengali language pair.
Why should Google and Apertium sponsor it?
First, Hindi and Bengali are the 4th and 7th most spoken languages in the world, with roughly 295 million and 200 million speakers respectively. Moreover, speakers of these languages are spread across the globe. A Hindi-Bengali translation system would not only help individual speakers but also facilitate business between the regions where these languages are spoken.
Currently, there is no dedicated platform for machine translation between these two languages; the main option is Google Translate, which has its own limitations:
- It is not available offline, and is therefore less accessible.
- It is not open source, so not everybody can contribute.
Apertium does not have these limitations, which is what makes it a suitable development ground for this (or any other) language pair. Furthermore, a Hindi-Bengali pair would make it easier to build pairs between Hindi and languages closely related to Bengali, such as Hindi-Assamese and Hindi-Oriya.