Difference between revisions of "User:Agneet42/proposal"




Revision as of 12:36, 2 April 2017


Contact Info

Name: Agneet Chatterjee

E-mail: agneet257@gmail.com

IRC: agneet42

Location: India

Timezone: UTC+05:30

Why is it you are interested in machine translation?

"Because language plays such a fundamental part in connecting each of us as thinking creatures with the world around us, the subtle nuances of language (which are different even in similar tongues, say the Latin-derived Spanish and Portuguese) actually shape how we think about the world. Learning something of how somebody else speaks from a foreign country actually helps you to understand their mindset a little." I am interested in machine translation for two main reasons. First, I believe that in this era of information exchange, one of the biggest challenges is sharing and understanding knowledge across different languages. Machine translation addresses exactly this challenge, and it interests me because it works toward that unifying purpose. Second, I have a deep-rooted interest and prior experience in natural language processing, and I hope to make a difference in the field of machine translation.

Why is it that you are interested in the Apertium project?

Apertium is a free/open-source machine translation platform, which means that developers from all over the world can join and work on new language pairs to improve translation. Apertium is built as a Unix pipeline of modules, which is very useful for quick diagnosis and debugging and lets me insert additional modules between existing ones, for example using HFST (Helsinki Finite-State Technology) for morphological analysis. Furthermore, Apertium uses rule-based machine translation (RBMT), which requires no bilingual texts. This makes it possible to create translation systems for languages that have no parallel texts, or even no digitized data whatsoever. RBMT is also largely domain independent: rules are usually written in a domain-independent manner, so the vast majority of rules will "just work" in every domain, and only a few specific cases per domain may need rules written for them.

Which of the published tasks are you interested in?

Adopting the Hindi<->Bengali language pair.

Why should Google and Apertium sponsor it?

Firstly, Hindi and Bengali are the 4th and 7th most spoken languages in the world, with roughly 295 million and 200 million speakers respectively.