User:Sambit/GSoC proposal 2017: Odia and English

Revision as of 17:28, 1 April 2017 by Sambit

Name

Sambit Mallick

Contact information

IRC nick : sambit

E-mail : sambit95@gmail.com / sambit.mallick@iitg.ac.in

SourceForge : sambit95

Location : India

Time Zone : UTC/GMT +5:30

Why am I interested in machine translation?

I've always wondered how Google Translate works. After finding out about Apertium through GSoC, I read many well-written articles on the wiki about how it works. Since then I have learned about the different uses of machine translation, the types of rule-based MT, and how they work. This is what developed my interest in MT.

Why is it that you are interested in Apertium?

Having developed an interest in MT as described above, I can't think of a better platform to work on than Apertium. It is open source and is a shallow-transfer machine translation system. Apart from that, the mentors are very friendly and supportive.

Which of the published tasks am I interested in? What do I plan to do?

I'm interested in adopting an unreleased language pair (Odia-English). As there is no Odia-English pair in Apertium, I have to work on the Odia monodix first.

Reasons why Google and Apertium should sponsor it?

There is no reliable Odia-English translator available on the Internet.

How and who will it benefit in society?

Odia is the predominant language of the Indian state of Odisha and one of the many official languages of India. It is the sixth Indian language to be designated a Classical Language in India, on the basis of having a long literary history and not having borrowed extensively from other languages. Out of 40 million native speakers, many don't know English. As there is no reliable translation available, even in Google Translate, things are difficult for people who don't understand English or struggle to learn it. As English is the international language, it's widely used everywhere, e.g. on social media and the Internet. Through Apertium, these problems can be solved using machine translation. This will benefit a large community, and non-native speakers will get to know Odia literature.

Work Plan

Coding challenge

  • I've already installed the prerequisites for Ubuntu.
  • Bootstrapped a new language pair (odi.eng) with the existing eng monodix.
  • Added some words to the odi monodix to work on the story.
  • Currently working on transfer rules and reading the PDFs to get more familiar with Apertium.
  • Getting some errors, but trying to resolve them by asking the mentors on IRC.

Thanks to Unhammer, TinoDidriksen and spectie, I've come this far!

Post Application

I'll try to keep working on the story, but I can't give it much time because of end-semester exams at the end of April.

Community Bonding Period

  • Get to know the mentors and discuss the plans properly.
  • Learn more about how to build a very large monodix easily and effectively.

Week Plan

Week | Task | Comment
1 | Implementation of the Odia monodix. | As it's a new language pair, it will take a lot of time to build up the Odia monodix.
2 | Continue to work on the monodix. |
3 | Continue to work on the monodix. |
4 | Continue to work on the monodix and start working on an Odia-English bilingual dictionary. |
Deliverable #1 | A monolingual dictionary containing at least 3000 words. | Adding proper words to the Odia monodix is difficult, as I found while working on it before the application, but I'll try my best to add more.
5 | Continue adding words to the monolingual dictionary. Continue adding words to the bilingual dictionary. | NA
6 | Continue adding words to the bilingual dictionary. | NA
7 | Implementation of disambiguation rules for Odia. | NA
8 | Implementation of transfer rules for Odia->English. | NA
Deliverable #2 | A monolingual dictionary containing at least 7000 words and a bilingual dictionary containing more than 10000 words. | NA
9 | Complete the implementation of the disambiguation and transfer rules and the design of the constraint grammar. | NA
10 | testvoc | NA
11 | testvoc | NA
12 | Wrap up testvoc, clean-up, result evaluation and completion of documentation. | NA
Deliverable #3 | Completion of the project. | NA
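
Progress against the monodix size targets in the deliverables above could be tracked with a small script. The following is only a minimal sketch: it assumes the monodix follows the usual Apertium layout, with `<e>` entries directly inside `<section>` elements. The names `count_entries`, `remaining`, `MONODIX_TARGETS` and the tiny English sample dictionary are hypothetical and used purely for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical, tiny monodix used only to illustrate the counting logic.
SAMPLE_DIX = """<dictionary>
  <alphabet>abcdefghijklmnopqrstuvwxyz</alphabet>
  <sdefs>
    <sdef n="n"/>
    <sdef n="sg"/>
    <sdef n="pl"/>
  </sdefs>
  <pardefs>
    <pardef n="house__n">
      <e><p><l></l><r><s n="n"/><s n="sg"/></r></p></e>
      <e><p><l>s</l><r><s n="n"/><s n="pl"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="house"><i>house</i><par n="house__n"/></e>
    <e lm="book"><i>book</i><par n="house__n"/></e>
  </section>
</dictionary>
"""

# Monodix size targets taken from the week plan.
MONODIX_TARGETS = {"Deliverable #1": 3000, "Deliverable #2": 7000}


def count_entries(dix_xml):
    """Count <e> entries that sit directly inside a <section>.

    Entries inside <pardef> (paradigm definitions) are not counted,
    because they describe inflection patterns, not dictionary words.
    """
    root = ET.fromstring(dix_xml)
    return len(root.findall("./section/e"))


def remaining(dix_xml, target):
    """Return how many entries are still missing to reach the target."""
    return max(0, target - count_entries(dix_xml))


if __name__ == "__main__":
    n = count_entries(SAMPLE_DIX)
    for name, target in MONODIX_TARGETS.items():
        print(f"{name}: {n}/{target} entries, {remaining(SAMPLE_DIX, target)} to go")
```

Note that this counts entries, not distinct lemmas, so a word listed under several paradigms would be counted more than once. For a real dictionary file, the contents could be read from disk and passed to `count_entries` in the same way.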

Skills & Qualifications

Currently, I'm a pre-final year student of Electronics and Communication Engineering at IIT Guwahati. Though I'm new to MT, I code regularly and am interested in machine learning. I am comfortable with C and Python. Besides, for this project, I'm familiar with Linux, XML, Odia (my mother tongue) and English. I don't have any previous experience with open source, although I've spent quite a bit of time learning about Apertium.

My non-Summer-of-Code plans for the Summer

Apart from GSoC, I have no plans for the summer other than preparing for placements. Hence I'll be able to give 30+ hours a week to the project.