User:Rupjyoti/Proposal2

Revision as of 11:39, 8 April 2019 by Rupjyoti

Contact Information
Name: Rupjyoti Baruah
E-mail address: rupjyotibaruah.rs.cse18@itbhu.ac.in
Alternate email: rupjyotirgu06@gmail.com
Mobile Number: +91 9101151044
IRC nick: rupjyoti
LinkedIn: https://www.linkedin.com/in/rupjyoti-baruah/
Timezone: UTC+5:30


Why is it that you are interested in Apertium? Apertium is a free/open-source shallow-transfer machine translation system. The Apertium community is strongly committed to under-resourced languages. I want to help improve open-source language translation with Apertium. Apertium has a noble goal: bringing low-resource languages to life by linking them, through machine translation, with high-resource languages.


Which of the published tasks are you interested in? What do you plan to do? I am interested in improving the published Bengali-Hindi language pair. If we improve the morphological disambiguation, add several thousand words to the dictionaries, introduce lexical selection rules and create more transfer rules, a low Word Error Rate (WER) can be reached.
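Since WER is the target metric, a minimal sketch of how it could be measured may help. It assumes the usual definition: the word-level edit distance (substitutions, insertions and deletions) between a reference translation and the system output, divided by the number of words in the reference. The function name `word_error_rate` and the sample sentences are illustrative only and are not part of any Apertium tool.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical sentences, used only to exercise the function.
    print(word_error_rate("a b c d", "a x c"))
```

In practice the reference would be a post-edited or human translation of the same test text, so that the WER can be compared before and after each stage of the work.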


Why should Google and Apertium sponsor it?

Apertium is a shallow-transfer machine translation system. The pair at https://github.com/apertium/apertium-bn-hi is still in the incubator list and was last updated 9 years ago. Sponsorship from Google and Apertium would motivate me to work on my GSoC project and substantially improve the pair.

Benefit to society: Bengali, also known by its endonym Bangla (বাংলা), is an Indo-Aryan language primarily spoken by the Bengalis in South Asia. It is the official and most widely spoken language of Bangladesh and the second most widely spoken of the 22 scheduled languages of India, behind Hindi.

On the other hand, Hindi, written in the Devanagari script, is one of the official languages of India, along with English. It is also one of the 22 scheduled languages of the Republic of India.

Status of the released language pair:

monolingual dictionaries: Absent
bilingual dictionaries: apertium-as-hi.bn-hi.dix
transfer files: Absent


Work plan (can be updated during progress)

  • Week 1: Understanding the deficiencies in the currently available bilingual dictionaries.
  • Week 2: Continuing the analysis of the bilingual dictionaries.
  • Week 3: Collecting data for the Bengali monolingual dictionary.
  • Week 4: Collecting data for the Hindi monolingual dictionary.
  • Deliverable #1: Submit the Bengali and Hindi monolingual dictionaries.
  • Week 5: Expanding the currently available bilingual dictionary.
  • Week 6: Expanding the currently available bilingual dictionary (reformatting as proposed by mentors).
  • Week 7: Adding multiword expressions to the Bengali and Hindi monolingual dictionaries and to the bilingual dictionary.
  • Week 8: Collecting the transfer rules.
  • Deliverable #2: Submit the POS-tagged Hindi-to-Bengali and Bengali-to-Hindi wordlists and the final monolingual dictionaries.
  • Week 9: Uploading the transfer rules.
  • Week 10: Refining the transfer files for correctness and efficiency.
  • Week 11: Detecting errors and correcting them.
  • Week 12: Detecting errors and correcting them.
  • Project completed.

List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work:
I am a first-year PhD student in the Department of Computer Science and Engineering at IIT BHU, Uttar Pradesh, India. My research area is Natural Language Processing (NLP). I completed AMIETE (CSE) in 2005 and M.Tech (CSE) in 2008. Alongside my PhD coursework, I have studied the Apertium engine, which has deepened my interest in the linguistic features of languages and their computation. Assamese is my mother tongue. I also know Bengali (rws), Hindi (rws) and English (rws) quite well. Once I complete my semester examinations in April 2019, I am committed to putting in at least 40 hours a week for the duration of the project. I am proficient in Python, XML, C++, Java and Prolog. This is my first project of this kind, and I will complete the task under the guidance of the mentors.