User:Rupjyoti/Proposal

From Apertium
Revision as of 16:58, 7 April 2019 by Rupjyoti (talk | contribs) (Improving the Assamese-Hindi language pair)

Contact Information

Name: Rupjyoti Baruah
E-mail address: rupjyotibaruah.rs.cse18@itbhu.ac.in
Alternate email: rupjyotirgu06@gmail.com
Mobile Number: +91 9101151044
IRC nick: Rupjyoti
LinkedIn: www.linkedin.com/in/rupjyoti-baruah/
Timezone: UTC+5:30


Why is it that you are interested in Apertium?

The Apertium community is strongly committed to under-resourced languages, and I am keen to help improve open source machine translation with Apertium. Apertium has a very noble goal: bringing low-resource languages to life by linking them, through machine translation, with high-resource languages.


Which of the published tasks are you interested in? What do you plan to do?

I am interested in improving the published Assamese-Hindi language pair. If we improve the morphological disambiguation, add several thousand words to the dictionaries, introduce lexical selection rules and create some more transfer rules, a low Word Error Rate (WER) can be reached.
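As a rough illustration of how WER could be tracked while the pair is improved, the sketch below computes the word-level edit distance between a reference translation and the system output, divided by the number of reference words. The function name `word_error_rate` and the placeholder tokens are hypothetical and are not part of the Apertium tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (one fewer than in curr) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1
            insertion = curr[j - 1] + 1
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder tokens: one substitution and one deletion over 4 words.
    print(word_error_rate("w1 w2 w3 w4", "w1 x2 w3"))  # 0.5
```

In practice the reference would be a hand-corrected Hindi translation and the hypothesis the output of the as-hi pair for the same Assamese sentence.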


Why should Google and Apertium sponsor it?

Apertium is a shallow-transfer machine translation platform. I have seen that the pair at https://github.com/apertium/apertium-as-hi is still in the incubator list and was last updated 7-8 years ago. Sponsorship from Google and Apertium would motivate me to work on my GSoC project and to improve the pair substantially.

Benefit to society: Assamese (Asamiya) is spoken mainly in the Indian state of Assam, where it is an official language; it has 15 million speakers and serves as a lingua franca in the region. Hindi, on the other hand, is one of the official languages of India. According to the Census, the number of Hindi speakers in Assam has been increasing steadily, and Hindi speakers top the list with the highest percentage, 43.63. As a linguistic variety, Hindi is the fourth most-spoken first language in the world. These Hindi speakers in Assam therefore speak Assamese as a second language.


Status of the released language pair:

  • Monolingual dictionaries: apertium-as-hi.hi.dix and apertium-as-hi.as.dix
  • Bilingual dictionary: apertium-as-hi.as-hi.dix
  • Transfer files: apertium-as-hi.as-hi.t1x

I have found that apertium-as-hi.hi-as.t1x (the Hindi to Assamese transfer rules) is absent.
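To get a first impression of how much each dictionary currently covers, the entries could be counted per paradigm. The sketch below is only an illustration: it assumes the usual lttoolbox layout in which entries are `<e>` elements inside `<section>`, and the sample XML, the paradigm name `example__n` and the function name are hypothetical rather than taken from the as-hi files.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical miniature dictionary, not copied from apertium-as-hi.
SAMPLE_DIX = """<dictionary>
  <pardefs>
    <pardef n="example__n">
      <e><p><l></l><r><s n="n"/></r></p></e>
    </pardef>
  </pardefs>
  <section id="main" type="standard">
    <e lm="word1"><i>word1</i><par n="example__n"/></e>
    <e lm="word2"><i>word2</i><par n="example__n"/></e>
  </section>
</dictionary>"""


def count_entries_by_paradigm(dix_xml: str) -> Counter:
    """Count <e> entries in every <section>, grouped by their paradigm."""
    root = ET.fromstring(dix_xml)
    counts = Counter()
    # Only entries directly under <section> are lexicon entries;
    # <e> elements inside <pardef> describe paradigms and are skipped.
    for entry in root.iterfind("./section/e"):
        par = entry.find("par")
        name = par.get("n") if par is not None else "(no paradigm)"
        counts[name] += 1
    return counts


if __name__ == "__main__":
    for paradigm, total in count_entries_by_paradigm(SAMPLE_DIX).items():
        print(paradigm, total)
```

For the real files, the XML text would be read from apertium-as-hi.as.dix or apertium-as-hi.hi.dix instead of the sample string.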


Work plan (can be updated as the project progresses)

  • Week 1: Understanding the deficiencies in the currently available monolingual and bilingual dictionaries and transfer rules.
  • Week 2: Continuing the Week 1 analysis.
  • Week 3: Expanding the currently available Assamese monolingual dictionary (if there is scope), listing the forms the current dictionaries produce with
                            lt-expand apertium-as-hi.hi.dix
                            lt-expand apertium-as-hi.as.dix
  • Week 4: Expanding the currently available Hindi monolingual dictionary (if there is scope). Making a list of correctly spelled words.
  • Deliverable #1: Submit the modified Assamese and Hindi monolingual dictionaries.
  • Week 5: Expanding the currently available bilingual dictionary.
  • Week 6: Expanding the currently available bilingual dictionary (reformatting as proposed by the mentors).
  • Week 7: Adding multiword expressions to the Assamese and Hindi monolingual dictionaries and to the bilingual dictionary.
  • Week 8: Correcting the current transfer rules.
  • Deliverable #2: Submit the POS-tagged Hindi to Assamese wordlist and the final monolingual dictionaries.
  • Week 9: Correcting the current transfer rules.
  • Week 10: Extending the transfer files with correct and efficient rules.
  • Week 11: Detecting errors and correcting them.
  • Week 12: Detecting errors and correcting them.
  • Project completed

List your skills and give evidence of your qualifications. Tell us what is your current field of study, major, etc. Convince us that you can do the work:
I am a first-year PhD student in the Department of Computer Science and Engineering at IIT BHU, Uttar Pradesh, India. My research area is Natural Language Processing (NLP). I completed AMIETE (CSE) in 2005 and M.Tech (CSE) in 2008. Alongside my PhD coursework, I have studied the Apertium engine, which has made me more interested in the linguistic features of a language and their computation. Assamese is my mother tongue. I also know Bengali (rws), Hindi (rws) and English (rws) quite well. As soon as I complete my semester examination in April 2019, I am committed to putting in 40 or more hours a week for the duration of the project. I am proficient in Python, XML, C++, Java and Prolog. This is my first open source software proposal, and I will definitely complete the task under the guidance of the mentors.