Difference between revisions of "Category:GSoC 2019 student proposals"

From Apertium
Line 1: Line 1:
 
[[Category:Morphological Analyzer of Magahi]]
 
[[Category:Morphological Analyzer of Magahi]]
 
Apertium GSOC 2019
 
Apertium GSOC 2019
Morphological Analyzer of Magahi
+
Morphological Analyzer of Braj
  +
  +
 
Contact Information
 
Contact Information
Name: Mohit Raj
 
E-mail address: mohiitraj@gmail.com
 
Mobile Number : +91 9304843938(India)
 
IRC Nick: mohitraj
 
Github: Mohit-Raj123
 
Timezone: UTC +5.30
 
   
  +
Name – Neerav Mathur
Why is it that you are interested in Apertium and Machine Translation ?
 
  +
Location – Agra, Uttar Pradesh, India -282001
I belong to India, which has 22 scheduled languages and almost 100 non-scheduled languages. That count is obtained by subsuming several distinct languages as ‘dialects’ of some of the majority languages; languages with fewer than 10,000 speakers are not even recognised and are put under a category called ‘others’. I like the concept of Apertium as an open-source language translator; it is helpful for students, organizations, and anyone else who wants to work in the field of machine translation. I am interested in Apertium because it offers not only machine translation but also free resources that can be used for other purposes, e.g. dictionaries, morphological analysers, and spell checkers. I am a student of Linguistics, and my areas of interest are machine translation and natural language processing. I have completed courses on XML, Python programming, Language Technologies, and Machine Translation. I have worked on the development of a parser for Magahi, in collaboration with my classmate Neerav Mathur, for course projects. I took part in the following workshops:
 
 
E-mail nmathur54@gmail.com
  +
Mobile no. - +919719009548
  +
Github - https://github.com/ommathur54
 
Time Zone - UTC +5.30
   
  +
Skills And Experience
1. 9th IASNLP-2018: IIIT-Hyderabad Advanced School on Natural Language Processing
 
2. SOIL-Tech: Towards Digital India at JNU, New Delhi
 
3. Hands on workshop on Statistical Machine Translation with Moses at K.M.I, Agra
 
   
  +
My name is Neerav Mathur. I am a fourth-semester postgraduate student at K.M. Institute of Hindi and Linguistics, Dr. Bhimrao Ambedkar University, Agra, Uttar Pradesh, India. I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation, and XML. As part of my previous semester's course project, I trained and tested MaltParser for the Magahi language. To do this, my classmate Mohit and I also developed a small treebank for the language. In the current semester, we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on the development of a machine translation system for the English-Magahi language pair. As you may notice, I am more generally interested in developing resources and technologies for under-resourced Indian languages. As part of this, I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region of Agra, Mathura, Aligarh, Bharatpur, etc. in Uttar Pradesh and Rajasthan, India) using the Apertium platform. I am also working on a digital dictionary for the Sema language. This work has increased my interest in NLP, machine learning, etc.
I want to work on marginalised languages because developing tools and technology for them would help preserve the language heritage of India and the world.
 
   
  +
University Courses:
My Proposal
 
  +
Title
 
  +
Programming (Python, XML, HTML)
Morphological analyzer of Magahi
 
  +
Computer Tools for Linguistics Research
Why should Google and Apertium sponsor it ?
 
  +
Linguistics Courses (Phonetics, Morphology, Syntax, Semantics, Field work, Sign Language, Machine Translation)
Many language pairs are available in Apertium, but many are still to be developed. One of those is the Eng-Magahi pair for machine translation. In machine translation, a morphological analyzer plays an important role in improving the system's performance for a morphologically rich language like Magahi. So I am interested in developing a morphological analyzer for Magahi.
 
  +
Theories of Machine Translation and Machine Translation (practical)
How and who it will benefit in society ?
 
  +
There are almost 12.7 million native speakers of Magahi according to the 2011 census, with additional speakers counted under Hindi. An Eng-Magahi MT system would be quite helpful for Magahi users, and it would also play an important role in preserving this marginalised language.
 
  +
Work Plan
 
  +
Technical Skills:
Week1
 
  +
Preparing linguistic rule for Morphological analyzer
 
  +
Programming Language – Python.
Week2
 
  +
Web Design – HTML .
Continue....
 
  +
Databases – MySQL.
Week3
 
  +
Project and Experience :
Tokenizing the data
 
  +
Languages – Hindi (Native), Braj, English.
Week4
 
  +
Preparing the tagset
 
  +
Interest In Machine Translation ?
  +
  +
I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation, and XML. As part of my previous semester's course project, I trained and tested MaltParser for the Magahi language. To do this, my classmate and I also developed a small treebank for the language. In the current semester, we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on the development of a machine translation system for the English-Magahi language pair. As you may notice, I am more generally interested in developing resources and technologies for under-resourced Indian languages. During my fourth semester I attended a hands-on workshop on machine translation, where I learnt about MT systems (Apertium) for under-resourced languages and how a morphological analyzer helps increase the performance of MT in a rule-based system. That is where my interest in MT grew.
  +
  +
  +
Interest In Apertium ?
  +
  +
This organization works on things that are very interesting to me as a linguist and computational linguist: (rule-based) machine translation, languages, NLP, and so on. I became more interested in Apertium when I learnt at the MT workshop that Apertium works with all kinds of languages, which is very important for supporting all languages.
  +
Also, the Apertium community is very friendly and helpful to new members; members are always ready to help (the community regularly replies to whatever queries we have). This encourages me to work with Apertium.
  +
  +
Task And Plan ?
  +
Task - I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region of Agra, Mathura, Aligarh, Bharatpur, etc. in Uttar Pradesh and Rajasthan, India) using the Apertium platform.
  +
  +
Reason Why Apertium And Google Sponsor Me ?
  +
Braj is one of the most under-resourced Indian languages. It is spoken in the Braj region of Agra, Mathura, Aligarh, Bharatpur, etc. in Uttar Pradesh and Rajasthan, India, by 1,556,314 native speakers (according to the 2011 census). There is no Braj translator available online or offline, and there is no rule-based translator with a morphological analyzer. So I believe we can improve the quality of translation by applying a rule-based model (Apertium).
  +
  +
Description Of How And Who It Will Benefit In Society ?
  +
Firstly, morphological analyzers help increase the performance of machine translation in rule-based systems, especially for morphologically rich languages. I want to work on this project to build a morphological analyzer for developing an English-Braj MT system.
  +
Secondly, no such work has been done on MT for the Braj language, so my work will help reduce human effort and improve translation for the Braj language.
  +
 
Work Plan
 
Week 1 - Preparing linguistic rule for Morphological analyzer
  +
Week 2 - Preparing linguistic rule for Morphological analyzer
 
Week 3 - Tokenizing the data
  +
Week 4 - Prepare tag set.
 
Deliverable #1
 
Deliverable #1
 
Submit the Tokenized and prepared tagset
 
Submit the Tokenized and prepared tagset
  +
Week5
 
Preparing the affix list
+
Week 5 - Preparing the affix list and validating the prepared suffix list in the corpus
  +
Week 6 – Writing the program to develop Magahi morphological analyzer.
Week6
 
Writing the program to develop the Magahi morphological analyzer
+
Week 7 - Writing the program to develop Magahi morphological analyzer
  +
Week7
 
  +
Continuing programming
 
 
Week 8 - Train and test the model
Week8
 
Train and test the model
 
 
Deliverable #2
 
Deliverable #2
Submit the program and the trained and tested model
+
Submit the program and the trained and tested model
  +
Week9
 
Test the model with different domain of word
+
Week 9 - Testing the model on words from different domains
  +
Week 10 - Fixing the errors that occur in the model
Week10
 
Fixing the errors that occur in the model
+
Week 11 - Training and testing the model again
 
Week 12 - Evaluation of results or model
Week11
 
  +
Again train and test the model
 
Week12
 
Evaluation of results or model
 
 
Project Completed
 
Project Completed
 
Submission of project
 
Submission of project

Revision as of 18:04, 25 March 2019

Apertium GSOC 2019 Morphological Analyzer of Braj


Contact Information

Name – Neerav Mathur
Location – Agra, Uttar Pradesh, India - 282001
E-mail – nmathur54@gmail.com
Mobile no. - +919719009548
Github - https://github.com/ommathur54
Time Zone - UTC +5.30

Skills And Experience

My name is Neerav Mathur. I am a fourth-semester postgraduate student at K.M. Institute of Hindi and Linguistics, Dr. Bhimrao Ambedkar University, Agra, Uttar Pradesh, India. I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation, and XML. As part of my previous semester's course project, I trained and tested MaltParser for the Magahi language. To do this, my classmate Mohit and I also developed a small treebank for the language. In the current semester, we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on the development of a machine translation system for the English-Magahi language pair. As you may notice, I am more generally interested in developing resources and technologies for under-resourced Indian languages. As part of this, I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region of Agra, Mathura, Aligarh, Bharatpur, etc. in Uttar Pradesh and Rajasthan, India) using the Apertium platform. I am also working on a digital dictionary for the Sema language. This work has increased my interest in NLP, machine learning, etc.

University Courses:

Programming (Python, XML, HTML)
Computer Tools for Linguistics Research
Linguistics Courses (Phonetics, Morphology, Syntax, Semantics, Field work, Sign Language, Machine Translation)
Theories of Machine Translation and Machine Translation (practical)


Technical Skills:

Programming Language – Python.
Web Design – HTML.
Databases – MySQL.
Project and Experience:
Languages – Hindi (Native), Braj, English.

Interest In Machine Translation ?

I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation, and XML. As part of my previous semester's course project, I trained and tested MaltParser for the Magahi language. To do this, my classmate and I also developed a small treebank for the language. In the current semester, we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on the development of a machine translation system for the English-Magahi language pair. As you may notice, I am more generally interested in developing resources and technologies for under-resourced Indian languages. During my fourth semester I attended a hands-on workshop on machine translation, where I learnt about MT systems (Apertium) for under-resourced languages and how a morphological analyzer helps increase the performance of MT in a rule-based system. That is where my interest in MT grew.


Interest In Apertium ?

This organization works on things that are very interesting to me as a linguist and computational linguist: (rule-based) machine translation, languages, NLP, and so on. I became more interested in Apertium when I learnt at the MT workshop that Apertium works with all kinds of languages, which is very important for supporting all languages. Also, the Apertium community is very friendly and helpful to new members; members are always ready to help (the community regularly replies to whatever queries we have). This encourages me to work with Apertium.

Task And Plan ?

Task - I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region of Agra, Mathura, Aligarh, Bharatpur, etc. in Uttar Pradesh and Rajasthan, India) using the Apertium platform.

Reason Why Apertium And Google Should Sponsor Me ?

Braj is one of the most under-resourced Indian languages. It is spoken in the Braj region of Agra, Mathura, Aligarh, Bharatpur, etc. in Uttar Pradesh and Rajasthan, India, by 1,556,314 native speakers (according to the 2011 census). There is no Braj translator available online or offline, and there is no rule-based translator with a morphological analyzer. So I believe we can improve the quality of translation by applying a rule-based model (Apertium).

Description Of How And Who It Will Benefit In Society ?

Firstly, morphological analyzers help increase the performance of machine translation in rule-based systems, especially for morphologically rich languages. I want to work on this project to build a morphological analyzer for developing an English-Braj MT system. Secondly, no such work has been done on MT for the Braj language, so my work will help reduce human effort and improve translation for the Braj language.

Work Plan

Week 1 - Preparing linguistic rules for the morphological analyzer
Week 2 - Preparing linguistic rules for the morphological analyzer
Week 3 - Tokenizing the data
Week 4 - Preparing the tag set

Deliverable #1
Submit the tokenized data and the prepared tag set

Week 5 - Preparing the affix list and validating the prepared suffix list in the corpus
Week 6 - Writing the program to develop the Magahi morphological analyzer
Week 7 - Writing the program to develop the Magahi morphological analyzer


Week 8 - Training and testing the model

Deliverable #2
Submit the program and the trained and tested model

Week 9 - Testing the model on words from different domains
Week 10 - Fixing the errors that occur in the model
Week 11 - Training and testing the model again
Week 12 - Evaluating the results or model

Project Completed
Submission of project
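
The Week 5 step of validating a prepared suffix list against a corpus could look roughly like the following. This is only a minimal sketch in Python, not the proposal's actual method: the function name `validate_suffixes` is hypothetical, and the tokens and suffixes are placeholders rather than real Braj data. It counts how many corpus tokens end with each candidate suffix, so that suffixes with no attestation can be reviewed.

```python
from collections import Counter


def validate_suffixes(tokens, suffixes):
    """Count, for each candidate suffix, how many corpus tokens end with it."""
    counts = Counter({s: 0 for s in suffixes})
    # Try longer suffixes first so each token is credited to its longest match.
    ordered = sorted(suffixes, key=len, reverse=True)
    for token in tokens:
        for suffix in ordered:
            # Require a non-empty stem, so a token equal to the suffix is skipped.
            if len(token) > len(suffix) and token.endswith(suffix):
                counts[suffix] += 1
                break
    return dict(counts)


# Placeholder tokens and suffixes only (assumption: not real Braj data).
tokens = ["stemAx", "stemBx", "stemAyz", "yz", "stemC"]
suffixes = ["x", "yz", "q"]

attested = validate_suffixes(tokens, suffixes)
unattested = [s for s, n in attested.items() if n == 0]
print(attested)
print(unattested)
```

Matching the longest suffix first is one possible design choice; it avoids counting a short suffix when a longer one in the list also fits the same token.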