Difference between revisions of "Category talk:GSoC 2019 student proposals"

From Apertium
Revision as of 18:15, 25 March 2019

Apertium GSOC 2019 Morphological Analyzer of Braj


Contact Information

Name – Neerav Mathur
Location – Agra, Uttar Pradesh, India - 282001
E-mail – nmathur54@gmail.com
Mobile no. - +919719009548
Github - https://github.com/ommathur54
Time Zone - UTC +5:30

Skills And Experience

My name is Neerav Mathur. I am a fourth-semester postgraduate student at the K.M. Institute of Hindi and Linguistics, Dr. Bhimrao Ambedkar University, Agra, Uttar Pradesh, India. I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation, and XML. As part of my previous semester's course project, I trained and tested MaltParser for the Magahi language. To do so, my classmate Mohit and I also developed a small treebank for the language. In the current semester we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on a machine translation system for the English-Magahi language pair and on a digital dictionary for the Sema language. More generally, I am interested in developing resources and technologies for under-resourced Indian languages. As part of this, I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region: Agra, Mathura, Aligarh, Bharatpur, etc., in Uttar Pradesh and Rajasthan, India) using the Apertium platform. This work has deepened my interest in NLP, Machine Learning, and related areas.

University Courses:

Programming (Python, XML, HTML)
Computer Tools for Linguistics Research
Linguistics Courses (Phonetics, Morphology, Syntax, Semantics, Field work, Sign Language, Machine Translation)
Theories of Machine Translation and Machine Translation (practical)

Technical Skills:

Programming Language – Python.
Web Design – HTML.
Databases – MySQL.
Project and Experience:
Languages – Hindi (Native), Braj, English.

Why Am I Interested In Machine Translation?

I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation, and XML. As part of my previous semester's course project, I trained and tested MaltParser for the Magahi language. To do so, my classmate and I also developed a small treebank for the language. In the current semester we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on a machine translation system for the English-Magahi language pair. More generally, I am interested in developing resources and technologies for under-resourced Indian languages. During my fourth semester I attended a hands-on workshop on machine translation, where I learned about MT systems (Apertium) for under-resourced languages and how a morphological analyzer helps improve the performance of a rule-based MT system. That workshop is where my interest in MT took hold.


Why Am I Interested In Apertium?

This organization works on things that are very interesting to me as a linguist and computational linguist: (rule-based) machine translation, languages, NLP, and so on. I became more interested in Apertium when I learned at the MT workshop that Apertium works with all kinds of languages, which matters for supporting languages that other systems leave out. The Apertium community is also friendly and helpful to new members; people there are always ready to help and reply quickly to whatever queries we have. This encourages me to work with Apertium.

Task And Plan

Task - I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region: Agra, Mathura, Aligarh, Bharatpur, etc., in Uttar Pradesh and Rajasthan, India) using the Apertium platform.
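To make the expected output of such an analyzer concrete: Apertium tools exchange analyses in a stream format in which each token is written as ^surface/lemma<tag>...$, alternative readings are separated by slashes, and an unknown word is marked with an asterisk. The Python sketch below only illustrates that convention. The function name format_lexical_unit is hypothetical, and the English sample word and tags are placeholders, not Braj data and not part of any Apertium API.

```python
def format_lexical_unit(surface, analyses):
    """Render one token in Apertium-style stream format.

    surface:  the word form as it appears in the text
    analyses: list of (lemma, [tags]) pairs; an empty list means
              the analyzer does not know the word
    """
    if not analyses:
        # Unknown words are echoed back with a leading asterisk.
        return "^{0}/*{0}$".format(surface)
    readings = []
    for lemma, tags in analyses:
        # Each tag is wrapped in angle brackets and appended to the lemma.
        readings.append(lemma + "".join("<{}>".format(t) for t in tags))
    return "^{}/{}$".format(surface, "/".join(readings))


if __name__ == "__main__":
    # Placeholder English example, used only to show the format.
    print(format_lexical_unit("books", [("book", ["n", "pl"])]))
    print(format_lexical_unit("xyz", []))
```

In a real Apertium language module the analyses would come from a compiled dictionary rather than from hand-built Python lists; the sketch shows only what the analyzer's output looks like.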

Why Should Apertium And Google Sponsor It?

Braj is one of the most under-resourced Indian languages. It is spoken in the Braj region: Agra, Mathura, Aligarh, Bharatpur, etc., in Uttar Pradesh and Rajasthan, India. It has 1,556,314 native speakers (according to the 2011 census). There is no Braj translator available online or offline, and there is no rule-based translator with a morphological analyzer. I therefore believe we can improve the quality of translation by applying a rule-based model (Apertium).

How And Whom It Will Benefit In Society

Firstly, morphological analyzers help improve the performance of machine translation in rule-based systems, especially for morphologically rich languages. I want to work on this project to build a morphological analyzer that can be used to develop an English-Braj MT system.

Secondly, no such MT work has been done for the Braj language, so my work will help reduce manual effort and improve translation for Braj.

Work Plan

Week 1 - Preparing linguistic rules for the morphological analyzer
Week 2 - Preparing linguistic rules for the morphological analyzer
Week 3 - Tokenizing the data
Week 4 - Preparing the tag set
Deliverable #1 - Submit the tokenized data and the prepared tag set

Week 5 - Preparing the affix list and validating the prepared suffix list against the corpus
Week 6 - Writing the program for the Braj morphological analyzer
Week 7 - Writing the program for the Braj morphological analyzer
Week 8 - Training and testing the model
Deliverable #2 - Submit the program and the trained and tested model

Week 9 - Testing the model on words from different domains
Week 10 - Fixing the errors found in the model
Week 11 - Training and testing the model again
Week 12 - Evaluating the results of the model

Project Completed - Submission of project
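
The steps in the work plan (tokenizing, validating a suffix list against a corpus, analyzing words, and evaluating the result) can be sketched in a few lines of Python. This is a minimal rule-based illustration under stated assumptions, not the proposal's actual implementation: the proposal does not specify a method, and every name here (LEXICON, PARADIGMS, tokenize, suffix_counts, analyze, coverage) is hypothetical. The entries are English placeholders with Apertium-style tags, not Braj data.

```python
import string
from collections import Counter

# Hypothetical toy resources (English placeholders, NOT Braj data).
# A real project would build these from the linguistic rules and corpus.
LEXICON = {"book": "n", "walk": "vblex"}          # stem -> part of speech
PARADIGMS = {                                      # part of speech -> suffix -> tags
    "n": {"": ["sg"], "s": ["pl"]},
    "vblex": {"": ["inf"], "ed": ["past"]},
}

# ASCII punctuation plus the Devanagari danda, stripped from token edges.
PUNCT = string.punctuation + "\u0964"


def tokenize(text):
    """Split on whitespace, strip edge punctuation, lowercase."""
    tokens = (tok.strip(PUNCT).lower() for tok in text.split())
    return [tok for tok in tokens if tok]


def suffix_counts(tokens, suffixes):
    """Count how many tokens end with each non-empty candidate suffix."""
    counts = Counter()
    for tok in tokens:
        for suf in suffixes:
            if suf and len(tok) > len(suf) and tok.endswith(suf):
                counts[suf] += 1
    return counts


def analyze(token):
    """Return every (stem, tags) reading licensed by LEXICON and PARADIGMS."""
    analyses = []
    for cut in range(len(token), 0, -1):
        stem, suffix = token[:cut], token[cut:]
        pos = LEXICON.get(stem)
        if pos is not None and suffix in PARADIGMS[pos]:
            analyses.append((stem, [pos] + PARADIGMS[pos][suffix]))
    return analyses


def coverage(tokens):
    """Fraction of tokens that receive at least one analysis."""
    if not tokens:
        return 0.0
    return sum(1 for tok in tokens if analyze(tok)) / len(tokens)


if __name__ == "__main__":
    toks = tokenize("Books walked, walks!")
    all_suffixes = {s for table in PARADIGMS.values() for s in table}
    print(toks)
    print(suffix_counts(toks, all_suffixes))
    print([analyze(t) for t in toks])
    print(coverage(toks))
```

Coverage over a held-out word list from a different domain is one simple way to approach the Week 9 and Week 12 checks; it measures only whether a word receives some analysis, not whether that analysis is correct, so a manually annotated sample would still be needed to judge accuracy.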