Category talk:GSoC 2019 student proposals

From Apertium
Latest revision as of 18:37, 25 March 2019

Apertium GSOC 2019

Morphological Analyzer of Braj Language

Contact Information

Name – Neerav Mathur

Location – Agra, Uttar Pradesh, India -282001

E-mail – nmathur54@gmail.com

Mobile no. - +919719009548

Github - https://github.com/ommathur54

Time Zone - UTC+5:30



Skills And Experience

My name is Neerav Mathur. I am a fourth-semester postgraduate student at the K.M. Institute of Hindi and Linguistics, Dr. Bhimrao Ambedkar University, Agra, Uttar Pradesh, India. I am studying linguistics, and in previous semesters I learned Python, machine translation, and XML. As part of my course project last semester, I trained and tested MaltParser for the Magahi language. To do this, my classmate Mohit and I also developed a small treebank for the language. This semester we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on a machine translation system for the English-Magahi language pair. As these projects suggest, I am broadly interested in developing resources and technologies for under-resourced Indian languages. As part of this, I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region: Agra, Mathura, Aligarh, Bharatpur, etc., in Uttar Pradesh and Rajasthan, India) using the Apertium platform. I am also working on a digital dictionary for the Sema language. All of this has deepened my interest in NLP, machine learning, and related areas.


University Courses:

Programming (Python, XML, HTML)

Computer Tools for Linguistics Research

Linguistics Courses (Phonetics, Morphology, Syntax, Semantics, Field work, Sign Language, Machine Translation)

Theories of Machine Translation and Machine Translation (practical)


Technical Skills:

Programming Language – Python.

Web Design – HTML.

Databases – MySQL.

Languages – Hindi (Native), Braj, English.


Why Interest In Machine Translation?

I am studying linguistics, and in previous semesters I learned Python, machine translation, and XML. As part of my course project last semester, I trained and tested MaltParser for the Magahi language. To do this, my classmate and I also developed a small treebank for the language. This semester we are expanding the treebank further, and we plan to implement the first full-fledged parser for the language. I am also working on a machine translation system for the English-Magahi language pair. As these projects suggest, I am broadly interested in developing resources and technologies for under-resourced Indian languages. During my fourth semester I attended a hands-on workshop on machine translation, where I learned about MT systems (Apertium) for under-resourced languages and how a morphological analyzer helps improve the performance of MT in a rule-based system. That workshop sparked my interest in MT.



Why Interest In Apertium?

This organization works on things that interest me as a linguist and computational linguist: (rule-based) machine translation, languages, NLP, and so on. My interest in Apertium grew when I learned at the MT workshop that Apertium works with all kinds of languages, which is important for supporting every language. The Apertium community is also friendly and helpful to new members: members are always ready to help and reply promptly to any query we have. This encourages me to work with Apertium.

Task And Plan

Task - I am interested in building a morphological analyzer for the Braj language (spoken in the Braj region: Agra, Mathura, Aligarh, Bharatpur, etc., in Uttar Pradesh and Rajasthan, India) using the Apertium platform.
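To make the task concrete, here is a minimal Python sketch of what a morphological analyzer does: it maps a surface form to a lemma plus grammatical tags. The output string follows the Apertium stream format (`^surface/lemma<tag>...$`, with `*` marking an unknown word). Everything else is an assumption for illustration: the `STEMS` and `SUFFIXES` tables use English placeholder entries rather than real Braj data, and `analyze` is a hypothetical helper. An actual Apertium analyzer would be written as a dictionary compiled by Apertium's own tools, not as Python code.

```python
# Toy lexicon: stem -> part-of-speech tag.
# English placeholders for illustration only, NOT real Braj data.
STEMS = {"cat": "n", "walk": "vblex"}

# Toy suffix table: part of speech -> {ending: grammatical tags}.
SUFFIXES = {
    "n": {"": ["sg"], "s": ["pl"]},
    "vblex": {"": ["inf"], "ed": ["past"], "s": ["pres", "p3", "sg"]},
}


def analyze(surface):
    """Return the analyses of a surface form in Apertium stream format."""
    analyses = []
    for stem, pos in STEMS.items():
        if not surface.startswith(stem):
            continue
        ending = surface[len(stem):]
        tags = SUFFIXES[pos].get(ending)
        if tags is not None:
            # Lemma followed by one <tag> per grammatical feature.
            analyses.append(stem + "".join(f"<{t}>" for t in [pos] + tags))
    if not analyses:
        # Unknown words are marked with a leading asterisk.
        analyses.append("*" + surface)
    return "^" + surface + "/" + "/".join(analyses) + "$"


print(analyze("cats"))
print(analyze("walked"))
print(analyze("dog"))
```

The sketch only handles suffixes attached to an unchanged stem; stem alternations and ambiguity handling would need a richer formalism.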


Why Should Apertium And Google Sponsor It?

Braj is one of the most under-resourced Indian languages. It is spoken in the Braj region (Agra, Mathura, Aligarh, Bharatpur, etc.) in Uttar Pradesh and Rajasthan, India, by 1,556,314 native speakers (according to the 2011 census). There is no Braj translator available online or offline, and there is no rule-based translator with a morphological analyzer. I therefore believe we can improve the quality of translation by applying a rule-based model (Apertium).


How And Whom It Will Benefit In Society

First, morphological analyzers help improve the performance of machine translation in rule-based systems, especially for morphologically rich languages. I want to build this morphological analyzer as a step toward an English-Braj MT system. Second, no such work has been done on MT for the Braj language, so my work will help reduce manual effort and improve translation for Braj.

Work Plan

Week 1 - Prepare linguistic rules for the morphological analyzer.

Week 2 - Prepare linguistic rules for the morphological analyzer.

Week 3 - Tokenize the data.

Week 4 - Prepare the tag set.

Deliverable #1

Submit the tokenized data and the prepared tag set.

Week 5 - Prepare the affix list and validate the prepared suffix list against the corpus.

Week 6 - Write the program for the Braj morphological analyzer.

Week 7 - Write the program for the Braj morphological analyzer.

Week 8 - Train and test the model.


Deliverable #2

Submit the program and the trained and tested model.


Week 9 - Test the model on words from different domains.

Week 10 - Fix the errors found in the model.

Week 11 - Retrain and retest the model.

Week 12 - Evaluate the results of the model.


Project Completed

Submission of project
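The tokenizing step (Week 3) and the suffix-list validation step (Week 5) in the plan above could be sketched in Python roughly as follows. This is a minimal illustration under stated assumptions, not the proposal's actual method: `tokenize` and `suffix_coverage` are hypothetical helper names, the sample sentence and candidate suffixes are English placeholders rather than Braj data, and whitespace splitting is assumed to be an adequate first pass. Punctuation is stripped explicitly instead of using a `\w+` regular expression, because Python's `\w` does not match Devanagari vowel signs and would split words written in that script.

```python
import string
from collections import Counter

# ASCII punctuation plus the Devanagari danda and double danda.
PUNCT = string.punctuation + "।॥"


def tokenize(text):
    """Split on whitespace and strip punctuation from both ends of each token."""
    tokens = []
    for chunk in text.split():
        token = chunk.strip(PUNCT)
        if token:
            tokens.append(token)
    return tokens


def suffix_coverage(tokens, suffixes):
    """Count how many distinct word types in the corpus end with each suffix."""
    types = set(tokens)
    counts = Counter()
    for suffix in suffixes:
        # Require a non-empty stem before the suffix.
        counts[suffix] = sum(
            1 for word in types
            if word.endswith(suffix) and len(word) > len(suffix)
        )
    return counts


# Placeholder corpus and candidate suffixes (English, for illustration only).
tokens = tokenize("The cats walked. The cat walks!")
coverage = suffix_coverage(tokens, ["s", "ed", "ing"])
print(tokens)
print(coverage)
```

A suffix with a count of zero is a candidate for removal from the list or for re-checking against a larger corpus.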