Category talk:GSoC 2019 student proposals
Apertium GSOC 2019
Morphological Analyzer of Braj Language
Contact Information
Name – Neerav Mathur
Location – Agra, Uttar Pradesh, India, 282001
E-mail – nmathur54@gmail.com
Mobile no. – +919719009548
Github – [1]
Time Zone – UTC+5:30
Skills And Experience
My name is Neerav Mathur. I am a fourth-semester postgraduate student at K.M. Institute of Hindi and Linguistics, Dr. Bhimrao Ambedkar University, Agra, Uttar Pradesh, India. I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation, and XML. As my course project last semester, I trained and tested MaltParser for the Magahi language. To do this, my classmate Mohit and I also developed a small treebank for the language. This semester we are expanding the treebank, and we plan to implement the first full-fledged parser for Magahi. I am also working on a machine translation system for the English-Magahi language pair and on a digital dictionary for the Sema language. More generally, I am interested in developing resources and technologies for under-resourced Indian languages. As part of this, I want to build a morphological analyzer for the Braj language (spoken in the Braj region, which includes Agra, Mathura, Aligarh, and Bharatpur, in Uttar Pradesh and Rajasthan, India) using the Apertium platform. This work has further increased my interest in NLP and machine learning.
University Courses:
Programming (Python, XML, HTML)
Computer Tools for Linguistics Research
Linguistics Courses (Phonetics, Morphology, Syntax, Semantics, Field work, Sign Language, Machine Translation)
Theories of Machine Translation and Machine Translation (practical)
Technical Skills:
Programming Language – Python.
Web Design – HTML.
Databases – MySQL.
Languages – Hindi (Native), Braj, English.
Why Interest In Machine Translation?
I am studying Linguistics, and during my previous semesters I learnt Python, Machine Translation, and XML. As my course project last semester, I trained and tested MaltParser for the Magahi language. To do this, my classmate and I also developed a small treebank for the language. This semester we are expanding the treebank, and we plan to implement the first full-fledged parser for the language. I am also working on a machine translation system for the English-Magahi language pair. More generally, I am interested in developing resources and technologies for under-resourced Indian languages. During my fourth semester I attended a hands-on workshop on machine translation, where I learnt about MT systems for under-resourced languages (Apertium) and about how a morphological analyzer helps improve the performance of a rule-based MT system. That workshop sparked my interest in MT.
Why Interest In Apertium?
Apertium works on things that interest me as a linguist and computational linguist: rule-based machine translation, languages, NLP, and so on. I became more interested in Apertium when I learnt at the MT workshop that it works with all kinds of languages, which matters for supporting every language. The Apertium community is also friendly and helpful to new members: people are always ready to help and reply quickly to whatever queries we have. This encourages me to work with Apertium.
Task And Plan
Task - I want to build a morphological analyzer for the Braj language (spoken in the Braj region, which includes Agra, Mathura, Aligarh, and Bharatpur, in Uttar Pradesh and Rajasthan, India) using the Apertium platform.
Reasons Why Apertium And Google Should Sponsor It
Braj is one of the most under-resourced Indian languages. It is spoken in the Braj region, which includes Agra, Mathura, Aligarh, and Bharatpur, in Uttar Pradesh and Rajasthan, India, and it has 1,556,314 native speakers according to the 2011 census. No Braj translator is available online or offline, and no rule-based translator with a morphological analyzer exists for the language. I therefore believe we can improve the quality of translation by applying a rule-based model (Apertium).
Description Of How And Whom It Will Benefit In Society
Firstly, morphological analyzers help improve the performance of rule-based machine translation systems, especially for morphologically rich languages. I want to build this morphological analyzer as a step towards an English-Braj MT system. Secondly, no such MT work has been done for the Braj language, so my work will help reduce manual effort and improve translation for Braj.
Work Plan
Week 1 - Prepare linguistic rules for the morphological analyzer.
Week 2 - Prepare linguistic rules for the morphological analyzer.
Week 3 - Tokenize the data.
Week 4 - Prepare the tag set.
Deliverable #1
Submit the tokenized data and the prepared tag set.
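The tokenization step in Week 3 could look roughly like the sketch below. It is only an illustration and assumes the corpus is plain Unicode text in Devanagari script. The function name `tokenize` and the regular expression are assumptions made for this example, and the sample sentence in the usage line is a simple Hindi placeholder, not Braj corpus data.

```python
import re

# Devanagari letters, signs and digits (U+0900 to U+097F), leaving out the
# danda (U+0964) and double danda (U+0965) so they become separate tokens.
TOKEN_PATTERN = re.compile(
    r"[\u0900-\u0963\u0966-\u097F]+"  # a run of Devanagari characters
    r"|[A-Za-z0-9]+"                  # a run of Latin letters or digits
    r"|\S"                            # any other single non-space symbol
)


def tokenize(text):
    """Split raw text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


if __name__ == "__main__":
    # Placeholder sentence (Hindi), used only to show the output shape.
    print(tokenize("राम घर गया।"))
```

Keeping the danda as its own token makes sentence boundaries easy to find later, when the tag set is applied.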
Week 5 - Prepare the affix list and validate the prepared suffix list against the corpus.
Week 6 - Write the program for the Braj morphological analyzer.
Week 7 - Write the program for the Braj morphological analyzer.
Week 8 - Train and test the model.
Deliverable #2
Submit the program and the trained and tested model.
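As a rough illustration of the analyzer planned for Weeks 5 to 7, the sketch below strips suffixes and looks the remaining stem up in a small lexicon. This is a simplified stand-in written for this example, not the Apertium implementation. The names `LEXICON`, `SUFFIXES` and `analyse` are hypothetical, and the entries are placeholders rather than verified Braj forms. The output imitates the `^surface/lemma<tags>$` notation used in the Apertium stream format, with `*` marking unknown words.

```python
# Placeholder data for illustration only; not verified Braj forms.
LEXICON = {"ghar": "n"}                 # stem -> part-of-speech tag
SUFFIXES = {"": ["sg"], "an": ["pl"]}   # suffix -> morphological tags


def analyse(surface):
    """Return every stem + suffix reading of a surface form."""
    readings = []
    for suffix, tags in SUFFIXES.items():
        if not surface.endswith(suffix):
            continue
        # Remove the suffix (the empty suffix leaves the form unchanged).
        stem = surface[: len(surface) - len(suffix)]
        pos = LEXICON.get(stem)
        if pos is not None:
            tag_string = "".join(f"<{tag}>" for tag in [pos, *tags])
            readings.append(stem + tag_string)
    if not readings:
        # Unknown word: mark it with an asterisk.
        readings.append("*" + surface)
    return "^" + surface + "/" + "/".join(readings) + "$"


if __name__ == "__main__":
    for word in ["ghar", "gharan", "xyz"]:
        print(analyse(word))
```

In this sketch the suffix table plays the role of the validated affix list from Week 5, so every suffix confirmed in the corpus would become one entry.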
Week 9 - Test the model on words from different domains.
Week 10 - Fix the errors found in the model.
Week 11 - Train and test the model again.
Week 12 - Evaluate the results of the model.
Project completed: submission of the project.
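One simple measure for the testing and evaluation weeks is coverage, the share of corpus tokens that receive at least one analysis. The sketch below is a minimal example of that calculation. The function name `coverage` and the set of known forms are assumptions for illustration, and the proposal itself does not fix an evaluation metric.

```python
def coverage(tokens, known_forms):
    """Fraction of tokens that the analyzer recognises (0.0 to 1.0)."""
    if not tokens:
        return 0.0
    analysed = sum(1 for token in tokens if token in known_forms)
    return analysed / len(tokens)


if __name__ == "__main__":
    # Toy input: three of the four tokens are known.
    print(coverage(["a", "b", "a", "c"], {"a", "b"}))
```

Running the same calculation on texts from several domains would show where the lexicon or the suffix list still needs work.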