Difference between revisions of "User:Mohitraj"
Latest revision as of 18:25, 27 March 2019
== Apertium GSoC 2019: Morphological Analyzer of Magahi ==
== Contact Information ==
Name: Mohit Raj
E-mail address: mohiitraj@gmail.com
Mobile Number: +91 9304843938 (India)
IRC Nick: mohitraj
GitHub: Mohit-Raj123
Timezone: UTC+5:30
== Why is it that you are interested in Apertium and Machine Translation? ==
I am from India, which has 22 scheduled languages and almost 100 non-scheduled languages. This count is obtained by subsuming several distinct languages under ‘dialects’ of some of the majority languages; languages with fewer than 10,000 speakers are not even recognised and are placed in a category called ‘others’. I like the concept of Apertium as an open-source translation platform; it is a valuable resource for students, organizations, and anyone else who wants to work in the field of Machine Translation. I am interested in Apertium because it offers not only machine translation but also free resources that can be used for other purposes, e.g. dictionaries, morphological analysers, and spell checkers. I am a student of Linguistics, and my areas of interest are Machine Translation and Natural Language Processing. I have previously completed courses on XML, Python programming, Language Technologies, and Machine Translation. I have worked on the development of a parser for Magahi, in collaboration with my classmate Neerav Mathur, for course projects. I have participated in the following workshops:
1. 9th IASNLP-2018: IIIT-Hyderabad Advanced School on Natural Language Processing
2. SOIL-Tech: Towards Digital India at JNU, New Delhi
3. Hands-on workshop on Statistical Machine Translation with Moses at K.M.I, Agra
I want to work on marginalised languages because developing tools and technology for them will help preserve the linguistic heritage of India and the world.
== My Proposal ==
Title
Morphological analyzer of Magahi
== Why should Google and Apertium sponsor it? ==
Many language pairs are available in Apertium, but many more have yet to be developed. One of these is the English-Magahi pair for machine translation. In machine translation, the morphological analyzer plays an important role in improving system performance for a morphologically rich language like Magahi, so I am interested in developing a morphological analyzer for Magahi.
== How and whom will it benefit in society? ==
There are almost 12.7 million native speakers of Magahi according to the 2011 census, with additional speakers counted under Hindi. English-Magahi MT would be quite helpful for Magahi users, and it would also play an important role in preserving this marginalised language.
== Work Plan ==
== Week 1 ==
Preparing linguistic rules for the morphological analyzer
== Week 2 ==
Continuing to prepare the linguistic rules
== Week 3 ==
Tokenizing the data
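As a rough illustration of what this tokenization step could look like, here is a minimal Python sketch. It assumes the data is Devanagari text and that the danda (।) should be split off as punctuation; neither assumption is stated in the proposal. The function name `tokenize` is hypothetical, and the sample sentence is Hindi used only as a stand-in, not verified Magahi data.

```python
import re

# Devanagari block is U+0900-U+097F. The danda (U+0964) and double danda
# (U+0965) are punctuation, so they are left out of the word range.
# An explicit range is used because Python's \w does not match Devanagari
# vowel signs, which would split words in the middle.
DEVANAGARI_WORD = r"[\u0900-\u0963\u0966-\u097F]+"

# A token is a Devanagari word, a run of ASCII letters/digits,
# or any single remaining non-space character (punctuation).
TOKEN_RE = re.compile(DEVANAGARI_WORD + r"|[A-Za-z0-9]+|\S")


def tokenize(text):
    """Split text into word and punctuation tokens (hypothetical helper)."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    # Placeholder Hindi sentence, used only to exercise the tokenizer.
    for token in tokenize("राम घर गया।"):
        print(token)
```

The one-token-per-line output is a common input shape for tagging tools, but the actual format used in the project would depend on the tagset and tooling chosen in later weeks.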
== Week 4 ==
Preparing the tagset
Deliverable #1
Submit the tokenized data and the prepared tagset
== Week 5 ==
Preparing the affix list
== Week 6 ==
Writing the program for the Magahi morphological analyzer
== Week 7 ==
Continuing programming
== Week 8 ==
Training and testing the model
Deliverable #2
Submit the program and the trained and tested model
== Week 9 ==
Testing the model with words from different domains
== Week 10 ==
Fixing the errors found in the model
== Week 11 ==
Retraining and retesting the model
== Week 12 ==
Evaluation of the results and the model
Project Completed
Submission of project