User:Hiten
Contact Information
Name: Hiten Vidhani
Location: India
University: Birla Institute of Technology and Science Pilani
Email address: vidhani.hiten2001@gmail.com
IRC: @hi101:matrix.org
Timezone: GMT+5:30
Github: hitenvidhani
Why is it that you are interested in Apertium?
Which of the published tasks are you interested in? What do you plan to do?
I am interested in the task "Bring an unreleased translation pair to releasable quality". I plan to develop the Marwari-Hindi (MWR-HIN) pair.
Proposal
Deliverables:
- Creating the MWR-HIN bilingual dictionary.
- Creating the MWR monolingual dictionary.
- Updating the HIN monolingual dictionary, if required.
- Building the transfer rules for the MWR-HIN pair.
- Creating an MWR-HIN translator.
Reasons why Google and Apertium should sponsor it:
- Marwari has about 22 million speakers in India and neighboring countries. Despite this, major translation tools such as Google Translate do not support it.
- The project adds diversity to Apertium by including Marwari.
- This project will be an important addition to the community and could be used to build further projects or to carry out research on low-resource languages, which is a growing research area.
- Releasing the first open-source MWR-HIN translator will help developers build further pairs for languages related to Marwari.
How and who it will benefit in society
- The project will benefit native speakers of Marwari and people travelling to Rajasthan, the Indian state where Marwari is the most widely used language. Rajasthan attracts tourists from all over the world, and the translator would help them communicate with local people.
- It will also help researchers in Natural Language Processing to carry out their research in Marwari.
- Developers can use this project to create pairs for other languages that are closely related to Marwari.
- In the long run, this project aims to reduce the language barrier that makes it difficult for people from different regions to communicate.
Work plan
Community bonding period (May 4 - May 28):
- Getting introduced to the organization and community of Apertium.
- Understanding the code/projects which would be needed as a reference for my project.
- Discussing the project ideas and taking suggestions from the community regarding the implementation of the project.
- Exploring and finding resources for Marwari.
Work Period (May 29 - Aug 28):
Week 1:
- Adding nouns and adjectives to the bilingual and MWR monolingual dictionaries.
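As an illustration of what adding an entry to the bilingual dictionary involves, the following minimal sketch builds a single entry in the usual Apertium bidix shape (`<e><p><l>…</l><r>…</r></p></e>`). The helper name `make_bidix_entry` and the placeholder lemmas `MWR_LEMMA` and `HIN_LEMMA` are hypothetical and stand in for real word pairs; the actual dictionaries would most likely be edited by hand or with existing Apertium tooling.

```python
import xml.etree.ElementTree as ET


def make_bidix_entry(src_lemma, tgt_lemma, pos_tag):
    """Build one bilingual dictionary entry as an XML element.

    Hypothetical helper: src_lemma is the left (MWR) side, tgt_lemma the
    right (HIN) side, and pos_tag a symbol name such as "n" or "adj".
    """
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side, lemma in (("l", src_lemma), ("r", tgt_lemma)):
        node = ET.SubElement(pair, side)
        node.text = lemma                      # lemma text comes first
        ET.SubElement(node, "s", {"n": pos_tag})  # then the part-of-speech symbol
    return entry


# Placeholder lemmas only; real Marwari and Hindi words would go here.
entry = make_bidix_entry("MWR_LEMMA", "HIN_LEMMA", "n")
print(ET.tostring(entry, encoding="unicode"))
```

Both sides carry the same part-of-speech symbol here; entries whose tags differ between the two languages would need a slightly more general helper.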
Week 2:
- Getting familiar with the syntax for writing transfer rules.
- Writing transfer rules for nouns and adjectives.
Week 3:
- Adding verbs and other parts of speech to the dictionaries.
- Writing transfer rules for the same.
Week 4:
- Run tests
- Update documentation
- Prepare for the first evaluation
Deliverable 1: Monolingual and bilingual dictionaries, basic transfer rules
Week 5:
- Translating essays/paragraphs, aiming to achieve WER < 50%.
- Working on lexical selection rules.
Week 6:
- Getting testvoc clean for adjectives.
- Aiming to achieve WER < 35%.
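The WER (word error rate) targets for Weeks 5 and 6 could be tracked with a small script along the following lines. This is only a sketch: it assumes whitespace tokenisation and a single reference translation per sentence, and the function name `word_error_rate` is hypothetical. In practice an existing evaluation tool would probably be used instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Assumes whitespace tokenisation; substitutions, insertions and
    deletions all cost 1.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(ref)


# One substituted word out of four reference words.
print(word_error_rate("a b c d", "a x c d"))
```

A WER below 0.50 or 0.35 on a held-out set of post-edited translations would correspond to the two targets in the plan.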
Week 7:
- Expanding dictionaries further.
- Working on disambiguation rules for MWR-HIN.
Week 8:
- Expanding bilingual dictionary
- Lexical selection rules
- Disambiguation rules
- Transfer rules
- Prepare for the second evaluation
Deliverable 2: Improved bilingual dictionary and updated rules
Week 9&10:
- Testvoc MWR-HIN
- Discussing documentation details with mentors and organization.
Week 11&12:
- Completing any pending tasks.
- Final discussion and release of the project and documentation.
- Project completed
I am a senior Computer Science undergraduate at Birla Institute of Technology and Science Pilani (BITS Pilani), India, which is an Institute of Eminence. I interned at Ericsson, where I built an NLP-based ticket classifier using Python. I also developed a POS tagger for a Hindi-English (Hin-Eng) code-mixed dataset using a Hidden Markov Model as part of the Natural Language Processing coursework at my university. I also interned at the Artificial Intelligence Institute of South Carolina, where I worked on the transformer architecture. Through these projects and my university coursework I have gained proficiency in languages such as Python, C++, XML, and HTML/CSS. In general, I love solving problems using various programming tools. I am a native Hindi speaker and can read and write Marwari. Because I have previously worked on Natural Language Processing projects and understand both Hindi and Marwari, I believe I am a good fit for this project. I would also be glad to be part of the wonderful Apertium community and to learn from it.
Non summer of code plans
I do not have any non-GSoC plans for the summer of 2023. I can spend 30 hours a week on this project. My university term starts in August, so I will work more intensively earlier in the summer to compensate.