User:Natasha singh/GSoC2023Proposal

Contact Details

Name: Natasha Singh

E-mail address: natashasi475@gmail.com

IRC: natasha_singh

University: Indiana University - Bloomington, USA

Timezone: EDT (GMT-4)

Why is it that you are interested in Apertium?

I am a first-year MS Computational Linguistics student at Indiana University - Bloomington. I am trilingual: I speak English, Hindi, and Kumaoni/Kumauni (an Indo-Aryan language written in the Devanagari script), and I am interested in contributing to the development of language resources and NLP. Many resources are available online for English and Hindi, but little content is published for a language like Kumaoni. Because Apertium is a rule-based machine translation platform, it is well suited to developing language resources and translation systems for less-resourced languages, which lack sufficient data to train a good ML- or DL-based NLP model.

Which of the published tasks are you interested in? What do you plan to do?

I am interested in working on the morphological analyzer task. Morphological analysis is an important step in developing any NLP project. Its results can be leveraged by many downstream tasks such as POS tagging, spell checking, information retrieval, named entity recognition, and machine translation.

Recently, UNESCO has classified Kumaoni as a language in the "unsafe" category. Most native speakers are choosing Hindi or English over Kumaoni because these languages offer more resources and opportunities. This calls for consistent efforts to safeguard the language. I believe this project will give me an opportunity to contribute to the preservation and promotion of the language and culture of the Kumaoni community, which accounts for less than 0.2% of native speakers in India. This project can serve as a stepping stone toward extending various NLP applications to Kumaoni, which will in turn help facilitate communication and access to information for native speakers.

Work plan

Week 1: 10% token coverage (with 100 lexicon entries)
Week 2: 30% token coverage (with 500 lexicon entries)
Week 3: 50% token coverage
Week 4: 70% token coverage (with 2000 lexicon entries)

Deliverable #1: Close cases completed

Week 5: 75% token coverage
Week 6: 80% token coverage
Week 7: 85% token coverage (with 5000 lexicon entries)
Week 8: 90% token coverage

Deliverable #2: Evaluation

Week 9: 92.5% token coverage (with 8000 lexicon entries)
Week 10: 95% token coverage
Week 11: 95% token coverage (with 10000 lexicon entries)
Week 12: Documentation: Paper

Project completed
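
The weekly targets above are stated as token coverage. The proposal does not spell out how coverage would be measured, so the following is only a minimal sketch under one common assumption: naive coverage, meaning the fraction of corpus tokens that receive at least one analysis. It reads analyser output in the Apertium stream format, where each token appears as ^surface/analysis$ and an unknown token is marked with a leading asterisk. The function name, the sample tokens, and the tags are hypothetical and only illustrate the calculation; escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification (assumption): escaped characters such as \/ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)((?:/[^/$^]*)*)\$")


def token_coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    total = 0
    known = 0
    for match in LEXICAL_UNIT.finditer(analysed_text):
        # group(2) looks like "/analysis1/analysis2"; drop the empty pieces.
        analyses = [a for a in match.group(2).split("/") if a]
        total += 1
        # An unknown token is echoed back with a leading asterisk.
        if analyses and not analyses[0].startswith("*"):
            known += 1
    return known / total if total else 0.0


# Hypothetical analyser output: two analysed tokens and two unknown ones.
sample = "^ghar/ghar<n>$ ^foo/*foo$ ^./.<sent>$ ^bar/*bar$"
coverage = token_coverage(sample)
print(f"Token coverage: {coverage:.1%}")
```

Running the same calculation on a fixed corpus at the end of each week would make the percentages in the plan directly comparable from one week to the next.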
