User:Kamush/GSoC2021Proposal

From Apertium
Revision as of 14:14, 15 April 2021 by Kamush (Added proposal details)

Develop a prototype MT system for the Kazakh - Uzbek language pair

Contact Information

Name: Barno Kutlimuratova (@Kamush)

Nationality: Uzbekistan

Location: Galicia, Spain

University: Universidade da Coruña

Email: kutlimuratovab0712@gmail.com

Degree/Field of Study: MSc in Advanced English Studies and its Applications

IRC: Kamush

Timezone: GMT+2

Github: kamush901


Short Description of the proposal

Having seen the benefits of Apertium, an open-source rule-based machine translation platform, as an alternative to other free or commercial online translation systems, especially for many low-resource language pairs, I decided to contribute to the platform by extending the list of language pairs available for my native language, Uzbek.

As a master's student in philology with some experience in creating language resources, I would like to propose implementing a new language pair for Apertium: Kazakh - Uzbek. Both are low-resource Turkic languages and the official languages of two Central Asian countries with many economic and cultural ties, yet this language pair still lacks an open-source machine translation system.

My proposal is to fill this gap as much as possible during the GSoC 2021 program.

Since Uzbek and Kazakh belong to the same language family, they are closely related in grammar, word order, and vocabulary, so I will try to build a bidirectional translator, with more focus on the Kazakh -> Uzbek direction, as Uzbek is my native language and I have only very basic knowledge of Kazakh.


Why is it that you are interested in Apertium?

Since creating NLP resources is my field of research, I wanted to contribute to my native language as well, rather than only to English. Apertium is a free and open-source platform for both RBMT and monolingual language packages, and I am interested in adding more resources there to support my native language.


Which of the published tasks are you interested in? What do you plan to do?

Title: Apertium translation pair for Kazakh - Uzbek

Besides what the proposal title says, I can also offer flexibility in working on language data, whether monolingual or in pairs where Uzbek is the target language (since it is my native one).

Major goals

The major points of my proposal are as follows:

   • Spending a little time on the Uzbek lexicon to achieve a high-accuracy morphological analyser;
   • Initializing the Kazakh-Uzbek pair (kaz-uzb);
   • Adding dictionary words to the Kazakh-Uzbek pair, increasing the coverage to above 80%;
   • Reducing the word error rate (WER) of the Kazakh-Uzbek pair (goal: below 30%);
   • Adding apertium-separable to the kaz-uzb pair;
   • Writing lexical selection rules for better translation accuracy;
   • Creating a testvoc for testing;
   • Introducing apertium-recursive.
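
The coverage and WER goals above can be illustrated with a minimal Python sketch. It is not Apertium's own evaluation tooling: the function names `word_error_rate` and `naive_coverage` are hypothetical, WER is computed here as word-level edit distance divided by the reference length, and coverage is approximated by assuming that unknown words in the translator output are marked with a leading `*`.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] = edit distance between the reference words processed so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


def naive_coverage(translated: str) -> float:
    """Share of tokens not marked as unknown (assumed marker: leading '*')."""
    tokens = translated.split()
    if not tokens:
        raise ValueError("translated text must contain at least one token")
    known = sum(1 for token in tokens if not token.startswith("*"))
    return known / len(tokens)


if __name__ == "__main__":
    # Placeholder tokens, not real Kazakh or Uzbek data.
    print(word_error_rate("a b c d", "a x c"))  # one substitution + one deletion
    print(naive_coverage("w1 *w2 w3 w4"))       # one of four tokens is unknown
```

Under these assumptions, the stated goals correspond to `naive_coverage` returning more than 0.8 on a test corpus and `word_error_rate` returning less than 0.3 against a post-edited reference translation.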


Workplan

This part is being created...

Skills and qualifications

Academic skills: I am currently a first-year master's student in Advanced English Studies in Spain.


Language skills: Uzbek (native); English (advanced); Russian, Kazakh, Kyrgyz (basic).


Programming skills: I have a basic understanding of XML and other markup languages in general, I can work with bash scripts, and I can easily get help from people close to me when actual coding is needed.


Declaration of Honour

I declare that I can spend the required number of hours working with Apertium during the community bonding period and the actual working period over the summer. Should sudden changes in my personal life affect my working hours, I will immediately inform the mentors and seek their permission, on the condition that I fulfil the requirements even after the official end date.