GSoC 2019: Adopt an unreleased language pair
- 1 Contact information
- 2 Why is it you are interested in machine translation?
- 3 Why is it that you are interested in the Apertium project?
- 4 Which of the published tasks are you interested in? What do you plan to do?
- 5 List your skills and give evidence of your qualifications
- 6 List any non-Summer-of-Code plans you have for the Summer
Contact information
Name: Kalabaev Sharapat
Location: Tashkent, Uzbekistan
E-mail address: email@example.com
Tel number: +998911341226
Timezone: GMT+5
Why is it you are interested in machine translation?
In today’s world, many remote languages are under threat of extinction because too little information about them is available. Modern technology can have a significant impact on preserving them. One way to achieve this is to make machine translation platforms widely available, which helps ensure that these languages stay in broad use.
Why is it that you are interested in the Apertium project?
The primary reason is that I regard Apertium as one of the best open source projects in the field of machine translation. I intend to develop a Karakalpak-Uzbek translation system on the Apertium platform. Such a system matters because the Karakalpak language serves as a bridge of communication between the Karakalpak people and the Uzbek government.
Which of the published tasks are you interested in? What do you plan to do?
Adopting the unreleased language pair uzb-kaa.
In this project I am going to develop the language pair uzb-kaa. I have previously built a rus->kaa and kaa->eng dictionary project (google play link, github link), so I have access to the largest Karakalpak dictionary, which I am going to use here. I therefore believe that I can readily build a transducer for the Karakalpak language. In addition, I have analyzed the existing repository for the uzb-kaa language pair (https://github.com/apertium/apertium-uzb-kaa/pull/2) and found some linguistic errors that distort the true meaning of words. I am able to fix these mistakes because I am a native speaker of Karakalpak.
Reasons why Google and Apertium should sponsor it
Although these languages are closely related, no translator or dictionary between them has been created to date. Additionally, this project would open new ways to connect Karakalpak with other languages as well, since the language is currently very poorly covered.
A description of how and who it will benefit in society
It would be a great help to speakers of Karakalpak. Moreover, according to UNESCO, the Karakalpak language is regarded as vulnerable. The reason is that the language has few resources and remains little studied, so considerable effort should be directed at it. The main beneficiaries of the project are native Karakalpak speakers, who will gain much more convenient access to the world's knowledge.
Community bonding period (May 6 - 27):
- Getting familiar with the Apertium tools and community
- Finding language resources for Karakalpak and Uzbek
- Begin editing the Uzbek-Karakalpak dictionary
Work Period (May 27 - August 19):
- Begin building the Karakalpak monodix, using the Uzbek monodix as a reference for its size
- Check the kaa monodix and fix existing translation errors
- Add nouns and verbs to kaa monodix
- Add adjectives, pronouns, adverbs, conjunctions and prepositions to kaa monodix
- Check the transducer
- Add transfer rules for adjectives and adverbs
- Run tests
- Discuss shortcomings of the completed work with the mentor and fix them
Deliverable #1: updated kaa monodix
- Adding verbs to uzb-kaa bidix and adding necessary uzb-kaa transfer rules
- Adding pronouns, adverbs and others to uzb-kaa bidix and adding necessary uzb-kaa transfer rules
- Adding determiners and more adjectives to uzb-kaa bidix
- Test on a ~500 word story (achieve WER < 20%)
- Add rules for concordance between verbs and pronouns
- Work on transfer rules in .t2x and .t3x files
- Test uzb-kaa bidix
- Discuss shortcomings of the completed work with the mentor and fix them
Deliverable #2: updated kaa monodix, uzb-kaa bidix and uzb-kaa transfer rules
- Check the kaa and uzb monodixes
- Test on a ~1000 word story (achieve WER < 10%)
Deliverable #3: finished kaa monodix, updated uzb monodix, uzb-kaa bidix and uzb-kaa transfer rules
- Try to achieve WER < 10% on longer stories
- Discuss the completed work with the mentor
- Evaluation of results and documentation
- Tidying up, releasing
- Final evaluation
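The WER (word error rate) targets in the work plan can be made concrete with a short sketch. It assumes WER is computed as the word-level edit distance between a reference translation and the system output, divided by the number of words in the reference. The function name `word_error_rate` and the placeholder sentences are hypothetical and serve only as an illustration; this is not the evaluation tool the project would necessarily use.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Illustrative helper (hypothetical name), assuming whitespace tokenization.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


# Placeholder tokens stand in for a post-edited reference and system output.
wer = word_error_rate("a b c d e", "a x c e")
print(f"WER = {wer:.0%}")
```

Under this definition, a 500 word story meets the WER < 20% target when fewer than 100 word-level edits are needed to turn the system output into the reference.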
List your skills and give evidence of your qualifications
I am in the 4th year of a Bachelor’s degree in the Program Engineering faculty at the Tashkent University of Information Technology named after Al-Khwarizmi. My native language is Karakalpak [kaa], and I also know Uzbek [uzb] well, mainly because the two languages are similar and because I live and study in Tashkent, Uzbekistan. Programming skills: C, C++, Java, Kotlin, Python, git and XML.
List any non-Summer-of-Code plans you have for the Summer
I have no non-GSoC plans for the summer and can contribute 30 to 40 hours a week on average. However, my university term finishes in the middle of June. Therefore, if that is acceptable, I would like to work about 20 hours per week in the first month and about 40-50 hours per week in the second and third months in order to compensate.