User:Qareken
GSoC 2019 : Adopt an unreleased language pair [1]
Contents
- 1 Contact information
- 2 Why is it you are interested in machine translation?
- 3 Why is it that you are interested in the Apertium project?
- 4 Which of the published tasks are you interested in? What do you plan to do?
- 5 List your skills and give evidence of your qualifications
- 6 List any non-Summer-of-Code plans you have for the Summer
Contact information
Name: Kalabaev Sharapat
Location: Tashkent, Uzbekistan
E-mail address: kalabaevshj@gmail.com
Tel number: +998911341226
IRC: qareken
SourceForge: qareken
Github: sharapat
Timezone: GMT +5
Why is it you are interested in machine translation?
In today’s world, many remote languages are threatened with extinction because too little information about them is available. Modern technology can do a great deal to preserve them. Widely available machine translation platforms contribute to this by ensuring that these languages remain in broad use.
Why is it that you are interested in the Apertium project?
The primary reason is that I regard Apertium as one of the best open-source projects in the field of machine translation. I intend to develop a Karakalpak-Uzbek translation system on the Apertium platform. The Karakalpak language serves as a bridge of communication between the Karakalpak people and the Uzbek government.
Which of the published tasks are you interested in? What do you plan to do?
Title
Adopting the unreleased uzb<->kaa language pair.
In this project I am going to create a new uzb-kaa language pair. I have built a rus->kaa and kaa->eng dictionary application (google play link, github link), so I have access to the largest Karakalpak dictionary, which I am going to use here. I therefore believe that I can readily build a transducer for Karakalpak. In addition, I have analyzed the existing repository for the uzb-kaa language pair (https://github.com/apertium/apertium-uzb-kaa/pull/1) and found some linguistic errors that distort the true meaning of words. I am competent to fix these mistakes, as I am a native speaker of Karakalpak.
Reasons why Google and Apertium should sponsor it
Although these languages are closely related, no translator or dictionary between them has been created to date. In addition, this project would open new ways for Karakalpak to connect with other languages, since its current level of coverage is very low.
A description of how and who it will benefit in society
It would be of great help to speakers of Karakalpak. Moreover, according to UNESCO, Karakalpak is regarded as vulnerable. The reason is that the language is under-resourced and little studied, so considerable effort should be directed toward it. The main beneficiaries of the project are native Karakalpak speakers, who will gain far more convenient access to the world's knowledge.
Work plan
Community bonding period (May 6 - 27):
- Getting familiar with the Apertium tools and community
- Doing the coding challenge
- Finding language resources for Karakalpak and Uzbek
- Beginning to edit the Uzbek <-> Karakalpak dictionary
Work Period (May 27 - August 19):
Week 1:
- Begin creating the Karakalpak monodix, using the Uzbek monodix as a reference for its size
- Check kaa monodix
Week 2:
- Check kaa monodix
Week 3:
- Check kaa monodix
Week 4:
- Check kaa monodix
Deliverable #1: updated kaa monodix
Week 5:
- Check and add nouns to kaa monodix, add new entries to uzb-kaa bidix, and add the necessary uzb-kaa transfer rules
Week 6:
- Check and add nouns to kaa monodix, add new entries to uzb-kaa bidix, and add the necessary uzb-kaa transfer rules
Week 7:
- Check kaa monodix
Week 8:
- Check kaa monodix
Deliverable #2: updated kaa monodix, uzb-kaa bidix and uzb-kaa transfer rules
Week 9:
- Check kaa, uzb monodix
Week 10:
- Check kaa, uzb monodix
Deliverable #3: finished kaa monodix, updated uzb monodix, uzb-kaa bidix and uzb-kaa transfer rules
Week 11:
- Testing
Week 12:
- Testing
Project completion:
- Tidying up, releasing
- Final evaluation
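The bidix work in the plan above amounts to adding entries in the usual Apertium bilingual dictionary form, where each `<e>` element holds a `<p>` pair with a left (`<l>`) and right (`<r>`) side. As a rough illustration only, a minimal Python sketch of generating such entries from word pairs might look like the following. The helper name `make_bidix_entry`, the sample pair kitob/kitap and the single tag `n` are assumptions made for this example; they are not taken from the existing uzb-kaa repository.

```python
import xml.etree.ElementTree as ET


def make_bidix_entry(left, right, tags):
    """Build one bidix <e> entry as an XML string.

    left  -- lemma on the left (here: Uzbek) side
    right -- lemma on the right (here: Karakalpak) side
    tags  -- list of symbol names, e.g. ["n"] for a noun (assumed tag set)

    Note: this sketch does not handle multiword lemmas, which would
    need <b/> elements in place of spaces.
    """
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")
    for side, lemma in (("l", left), ("r", right)):
        node = ET.SubElement(pair, side)
        node.text = lemma
        # Each tag becomes an <s n="..."/> child after the lemma text.
        for tag in tags:
            ET.SubElement(node, "s", {"n": tag})
    return ET.tostring(entry, encoding="unicode")


if __name__ == "__main__":
    # Hypothetical sample pair, used only to show the output shape.
    print(make_bidix_entry("kitob", "kitap", ["n"]))
```

Entries produced this way would still need manual review by a native speaker before being added to the dictionary, since a word list alone does not settle which translation is correct in context.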
List your skills and give evidence of your qualifications
I am in the 4th year of a Bachelor’s degree in the Program Engineering faculty at the Tashkent University of Information Technology named after Al-Khwarizmi. My native language is Karakalpak [kaa], and I also know Uzbek [uzb] well, mainly because the two languages are similar and because I live and study in Tashkent, Uzbekistan. Programming skills: C, C++, Java, Kotlin, Python, Git and XML.
List any non-Summer-of-Code plans you have for the Summer
I have no non-GSoC plans for the summer and can contribute 30 to 40 hours a week. However, my school term finishes in the middle of June. Therefore, if that is acceptable, I would like to work ~20 hours per week in the first month and ~50 hours per week in the second and third months to compensate.