Difference between revisions of "User:Qareken"

From Apertium
[[Category:GSoC 2019 student proposals|Sharapat/GSoC 2019]]

Revision as of 04:59, 8 April 2019

GSoC 2019 : Adopt an unreleased language pair [1]

Contact information

Name: Kalabaev Sharapat

Location: Tashkent, Uzbekistan

E-mail address: kalabaevshj@gmail.com

Tel number: +998911341226

IRC: qareken

SourceForge: qareken

Github: sharapat

Timezone: GMT +5

Why is it you are interested in machine translation?

In today’s world, many remote languages are under threat of extinction because of a shortage of proper information about them. However, modern technology can have a significant impact on preserving them. This can be achieved through widely available machine translation platforms, which ensure broad usage of these languages.

Why is it that you are interested in the Apertium project?

The primary reason is that I regard Apertium as one of the best open-source projects in the field of machine translation. I intend to develop a Karakalpak-Uzbek translation system on the Apertium platform. The Karakalpak language serves as a bridge of communication between the Karakalpak people and the Uzbek government.

Which of the published tasks are you interested in? What do you plan to do?

Title

Adopting the unreleased uzb<->kaa language pair.

In this project I am going to create a new uzb-kaa language pair. I have already made a rus->kaa and kaa->eng dictionary project (google play link, github link), so I have access to the largest Karakalpak dictionary, which I am going to use here. I therefore believe that I can readily build a transducer for Karakalpak. In addition, I have analyzed the existing repository for the uzb-kaa language pair (https://github.com/apertium/apertium-uzb-kaa/pull/1) and found some linguistic errors that distort the true meaning of words. I am competent to fix these mistakes because I am a native speaker of Karakalpak.
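As an illustration only, the following Python sketch shows one way word pairs from an existing dictionary could be turned into entries for the uzb-kaa bidix. The function name `make_bidix_entry`, the sample word pairs and the choice of the `n` (noun) tag are assumptions made for this example; they are not taken from the existing repository.

```python
import xml.etree.ElementTree as ET


def make_bidix_entry(left, right, pos="n"):
    """Build one bidix <e> entry as a string.

    left  -- lemma on the left side (assumed here to be Uzbek)
    right -- lemma on the right side (assumed here to be Karakalpak)
    pos   -- symbol name placed in <s n="..."/> on both sides
    """
    entry = ET.Element("e")
    pair = ET.SubElement(entry, "p")

    l_side = ET.SubElement(pair, "l")
    l_side.text = left
    ET.SubElement(l_side, "s", {"n": pos})

    r_side = ET.SubElement(pair, "r")
    r_side.text = right
    ET.SubElement(r_side, "s", {"n": pos})

    return ET.tostring(entry, encoding="unicode")


# Hypothetical sample pairs (Uzbek, Karakalpak), for illustration only.
SAMPLE_PAIRS = [("kitob", "kitap"), ("suv", "suw")]

if __name__ == "__main__":
    for uzb, kaa in SAMPLE_PAIRS:
        print(make_bidix_entry(uzb, kaa))
```

Entries produced this way would still need manual checking, since a dictionary export does not by itself guarantee correct part-of-speech tags.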

Reasons why Google and Apertium should sponsor it

Although these languages are closely related, no translator or dictionary for the pair has been created to date. Additionally, this project would open new ways for Karakalpak to connect with other languages as well, since its level of inclusion is currently very low.

A description of how and who it will benefit in society

It would be an extraordinary help to speakers of Karakalpak; moreover, according to UNESCO, Karakalpak is regarded as vulnerable. The reason for this is the scarcity of resources for Karakalpak and its under-explored status, so great effort should be directed to this language. The major stakeholders of the project are native Karakalpak speakers, who would gain a convenient way to explore the world's knowledge.

Work plan

Community bonding period (May 6 - 27):

  • Getting familiar with Apertium tools and the community
  • Doing the coding challenge
  • Finding language resources for Karakalpak and Uzbek
  • Beginning to edit the Uzbek <-> Karakalpak dictionary

Work Period (May 27 - August 19):

Week 1:

  • Begin creating the Karakalpak monodix, using the Uzbek monodix as a reference for its size
  • Check kaa monodix

Week 2:

  • Check kaa monodix

Week 3:

  • Check kaa monodix

Week 4:

  • Check kaa monodix

Deliverable #1: updated kaa monodix

Week 5:

  • Check and add nouns to kaa monodix, add new entries to uzb-kaa bidix and add necessary uzb-kaa transfer rules

Week 6:

  • Check and add nouns to kaa monodix, add new entries to uzb-kaa bidix and add necessary uzb-kaa transfer rules

Week 7:

  • Check kaa monodix

Week 8:

  • Check kaa monodix

Deliverable #2: updated kaa monodix, uzb-kaa bidix and uzb-kaa transfer rules

Week 9:

  • Check kaa, uzb monodix

Week 10:

  • Check kaa, uzb monodix

Deliverable #3: finished kaa monodix, updated uzb monodix, uzb-kaa bidix and uzb-kaa transfer rules

Week 11:

  • testing

Week 12:

  • testing

Project completion:

  • Tidying up, releasing
  • Final evaluation

List your skills and give evidence of your qualifications

I am in the 4th year of a Bachelor’s degree in the Program Engineering faculty at the Tashkent University of Information Technology named after Al-Khwarizmi. My native language is Karakalpak [Kaa], and I also know Uzbek [Uzb] well, mainly because the two languages are similar and because I live and study in Tashkent, Uzbekistan. Programming skills: C, C++, Java, Kotlin, Python, git and XML.

List any non-Summer-of-Code plans you have for the Summer

I have no non-GSoC plans for the summer and can contribute 30 to 40 hours a week. However, my school finishes in the middle of June. Therefore, if that is acceptable, I would like to work ~20 hours per week in the first month and ~50 hours per week in the 2nd and 3rd months to compensate.