User:Qareken
Latest revision as of 16:08, 9 April 2019
GSoC 2019: Adopt an unreleased language pair (http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code#Adopt_an_unreleased_language_pair)
Contents
- 1 Contact information
- 2 Why is it you are interested in machine translation?
- 3 Why is it that you are interested in the Apertium project?
- 4 Which of the published tasks are you interested in? What do you plan to do?
- 5 List your skills and give evidence of your qualifications
- 6 List any non-Summer-of-Code plans you have for the Summer
Contact information
Name: Kalabaev Sharapat
Location: Tashkent, Uzbekistan
E-mail address: kalabaevshj@gmail.com
Tel number: +998911341226
IRC: qareken
SourceForge: qareken
Github: sharapat
Timezone: GMT +5
Why is it you are interested in machine translation?
In today’s world, many remote languages are under threat of extinction because little proper information about them is available. However, modern technology can have a significant impact on preserving them. One way to achieve this is to make machine translation platforms widely available, which ensures that languages remain broadly used.
Why is it that you are interested in the Apertium project?
The primary reason is that I regard Apertium as one of the best open-source projects in the field of machine translation. I intend to develop a Karakalpak-Uzbek translation system on the Apertium platform. The Karakalpak language serves as a bridge of communication between the Karakalpak people and the Uzbek government.
Which of the published tasks are you interested in? What do you plan to do?
Title
Adopting the unreleased uzb-kaa language pair.
In this project I am going to develop the uzb-kaa language pair. I have built a rus->kaa and kaa->eng dictionary application (Google Play: https://play.google.com/store/apps/details?id=com.shagalalab.sozlik, GitHub: https://github.com/shagalalab/sozlik-android), so I have access to the biggest Karakalpak dictionary, which I am going to use here. I therefore believe that I can easily make a transducer for the Karakalpak language. In addition, I have analyzed the existing repository for the uzb-kaa language pair (https://github.com/apertium/apertium-uzb-kaa/pull/2) and found some linguistic errors that distort the true meaning of words. I am competent to fix these mistakes, as I am a native speaker of Karakalpak.
Reasons why Google and Apertium should sponsor it
Although these languages are closely related, no translator or dictionary between them has been created to date. In addition, this project would open the way for Karakalpak to be paired with other languages as well, since its current level of inclusion is very low.
A description of how and who it will benefit in society
It would be an extraordinary help to speakers of Karakalpak. Moreover, UNESCO classifies Karakalpak as vulnerable. This is due to the scarcity of resources for the language and how little it has been studied, so considerable effort should be directed to it. The main beneficiaries of the project are native Karakalpak speakers, who will gain a convenient way to access the world's knowledge.
Work plan
Community bonding period (May 6 - 27):
- Getting familiar with the Apertium tools and community
- Finding language resources for Karakalpak and Uzbek
- Begin editing the Uzbek-Karakalpak dictionary
Work Period (May 27 - August 19):
Week 1:
- Begin building the Karakalpak monodix, using the Uzbek monodix as a guide to its size
- Check kaa monodix and fix existing translation errors
- Add nouns and verbs to kaa monodix
Week 2:
- Add adjectives, pronouns, adverbs, conjunctions and prepositions to kaa monodix
Week 3:
- Check the transducer
- Add transfer rules for adjectives, adverbs
Week 4:
- Run tests
- Discuss shortcomings of the completed work with the mentor and fix them
Deliverable #1: updated kaa monodix
Week 5:
- Adding verbs to uzb-kaa bidix and adding necessary uzb-kaa transfer rules
Week 6:
- Adding pronouns, adverbs and others to uzb-kaa bidix and adding necessary uzb-kaa transfer rules
Week 7:
- Adding determiners and more adjectives to uzb-kaa bidix
- Test on a ~500 word story (achieve WER < 20%)
- Add rules for concordance between verbs and pronouns
Week 8:
- Work on transfer rules in .t2x and .t3x files
- Test uzb-kaa bidix
- Discuss shortcomings of the completed work with the mentor and fix them
Deliverable #2: updated kaa monodix, uzb-kaa bidix and uzb-kaa transfer rules
Week 9:
- Check kaa, uzb monodix
Week 10:
- Test on ~1000 word story and achieve WER < 10% on it.
Deliverable #3: finished kaa monodix, updated uzb monodix, uzb-kaa bidix and uzb-kaa transfer rules
Week 11:
- Try to achieve WER < 10% on longer stories
- Discuss the completed work with the mentor
Week 12:
- Evaluation of results and documentation
Project completion:
- Tidying up, releasing
- Final evaluation
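The WER targets in the work plan (below 20% on a ~500 word story, below 10% on a ~1000 word story) can be checked with a word-level edit distance between a reference translation and the system output. The following is only a minimal sketch in Python, assuming whitespace tokenization; the function name word_error_rate and the English sample sentences are hypothetical placeholders, and in practice an existing evaluation tool would likely be used instead.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: words are separated by whitespace, and punctuation is
    treated as part of the word it is attached to.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far (excluding the current one) and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # deletion
                curr[j - 1] + 1,     # insertion
                prev[j - 1] + cost,  # substitution or match
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder sentences, not taken from any test corpus.
    reference = "the cat sat on the mat"
    hypothesis = "the cat sit on mat"
    print(f"WER: {word_error_rate(reference, hypothesis):.1%}")
```

In the sample, one word is substituted and one is dropped out of six reference words, so the script reports a WER of about 33%. Normalizing by the reference length means the score can exceed 100% when the output contains many extra words.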
List your skills and give evidence of your qualifications
I am in the 4th year of a Bachelor’s degree in the Program Engineering faculty at the Tashkent University of Information Technology named after Al-Khwarizmi. My native language is Karakalpak [Kaa], and I also know Uzbek [Uzb] well, mainly because the two languages are similar and because I live and study in Tashkent, Uzbekistan. Programming skills: C, C++, Java, Kotlin, Python, git and XML.
List any non-Summer-of-Code plans you have for the Summer
I have no non-GSoC plans for the summer and can contribute 30 to 40 hours a week. However, my university term ends in the middle of June. Therefore, if it is acceptable, I would like to work about 20 hours a week in the first month and about 40-50 hours a week in the 2nd and 3rd months in order to compensate.