Difference between revisions of "User:Ogabek"

'''GSOC 2019 : Adopt an unreleased language pair Apertium Turkish-Uzbek''' [http://wiki.apertium.org/wiki/Ideas_for_Google_Summer_of_Code#Adopt_an_unreleased_language_pair]
 
 
== Personal Details ==
 
 
 
=== Contact Information ===
 
 
Name : Ogabek Yusupov <br/>
 
Location : Tashkent, Uzbekistan <br/>
 
Phone number : +998941155873 <br/>
 
Email : ogabekyusupov@gmail.com <br/>
 
IRC : ogabek <br/>
 
Github : [https://github.com/ogabek96 ogabek96] <br/>
 
Timezone : GMT + 5 <br/>
 
 
=== Education ===
 
Fourth-year bachelor's student in the Software Engineering Faculty at Tashkent University of Information Technologies named after Muhammad al-Khwarizmi.
 
 
 
=== Technical skills ===
 
Programming languages: C++, Java, JavaScript, PHP <br/>
 
Databases: MySQL, PostgreSQL <br/>
 
Frameworks: Express.js <br/>
 
Operating systems: Linux, Windows <br/>
 
 
 
=== Related projects ===
 
Open-source Uzbek-Korean language dictionary <br/>
 
 
 
=== Related work experience ===
 
Volunteered for Google Translate: translated sentences from English into Uzbek. <br/>
 
Participated in Lionbridge language research: recorded my voice reading sentences written in Uzbek and submitted the audio files. <br/>
 
 
 
=== Languages ===
 
Uzbek (native), English, Russian <br/>
 
 
 
=== Why is it you are interested in machine translation? ===
 
I have always been fascinated by machine translation, and I am an active user of it. Machine translation is in greater demand than ever because people travel more than before and it takes down language barriers. Although translation quality has improved significantly in recent years, we cannot fully rely on it because translations still contain errors. As a computer science student, I feel it is my responsibility to help make it better.
 
 
 
=== Why is it that you are interested in the Apertium project? ===
 
The first attribute of the Apertium platform that drew my attention is that it is open-source. Most existing platforms are not free, and users cannot freely use them in their own projects. As a supporter of open source, I find this project interesting. Another thing I like about the project is that many members actively contribute to Turkic languages. As a native speaker of Uzbek, I want to improve the translation of my native language too. My contribution will be to improve the Turkish<->Uzbek language pair, which has not been updated for four years.
 
 
 
== Which of the published tasks are you interested in? What do you plan to do? ==
 
 
=== Title ===
 
Bring the unreleased Turkish<->Uzbek language pair up to release quality. I am also ready to fix technical errors, since I have some experience in software development.
 
=== Reasons why Google and Apertium should sponsor it ===
 
Although Uzbek and Turkish belong to the same language group, there are no adequate translation platforms for them on the internet. Also, although Uzbek has 33 million native speakers, it is not well represented on the internet, and the information available in it is very limited. I believe that my contribution to this platform will raise the popularity of the Uzbek language.
 
 
 
=== A description of how and who it will benefit in society ===
 
Firstly, it will benefit app developers: since Apertium is open-source, anyone can use it in their projects.

Secondly, relations between Uzbekistan and Turkey are improving.

Many people travel from Turkey to Uzbekistan for business or tourism. Releasing the Turkish<->Uzbek language pair will help take down the language barrier between the two nations.
 
 
 
== Working plan ==
 
 
=== Coding challenge (until May 1) ===
 
Installing Apertium <br/>
 
Creating a wiki page on Apertium <br/>
 
Forking an existing language pair and setting up Apertium so that data can be added to it. <br/>

Preliminary evaluation: translate the story and try to improve the translation as much as possible. <br />
 
Try to learn as much as possible about Apertium platform. <br/>
 
 
=== Community Bonding Period (May 6 - May 27) ===
 
Get to know the Apertium community <br/>

Learn more about machine translation <br/>

Read the Apertium documentation and explore the .dix, lexc and other file formats used in apertium-uzb to understand how they work <br />
 
Collecting resources in Turkish and Uzbek <br />
 
 
=== Work Period (May 27 - August 26) ===
 
 
==== Week 1: ====
 
Editing apertium-uzb.uzb.lexc and correcting existing translation errors. <br />
 
Write test scripts <br />
 
Add transfer rules for nouns, pronouns. <br />
 
Start working on pronouns, adverbs, and adjectives <br />
 
Add appropriate rules/stems.<br />
 
Achieve a WER < 20% for 1 basic text <br />
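
The WER targets in this plan can be checked with a small script. The following is a minimal sketch, assuming WER is defined in the usual way as the word-level edit distance between the machine translation and a post-edited reference, divided by the number of words in the reference. The function name and the whitespace tokenization are assumptions made for illustration; figures that are actually reported would presumably come from Apertium's own evaluation tooling.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Tokenization is plain whitespace splitting (an assumption of this sketch).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # previous[j] is the edit distance between the reference words seen so far
    # (before the current one) and the first j hypothesis words.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]  # deleting all i reference words
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (ref_word != hyp_word)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)
```

With placeholder tokens, `word_error_rate("a b c d e", "a x c d")` counts one substitution and one deletion against five reference words, which gives 0.4. A target of WER < 20% therefore allows fewer than one edit per five reference words.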
 
 
==== Week 2: ====
 
Add transfer rules for adjectives, adverbs <br />
 
Take another 500-word story.<br/>
 
Target: WER < 50% <br />

Post-edit the translated texts, analyze them for common error patterns, and add the corresponding rules. <br />
 
 
==== Week 3: ====
 
Finish with lexical selection rules and chunking. <br />
 
Start working on disambiguation and its solutions <br />
 
Refactoring and documentation. <br />
 
 
==== Week 4: ====
 
Run corpus testing to analyze the improvement. <br />
 
Improve morphological analyzer <br />
 
=== Deliverable #1 ===
 
 
==== Week 5: ====
 
Find good parallel corpora and add words to apertium-uzb in decreasing order of frequency. <br />

Coverage ~45% <br />

In parallel, start working on the tur-uzb bilingual dictionary <br />
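
Coverage and the "decreasing frequency" word list can both be derived from the analyser's output. The sketch below assumes the text has already been run through the morphological analyser and is in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is marked with a leading `*` in its analysis. The function name is hypothetical, the regular expression is deliberately naive (it ignores escaped characters), and the sample string in the usage note is hand-written for illustration, not real analyser output.

```python
import re
from collections import Counter

# Naive matcher for lexical units of the form ^surface/analyses$.
# Assumption: no escaped '/' or '$' characters inside a unit.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage_report(analysed_text: str):
    """Return (coverage, unknown words sorted by decreasing frequency)."""
    known = 0
    unknown = Counter()
    for surface, analyses in LEXICAL_UNIT.findall(analysed_text):
        if analyses.startswith("*"):
            # Unknown word: count it so frequent gaps can be added first.
            unknown[surface.lower()] += 1
        else:
            known += 1
    total = known + sum(unknown.values())
    coverage = known / total if total else 0.0
    return coverage, unknown.most_common()
```

For the hand-written sample `"^kitob/kitob<n><nom>$ ^xyz/*xyz$ ^xyz/*xyz$ ^va/va<cnjcoo>$"`, two of the four tokens are known, so the coverage is 0.5 and the unknown list is `[("xyz", 2)]`. Working down that list from the top is one way to add the words that raise coverage fastest.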
 
 
==== Week 6: ====
 
Work on a ~ 700-word story <br />
 
Calculate WER, PER, and document <br />
 
Target WER <=40% <br />
 
Even up nouns, pronouns <br />
 
Even up for verbs, adjectives, adverbs <br />
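
PER (position-independent error rate) differs from WER in that it ignores word order. The following is a minimal sketch of one common formulation, in which the reference and the hypothesis are treated as bags of words; the function name and whitespace tokenization are assumptions for illustration, and Apertium's evaluation tooling may define the measure slightly differently.

```python
from collections import Counter


def position_independent_error_rate(reference: str, hypothesis: str) -> float:
    """Bag-of-words error rate: (max(|ref|, |hyp|) - matches) / |ref|."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # Multiset intersection: how many words match regardless of position.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)
```

A large gap between WER and PER on the same text suggests that the right words are being produced in the wrong order, which points to transfer rules rather than the dictionaries.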
 
 
==== Week 7: ====
 
Testvoc clean for all classes <br />
 
Working on transfer grammar rules (t1x) using the common rules generated from post-edit analysis <br />
 
WER <=30% <br />
 
Bidix-coverage ~45% <br />
 
 
==== Week 8: ====
 
Continue working on tur-uzb pair: <br/>
 
Add transfer rules for nouns, pronouns <br />
 
Add transfer rules for verbs, adjectives, adverbs. <br />
 
Start working on CG and disambiguation <br />
 
 
=== Deliverable #2 ===
 
 
==== Week 9: ====
 
Continue working on disambiguation and its solutions. <br />
 
Add required transfer/lexical selection rules to improve WER, PER. <br />
 
Begin with chunking and t3x <br />
 
 
==== Week 10: ====
 
Get another ~700 token story for tur-uzb and improve WER. <br />
 
Target WER <=25% <br />
 
Regression testing for tur-uzb pair <br />
 
Evaluate test results, make the required changes, run tests again <br />
 
User acceptance testing and a trial evaluation. <br />
 
 
==== Week 11: ====
 
Regression testing for two pairs <br />
 
Achieve WER < 10% on all previous advanced texts and 3 new advanced texts <br />
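
Regression testing can be as simple as keeping a list of source sentences with their expected translations and re-checking them after every change. The sketch below assumes such a list exists; `run_regression` and the `translate` callable are hypothetical names, and in practice `translate` would wrap a call to the installed translator for the pair being tested.

```python
from typing import Callable, Iterable, List, Tuple


def run_regression(
    cases: Iterable[Tuple[str, str]],
    translate: Callable[[str], str],
) -> List[Tuple[str, str, str]]:
    """Return (source, expected, actual) for every case that no longer passes."""
    failures = []
    for source, expected in cases:
        actual = translate(source).strip()
        if actual != expected:
            failures.append((source, expected, actual))
    return failures
```

Passing the translator in as a function keeps the check independent of how the pair is invoked, so the same list of cases can be reused for both translation directions.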
 
 
==== Week 12: ====
 
Discuss with the mentor any final changes that must be made. <br />

Detailed analysis of what further improvements could be made to the pairs <br />
 
Evaluation of results and documentation.<br />
 
 
=== Final evaluation ===
 
 
== List any non-Summer-of-Code plans you have for the Summer. ==
 
I don’t have any non-GSoC plans for the summer. I have university exams in July, which last two weeks; during this period I will spend 20 hours a week on this project. The rest of the time I can dedicate 40 hours a week.
 
 
 
[[Category:GSoC 2019 student proposals|Ogabek]]
 

Latest revision as of 11:39, 9 March 2022