Difference between revisions of "User:Ogabek"

From Apertium

Revision as of 10:05, 8 April 2019

GSOC 2019 : Extend weighted transfer rules[1]

Personal Details

Contact Information

Name : Ogabek Yusupov
Location : Tashkent, Uzbekistan
Phone number : +998941155873
Email : ogabekyusupov@gmail.com
IRC : ogabek
GitHub : https://github.com/ogabek96
Timezone : GMT + 5


Education

Fourth-year bachelor's student in the Software Engineering Faculty at Tashkent University of Information Technologies named after Muhammad al-Khwarizmi.


Technical skills

Programming languages: C++, Java, JavaScript, PHP
Databases: MySQL, PostgreSQL
Frameworks: Express.js
Operating systems: Linux, Windows


Related projects

Open-source Uzbek-Korean language dictionary


Related work experience

Volunteered for Google Translate: translated sentences from English into Uzbek.
Participated in Lionbridge language research: recorded my voice reading sentences written in Uzbek and sent in the audio files.


Languages

Uzbek (native), English, Russian


Why is it you are interested in machine translation?

I have always been fascinated by machine translation, and I am an active user of it. Machine translation is in greater demand than ever because people are travelling more than before, and it takes down language barriers. Although translation quality has improved significantly in recent years, we cannot fully rely on it because translations still contain errors. As a computer science student, I think it is my responsibility to make it better.


Why is it that you are interested in the Apertium project?

The first attribute of the Apertium platform that drew my attention is that it is open source. Nowadays most existing platforms are not free, and users cannot use them freely in their own projects. Since I am a supporter of open source, I find this project interesting. Another thing I like about this project is that many members actively contribute to Turkic languages. Since I am a native speaker of Uzbek, I want to improve translation for my native language too. My contribution to this project will be improving the Turkish<->Uzbek language pair, because it has not been updated for four years.


Which of the published tasks are you interested in? What do you plan to do?

Title

Bring a released Turkish<->Uzbek language pair up to state-of-the-art quality. I am also ready to fix technical errors, because I have some experience in software development.


Reasons why Google and Apertium should sponsor it

Although Uzbek and Turkish belong to the same language group, there is no adequate translation platform for this pair on the internet. In addition, although Uzbek has 33 million native speakers, it is not widely used on the internet, and the information available online in Uzbek is very limited. I believe that my contribution to this platform will raise the popularity of the Uzbek language.


A description of how and who it will benefit in society

Firstly, it will benefit app developers: since Apertium is open source, anyone can use it in their projects. Secondly, relations between Uzbekistan and Turkey are improving, and many visitors come from Turkey to Uzbekistan for business or tourism. Releasing the Turkish<->Uzbek language pair will help take down the language barrier between these nations.


Working plan

Coding challenge (until May 1)

Installing Apertium
Forking an existing language pair and setting up Apertium to add data to an existing language pair.
Learning as much as possible about the Apertium platform.
Creating a wiki page on Apertium
Preliminary evaluation. Translate the story with total coverage and without diagnostics. Get a baseline WER. Work on disambiguation: the morphological ambiguities in the story should be resolved.
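
As a rough illustration of the baseline WER mentioned above, word error rate can be computed as the word-level edit distance between a reference translation and the system output, divided by the number of reference words. The sketch below is only an assumption about how such a figure might be obtained; the function name word_error_rate is hypothetical, the example sentences are placeholders, and Apertium's own evaluation tooling may tokenize or count differently.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the reference length.

    Hypothetical helper for illustration; tokenization is plain
    whitespace splitting, which is an assumption.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen
    # so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from output
            insertion = curr[j - 1] + 1  # extra word in output
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Placeholder sentences: two reference words are missing from the
    # output, so the score is two errors over six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat mat"))
```

A lower value means the output is closer to the reference; measuring it once before any work starts gives the baseline against which later improvements can be compared.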

Community Bonding Period (May 6 - May 27)

Get to know the Apertium community
Investigate more about machine translation
Collecting resources in Turkish and Uzbek
Reading Apertium documentation and exploring .dix and other bilingual dictionary formats to understand how they work

Work Period (May 27 - August 26)

Week 1:

Discussing a more detailed work plan with mentors.
Editing apertium-uzb.uzb.lexc and correcting translation errors.

Week 2:

Adding nouns and adjectives to apertium-uzb.uzb.lexc

Week 3:

Week 4:

Week 5:

Week 6:

Week 7:

Week 8:

Week 9:

Week 10:

Week 11:

Week 12:

List any non-Summer-of-Code plans you have for the Summer.

I don’t have any non-GSoC plans for the summer. I have university exams in July, which last two weeks; during this period I will spend 20 hours a week on this project. At other times I can dedicate 40 hours a week.