Difference between revisions of "User:Qareken/GSoC2019Report"

Line 2: Line 2:
 
== Develop a releasable Uzbek-Qaraqalpaq translation pair ==
 
== Develop a releasable Uzbek-Qaraqalpaq translation pair ==
   
  +
In this [https://github.com/apertium/apertium-uzb-kaa project] I worked on the unreleased uzb-kaa (Uzbek-Karakalpak) language pair. This required work in the [https://github.com/apertium/apertium-uzb apertium-uzb] and [https://github.com/apertium/apertium-kaa apertium-kaa] repositories. I created a dictionary of about 30,000 words.
----
 
  +
In this [https://github.com/apertium/apertium-uzb-kaa project] I worked on the unreleased uzb-kaa (Uzbek-Karakalpak) language pair. This required work in the [https://github.com/apertium/apertium-uzb apertium-uzb] and [https://github.com/apertium/apertium-kaa apertium-kaa] repositories. I created a dictionary of about 30,000 words. To do this, I first converted Akobirov's Uzbek-Russian dictionary, which has more than 50,000 words, from DjVu to plain text, corrected the mistakes, and converted it to Latin script. Then I started the same work with Baskakov's Karakalpak-Russian dictionary, which has more than 20,000 words. I also added Baskakov's Karakalpak-English dictionary, with more than 7,000 words, to the [https://github.com/apertium/apertium-kaa/blob/master/tests/vocabulary/input.tsv database] and wrote categories for the Karakalpak words in it. I translated more than 30,000 words in the Uzbek word [https://github.com/apertium/apertium-uzb/blob/master/tests/vocabulary/input.csv database] and added categories where needed.
 
  +
== What is done ==
  +
 
First, I converted Akobirov's Uzbek-Russian dictionary, which has more than 50,000 words, from DjVu to plain text, corrected the mistakes, and converted it to Latin script. Then I started the same work with Baskakov's Karakalpak-Russian dictionary, which has more than 20,000 words. I also added Baskakov's Karakalpak-English dictionary, with more than 7,000 words, to the [https://github.com/apertium/apertium-kaa/blob/master/tests/vocabulary/input.tsv database] and wrote categories for the Karakalpak words in it. I translated more than 30,000 words in the Uzbek word [https://github.com/apertium/apertium-uzb/blob/master/tests/vocabulary/input.csv database] and added categories where needed.
 
Then I wrote the results to [https://github.com/apertium/apertium-uzb-kaa/blob/master/apertium-uzb-kaa.uzb-kaa.dix this file] in the required format and fixed the problems that appeared in it.
 
Then I wrote the results to [https://github.com/apertium/apertium-uzb-kaa/blob/master/apertium-uzb-kaa.uzb-kaa.dix this file] in the required format and fixed the problems that appeared in it.
  +
  +
== What should be done in the future ==
   
 
In the future, this work will continue with the remaining words. Some words have only categories and some have only translations.
 
In the future, this work will continue with the remaining words. Some words have only categories and some have only translations.

Revision as of 18:33, 25 August 2019

Develop a releasable Uzbek-Qaraqalpaq translation pair

In this project I worked on the unreleased uzb-kaa (Uzbek-Karakalpak) language pair. This required work in the apertium-uzb and apertium-kaa repositories. I created a dictionary of about 30,000 words.

What is done

First, I converted Akobirov's Uzbek-Russian dictionary, which has more than 50,000 words, from DjVu to plain text, corrected the mistakes, and converted it to Latin script. Then I started the same work with Baskakov's Karakalpak-Russian dictionary, which has more than 20,000 words. I also added Baskakov's Karakalpak-English dictionary, with more than 7,000 words, to the database and wrote categories for the Karakalpak words in it. I translated more than 30,000 words in the Uzbek word database and added categories where needed. Then I wrote the results to this file in the required format and fixed the problems that appeared in it.
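
As a rough illustration of the last step, the sketch below turns translated word rows into entries in the Apertium bilingual dictionary (.dix) format. It is not the script used in this project. The three-column tab-separated layout (Uzbek lemma, Karakalpak lemma, category), the function names, and the sample words are all assumptions made for the example.

```python
from xml.sax.saxutils import escape


def bidix_entry(uzb_lemma, kaa_lemma, category):
    """Build one bilingual-dictionary <e> element for a lemma pair."""
    # Escape XML special characters first, then write spaces inside
    # multiword lemmas as <b/>, as .dix files expect.
    left = escape(uzb_lemma).replace(" ", "<b/>")
    right = escape(kaa_lemma).replace(" ", "<b/>")
    tag = f'<s n="{category}"/>'
    return f"<e><p><l>{left}{tag}</l><r>{right}{tag}</r></p></e>"


def rows_to_entries(lines):
    """Convert tab-separated rows (hypothetical layout) into <e> elements.

    Rows that lack a translation or a category are skipped so they can be
    completed later.
    """
    entries = []
    for line in lines:
        fields = [f.strip() for f in line.rstrip("\n").split("\t")]
        if len(fields) != 3 or not all(fields):
            continue
        uzb, kaa, cat = fields
        entries.append(bidix_entry(uzb, kaa, cat))
    return entries


if __name__ == "__main__":
    # Illustrative rows only; the second row has no translation yet.
    sample = ["kitob\tkitap\tn\n", "suv\t\tn\n"]
    for entry in rows_to_entries(sample):
        print(entry)
```

Skipping incomplete rows instead of emitting half-filled entries keeps the generated dictionary well formed, and the skipped rows correspond to the words that still need a category or a translation.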

What should be done in the future

In the future, this work will continue with the remaining words. Some words have only categories and some have only translations. Problems with the selection rule will also be fixed.