Kazakh and Sakha/GSoC2018 report
Latest revision as of 07:41, 14 August 2018
This page summarises the work done on the Kazakh and Sakha pair during Google Summer of Code 2018. The project consisted mainly of building the bilingual dictionary (bidix) and enriching the Sakha morphological analyser.
Commits
My commits can be found here. You can also download my work as a zip file.
Corpora and Coverage
Our corpora were Kazakh Wikipedia and Sakha Wikipedia articles.
Most of the work consisted of adding stems to the dictionaries. The stems were taken from frequency lists built from these corpora.
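As a rough illustration of how a frequency list can guide stem addition, the sketch below counts tokens in a text and lists those that are not yet known, most frequent first. It is a simplification and not the tooling used in the project: the function name `frequency_list` and the `known` set are hypothetical, and it compares raw surface forms, whereas a real workflow would run the corpus through the morphological analyser to decide which forms are unknown.

```python
import re
from collections import Counter


def frequency_list(text, known=frozenset()):
    """Return (token, count) pairs for tokens not in `known`.

    Hypothetical helper: `known` stands in for forms the dictionary
    already covers. Results are sorted by descending count, then
    alphabetically so the order is deterministic.
    """
    # \w+ matches Unicode letters, so Cyrillic-based Kazakh and Sakha text works.
    tokens = re.findall(r"\w+", text.lower())
    counts = Counter(t for t in tokens if t not in known)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Placeholder tokens, not real corpus data.
    sample = "a b a c b a d"
    for token, count in frequency_list(sample, known={"c"}):
        print(count, token)
```

Working down such a list from the top means each added stem covers as many corpus tokens as possible.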
30 April - 6 July
Stems were added to the Kazakh-Sakha bilingual dictionary.
7 July - 14 August
We focused on the Sakha morphological analyser, aiming for 90% coverage. Stems were added, the grammatical reference[1] was reviewed, and test files were created.
Translator coverage
Corpus | Words | Stems before | Stems after | Coverage before | Coverage after |
---|---|---|---|---|---|
Wikipedia | 10000 | 150 | 2870 | 29.36% | 70.57% |
Sakha morphological analyser coverage
Corpus | Words | Stems before | Stems after | Coverage before | Coverage after |
---|---|---|---|---|---|
Wikipedia | 95654 | 4070 | 9015 | 73.16% | 88.75% |
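The coverage figures above are naive coverage, that is, the share of corpus tokens that receive at least one analysis. A minimal sketch of that calculation follows, assuming analyser output in the Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token's analysis starts with `*`. The function name `coverage` and the sample string are illustrative only, and the pattern ignores escaped characters, so treat it as an approximation rather than the script used for this report.

```python
import re

# One lexical unit: ^surface/analysis1/analysis2$ (escaped characters not handled).
LEXICAL_UNIT = re.compile(r"\^([^/$]+)/([^$]*)\$")


def coverage(analysed_text):
    """Return the percentage of lexical units with at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An analysis beginning with '*' marks a token the analyser does not know.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return 100.0 * known / len(units)


if __name__ == "__main__":
    # Placeholder lexical units, not real Sakha data.
    sample = "^a/a<n><nom>$ ^b/*b$ ^c/c<v>$ ^d/*d$"
    print(f"{coverage(sample):.2f}%")
```

From the tables, translator coverage rose by 41.21 percentage points (29.36% to 70.57%) and analyser coverage by 15.59 points (73.16% to 88.75%), which leaves the analyser 1.25 points short of the 90% goal.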
References
1. Ubryatova, E.I. (Ed.) (1982), Grammatika sovremennogo yakutskogo literaturnogo yazika, Moscow.
2. SakhaTyla.Ru - Sakha Dictionary, https://sakhatyla.ru/
3. Kazakh-Russian dictionary (Казахско-русский словарь), https://sozdik.kz/
4. Online dictionary, https://glosbe.com/
Future work
- Add more stems to the Sakha monolingual dictionary
- Add more stems to the Kazakh-Sakha bilingual dictionary
- Add transfer rules, etc.