Difference between revisions of "Turkish and Kyrgyz/Final report"

{{TOCD}}


==Description==
The aim of this project was to develop apertium-tr-ky, an Apertium machine translation system between Turkish and Kyrgyz. The work was challenging. The translation is not perfect and there is a lot left to improve, but the result is satisfying given the time available and the work done. Using apertium-tr-ky we translated several children's stories and showed them to native speakers of Kyrgyz, who reacted very positively. Right now apertium-tr-ky is the only MT tool from any language to Kyrgyz, so I consider it a great success.
Below is output from apertium-tr-ky with very few errors: first the Turkish source, then the Kyrgyz translation. The paragraph is taken from the children's story "Snow White".

<pre>
Günlerden bir gün ayna kraliçenin bu sorusuna farklı bir yanıt vermiş; Bunu nasıl söyleyeceğim bilemem
ama Pamuk Prenses sizden güzel kraliçem.
Bunun üzerine çok sinirlenen kraliçe hemen bir avcı bulmuş ve ona Pamuk Prensesi alıp ormana götür ve
bana onun yüreğini getir, diye emretmiş.
Adamcağız Pamuk Prensesi ormana götürmüş ama öldürmeye kıyamamış. Durumu anlayan Pamuk Prenses beni burada bırak.
Bir daha asla geri dönmem merak etme diyerek avcıya yalvarmış.
Avcı da merhamete gelmiş ve onu orada bırakıp bir ceylanın yüreğini kraliçeye götürmüş.

Күндөрдөн бир күн күзгү королеванын бул суроосуна *farklı бир жооп берген; Муну кандай айтармын биле албайм
бирок Пахта Принцесса силерден жакшынакай королевам.
Анда абдан ачууланган королева дароо бир аңчы тапкан жана ага Пахта Принцессасы алып токойго жеткир жана
мага аны жүрөгүн алып кел, деп буйрук берген.
Байкуш киши Пахта Принцессасы токойго жеткирген бирок өлүүгө кыя алган эмес. Акыбалды түшүнгөн Пахта Принцесса мен бул жерде ташта.
Бир дагы такыр артка бурулбайм сарсанаа кылуу дей аңчыга жалынган.
Аңчы да кечиримге келген жана анын ал жакта таштап бир жейрендин жүрөгүн королевага жеткен.
</pre>

===TRmorph===
{{Main|Trmorph}}
We used TRmorph as the morphological analyser/generator for Turkish. Although there is still much to improve in TRmorph, it is quite usable. Thanks to Çağrı for his great work establishing TRmorph, and to Gianluca for fixing it and for his assistance throughout the project.

===kymorph===
We developed kymorph, a new morphological analyser/generator for Kyrgyz, from scratch because no other existed. So we can say that kymorph is currently the only morphological analyser/generator for Kyrgyz. It is written in lexc and twol on HFST. This was the toughest part of the project, because there are no good resources on Kyrgyz morphological structure and no lexicon database with part-of-speech information. We achieved coverage of % on SETimes corpora. I am really happy with kymorph. Special thanks to firespeaker.

===bidix===
Our bidix has 7,122 entries, which is a good number for now. The lack of a decent digital Turkish-Kyrgyz dictionary was a big issue. I am planning to revise the bidix later.

===Transfer rules===
It is really tough to come up with definite transfer rules between Turkish and Kyrgyz. Even so, we came up with some rules that work very well. I still plan to build new rules and revise the existing ones.

===CG===
We use the same CG that is used in apertium-tr-az. Obviously it needs to be developed and revised further. I'd like to thank #zfe and #spectre for their great work.


==Statistics==


; Dictionaries

* TRmorph lexicon: 7,970
* <code>apertium-tr-ky.tr-ky.dix</code>: 7,122
* <code>apertium-tr-ky.ky.lexc</code>: 8,577


; Coverage

* Turkish Wikipedia: 70.33% (std. dev.: 3.12)
* Turkish SETimes: 84.08% (std. dev.: 0.78)
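
The report does not say how the coverage figures were computed. As a rough illustration only, the sketch below assumes "naive coverage": the share of tokens that receive at least one analysis, averaged over several corpus chunks. It reads text in the Apertium stream format, where an unknown word is emitted as <code>^word/*word$</code>. The function names, the sample chunks and the tags in them are hypothetical, and escaped characters in the stream are not handled.

```python
import re
from statistics import mean, pstdev

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Return the percentage of lexical units that have at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An unknown word appears as ^word/*word$, so its analysis starts with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return 100.0 * known / len(units)


def coverage_stats(chunks):
    """Return (mean, population standard deviation) of coverage over chunks."""
    scores = [coverage(chunk) for chunk in chunks]
    return mean(scores), pstdev(scores)


# Hypothetical analyser output; the tags are illustrative, not taken from TRmorph.
sample_chunks = [
    "^bir/bir<num>/bir<det>$ ^gün/gün<n><nom>$ ^farklı/*farklı$ ^yanıt/yanıt<n><nom>$",
    "^ayna/ayna<n><nom>$ ^avcı/avcı<n><nom>$",
]

avg, sd = coverage_stats(sample_chunks)
print(f"{avg:.2f} +/- {sd:.2f}")
```

Splitting the corpus into chunks is what makes a standard deviation meaningful here; the chunking actually used for the figures above is not stated in the report.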


; Rules

* <code>apertium-tr-ky.tr-ky.t1x</code>: 53
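
Counts like the ones above can be reproduced approximately by counting elements in the XML files. The sketch below is only an assumption about how such figures might be obtained, not the method used for this report: it counts <code>&lt;e&gt;</code> elements in a bilingual dictionary and <code>&lt;rule&gt;</code> elements in a transfer file. The helper name and both inline samples are hypothetical and heavily simplified.

```python
import xml.etree.ElementTree as ET


def count_elements(xml_text, tag):
    """Count every element named `tag` anywhere in the XML document."""
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iter(tag))


# Hypothetical, minimal bidix fragment (a real .dix also declares an alphabet and sdefs).
SAMPLE_DIX = """<dictionary>
  <section id="main" type="standard">
    <e><p><l>ayna<s n="n"/></l><r>күзгү<s n="n"/></r></p></e>
    <e><p><l>avcı<s n="n"/></l><r>аңчы<s n="n"/></r></p></e>
  </section>
</dictionary>"""

# Hypothetical, minimal transfer fragment (a real .t1x also defines categories and attributes).
SAMPLE_T1X = """<transfer>
  <section-rules>
    <rule><pattern><pattern-item n="nom"/></pattern><action/></rule>
    <rule><pattern><pattern-item n="adj"/></pattern><action/></rule>
  </section-rules>
</transfer>"""

print("bidix entries:", count_elements(SAMPLE_DIX, "e"))
print("transfer rules:", count_elements(SAMPLE_T1X, "rule"))
```

For a file on disk, <code>ET.parse(path).getroot()</code> could replace <code>ET.fromstring</code>. The lexc lexicon is not XML, so its entry count would need a different approach.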


; Error rate

{| class="wikitable"
! File !! Num. Words !! % OOV !! WER (Sur) !! PER (Sur) !! WER (Lem) !! PER (Lem)
|-
| <code>setimes.kosova_plate.tr.txt</code> || 243 || 14.80% || 42.46% || 40.08% || - || -
|-
| <code>setimes.kosova.tr.txt</code> || 424 || 15.51% || 31.83% || 31.15% || - || -
|-
| <code>setimes.bulgar.tr.txt</code> || 395 || 10.31% || 28.40% || 27.18% || - || -
|-
| <code>wikipedia.kadinlar_askerler.tr.txt</code> || 1165 || 12.88% || 36.62% || 32.88% || - || -
|}
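
For readers unfamiliar with the two metrics, here is a minimal sketch of word error rate (WER) and position-independent error rate (PER), each expressed as a percentage of the reference length. It is not the tool used to produce the table. WER is computed from the word-level edit distance; PER here follows one common definition that compares the two sentences as bags of words, which is an assumption, since definitions vary. The function names and the placeholder tokens are hypothetical.

```python
from collections import Counter


def edit_distance(ref, hyp):
    """Word-level Levenshtein distance between two token lists."""
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # word missing from the hypothesis
                curr[j - 1] + 1,     # extra word in the hypothesis
                prev[j - 1] + cost,  # substitution or exact match
            ))
        prev = curr
    return prev[-1]


def wer(ref, hyp):
    """Word error rate in percent, relative to the reference length."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)


def per(ref, hyp):
    """Position-independent error rate in percent (word order is ignored)."""
    matches = sum((Counter(ref) & Counter(hyp)).values())
    errors = max(len(ref), len(hyp)) - matches
    return 100.0 * errors / len(ref)


# Placeholder tokens: the hypothesis has the right words, two of them swapped.
reference = "w1 w2 w3 w4".split()
hypothesis = "w2 w1 w3 w4".split()

print(f"WER: {wer(reference, hypothesis):.2f}%")
print(f"PER: {per(reference, hypothesis):.2f}%")
```

Because PER ignores word order, it is never higher than WER for the same sentence pair under these definitions, which matches the pattern in the table above.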


==Future work==
I am planning to revise both the bidix and the lexc. Right now I am working on the transfer rules, so in the coming days I will come up with more precise rules between Turkish and Kyrgyz. I understand that a detailed lexicon database of Kyrgyz is badly needed, so I also plan to revise firespeaker's database.

==Thanks==
Off the top of my head, I would like to thank firespeaker, spectre, #apertium and #hfst, without whom this SoC wouldn’t have been a success.

==See Also==
* Morphology of Kyrgyz language: http://wiki.apertium.org/wiki/Morphology_of_Kyrgyz_language
* Subversion statistics for apertium-tr-ky from firespeaker: http://firespeaker.org/snippits/tr-ky/





Latest revision as of 07:14, 26 August 2011
