Turkish and Kyrgyz/Final report
Description
The aim of this project was to develop apertium-tr-ky, an Apertium machine translation pair between Turkish and Kyrgyz. The work was challenging. The translation is not perfect and there is a lot to improve, but the results are satisfying given the time available and the work done. Using apertium-tr-ky we translated several children's stories and showed the output to native speakers of Kyrgyz, who reacted very positively. Right now apertium-tr-ky is the only MT tool from any language to Kyrgyz, so I consider it a great success. Below is a Turkish paragraph followed by its apertium-tr-ky output, which has very few errors. The paragraph is taken from the children's story "Snow White".
Günlerden bir gün ayna kraliçenin bu sorusuna farklı bir yanıt vermiş; Bunu nasıl söyleyeceğim bilemem ama Pamuk Prenses sizden güzel kraliçem. Bunun üzerine çok sinirlenen kraliçe hemen bir avcı bulmuş ve ona Pamuk Prensesi alıp ormana götür ve bana onun yüreğini getir, diye emretmiş. Adamcağız Pamuk Prensesi ormana götürmüş ama öldürmeye kıyamamış. Durumu anlayan Pamuk Prenses beni burada bırak. Bir daha asla geri dönmem merak etme diyerek avcıya yalvarmış. Avcı da merhamete gelmiş ve onu orada bırakıp bir ceylanın yüreğini kraliçeye götürmüş.
Күндөрдөн бир күн күзгү королеванын бул суроосуна *farklı бир жооп берген; Муну кандай айтармын биле албайм бирок Пахта Принцесса силерден жакшынакай королевам. Анда абдан ачууланган королева дароо бир аңчы тапкан жана ага Пахта Принцессасы алып токойго жеткир жана мага аны жүрөгүн алып кел, деп буйрук берген. Байкуш киши Пахта Принцессасы токойго жеткирген бирок өлүүгө кыя алган эмес. Акыбалды түшүнгөн Пахта Принцесса мен бул жерде ташта. Бир дагы такыр артка #бур сарсанаа кылуу дей аңчыга жалынган. Аңчы да кечиримге келген жана анын ал жакта таштап бир жейрендин жүрөгүн королевага жеткен.
TRmorph
We used TRmorph as the morphological analyzer/generator for Turkish. Even though there is still a lot to improve in TRmorph, it is quite usable. Thanks to Çağrı for his great work.
kymorph
We developed a new morphological analyzer/generator for Kyrgyz, kymorph, from scratch, as no other one exists. So kymorph is, right now, the only morphological analyzer/generator for Kyrgyz. It is written in lexc and twol and built with HFST. This was the toughest part of the project because of the lack of good resources on Kyrgyz morphological structure and of a lexicon database with parts of speech. We achieved coverage of % on the SETimes corpus. I am really happy with kymorph. Special thanks to firespeaker.
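As a rough illustration of what a coverage figure measures, the sketch below counts the share of tokens that receive at least one analysis. It assumes the analyzer output is in the Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown word is written as `^surface/*surface$`. The function name `coverage` and the sample analyses (including their tags) are invented for the example, and escaped characters in the stream are ignored for simplicity. It is not the script used to produce the project's own numbers.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped "^", "$" and "/" inside units are not handled.
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def coverage(analysis: str) -> float:
    """Return the fraction of lexical units that have a known analysis."""
    units = LEXICAL_UNIT.findall(analysis)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        parts = unit.split("/")
        # An unknown word carries a single analysis that starts with "*".
        if len(parts) > 1 and not parts[1].startswith("*"):
            known += 1
    return known / len(units)


# Hypothetical analyzer output: two known words, two unknown words.
sample = "^ev/ev<n><nom>$ ^foo/*foo$ ^ve/ve<cnjcoo>$ ^bar/*bar$"
print(f"coverage: {coverage(sample):.2%}")
```

This is a naive token-level measure: a word counts as covered as soon as it has any analysis, whether or not that analysis is correct.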
bidix
Our bidix currently has 7122 entries, which is a good size for now. The lack of a decent digital Turkish-Kyrgyz dictionary was a big issue. I am planning to revise the bidix later.
Transfer rules
It is really tough to come up with reliable transfer rules between Turkish and Kyrgyz. Even so, we came up with some rules that work very well. I still plan to write new rules and revise the existing ones.
CG
We use the same CG that is used in apertium-tr-az. Obviously it must be developed and revised further. I'd like to thank #zfe and #spectre for their great work.
Statistics
- Dictionaries
- trmorph lexicon:
- apertium-tr-ky.tr-ky.dix (unique: , total: ) 7122
- apertium-tr-ky.ky.lexc
- Coverage
- Turkish Wikipedia ( , std. dev.: )
- Turkish SETimes ( , std. dev.: )
- Turkish ... ( , std. dev.: )
- Rules
- Error rate
File | Num. Words | % OOV | WER (Sur) | PER (Sur) | WER (Lem) | PER (Lem) |
---|---|---|---|---|---|---|
setimes.kosova_plate.tr.txt | 243 | 14.80% | 42.46% | 40.08% | - | - |
setimes.kosova.tr.txt | 424 | 15.51% | 31.83% | 31.15% | - | - |
setimes.bulgar.tr.txt | 395 | 10.31% | 28.40% | 27.18% | - | - |
wikipedia.kadinlar_askerler.tr.txt | 1165 | 12.88% | 36.62% | 32.88% | - | - |
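For readers unfamiliar with the two error measures in the table, here is a minimal sketch of how word error rate (WER) and position-independent error rate (PER) are commonly defined. WER is the word-level edit distance between the MT output and a reference translation, divided by the reference length. PER ignores word order and only compares the two texts as bags of words. The function names are invented for the example, and the evaluation tool actually used for the table may differ in details such as tokenization or normalization.

```python
from collections import Counter


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (substitutions, insertions, deletions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # delete a reference word
                            curr[j - 1] + 1,      # insert a hypothesis word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(ref: list[str], hyp: list[str]) -> float:
    """Word error rate: edit distance divided by reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(ref: list[str], hyp: list[str]) -> float:
    """Position-independent error rate: bag-of-words mismatch over reference length."""
    if not ref:
        raise ValueError("reference must not be empty")
    # Number of words shared by both texts, counted with multiplicity.
    correct = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - correct) / len(ref)


# Toy example with placeholder tokens: one substitution and one deletion.
reference = "a b c d".split()
hypothesis = "a x c".split()
print(wer(reference, hypothesis), per(reference, hypothesis))
```

Because PER disregards word order, it is never higher than WER for the same pair of texts, which matches the pattern in the table above.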
Future work
I am planning to revise both the bidix and the lexc. Right now I am working on the transfer rules, so in the coming days I will come up with more precise rules between Turkish and Kyrgyz. I understand that a detailed lexicon database of Kyrgyz is badly needed, so I also plan to revise firespeaker's database.
Thanks
Off the top of my head, I would like to thank firespeaker, spectre, #apertium and #hfst, without whom this SoC wouldn’t have been a success.
See Also
1. Morphology of Kyrgyz Language: http://wiki.apertium.org/wiki/Morphology_of_Kyrgyz_language
2. Subversion Statistics for apertium-tr-ky from firespeaker: http://firespeaker.org/snippits/tr-ky/