Difference between revisions of "Turkish and Kyrgyz/Final report"
Revision as of 08:21, 25 August 2011
Description
The aim of this project was to develop apertium-tr-ky, an Apertium machine translation system between Turkish and Kyrgyz. It was really challenging and hard at the same time. The translation is not perfect and there are a lot of things to improve, but the result is quite satisfying given the time available and the work done. Using apertium-tr-ky we translated several children's stories and showed them to native speakers of Kyrgyz, and we got very positive reactions from them. Right now, as far as I know, apertium-tr-ky is the only MT tool from any language to Kyrgyz, so I am sure that it is a great success.
TRmorph
We used TRmorph as the morphological analyzer/generator for Turkish. Even though there are many things to improve in TRmorph, it is quite usable. Thanks to Çağrı for his great work.
kymorph
We developed a new morphological analyzer/generator for Kyrgyz, kymorph, from scratch, as no other one exists. So we can say that kymorph is (right now) the only morphological analyzer/generator for Kyrgyz. It is written in lexc+twol on HFST. This was the toughest part, because there are no good resources on Kyrgyz morphological structure and no lexicon database with parts of speech. We achieved coverage of % on SETimes corpora. I am really happy with kymorph. Special thanks to firespeaker.
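The coverage figure mentioned above is usually computed as the share of tokens that receive at least one analysis. The following is only a minimal sketch, assuming the analyser output is in the Apertium stream format, where an unknown token is emitted as `^word/*word$`. The function name `naive_coverage` and the sample line are made up for illustration and are not taken from the kymorph sources; the regular expression also ignores escaped characters.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped characters such as \/ are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of tokens with at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # An unknown token has a single "analysis" that starts with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


# Hypothetical analyser output: two known tokens, one unknown token.
sample = "^үй/үй<n><nom>$ ^xyzzy/*xyzzy$ ^./.<sent>$"
print(f"naive coverage: {naive_coverage(sample):.2%}")
```

Running the same function over the analysed SETimes text would give the kind of percentage reported in this section.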
bidix
Our bidix has about 7122 entries, which is a very nice amount for now. Not having a decent digital dictionary from Turkish to Kyrgyz was a big issue. I am planning to revise it later.
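An entry count like the one above can be obtained by counting the `<e>` elements of the bidix, which is an XML file. The sketch below is an assumption about how one might do it, not the script actually used for this report. The inline sample dictionary and the function name `bidix_stats` are hypothetical, and "unique" is interpreted here as distinct left-side lemmas, which may differ from what the statistics section means.

```python
import xml.etree.ElementTree as ET

# Tiny made-up bidix fragment in the usual Apertium .dix layout.
SAMPLE_DIX = """<dictionary>
  <sdefs><sdef n="n"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>ev<s n="n"/></l><r>үй<s n="n"/></r></p></e>
    <e><p><l>kitap<s n="n"/></l><r>китеп<s n="n"/></r></p></e>
    <e><p><l>ev<s n="n"/></l><r>там<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def bidix_stats(dix_xml: str) -> tuple[int, int]:
    """Return (total entries, distinct left-side lemmas) of a bidix string."""
    root = ET.fromstring(dix_xml)
    entries = list(root.iter("e"))
    # findtext gives the text before the first <s> tag, i.e. the lemma.
    lemmas = {e.findtext("p/l") for e in entries}
    lemmas.discard(None)  # entries without a <p><l> pair are skipped
    return len(entries), len(lemmas)


total, unique = bidix_stats(SAMPLE_DIX)
print(f"total: {total}, unique: {unique}")
```

For the real file one would read `apertium-tr-ky.tr-ky.dix` from disk and pass its contents to the same function.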
Transfer rules
It is really tough to come up with reliable transfer rules between Turkish and Kyrgyz. Even so, we came up with some rules that work very well. I still plan to build new rules and revise the existing ones.
CG
We use the same CG that is used in apertium-tr-az. Obviously it must be developed and revised further. I'd like to thank #zfe and #spectre for their great work.
Statistics
- Dictionaries
  - trmorph lexicon:
  - apertium-tr-ky.tr-ky.dix (unique: , total: ) 7122
  - apertium-tr-ky.ky.lexc
- Coverage
  - Turkish Wikipedia ( , std. dev.: )
  - Turkish SETimes ( , std. dev.: )
  - Turkish ... ( , std. dev.: )
- Rules
- Error rate
File | Num. Words | % OOV | WER (Sur) | PER (Sur) | WER (Lem) | PER (Lem) |
---|---|---|---|---|---|---|
setimes.kosova_plate.tr.txt | 243 | - | - | - | - | - |
setimes.kosova.tr.txt | 424 | - | - | - | - | - |
setimes.bulgar.tr.txt | 395 | - | - | - | - | - |
wikipedia.kadinlar_askerler.tr.txt | 1165 | - | - | - | - | - |
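For readers unfamiliar with the column headers, WER (word error rate) and PER (position-independent error rate) can be sketched as follows. This is an illustrative implementation under common textbook definitions, not necessarily the tool or the exact formula intended for this table. WER is the word-level edit distance divided by the reference length, and PER compares the two sentences as bags of words. The function names and the toy sentences are made up. The same functions apply whether the tokens are surface forms or lemmas, which is presumably what the "Sur" and "Lem" columns distinguish.

```python
from collections import Counter


def edit_distance(ref: list[str], hyp: list[str]) -> int:
    """Word-level Levenshtein distance (insertions, deletions, substitutions)."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (r != h)))  # substitution or match
        prev = curr
    return prev[-1]


def wer(ref: list[str], hyp: list[str]) -> float:
    if not ref:
        raise ValueError("reference must not be empty")
    return edit_distance(ref, hyp) / len(ref)


def per(ref: list[str], hyp: list[str]) -> float:
    if not ref:
        raise ValueError("reference must not be empty")
    # Words shared by both sentences, ignoring their positions.
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


# Toy example: the hypothesis swaps two words and gets one word wrong.
reference = "a b c d".split()
hypothesis = "b a c e".split()
print(f"WER = {wer(reference, hypothesis):.2f}, PER = {per(reference, hypothesis):.2f}")
```

In the toy example the swapped words count as errors for WER but not for PER, which is why PER is never higher than WER for the same sentence pair.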
Future work
Thanks
Off the top of my head, I would like to thank firespeaker, spectre, #apertium and #hfst, without whom this SoC wouldn't have been a success.