Turkish and Kyrgyz/Final report

From Apertium

Revision as of 08:24, 25 August 2011

Description

The aim of this project was to develop apertium-tr-ky, an Apertium machine translation system between Turkish and Kyrgyz. It was a challenging project. The translation is not perfect and there is still a lot to improve, but the result is quite satisfying given the time available and the work done. Using apertium-tr-ky, we translated several children's stories and showed them to native speakers of Kyrgyz, who reacted very positively. At the moment apertium-tr-ky is the only MT tool from any language into Kyrgyz, so I consider the project a great success.

TRmorph

We used TRmorph as the morphological analyzer/generator for Turkish. Although there is still a lot to improve in TRmorph, it is quite usable. Thanks to Çağrı for his great work.

kymorph

We also developed kymorph, a new morphological analyzer/generator for Kyrgyz, from scratch, as none existed. So we can say that kymorph is currently the only morphological analyzer/generator for Kyrgyz. It is written in lexc and twol and compiled with HFST. This was the toughest part of the project, because there are few good resources on Kyrgyz morphology and no lexicon database with part-of-speech information. We achieved coverage of % on the SETimes corpora, and I am really happy with kymorph. Special thanks to firespeaker.
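
The lexc + twol division of labour can be illustrated with a toy Python sketch; this is not kymorph's code. A lexicon step appends an underlying plural suffix containing two archiphonemes, and a rule step resolves them from the stem: the consonant from the stem's final sound and the vowel from its last vowel (vowel harmony). The archiphoneme names `{D}` and `{A}`, the sound classes and the harmony table are assumptions made for illustration. Real twol rules are parallel constraints compiled with HFST, whereas here they run as ordinary sequential code, and irregular plurals are not handled.

```python
# Toy imitation of a lexc + twol analyser/generator. This is NOT kymorph's
# source: the archiphonemes {D} and {A}, the sound classes and the harmony
# table are assumptions made for illustration (regular plurals only).

PLURAL = "{D}{A}р"  # underlying plural suffix: consonant + low vowel + р

# Last stem vowel -> surface form of the low harmonising vowel {A}.
# Back rounded у does not round a following low vowel (суу -> суулар).
LOW_VOWEL = {
    "а": "а", "ы": "а", "у": "а",
    "е": "е", "э": "е", "и": "е",
    "о": "о",
    "ө": "ө", "ү": "ө",
}
VOWELS = set(LOW_VOWEL)
VOICELESS = set("кпстфхцчшщ")


def lexicon(stem: str) -> str:
    """'lexc' step: the stem continues into the underlying plural suffix."""
    return stem + PLURAL


def rules(underlying: str) -> str:
    """'twol' step: resolve the archiphonemes from the left context.
    Real twol rules are parallel constraints; here they run sequentially."""
    stem, _, suffix = underlying.partition("{D}")
    final = stem[-1]
    if final in VOWELS or final in "йр":
        consonant = "л"
    elif final in VOICELESS:
        consonant = "т"
    else:  # voiced consonants such as м, н, л, з, ж, б
        consonant = "д"
    last_vowel = next(ch for ch in reversed(stem) if ch in VOWELS)
    return stem + consonant + suffix.replace("{A}", LOW_VOWEL[last_vowel])


def plural(stem: str) -> str:
    """Generate a surface plural: lexicon step, then rule step."""
    return rules(lexicon(stem))


if __name__ == "__main__":
    for word in ["үй", "китеп", "адам", "тоо", "көз", "суу"]:
        print(word, "->", plural(word))
```

Keeping the suffix abstract in the lexicon and deciding its surface shape in separate rules is what lets one lexicon entry cover all the harmonic and assimilated variants.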

bidix

Our bidix has almost 7,122 entries, which is a good amount for now. The lack of a decent digital Turkish-Kyrgyz dictionary was a big issue. I am planning to revise the bidix later.
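
A bidix is an XML file in which each `<e>` entry pairs a Turkish left side `<l>` with a Kyrgyz right side `<r>`. The sketch below counts entries with Python's standard `xml.etree.ElementTree`. The sample entries are illustrative, and treating "unique" as distinct left/right pairs is an assumption, since the report does not define the unique/total figures listed under Statistics.

```python
import xml.etree.ElementTree as ET

# Illustrative bidix fragment (Turkish on the left, Kyrgyz on the right);
# the duplicated ev/үй entry shows the difference between total and unique.
SAMPLE_DIX = """<dictionary>
  <sdefs><sdef n="n"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>ev<s n="n"/></l><r>үй<s n="n"/></r></p></e>
    <e><p><l>göz<s n="n"/></l><r>көз<s n="n"/></r></p></e>
    <e><p><l>su<s n="n"/></l><r>суу<s n="n"/></r></p></e>
    <e><p><l>ev<s n="n"/></l><r>үй<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def side_text(elem) -> str:
    """Flatten an <l> or <r> element to 'lemma<tag>...', e.g. 'ev<n>'."""
    if elem is None:
        return ""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == "s":
            parts.append("<" + child.get("n", "") + ">")
        elif child.tag == "b":  # <b/> marks a blank inside multiword lemmas
            parts.append(" ")
        parts.append(child.tail or "")
    return "".join(parts)


def count_entries(dix_xml: str) -> tuple[int, int]:
    """Return (total, unique) counts of <e><p> entries in all sections."""
    root = ET.fromstring(dix_xml)
    pairs = []
    for section in root.iter("section"):
        for entry in section.iter("e"):
            pair = entry.find("p")
            if pair is None:  # e.g. <i> identity entries are skipped here
                continue
            pairs.append((side_text(pair.find("l")), side_text(pair.find("r"))))
    return len(pairs), len(set(pairs))


if __name__ == "__main__":
    total, unique = count_entries(SAMPLE_DIX)
    print(f"total: {total}, unique: {unique}")  # prints: total: 4, unique: 3
```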

Transfer rules

It is really tough to come up with reliable transfer rules between Turkish and Kyrgyz. Even so, we came up with some rules that work very well. I still plan to write new rules and revise the existing ones.

CG

We use the same Constraint Grammar (CG) as apertium-tr-az. Obviously it needs to be developed and revised further. I'd like to thank #zfe and #spectre for their great work.

Statistics

Dictionaries
  • trmorph lexicon:
  • apertium-tr-ky.tr-ky.dix (unique: , total: ) 7122
  • apertium-tr-ky.ky.lexc
Coverage
  • Turkish Wikipedia ( , std. dev.: )
  • Turkish SETimes ( , std. dev.: )
  • Turkish ... ( , std. dev.: )
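
Naive coverage is usually measured as the share of tokens the analyser recognises at all. A minimal sketch, assuming the analyser output is in the Apertium stream format, where an unknown word appears as `^word/*word$`, and assuming the std. dev. is taken over fixed-size chunks of the corpus; the report does not say how it was computed, and the `chunk_size` parameter is hypothetical. Escaped `^`, `/` and `$` characters and superblanks are ignored for brevity.

```python
import re
import statistics

# One lexical unit of the Apertium stream format: ^surface/analysis1/...$
# (escaped \^ \/ \$ and superblanks are not handled in this sketch).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def known_flags(stream: str) -> list[bool]:
    """True for each token that got an analysis; unknowns look like ^w/*w$."""
    return [not analyses.startswith("*")
            for _, analyses in LEXICAL_UNIT.findall(stream)]


def coverage_stats(stream: str, chunk_size: int = 1000) -> tuple[float, float]:
    """Overall naive coverage (%) and std. dev. of coverage across chunks."""
    flags = known_flags(stream)
    if not flags:
        return 0.0, 0.0
    overall = 100.0 * sum(flags) / len(flags)
    chunks = [flags[i:i + chunk_size] for i in range(0, len(flags), chunk_size)]
    per_chunk = [100.0 * sum(c) / len(c) for c in chunks]
    spread = statistics.stdev(per_chunk) if len(per_chunk) > 1 else 0.0
    return overall, spread


if __name__ == "__main__":
    sample = "^ev/ev<n><nom>$ ^ve/ve<cnjcoo>$ ^xyzzy/*xyzzy$ ^göz/göz<n><nom>$"
    overall, spread = coverage_stats(sample, chunk_size=2)
    print(f"{overall:.1f}% (std. dev.: {spread:.1f})")  # prints: 75.0% (std. dev.: 35.4)
```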
Rules
Error rate
File                                Num. words  % OOV  WER (Sur)  PER (Sur)  WER (Lem)  PER (Lem)
setimes.kosova_plate.tr.txt                243      -          -          -          -          -
setimes.kosova.tr.txt                      424      -          -          -          -          -
setimes.bulgar.tr.txt                      395      -          -          -          -          -
wikipedia.kadinlar_askerler.tr.txt        1165      -          -          -          -          -
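
WER is the word-level edit distance between the MT output and a reference translation, divided by the reference length; PER uses the same counts but ignores word order. Below is a minimal sketch of both; it is not the evaluation script actually used for this project. The (Sur) and (Lem) columns would presumably come from running it on surface forms and on lemmas respectively, which is an assumption.

```python
from collections import Counter


def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(prev[j] + 1,             # delete r
                         cur[j - 1] + 1,          # insert h
                         prev[j - 1] + (r != h))  # substitute, or match for free
        prev = cur
    return prev[-1] / len(ref)


def per(reference: str, hypothesis: str) -> float:
    """Position-independent error rate: compares bags of words, not order."""
    ref, hyp = reference.split(), hypothesis.split()
    matches = sum((Counter(ref) & Counter(hyp)).values())
    return (max(len(ref), len(hyp)) - matches) / len(ref)


if __name__ == "__main__":
    reference = "мен үйгө барам"
    hypothesis = "үйгө мен барам"  # same words, first two swapped
    print(f"WER={wer(reference, hypothesis):.3f} PER={per(reference, hypothesis):.3f}")
    # prints: WER=0.667 PER=0.000
```

In the example the first two words are swapped: WER counts two substitutions, while PER, which only compares bags of words, counts no errors.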

Future work

I am planning to revise both the bidix and the lexc lexicon of kymorph. Right now I am working on transfer rules, so in the coming days I will come up with more precise rules between Turkish and Kyrgyz. A detailed lexicon database of Kyrgyz is badly needed, so I also plan to revise firespeaker's database.

Thanks

Off the top of my head, I would like to thank firespeaker, spectre, #apertium and #hfst, without whom this SoC wouldn't have been a success.