Difference between revisions of "User:Gantu/Application"
Revision as of 09:24, 4 April 2011
Mirlan Ipasov
gantu on IRC: #apertium, #hfst
Bio
- B. Computer Science at IAAU Bishkek/Kyrgyzstan http://www.iaau.edu.kg in 2000-2004
- MTech in Artificial Intelligence at HCU Hyderabad/India 2005-2007
- PhD student at IAAU, 2010-present (expected completion 2014)
- System Administrator at IAAU Bishkek/Kyrgyzstan 2004-2005
- Software Developer BaraanSoft Hyderabad/India 2005-2007
- Teaching Assistant at IAAU 2008
Languages:
- Native: Kyrgyz, Russian, Turkish
- Fluent: English
- Some: Kazakh, Uzbek
Programming:
- Java, Python, C
Developing a new Turkish (tr) → Kyrgyz (ky) Apertium language pair.
Apertium is an effective, actively developed open-source machine translation platform that works out of the box. In almost all machine translation projects the Turkic languages are poorly supported, and some are not supported at all. I will develop a new Turkic language pair, Turkish (tr)-Kyrgyz (ky), in which the Kyrgyz (ky) side will be developed entirely from scratch.
The Apertium project has all the tools and the platform needed to do that.
As part of this project, a new bilingual dictionary and a morphological analyser for Kyrgyz will be developed.
Both can be used for many purposes. The project deliverables could serve as a tr-ky translation tool and as an educational tool, and they could also serve as a tutorial for work on other Turkic languages.
Deliverables:
1. Bilingual tr-ky dictionary
- will be prepared from an existing StarDict dictionary, which has almost 4,600 unique entries.
- 2-days investigation
- 12-days for dictionary build up
- total 2 weeks of work
2. A morphological analyser/generator for Kyrgyz (ky)
- 7-days of investigation and reading
- 21-days of programming and preparing analyser/generator
- 7-days for testing and documentation
- total of approximately 5 weeks.
3. Transfer rules.
- 7-days of reading, preparing and testing transfer rules.
- total of 1 week
4. Script to trim lexica
- 7-days of reading, programming and testing.
- total of 1 week
5. Deploying all into apertium and preparing apertium tr-ky language pair.
- 10-days Testing translations and correcting errors.
- 11-days Documentation
- total of 3 weeks
Language Information
Noun morphology
Kyrgyz language has several cases:
- absolute,
- definite-accusative,
- dative,
- locative,
- ablative,
- genitive
Kyrgyz words are built morphologically by attaching suffixes in the following order:
- plural suffix
- suffix of possession
- personal suffix
- case-ending
Example: китеп (kitep) = book is the stem.
китеп + plural → китептер (kitepter) = books
китептеримден (kitepterimden) = китеп + тер (plural) + им (possessive, first person singular) + ден (case) = from my books
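The suffix order above can be sketched in a few lines of Python. This is only an illustration, not part of the planned HFST analyser: the function name `build_word` and the slot names are hypothetical, and the caller passes surface suffixes that are already correct, so vowel harmony and consonant alternations are not modelled.

```python
# Slot order taken from the list above. The suffix strings are surface forms
# supplied by the caller; no vowel harmony or consonant change is applied.
SLOT_ORDER = ("plural", "possessive", "personal", "case")


def build_word(stem, **suffixes):
    """Concatenate the stem with the given suffixes in the fixed slot order.

    Returns a pair: the surface word and its '+'-separated segmentation.
    """
    unknown = set(suffixes) - set(SLOT_ORDER)
    if unknown:
        raise ValueError(f"unknown slots: {sorted(unknown)}")
    parts = [stem] + [suffixes[slot] for slot in SLOT_ORDER if slot in suffixes]
    return "".join(parts), "+".join(parts)


# The keyword order does not matter; the slot order decides the result.
word, segmentation = build_word("китеп", case="ден", plural="тер", possessive="им")
print(word)          # китептеримден
print(segmentation)  # китеп+тер+им+ден
```

Keeping the order in one tuple means a suffix can never be attached out of sequence, whichever order the caller names them in.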
The noun китеп (kitep) in each of the six cases listed above:
ky | Gloss | tr |
---|---|---|
китеп (kitep) | book | kitap |
китептин (kiteptin) | of that book | o kitabın |
китепке (kitepke) | to that book | o kitaba |
китепти (kitepti) | that book | o kitabı |
китепте (kitepte) | in that book | o kitapta |
китептен (kitepten) | from that book | o kitaptan |
Agglutination case
Example: the verb окуу (okuu) = to read; stem оку (oku) = read.
(ky) окуп жатам (okup žatam) = I am reading
оку (lemma) + п (continuous tense) жат (second verb for the continuous tense) + ам (first person singular) (present continuous, p1sg, Kyrgyz)
In Kyrgyz the present continuous tense is mostly formed with two verbs; in оку+п жат+ам, жатам is the helping verb that marks the present continuous.
(tr) okuyorum = I am reading
oku (lemma) + yor (continuous tense) + um (first person singular) (present continuous, p1sg, Turkish)
(tr) gidiyorum = I am going
git (lemma) + i + yor (continuous tense) + um (first person singular) (present continuous, p1sg, Turkish)
(ky) окудум (okudum) = I read
оку (lemma) + ду (past tense) + м (first person singular) (past tense, p1sg, Kyrgyz)
Vowel harmony
Kyrgyz generally has vowel harmony, but words borrowed from other languages, such as Russian, do not obey the vowel harmony restrictions.
китеп (kitep) = book
китептер (kitepter) = books
китептерим (kitepterim) = my books
китептеримден (kitepterimden) = китеп.тер.им.ден = китеп+Pl+Px1Sg+Abl = `From my books.'
In Turkish the word "bira" (beer) is borrowed from French, and in Kyrgyz пиво (pivo) is borrowed from Russian:
пиво (pivo) = beer
пивалар (pivalar) = beers
пиваларым (pïvalarım) = my beers
пиваларымдан (pïvalarımdan) = from my beers
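As a rough illustration of the point about loanwords, the following Python sketch checks whether all vowels of a word belong to the same front/back class. The vowel classes are an assumption based on standard descriptions of Kyrgyz, not something taken from this proposal; rounding harmony and the Russian letters я, ю, ё are deliberately ignored, and the function names are hypothetical.

```python
# Assumed vowel classes (front/back dimension only).
BACK_VOWELS = set("аоуы")
FRONT_VOWELS = set("еэиөү")


def vowel_classes(word):
    """Return the front/back class of each recognised vowel, in order."""
    classes = []
    for ch in word.lower():
        if ch in BACK_VOWELS:
            classes.append("back")
        elif ch in FRONT_VOWELS:
            classes.append("front")
        # Consonants and unlisted letters (я, ю, ё, ...) are skipped.
    return classes


def obeys_backness_harmony(word):
    """True if the word does not mix front and back vowels."""
    return len(set(vowel_classes(word))) <= 1


for w in ("китеп", "китептеримден", "пиво"):
    print(w, obeys_backness_harmony(w))
```

A check like this could be used to flag dictionary entries that are likely loanwords and may need separate treatment in the analyser.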
Noun and Verb comparisons
Noun:
Kyrgyz | Turkish | Gloss |
---|---|---|
китеп (kitep) | kitap | book |
китептер (kitepter) | kitaplar | books |
китебим (kitebim) | kitabım | my book |
китептерим (kitepterim) | kitaplarım | my books |
китептен (kitepten) | kitaptan | from book |
китептерден (kitepterden) | kitaplardan | from books |
китебимден (kitebimden) | kitabımdan | from my book |
китептеримден (kitepterimden) | kitaplarımdan | from my books |
Verb:
Kyrgyz | Turkish | Gloss |
---|---|---|
ойнойм (ojnojm) | oynarım | I play |
ойнойсуң (ojnojsuŋ) | oynarsın | You play |
ойнойт (ojnojt) | oynar | He plays |
ойнойт (ojnojt) | oynar | She plays |
ойнойт (ojnojt) | oynar | It plays |
ойнойсуңуз (ojnojsuŋuz) | oynarsınız | You (pl.) play |
ойнойбуз (ojnojbuz) | oynarız | We play |
ойношот (ojnošot) | oynarlar | They play |
Dictionary Information
As mentioned above, one part of the project is building a bilingual dictionary. Fortunately, I do not have to build it from nothing: there is a StarDict version of a Turkish-Kyrgyz dictionary, although it has no part-of-speech information. The StarDict tr-ky dictionary consists of 4,600 unique entries, with nouns and verbs mixed together. The part of speech could be extracted using trmorph (an open-source Turkish morphological analyser).
And if the words are ambiguous ? - Francis Tyers
If I can obtain the parts of speech of the Turkish words in the dictionary, I can use them to build the Apertium tr-ky bilingual dictionary. I plan to cover at least 3,000 words in 12 days.
50% of the dictionary means how much ? - Francis Tyers 23:36, 3 April 2011 (UTC)
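A minimal sketch of the conversion step, under several assumptions: the StarDict data has already been exported to tab-separated lines (Turkish lemma, tab, Kyrgyz lemma), and the part of speech of each Turkish lemma is available in a plain dictionary `pos_of`, which here stands in for whatever trmorph would return. The function names and the sample entries are hypothetical; only the bilingual-dictionary entry shape (`<e><p><l>…</l><r>…</r></p></e>`) follows the usual Apertium convention. Words with no known part of speech are set aside for manual review rather than guessed.

```python
from xml.sax.saxutils import escape


def bidix_entry(tr_lemma, ky_lemma, pos):
    """Format one tr-ky pair as an Apertium bilingual dictionary entry."""
    def side(lemma):
        # Spaces inside multiword lemmas are written as <b/> in Apertium dictionaries.
        return escape(lemma).replace(" ", "<b/>") + f'<s n="{pos}"/>'
    return f"<e><p><l>{side(tr_lemma)}</l><r>{side(ky_lemma)}</r></p></e>"


def convert(lines, pos_of):
    """Turn 'tr<TAB>ky' lines into entries; collect words lacking a POS tag."""
    entries, skipped = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        tr, sep, ky = line.partition("\t")
        if not sep or not ky:
            continue  # malformed line: no tab or no translation
        pos = pos_of.get(tr)
        if pos is None:
            skipped.append(tr)  # unknown or unresolved POS: review by hand
        else:
            entries.append(bidix_entry(tr, ky, pos))
    return entries, skipped


# Illustrative input only; the POS table is a hand-made stand-in for trmorph output.
sample = ["kitap\tкитеп", "okumak\tокуу", "bira\tпиво"]
entries, skipped = convert(sample, {"kitap": "n", "okumak": "vblex"})
print("\n".join(entries))
print("needs review:", skipped)
```

Collecting the skipped words separately also gives a direct count of how much of the dictionary still needs manual tagging.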
Kyrgyz Language morphological analyser/generator
I am developing it using HFST (an open-source toolkit for developing morphological analysers/generators). I am still learning HFST, but a simple Kyrgyz analyser is already working, built by following Francis Tyers' tutorial (http://wiki.apertium.org/wiki/Starting_a_new_language_with_HFST).
What is done already
An Apertium tr-ky project has been created on SourceForge; it consists of a bilingual dictionary and a simple Kyrgyz morphological analyser.
Resources
1. http://wiki.apertium.org/wiki/Apertium_New_Language_Pair_HOWTO
2. http://wiki.apertium.org/wiki/Turkish_to_Azerbaijani
3. http://wiki.apertium.org/wiki/Starting_a_new_language_with_HFST
4. http://www.let.rug.nl/~coltekin/trmorph/
5. http://wiki.apertium.org/wiki/Kyrgyz
6. http://www.ling.helsinki.fi/kieliteknologia/tutkimus/hfst/