Difference between revisions of "Languages of Central Asia"

From Apertium
(Created page with "The languages of Central Asia include several Turkic and Iranian languages spoken in Kazakhstan, Uzbekistan, Kyrgyzstan, Turkmenista...")
 
Line 6: Line 6:


=== Transducers ===
=== Transducers ===

{| class="wikitable sortable"
|-
!rowspan=2| name
!rowspan=2| Language
!rowspan=2| native name
!colspan=2 class="unsortable"| ISO 639
!rowspan=2| formalism
!rowspan=2| state
!rowspan=2| stems
!rowspan=2| coverage
!rowspan=2| location
!rowspan=2 class="unsortable"| primary authors
|-class="sortbottom"
! -2
! -3
|-
|| <code>[[apertium-kaz]]</code>
|| [[Kazakh]]
|| қазақ тілі
|| <code>kk</code>
|| <code>kaz</code>
|| HFST (lexc+twol)
|| production
|align="right"| {{#lst:apertium-kaz/stats|stems}}
|align="center"| [[Apertium-kaz#Current_State|~{{:Apertium-kaz/stats/average}}%]]
|| [[apertium-kaz]]&nbsp;([[languages]])
|| [[User:Ilnar.salimzyan|Ilnar]], [[User:Firespeaker|Jonathan]], [[User:Francis Tyers|Fran]], [[User:nathan0n5ire|Nathan]]
|-
| <code>[[apertium-kir]]</code>
| [[Kyrgyz]]
| кыргыз тили
| <code>ky</code>
| <code>kir</code>
| HFST (lexc+twol)
| working
|align="right"|{{:Kymorph/stems}}
|align="center"| [[Kymorph#Current State|~{{:Apertium-kir/stats/average}}%]]
| [[apertium-kir]]&nbsp;([[languages]])
| [[User:Firespeaker|Jonathan]], [[User:gantu|Mirlan]], [[User:Francis Tyers|Fran]]
|-
| <code>[[apertium-uzb]]</code> || [[Uzbek]] || o'zbek tili || <code>uz</code> || <code>uzb</code> || HFST (lexc+twol) || development ||align="right"| {{#lst:apertium-uzb/stats|stems}}||align="center"|[[apertium-uzb#Current_State|~{{:apertium-uzb/stats/average}}%]] || [[apertium-uzb]]&nbsp;([[languages]]) ||
|-
| <code>[[apertium-tuk]]</code>
|| [[Turkmen]]
|| Türkmençe
|| <code>tk</code>
|| <code>tuk</code>
|| HFST (lexc+twol)
|| development
|align="right"| {{#lst:apertium-tuk/stats|stems}}
|align="center"| [[apertium-tuk#Current_State|~{{:apertium-tuk/stats/average}}%]]
|| [[apertium-tuk]]&nbsp;([[languages]])
|| [[User:Francis Tyers|Fran]]
|-
| <code>[[apertium-kaa]]</code>
|| [[Karakalpak]]
|| Qaraqalpaq tili
|| <code>-</code>
|| <code>kaa</code>
|| HFST (lexc+twol)
|| prototype
|align="right"| {{#lst:apertium-kaa/stats|stems}}
|align="center"| [[apertium-kaa#Current_State|~{{:apertium-kaa/stats/average}}%]]
|| [[apertium-kaa]]&nbsp;([[incubator]])
|| [[User:Francis Tyers|Fran]]
|-
|| <code>[[apertium-tgk]]</code>
|| Tajik
| забони тоҷикӣ
|| <code>tg</code>
|| <code>tgk</code>
|| [[lttoolbox]]
|| development
|align="right"|
|align="center"|
||
||
|}



=== Existing language pairs ===
=== Existing language pairs ===
Line 11: Line 91:
{| style="text-align: center;" class="wikitable"
{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
|- style="background: #ececec"
! !! kaz !! kir !! tuk !! uzb !! uig !! tgk !! kaa !! prs
! !! kaz !! kir !! tuk !! uzb !! kaa !! tgk !! prs !! uig
|-
|-
| '''kaz''' || - || ''[[Apertium-kaz-kir|kaz-kir]]''<br>{{#lst:Apertium-kaz-kir/stats|kaz-kir-stems}} || || || || || ''[[Apertium-kaz-kaa|kaz-kaa]]''<br>{{#lst:Apertium-kaz-kaa/stats|kaz-kaa-stems}} ||
| '''kaz''' || - || ''[[Apertium-kaz-kir|kaz-kir]]''<br>{{#lst:Apertium-kaz-kir/stats|kaz-kir-stems}} || || || ''[[Apertium-kaz-kaa|kaz-kaa]]''<br>{{#lst:Apertium-kaz-kaa/stats|kaz-kaa-stems}} || || ||
|-
|-
| '''kir''' || ''[[Apertium-kaz-kir|kaz-kir]]''<br>{{#lst:Apertium-kaz-kir/stats|kaz-kir-stems}} || - || || ''[[Apertium-kir-uzb|kir-uzb]]''<br>{{#lst:Apertium-kir-uzb/stats|kir-uzb-stems}} || || || ||
| '''kir''' || ''[[Apertium-kaz-kir|kaz-kir]]''<br>{{#lst:Apertium-kaz-kir/stats|kaz-kir-stems}} || - || || ''[[Apertium-kir-uzb|kir-uzb]]''<br>{{#lst:Apertium-kir-uzb/stats|kir-uzb-stems}} || || || ||
Line 21: Line 101:
| '''uzb''' || || ''[[Apertium-kir-uzb|kir-uzb]]''<br>{{#lst:Apertium-kir-uzb/stats|kir-uzb-stems}} || || - || || || ||
| '''uzb''' || || ''[[Apertium-kir-uzb|kir-uzb]]''<br>{{#lst:Apertium-kir-uzb/stats|kir-uzb-stems}} || || - || || || ||
|-
|-
| '''uig''' || || || || || - || || ||
| '''kaa''' || ''[[Apertium-kaz-kaa|kaz-kaa]]''<br>{{#lst:Apertium-kaz-kaa/stats|kaz-kaa-stems}} || || || || - || || ||
|-
|-
| '''tgk''' || || || || || || - || ||
| '''tgk''' || || || || || || - || ||
|-
|-
| '''kaa''' || ''[[Apertium-kaz-kaa|kaz-kaa]]''<br>{{#lst:Apertium-kaz-kaa/stats|kaz-kaa-stems}} || || || || || || - ||
| '''prs''' || || || || || || || - ||
|-
|-
| '''prs''' || || || || || || || || -
| '''uig''' || || || || || || || || -
|-
|-
| || || || || || || || ||
| || || || || || || || ||

Revision as of 09:19, 9 January 2014

The languages of Central Asia include several Turkic and Iranian languages spoken in Kazakhstan, Uzbekistan, Kyrgyzstan, Turkmenistan, Tajikistan, and Afghanistan. These include Kazakh, Kyrgyz, Uzbek, Turkmen, Tajik, Dari, Pashto, Uyghur, and Karakalpak.

The overall plan is to build an independent finite-state transducer for each language, and then to write a dictionary and transfer rules for every language pair. The current status of both goals is listed below.
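To give a sense of the scale of "every pair", here is a minimal sketch that enumerates the unordered pairs among the eight language codes in the pair matrix on this page and subtracts the pairs the matrix already lists. The names `LANGUAGES`, `EXISTING`, `all_pairs` and `missing_pairs` are hypothetical and not part of any Apertium tool; pair names are joined alphabetically for illustration and do not follow the repository naming.

```python
from itertools import combinations

# Language codes taken from the pair matrix on this page.
LANGUAGES = ["kaz", "kir", "tuk", "uzb", "kaa", "tgk", "prs", "uig"]

# Pairs the matrix currently lists among these eight languages.
EXISTING = {frozenset(p) for p in [("kaz", "kir"), ("kaz", "kaa"), ("kir", "uzb")]}


def all_pairs(languages):
    """Return every unordered pair of distinct languages."""
    return [frozenset(p) for p in combinations(languages, 2)]


def missing_pairs(languages, existing):
    """Return alphabetically joined names of pairs not yet in `existing`."""
    return sorted(
        "-".join(sorted(p)) for p in all_pairs(languages) if p not in existing
    )


if __name__ == "__main__":
    pairs = all_pairs(LANGUAGES)
    todo = missing_pairs(LANGUAGES, EXISTING)
    print(f"{len(pairs)} pairs in total, {len(todo)} not yet started")
```

The count grows quadratically with the number of languages, which is why each transducer is kept independent and reused across pairs.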

== Status ==

=== Transducers ===

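{| class="wikitable sortable"
|-
!rowspan=2| name
!rowspan=2| Language
!rowspan=2| native name
!colspan=2 class="unsortable"| ISO 639
!rowspan=2| formalism
!rowspan=2| state
!rowspan=2| stems
!rowspan=2| coverage
!rowspan=2| location
!rowspan=2 class="unsortable"| primary authors
|-class="sortbottom"
! -2
! -3
|-
| apertium-kaz || Kazakh || қазақ тілі || kk || kaz || HFST (lexc+twol) || production ||align="right"| 36,595 ||align="center"| ~94.5% || apertium-kaz (languages) || Ilnar, Jonathan, Fran, Nathan
|-
| apertium-kir || Kyrgyz || кыргыз тили || ky || kir || HFST (lexc+twol) || working ||align="right"| 8,564 ||align="center"| ~90.4% || apertium-kir (languages) || Jonathan, Mirlan, Fran
|-
| apertium-uzb || Uzbek || o'zbek tili || uz || uzb || HFST (lexc+twol) || development ||align="right"| 34,470 ||align="center"| ~82.9% || apertium-uzb (languages) ||
|-
| apertium-tuk || Turkmen || Türkmençe || tk || tuk || HFST (lexc+twol) || development ||align="right"| 2,988 ||align="center"| ~70.7% || apertium-tuk (languages) || Fran
|-
| apertium-kaa || Karakalpak || Qaraqalpaq tili || - || kaa || HFST (lexc+twol) || prototype ||align="right"| 25,545 ||align="center"| ~86.1% || apertium-kaa (incubator) || Fran
|-
| apertium-tgk || Tajik || забони тоҷикӣ || tg || tgk || lttoolbox || development || || || ||
|}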


=== Existing language pairs ===

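{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
! !! kaz !! kir !! tuk !! uzb !! kaa !! tgk !! prs !! uig
|-
| '''kaz''' || - || kaz-kir || || || kaz-kaa || || ||
|-
| '''kir''' || kaz-kir || - || || kir-uzb || || || ||
|-
| '''tuk''' || || || - || || || || ||
|-
| '''uzb''' || || kir-uzb || || - || || || ||
|-
| '''kaa''' || kaz-kaa || || || || - || || ||
|-
| '''tgk''' || || || || || || - || ||
|-
| '''prs''' || || || || || || || - ||
|-
| '''uig''' || || || || || || || || -
|-
| || || || || || || || ||
|-
| '''eng''' || eng-kaz || ky-en || || || || || ||
|-
| '''fas''' || || || || || || tg-fa || ||
|-
| '''khk''' || khk-kaz || || || || || || ||
|-
| '''nog''' || nog-kaz || || || || || || ||
|-
| '''tat''' || kaz-tat || tat-kir || || || || || ||
|-
| '''tur''' || || tur-kir || tuk-tur || tur-uzb || || || ||
|}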