Languages of Central Asia
Revision as of 07:17, 14 December 2014 by StemCounterBot
The languages of Central Asia include several Turkic and Iranian languages spoken in Kazakhstan, Uzbekistan, Kyrgyzstan, Turkmenistan, Tajikistan, and Afghanistan. These include Kazakh, Kyrgyz, Uzbek, Turkmen, Tajik, Dari, Pashto, Uyghur, and Karakalpak.
The master plan is to build an independent finite-state transducer for each language, and then to write a bilingual dictionary and transfer rules for every language pair. The current status of these goals is listed below.
Status
Transducers
| name | Language | native name | ISO 639-1 | ISO 639-3 | formalism | state | stems | coverage | location | primary authors |
|---|---|---|---|---|---|---|---|---|---|---|
| apertium-kaz | Kazakh | қазақ тілі | kk | kaz | HFST (lexc+twol) | production | 36,595 | ~94.5% | apertium-kaz (languages) | Ilnar, Jonathan, Fran, Nathan |
| apertium-kir | Kyrgyz | кыргыз тили | ky | kir | HFST (lexc+twol) | working | 8,564 | ~90.4% | apertium-kir (languages) | Jonathan, Mirlan, Fran, Qantörö |
| apertium-uzb | Uzbek | o'zbek tili | uz | uzb | HFST (lexc+twol) | development | 34,470 | ~82.9% | | |
| apertium-tuk | Turkmen | Türkmençe | tk | tuk | HFST (lexc+twol) | development | 2,988 | ~70.7% | apertium-tuk (languages) | Fran |
| apertium-kaa | Karakalpak | Qaraqalpaq tili | - | kaa | HFST (lexc+twol) | prototype | 25,545 | ~86.1% | apertium-kaa (languages) | Beknazar, Fran, Jonathan |
| apertium-uig | Uyghur | ئۇيغۇر تىلى | ug | uig | HFST (lexc+twol) | prototype | 17,585 | ~54.2% | apertium-uig (incubator) | Jonathan, Märdan, Fran |
| apertium-tgk | Tajik | забони тоҷикӣ | tg | tgk | lttoolbox | development | | | | |
Existing language pairs
| | kaz | kir | uzb | kaa | tuk | uig | tgk | prs |
|---|---|---|---|---|---|---|---|---|
| kaz | - | kaz-kir ? | | kaz-kaa 5,408 | | kaz-uig 2,728 | | |
| kir | kaz-kir ? | - | kir-uzb 268 | | | | | |
| uzb | | kir-uzb 268 | - | | | | | |
| kaa | kaz-kaa 5,408 | | | - | | | | |
| tuk | | | | | - | | | |
| uig | kaz-uig 2,728 | | | | | - | | |
| tgk | | | | | | | - | |
| prs | | | | | | | | - |
| eng | eng-kaz 16,931 | ky-en ? | | | | | | |
| fas | | | | | | | tg-fa 502 | |
| khk | khk-kaz 134 | | | | | | | |
| nog | nog-kaz 9 | | | | | | | |
| tat | kaz-tat | tat-kir | | | | | | |
| tur | | tur-kir 7,123 | tur-uzb 3,519 | | tuk-tur 3,387 | | | |
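
Since the goal is a dictionary and transfer rules for every pair, it can help to see how far the matrix is from complete. The following is only a rough sketch: the names `LANGS`, `EXISTING`, and `missing_pairs` are illustrative and not part of any Apertium tool, and the figures are copied by hand from the table above (a `?` entry is stored as `None`). Only pairs among the eight Central Asian languages in the column headers are considered.

```python
from itertools import combinations

# Languages in the column headers of the pair table (ISO 639-3 codes).
LANGS = ["kaz", "kir", "uzb", "kaa", "tuk", "uig", "tgk", "prs"]

# Pairs among those languages that the table lists, with the figure shown
# next to each pair; None stands for a "?" entry.
EXISTING = {
    frozenset(("kaz", "kir")): None,
    frozenset(("kaz", "kaa")): 5408,
    frozenset(("kaz", "uig")): 2728,
    frozenset(("kir", "uzb")): 268,
}


def missing_pairs(langs, existing):
    """Return the unordered language pairs that have no entry in `existing`."""
    return [pair for pair in combinations(langs, 2)
            if frozenset(pair) not in existing]


all_pairs = list(combinations(LANGS, 2))
missing = missing_pairs(LANGS, EXISTING)

print(f"possible pairs: {len(all_pairs)}")
print(f"existing pairs: {len(EXISTING)}")
print(f"missing pairs:  {len(missing)}")
```

With the data as entered here, 8 languages give 28 possible unordered pairs, of which 4 exist and 24 are still missing. Pairs with outside languages (eng, fas, khk, nog, tat, tur) are left out of this count on purpose.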