Difference between revisions of "Languages of Central Asia"
Firespeaker (talk | contribs) (Created page with "The languages of Central Asia include several Turkic and Iranian languages spoken in Kazakhstan, Uzbekistan, Kyrgyzstan, Turkmenista...") |
Firespeaker (talk | contribs) (→Status) |
Revision as of 09:19, 9 January 2014
The languages of Central Asia include several Turkic and Iranian languages spoken in Kazakhstan, Uzbekistan, Kyrgyzstan, Turkmenistan, Tajikistan, and Afghanistan. These include Kazakh, Kyrgyz, Uzbek, Turkmen, Tajik, Dari, Pashto, Uyghur, and Karakalpak.
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
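The plan above implies one transducer per language but one dictionary and rule set per pair, so the pair count grows quadratically with the number of languages. The following is a minimal sketch of that bookkeeping, not part of any Apertium tooling: the names `LANGUAGES`, `EXISTING`, `all_pairs`, and `missing_pairs` are hypothetical, and the data is copied by hand from the language codes and pairs listed on this page for the eight Central Asian languages only.

```python
from itertools import combinations

# Language codes as they appear in the column headers of the pair table.
LANGUAGES = ["kaz", "kir", "tuk", "uzb", "kaa", "tgk", "prs", "uig"]

# Pairs among those languages that the page lists as existing.
# frozenset makes the pair order-insensitive (kaz-kir == kir-kaz).
EXISTING = {
    frozenset(pair)
    for pair in [("kaz", "kir"), ("kaz", "kaa"), ("kir", "uzb")]
}


def all_pairs(languages):
    """Return every unordered pair of distinct languages."""
    return [frozenset(pair) for pair in combinations(languages, 2)]


def missing_pairs(languages, existing):
    """Return the pairs that still lack a dictionary, as sorted tuples."""
    return [
        tuple(sorted(pair))
        for pair in all_pairs(languages)
        if pair not in existing
    ]


if __name__ == "__main__":
    total = len(all_pairs(LANGUAGES))
    todo = missing_pairs(LANGUAGES, EXISTING)
    print(f"{total} possible pairs, {len(todo)} not yet started")
```

With n languages there are n(n-1)/2 unordered pairs, which is why the transducers are built independently and reused across pairs.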
Status
Transducers
| name | Language | native name | ISO 639 (-2) | ISO 639 (-3) | formalism | state | stems | coverage | location | primary authors |
|---|---|---|---|---|---|---|---|---|---|---|
| apertium-kaz | Kazakh | қазақ тілі | kk | kaz | HFST (lexc+twol) | production | 36,595 | ~94.5% | apertium-kaz (languages) | Ilnar, Jonathan, Fran, Nathan |
| apertium-kir | Kyrgyz | кыргыз тили | ky | kir | HFST (lexc+twol) | working | 8,564 | ~90.4% | apertium-kir (languages) | Jonathan, Mirlan, Fran |
| apertium-uzb | Uzbek | o'zbek tili | uz | uzb | HFST (lexc+twol) | development | 34,470 | ~82.9% | apertium-uzb (languages) | |
| apertium-tuk | Turkmen | Türkmençe | tk | tuk | HFST (lexc+twol) | development | 2,988 | ~70.7% | apertium-tuk (languages) | Fran |
| apertium-kaa | Karakalpak | Qaraqalpaq tili | - | kaa | HFST (lexc+twol) | prototype | 25,545 | ~86.1% | apertium-kaa (incubator) | Fran |
| apertium-tgk | Tajik | забони тоҷикӣ | tg | tgk | lttoolbox | development | | | | |
Existing language pairs
| | kaz | kir | tuk | uzb | kaa | tgk | prs | uig |
|---|---|---|---|---|---|---|---|---|
| kaz | - | kaz-kir | | | kaz-kaa | | | |
| kir | kaz-kir | - | | kir-uzb | | | | |
| tuk | | | - | | | | | |
| uzb | | kir-uzb | | - | | | | |
| kaa | kaz-kaa | | | | - | | | |
| tgk | | | | | | - | | |
| prs | | | | | | | - | |
| uig | | | | | | | | - |
| eng | eng-kaz | ky-en | | | | | | |
| fas | | | | | | tg-fa | | |
| khk | khk-kaz | | | | | | | |
| nog | nog-kaz | | | | | | | |
| tat | kaz-tat | tat-kir | | | | | | |
| tur | | tur-kir | tuk-tur | tur-uzb | | | | |