Difference between revisions of "Languages of Central Asia"
Firespeaker (→Status)
Latest revision as of 23:13, 22 December 2014
The languages of Central Asia are mainly Turkic and Iranian languages spoken in Kazakhstan, Uzbekistan, Kyrgyzstan, Turkmenistan, Tajikistan, and Afghanistan. They include Kazakh, Kyrgyz, Uzbek, Turkmen, Tajik, Dari, Pashto, Uyghur, and Karakalpak.
The overall plan is to build an independent finite-state transducer for each language, and then to write a separate dictionary and set of transfer rules for every language pair. The current status of both goals is listed below.
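As a rough illustration of what this plan implies, the sketch below counts the packages involved: one transducer per language, plus one dictionary-and-rules package per unordered pair. The code list is an assumption built from the languages named above (`pus` for Pashto does not appear in the status tables), and the alphabetical `apertium-xxx-yyy` naming is hypothetical; existing pairs such as `tur-kir` do not follow it.

```python
from itertools import combinations

# ISO 639-3 codes for the nine languages named in the introduction.
# Assumption: "pus" (Pashto) is included here even though the status
# tables on this page do not list it.
LANGUAGES = ["kaz", "kir", "uzb", "tuk", "tgk", "prs", "pus", "uig", "kaa"]


def plan(languages):
    """Return the monolingual and pair packages implied by the plan."""
    # One independent transducer per language.
    transducers = [f"apertium-{code}" for code in languages]
    # One dictionary + transfer-rule package per unordered pair.
    # The alphabetical naming is hypothetical, for illustration only.
    pairs = [
        f"apertium-{a}-{b}"
        for a, b in combinations(sorted(languages), 2)
    ]
    return transducers, pairs


transducers, pairs = plan(LANGUAGES)
print(len(transducers), "transducers,", len(pairs), "possible pairs")
```

With nine languages this gives 9 transducers and 36 possible pairs, which is why each transducer is kept independent: it can be reused by every pair that involves that language.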
Status
Transducers
| name | Language | native name | ISO 639-2 | ISO 639-3 | formalism | state | stems | coverage | location | primary authors |
|---|---|---|---|---|---|---|---|---|---|---|
| apertium-kaz | Kazakh | қазақ тілі | kk | kaz | HFST (lexc+twol) | production | 36,595 | ~94.5% | apertium-kaz (languages) | Ilnar, Jonathan, Fran, Nathan |
| apertium-kir | Kyrgyz | кыргыз тили | ky | kir | HFST (lexc+twol) | working | 8,564 | ~90.4% | apertium-kir (languages) | Jonathan, Mirlan, Fran, Qantörö |
| apertium-uzb | Uzbek | o'zbek tili | uz | uzb | HFST (lexc+twol) | development | 34,470 | ~82.9% | | |
| apertium-tuk | Turkmen | Türkmençe | tk | tuk | HFST (lexc+twol) | development | 2,988 | ~70.7% | apertium-tuk (languages) | Fran |
| apertium-kaa | Karakalpak | Qaraqalpaq tili | - | kaa | HFST (lexc+twol) | prototype | 25,545 | ~86.1% | apertium-kaa (languages) | Beknazar, Fran, Jonathan |
| apertium-uig | Uyghur | ئۇيغۇر تىلى | ug | uig | HFST (lexc+twol) | prototype | 17,585 | ~54.2% | apertium-uig (incubator) | Jonathan, Märdan, Fran |
| apertium-tgk | Tajik | забони тоҷикӣ | tg | tgk | lttoolbox | development | | | | |
Existing language pairs
Each cell gives the pair name followed by its stem count; "?" marks a count that is not available.
| | kaz | kir | uzb | kaa | tuk | uig | tgk | prs |
|---|---|---|---|---|---|---|---|---|
| kaz | - | kaz-kir ? | | kaz-kaa 5,408 | | kaz-uig 2,728 | | |
| kir | kaz-kir ? | - | kir-uzb 268 | | | | | |
| uzb | | kir-uzb 268 | - | | | | | |
| kaa | kaz-kaa 5,408 | | | - | | | | |
| tuk | | | | | - | | | |
| uig | kaz-uig 2,728 | | | | | - | | |
| tgk | | | | | | | - | |
| prs | | | | | | | | - |
| | | | | | | | | |
| eng | eng-kaz 16,931 | ky-en ? | | | | | | |
| fas | | | | | | | tg-fa 502 | |
| khk | khk-kaz 134 | | | | | | | |
| nog | nog-kaz 9 | | | | | | | |
| tat | kaz-tat | tat-kir | | | | | | |
| tur | | tur-kir 7,123 | tur-uzb 3,519 | | tuk-tur 3,387 | | | |