Difference between revisions of "Languages of Central Asia"

From Apertium
(Created page with "The languages of Central Asia include several Turkic and Iranian languages spoken in Kazakhstan, Uzbekistan, Kyrgyzstan, Turkmenista...")
 
 
(6 intermediate revisions by 2 users not shown)
Line 6: Line 6:


=== Transducers ===
=== Transducers ===

{| class="wikitable sortable"
|-
!rowspan=2| name
!rowspan=2| Language
!rowspan=2| native name
!colspan=2 class="unsortable"| ISO 639
!rowspan=2| formalism
!rowspan=2| state
!rowspan=2| stems
!rowspan=2| coverage
!rowspan=2| location
!rowspan=2 class="unsortable"| primary authors
|-class="sortbottom"
! -2
! -3
|-
|| <code>[[apertium-kaz]]</code>
|| [[Kazakh]]
|| қазақ тілі
|| <code>kk</code>
|| <code>kaz</code>
|| HFST (lexc+twol)
|| production
|align="right"| {{#lst:apertium-kaz/stats|stems}}
|align="center"| [[Apertium-kaz#Current_State|~{{:Apertium-kaz/stats/average}}%]]
|| [[apertium-kaz]]&nbsp;([[languages]])
|| [[User:Ilnar.salimzyan|Ilnar]], [[User:Firespeaker|Jonathan]], [[User:Francis Tyers|Fran]], [[User:nathan0n5ire|Nathan]]
|-
| <code>[[apertium-kir]]</code>
| [[Kyrgyz]]
| {{#lst:apertium-kir/stats|nativename}}
| <code>ky</code>
| <code>kir</code>
| HFST (lexc+twol)
| working
|align="right"|{{:Kymorph/stems}}
|align="center"| [[Kymorph#Current State|~{{:Apertium-kir/stats/average}}%]]
| {{#lst:apertium-kir/stats|location}}
| {{#lst:apertium-kir/stats|authors}}
|-
| <code>[[apertium-uzb]]</code> || [[Uzbek]] || o'zbek tili || <code>uz</code> || <code>uzb</code> || HFST (lexc+twol) || development ||align="right"| {{#lst:apertium-uzb/stats|stems}}||align="center"|[[apertium-uzb#Current_State|~{{:apertium-uzb/stats/average}}%]]
| {{#lst:apertium-uzb/stats|location}}
| {{#lst:apertium-uzb/stats|authors}}
|-
| <code>[[apertium-tuk]]</code>
|| [[Turkmen]]
|| Türkmençe
|| <code>tk</code>
|| <code>tuk</code>
|| HFST (lexc+twol)
|| development
|align="right"| {{#lst:apertium-tuk/stats|stems}}
|align="center"| [[apertium-tuk#Current_State|~{{:apertium-tuk/stats/average}}%]]
|| [[apertium-tuk]]&nbsp;([[languages]])
|| [[User:Francis Tyers|Fran]]
|-
| <code>[[apertium-kaa]]</code>
|| [[Karakalpak]]
|| {{#lst:apertium-kaa/stats|nativename}}
|| <code>-</code>
|| <code>kaa</code>
|| HFST (lexc+twol)
|| prototype
|align="right"| {{#lst:apertium-kaa/stats|stems}}
|align="center"| [[apertium-kaa#Current_State|~{{:apertium-kaa/stats/average}}%]]
| {{#lst:apertium-kaa/stats|location}}
| {{#lst:apertium-kaa/stats|authors}}
|-
| <code>[[apertium-uig]]</code>
|| [[Uyghur]]
|| {{#lst:apertium-uig/stats|nativename}}
|| <code>ug</code>
|| <code>uig</code>
|| HFST (lexc+twol)
|| prototype
|align="right"| {{#lst:apertium-uig/stats|stems}}
|align="center"| [[apertium-uig#Current_State|~{{:apertium-uig/stats/average}}%]]
| {{#lst:apertium-uig/stats|location}}
| {{#lst:apertium-uig/stats|authors}}
|-
|| <code>[[apertium-tgk]]</code>
|| Tajik
| забони тоҷикӣ
|| <code>tg</code>
|| <code>tgk</code>
|| [[lttoolbox]]
|| development
|align="right"|
|align="center"|
||
||
|}


=== Existing language pairs ===
=== Existing language pairs ===


{| style="text-align: center;" class="wikitable"
{| style="text-align: center;" class="wikitable dixtable"
|- style="background: #ececec"
|- style="background: #ececec"
! !! kaz !! kir !! tuk !! uzb !! uig !! tgk !! kaa !! prs
! !! kaz !! kir !! uzb !! kaa !! tuk !! uig !! tgk !! prs
|-
|-
| '''kaz''' || - || ''[[Apertium-kaz-kir|kaz-kir]]''<br>{{#lst:Apertium-kaz-kir/stats|kaz-kir-stems}} || || || || || ''[[Apertium-kaz-kaa|kaz-kaa]]''<br>{{#lst:Apertium-kaz-kaa/stats|kaz-kaa-stems}} ||
| '''kaz''' || - || [[Apertium-kaz-kir|kaz-kir]]<br>{{#lst:Apertium-kaz-kir/stats|kaz-kir_stems}} || || ''[[Apertium-kaz-kaa|kaz-kaa]]''<br>{{#lst:Apertium-kaz-kaa/stats|kaz-kaa_stems}} || || ''[[Apertium-kaz-uig|kaz-uig]]''<br>{{#lst:Apertium-kaz-uig/stats|kaz-uig_stems}} || ||
|-
|-
| '''kir''' || ''[[Apertium-kaz-kir|kaz-kir]]''<br>{{#lst:Apertium-kaz-kir/stats|kaz-kir-stems}} || - || || ''[[Apertium-kir-uzb|kir-uzb]]''<br>{{#lst:Apertium-kir-uzb/stats|kir-uzb-stems}} || || || ||
| '''kir''' || [[Apertium-kaz-kir|kaz-kir]]<br>{{#lst:Apertium-kaz-kir/stats|kaz-kir_stems}} || - || ''[[Apertium-kir-uzb|kir-uzb]]''<br>{{#lst:Apertium-kir-uzb/stats|kir-uzb_stems}} || || || || ||
|-
|-
| '''tuk''' || || || - || || || || ||
| '''uzb''' || || ''[[Apertium-kir-uzb|kir-uzb]]''<br>{{#lst:Apertium-kir-uzb/stats|kir-uzb_stems}} || - || || || || ||
|-
|-
| '''uzb''' || || ''[[Apertium-kir-uzb|kir-uzb]]''<br>{{#lst:Apertium-kir-uzb/stats|kir-uzb-stems}} || || - || || || ||
| '''kaa''' || ''[[Apertium-kaz-kaa|kaz-kaa]]''<br>{{#lst:Apertium-kaz-kaa/stats|kaz-kaa_stems}} || || || - || || || ||
|-
|-
| '''uig''' || || || || || - || || ||
| '''tuk''' || || || || || - || || ||
|-
|-
| '''tgk''' || || || || || || - || ||
| '''uig''' || ''[[Apertium-kaz-uig|kaz-uig]]''<br>{{#lst:Apertium-kaz-uig/stats|kaz-uig_stems}} || || || || || - || ||
|-
|-
| '''kaa''' || ''[[Apertium-kaz-kaa|kaz-kaa]]''<br>{{#lst:Apertium-kaz-kaa/stats|kaz-kaa-stems}} || || || || || || - ||
| '''tgk''' || || || || || || || - ||
|-
|-
| '''prs''' || || || || || || || || -
| '''prs''' || || || || || || || || -
Line 31: Line 124:
| || || || || || || || ||
| || || || || || || || ||
|-
|-
| '''eng''' || ''[[Apertium-eng-kaz|eng-kaz]]''<br>{{#lst:Apertium-eng-kaz/stats|eng-kaz-stems}} || ''[[Apertium-ky-en|ky-en]]''<br>{{#lst:Apertium-ky-en/stats|ky-en-stems}} || || || || || ||
| '''eng''' || ''[[Apertium-eng-kaz|eng-kaz]]''<br>{{#lst:Apertium-eng-kaz/stats|eng-kaz_stems}} || ''[[Apertium-ky-en|ky-en]]''<br>{{#lst:Apertium-ky-en/stats|ky-en_stems}} || || || || || ||
|-
|-
| '''fas''' || || || || || || [[Apertium-tg-fa|tg-fa]]<br>{{#lst:Apertium-tg-fa/stats|tg-fa-stems}} || ||
| '''fas''' || || || || || || || [[Apertium-tg-fa|tg-fa]]<br>{{#lst:Apertium-tg-fa/stats|tg-fa_stems}} ||
|-
|-
| '''khk''' || ''[[Apertium-khk-kaz|khk-kaz]]''<br>{{#lst:Apertium-khk-kaz/stats|khk-kaz-stems}} || || || || || || ||
| '''khk''' || ''[[Apertium-khk-kaz|khk-kaz]]''<br>{{#lst:Apertium-khk-kaz/stats|khk-kaz_stems}} || || || || || || ||
|-
|-
| '''nog''' || ''[[Apertium-nog-kaz|nog-kaz]]''<br>{{#lst:Apertium-nog-kaz/stats|nog-kaz-stems}} || || || || || || ||
| '''nog''' || ''[[Apertium-nog-kaz|nog-kaz]]''<br>{{#lst:Apertium-nog-kaz/stats|nog-kaz_stems}} || || || || || || ||
|-
|-
| '''tat''' || '''[[Apertium-kaz-tat|kaz-tat]]'''<br>'''{{#lst:Apertium-kaz-tat/stats|kaz-tat-stems}}''' || ''[[Apertium-tat-kir|tat-kir]]''<br>{{#lst:Apertium-tat-kir/stats|tat-kir-stems}} || || || || || ||
| '''tat''' || '''[[Apertium-kaz-tat|kaz-tat]]'''<br>'''{{#lst:Apertium-kaz-tat/stats|kaz-tat_stems}}''' || ''[[Apertium-tat-kir|tat-kir]]''<br>{{#lst:Apertium-tat-kir/stats|tat-kir_stems}} || || || || || ||
|-
|-
| '''tur''' || || [[Apertium-tur-kir|tur-kir]]<br>{{#lst:Apertium-tur-kir/stats|tur-kir-stems}} || [[Apertium-tuk-tur|tuk-tur]]<br>{{#lst:Apertium-tuk-tur/stats|tuk-tur-stems}} || [[Apertium-tur-uzb|tur-uzb]]<br>{{#lst:Apertium-tur-uzb/stats|tur-uzb-stems}} || || || ||
| '''tur''' || || [[Apertium-tur-kir|tur-kir]]<br>{{#lst:Apertium-tur-kir/stats|tur-kir_stems}} || [[Apertium-tur-uzb|tur-uzb]]<br>{{#lst:Apertium-tur-uzb/stats|tur-uzb_stems}} || || [[Apertium-tuk-tur|tuk-tur]]<br>{{#lst:Apertium-tuk-tur/stats|tuk-tur_stems}} || || ||
|}
|}

Latest revision as of 23:13, 22 December 2014

The languages of Central Asia include several Turkic and Iranian languages spoken in Kazakhstan, Uzbekistan, Kyrgyzstan, Turkmenistan, Tajikistan, and Afghanistan. These include Kazakh, Kyrgyz, Uzbek, Turkmen, Tajik, Dari, Pashto, Uyghur, and Karakalpak.

The master plan is to build an independent finite-state transducer for each language, and then to write a dictionary and transfer rules for every language pair. The current status of these goals is listed below.
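To give a rough sense of how much work "every pair" implies, here is a minimal Python sketch. It assumes that a pair is unordered (kaz-kir and kir-kaz count once) and that the relevant languages are the eight codes used as column headers in the pair matrix on this page. The names `LANGS`, `EXISTING`, `all_pairs` and `missing_pairs` are hypothetical and not part of any Apertium tool; `EXISTING` simply transcribes the pairs the matrix lists between those eight languages.

```python
from itertools import combinations

# Language codes taken from the column headers of the pair matrix on this page.
LANGS = ["kaz", "kir", "uzb", "kaa", "tuk", "uig", "tgk", "prs"]

# Pairs the matrix lists between those languages (hand-transcribed, illustrative).
EXISTING = {("kaz", "kir"), ("kaz", "kaa"), ("kaz", "uig"), ("kir", "uzb")}


def all_pairs(langs):
    """Return every unordered pair of distinct language codes."""
    return list(combinations(langs, 2))


def missing_pairs(langs, existing):
    """Return the pairs that have no entry in `existing`, ignoring direction."""
    have = {frozenset(pair) for pair in existing}
    return [pair for pair in all_pairs(langs) if frozenset(pair) not in have]


if __name__ == "__main__":
    pairs = all_pairs(LANGS)
    todo = missing_pairs(LANGS, EXISTING)
    print(f"{len(pairs)} possible pairs, {len(todo)} without a pair package yet")
```

Under these assumptions the count grows as n(n-1)/2, so each additional transducer adds one new potential pair per language already covered.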

== Status ==

=== Transducers ===

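{| class="wikitable sortable"
|-
!rowspan=2| name
!rowspan=2| Language
!rowspan=2| native name
!colspan=2 class="unsortable"| ISO 639
!rowspan=2| formalism
!rowspan=2| state
!rowspan=2| stems
!rowspan=2| coverage
!rowspan=2| location
!rowspan=2 class="unsortable"| primary authors
|-class="sortbottom"
! -2
! -3
|-
| <code>apertium-kaz</code> || Kazakh || қазақ тілі || <code>kk</code> || <code>kaz</code> || HFST (lexc+twol) || production ||align="right"| 36,595 ||align="center"| ~94.5% || apertium-kaz (languages) || Ilnar, Jonathan, Fran, Nathan
|-
| <code>apertium-kir</code> || Kyrgyz || кыргыз тили || <code>ky</code> || <code>kir</code> || HFST (lexc+twol) || working ||align="right"| 8,564 ||align="center"| ~90.4% || apertium-kir (languages) || Jonathan, Mirlan, Fran, Qantörö
|-
| <code>apertium-uzb</code> || Uzbek || o'zbek tili || <code>uz</code> || <code>uzb</code> || HFST (lexc+twol) || development ||align="right"| 34,470 ||align="center"| ~82.9% || ||
|-
| <code>apertium-tuk</code> || Turkmen || Türkmençe || <code>tk</code> || <code>tuk</code> || HFST (lexc+twol) || development ||align="right"| 2,988 ||align="center"| ~70.7% || apertium-tuk (languages) || Fran
|-
| <code>apertium-kaa</code> || Karakalpak || Qaraqalpaq tili || <code>-</code> || <code>kaa</code> || HFST (lexc+twol) || prototype ||align="right"| 25,545 ||align="center"| ~86.1% || apertium-kaa (languages) || Beknazar, Fran, Jonathan
|-
| <code>apertium-uig</code> || Uyghur || ئۇيغۇر تىلى || <code>ug</code> || <code>uig</code> || HFST (lexc+twol) || prototype ||align="right"| 17,585 ||align="center"| ~54.2% || apertium-uig (incubator) || Jonathan, Märdan, Fran
|-
| <code>apertium-tgk</code> || Tajik || забони тоҷикӣ || <code>tg</code> || <code>tgk</code> || lttoolbox || development ||align="right"| ||align="center"| || ||
|}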

=== Existing language pairs ===

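{| style="text-align: center;" class="wikitable dixtable"
|- style="background: #ececec"
! !! kaz !! kir !! uzb !! kaa !! tuk !! uig !! tgk !! prs
|-
| '''kaz''' || - || kaz-kir<br>? || || ''kaz-kaa''<br>5,408 || || ''kaz-uig''<br>2,728 || ||
|-
| '''kir''' || kaz-kir<br>? || - || ''kir-uzb''<br>268 || || || || ||
|-
| '''uzb''' || || ''kir-uzb''<br>268 || - || || || || ||
|-
| '''kaa''' || ''kaz-kaa''<br>5,408 || || || - || || || ||
|-
| '''tuk''' || || || || || - || || ||
|-
| '''uig''' || ''kaz-uig''<br>2,728 || || || || || - || ||
|-
| '''tgk''' || || || || || || || - ||
|-
| '''prs''' || || || || || || || || -
|-
| || || || || || || || ||
|-
| '''eng''' || ''eng-kaz''<br>16,931 || ''ky-en''<br>? || || || || || ||
|-
| '''fas''' || || || || || || || tg-fa<br>502 ||
|-
| '''khk''' || ''khk-kaz''<br>134 || || || || || || ||
|-
| '''nog''' || ''nog-kaz''<br>9 || || || || || || ||
|-
| '''tat''' || '''kaz-tat''' || ''tat-kir'' || || || || || ||
|-
| '''tur''' || || tur-kir<br>7,123 || tur-uzb<br>3,519 || || tuk-tur<br>3,387 || || ||
|}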