Difference between revisions of "Languages of the Volga-Kama region"
(20 intermediate revisions by 3 users not shown)
Line 5: | Line 5: | ||
==Status== |
==Status== |
||
The ultimate goal is to have multi-purposable transducers for a variety of |
The ultimate goal is to have multi-purposable transducers for a variety of Volga-Kama languages. These can then be paired for X→Y translation with the addition of a CG for language X and transfer rules / dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs. |
||
===Transducers=== |
===Transducers=== |
||
Line 15: | Line 15: | ||
!rowspan=2| Language |
!rowspan=2| Language |
||
!colspan=2 class="unsortable"| ISO 639 |
!colspan=2 class="unsortable"| ISO 639 |
||
!rowspan=2| speakers |
|||
!rowspan=2| UNESCO |
|||
!rowspan=2| formalism |
!rowspan=2| formalism |
||
!rowspan=2| state |
!rowspan=2| state |
||
Line 26: | Line 24: | ||
! -2 |
! -2 |
||
! -3 |
! -3 |
||
|- |
|||
| <code>[[apertium-myv]]</code> |
|||
|| [[Erzya]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>myv</code> |
|||
|| [[HFST|HFST (lexc+twol)]] |
|||
|| development |
|||
|align="right"| {{#lst:Apertium-myv-fin/stats|myv_stems}} |
|||
|align="center"| |
|||
|| [[apertium-myv-fin]] ([[incubator]]) |
|||
|| [[User:Francis_Tyers|Fran]], Jack Rueter |
|||
|- |
|- |
||
|| <code>[[apertium-tat]]</code> |
|| <code>[[apertium-tat]]</code> |
||
Line 31: | Line 40: | ||
|| <code>tt</code> |
|| <code>tt</code> |
||
|| <code>tat</code> |
|| <code>tat</code> |
||
|align="right"| 6500K |
|||
|align="right"| 0 (none) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| {{#lst:Apertium-tat/stats|state}} |
|||
|| production |
|||
|align="right"| {{#lst:Apertium-tat/stats|stems}} |
|align="right"| {{#lst:Apertium-tat/stats|stems}} |
||
|align="center"| [[Apertium-tat#Current_State|~{{:Apertium-tat/stats/average}}%]] |
|align="center"| [[Apertium-tat#Current_State|~{{:Apertium-tat/stats/average}}%]] |
||
|| {{#lst:Apertium-tat/stats|location}} |
|||
|| [[apertium-tat]] ([[languages]]) |
|||
|| {{#lst:Apertium-tat/stats|authors}} |
|||
|| [[User:Ilnar.salimzyan|Ilnar]], [[User:Francis Tyers|Fran]], [[User:Firespeaker|Jonathan]], Milli |
|||
|- |
|- |
||
| <code>[[apertium-chv]]</code> |
| <code>[[apertium-chv]]</code> |
||
Line 44: | Line 51: | ||
|| <code>cv</code> |
|| <code>cv</code> |
||
|| <code>chv</code> |
|| <code>chv</code> |
||
|align="right"| 1325K |
|||
|align="right"| 1 (vulnerable) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| development |
|| development |
||
|align="right"| {{:apertium-chv/stems}} |
|align="right"| {{:apertium-chv/stems}} |
||
|align="center"| [[apertium-chv#Current_State|~{{:apertium-chv/stats/average}}%]] |
|align="center"| [[apertium-chv#Current_State|~{{:apertium-chv/stats/average}}%]] |
||
Line 57: | Line 62: | ||
|| <code>ba</code> |
|| <code>ba</code> |
||
|| <code>bak</code> |
|| <code>bak</code> |
||
|align="right"| 1379K |
|||
|align="right"| 1 (vulnerable) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| development |
|| development |
||
|align="right"| {{:apertium-bak/stems}} |
|align="right"| {{:apertium-bak/stems}} |
||
|align="center"| [[apertium-bak#Current_State|~{{:apertium-bak/stats/average}}%]] |
|align="center"| [[apertium-bak#Current_State|~{{:apertium-bak/stats/average}}%]] |
||
|| [[apertium-bak]] ([[languages]]) |
|| [[apertium-bak]] ([[languages]]) |
||
|| [[User:Francis Tyers|Fran]], [[User:Firespeaker|Jonathan]], [[User:Ilnar.salimzyan|Ilnar]], Milli |
|| [[User:Francis Tyers|Fran]], [[User:Firespeaker|Jonathan]], [[User:Ilnar.salimzyan|Ilnar]], Milli |
||
|- |
|||
| <code>[[apertium-mrj]]</code> |
|||
|| [[Hill Mari]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>mrj</code> |
|||
|| [[HFST|HFST (lexc+twol)]] |
|||
|| development |
|||
|align="right"| {{#lst:Apertium-mrj-fin/stats|mrj_stems}} |
|||
|align="center"| |
|||
|| [[apertium-mrj-fin]] ([[incubator]]) |
|||
|| [[User:Francis_Tyers|Fran]], kuprina, jackrueter |
|||
|- |
|- |
||
| <code>[[apertium-udm]]</code> |
| <code>[[apertium-udm]]</code> |
||
Line 70: | Line 84: | ||
|align="center"| <code>–</code> |
|align="center"| <code>–</code> |
||
|| <code>udm</code> |
|| <code>udm</code> |
||
|align="right"| 464K |
|||
|align="right"| 2 (definitely endangered) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| |
|| prototype |
||
|align="right"| |
|align="right"| {{#lst:Apertium-udm-rus/stats|udm_stems}} |
||
|align="center"| |
|align="center"| |
||
|| |
|| [[apertium-udm-rus]] ([[nursery]]) |
||
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], [[User:Andrewboltachev|Andrey]], Лукерья, Алексей |
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], [[User:Andrewboltachev|Andrey]], Лукерья, Алексей |
||
|- |
|- |
||
| <code>[[apertium- |
| <code>[[apertium-kpv]]</code> |
||
|| [[ |
|| [[Komi-Zyrian]] |
||
|align="center"|<code>–</code> |
|align="center"|<code>–</code> |
||
|| <code> |
|| <code>kpv</code> |
||
|align="right"| 400K |
|||
|align="right"| 2 (definitely endangered) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| |
|| prototype |
||
|align="right"| {{#lst:Apertium- |
|align="right"| {{#lst:Apertium-kpv-mhr/stats|kpv_stems}} |
||
|align="center"| |
|align="center"| |
||
|| [[apertium- |
|| [[apertium-kpv-mhr]] ([[incubator]]) |
||
|| [[User:Francis_Tyers|Fran]], |
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], Fedina, Andrei Chemyshev |
||
|- |
|- |
||
| <code>[[apertium-mhr]]</code> |
| <code>[[apertium-mhr]]</code> |
||
|| [[ |
|| [[Meadow Mari]] |
||
|align="center"|<code>–</code> |
|align="center"|<code>–</code> |
||
|| <code>mhr</code> |
|| <code>mhr</code> |
||
|align="right"| 414K |
|||
|align="right"| 2 (definitely endangered) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| |
|| prototype |
||
|align="right"| |
|align="right"| {{#lst:Apertium-kpv-mhr/stats|mhr_stems}} |
||
|align="center"| |
|align="center"| |
||
|| [[apertium-kpv-mhr]] ([[incubator]]) |
|| [[apertium-kpv-mhr]] ([[incubator]]) |
||
|| [[User:Francis_Tyers|Fran]], Fedina, |
|| [[User:Francis_Tyers|Fran]], Fedina, Andrei Chemyshev |
||
|- |
|||
| <code>[[apertium-kpv]]</code> |
|||
|| [[Komi-Zyrian]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>kpv</code> |
|||
|align="right"| 217K |
|||
|align="right"| 2 (definitely endangered) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|||
|| development |
|||
|align="right"| ? |
|||
|align="center"| ? |
|||
|| [[apertium-kpv-mhr]] ([[incubator]]) <br> [[apertium-kpv-fin]] ([[incubator]]) |
|||
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], Fedina, chemyshev |
|||
|- |
|||
| <code>[[apertium-mdf]]</code> |
|||
|| [[Moksha]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>mdf</code> |
|||
|align="right"| 200K |
|||
|align="right"| 2 (definitely endangered) |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|- |
|||
| <code>[[apertium-koi]]</code> |
|||
|| [[Komi-Permyak]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>koi</code> |
|||
|align="right"| 94K |
|||
|align="right"| 2 (definitely endangered) |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|- |
|||
| <code>[[apertium-mrj]]</code> |
|||
|| [[Western Mari]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>mrj</code> |
|||
|align="right"| 37K |
|||
|align="right"| 3 (severely endangered) |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|- |
|||
| <code>[[apertium-kpvyaz]]</code> |
|||
|| [[Komi-Yazva]] |
|||
|align="center"|<code>–</code> |
|||
|align="center"|<code>–</code> |
|||
|align="right"| 0K |
|||
|align="right"| 3 (severely endangered) |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|} |
|} |
||
=== Existing language pairs === |
=== Existing language pairs === |
||
==== Volga-Kama–Volga-Kama pairs ==== |
|||
Text in ''italic'' denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in '''bold''' denotes a stable well-working language pair in trunk. |
Text in ''italic'' denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in '''bold''' denotes a stable well-working language pair in trunk. |
||
{| style="text-align: center;" class="wikitable" |
{| style="text-align: center;" class="wikitable dixtable" |
||
|- style="background: #ececec" |
|- style="background: #ececec" |
||
! |
! !! tat !! chv !! bak !! mrj !! udm !! mhr !! myv !! kpv |
||
|- |
|- |
||
| '''tat''' || - |
| '''tat''' || - || ''[[Apertium-chv-tat|chv-tat]]''<br>{{#lst:Apertium-chv-tat/stats|chv-tat_stems}} || [[Apertium-tat-bak|tat-bak]]<br>{{#lst:Apertium-tat-bak/stats|tat-bak_stems}} || || || || || |
||
|- |
|- |
||
| '''chv''' || ''[[ |
| '''chv''' || ''[[Apertium-chv-tat|chv-tat]]''<br>{{#lst:Apertium-chv-tat/stats|chv-tat_stems}} || - || || || || || || |
||
|- |
|- |
||
| '''bak''' || |
| '''bak''' || [[Apertium-tat-bak|tat-bak]]<br>{{#lst:Apertium-tat-bak/stats|tat-bak_stems}} || || - || || || || || |
||
|- |
|- |
||
| ''' |
| '''mrj''' || || || || - || || || || |
||
|- |
|- |
||
| ''' |
| '''udm''' || || || || || - || || || |
||
|- |
|- |
||
| ''' |
| '''mhr''' || || || || || || - || || ''[[Apertium-kpv-mhr|kpv-mhr]]''<br>{{#lst:Apertium-kpv-mhr/stats|kpv-mhr_stems}} |
||
|} |
|||
==== Pairs with non–Volga-Kama languages ==== |
|||
{| style="text-align: center;" class="wikitable" |
|||
|- style="background: #ececec" |
|||
! !! tat !! chv !! bak !! udm !! mhr !! kpv |
|||
|- |
|- |
||
| |
| '''myv''' || || || || || || || - || |
||
|- |
|- |
||
| |
| '''kpv''' || || || || || || ''[[Apertium-kpv-mhr|kpv-mhr]]''<br>{{#lst:Apertium-kpv-mhr/stats|kpv-mhr_stems}} || || - |
||
|- |
|- |
||
| |
| || || || || || || || || |
||
|- |
|- |
||
| |
| '''fin''' || || || || ''[[Apertium-mrj-fin|mrj-fin]]''<br>{{#lst:Apertium-mrj-fin/stats|mrj-fin_stems}} || ''[[Apertium-fin-udm|fin-udm]]''<br>{{#lst:Apertium-fin-udm/stats|fin-udm_stems}} || || ''[[Apertium-myv-fin|myv-fin]]''<br>{{#lst:Apertium-myv-fin/stats|myv-fin_stems}} || ''[[Apertium-kpv-fin|kpv-fin]]''<br>{{#lst:Apertium-kpv-fin/stats|kpv-fin_stems}} |
||
|} |
|||
==== Table of dix progress ==== |
|||
{| style="text-align: center;" class="wikitable" |
|||
|- style="background: #ececec" |
|||
! !! tat !! chv !! bak !! udm !! mhr !! kpv |
|||
|- |
|- |
||
| ''' |
| '''kaz''' || '''[[Apertium-kaz-tat|kaz-tat]]'''<br>'''{{#lst:Apertium-kaz-tat/stats|kaz-tat_stems}}''' || || || || || || || |
||
|- |
|- |
||
| ''' |
| '''kir''' || ''[[Apertium-tat-kir|tat-kir]]''<br>{{#lst:Apertium-tat-kir/stats|tat-kir_stems}} || || || || || || || |
||
|- |
|- |
||
| '''rus''' || [[Apertium-tat-rus|tat-rus]]<br>{{#lst:Apertium-tat-rus/stats|tat-rus_stems}} || ''[[Apertium-cv-ru|cv-ru]]''<br>{{#lst:Apertium-cv-ru/stats|cv-ru_stems}} || || || [[Apertium-udm-rus|udm-rus]]<br>{{#lst:Apertium-udm-rus/stats|udm-rus_stems}} || || || |
|||
| '''bak''' || ? || || - || || || |
|||
|- |
|- |
||
| '''tur''' || [[Apertium-tur-tat|tur-tat]]<br>{{#lst:Apertium-tur-tat/stats|tur-tat_stems}} || ''[[Apertium-cv-tr|cv-tr]]''<br>{{#lst:Apertium-cv-tr/stats|cv-tr_stems}} || || || || || || |
|||
| '''udm''' || || || || - || || |
|||
|- |
|||
| '''mhr''' || || || || || - || ? |
|||
|- |
|||
| '''kpv''' || || || || || ? || - |
|||
|- |
|||
| || || || || || || |
|||
|- |
|||
| '''ru''' || ? || ? || || ? || || |
|||
|- |
|||
| '''ky''' || ? || || || || || |
|||
|- |
|||
| '''tr''' || || ? || || || || |
|||
|- |
|||
| '''fin''' || || || || ? || || ? |
|||
|} |
|} |
||
== The languages == |
== The languages == |
||
The following table shows information about Volga-Kama varieties and information about apertium projects related to the languages. |
|||
=== Volga-Kama languages by subgroup === |
|||
* [[Turkic languages|Turkic]] |
|||
** North Qıpçaq: [[Tatar]], [[Bashqort]] |
|||
** Oğur: [[Chuvash]] |
|||
* [[Uralic languages|Uralic]] → Finno-Ugric → Finno-Permic |
|||
** Permic: [[Komi]] (Komi-Zyrian, Komi-Permyak, Komi-Yazva), [[Udmurt]] |
|||
** Finno-Volgaic |
|||
*** [[Mari]]: [[Meadow Mari|Meadow Mari (Eastern)]], [[Hill Mari|Hill Mari (Western)]] |
|||
*** [[Mordvin]]: [[Erzya]], [[Moksha]] |
|||
=== Volga-Kama language vulnerability === |
|||
The following table shows information about Volga-Kama varieties. |
|||
{|class="wikitable sortable" |
{|class="wikitable sortable" |
||
|- |
|- |
||
! language !! iso !! num speakers !! UNESCO classification |
! language !! iso !! num speakers !! UNESCO classification |
||
|- |
|- |
||
| Tatar || <code>tat</code> || 6500K || 0. none |
| Tatar || <code>tat</code> || 6500K || 0. none |
||
incubator/apertium-tt-kk/ |
|||
incubator/apertium-tt-ky/ |
|||
incubator/apertium-tt-ru/ |
|||
incubator/apertium-cv-tt/ |
|||
nursery/apertium-tt-ba/ |
|||
|- |
|- |
||
| Bashqort || <code>bak</code> || 1379K || 1. vulnerable |
| Bashqort || <code>bak</code> || 1379K || 1. vulnerable |
||
|- |
|- |
||
| Chuvash || <code>chv</code> || 1325K || 1. vulnerable |
| Chuvash || <code>chv</code> || 1325K || 1. vulnerable |
||
incubator/apertium-cv-tr/ |
|||
incubator/apertium-cv-tt/ |
|||
|- |
|- |
||
| Udmurt || <code>udm</code> || 0464K || 2. definitely endangered |
| Udmurt || <code>udm</code> || 0464K || 2. definitely endangered |
||
nursery/apertium-udm-rus/ |
|||
|- |
|- |
||
| Mari - Eastern || <code>mhr</code> || 0414K || 2. definitely endangered |
| Mari - Eastern || <code>mhr</code> || 0414K || 2. definitely endangered |
||
|- |
|- |
||
| Mordvin - Erzya || <code>myv</code> || 0400K || 2. definitely endangered |
| Mordvin - Erzya || <code>myv</code> || 0400K || 2. definitely endangered |
||
|- |
|- |
||
| Komi - Zyryan || <code>kpv</code> || 0217K || 2. definitely endangered |
| Komi - Zyryan || <code>kpv</code> || 0217K || 2. definitely endangered |
||
|- |
|- |
||
| Mordvin - Moksha || <code>mdf</code> || 0200K || 2. definitely endangered |
| Mordvin - Moksha || <code>mdf</code> || 0200K || 2. definitely endangered |
||
|- |
|- |
||
| Komi - Permyak || <code>koi</code> || 0094K || 2. definitely endangered |
| Komi - Permyak || <code>koi</code> || 0094K || 2. definitely endangered |
||
|- |
|- |
||
| Mari - Western || <code>mrj</code> || 0037K || 3. severely endangered |
| Mari - Western || <code>mrj</code> || 0037K || 3. severely endangered |
||
|- |
|- |
||
| Komi - Yazva || <code>koi</code> || 0000K || 3. severely endangered |
| Komi - Yazva || <code>koi</code> || 0000K || 3. severely endangered |
||
|} |
|} |
||
Latest revision as of 23:25, 22 December 2014
The languages of the Volga-Kama region comprise several Turkic and Uralic languages spoken along the Volga and Kama rivers in Russia. These include [varieties of] Tatar, Bashqort, Chuvash, Mari, Komi, Mordvin, and Udmurt (and linguistically, to some extent, Russian).
The plan is to build an independent finite-state transducer for each language, and then to write a dictionary and transfer rules for each language pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Volga-Kama languages. Two of these can then be paired for X→Y translation by adding a CG (constraint grammar) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.
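As a rough illustration of the pairing described above, the following sketch lists the components an X→Y pair needs and reports which are still missing. The function names and the `kind:code` labels are hypothetical and exist only for this example; they are not part of any Apertium tool.

```python
def required_components(source, target):
    """Components needed for source→target translation, per the text above.

    The "kind:code" labels are an illustrative convention, not Apertium names.
    """
    pair = f"{source}-{target}"
    return {
        f"transducer:{source}",   # morphological transducer for X
        f"transducer:{target}",   # morphological transducer for Y
        f"cg:{source}",           # constraint grammar for X
        f"transfer:{pair}",       # transfer rules for X→Y
        f"dictionary:{pair}",     # bilingual dictionary for X→Y
    }


def missing_components(source, target, available):
    """Return the still-missing components, sorted for stable output."""
    return sorted(required_components(source, target) - set(available))


if __name__ == "__main__":
    # Assumed state: both transducers exist, nothing pair-specific yet.
    have = {"transducer:tat", "transducer:bak"}
    print(missing_components("tat", "bak", have))
```

Because the transducers are per-language rather than per-pair, each new pair only adds the CG, transfer rules and dictionary to the list of outstanding work.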
Transducers
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
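The thresholds above can be sketched as a small helper. This is a minimal illustration, assuming that coverage means the percentage of corpus tokens that receive at least one analysis; the function names and the "not yet working" label for values under 80% are assumptions made for this example, since the text only names the "working" and "production" levels.

```python
def coverage_percent(analysed_tokens, total_tokens):
    """Percentage of tokens with at least one analysis (assumed definition)."""
    if total_tokens <= 0:
        raise ValueError("total_tokens must be positive")
    return 100.0 * analysed_tokens / total_tokens


def classify_coverage(coverage):
    """Map a coverage percentage (0-100) to the labels used in the text."""
    if not 0.0 <= coverage <= 100.0:
        raise ValueError("coverage must be between 0 and 100")
    if coverage > 90.0:
        return "production"
    if coverage >= 80.0:
        return "working"
    # Hypothetical label: the text does not name the state below ~80%.
    return "not yet working"


if __name__ == "__main__":
    pct = coverage_percent(850, 1000)
    print(pct, classify_coverage(pct))
```

Since the text asks for coverage "on a range of medium-large corpora", a real check would apply this to several corpora rather than a single one.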
Existing language pairs
Text in ''italic'' denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in '''bold''' denotes a stable, well-working language pair in trunk.
| | tat | chv | bak | mrj | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|---|
| tat | - | ''chv-tat'' 198 | tat-bak 2,941 | | | | | |
| chv | ''chv-tat'' 198 | - | | | | | | |
| bak | tat-bak 2,941 | | - | | | | | |
| mrj | | | | - | | | | |
| udm | | | | | - | | | |
| mhr | | | | | | - | | ''kpv-mhr'' 127 |
| myv | | | | | | | - | |
| kpv | | | | | | ''kpv-mhr'' 127 | | - |
| fin | | | | ''mrj-fin'' 273 | ''fin-udm'' 93 | | ''myv-fin'' 401 | ''kpv-fin'' 1 |
| kaz | '''kaz-tat''' | | | | | | | |
| kir | ''tat-kir'' | | | | | | | |
| rus | tat-rus 5,999 | ''cv-ru'' 75 | | | udm-rus 148 | | | |
| tur | tur-tat 3,317 | ''cv-tr'' 100 | | | | | | |
The languages
Volga-Kama languages by subgroup
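- Turkic
  - North Qıpçaq: Tatar, Bashqort
  - Oğur: Chuvash
- Uralic → Finno-Ugric → Finno-Permic
  - Permic: Komi (Komi-Zyrian, Komi-Permyak, Komi-Yazva), Udmurt
  - Finno-Volgaic
    - Mari: Meadow Mari (Eastern), Hill Mari (Western)
    - Mordvin: Erzya, Moksha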
Volga-Kama language vulnerability
The following table shows information about Volga-Kama varieties.
language | iso | num speakers | UNESCO classification |
---|---|---|---|
Tatar | tat | 6500K | 0. none |
Bashqort | bak | 1379K | 1. vulnerable |
Chuvash | chv | 1325K | 1. vulnerable |
Udmurt | udm | 0464K | 2. definitely endangered |
Mari - Eastern | mhr | 0414K | 2. definitely endangered |
Mordvin - Erzya | myv | 0400K | 2. definitely endangered |
Komi - Zyryan | kpv | 0217K | 2. definitely endangered |
Mordvin - Moksha | mdf | 0200K | 2. definitely endangered |
Komi - Permyak | koi | 0094K | 2. definitely endangered |
Mari - Western | mrj | 0037K | 3. severely endangered |
Komi - Yazva | koi | 0000K | 3. severely endangered |
Existing general resources
Grammars
Dictionaries
Existing computational resources
Corpora and corpora projects
Spell-checkers
Text-to-speech and speech-to-text systems
Keyboards
- Xkb includes keyboards for the following languages:
  - Tatar
  - Chuvash
  - ...?