Difference between revisions of "Languages of the Volga-Kama region"
(Update to new style) |
|||
Line 15: | Line 15: | ||
!rowspan=2| Language |
!rowspan=2| Language |
||
!colspan=2 class="unsortable"| ISO 639 |
!colspan=2 class="unsortable"| ISO 639 |
||
!rowspan=2| speakers |
|||
!rowspan=2| UNESCO |
|||
!rowspan=2| formalism |
!rowspan=2| formalism |
||
!rowspan=2| state |
!rowspan=2| state |
||
Line 26: | Line 24: | ||
! -2 |
! -2 |
||
! -3 |
! -3 |
||
|- |
|||
| <code>[[apertium-myv]]</code> |
|||
|| [[Erzya]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>myv</code> |
|||
|| [[HFST|HFST (lexc+twol)]] |
|||
|| development |
|||
|align="right"| {{#lst:Apertium-myv-fin/stats|myv-stems}} |
|||
|align="center"| |
|||
|| [[apertium-myv-fin]] ([[incubator]]) |
|||
|| [[User:Francis_Tyers|Fran]], Jack Rueter |
|||
|- |
|- |
||
|| <code>[[apertium-tat]]</code> |
|| <code>[[apertium-tat]]</code> |
||
Line 31: | Line 40: | ||
|| <code>tt</code> |
|| <code>tt</code> |
||
|| <code>tat</code> |
|| <code>tat</code> |
||
|align="right"| 6500K |
|||
|align="right"| 0 (none) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| |
|| working |
||
|align="right"| {{#lst:Apertium-tat/stats|stems}} |
|align="right"| {{#lst:Apertium-tat/stats|stems}} |
||
|align="center"| [[Apertium-tat#Current_State|~{{:Apertium-tat/stats/average}}%]] |
|align="center"| [[Apertium-tat#Current_State|~{{:Apertium-tat/stats/average}}%]] |
||
Line 44: | Line 51: | ||
|| <code>cv</code> |
|| <code>cv</code> |
||
|| <code>chv</code> |
|| <code>chv</code> |
||
|align="right"| 1325K |
|||
|align="right"| 1 (vulnerable) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| development |
|| development |
||
|align="right"| {{:apertium-chv/stems}} |
|align="right"| {{:apertium-chv/stems}} |
||
|align="center"| [[apertium-chv#Current_State|~{{:apertium-chv/stats/average}}%]] |
|align="center"| [[apertium-chv#Current_State|~{{:apertium-chv/stats/average}}%]] |
||
Line 57: | Line 62: | ||
|| <code>ba</code> |
|| <code>ba</code> |
||
|| <code>bak</code> |
|| <code>bak</code> |
||
|align="right"| 1379K |
|||
|align="right"| 1 (vulnerable) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| development |
|| development |
||
|align="right"| {{:apertium-bak/stems}} |
|align="right"| {{:apertium-bak/stems}} |
||
|align="center"| [[apertium-bak#Current_State|~{{:apertium-bak/stats/average}}%]] |
|align="center"| [[apertium-bak#Current_State|~{{:apertium-bak/stats/average}}%]] |
||
Line 70: | Line 73: | ||
|align="center"| <code>–</code> |
|align="center"| <code>–</code> |
||
|| <code>udm</code> |
|| <code>udm</code> |
||
|align="right"| 464K |
|||
|align="right"| 2 (definitely endangered) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| development |
|| development |
||
|align="right"| |
|align="right"| {{#lst:Apertium-udm-rus/stats|udm-stems}} |
||
|align="center"| |
|align="center"| |
||
|| [[apertium-fin-udm]] ([[incubator]]) <br> [[apertium-udm-rus]] ([[nursery]]) |
|| [[apertium-fin-udm]] ([[incubator]]) <br> [[apertium-udm-rus]] ([[nursery]]) |
||
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], [[User:Andrewboltachev|Andrey]], Лукерья, Алексей |
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], [[User:Andrewboltachev|Andrey]], Лукерья, Алексей |
||
|- |
|- |
||
| <code>[[apertium- |
| <code>[[apertium-kpv]]</code> |
||
|| [[ |
|| [[Komi-Zyrian]] |
||
|align="center"|<code>–</code> |
|align="center"|<code>–</code> |
||
|| <code> |
|| <code>kpv</code> |
||
|align="right"| 400K |
|||
|align="right"| 2 (definitely endangered) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| development |
|| development |
||
|align="right"| {{#lst:Apertium- |
|align="right"| {{#lst:Apertium-kpv-mhr/stats|kpv-stems}} |
||
|align="center"| |
|align="center"| |
||
|| [[apertium- |
|| [[apertium-kpv-mhr]] ([[incubator]]) <br> [[apertium-kpv-fin]] ([[incubator]]) |
||
|| [[User:Francis_Tyers|Fran]], |
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], Fedina, Andrei Chemyshev |
||
|- |
|- |
||
| <code>[[apertium-mhr]]</code> |
| <code>[[apertium-mhr]]</code> |
||
|| [[ |
|| [[Meadow Mari]] |
||
|align="center"|<code>–</code> |
|align="center"|<code>–</code> |
||
|| <code>mhr</code> |
|| <code>mhr</code> |
||
|align="right"| 414K |
|||
|align="right"| 2 (definitely endangered) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|| [[HFST|HFST (lexc+twol)]] |
||
|| development |
|| development |
||
|align="right"| {{#lst:Apertium-kpv-mhr/stats|mhr-stems}} |
|align="right"| {{#lst:Apertium-kpv-mhr/stats|mhr-stems}} |
||
|align="center"| |
|align="center"| |
||
|| [[apertium-kpv-mhr]] ([[incubator]]) |
|| [[apertium-kpv-mhr]] ([[incubator]]) |
||
|| [[User:Francis_Tyers|Fran]], Fedina, |
|| [[User:Francis_Tyers|Fran]], Fedina, Andrei Chemyshev |
||
|- |
|||
| <code>[[apertium-kpv]]</code> |
|||
|| [[Komi-Zyrian]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>kpv</code> |
|||
|align="right"| 217K |
|||
|align="right"| 2 (definitely endangered) |
|||
|| [[HFST|HFST (lexc+twol)]] |
|||
|| development |
|||
|align="right"| ? |
|||
|align="center"| ? |
|||
|| [[apertium-kpv-mhr]] ([[incubator]]) <br> [[apertium-kpv-fin]] ([[incubator]]) |
|||
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], Fedina, chemyshev |
|||
|- |
|||
| <code>[[apertium-mdf]]</code> |
|||
|| [[Moksha]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>mdf</code> |
|||
|align="right"| 200K |
|||
|align="right"| 2 (definitely endangered) |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|- |
|||
| <code>[[apertium-koi]]</code> |
|||
|| [[Komi-Permyak]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>koi</code> |
|||
|align="right"| 94K |
|||
|align="right"| 2 (definitely endangered) |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|- |
|||
| <code>[[apertium-mrj]]</code> |
|||
|| [[Western Mari]] |
|||
|align="center"|<code>–</code> |
|||
|| <code>mrj</code> |
|||
|align="right"| 37K |
|||
|align="right"| 3 (severely endangered) |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|- |
|||
| <code>[[apertium-kpvyaz]]</code> |
|||
|| [[Komi-Yazva]] |
|||
|align="center"|<code>–</code> |
|||
|align="center"|<code>–</code> |
|||
|align="right"| 0K |
|||
|align="right"| 3 (severely endangered) |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|align="center"| – |
|||
|} |
|} |
||
Line 179: | Line 111: | ||
{| style="text-align: center;" class="wikitable" |
{| style="text-align: center;" class="wikitable" |
||
|- style="background: #ececec" |
|- style="background: #ececec" |
||
! !! tat !! chv !! bak !! udm !! mhr !! kpv |
! !! tat !! chv !! bak !! udm !! mhr !! myv !! kpv |
||
|- |
|- |
||
| '''tat''' || - || ''[[apertium-cv-tt|cv-tt]]'' || ''[[apertium-tat-bak|tat-bak]]'' || || || |
| '''tat''' || - || ''[[apertium-cv-tt|cv-tt]]'' || ''[[apertium-tat-bak|tat-bak]]'' || || || || |
||
|- |
|- |
||
| '''chv''' || ''[[apertium-cv-tt|cv-tt]]'' || - || || || || |
| '''chv''' || ''[[apertium-cv-tt|cv-tt]]'' || - || || || || || |
||
|- |
|- |
||
| '''bak''' || ''[[apertium-tat-bak|tat-bak]]'' || || - || || || |
| '''bak''' || ''[[apertium-tat-bak|tat-bak]]'' || || - || || || || |
||
|- |
|- |
||
| '''udm''' || || || || - || || |
| '''udm''' || || || || - || || || |
||
|- |
|- |
||
| '''mhr''' || || || || || - || ''[[apertium-kpv-mhr|kpv-mhr]]'' |
| '''mhr''' || || || || || - || || ''[[apertium-kpv-mhr|kpv-mhr]]'' |
||
|- |
|- |
||
| ''' |
| '''myv''' || || || || || || - || |
||
|- |
|||
| '''kpv''' || || || || || ''[[apertium-kpv-mhr|kpv-mhr]]'' || || - |
|||
|} |
|} |
||
Line 197: | Line 131: | ||
{| style="text-align: center;" class="wikitable" |
{| style="text-align: center;" class="wikitable" |
||
|- style="background: #ececec" |
|- style="background: #ececec" |
||
! !! tat !! chv !! bak !! udm !! mhr !! kpv |
! !! tat !! chv !! bak !! udm !! mhr !! myv !!kpv |
||
|- |
|- |
||
| '''ru''' || ''[[apertium-tt-ru|tt-ru]]'' || ''[[apertium-cv-ru|cv-ru]]'' || || ''[[apertium-udm-rus|udm-rus]]'' || || |
| '''ru''' || ''[[apertium-tt-ru|tt-ru]]'' || ''[[apertium-cv-ru|cv-ru]]'' || || ''[[apertium-udm-rus|udm-rus]]'' || || || |
||
|- |
|- |
||
| '''ky''' || ''[[apertium-tt-ky|tt-ky]]'' || || || || || |
| '''ky''' || ''[[apertium-tt-ky|tt-ky]]'' || || || || || || |
||
|- |
|- |
||
| '''tr''' || || ''[[apertium-cv-tr|cv-tr]]'' || || || || |
| '''tr''' || || ''[[apertium-cv-tr|cv-tr]]'' || || || || || |
||
|- |
|- |
||
| '''fin''' || || || || ''[[apertium-fin-udm|fin-udm]]'' || || ''[[apertium-kpv-fin|kpv-fin]]'' |
| '''fin''' || || || || ''[[apertium-fin-udm|fin-udm]]'' || || ''[[apertium-myv-fin|myv-fin]]'' || ''[[apertium-kpv-fin|kpv-fin]]'' |
||
|} |
|} |
||
Line 211: | Line 145: | ||
{| style="text-align: center;" class="wikitable" |
{| style="text-align: center;" class="wikitable" |
||
|- style="background: #ececec" |
|- style="background: #ececec" |
||
! !! tat !! chv !! bak !! udm !! mhr !! kpv |
! !! tat !! chv !! bak !! udm !! mhr !! myv !! kpv |
||
|- |
|- |
||
| '''tat''' || - || |
| '''tat''' || - || {{#lst:Apertium-cv-tt/stats|cv-tt-stems}} || {{#lst:Apertium-tat-bak/stats|tat-bak-stems}} || || || || |
||
|- |
|- |
||
| '''chv''' || |
| '''chv''' || {{#lst:Apertium-cv-tt/stats|cv-tt-stems}} || - || || || || || |
||
|- |
|- |
||
| '''bak''' || |
| '''bak''' || {{#lst:Apertium-tat-bak/stats|tat-bak-stems}} || || - || || || || |
||
|- |
|- |
||
| '''udm''' || || || || - || || |
| '''udm''' || || || || - || || || |
||
|- |
|- |
||
| '''mhr''' || || || || || - || {{#lst:Apertium-kpv-mhr/stats|kpv-mhr-stems}} |
| '''mhr''' || || || || || - || || {{#lst:Apertium-kpv-mhr/stats|kpv-mhr-stems}} |
||
|- |
|- |
||
| ''' |
| '''myv''' || || || || || || - || |
||
|- |
|- |
||
| || |
| '''kpv''' || || || || || {{#lst:Apertium-kpv-mhr/stats|kpv-mhr-stems}} || || - |
||
|- |
|- |
||
| |
| || || || || || || || |
||
|- |
|- |
||
| ''' |
| '''ru''' || {{#lst:Apertium-tt-ru/stats|tt-ru-stems}} || {{#lst:Apertium-cv-ru/stats|cv-ru-stems}} || || {{#lst:Apertium-udm-rus/stats|udm-rus-stems}} || || || |
||
|- |
|- |
||
| ''' |
| '''ky''' || {{#lst:Apertium-tt-ky/stats|tt-ky-stems}} || || || || || || |
||
|- |
|- |
||
| |
| '''tr''' || || {{#lst:Apertium-cv-tr/stats|stems}} || || || || || |
||
|- |
|||
| '''fin''' || || || || {{#lst:Apertium-fin-udm/stats|fin-udm-stems}} || || {{#lst:Apertium-myv-fin/stats|myv-fin-stems}} || {{#lst:Apertium-kpv-fin/stats|kpv-fin-stems}} |
|||
|} |
|} |
||
== The languages == |
== The languages == |
||
=== Volga-Kama languages by subgroup === |
|||
* [[Turkic languages|Turkic]] |
|||
** North Qıpçaq: [[Tatar]], [[Bashqort]] |
|||
** Oğur: [[Chuvash]] |
|||
* [[Uralic languages|Uralic]] → Finno-Ugric → Finno-Permic |
|||
** Permic: [[Komi]] (Komi-Zyrian, Komi-Permyak, Komi-Yazva), [[Udmurt]] |
|||
** Finno-Volgaic |
|||
*** [[Mari]]: [[Meadow Mari|Meadow Mari (Eastern)]], [[Hill Mari|Hill Mari (Western)]] |
|||
*** [[Mordvin]]: [[Erzya]], [[Moksha]] |
|||
=== Volga-Kama language vulnerability === |
|||
The following table shows information about Volga-Kama varieties and information about apertium projects related to the languages. |
The following table shows information about Volga-Kama varieties and information about apertium projects related to the languages. |
||
{|class="wikitable sortable" |
{|class="wikitable sortable" |
||
|- |
|- |
||
! language !! iso !! num speakers !! UNESCO classification |
! language !! iso !! num speakers !! UNESCO classification |
||
|- |
|- |
||
| Tatar || <code>tat</code> || 6500K || 0. none |
| Tatar || <code>tat</code> || 6500K || 0. none |
||
incubator/apertium-tt-kk/ |
|||
incubator/apertium-tt-ky/ |
|||
incubator/apertium-tt-ru/ |
|||
incubator/apertium-cv-tt/ |
|||
nursery/apertium-tt-ba/ |
|||
|- |
|- |
||
| Bashqort || <code>bak</code> || 1379K || 1. vulnerable |
| Bashqort || <code>bak</code> || 1379K || 1. vulnerable |
||
|- |
|- |
||
| Chuvash || <code>chv</code> || 1325K || 1. vulnerable |
| Chuvash || <code>chv</code> || 1325K || 1. vulnerable |
||
incubator/apertium-cv-tr/ |
|||
incubator/apertium-cv-tt/ |
|||
|- |
|- |
||
| Udmurt || <code>udm</code> || 0464K || 2. definitely endangered |
| Udmurt || <code>udm</code> || 0464K || 2. definitely endangered |
||
nursery/apertium-udm-rus/ |
|||
|- |
|- |
||
| Mari - Eastern || <code>mhr</code> || 0414K || 2. definitely endangered |
| Mari - Eastern || <code>mhr</code> || 0414K || 2. definitely endangered |
||
|- |
|- |
||
| Mordvin - Erzya || <code>myv</code> || 0400K || 2. definitely endangered |
| Mordvin - Erzya || <code>myv</code> || 0400K || 2. definitely endangered |
||
|- |
|- |
||
| Komi - Zyryan || <code>kpv</code> || 0217K || 2. definitely endangered |
| Komi - Zyryan || <code>kpv</code> || 0217K || 2. definitely endangered |
||
|- |
|- |
||
| Mordvin - Moksha || <code>mdf</code> || 0200K || 2. definitely endangered |
| Mordvin - Moksha || <code>mdf</code> || 0200K || 2. definitely endangered |
||
|- |
|- |
||
| Komi - Permyak || <code>koi</code> || 0094K || 2. definitely endangered |
| Komi - Permyak || <code>koi</code> || 0094K || 2. definitely endangered |
||
|- |
|- |
||
| Mari - Western || <code>mrj</code> || 0037K || 3. severely endangered |
| Mari - Western || <code>mrj</code> || 0037K || 3. severely endangered |
||
|- |
|- |
||
| Komi - Yazva || <code>koi</code> || 0000K || 3. severely endangered |
| Komi - Yazva || <code>koi</code> || 0000K || 3. severely endangered |
||
|} |
|} |
||
Revision as of 06:45, 28 December 2013
The languages of the Volga-Kama region are several Turkic and Uralic languages spoken along the Volga and Kama rivers in Russia. These include [varieties of] Tatar, Bashqort, Chuvash, Mari, Komi, Mordvin, and Udmurt (and linguistically, to some extent, Russian).
The master plan is to build an independent finite-state transducer for each language, and then to write a dictionary and transfer rules for every pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Volga-Kama languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules / a dictionary for the pair X→Y. Development progress for each language's transducer and for the dictionary pairs is listed below.
Transducers
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
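The thresholds above can be sketched in code. The following is a minimal illustration, not an official Apertium tool: the function names `naive_coverage` and `transducer_state` are hypothetical, and the sketch assumes the analyser output is in Apertium stream format, where each lexical unit looks like `^surface/analysis$` and an unknown form is marked with a leading `*` in the analysis. Escaped characters in the stream are ignored for simplicity, and the text gives the 80% threshold only approximately.

```python
import re

# Thresholds taken from the text above; the 80% figure is approximate ("~80%").
WORKING_THRESHOLD = 80.0
PRODUCTION_THRESHOLD = 90.0

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# (Simplification: escaped ^, $ and / inside a unit are not handled.)
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the percentage of lexical units that received an analysis.

    Assumption: an unknown form is written as ^form/*form$, so a unit
    counts as unknown when its analysis part starts with '*'.
    """
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return 100.0 * known / len(units)


def transducer_state(coverage_percent: float) -> str:
    """Map a coverage percentage to the state labels used on this page."""
    if coverage_percent > PRODUCTION_THRESHOLD:
        return "production"
    if coverage_percent >= WORKING_THRESHOLD:
        return "working"
    return "development"


if __name__ == "__main__":
    # Placeholder input with one known and one unknown form.
    sample = "^known/known<n>$ ^unknown/*unknown$"
    print(naive_coverage(sample), transducer_state(naive_coverage(sample)))
```

A real measurement would run this over several corpora rather than a single string, since the criterion above refers to coverage across a range of medium-to-large corpora.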
| name | Language | ISO 639 -2 | ISO 639 -3 | formalism | state | stems | coverage | location | primary authors |
|---|---|---|---|---|---|---|---|---|---|
| apertium-myv | Erzya | – | myv | HFST (lexc+twol) | development | | | apertium-myv-fin (incubator) | Fran, Jack Rueter |
| apertium-tat | Tatar | tt | tat | HFST (lexc+twol) | working | 55,702 | ~91% | apertium-tat (languages) | Ilnar, Fran, Jonathan, Milli |
| apertium-chv | Chuvash | cv | chv | HFST (lexc+twol) | development | 8,579 | ~85% | apertium-chv (languages) | Hèctor |
| apertium-bak | Bashkir | ba | bak | HFST (lexc+twol) | development | 2,827 | ~66% | apertium-bak (languages) | Fran, Jonathan, Ilnar, Milli |
| apertium-udm | Udmurt | – | udm | HFST (lexc+twol) | development | | | apertium-fin-udm (incubator), apertium-udm-rus (nursery) | Fran, Trond, Andrey, Лукерья, Алексей |
| apertium-kpv | Komi-Zyrian | – | kpv | HFST (lexc+twol) | development | | | apertium-kpv-mhr (incubator), apertium-kpv-fin (incubator) | Fran, Trond, Fedina, Andrei Chemyshev |
| apertium-mhr | Meadow Mari | – | mhr | HFST (lexc+twol) | development | | | apertium-kpv-mhr (incubator) | Fran, Fedina, Andrei Chemyshev |
Existing language pairs
Volga-Kama–Volga-Kama pairs
Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.
| | tat | chv | bak | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|
| tat | - | cv-tt | tat-bak | | | | |
| chv | cv-tt | - | | | | | |
| bak | tat-bak | | - | | | | |
| udm | | | | - | | | |
| mhr | | | | | - | | kpv-mhr |
| myv | | | | | | - | |
| kpv | | | | | kpv-mhr | | - |
Pairs with non–Volga-Kama languages
| | tat | chv | bak | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|
| ru | tt-ru | cv-ru | | udm-rus | | | |
| ky | tt-ky | | | | | | |
| tr | | cv-tr | | | | | |
| fin | | | | fin-udm | | myv-fin | kpv-fin |
Table of dix progress
| | tat | chv | bak | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|
| tat | - | | | | | | |
| chv | | - | | | | | |
| bak | | | - | | | | |
| udm | | | | - | | | |
| mhr | | | | | - | | |
| myv | | | | | | - | |
| kpv | | | | | | | - |
| ru | | | | | | | |
| ky | 10 | | | | | | |
| tr | | | | | | | |
| fin | | | | | | | |
The languages
Volga-Kama languages by subgroup
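- Turkic
  - North Qıpçaq: Tatar, Bashqort
  - Oğur: Chuvash
- Uralic → Finno-Ugric → Finno-Permic
  - Permic: Komi (Komi-Zyrian, Komi-Permyak, Komi-Yazva), Udmurt
  - Finno-Volgaic
    - Mari: Meadow Mari (Eastern), Hill Mari (Western)
    - Mordvin: Erzya, Moksha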
Volga-Kama language vulnerability
The following table lists the Volga-Kama varieties with their ISO codes, approximate numbers of speakers, and UNESCO vulnerability classification.
| language | iso | num speakers | UNESCO classification |
|---|---|---|---|
| Tatar | tat | 6500K | 0. none |
| Bashqort | bak | 1379K | 1. vulnerable |
| Chuvash | chv | 1325K | 1. vulnerable |
| Udmurt | udm | 0464K | 2. definitely endangered |
| Mari - Eastern | mhr | 0414K | 2. definitely endangered |
| Mordvin - Erzya | myv | 0400K | 2. definitely endangered |
| Komi - Zyryan | kpv | 0217K | 2. definitely endangered |
| Mordvin - Moksha | mdf | 0200K | 2. definitely endangered |
| Komi - Permyak | koi | 0094K | 2. definitely endangered |
| Mari - Western | mrj | 0037K | 3. severely endangered |
| Komi - Yazva | koi | 0000K | 3. severely endangered |
Existing general resources
Grammars
Dictionaries
Existing computational resources
Corpora and corpora projects
Spell-checkers
Text-to-speech and speech-to-text systems
Keyboards
- Xkb includes keyboards for the following languages:
  - Tatar
  - Chuvash
  - ...?