Languages of the Volga-Kama region
The Volga-Kama region, along the Volga and Kama rivers in Russia, is home to several Turkic and Uralic languages. These include varieties of Tatar, Bashqort, Chuvash, Mari, Komi, Mordvin, and Udmurt (and, linguistically, to some extent Russian).
The master plan is to build an independent finite-state transducer for each language, and then to create a dictionary and transfer rules for every language pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Volga-Kama languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and a dictionary and transfer rules for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.
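The dependency described above can be sketched as a small check of which components are still missing for a given direction. This is only an illustration: the `Resources` class, its field names, and `missing_components` are hypothetical and not part of any existing tool, and the sketch assumes that pair data (dictionary plus transfer rules) is tracked per direction.

```python
from dataclasses import dataclass, field


@dataclass
class Resources:
    """Hypothetical inventory of what has been built so far."""
    transducers: set[str] = field(default_factory=set)          # language codes
    constraint_grammars: set[str] = field(default_factory=set)  # language codes
    # Directed pairs (source, target) that have a dictionary and transfer rules.
    pair_data: set[tuple[str, str]] = field(default_factory=set)


def missing_components(res: Resources, source: str, target: str) -> list[str]:
    """List what is still needed before source->target translation can be assembled."""
    missing = []
    # Both languages need their own transducer.
    for lang in (source, target):
        if lang not in res.transducers:
            missing.append(f"transducer for {lang}")
    # Only the source language needs a constraint grammar.
    if source not in res.constraint_grammars:
        missing.append(f"constraint grammar for {source}")
    # The pair itself needs a dictionary and transfer rules.
    if (source, target) not in res.pair_data:
        missing.append(f"dictionary and transfer rules for {source}->{target}")
    return missing
```

With transducers for `tat` and `bak`, a constraint grammar only for `tat`, and pair data only for `("tat", "bak")`, the function returns an empty list for tat→bak and reports the missing grammar and pair data for bak→tat.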
Transducers
A transducer counts as "working" once it reaches roughly 80% coverage on a range of medium to large corpora. Above 90% coverage it can be considered "production".
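As a rough illustration of these thresholds, the sketch below computes naive coverage from analyser output and maps it to a status label. It assumes the output is in Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token has an analysis starting with `*`. The function names and the "in development" label for anything under 80% are assumptions made for this example, and escaped characters in the stream are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Percentage of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown tokens are marked with a leading '*' in the analysis part.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return 100.0 * known / len(units)


def status(coverage_percent: float) -> str:
    """Map a coverage percentage to the labels used on this page."""
    if coverage_percent > 90.0:
        return "production"
    if coverage_percent >= 80.0:
        return "working"
    return "in development"  # assumed label; the page does not name this state
```

In practice the figure would be averaged over several corpora rather than taken from a single text, since the thresholds refer to a range of medium to large corpora.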
Existing language pairs
Italic text denotes a language pair under development in the incubator, regular text denotes a functioning language pair in staging, and bold text denotes a stable, well-working language pair in trunk.
| | tat | chv | bak | mrj | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|---|
| tat | - | chv-tat 198 | tat-bak 2,941 | | | | | |
| chv | chv-tat 198 | - | | | | | | |
| bak | tat-bak 2,941 | | - | | | | | |
| mrj | | | | - | | | | |
| udm | | | | | - | | | |
| mhr | | | | | | - | | kpv-mhr 127 |
| myv | | | | | | | - | |
| kpv | | | | | | kpv-mhr 127 | | - |
| fin | | | | mrj-fin 273 | fin-udm 93 | | myv-fin 401 | kpv-fin 1 |
| kaz | kaz-tat | | | | | | | |
| kir | tat-kir | | | | | | | |
| rus | tat-rus 5,999 | cv-ru 75 | | | udm-rus 148 | | | |
| tur | tur-tat 3,317 | cv-tr 100 | | | | | | |
The languages
Volga-Kama languages by subgroup
- Uralic → Finno-Ugric → Finno-Permic
Volga-Kama language vulnerability
The following table lists the Volga-Kama varieties with their ISO 639-3 codes, number of speakers, and UNESCO vulnerability classification.
| language | ISO 639-3 | number of speakers | UNESCO classification |
|---|---|---|---|
| Tatar | tat | 6500K | 0. none |
| Bashqort | bak | 1379K | 1. vulnerable |
| Chuvash | chv | 1325K | 1. vulnerable |
| Udmurt | udm | 464K | 2. definitely endangered |
| Mari - Eastern | mhr | 414K | 2. definitely endangered |
| Mordvin - Erzya | myv | 400K | 2. definitely endangered |
| Komi - Zyryan | kpv | 217K | 2. definitely endangered |
| Mordvin - Moksha | mdf | 200K | 2. definitely endangered |
| Komi - Permyak | koi | 94K | 2. definitely endangered |
| Mari - Western | mrj | 37K | 3. severely endangered |
| Komi - Yazva | koi | 0K | 3. severely endangered |
Existing general resources
Grammars
Dictionaries
Existing computational resources
Corpora and corpora projects
Spell-checkers
Text-to-speech and speech-to-text systems
Keyboards
- Xkb includes keyboards for the following languages:
  - Tatar
  - Chuvash
  - ...?
 

