Languages of the Volga-Kama region
The languages of the Volga-Kama region are a group of Turkic and Uralic languages spoken along the Volga and Kama rivers in Russia. They include varieties of Tatar, Bashqort, Chuvash, Mari, Komi, Mordvin, and Udmurt (and, linguistically, to some extent Russian).
The master plan is to build an independent finite-state transducer for each language, and then to write a dictionary and transfer rules for each language pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Volga-Kama languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.
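The component requirements for an X→Y pair can be sketched as a small checklist. This is only an illustrative sketch: the `Inventory` class, the `missing_for_pair` function, and the example inventory are hypothetical and do not reflect the actual status of any language or pair.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple


@dataclass
class Inventory:
    """Hypothetical record of which resources exist so far."""
    transducers: Set[str] = field(default_factory=set)            # language codes
    constraint_grammars: Set[str] = field(default_factory=set)    # language codes
    pair_data: Set[Tuple[str, str]] = field(default_factory=set)  # (source, target)


def missing_for_pair(inv: Inventory, src: str, tgt: str) -> List[str]:
    """List the components still needed before src->tgt translation is possible."""
    missing = []
    if src not in inv.transducers:
        missing.append(f"transducer for {src}")
    if tgt not in inv.transducers:
        missing.append(f"transducer for {tgt}")
    if src not in inv.constraint_grammars:
        missing.append(f"CG for {src}")
    if (src, tgt) not in inv.pair_data:
        missing.append(f"transfer rules / dictionary for {src}->{tgt}")
    return missing


# Made-up inventory, for illustration only (not real project status).
example = Inventory(
    transducers={"tat", "bak"},
    constraint_grammars={"tat"},
    pair_data={("tat", "bak")},
)
print(missing_for_pair(example, "tat", "bak"))
print(missing_for_pair(example, "bak", "tat"))
```

Because the CG and the transfer rules are tied to the source language and the translation direction, the two directions of a pair are checked separately in this sketch.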
Transducers
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
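As a rough illustration of these thresholds, the sketch below computes coverage as the share of tokens that receive at least one analysis and maps it to a status label. The function names are hypothetical, the "in development" label for anything under 80% is an assumption (the text does not name that state), and the approximate "~80%" is treated here as a hard cut-off.

```python
from typing import Iterable


def coverage(analysed: Iterable[bool]) -> float:
    """Fraction of tokens for which the transducer returned an analysis."""
    flags = list(analysed)
    if not flags:
        raise ValueError("cannot compute coverage of an empty corpus")
    return sum(flags) / len(flags)


def transducer_status(cov: float) -> str:
    """Map a coverage value in [0, 1] to a status label."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    # Assumed label: the text does not name the state below ~80%.
    return "in development"


# One flag per token: True if the token was analysed, False if unknown.
flags = [True, True, False, True]
print(coverage(flags), transducer_status(coverage(flags)))
```

In practice the figure would be measured on several corpora, as the text says, rather than on a single token list.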
Existing language pairs
Text in italic denotes language pairs under development or in the incubator. Regular text denotes a functioning language pair in staging, and bold text denotes a stable, well-working language pair in trunk.
| | tat | chv | bak | mrj | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|---|
| tat | - | chv-tat 198 | tat-bak 2,941 | | | | | |
| chv | chv-tat 198 | - | | | | | | |
| bak | tat-bak 2,941 | | - | | | | | |
| mrj | | | | - | | | | |
| udm | | | | | - | | | |
| mhr | | | | | | - | | kpv-mhr 127 |
| myv | | | | | | | - | |
| kpv | | | | | | kpv-mhr 127 | | - |
| fin | | | | mrj-fin 273 | fin-udm 93 | | myv-fin 401 | kpv-fin 1 |
| kaz | kaz-tat | | | | | | | |
| kir | tat-kir | | | | | | | |
| rus | tat-rus 5,999 | cv-ru 75 | | | udm-rus 148 | | | |
| tur | tur-tat 3,317 | cv-tr 100 | | | | | | |
The languages
Volga-Kama languages by subgroup
- Uralic → Finno-Ugric → Finno-Permic
Volga-Kama language vulnerability
The following table shows information about Volga-Kama varieties.
language | iso | num speakers | UNESCO classification |
---|---|---|---|
Tatar | tat | 6500K | 0. none |
Bashqort | bak | 1379K | 1. vulnerable |
Chuvash | chv | 1325K | 1. vulnerable |
Udmurt | udm | 0464K | 2. definitely endangered |
Mari - Eastern | mhr | 0414K | 2. definitely endangered |
Mordvin - Erzya | myv | 0400K | 2. definitely endangered |
Komi - Zyryan | kpv | 0217K | 2. definitely endangered |
Mordvin - Moksha | mdf | 0200K | 2. definitely endangered |
Komi - Permyak | koi | 0094K | 2. definitely endangered |
Mari - Western | mrj | 0037K | 3. severely endangered |
Komi - Yazva | koi | 0000K | 3. severely endangered |
Existing general resources
Grammars
Dictionaries
Existing computational resources
Corpora and corpora projects
Spell-checkers
Text-to-speech and speech-to-text systems
Keyboards
- Xkb includes keyboards for the following languages:
  - Tatar
  - Chuvash
  - ...?