Languages of the Volga-Kama region

The languages of the Volga-Kama region are the Turkic and Uralic languages spoken along the Volga and Kama rivers in Russia. They include varieties of Tatar, Bashqort (Bashkir), Chuvash, Mari, Komi, Mordvin, and Udmurt (and, linguistically, to some extent Russian).

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

Status

The ultimate goal is to have multi-purpose transducers for a variety of Volga-Kama languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.
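As a rough illustration of why independent transducers are worth the effort: each transducer is built once, while dictionaries and transfer rules scale with the number of pairs. The sketch below simply counts the combinations for the eight languages with transducers listed on this page. It assumes that one bilingual dictionary can serve both directions of a pair and that transfer rules are needed per direction; the counts are upper bounds, not a statement of which pairs are planned.

```python
from itertools import combinations, permutations

# ISO 639-3 codes of the languages with transducers listed on this page.
LANGUAGES = ["myv", "tat", "chv", "bak", "mrj", "udm", "kpv", "mhr"]

# Assumption: one bilingual dictionary per unordered pair {X, Y} ...
dictionary_pairs = list(combinations(LANGUAGES, 2))
# ... and one set of transfer rules per translation direction X -> Y.
directions = list(permutations(LANGUAGES, 2))

print(len(LANGUAGES), "transducers")
print(len(dictionary_pairs), "possible dictionaries")
print(len(directions), "possible translation directions")
```

With eight transducers this gives 28 possible dictionaries and 56 possible translation directions, all reusing the same eight analysers/generators.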

Transducers

Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".
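The coverage thresholds can be sketched in code. The example below is a minimal illustration, not the project's actual tooling: it assumes analyser output in the Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token's analysis starts with `*`. The function names `coverage` and `state`, the label for sub-80% results, and the sample string are all invented for illustration.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Assumption: unknown tokens are marked with a leading '*' in the analysis.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def state(cov: float) -> str:
    """Map a coverage fraction onto the thresholds described above."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "below working threshold"  # hypothetical label, not from this page


# Invented sample: four tokens, one of them unknown to the analyser.
sample = "^a/a<n>$ ^b/b<v>$ ^c/*c$ ^d/d<adj>$"
print(coverage(sample), state(coverage(sample)))
```

In practice the figure should be computed over several medium-to-large corpora rather than a single short string.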

| name | language | ISO 639-1 | ISO 639-3 | formalism | state | stems | coverage | location | primary authors |
|---|---|---|---|---|---|---|---|---|---|
| apertium-myv | Erzya | | myv | HFST (lexc+twol) | development | 74,977 | | apertium-myv-fin (incubator) | Fran, Jack Rueter |
| apertium-tat | Tatar | tt | tat | HFST (lexc+twol) | production | 55,702 | ~91% | apertium-tat (languages) | Ilnar, Fran, Jonathan, Röstäm |
| apertium-chv | Chuvash | cv | chv | HFST (lexc+twol) | development | 8,579 | ~85% | apertium-chv (languages) | Hèctor |
| apertium-bak | Bashkir | ba | bak | HFST (lexc+twol) | development | 2,827 | ~66% | apertium-bak (languages) | Fran, Jonathan, Ilnar, Milli |
| apertium-mrj | Hill Mari | | mrj | HFST (lexc+twol) | development | 53,051 | | apertium-mrj-fin (incubator) | Fran, kuprina, jackrueter |
| apertium-udm | Udmurt | | udm | HFST (lexc+twol) | prototype | 196 | | apertium-udm-rus (nursery) | Fran, Trond, Andrey, Лукерья, Алексей |
| apertium-kpv | Komi-Zyrian | | kpv | HFST (lexc+twol) | prototype | 135 | | apertium-kpv-mhr (incubator) | Fran, Trond, Fedina, Andrei Chemyshev |
| apertium-mhr | Meadow Mari | | mhr | HFST (lexc+twol) | prototype | 117 | | apertium-kpv-mhr (incubator) | Fran, Fedina, Andrei Chemyshev |

Existing language pairs

Text in italics denotes language pairs under development or in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable, well-working language pair in trunk.

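| | tat | chv | bak | mrj | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|---|
| tat | - | chv-tat 198 | tat-bak 2,941 | | | | | |
| chv | chv-tat 198 | - | | | | | | |
| bak | tat-bak 2,941 | | - | | | | | |
| mrj | | | | - | | | | |
| udm | | | | | - | | | |
| mhr | | | | | | - | | kpv-mhr 127 |
| myv | | | | | | | - | |
| kpv | | | | | | kpv-mhr 127 | | - |
| fin | | | | mrj-fin 273 | fin-udm 93 | | myv-fin 401 | kpv-fin 1 |
| kaz | kaz-tat | | | | | | | |
| kir | tat-kir | | | | | | | |
| rus | tat-rus 5,999 | cv-ru 75 | | | udm-rus 148 | | | |
| tur | tur-tat 3,317 | cv-tr 100 | | | | | | |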

The languages

Volga-Kama languages by subgroup

Volga-Kama language vulnerability

The following table lists speaker numbers and UNESCO vulnerability classifications for Volga-Kama varieties.

| language | ISO 639-3 | speakers | UNESCO classification |
|---|---|---|---|
| Tatar | tat | 6500K | 0. none |
| Bashqort | bak | 1379K | 1. vulnerable |
| Chuvash | chv | 1325K | 1. vulnerable |
| Udmurt | udm | 464K | 2. definitely endangered |
| Mari - Eastern | mhr | 414K | 2. definitely endangered |
| Mordvin - Erzya | myv | 400K | 2. definitely endangered |
| Komi - Zyryan | kpv | 217K | 2. definitely endangered |
| Mordvin - Moksha | mdf | 200K | 2. definitely endangered |
| Komi - Permyak | koi | 94K | 2. definitely endangered |
| Mari - Western | mrj | 37K | 3. severely endangered |
| Komi - Yazva | koi | 0K | 3. severely endangered |

Existing general resources

Grammars

Dictionaries

Existing computational resources

Corpora and corpora projects

Spell-checkers

Text-to-speech and speech-to-text systems

Keyboards

  • Xkb includes keyboards for the following languages:
    • Tatar
    • Chuvash
    • ...?

Morphological Transducers

Scholarship