Languages of the Volga-Kama region
Revision as of 22:36, 27 December 2013
The languages of the Volga-Kama region include several Turkic and Uralic languages spoken along the Volga and Kama rivers in Russia. These include (varieties of) Tatar, Bashqort, Chuvash, Mari, Komi, Mordvin and Udmurt, and, from a linguistic standpoint, to some extent Russian.
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
Status
The ultimate goal is to have reusable, multi-purpose transducers for each of the Volga-Kama languages. Any two of them can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and a bilingual dictionary and transfer rules for the pair X→Y. The tables below list development progress for each language's transducer and for each pair's dictionary.
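To make this division of labour concrete, here is a minimal Python sketch, assuming that each direction X→Y reuses X's transducer for analysis and Y's transducer for generation. The `Direction` class, the `plan` function and the component labels are hypothetical illustrations, not Apertium file names or APIs. The sketch also shows why the plan scales: six languages need only six transducers, while the number of ordered directions grows as n(n-1).

```python
from __future__ import annotations

from dataclasses import dataclass
from itertools import permutations


@dataclass(frozen=True)
class Direction:
    """One translation direction X -> Y (hypothetical model, not an Apertium API)."""

    source: str  # ISO 639-3 code of language X
    target: str  # ISO 639-3 code of language Y

    def components(self) -> list[str]:
        """Illustrative labels for what X -> Y needs under the plan above."""
        return [
            f"{self.source} transducer (analysis)",         # shared, one per language
            f"{self.source} CG (disambiguation)",           # one per source language
            f"{self.source}-{self.target} dictionary",      # one per language pair
            f"{self.source}-{self.target} transfer rules",  # one per direction X -> Y
            f"{self.target} transducer (generation)",       # shared, one per language
        ]


def plan(languages: list[str]) -> tuple[int, list[Direction]]:
    """Count the shared transducers and enumerate every ordered X -> Y direction."""
    directions = [Direction(x, y) for x, y in permutations(languages, 2)]
    return len(languages), directions


if __name__ == "__main__":
    langs = ["tat", "chv", "bak", "udm", "mhr", "kpv"]
    n_transducers, directions = plan(langs)
    print(n_transducers, len(directions))  # 6 transducers, 30 directions
    for part in Direction("tat", "bak").components():
        print(part)
```

Running it prints `6 30` followed by the five components for tat→bak; only the middle three are specific to that direction, which is the point of building the transducers independently.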
Transducers
A transducer with about 80% coverage on a range of medium-to-large corpora can be called "working"; one with over 90% can be considered "production" quality.
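Coverage is usually measured as the proportion of corpus tokens that receive at least one analysis. A minimal Python sketch, assuming the analyser's output is in Apertium stream format (where an unknown word appears as `^form/*form$`) and that coverage is counted per token over the whole text. `coverage` and `status` are hypothetical helper names, and the "not yet working" label for results below 80% is invented here, since the page names no lower tier.

```python
import re

# One lexical unit in Apertium stream format, e.g. ^kitap/kitap<n><nom>$.
# Simplifying assumption: no backslash-escaped ^, $ or / inside units and
# no superblanks; a production tool would need a real stream parser.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def coverage(stream: str) -> float:
    """Fraction of lexical units that received at least one analysis.

    An unknown word is output as ^form/*form$, so a unit counts as
    unknown when its first reading starts with '*'.
    """
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        return 0.0
    known = 0
    for unit in units:
        readings = unit.split("/")[1:]  # drop the surface form
        if readings and not readings[0].startswith("*"):
            known += 1
    return known / len(units)


def status(cov: float) -> str:
    """Map a coverage ratio onto the thresholds stated on this page."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "not yet working"  # hypothetical label; the page names no lower tier


if __name__ == "__main__":
    sample = "^kitap/kitap<n><nom>$ ^xyz/*xyz$"
    ratio = coverage(sample)
    print(f"{ratio:.0%} -> {status(ratio)}")  # 50% -> not yet working
```

Note that the state column in the table below is assigned by the developers rather than computed, so a pair of thresholds like these is a rough guide, not a rule.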
| Name | Language | ISO 639-1 | ISO 639-3 | Speakers | UNESCO | Formalism | State | Stems | Coverage | Location | Primary authors |
|---|---|---|---|---|---|---|---|---|---|---|---|
| apertium-tat | Tatar | tt | tat | 6500K | 0 (none) | HFST (lexc+twol) | production | 55,702 | ~91% | apertium-tat (languages) | Ilnar, Fran, Jonathan, Milli |
| apertium-chv | Chuvash | cv | chv | 1325K | 1 (vulnerable) | HFST (lexc+twol) | development | 8,579 | ~85% | apertium-chv (languages) | Hèctor |
| apertium-bak | Bashkir | ba | bak | 1379K | 1 (vulnerable) | HFST (lexc+twol) | development | 2,827 | ~66% | apertium-bak (languages) | Fran, Jonathan, Ilnar, Milli |
| apertium-udm | Udmurt | – | udm | 464K | 2 (definitely endangered) | HFST (lexc+twol) | development | ? | ? | apertium-fin-udm (incubator), apertium-udm-rus (nursery) | Fran, Trond, Andrey, Лукерья, Алексей |
| apertium-myv | Erzya | – | myv | 400K | 2 (definitely endangered) | HFST (lexc+twol) | development | ? | ? | apertium-myv-fin (incubator) | Fran, Jack Rueter |
| apertium-mhr | Eastern Mari | – | mhr | 414K | 2 (definitely endangered) | HFST (lexc+twol) | development | ? | ? | apertium-kpv-mhr (incubator) | Fran, Fedina, chemyshev |
| apertium-kpv | Komi-Zyrian | – | kpv | 217K | 2 (definitely endangered) | HFST (lexc+twol) | development | ? | ? | apertium-kpv-mhr (incubator), apertium-kpv-fin (incubator) | Fran, Trond, Fedina, chemyshev |
| apertium-mdf | Moksha | – | mdf | 200K | 2 (definitely endangered) | – | – | – | – | – | – |
| apertium-koi | Komi-Permyak | – | koi | 94K | 2 (definitely endangered) | – | – | – | – | – | – |
| apertium-mrj | Western Mari | – | mrj | 37K | 3 (severely endangered) | – | – | – | – | – | – |
| apertium-kpvyaz | Komi-Yazva | – | – | 0K | 3 (severely endangered) | – | – | – | – | – | – |
Existing language pairs
Volga-Kama–Volga-Kama pairs
Italic text denotes language pairs under development in the incubator, regular text denotes a working pair in staging, and bold text denotes a stable, well-working pair in trunk.
| | tat | chv | bak | udm | mhr | kpv |
|---|---|---|---|---|---|---|
| tat | - | cv-tt | tat-bak | | | |
| chv | cv-tt | - | | | | |
| bak | tat-bak | | - | | | |
| udm | | | | - | | |
| mhr | | | | | - | kpv-mhr |
| kpv | | | | | kpv-mhr | - |
Pairs with non–Volga-Kama languages
| | tat | chv | bak | udm | mhr | kpv |
|---|---|---|---|---|---|---|
| ru | tt-ru | cv-ru | | udm-rus | | |
| ky | tt-ky | | | | | |
| tr | | cv-tr | | | | |
| fin | | | | fin-udm | | kpv-fin |
Table of dix progress
| | tat | chv | bak | udm | mhr | kpv |
|---|---|---|---|---|---|---|
| tat | - | ? | ? | | | |
| chv | ? | - | | | | |
| bak | ? | | - | | | |
| udm | | | | - | | |
| mhr | | | | | - | ? |
| kpv | | | | | ? | - |
| ru | ? | ? | | ? | | |
| ky | ? | | | | | |
| tr | | ? | | | | |
| fin | | | | ? | | ? |
The languages
The following table lists the Volga-Kama varieties with their speaker numbers and UNESCO classification, together with the Apertium projects related to each language.
| Language | ISO | Speakers | UNESCO classification | Apertium support |
|---|---|---|---|---|
| Tatar | tat | 6500K | 0. none | incubator/apertium-tr-tt/, incubator/apertium-tt-kk/, incubator/apertium-tt-ky/, incubator/apertium-tt-ru/, incubator/apertium-cv-tt/, nursery/apertium-tt-ba/ |
| Bashqort | bak | 1379K | 1. vulnerable | nursery/apertium-tt-ba/ |
| Chuvash | chv | 1325K | 1. vulnerable | incubator/apertium-cv-ru/, incubator/apertium-cv-tr/, incubator/apertium-cv-tt/ |
| Udmurt | udm | 464K | 2. definitely endangered | incubator/apertium-fin-udm/, nursery/apertium-udm-rus/ |
| Mari - Eastern | mhr | 414K | 2. definitely endangered | incubator/apertium-kpv-mhr/ |
| Mordvin - Erzya | myv | 400K | 2. definitely endangered | |
| Komi - Zyryan | kpv | 217K | 2. definitely endangered | incubator/apertium-kpv-mhr/ |
| Mordvin - Moksha | mdf | 200K | 2. definitely endangered | |
| Komi - Permyak | koi | 94K | 2. definitely endangered | |
| Mari - Western | mrj | 37K | 3. severely endangered | |
| Komi - Yazva | – | 0K | 3. severely endangered | |
Existing general resources
Grammars
Dictionaries
Existing computational resources
Corpora and corpus projects
Spell-checkers
Text-to-speech and speech-to-text systems
Keyboards
- XKB includes keyboard layouts for the following languages:
  - Tatar
  - Chuvash
  - ...?