Difference between revisions of "Languages of the Volga-Kama region"

(→‎Transducers: updated tatar)

Revision as of 22:33, 2 March 2014

The languages of the Volga-Kama region are a set of Turkic and Uralic languages spoken along the Volga and Kama rivers in Russia. They include [varieties of] Tatar, Bashqort, Chuvash, Mari, Komi, Mordvin, and Udmurt (and, linguistically, to some extent Russian).

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

Status

The ultimate goal is to have multi-purpose transducers for a variety of Volga-Kama languages. These can then be paired for X→Y translation by adding a CG for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.
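The pairing described above can be sketched as a small enumeration. This is only an illustration under stated assumptions: the component labels (`transducer:`, `cg:`, `dictionary:`, `transfer:`) are hypothetical strings invented for this example, not real Apertium file or module names, and "CG" is read here as a constraint grammar for the source language. The language codes are the ones used in the transducer table on this page.

```python
from itertools import permutations

# Codes of the transducers listed on this page.
LANGUAGES = ["tat", "chv", "bak", "mrj", "udm", "mhr", "myv", "kpv"]


def required_components(src, tgt):
    """List the pieces needed for src→tgt translation, as described above.

    The labels are illustrative placeholders (hypothetical), not real paths.
    """
    return [
        f"transducer:{src}",        # analyses the source text
        f"cg:{src}",                # CG for the source language
        f"dictionary:{src}-{tgt}",  # dictionary for the pair
        f"transfer:{src}-{tgt}",    # transfer rules for this direction
        f"transducer:{tgt}",        # generates the target text
    ]


# Every ordered pair X→Y of distinct languages is a potential direction.
directed_pairs = list(permutations(LANGUAGES, 2))

print(len(directed_pairs))  # 8 * 7 = 56 directions
print(required_components("tat", "bak"))
```

The point of the sketch is that each transducer is built once and reused in every pair it takes part in, while only the dictionary and transfer rules are specific to a pair.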

Transducers

A transducer is considered "working" once it reaches ~80% coverage on a range of medium-to-large corpora, and "production" once it exceeds 90%.
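As a rough illustration of these thresholds, the sketch below computes naive coverage (the share of tokens that receive at least one analysis) and maps it to the two labels defined above. It assumes analyser output in Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token as `^surface/*surface$`; it ignores escaped characters, the function names are hypothetical, and the sample string is invented for the example. The text does not define labels below 80%, so the function returns `None` there.

```python
import re

# One lexical unit in the assumed stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(analysed_text):
    """Return the fraction of tokens that have at least one analysis.

    Assumes unknown tokens are marked with a leading '*' on the analysis.
    """
    total = 0
    known = 0
    for unit in LEXICAL_UNIT.findall(analysed_text):
        parts = unit.split("/")
        total += 1
        if len(parts) > 1 and not parts[1].startswith("*"):
            known += 1
    return known / total if total else 0.0


def state_from_coverage(coverage):
    """Map a coverage fraction to the labels defined in the text above."""
    if coverage > 0.90:
        return "production"
    if coverage >= 0.80:
        return "working"
    return None  # no label is defined for lower coverage


# Invented sample: one analysed token and one unknown token.
sample = "^kitap/kitap<n><nom>$ ^xyz/*xyz$"
print(naive_coverage(sample))     # 0.5
print(state_from_coverage(0.91))  # production
```

In practice the figure would be averaged over several corpora, as the text says "a range of medium-large corpora"; the sketch handles a single analysed string only.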

| name | Language | ISO 639-1 | ISO 639-3 | formalism | state | stems | coverage | location | primary authors |
|---|---|---|---|---|---|---|---|---|---|
| apertium-myv | Erzya | | myv | HFST (lexc+twol) | development | | | apertium-myv-fin (incubator) | Fran, Jack Rueter |
| apertium-tat | Tatar | tt | tat | HFST (lexc+twol) | production | 55,702 | ~91% | apertium-tat (languages) | Ilnar, Fran, Jonathan, Röstäm |
| apertium-chv | Chuvash | cv | chv | HFST (lexc+twol) | development | 8,579 | ~85% | apertium-chv (languages) | Hèctor |
| apertium-bak | Bashkir | ba | bak | HFST (lexc+twol) | development | 2,827 | ~66% | apertium-bak (languages) | Fran, Jonathan, Ilnar, Milli |
| apertium-mrj | Hill Mari | | mrj | HFST (lexc+twol) | development | | | apertium-mrj-fin (incubator) | Fran, kuprina, jackrueter |
| apertium-udm | Udmurt | | udm | HFST (lexc+twol) | prototype | | | apertium-udm-rus (nursery) | Fran, Trond, Andrey, Лукерья, Алексей |
| apertium-kpv | Komi-Zyrian | | kpv | HFST (lexc+twol) | prototype | | | apertium-kpv-mhr (incubator) | Fran, Trond, Fedina, Andrei Chemyshev |
| apertium-mhr | Meadow Mari | | mhr | HFST (lexc+twol) | prototype | | | apertium-kpv-mhr (incubator) | Fran, Fedina, Andrei Chemyshev |

Existing language pairs

Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.

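| | tat | chv | bak | mrj | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|---|
| tat | - | cv-tt | tat-bak | | | | | |
| chv | cv-tt | - | | | | | | |
| bak | tat-bak | | - | | | | | |
| mrj | | | | - | | | | |
| udm | | | | | - | | | |
| mhr | | | | | | - | | kpv-mhr |
| myv | | | | | | | - | |
| kpv | | | | | | kpv-mhr | | - |
| fin | | | | mrj-fin | fin-udm | | myv-fin | kpv-fin |
| kaz | kaz-tat | | | | | | | |
| kir | tat-kir | | | | | | | |
| rus | tt-ru | cv-ru | | | udm-rus | | | |
| tur | tur-tat | cv-tr | | | | | | |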

The languages

Volga-Kama languages by subgroup

Volga-Kama language vulnerability

The following table shows information about Volga-Kama varieties.

| language | iso | num speakers | UNESCO classification |
|---|---|---|---|
| Tatar | tat | 6500K | 0. none |
| Bashqort | bak | 1379K | 1. vulnerable |
| Chuvash | chv | 1325K | 1. vulnerable |
| Udmurt | udm | 464K | 2. definitely endangered |
| Mari - Eastern | mhr | 414K | 2. definitely endangered |
| Mordvin - Erzya | myv | 400K | 2. definitely endangered |
| Komi - Zyryan | kpv | 217K | 2. definitely endangered |
| Mordvin - Moksha | mdf | 200K | 2. definitely endangered |
| Komi - Permyak | koi | 94K | 2. definitely endangered |
| Mari - Western | mrj | 37K | 3. severely endangered |
| Komi - Yazva | koi | 0K | 3. severely endangered |

Existing general resources

Grammars

Dictionaries

Existing computational resources

Corpora and corpora projects

Spell-checkers

Text-to-speech and speech-to-text systems

Keyboards

  • Xkb includes keyboards for the following languages:
    • Tatar
    • Chuvash
    • ...?

Morphological Transducers

Scholarship