Difference between revisions of "Languages of the Volga-Kama region"

From Apertium
Line 74:
  || <code>udm</code>
  || [[HFST|HFST (lexc+twol)]]
- || development
+ || prototype
  |align="right"| {{#lst:Apertium-udm-rus/stats|udm-stems}}
  |align="center"|
Line 85:
  || <code>kpv</code>
  || [[HFST|HFST (lexc+twol)]]
- || development
+ || prototype
  |align="right"| {{#lst:Apertium-kpv-mhr/stats|kpv-stems}}
  |align="center"|
Line 96:
  || <code>mhr</code>
  || [[HFST|HFST (lexc+twol)]]
- || development
+ || prototype
  |align="right"| {{#lst:Apertium-kpv-mhr/stats|mhr-stems}}
  |align="center"|

Revision as of 23:43, 28 December 2013

The languages of the Volga-Kama region include several Turkic and Uralic languages spoken along the Volga and Kama rivers in Russia. These include [varieties of] Tatar, Bashqort, Chuvash, Mari, Komi, Mordvin, and Udmurt (and linguistically, to some extent, Russian).

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

Status

The ultimate goal is to have multi-purpose transducers for a variety of Volga-Kama languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for the dictionary pairs is listed below.

Transducers

Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we consider it "working". Above 90%, it can be considered "production".

| name | Language | ISO 639 (-2 / -3) | formalism | state | stems | coverage | location | primary authors |
|---|---|---|---|---|---|---|---|---|
| apertium-myv | Erzya | myv | HFST (lexc+twol) | development | | | apertium-myv-fin (incubator) | Fran, Jack Rueter |
| apertium-tat | Tatar | tt / tat | HFST (lexc+twol) | working | 55,702 | ~91% | apertium-tat (languages) | Ilnar, Fran, Jonathan, Milli |
| apertium-chv | Chuvash | cv / chv | HFST (lexc+twol) | development | 8,579 | ~85% | apertium-chv (languages) | Hèctor |
| apertium-bak | Bashkir | ba / bak | HFST (lexc+twol) | development | 2,827 | ~66% | apertium-bak (languages) | Fran, Jonathan, Ilnar, Milli |
| apertium-udm | Udmurt | udm | HFST (lexc+twol) | prototype | | | apertium-fin-udm (incubator), apertium-udm-rus (nursery) | Fran, Trond, Andrey, Лукерья, Алексей |
| apertium-kpv | Komi-Zyrian | kpv | HFST (lexc+twol) | prototype | | | apertium-kpv-mhr (incubator), apertium-kpv-fin (incubator) | Fran, Trond, Fedina, Andrei Chemyshev |
| apertium-mhr | Meadow Mari | mhr | HFST (lexc+twol) | prototype | | | apertium-kpv-mhr (incubator) | Fran, Fedina, Andrei Chemyshev |
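
The coverage thresholds above can be illustrated with a minimal sketch. It assumes analyser output in the Apertium stream format, where each token appears as `^surface/analysis$` and an unknown token is marked with a leading `*` (as in `^foo/*foo$`). The function names `naive_coverage` and `describe_state` are hypothetical, "~80%" is treated as a hard 0.80 cut-off, and the label for anything below that is a placeholder, since this page does not define thresholds for "development" or "prototype".

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
# Simplification: escaped characters such as \/ or \$ are not handled.
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Assumption: an unknown token's analysis field starts with '*'.
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def describe_state(coverage: float) -> str:
    """Map a coverage fraction onto the labels used on this page."""
    if coverage > 0.90:
        return "production"
    if coverage >= 0.80:
        return "working"
    # Placeholder label: the page gives no threshold below "working".
    return "below working"


if __name__ == "__main__":
    # Illustrative input only; the tags are not taken from any real analyser.
    sample = "^known/known<n>$ ^other/other<v>$ ^xyz/*xyz$ ^./.<sent>$"
    cov = naive_coverage(sample)
    print(f"{cov:.0%} -> {describe_state(cov)}")
```

In practice the figure would be computed over whole corpora rather than a single line, and averaged or reported per corpus.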

Existing language pairs

Volga-Kama–Volga-Kama pairs

Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.

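| | tat | chv | bak | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|
| tat | - | cv-tt | tat-bak | | | | |
| chv | cv-tt | - | | | | | |
| bak | tat-bak | | - | | | | |
| udm | | | | - | | | |
| mhr | | | | | - | | kpv-mhr |
| myv | | | | | | - | |
| kpv | | | | | kpv-mhr | | - |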

Pairs with non–Volga-Kama languages

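| | tat | chv | bak | udm | mhr | myv | kpv |
|---|---|---|---|---|---|---|---|
| ru | tt-ru | cv-ru | | udm-rus | | | |
| ky | tt-ky | | | | | | |
| tr | | cv-tr | | | | | |
| fin | | | | fin-udm | | myv-fin | kpv-fin |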

Table of dix progress

tat chv bak udm mhr myv kpv
tat -
chv -
bak -
udm -
mhr -
myv -
kpv -
ru
ky 10
tr
fin

The languages

Volga-Kama languages by subgroup

Volga-Kama language vulnerability

The following table lists the Volga-Kama varieties with their ISO codes, approximate numbers of speakers, and UNESCO vulnerability classification.

| language | iso | num speakers | UNESCO classification |
|---|---|---|---|
| Tatar | tat | 6500K | 0. none |
| Bashqort | bak | 1379K | 1. vulnerable |
| Chuvash | chv | 1325K | 1. vulnerable |
| Udmurt | udm | 464K | 2. definitely endangered |
| Mari - Eastern | mhr | 414K | 2. definitely endangered |
| Mordvin - Erzya | myv | 400K | 2. definitely endangered |
| Komi - Zyryan | kpv | 217K | 2. definitely endangered |
| Mordvin - Moksha | mdf | 200K | 2. definitely endangered |
| Komi - Permyak | koi | 94K | 2. definitely endangered |
| Mari - Western | mrj | 37K | 3. severely endangered |
| Komi - Yazva | koi | 0K | 3. severely endangered |

Existing general resources

Grammars

Dictionaries

Existing computational resources

Corpora and corpora projects

Spell-checkers

Text-to-speech and speech-to-text systems

Keyboards

  • Xkb includes keyboards for the following languages:
    • Tatar
    • Chuvash
    • ...?

Morphological Transducers

Scholarship