Difference between revisions of "Languages of the Volga-Kama region"

From Apertium
(Update to new style)
Line 15: Line 15:
!rowspan=2| Language
!rowspan=2| Language
!colspan=2 class="unsortable"| ISO 639
!colspan=2 class="unsortable"| ISO 639
!rowspan=2| speakers
!rowspan=2| UNESCO
!rowspan=2| formalism
!rowspan=2| formalism
!rowspan=2| state
!rowspan=2| state
Line 26: Line 24:
! -2
! -2
! -3
! -3
|-
| <code>[[apertium-myv]]</code>
|| [[Erzya]]
|align="center"|<code>&ndash;</code>
|| <code>myv</code>
|| [[HFST|HFST (lexc+twol)]]
|| development
|align="right"| {{#lst:Apertium-myv-fin/stats|myv-stems}}
|align="center"|
|| [[apertium-myv-fin]]&nbsp;([[incubator]])
|| [[User:Francis_Tyers|Fran]], Jack Rueter
|-
|-
|| <code>[[apertium-tat]]</code>
|| <code>[[apertium-tat]]</code>
Line 31: Line 40:
|| <code>tt</code>
|| <code>tt</code>
|| <code>tat</code>
|| <code>tat</code>
|align="right"| 6500K
|align="right"| 0 (none)
|| [[HFST|HFST (lexc+twol)]]
|| [[HFST|HFST (lexc+twol)]]
|| development
|| working
|align="right"| {{#lst:Apertium-tat/stats|stems}}
|align="right"| {{#lst:Apertium-tat/stats|stems}}
|align="center"| [[Apertium-tat#Current_State|~{{:Apertium-tat/stats/average}}%]]
|align="center"| [[Apertium-tat#Current_State|~{{:Apertium-tat/stats/average}}%]]
Line 44: Line 51:
|| <code>cv</code>
|| <code>cv</code>
|| <code>chv</code>
|| <code>chv</code>
|align="right"| 1325K
|align="right"| 1 (vulnerable)
|| [[HFST|HFST (lexc+twol)]]
|| [[HFST|HFST (lexc+twol)]]
|| development
|| development
|align="right"| {{:apertium-chv/stems}}
|align="right"| {{:apertium-chv/stems}}
|align="center"| [[apertium-chv#Current_State|~{{:apertium-chv/stats/average}}%]]
|align="center"| [[apertium-chv#Current_State|~{{:apertium-chv/stats/average}}%]]
Line 57: Line 62:
|| <code>ba</code>
|| <code>ba</code>
|| <code>bak</code>
|| <code>bak</code>
|align="right"| 1379K
|align="right"| 1 (vulnerable)
|| [[HFST|HFST (lexc+twol)]]
|| [[HFST|HFST (lexc+twol)]]
|| development
|| development
|align="right"| {{:apertium-bak/stems}}
|align="right"| {{:apertium-bak/stems}}
|align="center"| [[apertium-bak#Current_State|~{{:apertium-bak/stats/average}}%]]
|align="center"| [[apertium-bak#Current_State|~{{:apertium-bak/stats/average}}%]]
Line 70: Line 73:
|align="center"| <code>&ndash;</code>
|align="center"| <code>&ndash;</code>
|| <code>udm</code>
|| <code>udm</code>
|align="right"| 464K
|align="right"| 2 (definitely endangered)
|| [[HFST|HFST (lexc+twol)]]
|| [[HFST|HFST (lexc+twol)]]
|| development
|| development
|align="right"| ?
|align="right"| {{#lst:Apertium-udm-rus/stats|udm-stems}}
|align="center"| ?
|align="center"|
|| [[apertium-fin-udm]]&nbsp;([[incubator]]) <br> [[apertium-udm-rus]]&nbsp;([[nursery]])
|| [[apertium-fin-udm]]&nbsp;([[incubator]]) <br> [[apertium-udm-rus]]&nbsp;([[nursery]])
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], [[User:Andrewboltachev|Andrey]], Лукерья, Алексей
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], [[User:Andrewboltachev|Andrey]], Лукерья, Алексей
|-
|-
| <code>[[apertium-myv]]</code>
| <code>[[apertium-kpv]]</code>
|| [[Erzya]]
|| [[Komi-Zyrian]]
|align="center"|<code>&ndash;</code>
|align="center"|<code>&ndash;</code>
|| <code>myv</code>
|| <code>kpv</code>
|align="right"| 400K
|align="right"| 2 (definitely endangered)
|| [[HFST|HFST (lexc+twol)]]
|| [[HFST|HFST (lexc+twol)]]
|| development
|| development
|align="right"| {{#lst:Apertium-myv-fin/stats|myv-stems}}
|align="right"| {{#lst:Apertium-kpv-mhr/stats|kpv-stems}}
|align="center"| ?
|align="center"|
|| [[apertium-myv-fin]]&nbsp;([[incubator]])
|| [[apertium-kpv-mhr]]&nbsp;([[incubator]]) <br> [[apertium-kpv-fin]]&nbsp;([[incubator]])
|| [[User:Francis_Tyers|Fran]], Jack Rueter
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], Fedina, Andrei Chemyshev
|-
|-
| <code>[[apertium-mhr]]</code>
| <code>[[apertium-mhr]]</code>
|| [[Eastern Mari]]
|| [[Meadow Mari]]
|align="center"|<code>&ndash;</code>
|align="center"|<code>&ndash;</code>
|| <code>mhr</code>
|| <code>mhr</code>
|align="right"| 414K
|align="right"| 2 (definitely endangered)
|| [[HFST|HFST (lexc+twol)]]
|| [[HFST|HFST (lexc+twol)]]
|| development
|| development
|align="right"| {{#lst:Apertium-kpv-mhr/stats|mhr-stems}}
|align="right"| {{#lst:Apertium-kpv-mhr/stats|mhr-stems}}
|align="center"| ?
|align="center"|
|| [[apertium-kpv-mhr]]&nbsp;([[incubator]])
|| [[apertium-kpv-mhr]]&nbsp;([[incubator]])
|| [[User:Francis_Tyers|Fran]], Fedina, chemyshev
|| [[User:Francis_Tyers|Fran]], Fedina, Andrei Chemyshev
|-
| <code>[[apertium-kpv]]</code>
|| [[Komi-Zyrian]]
|align="center"|<code>&ndash;</code>
|| <code>kpv</code>
|align="right"| 217K
|align="right"| 2 (definitely endangered)
|| [[HFST|HFST (lexc+twol)]]
|| development
|align="right"| ?
|align="center"| ?
|| [[apertium-kpv-mhr]]&nbsp;([[incubator]]) <br> [[apertium-kpv-fin]]&nbsp;([[incubator]])
|| [[User:Francis_Tyers|Fran]], [[User:Trondtr|Trond]], Fedina, chemyshev
|-
| <code>[[apertium-mdf]]</code>
|| [[Moksha]]
|align="center"|<code>&ndash;</code>
|| <code>mdf</code>
|align="right"| 200K
|align="right"| 2 (definitely endangered)
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|-
| <code>[[apertium-koi]]</code>
|| [[Komi-Permyak]]
|align="center"|<code>&ndash;</code>
|| <code>koi</code>
|align="right"| 94K
|align="right"| 2 (definitely endangered)
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|-
| <code>[[apertium-mrj]]</code>
|| [[Western Mari]]
|align="center"|<code>&ndash;</code>
|| <code>mrj</code>
|align="right"| 37K
|align="right"| 3 (severely endangered)
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|-
| <code>[[apertium-kpvyaz]]</code>
|| [[Komi-Yazva]]
|align="center"|<code>&ndash;</code>
|align="center"|<code>&ndash;</code>
|align="right"| 0K
|align="right"| 3 (severely endangered)
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|align="center"| &ndash;
|}
|}


Line 179: Line 111:
{| style="text-align: center;" class="wikitable"
{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
|- style="background: #ececec"
! !! tat !! chv !! bak !! udm !! mhr !! kpv
! !! tat !! chv !! bak !! udm !! mhr !! myv !! kpv
|-
|-
| '''tat''' || - || ''[[apertium-cv-tt|cv-tt]]'' || ''[[apertium-tat-bak|tat-bak]]'' || || ||
| '''tat''' || - || ''[[apertium-cv-tt|cv-tt]]'' || ''[[apertium-tat-bak|tat-bak]]'' || || || ||
|-
|-
| '''chv''' || ''[[apertium-cv-tt|cv-tt]]'' || - || || || ||
| '''chv''' || ''[[apertium-cv-tt|cv-tt]]'' || - || || || || ||
|-
|-
| '''bak''' || ''[[apertium-tat-bak|tat-bak]]'' || || - || || ||
| '''bak''' || ''[[apertium-tat-bak|tat-bak]]'' || || - || || || ||
|-
|-
| '''udm''' || || || || - || ||
| '''udm''' || || || || - || || ||
|-
|-
| '''mhr''' || || || || || - || ''[[apertium-kpv-mhr|kpv-mhr]]''
| '''mhr''' || || || || || - || || ''[[apertium-kpv-mhr|kpv-mhr]]''
|-
|-
| '''kpv''' || || || || || ''[[apertium-kpv-mhr|kpv-mhr]]'' || -
| '''myv''' || || || || || || - ||
|-
| '''kpv''' || || || || || ''[[apertium-kpv-mhr|kpv-mhr]]'' || || -
|}
|}


Line 197: Line 131:
{| style="text-align: center;" class="wikitable"
{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
|- style="background: #ececec"
! !! tat !! chv !! bak !! udm !! mhr !! kpv
! !! tat !! chv !! bak !! udm !! mhr !! myv !!kpv
|-
|-
| '''ru''' || ''[[apertium-tt-ru|tt-ru]]'' || ''[[apertium-cv-ru|cv-ru]]'' || || ''[[apertium-udm-rus|udm-rus]]'' || ||
| '''ru''' || ''[[apertium-tt-ru|tt-ru]]'' || ''[[apertium-cv-ru|cv-ru]]'' || || ''[[apertium-udm-rus|udm-rus]]'' || || ||
|-
|-
| '''ky''' || ''[[apertium-tt-ky|tt-ky]]'' || || || || ||
| '''ky''' || ''[[apertium-tt-ky|tt-ky]]'' || || || || || ||
|-
|-
| '''tr''' || || ''[[apertium-cv-tr|cv-tr]]'' || || || ||
| '''tr''' || || ''[[apertium-cv-tr|cv-tr]]'' || || || || ||
|-
|-
| '''fin''' || || || || ''[[apertium-fin-udm|fin-udm]]'' || || ''[[apertium-kpv-fin|kpv-fin]]''
| '''fin''' || || || || ''[[apertium-fin-udm|fin-udm]]'' || || ''[[apertium-myv-fin|myv-fin]]'' || ''[[apertium-kpv-fin|kpv-fin]]''
|}
|}


Line 211: Line 145:
{| style="text-align: center;" class="wikitable"
{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
|- style="background: #ececec"
! !! tat !! chv !! bak !! udm !! mhr !! kpv
! !! tat !! chv !! bak !! udm !! mhr !! myv !! kpv
|-
|-
| '''tat''' || - || ? || ? || || ||
| '''tat''' || - || {{#lst:Apertium-cv-tt/stats|cv-tt-stems}} || {{#lst:Apertium-tat-bak/stats|tat-bak-stems}} || || || ||
|-
|-
| '''chv''' || ? || - || || || ||
| '''chv''' || {{#lst:Apertium-cv-tt/stats|cv-tt-stems}} || - || || || || ||
|-
|-
| '''bak''' || ? || || - || || ||
| '''bak''' || {{#lst:Apertium-tat-bak/stats|tat-bak-stems}} || || - || || || ||
|-
|-
| '''udm''' || || || || - || ||
| '''udm''' || || || || - || || ||
|-
|-
| '''mhr''' || || || || || - || {{#lst:Apertium-kpv-mhr/stats|kpv-mhr-stems}}
| '''mhr''' || || || || || - || || {{#lst:Apertium-kpv-mhr/stats|kpv-mhr-stems}}
|-
|-
| '''kpv''' || || || || || {{#lst:Apertium-kpv-mhr/stats|kpv-mhr-stems}} || -
| '''myv''' || || || || || || - ||
|-
|-
| || || || || || ||
| '''kpv''' || || || || || {{#lst:Apertium-kpv-mhr/stats|kpv-mhr-stems}} || || -
|-
|-
| '''ru''' || ? || ? || || ? || ||
| || || || || || || ||
|-
|-
| '''ky''' || ? || || || || ||
| '''ru''' || {{#lst:Apertium-tt-ru/stats|tt-ru-stems}} || {{#lst:Apertium-cv-ru/stats|cv-ru-stems}} || || {{#lst:Apertium-udm-rus/stats|udm-rus-stems}} || || ||
|-
|-
| '''tr''' || || ? || || || ||
| '''ky''' || {{#lst:Apertium-tt-ky/stats|tt-ky-stems}} || || || || || ||
|-
|-
| '''fin''' || || || || {{#lst:Apertium-fin-udm/stats|fin-udm-stems}} || || ?
| '''tr''' || || {{#lst:Apertium-cv-tr/stats|stems}} || || || || ||
|-
| '''fin''' || || || || {{#lst:Apertium-fin-udm/stats|fin-udm-stems}} || || {{#lst:Apertium-myv-fin/stats|myv-fin-stems}} || {{#lst:Apertium-kpv-fin/stats|kpv-fin-stems}}
|}
|}


== The languages ==
== The languages ==

=== Volga-Kama languages by subgroup ===
* [[Turkic languages|Turkic]]
** North Qıpçaq: [[Tatar]], [[Bashqort]]
** Oğur: [[Chuvash]]

* [[Uralic languages|Uralic]] → Finno-Ugric → Finno-Permic
** Permic: [[Komi]] (Komi-Zyrian, Komi-Permyak, Komi-Yazva), [[Udmurt]]
** Finno-Volgaic
*** [[Mari]]: [[Meadow Mari|Meadow Mari (Eastern)]], [[Hill Mari|Hill Mari (Western)]]
*** [[Mordvin]]: [[Erzya]], [[Moksha]]

=== Volga-Kama language vulnerability ===

The following table shows information about Volga-Kama varieties and information about apertium projects related to the languages.
The following table shows information about Volga-Kama varieties and information about apertium projects related to the languages.


{|class="wikitable sortable"
{|class="wikitable sortable"
|-
|-
! language !! iso !! num speakers !! UNESCO classification !! Apertium support
! language !! iso !! num speakers !! UNESCO classification
|-
|-
| Tatar || <code>tat</code> || 6500K || 0. none || incubator/apertium-tr-tt/
| Tatar || <code>tat</code> || 6500K || 0. none
incubator/apertium-tt-kk/
incubator/apertium-tt-ky/
incubator/apertium-tt-ru/
incubator/apertium-cv-tt/
nursery/apertium-tt-ba/
|-
|-
| Bashqort || <code>bak</code> || 1379K || 1. vulnerable || nursery/apertium-tt-ba/
| Bashqort || <code>bak</code> || 1379K || 1. vulnerable
|-
|-
| Chuvash || <code>chv</code> || 1325K || 1. vulnerable || incubator/apertium-cv-ru/
| Chuvash || <code>chv</code> || 1325K || 1. vulnerable
incubator/apertium-cv-tr/
incubator/apertium-cv-tt/
|-
|-
| Udmurt || <code>udm</code> || 0464K || 2. definitely endangered || incubator/apertium-fin-udm/
| Udmurt || <code>udm</code> || 0464K || 2. definitely endangered
nursery/apertium-udm-rus/
|-
|-
| Mari - Eastern || <code>mhr</code> || 0414K || 2. definitely endangered || incubator/apertium-kpv-mhr/
| Mari - Eastern || <code>mhr</code> || 0414K || 2. definitely endangered
|-
|-
| Mordvin - Erzya || <code>myv</code> || 0400K || 2. definitely endangered ||
| Mordvin - Erzya || <code>myv</code> || 0400K || 2. definitely endangered
|-
|-
| Komi - Zyryan || <code>kpv</code> || 0217K || 2. definitely endangered || incubator/apertium-kpv-mhr/
| Komi - Zyryan || <code>kpv</code> || 0217K || 2. definitely endangered
|-
|-
| Mordvin&nbsp;-&nbsp;Moksha || <code>mdf</code> || 0200K || 2. definitely endangered ||
| Mordvin&nbsp;-&nbsp;Moksha || <code>mdf</code> || 0200K || 2. definitely endangered
|-
|-
| Komi - Permyak || <code>koi</code> || 0094K || 2. definitely endangered ||
| Komi - Permyak || <code>koi</code> || 0094K || 2. definitely endangered
|-
|-
| Mari - Western || <code>mrj</code> || 0037K || 3. severely endangered ||
| Mari - Western || <code>mrj</code> || 0037K || 3. severely endangered
|-
|-
| Komi - Yazva || <code>koi</code> || 0000K || 3. severely endangered ||
| Komi - Yazva || <code>koi</code> || 0000K || 3. severely endangered
|}
|}



Revision as of 06:45, 28 December 2013

The languages of the Volga-Kama region include several Turkic and Uralic languages spoken along the Volga and Kama rivers in Russia. These include [varieties of] Tatar, Bashqort, Chuvash, Mari, Komi, Mordvin, and Udmurt (and linguistically, to some extent, Russian).

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
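
To give a sense of the scale of this plan, the following Python sketch enumerates every unordered pair among the seven languages that have transducers listed on this page and marks the pairs that already exist. The language codes and the three existing pairs are taken from the tables on this page; the names `LANGS`, `EXISTING` and `pair_status` are hypothetical and not part of any Apertium tool.

```python
from itertools import combinations

# ISO 639-3 codes of the languages with transducers listed on this page.
LANGS = ["tat", "chv", "bak", "udm", "mhr", "myv", "kpv"]

# Volga-Kama-internal pairs that already exist (cv-tt, tat-bak, kpv-mhr).
EXISTING = {
    frozenset(("chv", "tat")),
    frozenset(("tat", "bak")),
    frozenset(("kpv", "mhr")),
}


def pair_status(langs, existing):
    """Map each unordered language pair to True if a pair already exists."""
    return {
        tuple(sorted(pair)): frozenset(pair) in existing
        for pair in combinations(langs, 2)
    }


status = pair_status(LANGS, EXISTING)
started = sorted(pair for pair, exists in status.items() if exists)
print(f"{len(started)} of {len(status)} possible pairs have been started")
for a, b in started:
    print(f"  {a}-{b}")
```

The number of pairs grows quadratically with the number of languages, which is why the plan keeps one independent transducer per language and limits per-pair work to the dictionary and transfer rules.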

Status

The ultimate goal is to have multi-purpose transducers for a variety of Volga-Kama languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for each dictionary pair is listed below.

Transducers

Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we consider it "working". Above 90%, it can be considered "production".
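
The rule of thumb above can be sketched in Python. This is a minimal illustration, not the script used to produce the figures on this page. It assumes naive coverage (the share of tokens that receive at least one analysis) and analyser output in the Apertium stream format, where an unknown token appears as `^word/*word$`. The names `naive_coverage` and `state` are hypothetical, and escaped characters inside tokens are not handled.

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
TOKEN_RE = re.compile(r"\^([^/$]*)/([^$]*)\$")


def naive_coverage(analysed: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    tokens = TOKEN_RE.findall(analysed)
    if not tokens:
        return 0.0
    # Unknown words are marked with a leading '*' in the analysis field.
    known = sum(1 for _, analyses in tokens if not analyses.startswith("*"))
    return known / len(tokens)


def state(coverage: float) -> str:
    """Apply the thresholds from the text: ~80% working, over 90% production."""
    if coverage > 0.90:
        return "production"
    if coverage >= 0.80:
        return "working"
    return "development"


sample = "^the/the<det>$ ^xyzzy/*xyzzy$"  # made-up analyser output
print(naive_coverage(sample), state(naive_coverage(sample)))
```

The function encodes only the thresholds stated in the text. The "state" column in the table below is maintained by hand and need not match it.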

{|class="wikitable sortable"
|-
!rowspan=2| name
!rowspan=2| Language
!colspan=2 class="unsortable"| ISO 639
!rowspan=2| formalism
!rowspan=2| state
!rowspan=2| stems
!rowspan=2| coverage
!rowspan=2| location
!rowspan=2| primary authors
|-
! -2
! -3
|-
| apertium-myv || Erzya || – || myv || HFST (lexc+twol) || development || || || apertium-myv-fin (incubator) || Fran, Jack Rueter
|-
| apertium-tat || Tatar || tt || tat || HFST (lexc+twol) || working || 55,702 || ~91% || apertium-tat (languages) || Ilnar, Fran, Jonathan, Milli
|-
| apertium-chv || Chuvash || cv || chv || HFST (lexc+twol) || development || 8,579 || ~85% || apertium-chv (languages) || Hèctor
|-
| apertium-bak || Bashkir || ba || bak || HFST (lexc+twol) || development || 2,827 || ~66% || apertium-bak (languages) || Fran, Jonathan, Ilnar, Milli
|-
| apertium-udm || Udmurt || – || udm || HFST (lexc+twol) || development || || || apertium-fin-udm (incubator) <br> apertium-udm-rus (nursery) || Fran, Trond, Andrey, Лукерья, Алексей
|-
| apertium-kpv || Komi-Zyrian || – || kpv || HFST (lexc+twol) || development || || || apertium-kpv-mhr (incubator) <br> apertium-kpv-fin (incubator) || Fran, Trond, Fedina, Andrei Chemyshev
|-
| apertium-mhr || Meadow Mari || – || mhr || HFST (lexc+twol) || development || || || apertium-kpv-mhr (incubator) || Fran, Fedina, Andrei Chemyshev
|}

Existing language pairs

Volga-Kama–Volga-Kama pairs

Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.

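{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
! !! tat !! chv !! bak !! udm !! mhr !! myv !! kpv
|-
| '''tat''' || - || ''cv-tt'' || ''tat-bak'' || || || ||
|-
| '''chv''' || ''cv-tt'' || - || || || || ||
|-
| '''bak''' || ''tat-bak'' || || - || || || ||
|-
| '''udm''' || || || || - || || ||
|-
| '''mhr''' || || || || || - || || ''kpv-mhr''
|-
| '''myv''' || || || || || || - ||
|-
| '''kpv''' || || || || || ''kpv-mhr'' || || -
|}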

Pairs with non–Volga-Kama languages

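{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
! !! tat !! chv !! bak !! udm !! mhr !! myv !! kpv
|-
| '''ru''' || ''tt-ru'' || ''cv-ru'' || || ''udm-rus'' || || ||
|-
| '''ky''' || ''tt-ky'' || || || || || ||
|-
| '''tr''' || || ''cv-tr'' || || || || ||
|-
| '''fin''' || || || || ''fin-udm'' || || ''myv-fin'' || ''kpv-fin''
|}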

Table of dix progress

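{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
! !! tat !! chv !! bak !! udm !! mhr !! myv !! kpv
|-
| '''tat''' || - || || || || || ||
|-
| '''chv''' || || - || || || || ||
|-
| '''bak''' || || || - || || || ||
|-
| '''udm''' || || || || - || || ||
|-
| '''mhr''' || || || || || - || ||
|-
| '''myv''' || || || || || || - ||
|-
| '''kpv''' || || || || || || || -
|-
| || || || || || || ||
|-
| '''ru''' || || || || || || ||
|-
| '''ky''' || 10 || || || || || ||
|-
| '''tr''' || || || || || || ||
|-
| '''fin''' || || || || || || ||
|}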

The languages

Volga-Kama languages by subgroup

  • Turkic
    • North Qıpçaq: Tatar, Bashqort
    • Oğur: Chuvash
  • Uralic → Finno-Ugric → Finno-Permic
    • Permic: Komi (Komi-Zyrian, Komi-Permyak, Komi-Yazva), Udmurt
    • Finno-Volgaic
      • Mari: Meadow Mari (Eastern), Hill Mari (Western)
      • Mordvin: Erzya, Moksha

Volga-Kama language vulnerability

The following table lists the Volga-Kama varieties with their ISO 639 codes, approximate numbers of speakers, and UNESCO vulnerability classifications.

{|class="wikitable sortable"
|-
! language !! iso !! num speakers !! UNESCO classification
|-
| Tatar || tat || 6500K || 0. none
|-
| Bashqort || bak || 1379K || 1. vulnerable
|-
| Chuvash || chv || 1325K || 1. vulnerable
|-
| Udmurt || udm || 0464K || 2. definitely endangered
|-
| Mari - Eastern || mhr || 0414K || 2. definitely endangered
|-
| Mordvin - Erzya || myv || 0400K || 2. definitely endangered
|-
| Komi - Zyryan || kpv || 0217K || 2. definitely endangered
|-
| Mordvin - Moksha || mdf || 0200K || 2. definitely endangered
|-
| Komi - Permyak || koi || 0094K || 2. definitely endangered
|-
| Mari - Western || mrj || 0037K || 3. severely endangered
|-
| Komi - Yazva || koi || 0000K || 3. severely endangered
|}

Existing general resources

Grammars

Dictionaries

Existing computational resources

Corpora and corpora projects

Spell-checkers

Text-to-speech and speech-to-text systems

Keyboards

  • Xkb includes keyboards for the following languages:
    • Tatar
    • Chuvash
    • ...?

Morphological Transducers

Scholarship