Difference between revisions of "Balkan languages"
Firespeaker (talk | contribs) |
|||
(31 intermediate revisions by 4 users not shown) | |||
Line 43: | Line 43: | ||
|| <code>hbs</code> |
|| <code>hbs</code> |
||
|| [[lttoolbox]] |
|| [[lttoolbox]] |
||
|| |
|| production |
||
|align="right"| {{#lst:Apertium-hbs/stats|stems}} |
|align="right"| {{#lst:Apertium-hbs/stats|stems}} |
||
|align="right"| {{#lst:Apertium-hbs/stats|paradigms}} |
|align="right"| {{#lst:Apertium-hbs/stats|paradigms}} |
||
|align="center"| [[Apertium-hbs#Current_State|~{{:Apertium-hbs/stats/average}}%]] |
|align="center"| [[Apertium-hbs#Current_State|~{{:Apertium-hbs/stats/average}}%]] |
||
|| [[apertium-hbs]] ([[languages]]) |
|| [[apertium-hbs]] ([[languages]]) |
||
|| [[User: Francis Tyers|Fran]] |
|| [[User: Francis Tyers|Fran]], [[User:Fpetkovski|Petkovski]], Aleš, [[User:Krvoje|hrvoj]] |
||
|- |
|- |
||
|| <code>[[apertium-slv]]</code> |
|| <code>[[apertium-slv]]</code> |
||
Line 60: | Line 60: | ||
|align="center"| [[Apertium-slv#Current_State|~{{:Apertium-slv/stats/average}}%]] |
|align="center"| [[Apertium-slv#Current_State|~{{:Apertium-slv/stats/average}}%]] |
||
|| [[apertium-hbs-slv]] ([[trunk]])<br />[[apertium-slv-pol]] ([[incubator]])<br />[[apertium-sl-mk]] ([[incubator]]) |
|| [[apertium-hbs-slv]] ([[trunk]])<br />[[apertium-slv-pol]] ([[incubator]])<br />[[apertium-sl-mk]] ([[incubator]]) |
||
|| [[User:Francis Tyers|Fran]], [[User:Fpetkovski|Petkovski]], [[User:Krvoje|Peradin]], |
|| [[User:Francis Tyers|Fran]], [[User:Fpetkovski|Petkovski]], [[User:Krvoje|Peradin]], Aleš, Čabrilo, Dimitrijev |
||
|| working |
|||
|align="right"| {{#lst:Apertium-tur/stats|paradigms}} |
|||
|- |
|- |
||
|| <code>[[apertium-bul]]</code> |
|| <code>[[apertium-bul]]</code> |
||
Line 85: | Line 73: | ||
|| [[apertium-bul]] ([[languages]]) |
|| [[apertium-bul]] ([[languages]]) |
||
|| [[User:Francis Tyers|Fran]], [[User:Tihomir|Tihomir]] |
|| [[User:Francis Tyers|Fran]], [[User:Tihomir|Tihomir]] |
||
|| development |
|||
|align="right"| - |
|||
|- |
|- |
||
|| <code>[[apertium-sqi]]</code> |
|| <code>[[apertium-sqi]]</code> |
||
Line 103: | Line 103: | ||
|| <code>ell</code> |
|| <code>ell</code> |
||
|| [[lttoolbox]] |
|| [[lttoolbox]] |
||
|| |
|| prototype |
||
|align="right"| {{#lst:Apertium-ell/stats|stems}} |
|align="right"| {{#lst:Apertium-ell/stats|stems}} |
||
|align="right"| {{#lst:Apertium-ell/stats|paradigms}} |
|align="right"| {{#lst:Apertium-ell/stats|paradigms}} |
||
Line 115: | Line 115: | ||
|| <code>rup</code> |
|| <code>rup</code> |
||
|| [[lttoolbox]] |
|| [[lttoolbox]] |
||
|| |
|| prototype |
||
|align="right"| {{#lst:Apertium-rup/stats|stems}} |
|align="right"| {{#lst:Apertium-rup/stats|stems}} |
||
|align="right"| {{#lst:Apertium-rup/stats|paradigms}} |
|align="right"| {{#lst:Apertium-rup/stats|paradigms}} |
||
|align="center"| - |
|align="center"| - |
||
|| [[apertium-rup]] ([[incubator]]) |
|| [[apertium-rup]] ([[incubator]]) |
||
|| [[User: Francis Tyers|Fran]], shopskasalata |
|| [[User: Francis Tyers|Fran]], [[User:Shopskasalata|shopskasalata]] |
||
|- |
|- |
||
|| <code>[[apertium-ron]]</code> |
|| <code>[[apertium-ron]]</code> |
||
Line 127: | Line 127: | ||
|| <code>ron</code> |
|| <code>ron</code> |
||
|| [[lttoolbox]] |
|| [[lttoolbox]] |
||
|| possibly non-existant |
|||
|| ? |
|||
|align="right"| ? |
|align="right"| ? |
||
|align="right"| ? |
|align="right"| ? |
||
Line 135: | Line 135: | ||
|} |
|} |
||
=== Balkan Language Classification |
=== Balkan Language Classification=== |
||
The languages that share these similarities belong to five distinct branches of the Indo-European languages: |
The languages that share these similarities belong to five distinct branches of the Indo-European languages<sup><small>[https://en.wikipedia.org/wiki/Balkan_language_area#Languages]</small></sup>: |
||
* '''[[Albanian]]''': Arvanitika, Gheg, Tosk |
* '''[[Albanian]]''': Arvanitika, Gheg, Tosk |
||
* '''[[Hellenic languages]]''': [[Greek|Standard Greek]], Cappadocian Greek, Pontic Greek |
* '''[[Hellenic languages]]''': [[Greek|Standard Greek]], Cappadocian Greek, Pontic Greek |
||
Line 146: | Line 146: | ||
=== Existing language pairs === |
=== Existing language pairs === |
||
==== Balkan-Balkan pairs ==== |
|||
Text in ''italic'' denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in '''bold''' denotes a stable well-working language pair in trunk. |
Text in ''italic'' denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in '''bold''' denotes a stable well-working language pair in trunk. |
||
{| style="text-align: center;" class="wikitable" |
{| style="text-align: center;" class="wikitable dixtable" |
||
|- style="background: #ececec" |
|- style="background: #ececec" |
||
! |
! !! mkd !! hbs !! slv !! bul !! tur !! sqi !! ell !! rup !! ron |
||
|- |
|- |
||
| '''mkd''' || - || '''[[Apertium-hbs-mkd|hbs-mkd]]'''<br>'''{{#lst:Apertium-hbs-mkd/stats|hbs-mkd_stems}}''' || ''[[Apertium-sl-mk|sl-mk]]''<br>{{#lst:Apertium-sl-mk/stats|sl-mk_stems}} || '''[[Apertium-mk-bg|mk-bg]]'''<br>'''{{#lst:Apertium-mk-bg/stats|mk-bg_stems}}''' || || ''[[Apertium-mk-sq|mk-sq]]''<br>{{#lst:Apertium-mk-sq/stats|mk-sq_stems}} || || || |
|||
| '''bul''' || - || '''[[mk-bg]]''' || || || || ''[[bg-el]]'' || || || |
|||
|- |
|- |
||
| '''hbs''' || '''[[Apertium-hbs-mkd|hbs-mkd]]'''<br>'''{{#lst:Apertium-hbs-mkd/stats|hbs-mkd_stems}}''' || - || '''[[Apertium-hbs-slv|hbs-slv]]'''<br>'''{{#lst:Apertium-hbs-slv/stats|hbs-slv_stems}}''' || || || || || || |
|||
| '''mkd''' || '''[[mk-bg]]''' || - || || || ''[[mk-sq]]'' || || || || |
|||
|- |
|- |
||
| '''slv''' || ''[[Apertium-sl-mk|sl-mk]]''<br>{{#lst:Apertium-sl-mk/stats|sl-mk_stems}} || '''[[Apertium-hbs-slv|hbs-slv]]'''<br>'''{{#lst:Apertium-hbs-slv/stats|hbs-slv_stems}}''' || - || || || || || || |
|||
| '''ron''' || || || - || ''[[ron-rup]]'' || || || || || |
|||
|- |
|- |
||
| '''bul''' || '''[[Apertium-mk-bg|mk-bg]]'''<br>'''{{#lst:Apertium-mk-bg/stats|mk-bg_stems}}''' || || || - || || || ''[[Apertium-bg-el|bg-el]]''<br>{{#lst:Apertium-bg-el/stats|bg-el_stems}} || || |
|||
| '''rup''' || || || ''[[ron-rup]]'' || - || || || || || |
|||
|- |
|- |
||
| ''' |
| '''tur''' || || || || || - || || || || |
||
|- |
|- |
||
| ''' |
| '''sqi''' || ''[[Apertium-mk-sq|mk-sq]]''<br>{{#lst:Apertium-mk-sq/stats|mk-sq_stems}} || || || || || - || || || |
||
|- |
|- |
||
| ''' |
| '''ell''' || || || || ''[[Apertium-bg-el|bg-el]]''<br>{{#lst:Apertium-bg-el/stats|bg-el_stems}} || || || - || || |
||
|- |
|- |
||
| ''' |
| '''rup''' || || || || || || || || - || ''[[Apertium-ron-rup|ron-rup]]''<br>{{#lst:Apertium-ron-rup/stats|ron-rup_stems}} |
||
|- |
|- |
||
| ''' |
| '''ron''' || || || || || || || || ''[[Apertium-ron-rup|ron-rup]]''<br>{{#lst:Apertium-ron-rup/stats|ron-rup_stems}} || - |
||
==== Pairs with non-Balkan languages ==== |
|||
{| style="text-align: center;" class="wikitable" |
|||
|- style="background: #ececec" |
|||
! !! bul !! mkd !! ron !! rup !! sqi !! ell !! hbs !! slv !! tur |
|||
|- |
|- |
||
| |
| || || || || || || || || || |
||
|- |
|- |
||
| '''aze''' || || || || || [[Apertium-tur-aze|tur-aze]]<br>{{#lst:Apertium-tur-aze/stats|tur-aze_stems}} || || || || |
|||
| '''ru''' || ''[[bg-ru]]'' || || || || || || ''[[hbs-rus]]'' || || |
|||
|- |
|- |
||
| |
| '''cat''' || || || || || || || || || [[Apertium-ca-ro|ca-ro]]<br>{{#lst:Apertium-ca-ro/stats|ca-ro_stems}} |
||
|- |
|- |
||
| '''ces''' || || ''[[Apertium-ces-hbs|ces-hbs]]''<br>{{#lst:Apertium-ces-hbs/stats|ces-hbs_stems}} || ''[[Apertium-cs-sl|cs-sl]]''<br>{{#lst:Apertium-cs-sl/stats|cs-sl_stems}} || || || || || || |
|||
| '''it''' || || || ''[[ro-it]]'' || || || || || ''[[sl-it]]'' || |
|||
|- |
|- |
||
| |
| '''chv''' || || || || || ''[[Apertium-cv-tr|cv-tr]]''<br>{{#lst:Apertium-cv-tr/stats|cv-tr_stems}} || || || || |
||
|- |
|- |
||
| '''eng''' || '''[[Apertium-mk-en|mk-en]]'''<br>'''{{#lst:Apertium-mk-en/stats|mk-en_stems}}''' || '''[[Apertium-hbs-eng|hbs-eng]]'''<br>'''{{#lst:Apertium-hbs-eng/stats|hbs-eng_stems}}''' || ''[[Apertium-sl-en|sl-en]]''<br>{{#lst:Apertium-sl-en/stats|sl-en_stems}} || [[Apertium-bg-en|bg-en]]<br>{{#lst:Apertium-bg-en/stats|bg-en_stems}} || ''[[Apertium-tr-en|tr-en]]''<br>{{#lst:Apertium-tr-en/stats|tr-en_stems}} || ''[[Apertium-en-sq|en-sq]]''<br>{{#lst:Apertium-en-sq/stats|en-sq_stems}} || ''[[Apertium-ell-eng|ell-eng]]''<br>{{#lst:Apertium-ell-eng/stats|ell-eng_stems}} || || |
|||
| '''pol''' || || || || || || || || ''[[slv-pol]]'' || |
|||
|- |
|- |
||
| |
| '''epo''' || || || || ''[[Apertium-eo-bg|eo-bg]]''<br>{{#lst:Apertium-eo-bg/stats|eo-bg_stems}} || ''[[Apertium-eo-tr|eo-tr]]''<br>{{#lst:Apertium-eo-tr/stats|eo-tr_stems}} || || ''[[Apertium-eo-el|eo-el]]''<br>{{#lst:Apertium-eo-el/stats|eo-el_stems}} || || |
||
|- |
|- |
||
| |
| '''fin''' || || ''[[Apertium-fin-hbs|fin-hbs]]''<br>{{#lst:Apertium-fin-hbs/stats|fin-hbs_stems}} || || || || || || || |
||
|- |
|- |
||
| |
| '''fra''' || || || || || || || || || ''[[Apertium-fr-ro|fr-ro]]''<br>{{#lst:Apertium-fr-ro/stats|fr-ro_stems}} |
||
|- |
|- |
||
| '''ina''' || || || || || || || || || ''[[Apertium-ron-ina|ron-ina]]''<br>{{#lst:Apertium-ron-ina/stats|ron-ina_stems}} |
|||
| '''es''' || || || '''[[es-ro]]''' || || || || || || |
|||
|- |
|- |
||
| '''ita''' || || || ''[[Apertium-slv-ita|slv-ita]]''<br>{{#lst:Apertium-slv-ita/stats|slv-ita_stems}} || || || || || || [[Apertium-ro-it|ro-it]]<br>{{#lst:Apertium-ro-it/stats|ro-it_stems}} |
|||
| '''cs''' || || || || || || || || ''[[cs-sl]]'' || |
|||
|- |
|- |
||
| |
| '''kir''' || || || || || [[Apertium-tur-kir|tur-kir]]<br>{{#lst:Apertium-tur-kir/stats|tur-kir_stems}} || || || || |
||
|- |
|- |
||
| '''pol''' || || ''[[Apertium-pol-hbs|pol-hbs]]''<br>{{#lst:Apertium-pol-hbs/stats|pol-hbs_stems}} || ''[[Apertium-slv-pol|slv-pol]]''<br>{{#lst:Apertium-slv-pol/stats|slv-pol_stems}} || || || || || || |
|||
| '''tat''' || || || || || || || || || ''[[tur-tat]]'' |
|||
|- |
|- |
||
| '''rus''' || || [[Apertium-hbs-rus|hbs-rus]]<br>{{#lst:Apertium-hbs-rus/stats|hbs-rus_stems}} || || ''[[Apertium-bg-ru|bg-ru]]''<br>{{#lst:Apertium-bg-ru/stats|bg-ru_stems}} || || || || || |
|||
| '''uzb''' || || || || || || || || || ''[[tur-uzb]]'' |
|||
|- |
|- |
||
| '''spa''' || || || [[Apertium-slv-spa|slv-spa]]<br>{{#lst:Apertium-slv-spa/stats|slv-spa_stems}} || || || || || || '''[[Apertium-es-ro|es-ro]]'''<br>'''{{#lst:Apertium-es-ro/stats|es-ro_stems}}''' |
|||
| '''aze''' || || || || || || || || || [[tur-aze]] |
|||
|- |
|- |
||
| '''tat''' || || || || || [[Apertium-tur-tat|tur-tat]]<br>{{#lst:Apertium-tur-tat/stats|tur-tat_stems}} || || || || |
|||
| '''cv''' || || || || || || || || || ''[[cv-tr]]'' |
|||
| '''tuk''' || || || || || [[Apertium-tuk-tur|tuk-tur]]<br>{{#lst:Apertium-tuk-tur/stats|tuk-tur_stems}} || || || || |
|||
| '''uzb''' || || || || || [[Apertium-tur-uzb|tur-uzb]]<br>{{#lst:Apertium-tur-uzb/stats|tur-uzb_stems}} || || || || |
|||
|} |
|} |
||
== |
== Samples == |
||
Article 1 of the Universal Declaration of Human Rights: |
|||
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.'' |
|||
===Monolingual=== |
|||
{|class= |
{|class=wikitable |
||
! Language !! Module !! Paradigms !! Lemmata !! Coverage (SETimes) !! Coverage (Wikipedia) |
|||
|- |
|||
| Bulgarian || [[Macedonian and Bulgarian]] || 305 || 7873 || 88.1% || 77.15% |
|||
|- |
|||
| Macedonian || [[Macedonian and Bulgarian]] || 225 || 8094 || 92.1% || |
|||
|- |
|- |
||
|| Macedonian || Сите човечки суштества се раѓаат слободни и еднакви по достоинство и права. Тие се обдарени со разум и совест и треба да се однесуваат еден кон друг во духот на општо човечката припадност. |
|||
| Romanian || [[Spanish and Romanian]] || 997 || 18719 || 89.7% || 83.62% |
|||
|- |
|- |
||
|| Slovenian || Vsi ljudje se rodijo svobodni in imajo enako dostojanstvo in enake pravice. Obdarjeni so z razumom in vestjo in bi morali ravnati drug z drugim kakor bratje. |
|||
| Aromanian || [[Incubator]] || 17 || 28 || - || |
|||
|- |
|- |
||
|| Bulgarian || Всички хора се раждат свободни и равни по достойнство и права. Те са надарени с разум и съвест и следва да се отнасят помежду си в дух на братство. |
|||
| Albanian || [[Incubator]] || 127 || 3302 || 80.2% || 65.62% |
|||
| Greek || [[Incubator]] || 377 || 859 || 49.4% || 49.75% |
|||
|- |
|||
| Serbo-Croatian || [[Incubator]] || 85 || 660 || - || |
|||
|- |
|||
| Slovenian || [[Incubator]] || 1128 || 20385 || - || |
|||
|- |
|||
| Turkish || (external: [http://www.let.rug.nl/~coltekin/trmorph/ TRMorph]) || - || 37101 || || |
|||
|- |
|- |
||
|| Turkish || Bütün insanlar hür, haysiyet ve haklar bakımından eşit doğarlar. Akıl ve vicdana sahiptirler ve birbirlerine karşı kardeşlik zihniyeti ile hareket etmelidirler. |
|||
|} |
|} |
||
Languages missing: Roma |
|||
===Bilingual=== |
|||
* [[Macedonian and Bulgarian]] |
|||
* [[Macedonian and English]] |
|||
==See also== |
==See also== |
||
Line 256: | Line 232: | ||
* [http://www.let.rug.nl/~coltekin/trmorph/ TRMorph] |
* [http://www.let.rug.nl/~coltekin/trmorph/ TRMorph] |
||
[[Category: |
[[Category:Languages of the Balkans]] |
Latest revision as of 19:27, 27 August 2017
The Balkan languages are the languages spoken in the Balkans, possibly forming part of the Balkan Sprachbund. They include Bulgarian, Macedonian, Romanian, Aromanian, Albanian, Greek, Serbo-Croatian, and a number of others.
The master plan is to build an independent finite-state transducer for each language, and then to write a dictionary and transfer rules for each language pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Balkan languages. These can then be paired for X→Y translation by adding a CG (constraint grammar) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for the pair dictionaries is listed below.
Transducers
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, it can be called "working". Above 90%, it can be considered "production".
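The thresholds above can be sketched in Python. This is only an illustration, not the project's own tooling: the function names are hypothetical, the cutoffs are read as hard limits at 80 and 90, and coverage is assumed to be the naive share of tokens that receive at least one analysis, with unknown tokens marked by `*` in Apertium stream format (for example `^zzz/*zzz$`).

```python
import re

# One lexical unit in Apertium stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def naive_coverage(analysed_text: str) -> float:
    """Return the percentage of lexical units that have a known analysis.

    Assumption: an unknown token is rendered as ^form/*form$.
    """
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for unit in units if "/*" not in unit)
    return 100.0 * known / len(units)


def classify_status(coverage: float) -> str:
    """Map a coverage percentage to the status labels used on this page.

    Assumption: "~80%" and "over 90%" are treated as hard cutoffs.
    """
    if coverage > 90.0:
        return "production"
    if coverage >= 80.0:
        return "working"
    return "below working threshold"


if __name__ == "__main__":
    sample = "^the/the<det>$ ^cat/cat<n><sg>$ ^zzz/*zzz$ ^sat/sit<vblex><past>$"
    pct = naive_coverage(sample)
    print(f"{pct:.1f}% -> {classify_status(pct)}")
```

Three of the four tokens in the sample are recognised, so it falls below the "working" threshold. A real measurement would run over a range of corpora, as the text says, not a single sentence.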
Balkan Language Classification
The languages that share these similarities belong to five distinct branches of the Indo-European languages[1]:
- Albanian: Arvanitika, Gheg, Tosk
- Hellenic languages: Standard Greek, Cappadocian Greek, Pontic Greek
- Romance languages: Aromanian, Romanian, Moldovan, Istro-Romanian, Megleno-Romanian
- Slavic languages
- Indic languages: Romani
Existing language pairs
Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.
| | mkd | hbs | slv | bul | tur | sqi | ell | rup | ron |
|---|---|---|---|---|---|---|---|---|---|
| mkd | - | hbs-mkd 12,813 | sl-mk 25,579 | mk-bg 8,783 | | mk-sq ? | | | |
| hbs | hbs-mkd 12,813 | - | hbs-slv 24,717 | | | | | | |
| slv | sl-mk 25,579 | hbs-slv 24,717 | - | | | | | | |
| bul | mk-bg 8,783 | | | - | | | bg-el 638 | | |
| tur | | | | | - | | | | |
| sqi | mk-sq ? | | | | | - | | | |
| ell | | | | bg-el 638 | | | - | | |
| rup | | | | | | | | - | ron-rup 402 |
| ron | | | | | | | | ron-rup 402 | - |
| aze | | | | | tur-aze 8,194 | | | | |
| cat | | | | | | | | | ca-ro 16,431 |
| ces | | ces-hbs 167 | cs-sl 570 | | | | | | |
| chv | | | | | cv-tr 100 | | | | |
| eng | mk-en 33,350 | hbs-eng 16,228 | sl-en 313 | bg-en 10,242 | tr-en 171 | en-sq 580 | ell-eng 830 | | |
| epo | | | | eo-bg ? | eo-tr 1,500 | | eo-el 1,150 | | |
| fin | | fin-hbs 250 | | | | | | | |
| fra | | | | | | | | | fr-ro 12,727 |
| ina | | | | | | | | | ron-ina 192 |
| ita | | | slv-ita 1,586 | | | | | | ro-it 10,093 |
| kir | | | | | tur-kir 7,123 | | | | |
| pol | | pol-hbs 136 | slv-pol 354 | | | | | | |
| rus | | hbs-rus 5,008 | | bg-ru 3,292 | | | | | |
| spa | | | slv-spa | | | | | | es-ro 24,528 |
| tat | | | | | tur-tat 3,317 | | | | |
| tuk | | | | | tuk-tur 3,387 | | | | |
| uzb | | | | | tur-uzb 3,519 | | | | |
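Building a dictionary and transfer rules for every pair means the work grows quadratically with the number of languages. The Python sketch below shows this using the nine Balkan language codes from the column headers of the table above and the seven Balkan-Balkan pairs that have an entry there. The names `BALKAN_CODES`, `EXISTING`, `all_pairs` and `missing_pairs` are hypothetical, and pairs are treated as unordered, which is an assumption; translation direction is ignored.

```python
from itertools import combinations

# Language codes taken from the column headers of the pair table.
BALKAN_CODES = ["mkd", "hbs", "slv", "bul", "tur", "sqi", "ell", "rup", "ron"]

# Balkan-Balkan pairs with an entry in the table, as unordered code pairs.
EXISTING = {
    frozenset(("hbs", "mkd")),
    frozenset(("slv", "mkd")),
    frozenset(("mkd", "bul")),
    frozenset(("mkd", "sqi")),
    frozenset(("hbs", "slv")),
    frozenset(("bul", "ell")),
    frozenset(("ron", "rup")),
}


def all_pairs(codes):
    """Return every unordered pair of distinct language codes."""
    return [frozenset(pair) for pair in combinations(codes, 2)]


def missing_pairs(codes, existing):
    """Return the pairs with no table entry, as sorted 'aaa-bbb' strings."""
    return sorted(
        "-".join(sorted(pair))
        for pair in all_pairs(codes)
        if pair not in existing
    )


if __name__ == "__main__":
    total = len(all_pairs(BALKAN_CODES))
    missing = missing_pairs(BALKAN_CODES, EXISTING)
    print(f"{total} possible pairs, {len(missing)} without an entry")
```

With nine languages there are 9 × 8 / 2 = 36 unordered pairs, so the seven listed pairs leave 29 cells of the Balkan-Balkan part of the table empty.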
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Macedonian | Сите човечки суштества се раѓаат слободни и еднакви по достоинство и права. Тие се обдарени со разум и совест и треба да се однесуваат еден кон друг во духот на општо човечката припадност. |
Slovenian | Vsi ljudje se rodijo svobodni in imajo enako dostojanstvo in enake pravice. Obdarjeni so z razumom in vestjo in bi morali ravnati drug z drugim kakor bratje. |
Bulgarian | Всички хора се раждат свободни и равни по достойнство и права. Те са надарени с разум и съвест и следва да се отнасят помежду си в дух на братство. |
Turkish | Bütün insanlar hür, haysiyet ve haklar bakımından eşit doğarlar. Akıl ve vicdana sahiptirler ve birbirlerine karşı kardeşlik zihniyeti ile hareket etmelidirler. |