Balkan languages

From Apertium

Latest revision as of 19:27, 27 August 2017

The Balkan languages are those languages spoken in the Balkans, and possibly forming a part of the Balkan Sprachbund. They include Bulgarian, Macedonian, Romanian, Aromanian, Albanian, Greek, Serbo-Croatian, and a number of others.

The master plan is to build an independent finite-state transducer for each language, and then a bilingual dictionary and a set of transfer rules for every pair. The current status of these goals is listed below.

== Status ==

The ultimate goal is to have reusable, multi-purpose transducers for a variety of Balkan languages. Any two of them can then be combined for X→Y translation by adding a Constraint Grammar (CG) for language X and a bilingual dictionary and transfer rules for the pair X→Y. The development progress of each language's transducer and of each language pair is listed below.
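
To make the economics of this plan concrete, the sketch below counts the components needed to cover every translation direction among the nine languages in the transducer table. It assumes one transducer and one CG per language, one bilingual dictionary per unordered pair (used in both directions), and one set of transfer rules per direction X→Y. The function `plan_components` is a hypothetical helper for illustration, not part of Apertium.

```python
from itertools import combinations, permutations

# ISO 639-3 codes of the nine languages in the transducer table.
LANGUAGES = ["mkd", "hbs", "slv", "bul", "tur", "sqi", "ell", "rup", "ron"]


def plan_components(langs):
    """Hypothetical helper: count the resources needed for all X→Y directions.

    Assumptions (for illustration only):
      * one reusable transducer and one CG per language;
      * one bilingual dictionary per unordered pair, shared by both directions;
      * one set of transfer rules per ordered direction X→Y.
    """
    pairs = list(combinations(langs, 2))       # {X, Y}: one dictionary each
    directions = list(permutations(langs, 2))  # X→Y and Y→X counted separately
    return {
        "transducers": len(langs),
        "cgs": len(langs),
        "dictionaries": len(pairs),
        "transfer_rule_sets": len(directions),
    }


if __name__ == "__main__":
    print(plan_components(LANGUAGES))
    # {'transducers': 9, 'cgs': 9, 'dictionaries': 36, 'transfer_rule_sets': 72}
```

Because the monolingual transducers are shared, adding a tenth language under these assumptions costs one new transducer and one CG, while the pair-specific resources (dictionaries and transfer rules) grow quadratically with the number of languages.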

=== Transducers ===

Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, it can be called "working"; above 90%, it can be considered "production" quality.
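
The page does not say how coverage is measured. A common approach, assumed here, is naive coverage: the share of tokens in an analysed corpus that the transducer recognises. The sketch below computes it from lt-proc output, where an unknown word appears as `^surface/*surface$`, and applies the thresholds above. The function names, the reading of "~80%" as "at least 80%", the label for the band below 80%, and the rule that a transducer must clear the threshold on every corpus are all assumptions for illustration.

```python
import re

# Lexical units in the Apertium stream format look like ^surface/analysis1/...$,
# and lt-proc marks a word it cannot analyse as ^surface/*surface$.
# Simplification: escaped characters (\^, \$, \/) are not handled here.
LEXICAL_UNIT = re.compile(r"\^([^$]*)\$")


def naive_coverage(stream: str) -> float:
    """Percentage of lexical units in an analysed text that are not unknown."""
    units = LEXICAL_UNIT.findall(stream)
    if not units:
        raise ValueError("no lexical units found in stream")
    known = sum(1 for unit in units if "/*" not in unit)
    return 100.0 * known / len(units)


def state_for(coverage: float) -> str:
    """Map a coverage percentage to the labels used on this page."""
    if coverage > 90.0:
        return "production"
    if coverage >= 80.0:  # assumption: "~80%" read as "at least 80%"
        return "working"
    return "below working"  # hypothetical label; the page names no lower band


def state_across(streams: list[str]) -> str:
    """Assumption: the threshold must hold on every corpus, so use the minimum."""
    return state_for(min(naive_coverage(s) for s in streams))


if __name__ == "__main__":
    # Hypothetical lt-proc output for four tokens, one of them unknown.
    sample = "^the/the<det><def>$ ^cat/cat<n><sg>$ ^sat/sit<vblex><past>$ ^zzz/*zzz$"
    print(naive_coverage(sample))  # 75.0
    print(state_for(naive_coverage(sample)))  # below working
```

Note that the table on this page does not apply the thresholds mechanically: apertium-bul is listed as "production" at ~80%, and apertium-tur as "development" at ~87.3%.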

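{| class="wikitable"
! Name !! Language !! ISO 639-1 !! ISO 639-3 !! Formalism !! State !! Stems !! Paradigms !! Coverage !! Location !! Primary authors
|-
| <code>apertium-mkd</code> || Macedonian || <code>mk</code> || <code>mkd</code> || lttoolbox || production || 30,686 || 260 || ~90.5% || apertium-mkd (languages) || Fran, Tihomir, Petkovski
|-
| <code>apertium-hbs</code> || Serbo-Croatian || <code>sh</code> || <code>hbs</code> || lttoolbox || production || 58,004 || 1,092 || ~90.5% || apertium-hbs (languages) || Fran, Petkovski, Aleš, hrvoj
|-
| <code>apertium-slv</code> || Slovenian || <code>sl</code> || <code>slv</code> || lttoolbox || production || 20,596 || 1,435 || ~90.5% || apertium-hbs-slv (trunk)<br />apertium-slv-pol (incubator)<br />apertium-sl-mk (incubator) || Fran, Petkovski, Peradin, Aleš, Čabrilo, Dimitrijev
|-
| <code>apertium-bul</code> || Bulgarian || <code>bg</code> || <code>bul</code> || lttoolbox || production || 8,578 || 317 || ~80% || apertium-bul (languages) || Fran, Tihomir
|-
| <code>apertium-tur</code> || Turkish || <code>tr</code> || <code>tur</code> || HFST || development || 17,221 || - || ~87.3% || apertium-tur (languages) || Fran, Gianluca, Sezgi Aydın
|-
| <code>apertium-sqi</code> || Albanian || <code>sq</code> || <code>sqi</code> || lttoolbox || development || 3,312 || 138 || ~80.2% || apertium-sqi (languages) || Fran
|-
| <code>apertium-ell</code> || Greek || <code>el</code> || <code>ell</code> || lttoolbox || prototype || 2,460 || 951 || - || apertium-ell (languages) || Fran
|-
| <code>apertium-rup</code> || Aromanian || - || <code>rup</code> || lttoolbox || prototype || 312,005 || 26,193 || - || apertium-rup (incubator) || Fran, shopskasalata
|-
| <code>apertium-ron</code> || Romanian || <code>ro</code> || <code>ron</code> || lttoolbox || possibly non-existent || ? || ? || - || apertium-ron-rup (incubator) || Fran
|}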

=== Balkan Language Classification ===

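The languages that share the features of the Balkan Sprachbund belong to five distinct branches of the Indo-European languages<sup><small>[https://en.wikipedia.org/wiki/Balkan_language_area#Languages]</small></sup>, among them:
* '''[[Albanian]]''': Arvanitika, Gheg, Tosk
* '''[[Hellenic languages]]''': [[Greek|Standard Greek]], Cappadocian Greek, Pontic Greek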

=== Existing language pairs ===

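Text in ''italic'' denotes language pairs under development or in the incubator. Regular text denotes a functioning language pair in staging, while text in '''bold''' denotes a stable, well-working language pair in trunk. The number below each pair is its stem count as reported on the pair's statistics page; "?" means the count is not available. The first nine rows cover pairs between Balkan languages; the remaining rows cover pairs with non-Balkan languages.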

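{| style="text-align: center;" class="wikitable dixtable"
|- style="background: #ececec"
! !! mkd !! hbs !! slv !! bul !! tur !! sqi !! ell !! rup !! ron
|-
| '''mkd''' || - || '''[[Apertium-hbs-mkd|hbs-mkd]]'''<br>'''12,813''' || ''[[Apertium-sl-mk|sl-mk]]''<br>25,579 || '''[[Apertium-mk-bg|mk-bg]]'''<br>'''8,783''' || || ''[[Apertium-mk-sq|mk-sq]]''<br>? || || ||
|-
| '''hbs''' || '''[[Apertium-hbs-mkd|hbs-mkd]]'''<br>'''12,813''' || - || '''[[Apertium-hbs-slv|hbs-slv]]'''<br>'''24,717''' || || || || || ||
|-
| '''slv''' || ''[[Apertium-sl-mk|sl-mk]]''<br>25,579 || '''[[Apertium-hbs-slv|hbs-slv]]'''<br>'''24,717''' || - || || || || || ||
|-
| '''bul''' || '''[[Apertium-mk-bg|mk-bg]]'''<br>'''8,783''' || || || - || || || ''[[Apertium-bg-el|bg-el]]''<br>638 || ||
|-
| '''tur''' || || || || || - || || || ||
|-
| '''sqi''' || ''[[Apertium-mk-sq|mk-sq]]''<br>? || || || || || - || || ||
|-
| '''ell''' || || || || ''[[Apertium-bg-el|bg-el]]''<br>638 || || || - || ||
|-
| '''rup''' || || || || || || || || - || ''[[Apertium-ron-rup|ron-rup]]''<br>402
|-
| '''ron''' || || || || || || || || ''[[Apertium-ron-rup|ron-rup]]''<br>402 || -
|-
| '''aze''' || || || || || [[Apertium-tur-aze|tur-aze]]<br>8,194 || || || ||
|-
| '''cat''' || || || || || || || || || [[Apertium-ca-ro|ca-ro]]<br>16,431
|-
| '''ces''' || || ''[[Apertium-ces-hbs|ces-hbs]]''<br>167 || ''[[Apertium-cs-sl|cs-sl]]''<br>570 || || || || || ||
|-
| '''chv''' || || || || || ''[[Apertium-cv-tr|cv-tr]]''<br>100 || || || ||
|-
| '''eng''' || '''[[Apertium-mk-en|mk-en]]'''<br>'''33,350''' || '''[[Apertium-hbs-eng|hbs-eng]]'''<br>'''16,228''' || ''[[Apertium-sl-en|sl-en]]''<br>313 || [[Apertium-bg-en|bg-en]]<br>10,242 || ''[[Apertium-tr-en|tr-en]]''<br>171 || ''[[Apertium-en-sq|en-sq]]''<br>580 || ''[[Apertium-ell-eng|ell-eng]]''<br>830 || ||
|-
| '''epo''' || || || || ''[[Apertium-eo-bg|eo-bg]]''<br>? || ''[[Apertium-eo-tr|eo-tr]]''<br>1,500 || || ''[[Apertium-eo-el|eo-el]]''<br>1,150 || ||
|-
| '''fin''' || || ''[[Apertium-fin-hbs|fin-hbs]]''<br>250 || || || || || || ||
|-
| '''fra''' || || || || || || || || || ''[[Apertium-fr-ro|fr-ro]]''<br>12,727
|-
| '''ina''' || || || || || || || || || ''[[Apertium-ron-ina|ron-ina]]''<br>192
|-
| '''ita''' || || || ''[[Apertium-slv-ita|slv-ita]]''<br>1,586 || || || || || || [[Apertium-ro-it|ro-it]]<br>10,093
|-
| '''kir''' || || || || || [[Apertium-tur-kir|tur-kir]]<br>7,123 || || || ||
|-
| '''pol''' || || ''[[Apertium-pol-hbs|pol-hbs]]''<br>136 || ''[[Apertium-slv-pol|slv-pol]]''<br>354 || || || || || ||
|-
| '''rus''' || || [[Apertium-hbs-rus|hbs-rus]]<br>5,008 || || ''[[Apertium-bg-ru|bg-ru]]''<br>3,292 || || || || ||
|-
| '''spa''' || || || [[Apertium-slv-spa|slv-spa]] || || || || || || '''[[Apertium-es-ro|es-ro]]'''<br>'''24,528'''
|-
| '''tat''' || || || || || [[Apertium-tur-tat|tur-tat]]<br>3,317 || || || ||
|-
| '''tuk''' || || || || || [[Apertium-tuk-tur|tuk-tur]]<br>3,387 || || || ||
|-
| '''uzb''' || || || || || [[Apertium-tur-uzb|tur-uzb]]<br>3,519 || || || ||
|}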

== Samples ==

Article 1 of the Universal Declaration of Human Rights:

''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.''

{| class="wikitable"
! Language !! Text
|-
| Macedonian || Сите човечки суштества се раѓаат слободни и еднакви по достоинство и права. Тие се обдарени со разум и совест и треба да се однесуваат еден кон друг во духот на општо човечката припадност.
|-
| Slovenian || Vsi ljudje se rodijo svobodni in imajo enako dostojanstvo in enake pravice. Obdarjeni so z razumom in vestjo in bi morali ravnati drug z drugim kakor bratje.
|-
| Bulgarian || Всички хора се раждат свободни и равни по достойнство и права. Те са надарени с разум и съвест и следва да се отнасят помежду си в дух на братство.
|-
| Turkish || Bütün insanlar hür, haysiyet ve haklar bakımından eşit doğarlar. Akıl ve vicdana sahiptirler ve birbirlerine karşı kardeşlik zihniyeti ile hareket etmelidirler.
|}

== See also ==

== External links ==
* [http://www.let.rug.nl/~coltekin/trmorph/ TRMorph]