Balkan languages

The Balkan languages are the languages spoken in the Balkans, some of which may form part of the Balkan Sprachbund. They include Bulgarian, Macedonian, Romanian, Aromanian, Albanian, Greek, Serbo-Croatian, and a number of others.

The plan is to build an independent finite-state transducer for each language, and then to write a dictionary and transfer rules for every language pair. The current status of these goals is listed below.

Status

The ultimate goal is to have reusable, multi-purpose transducers for a variety of Balkan languages. Any two of these can then be paired for X→Y translation by adding a Constraint Grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for the dictionary pairs is listed below.
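To give a sense of how much pair-specific work this plan implies, the following minimal sketch counts the pairs and translation directions among the nine languages tracked on this page. The function names and the generated pair names are hypothetical (real package names do not follow a fixed code order), and the sketch assumes that one bilingual dictionary serves both directions of a pair while transfer rules are written per direction.

```python
from itertools import combinations, permutations

# ISO 639-3 codes of the nine languages tracked on this page.
LANGS = ["mkd", "hbs", "slv", "bul", "tur", "sqi", "ell", "rup", "ron"]


def language_pairs(langs):
    """Unordered pairs; assumed to need one bilingual dictionary each."""
    return [f"{a}-{b}" for a, b in combinations(langs, 2)]


def translation_directions(langs):
    """Ordered pairs (X, Y); assumed to need their own transfer rules."""
    return list(permutations(langs, 2))


pairs = language_pairs(LANGS)
directions = translation_directions(LANGS)
print(len(pairs), "pairs,", len(directions), "directions")
```

Because the transducers are independent of any pair, adding a tenth language would add only one transducer but nine new candidate pairs, which is the main argument for keeping the monolingual data reusable.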

Transducers

Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, we consider it "working". Above 90% coverage, it can be considered "production".
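The thresholds above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's own tooling: it assumes the analyser output is in Apertium stream format, where each token is a `^surface/analyses$` unit and an unknown word is marked with a leading `*` (for example `^foo/*foo$`), and it ignores escaped characters. The function names, the sample string, and the label "below working" are hypothetical.

```python
import re

# One lexical unit: ^surface/analyses$ (simplified; escapes are not handled).
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text):
    """Percentage of lexical units that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Assumption: an analysis starting with '*' marks an unknown word.
    known = sum(1 for _surface, analyses in units
                if not analyses.startswith("*"))
    return 100.0 * known / len(units)


def state(coverage_percent):
    """Map a coverage percentage onto the thresholds described above."""
    if coverage_percent > 90.0:
        return "production"
    if coverage_percent >= 80.0:
        return "working"
    return "below working"  # hypothetical label; the page does not name one


# Made-up analyser output: five tokens, one of them unknown.
SAMPLE = ("^the/the<det>$ ^cat/cat<n>$ ^sat/sit<vblex>$ "
          "^on/on<pr>$ ^xyzzy/*xyzzy$")

print(coverage(SAMPLE), state(coverage(SAMPLE)))
```

A figure computed this way is naive token coverage: it says that a word received some analysis, not that the analysis is correct, so it is best read alongside a corpus that is large and varied enough to be representative.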

| name | Language | ISO 639 -2 | ISO 639 -3 | formalism | state | stems | paradigms | coverage | location | primary authors |
|---|---|---|---|---|---|---|---|---|---|---|
| apertium-mkd | Macedonian | mk | mkd | lttoolbox | production | 30,686 | 260 | ~90.5% | apertium-mkd (languages) | Fran, Tihomir, Petkovski |
| apertium-hbs | Serbo-Croatian | sh | hbs | lttoolbox | production | 58,004 | 1,092 | ~90.5% | apertium-hbs (languages) | Fran, Petkovski, Aleš, hrvoj |
| apertium-slv | Slovenian | sl | slv | lttoolbox | production | 20,596 | 1,435 | ~90.5% | apertium-hbs-slv (trunk); apertium-slv-pol (incubator); apertium-sl-mk (incubator) | Fran, Petkovski, Peradin, Aleš, Čabrilo, Dimitrijev |
| apertium-bul | Bulgarian | bg | bul | lttoolbox | production | 8,578 | 317 | ~80% | apertium-bul (languages) | Fran, Tihomir |
| apertium-tur | Turkish | tr | tur | HFST | development | 17,221 | - | ~87.3% | apertium-tur (languages) | Fran, Gianluca, Sezgi Aydın |
| apertium-sqi | Albanian | sq | sqi | lttoolbox | development | 3,312 | 138 | ~80.2% | apertium-sqi (languages) | Fran |
| apertium-ell | Greek | el | ell | lttoolbox | prototype | 2,460 | 951 | - | apertium-ell (languages) | Fran |
| apertium-rup | Aromanian | - | rup | lttoolbox | prototype | 312,005 | 26,193 | - | apertium-rup (incubator) | Fran, shopskasalata |
| apertium-ron | Romanian | ro | ron | lttoolbox | possibly non-existent | ? | ? | - | apertium-ron-rup (incubator) | Fran |

Balkan Language Classification

The languages that share these similarities belong to five distinct branches of the Indo-European languages[1]:

Existing language pairs

Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.

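| | mkd | hbs | slv | bul | tur | sqi | ell | rup | ron |
|---|---|---|---|---|---|---|---|---|---|
| mkd | - | hbs-mkd 12,813 | sl-mk 25,579 | mk-bg 8,783 | | mk-sq ? | | | |
| hbs | hbs-mkd 12,813 | - | hbs-slv 24,717 | | | | | | |
| slv | sl-mk 25,579 | hbs-slv 24,717 | - | | | | | | |
| bul | mk-bg 8,783 | | | - | | | bg-el 638 | | |
| tur | | | | | - | | | | |
| sqi | mk-sq ? | | | | | - | | | |
| ell | | | | bg-el 638 | | | - | | |
| rup | | | | | | | | - | ron-rup 402 |
| ron | | | | | | | | ron-rup 402 | - |
| aze | | | | | tur-aze 8,194 | | | | |
| cat | | | | | | | | | ca-ro 16,431 |
| ces | | ces-hbs 167 | cs-sl 570 | | | | | | |
| chv | | | | | cv-tr 100 | | | | |
| eng | mk-en 33,350 | hbs-eng 16,228 | sl-en 313 | bg-en 10,242 | tr-en 171 | en-sq 580 | ell-eng 830 | | |
| epo | | | | eo-bg ? | eo-tr 1,500 | | eo-el 1,150 | | |
| fin | | fin-hbs 250 | | | | | | | |
| fra | | | | | | | | | fr-ro 12,727 |
| ina | | | | | | | | | ron-ina 192 |
| ita | | | slv-ita 1,586 | | | | | | ro-it 10,093 |
| kir | | | | | tur-kir 7,123 | | | | |
| pol | | pol-hbs 136 | slv-pol 354 | | | | | | |
| rus | | hbs-rus 5,008 | | bg-ru 3,292 | | | | | |
| spa | | | slv-spa | | | | | | es-ro 24,528 |
| tat | | | | | tur-tat 3,317 | | | | |
| tuk | | | | | tuk-tur 3,387 | | | | |
| uzb | | | | | tur-uzb 3,519 | | | | |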

Samples

Article 1 of the Universal Declaration of Human Rights:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

| Language | Text |
|---|---|
| Macedonian | Сите човечки суштества се раѓаат слободни и еднакви по достоинство и права. Тие се обдарени со разум и совест и треба да се однесуваат еден кон друг во духот на општо човечката припадност. |
| Slovenian | Vsi ljudje se rodijo svobodni in imajo enako dostojanstvo in enake pravice. Obdarjeni so z razumom in vestjo in bi morali ravnati drug z drugim kakor bratje. |
| Bulgarian | Всички хора се раждат свободни и равни по достойнство и права. Те са надарени с разум и съвест и следва да се отнасят помежду си в дух на братство. |
| Turkish | Bütün insanlar hür, haysiyet ve haklar bakımından eşit doğarlar. Akıl ve vicdana sahiptirler ve birbirlerine karşı kardeşlik zihniyeti ile hareket etmelidirler. |
