Difference between revisions of "User:Sushain/BalkanLangsConvert"

From Apertium

Latest revision as of 08:56, 24 December 2013

The Balkan languages are the languages spoken in the Balkans, some of which may form part of the Balkan Sprachbund. They include Bulgarian, Macedonian, Romanian, Aromanian, Albanian, Greek, Serbo-Croatian, and a number of others.

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

Status

The ultimate goal is to have reusable, multi-purpose transducers for a variety of Balkan languages. These can then be paired for X→Y translation by adding a constraint grammar (CG) for language X and a dictionary and transfer rules for the pair X→Y. Development progress for each language's transducer and for the dictionary pairs is listed below.

Transducers

Once a transducer reaches roughly 80% coverage on a range of medium-to-large corpora, we can say it is "working". Above 90%, it can be considered "production".

{| class="wikitable"
! name !! Language !! ISO 639-1 !! ISO 639-3 !! formalism !! state !! stems !! paradigms !! coverage !! location !! primary authors
|-
| apertium-mkd || Macedonian || mk || mkd || lttoolbox || production || 30,686 || 260 || ~90.5% || apertium-mkd (languages) || Fran, Tihomir, Petkovski
|-
| apertium-hbs || Serbo-Croatian || sh || hbs || lttoolbox || working || 58,004 || 1,092 || ~90.5% || apertium-hbs (languages) || Fran
|-
| apertium-slv || Slovenian || sl || slv || lttoolbox || production || 20,596 || 1,435 || ~90.5% || apertium-hbs-slv (trunk)<br />apertium-slv-pol (incubator)<br />apertium-sl-mk (incubator) || Fran, Petkovski, Peradin, Horvat, Čabrilo, Dimitrijev
|-
| apertium-tur || Turkish || tr || tur || HFST || working || 17,221 || 1 || ~87.3% || apertium-tur (languages) || Fran, Gianluca, Sezgi Aydın
|-
| apertium-bul || Bulgarian || bg || bul || lttoolbox || production || 8,578 || 317 || ~80% || apertium-bul (languages) || Fran, Tihomir
|-
| apertium-sqi || Albanian || sq || sqi || lttoolbox || development || 3,312 || 138 || ~80.2% || apertium-sqi (languages) || Fran
|-
| apertium-ell || Greek || el || ell || lttoolbox || ? || 2,460 || 951 || - || apertium-ell (languages) || Fran
|-
| apertium-rup || Aromanian || - || rup || lttoolbox || ? || 312,005 || 26,193 || - || apertium-rup (incubator) || Fran, shopskasalata
|-
| apertium-ron || Romanian || ro || ron || lttoolbox || ? || ? || ? || - || apertium-ron-rup (incubator) || Fran
|}
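
The "working" and "production" thresholds described above can be checked mechanically. The following is a minimal sketch, not the project's own tooling. It assumes the analyser output is in the Apertium stream format, where each token appears as `^surface/analysis$` and the analysis of an unknown token begins with `*`. The function names `coverage` and `state`, and the label used for anything under 80%, are made up for illustration; escaped delimiters inside tokens are ignored for simplicity.

```python
import re

# One lexical unit in the Apertium stream format: ^surface/analysis1/analysis2$
# (simplification: escaped ^, / and $ inside a token are not handled)
LEXICAL_UNIT = re.compile(r"\^([^/$]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Return the fraction of tokens that received at least one analysis."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    # Unknown tokens are emitted as ^word/*word$, so their analysis starts with "*".
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def state(cov: float) -> str:
    """Map a coverage fraction to the states named on this page."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    # Hypothetical label: the page does not name a state for this range.
    return "below working threshold"


sample = "^the/the<det>$ ^cat/cat<n><sg>$ ^zzz/*zzz$ ^sat/sit<vblex><past>$"
print(coverage(sample), state(coverage(sample)))
```

In practice the figure would be computed over a whole corpus, and over several corpora, rather than a single sentence.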

Balkan Language Classification [1]

Existing language pairs

Balkan-Balkan pairs

Text in italic denotes language pairs under development or in the incubator. Regular text denotes a functioning language pair in staging, and text in bold denotes a stable, well-working language pair in trunk.

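{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
! !! bul !! mkd !! ron !! rup !! sqi !! ell !! hbs !! slv !! tur
|-
| '''bul''' || - || mk-bg || || || || bg-el || || ||
|-
| '''mkd''' || || - || || || mk-sq || || || ||
|-
| '''ron''' || || || - || ron-rup || || || || ||
|-
| '''rup''' || || || || - || || || || ||
|-
| '''sqi''' || || || || || - || || || ||
|-
| '''ell''' || || || || || || - || || ||
|-
| '''hbs''' || || sh-mk || || || || || - || hbs-slv ||
|-
| '''slv''' || || sl-mk || || || || || || - ||
|-
| '''tur''' || || || || || || || || || -
|}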

Pairs with non-Balkan languages

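{| style="text-align: center;" class="wikitable"
|- style="background: #ececec"
! !! bul !! mkd !! ron !! rup !! sqi !! ell !! hbs !! slv !! tur
|-
| '''el''' || ''[[bg-el]]'' || || || || || || || ||
|-
| '''ru''' || ''[[bg-ru]]'' || || || || || || ''[[hbs-rus]]'' || ||
|-
| '''en''' || ''[[bg-en]]'' || '''[[mk-en]]''' || || || ''[[en-sq]]'' || ''[[ell-eng]]'' || ''[[sh-en]]'' || || ''[[tr-en]]''
|-
| '''it''' || || || ''[[ro-it]]'' || || || || || ''[[sl-it]]'' ||
|-
| '''spa''' || || || || || || || || ''[[slv-spa]]'' ||
|-
| '''pol''' || || || || || || || || ''[[slv-pol]]'' ||
|-
| '''eo''' || ''[[eo-bg]]'' || || || || || ''[[eo-el]]'' || || ||
|-
| '''fr''' || || || ''[[fr-ro]]'' || || || || || ||
|-
| '''ca''' || || || ''[[ca-ro]]'' || || || || || ||
|-
| '''es''' || || || '''[[es-ro]]''' || || || || || ||
|-
| '''cs''' || || || || || || || || ''[[cs-sl]]'' ||
|-
| '''kir''' || || || || || || || || || ''[[tur-kir]]''
|-
| '''tat''' || || || || || || || || || ''[[tur-tat]]''
|-
| '''uzb''' || || || || || || || || || ''[[tur-uzb]]''
|-
| '''aze''' || || || || || || || || || [[tur-aze]]
|-
| '''cv''' || || || || || || || || || ''[[cv-tr]]''
|}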
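
The emphasis convention used in the pair tables (italic for pairs under development or in the incubator, regular for pairs in staging, bold for pairs in trunk) can be read back out of the wiki markup of a table cell. The sketch below is only an illustration under assumptions: it expects a cell to hold a single `[[link]]` optionally wrapped in `''` or `'''`, and the function name `pair_status` and the short status labels are hypothetical.

```python
import re

# A cell such as '''[[mk-en]]''', ''[[bg-el]]'' or [[tur-aze]]
CELL = re.compile(r"^\s*('{2,3})?\[\[([^\]]+)\]\]('{2,3})?\s*$")

# Wiki emphasis markers mapped to the statuses described in the legend.
STATUS_BY_QUOTES = {
    "": "staging",      # regular text
    "''": "incubator",  # italic: under development / in the incubator
    "'''": "trunk",     # bold
}


def pair_status(cell: str):
    """Return (pair_name, status) for a table cell, or None if it holds no pair."""
    match = CELL.match(cell)
    if match is None:
        return None
    opening = match.group(1) or ""
    closing = match.group(3) or ""
    if opening != closing:
        # Unbalanced emphasis markers: treat the cell as unparseable.
        return None
    return match.group(2), STATUS_BY_QUOTES[opening]


for cell in ["'''[[mk-en]]'''", " ''[[bg-el]]'' ", "[[tur-aze]]", " "]:
    print(repr(cell), pair_status(cell))
```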

Existing

Monolingual

{| class="wikitable"
! Language !! Module !! Paradigms !! Lemmata !! Coverage (SETimes) !! Coverage (Wikipedia)
|-
| Bulgarian || Macedonian and Bulgarian || 305 || 7873 || 88.1% || 77.15%
|-
| Macedonian || Macedonian and Bulgarian || 225 || 8094 || 92.1% ||
|-
| Romanian || Spanish and Romanian || 997 || 18719 || 89.7% || 83.62%
|-
| Aromanian || Incubator || 17 || 28 || - ||
|-
| Albanian || Incubator || 127 || 3302 || 80.2% || 65.62%
|-
| Greek || Incubator || 377 || 859 || 49.4% || 49.75%
|-
| Serbo-Croatian || Incubator || 85 || 660 || - ||
|-
| Slovenian || Incubator || 1128 || 20385 || - ||
|-
| Turkish || (external: TRMorph) || - || 37101 || ||
|}

Languages missing: Roma

Bilingual

Language pairs

See also

External links