Romance languages

The Romance languages (Wikipedia:Romance languages) include Catalan, Occitan, Asturian, Spanish, French, Galician, Portuguese, Romanian and Italian. The languages are related, with varying levels of mutual intelligibility. Many of them are already included in Apertium.

Romance languages not yet covered in Apertium include Arpitan, Friulian, Ladino, Leonese, Lombard, Mirandese, Neapolitan, Piedmontese, Romansch, Venetian and Walloon. Aromanian, Corsican and Sicilian have transducers at the prototype or development stage (see Status).

Status

The ultimate goal is to have multi-purpose transducers for a variety of Romance languages. These can then be paired for X→Y translation by adding a CG (Constraint Grammar) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and for the dictionary pairs is listed below.
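
The reuse idea in the paragraph above can be sketched in a few lines of Python. This is only an illustrative model, not Apertium code: the class `PairComponents` and the functions `components_for` and `transducer_reuse` are hypothetical names, and the strings describing the CG and the pair resources are placeholders rather than real package names. Only the `apertium-xxx` transducer names follow the table below.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class PairComponents:
    """Pieces needed for one translation direction X->Y (illustrative model only)."""
    source_transducer: str  # monolingual transducer for X, reusable across pairs
    target_transducer: str  # monolingual transducer for Y, reusable across pairs
    source_cg: str          # CG for language X (placeholder label)
    pair_resources: str     # transfer rules and dictionary for X->Y (placeholder label)


def components_for(src: str, tgt: str) -> PairComponents:
    """Return the components assumed to be needed to translate src -> tgt."""
    if src == tgt:
        raise ValueError("source and target languages must differ")
    return PairComponents(
        source_transducer=f"apertium-{src}",
        target_transducer=f"apertium-{tgt}",
        source_cg=f"CG for {src}",
        pair_resources=f"transfer rules and dictionary for {src}->{tgt}",
    )


def transducer_reuse(directions):
    """Count how many translation directions use each monolingual transducer."""
    usage = Counter()
    for src, tgt in directions:
        parts = components_for(src, tgt)
        usage[parts.source_transducer] += 1
        usage[parts.target_transducer] += 1
    return usage


if __name__ == "__main__":
    directions = [("spa", "cat"), ("spa", "por"), ("cat", "por")]
    for name, count in sorted(transducer_reuse(directions).items()):
        print(name, "is used by", count, "directions")
```

In this sketch three directions need only three monolingual transducers, because each transducer is shared; only the CG and the pair-specific resources have to be written per direction.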

Transducers

name | language | native name | ISO 639-1 | ISO 639-3 | formalism | state | stems | coverage
apertium-spa | Spanish | castellano | es | spa | lttoolbox | production | 46,003 | see Apertium-spa#Current_State
apertium-cat | Catalan | català | ca | cat | lttoolbox | production | 95,604 | see Apertium-cat#Current_State
apertium-ita | Italian | italiano | it | ita | lttoolbox | production | 25,609 | see Apertium-ita#Current_State
apertium-arg | Aragonese | aragonés | an | arg | lttoolbox | production | 26,068 | see Apertium-arg#Current_State
apertium-ast | Asturian | asturianu | - | ast | lttoolbox | production | 498 | see Apertium-ast#Current_State
apertium-oci | Occitan | occitan | oc | oci | lttoolbox | production | | see Apertium-oci#Current_State
apertium-srd | Sardinian | sardu | sc | srd | lttoolbox | production | 46,642 | see Apertium-srd#Current_State
apertium-scn | Sicilian | sicilianu | - | scn | lttoolbox | development | 25,723 | ~84.4%
apertium-fra | French | français | fr | fra | lttoolbox | production | | see Apertium-fra#Current_State
apertium-por | Portuguese | português | pt | por | lttoolbox | production | 14,796 | see Apertium-por#Current_State
apertium-glg | Galician | galego | gl | glg | lttoolbox | production | 31,916 | see Apertium-glg#Current_State
apertium-ron | Romanian | română | ro | ron | lttoolbox | production | 18,878 | see Apertium-ron#Current_State
apertium-cos | Corsican | corsu | co | cos | lttoolbox | development | 3,618 | ~85.9%
apertium-rup | Aromanian | | - | rup | lttoolbox | prototype | 312,005 | see Apertium-rup#Current_State

Annotated corpora

Table of existing pairs

Italics denote pairs in the incubator, regular text denotes pairs under development in nursery, bold denotes stable, well-working pairs in trunk, and bold italics denote pairs in staging. Each pair is shown with its number of bidix stems, as counted with dixcounter.

language | pairs (bidix stems)
arg | es-an (17,758)
ast | es-ast (57,067)
cat | cat-cos (316), es-ca (43,180), fr-ca (10,560), ca-it (9,773), oc-ca (24,877), pt-ca (8,034), ca-ro (16,431), cat-srd (2,803)
cos | cat-cos (316), cos-ita (25)
spa | es-an (17,758), es-ast (57,067), es-ca (43,180), fr-es (26,993), es-gl (28,052), es-it (12,505), oc-es (18,774), es-pt (15,520), es-ro (24,528)
fra | fr-ca (10,560), fr-es (26,993), fr-it (1), oc-fr (10,588), fra-por (18,919), fr-ro (12,727)
glg | es-gl (28,052), pt-gl (12,330)
ita | ca-it (9,773), cos-ita (25), es-it (12,505), fr-it (1), it-pt (1), ro-it (10,093), ita-srd (?)
oci | oc-ca (24,877), oc-es (18,774), oc-fr (10,588)
por | pt-ca (8,034), es-pt (15,520), fra-por (18,919), pt-gl (12,330), it-pt (1), sc-pt (?)
ron | ca-ro (16,431), es-ro (24,528), fr-ro (12,727), ro-it (10,093), ron-rup (402)
rup | ron-rup (402)
srd | cat-srd (2,803), ita-srd (?), sc-pt (?)
bre | br-es (11,760), br-fr (27,988)
ces | es-cs (387)
cym | cy-es (8,798)
deu | es-de (615)
eng | en-ca (35,873), en-es, en-fr (13,455), en-gl (30,049), en-it (21,067), en-pt (6,828)
epo | eo-ca (43,160), eo-es (48,312), eo-fr (43,077), eo-it (9,344), eo-pt (12,760)
eus | eu-es (19,307), eu-fr (7,616)
guc | guc-spa (1,077)
ina | es-ia (35), ron-ina (192)
lat | la-es (1,920), la-it (65)
mlt | mlt-spa (137)
nld | fr-nl (1,744)
quz | quz-spa (0)
qve | spa-qve (0)
slv | slv-spa, slv-ita (1,586)
sme | sme-spa (1)
ssp | es-ssp (3,397)
tet | tet-por (3,075)
zho | zho-spa (14,040)

Many of these are documented in Publications.

Samples

Article 1 of the Universal Declaration of Human Rights:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Language | Text
Italian | Tutti gli esseri umani nascono liberi ed eguali in dignità e diritti. Essi sono dotati di ragione e di coscienza e devono agire gli uni verso gli altri in spirito di fratellanza.
Venetian | Tuti i essari Umani nasse liberi e uguaƚi in teƚa dignità e diriti. I xe dotai de raxón e de cosiensa e i gà da agire cò spirito de fraternità lun l’altro.
French | Tous les êtres humains naissent libres et égaux en dignité et en droits. Ils sont doués de raison et de conscience et doivent agir les uns envers les autres dans un esprit de fraternité.
Picard | Tos lès-omes vinèt å monde lîbes èt égåls po çou qu'èst d' leû dignité èt d' leûs dreûts. Leû re°zon èt leû consyince elzî fe°t on d'vwér di s'kidûre inte di zèle come dès frès
Walloon | Tos lès-omes vinèt-st-å monde lîbes, èt so-l'minme pîd po çou qu'ènn'èst d'leu dignité èt d'leus dreûts. I n'sont nin foû rêzon èt-z-ont-i leû consyince po zèls, çou qu'èlzès deût miner a s'kidûre onk' po l'ôte tot come dès frés.
Friulian | Ducj i oms a nassin libars e compagns come dignitât e derits. A an sintiment e cussience e bisugne che si tratin un culaltri come fradis.
Romansch | Tuots umans naschan libers ed eguals in dignità e drets. Els sun dotats cun intellet e conscienza e dessan agir tanter per in uin spiert da fraternità.
Catalan-Valencian-Balear | Tots els éssers humans neixen lliures i iguals en dignitat i en drets. Són dotats de raó i de consciència, i han de comportar-se fraternalment els uns amb els altres.
Asturian | Tolos seres humanos nacen llibres y iguales en dignidá y drechos y, pola mor de la razón y la conciencia de so, han comportase hermaniblemente los unos colos otros.
Ladino | Todos los umanos nasen libres i iguales en dinyidad i derechos i, komo estan ekipados de razon i konsensia, deven komportarsen kon ermandad los unos kon los otros.
Spanish | Todos los seres humanos nacen libres e iguales en dignidad y derechos y, dotados como están de razón y conciencia, deben comportarse fraternalmente los unos con los otros.
Galician | Tódolos seres humanos nacen libres e iguais en dignidade e dereitos e, dotados como están de razón e conciencia, díbense comportar fraternalmente uns cos outros.
Portuguese | Todos os seres humanos nascem livres e iguais em dignidade e em direitos. Dotados de razão e de consciência, devem agir uns para com os outros em espírito de fraternidade.
Corsican | Nascinu tutti l’omi libari è pari di dignità è di diritti. Pussedinu a raghjoni è a cuscenza è li tocca ad agiscia trà elli di modu fraternu.
Sardinian, Logudorese | Totu sos èsseres umanos naschint lìberos e eguales in dinnidade e in deretos. Issos tenent sa resone e sa cussèntzia e depent operare s'unu cun s'àteru cun ispìritu de fraternidade.

Vulnerability

This table summarizes the vulnerability of various Romance languages. Vulnerability data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.

Language | ISO 639-3 | Location | Speakers | Ethnologue status | UNESCO status
Zarphatic | zrp | France | 0 | 10 (Extinct) | -
Shuadit | sdt | France | 0 | 10 (Extinct) | -
Emilian | egl | Italy | 0 | 9 (Dormant) | -
Romagnol | rgn | Italy | 0 | 9 (Dormant) | -
Minderico | drc | Portugal | 500 | 8b (Nearly extinct) | -
Judeo-Italian | itk | Italy | 250 | 8a (Moribund) | -
Arpitan | frp | France & Italy | 137,000 | 8a (Moribund) | 2 (Definitely endangered)
Romanian, Istro | ruo | Croatia | 560 | 7 (Shifting) | 3 (Severely endangered)
Istriot | ist | Croatia | 1,000 | 7 (Shifting) | 3 (Severely endangered)
Romanian, Megleno | ruq | Greece, Macedonia | 5,000 | 7 (Shifting) | 3 (Severely endangered)
French, Cajun | frc | United States | 25,600 | 7 (Shifting) | -
Extremaduran | ext | Spain | 201,500 | 7 (Shifting) | -
Aragonese | arg | Spain | 10,000 | 6b (Threatened) | 2 (Definitely endangered)
Ladin | lld | Italy | 20,000 | 6b (Threatened) | 2 (Definitely endangered)
Sardinian, Gallurese | sdn | Italy | 100,000 | 6b (Threatened) | 2 (Definitely endangered)
Sardinian, Sassarese | sdc | Italy | 100,000 | 6b (Threatened) | 2 (Definitely endangered)
Asturian | ast | Spain | 110,000 | 6b (Threatened) | -
Aromanian | rup | Albania, Bulgaria, Greece, Macedonia, Serbia | 123,300 | 6b (Threatened) | 2 (Definitely endangered)
Sardinian, Logudorese | src | Italy | 500,000 | 6b (Threatened) | 2 (Definitely endangered)
Walloon | wln | Belgium, France, Luxembourg | 600,000 | 6b (Threatened) | 2 (Definitely endangered)
Spanish, Loreto-Ucayali | spq | Peru | 2,800 | 6a (Vigorous) | -
Fala | fax | Spain | 10,500 | 6a (Vigorous) | -
Sardinian, Campidanese | sro | Italy | 500,000 | 6a (Vigorous) | 2 (Definitely endangered)
Corsican | cos | France, Italy | 31,000 | 5 (Developing) | 2 (Definitely endangered)
Picard | pcd | Belgium, France | 200,000 | 5 (Developing) | 3 (Severely endangered)
Friulian | fur | Italy | 300,000 | 5 (Developing) | 2 (Definitely endangered)
Ligurian | lij | France, Italy, Monaco | 505,100 | 5 (Developing) | 2 (Definitely endangered)
Piemontese | pms | Italy | 1,600,000 | 5 (Developing) | 2 (Definitely endangered)
Lombard | lmo | Italy | 3,903,000 | 5 (Developing) | 2 (Definitely endangered)
Sicilian | scn | Italy | 4,700,000 | 5 (Developing) | 1 (Vulnerable)
Napoletano-Calabrese | nap | Italy | 5,700,000 | 5 (Developing) | 1 (Vulnerable)
Romansch | roh | Switzerland | 35,139 | 4 (Educational) | 2 (Definitely endangered)
Ladino | lad | Israel & Albania, Algeria, Bosnia and Herzegovina, Bulgaria, Croatia, Greece, Macedonia, Morocco, Romania, Turkey, Serbia | 112,130 | 4 (Educational) | 3 (Severely endangered)
Occitan | oci | France, Italy | 2,048,310 | 4 (Educational) | 2 (Definitely endangered)
Venetian | vec | Croatia, Italy, Slovenia | 3,852,500 | 4 (Educational) | 1 (Vulnerable)
Mirandese | mwl | Portugal | 15,000 | 2 (Provincial) | -
Galician | glg | Spain | 3,185,000 | 2 (Provincial) | -
Catalan | cat | Spain & Italy | 7,220,420 | 2 (Provincial) | 2 (Definitely endangered)
Romanian | ron | Romania | 23,623,890 | 1 (National) | -
Italian | ita | Italy | 61,068,677 | 1 (National) | -
French | fra | France | 68,458,600 | 1 (National) | 3 (Severely endangered)
Portuguese | por | Portugal | 202,468,100 | 1 (National) | -
Spanish | spa | Spain | 405,638,110 | 1 (National) | -

Other language pairs

Pairs including a non-Romance language

Resources

Funding possibilities

Samples

See also
