Difference between revisions of "Romance languages"
Revision as of 14:06, 1 June 2016
The Romance languages (Wikipedia:Romance languages) include Catalan (ca), Occitan (oc), Asturian (ast), Spanish (es), French (fr), Galician (gl), Portuguese (pt), Romanian (ro) and Italian (it). The languages are related, with varying levels of mutual intelligibility. Many of these languages are already included in Apertium.
Romance languages that are not yet covered in Apertium include Aromanian, Arpitan, Corsican, Friulian, Ladino, Leonese, Lombard, Mirandese, Neapolitan, Piedmontese, Romansh, Sicilian, Venetian and Walloon.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Romance languages. These can then be paired for X→Y translation by adding a CG for language X and transfer rules and a dictionary for the pair X→Y. The tables below list development progress for each language's transducer and for the dictionary pairs.
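The pairing rule above can be sketched as a simple availability check. This is only an illustration: the `Resources` class, its field names and `can_translate` are hypothetical and do not correspond to any Apertium tool. The language codes in the usage lines are examples, not a statement about which resources actually exist.

```python
from dataclasses import dataclass, field


@dataclass
class Resources:
    """Hypothetical inventory of linguistic resources (illustration only)."""
    transducers: set = field(default_factory=set)  # languages with a transducer
    cgs: set = field(default_factory=set)          # languages with a CG
    pair_data: set = field(default_factory=set)    # (X, Y) with transfer rules and dictionary


def can_translate(res, src, tgt):
    """Return True if X→Y has everything the rule in the text asks for:
    transducers for both languages, a CG for the source language X,
    and transfer rules / dictionary for the direction X→Y."""
    return (
        src in res.transducers
        and tgt in res.transducers
        and src in res.cgs
        and (src, tgt) in res.pair_data
    )


# Example inventory (made up for the illustration).
res = Resources(
    transducers={"spa", "cat"},
    cgs={"spa"},
    pair_data={("spa", "cat")},
)
print(can_translate(res, "spa", "cat"))
print(can_translate(res, "cat", "spa"))
```

In this made-up inventory only spa→cat passes the check. The reverse direction fails because it lacks both a CG for cat and pair data for cat→spa.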
Transducers
name | Language | native name | ISO 639-2 | ISO 639-3 | formalism | state | stems | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|---|
apertium-spa | Spanish | castellano | es | spa | lttoolbox | production | 46,003 | see Apertium-spa#Current_State | | |
apertium-cat | Catalan | català | ca | cat | lttoolbox | production | 95,604 | see Apertium-cat#Current_State | | |
apertium-ita | Italian | italiano | it | ita | lttoolbox | production | 25,609 | see Apertium-ita#Current_State | | |
apertium-arg | Aragonese | aragonés | an | arg | lttoolbox | production | 26,068 | see Apertium-arg#Current_State | | |
apertium-ast | Asturian | asturianu | - | ast | lttoolbox | production | 498 | see Apertium-ast#Current_State | | |
apertium-oci | Occitan | occitan | oc | oci | lttoolbox | production | | see Apertium-oci#Current_State | | |
apertium-srd | Sardinian | sardu | sc | srd | lttoolbox | development | 46,642 | see Apertium-srd#Current_State | | |
apertium-scn | Sicilian | sicilianu | - | scn | lttoolbox | development | 25,723 | ~84.4% | | |
apertium-fra | French | français | fr | fra | lttoolbox | production | | see Apertium-fra#Current_State | | |
apertium-por | Portuguese | português | pt | por | lttoolbox | production | 14,796 | see Apertium-por#Current_State | | |
apertium-glg | Galician | galego | gl | glg | lttoolbox | production | 31,916 | see Apertium-glg#Current_State | | |
apertium-ron | Romanian | română | ro | ron | lttoolbox | production | 18,878 | see Apertium-ron#Current_State | | |
apertium-rup | Aromanian | | - | rup | lttoolbox | prototype | 312,005 | see Apertium-rup#Current_State | | |
apertium-cos | Corsican | corsu | co | cos | lttoolbox | prototype | 3,618 | ~85.9% | | |
Annotated corpora
Table of existing pairs
Text in italics denotes a language pair in the incubator. Regular text denotes a developing language pair in nursery, text in bold denotes a stable, well-working language pair in trunk, and text in bold italics denotes a pair in staging. Bidix stem counts, as reported by dixcounter, are shown with each pair.
| | arg | ast | cat | cos | spa | fra | glg | ita | oci | por | ron | rup | srd |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| arg | - |  |  |  | es-an 17,758 |  |  |  |  |  |  |  |  |
| ast |  | - |  |  | es-ast 57,067 |  |  |  |  |  |  |  |  |
| cat |  |  | - | cat-cos 316 | es-ca 43,180 | fr-ca 10,560 |  | ca-it 9,773 | oc-ca 24,877 | pt-ca 8,034 | ca-ro 16,431 |  | cat-srd 2,803 |
| cos |  |  | cat-cos 316 | - |  |  |  | cos-ita 25 |  |  |  |  |  |
| spa | es-an 17,758 | es-ast 57,067 | es-ca 43,180 |  | - | fr-es 26,993 | es-gl 28,052 | es-it 12,505 | oc-es 18,774 | es-pt 15,520 | es-ro 24,528 |  |  |
| fra |  |  | fr-ca 10,560 |  | fr-es 26,993 | - |  | fr-it 1 | oc-fr 10,588 | fra-por 18,919 | fr-ro 12,727 |  |  |
| glg |  |  |  |  | es-gl 28,052 |  | - |  |  | pt-gl 12,330 |  |  |  |
| ita |  |  | ca-it 9,773 | cos-ita 25 | es-it 12,505 | fr-it 1 |  | - |  | it-pt 1 | ro-it 10,093 |  | ita-srd ? |
| oci |  |  | oc-ca 24,877 |  | oc-es 18,774 | oc-fr 10,588 |  |  | - |  |  |  |  |
| por |  |  | pt-ca 8,034 |  | es-pt 15,520 | fra-por 18,919 | pt-gl 12,330 | it-pt 1 |  | - |  |  | sc-pt ? |
| ron |  |  | ca-ro 16,431 |  | es-ro 24,528 | fr-ro 12,727 |  | ro-it 10,093 |  |  | - | ron-rup 402 |  |
| rup |  |  |  |  |  |  |  |  |  |  | ron-rup 402 | - |  |
| srd |  |  | cat-srd 2,803 |  |  |  |  | ita-srd ? |  | sc-pt ? |  |  | - |
| bre |  |  |  |  | br-es 11,760 | br-fr 27,988 |  |  |  |  |  |  |  |
| ces |  |  |  |  | es-cs 387 |  |  |  |  |  |  |  |  |
| cym |  |  |  |  | cy-es 8,798 |  |  |  |  |  |  |  |  |
| deu |  |  |  |  | es-de 615 |  |  |  |  |  |  |  |  |
| eng |  |  | en-ca 35,873 |  | en-es | en-fr 13,455 | en-gl 30,049 | en-it 21,067 |  | en-pt 6,828 |  |  |  |
| epo |  |  | eo-ca 43,160 |  | eo-es 48,312 | eo-fr 43,077 |  | eo-it 9,344 |  | eo-pt 12,760 |  |  |  |
| eus |  |  |  |  | eu-es 19,307 | eu-fr 7,616 |  |  |  |  |  |  |  |
| guc |  |  |  |  | guc-spa 1,077 |  |  |  |  |  |  |  |  |
| ina |  |  |  |  | es-ia 35 |  |  |  |  |  | ron-ina 192 |  |  |
| lat |  |  |  |  | la-es 1,920 |  |  | la-it 65 |  |  |  |  |  |
| mlt |  |  |  |  | mlt-spa 137 |  |  |  |  |  |  |  |  |
| nld |  |  |  |  |  | fr-nl 1,744 |  |  |  |  |  |  |  |
| quz |  |  |  |  | quz-spa 0 |  |  |  |  |  |  |  |  |
| qve |  |  |  |  | spa-qve 0 |  |  |  |  |  |  |  |  |
| slv |  |  |  |  | slv-spa |  |  | slv-ita 1,586 |  |  |  |  |  |
| sme |  |  |  |  | sme-spa 1 |  |  |  |  |  |  |  |  |
| ssp |  |  |  |  | es-ssp 3,397 |  |  |  |  |  |  |  |  |
| tet |  |  |  |  |  |  |  |  |  | tet-por 3,075 |  |  |  |
| zho |  |  |  |  | zho-spa 14,040 |  |  |  |  |  |  |  |  |
Many of these are documented in Publications.
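The bidix stem counts in the pairs table come from dixcounter. As a rough illustration of what such a count measures, the sketch below counts `<e>` entries inside the `<section>` elements of a bilingual dictionary. It assumes the usual Apertium `.dix` XML layout and is not dixcounter's actual algorithm, so its numbers may differ from the tool's. The function name `count_bidix_entries` and the three-entry sample dictionary are made up for this example.

```python
import xml.etree.ElementTree as ET

# Tiny made-up bilingual dictionary in the usual .dix layout (illustration only).
SAMPLE_BIDIX = """<dictionary>
  <alphabet/>
  <sdefs><sdef n="n"/></sdefs>
  <section id="main" type="standard">
    <e><p><l>casa<s n="n"/></l><r>casa<s n="n"/></r></p></e>
    <e><p><l>gat<s n="n"/></l><r>gato<s n="n"/></r></p></e>
    <e r="LR"><p><l>gos<s n="n"/></l><r>perro<s n="n"/></r></p></e>
  </section>
</dictionary>"""


def count_bidix_entries(xml_text):
    """Count <e> elements that are direct children of any <section>.

    Assumption: one <e> is treated as one stem; the real dixcounter
    may apply additional rules.
    """
    root = ET.fromstring(xml_text)
    return sum(1 for _ in root.iterfind("./section/e"))


print(count_bidix_entries(SAMPLE_BIDIX))
```

For a dictionary on disk, the file contents can be read into a string and passed to the same function.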
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Italian | Tutti gli esseri umani nascono liberi ed eguali in dignità e diritti. Essi sono dotati di ragione e di coscienza e devono agire gli uni verso gli altri in spirito di fratellanza. |
Venetian | Tuti i essari Umani nasse liberi e uguaƚi in teƚa dignità e diriti. I xe dotai de raxón e de cosiensa e i gà da agire cò spirito de fraternità lun l’altro. |
French | Tous les êtres humains naissent libres et égaux en dignité et en droits. Ils sont doués de raison et de conscience et doivent agir les uns envers les autres dans un esprit de fraternité. |
Picard | Tos lès-omes vinèt å monde lîbes èt égåls po çou qu'èst d' leû dignité èt d' leûs dreûts. Leû re°zon èt leû consyince elzî fe°t on d'vwér di s'kidûre inte di zèle come dès frès |
Walloon | Tos lès-omes vinèt-st-å monde lîbes, èt so-l'minme pîd po çou qu'ènn'èst d'leu dignité èt d'leus dreûts. I n'sont nin foû rêzon èt-z-ont-i leû consyince po zèls, çou qu'èlzès deût miner a s'kidûre onk' po l'ôte tot come dès frés. |
Friulian | Ducj i oms a nassin libars e compagns come dignitât e derits. A an sintiment e cussience e bisugne che si tratin un culaltri come fradis. |
Romansch | Tuots umans naschan libers ed eguals in dignità e drets. Els sun dotats cun intellet e conscienza e dessan agir tanter per in uin spiert da fraternità. |
Catalan-Valencian-Balear | Tots els éssers humans neixen lliures i iguals en dignitat i en drets. Són dotats de raó i de consciència, i han de comportar-se fraternalment els uns amb els altres. |
Asturian | Tolos seres humanos nacen llibres y iguales en dignidá y drechos y, pola mor de la razón y la conciencia de so, han comportase hermaniblemente los unos colos otros. |
Ladino | Todos los umanos nasen libres i iguales en dinyidad i derechos i, komo estan ekipados de razon i konsensia, deven komportarsen kon ermandad los unos kon los otros. |
Spanish | Todos los seres humanos nacen libres e iguales en dignidad y derechos y, dotados como están de razón y conciencia, deben comportarse fraternalmente los unos con los otros. |
Galician | Tódolos seres humanos nacen libres e iguais en dignidade e dereitos e, dotados como están de razón e conciencia, díbense comportar fraternalmente uns cos outros. |
Portuguese | Todos os seres humanos nascem livres e iguais em dignidade e em direitos. Dotados de razão e de consciência, devem agir uns para com os outros em espírito de fraternidade. |
Corsican | Nascinu tutti l’omi libari è pari di dignità è di diritti. Pussedinu a raghjoni è a cuscenza è li tocca ad agiscia trà elli di modu fraternu. |
Sardinian, Logudorese | Totu sos èsseres umanos naschint lìberos e eguales in dinnidade e in deretos. Issos tenent sa resone e sa cussèntzia e depent operare s'unu cun s'àteru cun ispìritu de fraternidade. |
Vulnerability
This table summarizes the vulnerability of various Romance languages. Vulnerability data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.
Language | ISO 639-3 | Location | Speakers | Status (Ethnologue) | Status (UNESCO) |
---|---|---|---|---|---|
Zarphatic | zrp | France | 0 | 10 (Extinct) | - |
Shuadit | sdt | France | 0 | 10 (Extinct) | - |
Emilian | egl | Italy | 0 | 9 (Dormant) | - |
Romagnol | rgn | Italy | 0 | 9 (Dormant) | - |
Minderico | drc | Portugal | 500 | 8b (Nearly extinct) | - |
Judeo-Italian | itk | Italy | 250 | 8a (Moribund) | - |
Arpitan | frp | France & Italy | 137,000 | 8a (Moribund) | 2 (Definitely endangered) |
Romanian, Istro | ruo | Croatia | 560 | 7 (Shifting) | 3 (Severely endangered) |
Istriot | ist | Croatia | 1,000 | 7 (Shifting) | 3 (Severely endangered) |
Romanian, Megleno | ruq | Greece, Macedonia | 5,000 | 7 (Shifting) | 3 (Severely endangered) |
French, Cajun | frc | United States | 25,600 | 7 (Shifting) | - |
Extremaduran | ext | Spain | 201,500 | 7 (Shifting) | - |
Aragonese | arg | Spain | 10,000 | 6b (Threatened) | 2 (Definitely endangered) |
Ladin | lld | Italy | 20,000 | 6b (Threatened) | 2 (Definitely endangered) |
Sardinian, Gallurese | sdn | Italy | 100,000 | 6b (Threatened) | 2 (Definitely endangered) |
Sardinian, Sassarese | sdc | Italy | 100,000 | 6b (Threatened) | 2 (Definitely endangered) |
Asturian | ast | Spain | 110,000 | 6b (Threatened) | - |
Aromanian | rup | Albania, Bulgaria, Greece, Macedonia, Serbia | 123,300 | 6b (Threatened) | 2 (Definitely endangered) |
Sardinian, Logudorese | src | Italy | 500,000 | 6b (Threatened) | 2 (Definitely endangered) |
Walloon | wln | Belgium, France, Luxembourg | 600,000 | 6b (Threatened) | 2 (Definitely endangered) |
Spanish, Loreto-Ucayali | spq | Peru | 2,800 | 6a (Vigorous) | - |
Fala | fax | Spain | 10,500 | 6a (Vigorous) | - |
Sardinian, Campidanese | sro | Italy | 500,000 | 6a (Vigorous) | 2 (Definitely endangered) |
Corsican | cos | France, Italy | 31,000 | 5 (Developing) | 2 (Definitely endangered) |
Picard | pcd | Belgium, France | 200,000 | 5 (Developing) | 3 (Severely endangered) |
Friulian | fur | Italy | 300,000 | 5 (Developing) | 2 (Definitely endangered) |
Ligurian | lij | France, Italy, Monaco | 505,100 | 5 (Developing) | 2 (Definitely endangered) |
Piemontese | pms | Italy | 1,600,000 | 5 (Developing) | 2 (Definitely endangered) |
Lombard | lmo | Italy | 3,903,000 | 5 (Developing) | 2 (Definitely endangered) |
Sicilian | scn | Italy | 4,700,000 | 5 (Developing) | 1 (Vulnerable) |
Napoletano-Calabrese | nap | Italy | 5,700,000 | 5 (Developing) | 1 (Vulnerable) |
Romansch | roh | Switzerland | 35,139 | 4 (Educational) | 2 (Definitely endangered) |
Ladino | lad | Israel & Albania, Algeria, Bosnia and Herzegovina, Bulgaria, Croatia, Greece, Macedonia, Morocco, Romania, Turkey, Serbia | 112,130 | 4 (Educational) | 3 (Severely endangered) |
Occitan | oci | France, Italy | 2,048,310 | 4 (Educational) | 2 (Definitely endangered) |
Venetian | vec | Croatia, Italy, Slovenia | 3,852,500 | 4 (Educational) | 1 (Vulnerable) |
Mirandese | mwl | Portugal | 15,000 | 2 (Provincial) | - |
Galician | glg | Spain | 3,185,000 | 2 (Provincial) | - |
Catalan | cat | Spain & Italy | 7,220,420 | 2 (Provincial) | 2 (Definitely endangered) |
Romanian | ron | Romania | 23,623,890 | 1 (National) | - |
Italian | ita | Italy | 61,068,677 | 1 (National) | - |
French | fra | France | 68,458,600 | 1 (National) | 3 (Severely endangered) |
Portuguese | por | Portugal | 202,468,100 | 1 (National) | - |
Spanish | spa | Spain | 405,638,110 | 1 (National) | - |
Other language pairs
- Pairs including a non-Romance language
- English and Portuguese
- French and Esperanto
- English and Galician
- English to Catalan
- English and Spanish
- Spanish and Esperanto
- Breton and French
- Catalan and Esperanto
- Basque and Spanish