Difference between revisions of "Romance languages"
(28 intermediate revisions by 4 users not shown) | |||
Line 1: | Line 1: | ||
[[Langues romanes|En français]] |
|||
{{TOCD}} |
{{TOCD}} |
||
The '''Romance languages''' include [[Catalan]] |
The '''Romance languages''' ([[Wikipedia:Romance languages]]) include [[Catalan]], [[Occitan]], [[Asturian]], [[Spanish]] (<code>es</code>), [[French]], [[Galician]], [[Portuguese]], [[Romanian]] and [[Italian]]. The languages are related, with varying levels of mutual intelligibility. Many of them are already included in Apertium. |
||
Romance languages that are not yet covered in [[Apertium]] include [[Aromanian]], [[Arpitan]], [[Corsican]], [[Friulan]], [[Ladino]], [[Leonese]], [[Lombard]], [[Mirandese]], [[Neapolitan]], [[Piedmontese]], [[Romansh]], [[Sicilian]], [[Venetian]] and [[Walloon]]. |
|||
==Status== |
==Status== |
||
The ultimate goal is to have multi-purpose transducers for a variety of Romance languages. These can then be paired for X→Y translation by adding a CG for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below. |
|||
=== Transducers === |
|||
{| |
{| class="wikitable sortable" |
||
|- style="background: #ececec" |
|||
! !! an !! ast !! ca !! es !! fr !! gl !! it !! oc !! pt !! ro !! sc |
|||
|- |
|- |
||
!rowspan=2| name |
|||
| '''an''' || — || || || ''[[an-es]]'' || || || || || || || |
|||
!rowspan=2| Language |
|||
!rowspan=2| native name |
|||
!colspan=2 class="unsortable"| ISO 639 |
|||
!rowspan=2| formalism |
|||
!rowspan=2| state |
|||
!rowspan=2| stems |
|||
!rowspan=2| coverage |
|||
!rowspan=2| location |
|||
!rowspan=2 class="unsortable"| primary authors |
|||
|-class="sortbottom" |
|||
! -2 |
|||
! -3 |
|||
|- |
|- |
||
|| <code>[[apertium-spa]]</code> |
|||
| '''ast''' || || — || || [[es-ast]] || || || || || || || |
|||
|| [[Spanish]] |
|||
|| {{#lst:apertium-spa/stats|nativename}} |
|||
|| <code>es</code> |
|||
|| <code>spa</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-spa/stats|stems}} |
|||
|align="center"| [[Apertium-spa#Current_State|~{{:Apertium-spa/stats/average}}%]] |
|||
|| {{#lst:apertium-spa/stats|location}} |
|||
|| {{#lst:apertium-spa/stats|authors}} |
|||
|- |
|- |
||
|| <code>[[apertium-cat]]</code> |
|||
| '''ca''' || || || — || [[es-ca]] || [[ca-fr]] || || ''[[ca-it]]'' || [[oc-ca]] || [[pt-ca]] || || ''[[ca-sc]]'' |
|||
|| [[Catalan]] |
|||
|- |
|||
|| {{#lst:apertium-cat/stats|nativename}} |
|||
| '''es''' || ''[[an-es]]'' || [[es-ast]] || [[es-ca]] || — || [[fr-es]] || [[es-gl]] || ''[[es-it]]'' || [[oc-es]] || [[es-pt]] || [[es-ro]] || |
|||
|| <code>ca</code> |
|||
|| <code>cat</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-cat/stats|stems}} |
|||
|align="center"| [[Apertium-cat#Current_State|~{{:Apertium-cat/stats/average}}%]] |
|||
|| {{#lst:apertium-cat/stats|location}} |
|||
|| {{#lst:apertium-cat/stats|authors}} |
|||
|- |
|- |
||
|| <code>[[apertium-ita]]</code> |
|||
| '''fr''' || || || [[ca-fr]] || [[fr-es]] || — || || || || ''[[fr-pt]]'' || || |
|||
|| [[Italian]] |
|||
|| {{#lst:apertium-ita/stats|nativename}} |
|||
|| <code>it</code> |
|||
|| <code>ita</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-ita/stats|stems}} |
|||
|align="center"| [[Apertium-ita#Current_State|~{{:Apertium-ita/stats/average}}%]] |
|||
|| {{#lst:apertium-ita/stats|location}} |
|||
|| {{#lst:apertium-ita/stats|authors}} |
|||
|- |
|- |
||
|| <code>[[apertium-arg]]</code> |
|||
| '''gl''' || || || || [[es-gl]] || || — || || || [[pt-gl]] || || |
|||
|| [[Aragonese]] |
|||
|| {{#lst:apertium-arg/stats|nativename}} |
|||
|| <code>an</code> |
|||
|| <code>arg</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-arg/stats|stems}} |
|||
|align="center"| [[Apertium-arg#Current_State|~{{:Apertium-arg/stats/average}}%]] |
|||
|| {{#lst:apertium-arg/stats|location}} |
|||
|| {{#lst:apertium-arg/stats|authors}} |
|||
|- |
|- |
||
|| <code>[[apertium-ast]]</code> |
|||
| '''it''' || || || ''[[ca-it]]'' || ''[[es-it]]'' || || || — || || || || |
|||
|| [[Asturian]] |
|||
|| {{#lst:apertium-ast/stats|nativename}} |
|||
|| <code>-</code> |
|||
|| <code>ast</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-ast/stats|stems}} |
|||
|align="center"| [[Apertium-ast#Current_State|~{{:Apertium-ast/stats/average}}%]] |
|||
|| {{#lst:apertium-ast/stats|location}} |
|||
|| {{#lst:apertium-ast/stats|authors}} |
|||
|- |
|- |
||
|| <code>[[apertium-oci]]</code> |
|||
| '''oc''' || || || [[ca-oc]] || [[es-oc]] || || || || — || || || |
|||
|| [[Occitan]] |
|||
|| {{#lst:apertium-oci/stats|nativename}} |
|||
|| <code>oc</code> |
|||
|| <code>oci</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-oci/stats|stems}} |
|||
|align="center"| [[Apertium-oci#Current_State|~{{:Apertium-oci/stats/average}}%]] |
|||
|| {{#lst:apertium-oci/stats|location}} |
|||
|| {{#lst:apertium-oci/stats|authors}} |
|||
|- |
|- |
||
|| <code>[[apertium-srd]]</code> |
|||
| '''pt''' || || || [[pt-ca]] || [[es-pt]] || ''[[fr-pt]]'' || [[pt-gl]] || || || — || || |
|||
|| [[Sardinian]] |
|||
|| {{#lst:apertium-srd/stats|nativename}} |
|||
|| <code>sc</code> |
|||
|| <code>srd</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-srd/stats|stems}} |
|||
|align="center"| [[Apertium-srd#Current_State|~{{:Apertium-srd/stats/average}}%]] |
|||
|| {{#lst:apertium-srd/stats|location}} |
|||
|| {{#lst:apertium-srd/stats|authors}} |
|||
|- |
|- |
||
|| <code>[[apertium-scn]]</code> |
|||
| '''ro''' || || || || [[es-ro]] || || || || || || — || |
|||
|| [[Sicilian]] |
|||
|| {{#lst:apertium-scn/stats|nativename}} |
|||
|| <code>-</code> |
|||
|| <code>scn</code> |
|||
|| lttoolbox |
|||
|| development |
|||
|align="right"| {{#lst:apertium-scn/stats|stems}} |
|||
|align="center"| [[Apertium-scn#Current_State|~{{:Apertium-scn/stats/average}}%]] |
|||
|| {{#lst:apertium-scn/stats|location}} |
|||
|| {{#lst:apertium-scn/stats|authors}} |
|||
|- |
|||
|| <code>[[apertium-fra]]</code> |
|||
|| [[French]] |
|||
|| {{#lst:apertium-fra/stats|nativename}} |
|||
|| <code>fr</code> |
|||
|| <code>fra</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-fra/stats|stems}} |
|||
|align="center"| [[Apertium-fra#Current_State|~{{:Apertium-fra/stats/average}}%]] |
|||
|| {{#lst:apertium-fra/stats|location}} |
|||
|| {{#lst:apertium-fra/stats|authors}} |
|||
|- |
|||
|| <code>[[apertium-por]]</code> |
|||
|| [[Portuguese]] |
|||
|| {{#lst:apertium-por/stats|nativename}} |
|||
|| <code>pt</code> |
|||
|| <code>por</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-por/stats|stems}} |
|||
|align="center"| [[Apertium-por#Current_State|~{{:Apertium-por/stats/average}}%]] |
|||
|| {{#lst:apertium-por/stats|location}} |
|||
|| {{#lst:apertium-por/stats|authors}} |
|||
|- |
|||
|| <code>[[apertium-glg]]</code> |
|||
|| [[Galician]] |
|||
|| {{#lst:apertium-glg/stats|nativename}} |
|||
|| <code>gl</code> |
|||
|| <code>glg</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-glg/stats|stems}} |
|||
|align="center"| [[Apertium-glg#Current_State|~{{:Apertium-glg/stats/average}}%]] |
|||
|| {{#lst:apertium-glg/stats|location}} |
|||
|| {{#lst:apertium-glg/stats|authors}} |
|||
|- |
|||
|| <code>[[apertium-ron]]</code> |
|||
|| [[Romanian]] |
|||
|| {{#lst:apertium-ron/stats|nativename}} |
|||
|| <code>ro</code> |
|||
|| <code>ron</code> |
|||
|| lttoolbox |
|||
|| production |
|||
|align="right"| {{#lst:apertium-ron/stats|stems}} |
|||
|align="center"| [[Apertium-ron#Current_State|~{{:Apertium-ron/stats/average}}%]] |
|||
|| {{#lst:apertium-ron/stats|location}} |
|||
|| {{#lst:apertium-ron/stats|authors}} |
|||
|- |
|||
|| <code>[[apertium-cos]]</code> |
|||
|| [[Corsican]] |
|||
|| {{#lst:apertium-cos/stats|nativename}} |
|||
|| <code>co</code> |
|||
|| <code>cos</code> |
|||
|| lttoolbox |
|||
|| development |
|||
|align="right"| {{#lst:apertium-cos/stats|stems}} |
|||
|align="center"| [[Apertium-cos#Current_State|~{{:Apertium-cos/stats/average}}%]] |
|||
|| {{#lst:apertium-cos/stats|location}} |
|||
|| {{#lst:apertium-cos/stats|authors}} |
|||
|- |
|||
|| <code>[[apertium-rup]]</code> |
|||
|| [[Aromanian]] |
|||
|| {{#lst:apertium-rup/stats|nativename}} |
|||
|| <code>-</code> |
|||
|| <code>rup</code> |
|||
|| lttoolbox |
|||
|| prototype |
|||
|align="right"| {{#lst:apertium-rup/stats|stems}} |
|||
|align="center"| [[Apertium-rup#Current_State|~{{:Apertium-rup/stats/average}}%]] |
|||
|| {{#lst:apertium-rup/stats|location}} |
|||
|| {{#lst:apertium-rup/stats|authors}} |
|||
|- |
|- |
||
| '''sc''' || || || ''[[ca-sc]]'' || || || || || || || || — |
|||
|- |
|||
|} |
|} |
||
=== Annotated corpora === |
|||
=== Table of existing pairs === |
|||
Text in ''italics'' denotes language pairs in the incubator. Regular text denotes a developing language pair in nursery, while text in '''bold''' denotes a stable well-working language pair in trunk and text in '''''bold and italics''''' denotes a pair in staging. Bidix stems as counted with [[dixcounter]] are displayed below. |
|||
{{Romance language translations}} |
|||
Many of these are documented in [[Publications]]. |
Many of these are documented in [[Publications]]. |
||
==Samples== |
|||
==Other language pairs== |
|||
Article 1 of the Universal Declaration of Human Rights: |
|||
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.'' |
|||
* [[apertium-oc-ca|Occitan and Catalan]] |
|||
* [[apertium-oc-es|Occitan and Spanish]] |
|||
{|class=wikitable |
|||
* [[apertium-es-gl|Spanish and Galician]] |
|||
! Language !! Text |
|||
* [[apertium-fr-es|French and Spanish]] |
|||
|- |
|||
* [[apertium-pt-ca|Portuguese and Catalan]] |
|||
|| Italian || Tutti gli esseri umani nascono liberi ed eguali in dignità e diritti. Essi sono dotati di ragione e di coscienza e devono agire gli uni verso gli altri in spirito di fratellanza. |
|||
* [[apertium-pt-gl|Portuguese and Galician]] |
|||
|- |
|||
|| Venetian || Tuti i essari Umani nasse liberi e uguaƚi in teƚa dignità e diriti. I xe dotai de raxón e de cosiensa e i gà da agire cò spirito de fraternità lun l’altro. |
|||
|- |
|||
|| French || Tous les êtres humains naissent libres et égaux en dignité et en droits. Ils sont doués de raison et de conscience et doivent agir les uns envers les autres dans un esprit de fraternité. |
|||
|- |
|||
|| Picard || Tos lès-omes vinèt å monde lîbes èt égåls po çou qu'èst d' leû dignité èt d' leûs dreûts. Leû re°zon èt leû consyince elzî fe°t on d'vwér di s'kidûre inte di zèle come dès frès |
|||
|- |
|||
|| Walloon || Tos lès-omes vinèt-st-å monde lîbes, èt so-l'minme pîd po çou qu'ènn'èst d'leu dignité èt d'leus dreûts. I n'sont nin foû rêzon èt-z-ont-i leû consyince po zèls, çou qu'èlzès deût miner a s'kidûre onk' po l'ôte tot come dès frés. |
|||
|- |
|||
|| Friulian || Ducj i oms a nassin libars e compagns come dignitât e derits. A an sintiment e cussience e bisugne che si tratin un culaltri come fradis. |
|||
|- |
|||
|| Romansch || Tuots umans naschan libers ed eguals in dignità e drets. Els sun dotats cun intellet e conscienza e dessan agir tanter per in uin spiert da fraternità. |
|||
|- |
|||
|| Catalan-Valencian-Balear || Tots els éssers humans neixen lliures i iguals en dignitat i en drets. Són dotats de raó i de consciència, i han de comportar-se fraternalment els uns amb els altres. |
|||
|- |
|||
|| Asturian || Tolos seres humanos nacen llibres y iguales en dignidá y drechos y, pola mor de la razón y la conciencia de so, han comportase hermaniblemente los unos colos otros. |
|||
|- |
|||
|| Ladino || Todos los umanos nasen libres i iguales en dinyidad i derechos i, komo estan ekipados de razon i konsensia, deven komportarsen kon ermandad los unos kon los otros. |
|||
|- |
|||
|| Spanish || Todos los seres humanos nacen libres e iguales en dignidad y derechos y, dotados como están de razón y conciencia, deben comportarse fraternalmente los unos con los otros. |
|||
|- |
|||
|| Galician || Tódolos seres humanos nacen libres e iguais en dignidade e dereitos e, dotados como están de razón e conciencia, díbense comportar fraternalmente uns cos outros. |
|||
|- |
|||
|| Portuguese || Todos os seres humanos nascem livres e iguais em dignidade e em direitos. Dotados de razão e de consciência, devem agir uns para com os outros em espírito de fraternidade. |
|||
|- |
|||
|| Corsican || Nascinu tutti l’omi libari è pari di dignità è di diritti. Pussedinu a raghjoni è a cuscenza è li tocca ad agiscia trà elli di modu fraternu. |
|||
|- |
|||
|| Sardinian, Logudorese || Totu sos èsseres umanos naschint lìberos e eguales in dinnidade e in deretos. Issos tenent sa resone e sa cussèntzia e depent operare s'unu cun s'àteru cun ispìritu de fraternidade. |
|||
|} |
|||
==Vulnerability== |
|||
This table summarizes the vulnerability of various Romance languages. Vulnerability data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, [http://www.unesco.org/culture/languages-atlas http://www.unesco.org/culture/languages-atlas]’ and [http://www.ethnologue.com/ Ethnologue]. |
|||
{| class="wikitable sortable" |
|||
!rowspan=2| Language |
|||
!rowspan=2| ISO639-3 |
|||
!rowspan=2| Location |
|||
!rowspan=2| Speakers |
|||
!colspan=2|Status |
|||
|-class="sortbottom" |
|||
! Ethnologue |
|||
! UNESCO |
|||
|- |
|||
|| Zarphatic |
|||
|align="center"| <code>[http://www.ethnologue.com/language/zrp zrp]</code> |
|||
|| France |
|||
|align="right"| 0 |
|||
|| 10 (Extinct) |
|||
|| - |
|||
|- |
|||
|| Shuadit |
|||
|align="center"| <code>[http://www.ethnologue.com/language/sdt sdt]</code> |
|||
|| France |
|||
|align="right"| 0 |
|||
|| 10 (Extinct) |
|||
|| - |
|||
|- |
|||
|| Emilian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/egl egl]</code> |
|||
|| Italy |
|||
|align="right"| 0 |
|||
|| 9 (Dormant) |
|||
|| - |
|||
|- |
|||
|| Romagnol |
|||
|align="center"| <code>[http://www.ethnologue.com/language/rgn rgn]</code> |
|||
|| Italy |
|||
|align="right"| 0 |
|||
|| 9 (Dormant) |
|||
|| - |
|||
|- |
|||
|| Minderico |
|||
|align="center"| <code>[http://www.ethnologue.com/language/drc drc]</code> |
|||
|| Portugal |
|||
|align="right"| 500 |
|||
|| 8b (Nearly extinct) |
|||
|| - |
|||
|- |
|||
|| Judeo-Italian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/itk itk]</code> |
|||
|| Italy |
|||
|align="right"| 250 |
|||
|| 8a (Moribund) |
|||
|| - |
|||
|- |
|||
|| Arpitan |
|||
|align="center"| <code>[http://www.ethnologue.com/language/frp frp]</code> |
|||
|| France & Italy |
|||
|align="right"| 137,000 |
|||
|| 8a (Moribund) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Romanian, Istro |
|||
|align="center"| <code>[http://www.ethnologue.com/language/ruo ruo]</code> |
|||
|| Croatia |
|||
|align="right"| 560 |
|||
|| 7 (Shifting) |
|||
|| 3 (Severely endangered) |
|||
|- |
|||
|| Istriot |
|||
|align="center"| <code>[http://www.ethnologue.com/language/ist ist]</code> |
|||
|| Croatia |
|||
|align="right"| 1,000 |
|||
|| 7 (Shifting) |
|||
|| 3 (Severely endangered) |
|||
|- |
|||
|| Romanian, Megleno |
|||
|align="center"| <code>[http://www.ethnologue.com/language/ruq ruq]</code> |
|||
|| Greece, Macedonia |
|||
|align="right"| 5,000 |
|||
|| 7 (Shifting) |
|||
|| 3 (Severely endangered) |
|||
|- |
|||
|| French, Cajun |
|||
|align="center"| <code>[http://www.ethnologue.com/language/frc frc]</code> |
|||
|| United States |
|||
|align="right"| 25,600 |
|||
|| 7 (Shifting) |
|||
|| - |
|||
|- |
|||
|| Extremaduran |
|||
|align="center"| <code>[http://www.ethnologue.com/language/ext ext]</code> |
|||
|| Spain |
|||
|align="right"| 201,500 |
|||
|| 7 (Shifting) |
|||
|| - |
|||
|- |
|||
|| Aragonese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/arg arg]</code> |
|||
|| Spain |
|||
|align="right"| 10,000 |
|||
|| 6b (Threatened) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Ladin |
|||
|align="center"| <code>[http://www.ethnologue.com/language/lld lld]</code> |
|||
|| Italy |
|||
|align="right"| 20,000 |
|||
|| 6b (Threatened) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Sardinian, Gallurese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/sdn sdn]</code> |
|||
|| Italy |
|||
|align="right"| 100,000 |
|||
|| 6b (Threatened) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Sardinian, Sassarese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/sdc sdc]</code> |
|||
|| Italy |
|||
|align="right"| 100,000 |
|||
|| 6b (Threatened) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Asturian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/ast ast]</code> |
|||
|| Spain |
|||
|align="right"| 110,000 |
|||
|| 6b (Threatened) |
|||
|| - |
|||
|- |
|||
|| Aromanian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/rup rup]</code> |
|||
|| Albania, Bulgaria, Greece, Macedonia, Serbia |
|||
|align="right"| 123,300 |
|||
|| 6b (Threatened) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Sardinian, Logudorese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/src src]</code> |
|||
|| Italy |
|||
|align="right"| 500,000 |
|||
|| 6b (Threatened) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Walloon |
|||
|align="center"| <code>[http://www.ethnologue.com/language/wln wln]</code> |
|||
|| Belgium, France, Luxembourg |
|||
|align="right"| 600,000 |
|||
|| 6b (Threatened) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Spanish, Loreto-Ucayali |
|||
|align="center"| <code>[http://www.ethnologue.com/language/spq spq]</code> |
|||
|| Peru |
|||
|align="right"| 2,800 |
|||
|| 6a (Vigorous) |
|||
|| - |
|||
|- |
|||
|| Fala |
|||
|align="center"| <code>[http://www.ethnologue.com/language/fax fax]</code> |
|||
|| Spain |
|||
|align="right"| 10,500 |
|||
|| 6a (Vigorous) |
|||
|| - |
|||
|- |
|||
|| Sardinian, Campidanese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/sro sro]</code> |
|||
|| Italy |
|||
|align="right"| 500,000 |
|||
|| 6a (Vigorous) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Corsican |
|||
|align="center"| <code>[http://www.ethnologue.com/language/cos cos]</code> |
|||
|| France, Italy |
|||
|align="right"| 31,000 |
|||
|| 5 (Developing) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Picard |
|||
|align="center"| <code>[http://www.ethnologue.com/language/pcd pcd]</code> |
|||
|| Belgium, France |
|||
|align="right"| 200,000 |
|||
|| 5 (Developing) |
|||
|| 3 (Severely endangered) |
|||
|- |
|||
|| Friulian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/fur fur]</code> |
|||
|| Italy |
|||
|align="right"| 300,000 |
|||
|| 5 (Developing) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Ligurian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/lij lij]</code> |
|||
|| France, Italy, Monaco |
|||
|align="right"| 505,100 |
|||
|| 5 (Developing) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Piemontese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/pms pms]</code> |
|||
|| Italy |
|||
|align="right"| 1,600,000 |
|||
|| 5 (Developing) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Lombard |
|||
|align="center"| <code>[http://www.ethnologue.com/language/lmo lmo]</code> |
|||
|| Italy |
|||
|align="right"| 3,903,000 |
|||
|| 5 (Developing) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Sicilian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/scn scn]</code> |
|||
|| Italy |
|||
|align="right"| 4,700,000 |
|||
|| 5 (Developing) |
|||
|| 1 (Vulnerable) |
|||
|- |
|||
|| Napoletano-Calabrese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/nap nap]</code> |
|||
|| Italy |
|||
|align="right"| 5,700,000 |
|||
|| 5 (Developing) |
|||
|| 1 (Vulnerable) |
|||
|- |
|||
|| Romansch |
|||
|align="center"| <code>[http://www.ethnologue.com/language/roh roh]</code> |
|||
|| Switzerland |
|||
|align="right"| 35,139 |
|||
|| 4 (Educational) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Ladino |
|||
|align="center"| <code>[http://www.ethnologue.com/language/lad lad]</code> |
|||
|| Israel & Albania, Algeria, Bosnia and Herzegovina, Bulgaria, Croatia, Greece, Macedonia, Morocco, Romania, Turkey, Serbia |
|||
|align="right"| 112,130 |
|||
|| 4 (Educational) |
|||
|| 3 (Severely endangered) |
|||
|- |
|||
|| Occitan |
|||
|align="center"| <code>[http://www.ethnologue.com/language/oci oci]</code> |
|||
|| France, Italy |
|||
|align="right"| 2,048,310 |
|||
|| 4 (Educational) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Venetian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/vec vec]</code> |
|||
|| Croatia, Italy, Slovenia |
|||
|align="right"| 3,852,500 |
|||
|| 4 (Educational) |
|||
|| 1 (Vulnerable) |
|||
|- |
|||
|| Mirandese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/mwl mwl]</code> |
|||
|| Portugal |
|||
|align="right"| 15,000 |
|||
|| 2 (Provincial) |
|||
|| - |
|||
|- |
|||
|| Galician |
|||
|align="center"| <code>[http://www.ethnologue.com/language/glg glg]</code> |
|||
|| Spain |
|||
|align="right"| 3,185,000 |
|||
|| 2 (Provincial) |
|||
|| - |
|||
|- |
|||
|| Catalan |
|||
|align="center"| <code>[http://www.ethnologue.com/language/cat cat]</code> |
|||
|| Spain & Italy |
|||
|align="right"| 7,220,420 |
|||
|| 2 (Provincial) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|||
|| Romanian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/ron ron]</code> |
|||
|| Romania |
|||
|align="right"| 23,623,890 |
|||
|| 1 (National) |
|||
|| - |
|||
|- |
|||
|| Italian |
|||
|align="center"| <code>[http://www.ethnologue.com/language/ita ita]</code> |
|||
|| Italy |
|||
|align="right"| 61,068,677 |
|||
|| 1 (National) |
|||
|| - |
|||
|- |
|||
|| French |
|||
|align="center"| <code>[http://www.ethnologue.com/language/fra fra]</code> |
|||
|| France |
|||
|align="right"| 68,458,600 |
|||
|| 1 (National) |
|||
|| 3 (Severely endangered) |
|||
|- |
|||
|| Portuguese |
|||
|align="center"| <code>[http://www.ethnologue.com/language/por por]</code> |
|||
|| Portugal |
|||
|align="right"| 202,468,100 |
|||
|| 1 (National) |
|||
|| - |
|||
|- |
|||
|| Spanish |
|||
|align="center"| <code>[http://www.ethnologue.com/language/spa spa]</code> |
|||
|| Spain |
|||
|align="right"| 405,638,110 |
|||
|| 1 (National) |
|||
|| - |
|||
|} |
|||
==Other language pairs== |
|||
;Pairs including a non-Romance language |
;Pairs including a non-Romance language |
||
* [[ |
* [[English and Portuguese]] |
||
* [[ |
* [[French and Esperanto]] |
||
* [[ |
* [[English and Galician]] |
||
* [[ |
* [[English to Catalan]] |
||
* [[ |
* [[English and Spanish]] |
||
* [[ |
* [[Spanish and Esperanto]] |
||
* [[ |
* [[Breton and French]] |
||
* [[ |
* [[Catalan and Esperanto]] |
||
* [[ |
* [[Basque and Spanish]] |
||
==Resources== |
==Resources== |
||
Line 61: | Line 580: | ||
==Samples== |
==Samples== |
||
== See also == |
|||
* [[List of language pairs]] |
|||
[[Category:Languages]] |
[[Category:Languages]] |
||
[[Category:Romance languages]] |
[[Category:Romance languages]] |
||
[[Category:Documentation in English]] |
Latest revision as of 18:25, 18 September 2016
The Romance languages (Wikipedia:Romance languages) include Catalan, Occitan, Asturian, Spanish (es), French, Galician, Portuguese, Romanian and Italian. The languages are related, with varying levels of mutual intelligibility. Many of them are already included in Apertium.
Romance languages that are not yet covered in Apertium include Aromanian, Arpitan, Corsican, Friulan, Ladino, Leonese, Lombard, Mirandese, Neapolitan, Piedmontese, Romansh, Sicilian, Venetian and Walloon.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Romance languages. These can then be paired for X→Y translation by adding a CG for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below.
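The pairing described above can be sketched as a small checklist. This is only an illustration: the class and function names are hypothetical, the strings are descriptive labels rather than real module or file names, and CG is assumed to stand for a constraint grammar.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PairRequirements:
    """Components needed for X -> Y translation, as listed in the Status text."""
    source_transducer: str      # monolingual transducer for X
    target_transducer: str      # monolingual transducer for Y
    source_cg: str              # CG (assumed: constraint grammar) for X
    transfer_rules: str         # transfer rules for the pair X -> Y
    bilingual_dictionary: str   # dictionary for the pair X -> Y


def requirements_for_pair(source: str, target: str) -> PairRequirements:
    """Return descriptive labels (not real file names) for a language pair."""
    return PairRequirements(
        source_transducer=f"transducer for {source}",
        target_transducer=f"transducer for {target}",
        source_cg=f"CG for {source}",
        transfer_rules=f"transfer rules for {source}->{target}",
        bilingual_dictionary=f"dictionary for {source}->{target}",
    )


if __name__ == "__main__":
    for label in requirements_for_pair("spa", "cat").__dict__.values():
        print(label)
```

The two transducers are reusable across pairs, while the CG, transfer rules and dictionary are the per-pair additions the paragraph mentions.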
Transducers
name | Language | native name | ISO 639 -2 | ISO 639 -3 | formalism | state | stems | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|---|
apertium-spa | Spanish | castellano | es | spa | lttoolbox | production | 46,003 | | | |
apertium-cat | Catalan | català | ca | cat | lttoolbox | production | 95,604 | | | |
apertium-ita | Italian | italiano | it | ita | lttoolbox | production | 25,609 | | | |
apertium-arg | Aragonese | aragonés | an | arg | lttoolbox | production | 26,068 | | | |
apertium-ast | Asturian | asturianu | - | ast | lttoolbox | production | 498 | | | |
apertium-oci | Occitan | occitan | oc | oci | lttoolbox | production | | | | |
apertium-srd | Sardinian | sardu | sc | srd | lttoolbox | production | 46,642 | | | |
apertium-scn | Sicilian | sicilianu | - | scn | lttoolbox | development | 25,723 | ~84.4% | | |
apertium-fra | French | français | fr | fra | lttoolbox | production | | | | |
apertium-por | Portuguese | português | pt | por | lttoolbox | production | 14,796 | | | |
apertium-glg | Galician | galego | gl | glg | lttoolbox | production | 31,916 | | | |
apertium-ron | Romanian | română | ro | ron | lttoolbox | production | 18,878 | | | |
apertium-cos | Corsican | corsu | co | cos | lttoolbox | development | 3,618 | ~85.9% | | |
apertium-rup | Aromanian | | - | rup | lttoolbox | prototype | 312,005 | | | |
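Pair names on this page mix two-letter and three-letter codes (for example es-ca and cat-cos). As a minimal sketch, the code columns of the table above can be used to normalise a pair name to three-letter codes. The function name is hypothetical, and the mapping covers only the Romance languages for which the table lists both codes.

```python
# Two-letter -> three-letter codes, copied from the transducer table.
# Asturian, Sicilian and Aromanian have no two-letter code listed ("-").
TWO_TO_THREE = {
    "es": "spa", "ca": "cat", "it": "ita", "an": "arg",
    "oc": "oci", "sc": "srd", "fr": "fra", "pt": "por",
    "gl": "glg", "ro": "ron", "co": "cos",
}


def normalize_pair(pair_name: str) -> tuple:
    """Turn a pair name such as 'es-ca' into three-letter codes ('spa', 'cat').

    Three-letter codes are passed through unchanged; an unknown two-letter
    code raises KeyError rather than being guessed.
    """
    left, right = pair_name.split("-")

    def to_three(code: str) -> str:
        return code if len(code) == 3 else TWO_TO_THREE[code]

    return to_three(left), to_three(right)


if __name__ == "__main__":
    for name in ("es-ca", "cat-cos", "fra-por", "oc-fr"):
        print(name, "->", normalize_pair(name))
```

Codes for non-Romance languages (such as en or eo) are deliberately left out, since the table does not list them.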
Annotated corpora
Table of existing pairs
Text in italics denotes language pairs in the incubator. Regular text denotes a developing language pair in nursery, while text in bold denotes a stable well-working language pair in trunk and text in bold and italics denotes a pair in staging. Bidix stems as counted with dixcounter are displayed below.
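In wikitext, italics are written with two apostrophes, bold with three, and bold italics with five, so the convention above can be read mechanically from a table cell. The following is a minimal sketch of that reading; the function name is hypothetical, and it assumes each cell is wrapped in at most one such markup pair.

```python
def pair_status(cell: str) -> str:
    """Map the quote markup around a wikitext table cell to a pair status."""
    text = cell.strip()

    def wrapped(quotes: str) -> bool:
        # The cell must open and close with the same run of apostrophes
        # and contain something in between.
        return (
            len(text) > 2 * len(quotes)
            and text.startswith(quotes)
            and text.endswith(quotes)
        )

    # Check the longest markup first: five quotes also start with three and two.
    if wrapped("'''''"):
        return "staging"      # bold and italics
    if wrapped("'''"):
        return "trunk"        # bold
    if wrapped("''"):
        return "incubator"    # italics
    return "nursery"          # regular text


if __name__ == "__main__":
    for cell in ("''[[ca-it]]''", "[[es-ca]]", "'''[[es-pt]]'''", "'''''[[fr-es]]'''''"):
        print(cell, "->", pair_status(cell))
```

The example cells are illustrative inputs only and do not assert the actual status of those pairs.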
| arg | ast | cat | cos | spa | fra | glg | ita | oci | por | ron | rup | srd |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
arg | - | | | | es-an 17,758 | | | | | | | | |
ast | | - | | | es-ast 57,067 | | | | | | | | |
cat | | | - | cat-cos 316 | es-ca 43,180 | fr-ca 10,560 | | ca-it 9,773 | oc-ca 24,877 | pt-ca 8,034 | ca-ro 16,431 | | cat-srd 2,803 |
cos | | | cat-cos 316 | - | | | | cos-ita 25 | | | | | |
spa | es-an 17,758 | es-ast 57,067 | es-ca 43,180 | | - | fr-es 26,993 | es-gl 28,052 | es-it 12,505 | oc-es 18,774 | es-pt 15,520 | es-ro 24,528 | | |
fra | | | fr-ca 10,560 | | fr-es 26,993 | - | | fr-it 1 | oc-fr 10,588 | fra-por 18,919 | fr-ro 12,727 | | |
glg | | | | | es-gl 28,052 | | - | | | pt-gl 12,330 | | | |
ita | | | ca-it 9,773 | cos-ita 25 | es-it 12,505 | fr-it 1 | | - | | it-pt 1 | ro-it 10,093 | | ita-srd ? |
oci | | | oc-ca 24,877 | | oc-es 18,774 | oc-fr 10,588 | | | - | | | | |
por | | | pt-ca 8,034 | | es-pt 15,520 | fra-por 18,919 | pt-gl 12,330 | it-pt 1 | | - | | | sc-pt ? |
ron | | | ca-ro 16,431 | | es-ro 24,528 | fr-ro 12,727 | | ro-it 10,093 | | | - | ron-rup 402 | |
rup | | | | | | | | | | | ron-rup 402 | - | |
srd | | | cat-srd 2,803 | | | | | ita-srd ? | | sc-pt ? | | | - |
bre | | | | | br-es 11,760 | br-fr 27,988 | | | | | | | |
ces | | | | | es-cs 387 | | | | | | | | |
cym | | | | | cy-es 8,798 | | | | | | | | |
deu | | | | | es-de 615 | | | | | | | | |
eng | | | en-ca 35,873 | | en-es | en-fr 13,455 | en-gl 30,049 | en-it 21,067 | | en-pt 6,828 | | | |
epo | | | eo-ca 43,160 | | eo-es 48,312 | eo-fr 43,077 | | eo-it 9,344 | | eo-pt 12,760 | | | |
eus | | | | | eu-es 19,307 | eu-fr 7,616 | | | | | | | |
guc | | | | | guc-spa 1,077 | | | | | | | | |
ina | | | | | es-ia 35 | | | | | | ron-ina 192 | | |
lat | | | | | la-es 1,920 | | | la-it 65 | | | | | |
mlt | | | | | mlt-spa 137 | | | | | | | | |
nld | | | | | | fr-nl 1,744 | | | | | | | |
quz | | | | | quz-spa 0 | | | | | | | | |
qve | | | | | spa-qve 0 | | | | | | | | |
slv | | | | | slv-spa | | | slv-ita 1,586 | | | | | |
sme | | | | | sme-spa 1 | | | | | | | | |
ssp | | | | | es-ssp 3,397 | | | | | | | | |
tet | | | | | | | | | | tet-por 3,075 | | | |
zho | | | | | zho-spa 14,040 | | | | | | | | |
Many of these are documented in Publications.
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Italian | Tutti gli esseri umani nascono liberi ed eguali in dignità e diritti. Essi sono dotati di ragione e di coscienza e devono agire gli uni verso gli altri in spirito di fratellanza. |
Venetian | Tuti i essari Umani nasse liberi e uguaƚi in teƚa dignità e diriti. I xe dotai de raxón e de cosiensa e i gà da agire cò spirito de fraternità lun l’altro. |
French | Tous les êtres humains naissent libres et égaux en dignité et en droits. Ils sont doués de raison et de conscience et doivent agir les uns envers les autres dans un esprit de fraternité. |
Picard | Tos lès-omes vinèt å monde lîbes èt égåls po çou qu'èst d' leû dignité èt d' leûs dreûts. Leû re°zon èt leû consyince elzî fe°t on d'vwér di s'kidûre inte di zèle come dès frès |
Walloon | Tos lès-omes vinèt-st-å monde lîbes, èt so-l'minme pîd po çou qu'ènn'èst d'leu dignité èt d'leus dreûts. I n'sont nin foû rêzon èt-z-ont-i leû consyince po zèls, çou qu'èlzès deût miner a s'kidûre onk' po l'ôte tot come dès frés. |
Friulian | Ducj i oms a nassin libars e compagns come dignitât e derits. A an sintiment e cussience e bisugne che si tratin un culaltri come fradis. |
Romansch | Tuots umans naschan libers ed eguals in dignità e drets. Els sun dotats cun intellet e conscienza e dessan agir tanter per in uin spiert da fraternità. |
Catalan-Valencian-Balear | Tots els éssers humans neixen lliures i iguals en dignitat i en drets. Són dotats de raó i de consciència, i han de comportar-se fraternalment els uns amb els altres. |
Asturian | Tolos seres humanos nacen llibres y iguales en dignidá y drechos y, pola mor de la razón y la conciencia de so, han comportase hermaniblemente los unos colos otros. |
Ladino | Todos los umanos nasen libres i iguales en dinyidad i derechos i, komo estan ekipados de razon i konsensia, deven komportarsen kon ermandad los unos kon los otros. |
Spanish | Todos los seres humanos nacen libres e iguales en dignidad y derechos y, dotados como están de razón y conciencia, deben comportarse fraternalmente los unos con los otros. |
Galician | Tódolos seres humanos nacen libres e iguais en dignidade e dereitos e, dotados como están de razón e conciencia, díbense comportar fraternalmente uns cos outros. |
Portuguese | Todos os seres humanos nascem livres e iguais em dignidade e em direitos. Dotados de razão e de consciência, devem agir uns para com os outros em espírito de fraternidade. |
Corsican | Nascinu tutti l’omi libari è pari di dignità è di diritti. Pussedinu a raghjoni è a cuscenza è li tocca ad agiscia trà elli di modu fraternu. |
Sardinian, Logudorese | Totu sos èsseres umanos naschint lìberos e eguales in dinnidade e in deretos. Issos tenent sa resone e sa cussèntzia e depent operare s'unu cun s'àteru cun ispìritu de fraternidade. |
Vulnerability
This table summarizes the vulnerability of various Romance languages. Vulnerability data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.
Language | ISO639-3 | Location | Speakers | Ethnologue status | UNESCO status |
---|---|---|---|---|---|
Zarphatic | zrp | France | 0 | 10 (Extinct) | - |
Shuadit | sdt | France | 0 | 10 (Extinct) | - |
Emilian | egl | Italy | 0 | 9 (Dormant) | - |
Romagnol | rgn | Italy | 0 | 9 (Dormant) | - |
Minderico | drc | Portugal | 500 | 8b (Nearly extinct) | - |
Judeo-Italian | itk | Italy | 250 | 8a (Moribund) | - |
Arpitan | frp | France & Italy | 137,000 | 8a (Moribund) | 2 (Definitely endangered) |
Romanian, Istro | ruo | Croatia | 560 | 7 (Shifting) | 3 (Severely endangered) |
Istriot | ist | Croatia | 1,000 | 7 (Shifting) | 3 (Severely endangered) |
Romanian, Megleno | ruq | Greece, Macedonia | 5,000 | 7 (Shifting) | 3 (Severely endangered) |
French, Cajun | frc | United States | 25,600 | 7 (Shifting) | - |
Extremaduran | ext | Spain | 201,500 | 7 (Shifting) | - |
Aragonese | arg | Spain | 10,000 | 6b (Threatened) | 2 (Definitely endangered) |
Ladin | lld | Italy | 20,000 | 6b (Threatened) | 2 (Definitely endangered) |
Sardinian, Gallurese | sdn | Italy | 100,000 | 6b (Threatened) | 2 (Definitely endangered) |
Sardinian, Sassarese | sdc | Italy | 100,000 | 6b (Threatened) | 2 (Definitely endangered) |
Asturian | ast | Spain | 110,000 | 6b (Threatened) | - |
Aromanian | rup | Albania, Bulgaria, Greece, Macedonia, Serbia | 123,300 | 6b (Threatened) | 2 (Definitely endangered) |
Sardinian, Logudorese | src | Italy | 500,000 | 6b (Threatened) | 2 (Definitely endangered) |
Walloon | wln | Belgium, France, Luxembourg | 600,000 | 6b (Threatened) | 2 (Definitely endangered) |
Spanish, Loreto-Ucayali | spq | Peru | 2,800 | 6a (Vigorous) | - |
Fala | fax | Spain | 10,500 | 6a (Vigorous) | - |
Sardinian, Campidanese | sro | Italy | 500,000 | 6a (Vigorous) | 2 (Definitely endangered) |
Corsican | cos | France, Italy | 31,000 | 5 (Developing) | 2 (Definitely endangered) |
Picard | pcd | Belgium, France | 200,000 | 5 (Developing) | 3 (Severely endangered) |
Friulian | fur | Italy | 300,000 | 5 (Developing) | 2 (Definitely endangered) |
Ligurian | lij | France, Italy, Monaco | 505,100 | 5 (Developing) | 2 (Definitely endangered) |
Piemontese | pms | Italy | 1,600,000 | 5 (Developing) | 2 (Definitely endangered) |
Lombard | lmo | Italy | 3,903,000 | 5 (Developing) | 2 (Definitely endangered) |
Sicilian | scn | Italy | 4,700,000 | 5 (Developing) | 1 (Vulnerable) |
Napoletano-Calabrese | nap | Italy | 5,700,000 | 5 (Developing) | 1 (Vulnerable) |
Romansch | roh | Switzerland | 35,139 | 4 (Educational) | 2 (Definitely endangered) |
Ladino | lad | Israel & Albania, Algeria, Bosnia and Herzegovina, Bulgaria, Croatia, Greece, Macedonia, Morocco, Romania, Turkey, Serbia | 112,130 | 4 (Educational) | 3 (Severely endangered) |
Occitan | oci | France, Italy | 2,048,310 | 4 (Educational) | 2 (Definitely endangered) |
Venetian | vec | Croatia, Italy, Slovenia | 3,852,500 | 4 (Educational) | 1 (Vulnerable) |
Mirandese | mwl | Portugal | 15,000 | 2 (Provincial) | - |
Galician | glg | Spain | 3,185,000 | 2 (Provincial) | - |
Catalan | cat | Spain & Italy | 7,220,420 | 2 (Provincial) | 2 (Definitely endangered) |
Romanian | ron | Romania | 23,623,890 | 1 (National) | - |
Italian | ita | Italy | 61,068,677 | 1 (National) | - |
French | fra | France | 68,458,600 | 1 (National) | 3 (Severely endangered) |
Portuguese | por | Portugal | 202,468,100 | 1 (National) | - |
Spanish | spa | Spain | 405,638,110 | 1 (National) | - |
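The Ethnologue status cells combine a numeric level, an optional sub-level letter and a label (for example 6b (Threatened)), which does not sort correctly as plain text because 10 would come before 2. As a minimal sketch, assuming every Ethnologue cell follows that pattern, the cells can be parsed into sortable tuples. The class and function names are hypothetical.

```python
import re
from typing import NamedTuple


class Status(NamedTuple):
    level: int      # numeric level, e.g. 6
    sublevel: str   # "a", "b" or "" when absent
    label: str      # e.g. "Threatened"


_STATUS = re.compile(r"^(\d+)([ab]?)\s*\((.+)\)$")


def parse_status(cell: str) -> Status:
    """Parse a cell such as '6b (Threatened)' into (6, 'b', 'Threatened')."""
    match = _STATUS.match(cell.strip())
    if match is None:
        raise ValueError(f"unrecognised status cell: {cell!r}")
    return Status(int(match.group(1)), match.group(2), match.group(3))


if __name__ == "__main__":
    cells = ["10 (Extinct)", "6b (Threatened)", "6a (Vigorous)", "1 (National)"]
    # Sorting by the parsed tuple orders 1 < 6a < 6b < 10.
    for cell in sorted(cells, key=parse_status):
        print(cell)
```

Cells in the UNESCO column that contain only a dash do not match the pattern and raise ValueError, so a caller would need to handle them separately.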
Other language pairs
- Pairs including a non-Romance language
- English and Portuguese
- French and Esperanto
- English and Galician
- English to Catalan
- English and Spanish
- Spanish and Esperanto
- Breton and French
- Catalan and Esperanto
- Basque and Spanish