Difference between revisions of "Celtic languages"
(43 intermediate revisions by 5 users not shown) | |||
Line 1: | Line 1: | ||
[[Langues celtiques|En français]] |
|||
{{TOCD}} |
{{TOCD}} |
||
The '''Celtic languages''' include Welsh (<code>cy</code>), Breton (<code>br</code>), Cornish (<code>kw</code>), Irish (<code>ga</code>) and Scottish Gaelic (<code>gd</code>). |
The '''Celtic languages''' (<code>[http://www.ethnologue.com/subgroups/celtic cel]</code>) include [[Welsh]] (<code>cy</code>), [[Breton]] (<code>br</code>), [[Cornish]] (<code>kw</code>), [[Irish]] (<code>ga</code>), [[Manx]] (<code>gv</code>), and [[Scottish Gaelic]] (<code>gd</code>). Most commonly spoken on the north-western edge of Europe, the languages are related with varying levels of mutual intelligibility. |
||
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below. |
|||
==Existing== |
|||
==Status== |
|||
;Dictionaries |
|||
The ultimate goal is to have multi-purposable transducers for a variety of Celtic languages. These can then be paired for X→Y translation with the addition of a [[Constraint Grammar|CG]] for language X and transfer rules / dictionary for the pair X→Y. Below is listed development progress for each language's transducers and dictionary pairs. |
|||
{{see-also|List of dictionaries}} |
|||
===Transducers=== |
|||
Coverage<ref>Note: This is a naïve estimation (words for which an analysis is given / all words), "hidden unknown words" are not taken into account</ref> is calculated over a general corpus of the language, followed by Wikipedia in parentheses. |
|||
Once a transducer has ~80% coverage on a range of medium-large corpora we can say it is "working". Over 90% and it can be considered to be "production". |
|||
{|class="wikitable sortable" |
{| class="wikitable sortable" |
||
! Language !! File !! Paradigms !! Lemmata !! Coverage |
|||
|- |
|- |
||
!rowspan=2| name |
|||
| Welsh || [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-cy-en/apertium-cy-en.cy.dix.xml apertium-cy-en.cy.dix] || 689 || 10,058 || 95.8% (86.6%) |
|||
!rowspan=2| language |
|||
!rowspan=2| native name |
|||
!rowspan=2| grouping |
|||
!colspan=2 class="unsortable"| ISO 639 |
|||
!rowspan=2| formalism |
|||
!rowspan=2| state |
|||
!rowspan=2| stems |
|||
!rowspan=2| paradigms |
|||
!rowspan=2| coverage |
|||
!rowspan=2| location |
|||
!rowspan=2 class="unsortable"| primary authors |
|||
|-class="sortbottom" |
|||
! -2 |
|||
! -3 |
|||
|- |
|- |
||
| <code>[[apertium-cym]]</code> |
|||
| Breton || [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-br-fr.br.dix apertium-br-fr.br.dix] || 268 || 7,530 || 86.0% (79.6%) |
|||
|| [[Welsh]] |
|||
|| Cymraeg |
|||
|| [[Brythonic]] |
|||
||<code>cy</code> |
|||
|| <code>cym</code> |
|||
|| [[lttoolbox]] |
|||
|| production |
|||
|align="right"| {{#lst:Apertium-cy-en/stats|cy_stems}} |
|||
|align="right"| {{#lst:Apertium-cy-en/stats|cy_paradigms}} |
|||
|align="center"| ~{{:Apertium-cy-en/stats/average}}% |
|||
|| [[apertium-cy-en]] ([[trunk]]) |
|||
|| [[User:Francis_Tyers|Fran]], [[User:Jimregan|Jim]], donnek |
|||
|- |
|- |
||
| <code>[[apertium-bre]]</code> |
|||
| Cornish || [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/incubator/apertium-cy-kw.kw.dix apertium-cy-kw.kw.dix] || 133 || 322 || - |
|||
|| [[Breton]] |
|||
|| Brezhoneg |
|||
|| [[Brythonic]] |
|||
||<code>br</code> |
|||
|| <code>bre</code> |
|||
|| [[lttoolbox]] |
|||
|| working |
|||
|align="right"| {{#lst:Apertium-br-fr/stats|br_stems}} |
|||
|align="right"| {{#lst:Apertium-br-fr/stats|br_paradigms}} |
|||
|align="center"| ~{{:Apertium-br-fr/stats/average}}% |
|||
|| [[apertium-br-fr]] ([[trunk]]) |
|||
|| [[User:Francis_Tyers|Fran]], fulupjakez, guillaumebzh, drevalan |
|||
|- |
|- |
||
| <code>[[apertium-gle]]</code> |
|||
| Irish || [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-ga-gd/apertium-ga-gd.ga.dix apertium-ga-gd.ga.dix] || 1,425 || 9,083 || - |
|||
|| [[Irish]] |
|||
|| {{#lst:apertium-gle/stats|nativename}} |
|||
|| [[Goidelic]] |
|||
||<code>ga</code> |
|||
|| <code>gle</code> |
|||
|| [[lttoolbox]] |
|||
|| development |
|||
|align="right"| {{#lst:Apertium-gle/stats|stems}} |
|||
|align="right"| {{#lst:Apertium-gle/stats|paradigms}} |
|||
|align="center"| |
|||
|| {{#lst:apertium-gle/stats|location}} |
|||
|| {{#lst:apertium-gle/stats|authors}} |
|||
|- |
|- |
||
| <code>[[apertium-glv]]</code> |
|||
| Scottish Gaelic || [https://apertium.svn.sourceforge.net/svnroot/apertium/trunk/apertium-ga-gd/apertium-ga-gd.gd.dix apertium-ga-gd.gd.dix] || 1,917 || 8,315 || - |
|||
|| [[Manx]] |
|||
|| Gaelg |
|||
|| [[Goidelic]] |
|||
||<code>gv</code> |
|||
|| <code>glv</code> |
|||
|| [[lttoolbox]] |
|||
|| development |
|||
|align="right"| {{#lst:Apertium-ga-gv/stats|gv_stems}} |
|||
|align="right"| {{#lst:Apertium-ga-gv/stats|gv_paradigms}} |
|||
|align="center"| |
|||
|| [[apertium-ga-gv]] ([[incubator]]) |
|||
|| [[User:Francis_Tyers|Fran]], [[User:Jimregan|Jim]], cos, skburke |
|||
|- |
|- |
||
| <code>[[apertium-gla]]</code> |
|||
|| [[Scottish Gaelic]] |
|||
|| Gàidhlig |
|||
|| [[Goidelic]] |
|||
||<code>gd</code> |
|||
|| <code>gla</code> |
|||
|| [[lttoolbox]] |
|||
|| prototype |
|||
|align="right"| {{#lst:Apertium-gla/stats|stems}} |
|||
|align="right"| {{#lst:Apertium-gla/stats|paradigms}} |
|||
|align="center"| |
|||
|| [[apertium-gla]] ([[languages]]) |
|||
|| [[User:Francis_Tyers|Fran]], [[User:Jimregan|Jim]], fulupjakez, jg18, skburke |
|||
|- |
|||
| <code>[[apertium-cor]]</code> |
|||
|| [[Cornish]] |
|||
|| Kernewek |
|||
|| [[Brythonic]] |
|||
||<code>kw</code> |
|||
|| <code>cor</code> |
|||
|| [[lttoolbox]] |
|||
|| prototype |
|||
|align="right"| {{#lst:Apertium-cy-kw/stats|kw_stems}} |
|||
|align="right"| {{#lst:Apertium-cy-kw/stats|kw_paradigms}} |
|||
|align="center"| |
|||
|| [[apertium-cy-kw]] ([[incubator]]) |
|||
|| [[User:Francis_Tyers|Fran]], [[User:Jimregan|Jim]] |
|||
|} |
|||
== Existing language pairs == |
|||
Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk. |
|||
{| style="text-align: center;" class="wikitable dixtable" |
|||
|- style="background: #ececec" |
|||
! !! cym !! bre !! cor !! gle !! gla !! glv |
|||
|- |
|||
| '''cym''' || - || ''[[Apertium-br-cy|br-cy]]''<br>{{#lst:Apertium-br-cy/stats|br-cy_stems}} || || ''[[Apertium-cy-ga|cy-ga]]''<br>{{#lst:Apertium-cy-ga/stats|cy-ga_stems}} || || |
|||
|- |
|||
| '''bre''' || ''[[Apertium-br-cy|br-cy]]''<br>{{#lst:Apertium-br-cy/stats|br-cy_stems}} || - || || || || |
|||
|- |
|||
| '''cor''' || || || - || || || |
|||
|- |
|||
| '''gle''' || ''[[Apertium-cy-ga|cy-ga]]''<br>{{#lst:Apertium-cy-ga/stats|cy-ga_stems}} || || || - || [[Apertium-ga-gd|ga-gd]]<br>{{#lst:Apertium-ga-gd/stats|ga-gd_stems}} || ''[[Apertium-ga-gv|ga-gv]]''<br>{{#lst:Apertium-ga-gv/stats|ga-gv_stems}} |
|||
|- |
|||
| '''gla''' || || || || [[Apertium-ga-gd|ga-gd]]<br>{{#lst:Apertium-ga-gd/stats|ga-gd_stems}} || - || |
|||
|- |
|||
| '''glv''' || || || || ''[[Apertium-ga-gv|ga-gv]]''<br>{{#lst:Apertium-ga-gv/stats|ga-gv_stems}} || || - |
|||
|- |
|||
| || || || || || || |
|||
|- |
|||
| '''eng''' || '''[[Apertium-cy-en|cy-en]]'''<br>'''{{#lst:Apertium-cy-en/stats|cy-en_stems}}''' || || || ''[[Apertium-gle-eng|gle-eng]]''<br>{{#lst:Apertium-gle-eng/stats|gle-eng_stems}} || ''[[Apertium-en-gd|en-gd]]''<br>{{#lst:Apertium-en-gd/stats|en-gd_stems}} || ''[[Apertium-en-gv|en-gv]]''<br>{{#lst:Apertium-en-gv/stats|en-gv_stems}} |
|||
|- |
|||
| '''epo''' || || ''[[Apertium-eo-br|eo-br]]''<br>{{#lst:Apertium-eo-br/stats|eo-br_stems}} || || || || |
|||
|- |
|||
| '''fin''' || || || || ''[[Apertium-fin-gle|fin-gle]]''<br>{{#lst:Apertium-fin-gle/stats|fin-gle_stems}} || || |
|||
|- |
|||
| '''fra''' || || '''[[Apertium-br-fr|br-fr]]'''<br>'''{{#lst:Apertium-br-fr/stats|br-fr_stems}}''' || || || || |
|||
|- |
|||
| '''pol''' || || || || ''[[Apertium-pl-ga|pl-ga]]''<br>{{#lst:Apertium-pl-ga/stats|pl-ga_stems}} || || |
|||
|- |
|||
| '''spa''' || ''[[Apertium-cy-es|cy-es]]''<br>{{#lst:Apertium-cy-es/stats|cy-es_stems}} || ''[[Apertium-br-es|br-es]]''<br>{{#lst:Apertium-br-es/stats|br-es_stems}} || || || || |
|||
|} |
|} |
||
==Samples== |
==Samples== |
||
Article 1 of the Universal Declaration of Human Rights: |
|||
''All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.'' |
|||
{|class=wikitable |
{|class=wikitable |
||
! Language !! Text |
! Language !! Text |
||
|- |
|- |
||
|| Irish || Saolaítear na daoine uile saor agus comhionann ina ndínit agus ina gcearta. Tá bua an réasúin agus an choinsiasa acu agus dlíd iad féin d'iompar de mheon bráithreachas i leith a chéile. |
|||
| Welsh || Genir pawb yn rhydd ac yn gydradd â’i gilydd mewn urddas a hawliau. Fe’u cynysgaeddir â rheswm a chydwybod, a dylai pawb ymddwyn y naill at y llall mewn ysbryd cymodlon. |
|||
|- |
|||
|| Manx || Ta dagh ooilley pheiagh ruggit seyr as corrym ayns ard-cheim as kiartyn. Ren Jee feoiltaghey resoon as cooinsheanse orroo as by chair daue ymmyrkey ry cheilley myr braaraghyn. |
|||
|- |
|||
|| Scottish Gaelic || Tha gach uile dhuine air a bhreth saor agus co-ionnan ann an urram 's ann an còirichean. Tha iad air am breth le reusan is le cogais agus mar sin bu chòir dhaibh a bhith beò nam measg fhein ann an spiorad bràthaireil. |
|||
|- |
|||
|| Breton || Dieub ha par en o dellezegezh hag o gwirioù eo ganet an holl dud. Poell ha skiant zo dezho ha dleout a reont bevañ an eil gant egile en ur spered a genvreudeuriezh. |
|||
|- |
|||
|| Cornish || Pub den oll yw genys frank ha kehaval yn dynita ha gwiryow. Yth yns i enduys gans reson ha cowses hag y tal dhedhans gwul dhe udn orth y gila yn spyrys a vredereth. |
|||
|- |
|||
|| Welsh || Genir pawb yn rhydd ac yn gydradd â'i gilydd mewn urddas a hawliau. Fe'u cynysgaeddir â rheswm a chydwybod, a dylai pawb ymddwyn y naill at y llall mewn ysbryd cymodlon. |
|||
|} |
|||
This article uses material from the Wikipedia article [https://en.wikipedia.org/wiki/Celtic_languages "Celtic languages"], which is released under the [http://creativecommons.org/licenses/by-sa/3.0/ Creative Commons Attribution-Share-Alike License 3.0]. |
|||
==Vulnerability== |
|||
This table summarizes the vulnerability of various Celtic languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, [http://www.unesco.org/culture/languages-atlas http://www.unesco.org/culture/languages-atlas]’ and [http://www.ethnologue.com/ Ethnologue]. |
|||
{| class="wikitable sortable" |
|||
!rowspan=2| Language |
|||
!rowspan=2| ISO639-3 |
|||
!rowspan=2| Location |
|||
!rowspan=2| Speakers |
|||
!colspan=2|Status |
|||
|-class="sortbottom" |
|||
! Ethnologue |
|||
! UNESCO |
|||
|- |
|- |
||
|| Cornish |
|||
| Cornish<ref>In Kemmyn, not in SWF</ref> || Yma pub den genys frank hag equal yn dynyta hag yn gwyryow. Ymons y enduys gans reson ha keskans hag y tal dhedhans omdhon an eyl orth y gela yn sperys a vredereth. |
|||
|align="center"| <code>[http://www.ethnologue.com/language/cor cor]</code> |
|||
|| United Kingdom of Great Britain and Northern Ireland |
|||
|align="right"| 0 |
|||
|| 9 (Dormant) |
|||
|| 4 (Critically endangered) |
|||
|- |
|- |
||
|| Manx |
|||
| Breton || Dieub ha par en o dellezegezh hag o gwirioù eo ganet an holl dud. Poell ha skiant zo dezho ha dleout a reont bevañ an eil gant egile en ur spered a genvreudeuriezh. |
|||
|align="center"| <code>[http://www.ethnologue.com/language/glv glv]</code> |
|||
|| Isle of Man & United Kingdom of Great Britain and Northern Ireland |
|||
|align="right"| 0 |
|||
|| 8b (Nearly extinct) |
|||
|| 4 (Critically endangered) |
|||
|- |
|- |
||
|| Breton |
|||
|align="center"| <code>[http://www.ethnologue.com/language/bre bre]</code> |
|||
|| France |
|||
|align="right"| 225,000 |
|||
|| 8a (Moribund) |
|||
|| 3 (Severely endangered) |
|||
|- |
|- |
||
|| Irish |
|||
| Irish || Saoláitear na daoine uile saor agus comhionann ina ndínit agus ina gcearta. Tá bauidh an réasúin agus an choinsiasa acu agus dlíd iad féin d’iompar de mheon bhráithreachais i leith a chéile. |
|||
|align="center"| <code>[http://www.ethnologue.com/language/gle gle]</code> |
|||
|| Ireland, United Kingdom of Great Britain and Northern Ireland |
|||
|align="right"| 106,210 |
|||
|| 6b (Threatened) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|- |
||
|| Gaelic, Scottish |
|||
| Scottish Gaelic || Tha gach uile dhuine air a bhreth saor agus co-ionnan ann an urram ’s ann an còirichean. Tha iad air am breth le reusan is le cogais agus mar sin bu chòir dhaibh a bhith beò nam measg fhein ann an spiorad bràthaireil. |
|||
|align="center"| <code>[http://www.ethnologue.com/language/gla gla]</code> |
|||
|| United Kingdom of Great Britain and Northern Ireland |
|||
|align="right"| 63,130 |
|||
|| 4 (Educational) |
|||
|| 2 (Definitely endangered) |
|||
|- |
|- |
||
|| Welsh |
|||
|align="center"| <code>[http://www.ethnologue.com/language/cym cym]</code> |
|||
|| United Kingdom of Great Britain and Northern Ireland |
|||
|align="right"| 536,890 |
|||
|| 2 (Provincial) |
|||
|| 1 (Vulnerable) |
|||
|} |
|} |
||
Line 50: | Line 236: | ||
[[Category:Languages]] |
[[Category:Languages]] |
||
[[Category:Documentation in English]] |
|||
[[Category:Celtic languages]] |
Latest revision as of 06:04, 23 December 2014
The Celtic languages (cel) include Welsh (cy), Breton (br), Cornish (kw), Irish (ga), Manx (gv), and Scottish Gaelic (gd). Most commonly spoken on the north-western edge of Europe, the languages are related with varying levels of mutual intelligibility.
The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.
Status
The ultimate goal is to have multi-purpose transducers for a variety of Celtic languages. These can then be paired for X→Y translation by adding a Constraint Grammar (CG) for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducers and dictionary pairs is listed below.
Transducers
Once a transducer reaches ~80% coverage on a range of medium-to-large corpora, it is considered "working". Above 90%, it is considered "production".
name | language | native name | grouping | ISO 639 (-2) | ISO 639 (-3) | formalism | state | stems | paradigms | coverage | location | primary authors |
---|---|---|---|---|---|---|---|---|---|---|---|---|
apertium-cym | Welsh | Cymraeg | Brythonic | cy | cym | lttoolbox | production | 11,015 | 702 | ~91.2% | apertium-cy-en (trunk) | Fran, Jim, donnek |
apertium-bre | Breton | Brezhoneg | Brythonic | br | bre | lttoolbox | working | 18,353 | 463 | ~88.6% | apertium-br-fr (trunk) | Fran, fulupjakez, guillaumebzh, drevalan |
apertium-gle | Irish | Gaeilge | Goidelic | ga | gle | lttoolbox | development | 8,768 | 1,155 | | apertium-ga-gd (nursery), apertium-gle (incubator) | Fran, Jim, fulupjakez, skburke |
apertium-glv | Manx | Gaelg | Goidelic | gv | glv | lttoolbox | development | 11,353 | 312 | | apertium-ga-gv (incubator) | Fran, Jim, cos, skburke |
apertium-gla | Scottish Gaelic | Gàidhlig | Goidelic | gd | gla | lttoolbox | prototype | 117 | 77 | | apertium-gla (languages) | Fran, Jim, fulupjakez, jg18, skburke |
apertium-cor | Cornish | Kernewek | Brythonic | kw | cor | lttoolbox | prototype | 322 | 132 | | apertium-cy-kw (incubator) | Fran, Jim |
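The coverage thresholds for the "working" and "production" states can be illustrated with a minimal sketch. It assumes the naive definition of coverage (words that receive an analysis divided by all words). The function names `naive_coverage` and `classify`, the `is_known` callback, and the handling of values below 80% are hypothetical and not part of any Apertium tool.

```python
def naive_coverage(tokens, is_known):
    """Return the percentage of tokens for which an analysis exists.

    `is_known` is a hypothetical callback standing in for a transducer
    lookup; hidden unknown words are not taken into account.
    """
    tokens = list(tokens)
    if not tokens:
        raise ValueError("corpus is empty")
    known = sum(1 for token in tokens if is_known(token))
    return 100.0 * known / len(tokens)


def classify(coverage):
    """Map a coverage percentage to the states described above.

    The exact boundary handling is an assumption: the text only says
    "~80%" for working and "over 90%" for production.
    """
    if coverage > 90.0:
        return "production"
    if coverage >= 80.0:
        return "working"
    return "below working threshold"


if __name__ == "__main__":
    # Toy lexicon and corpus, for illustration only.
    lexicon = {"genir", "pawb", "yn", "rhydd"}
    corpus = "genir pawb yn rhydd ac yn gydradd".split()
    coverage = naive_coverage(corpus, lambda word: word in lexicon)
    print(f"{coverage:.1f}% -> {classify(coverage)}")
```

Applied to the averages in the table, `classify(91.2)` gives "production" for Welsh and `classify(88.6)` gives "working" for Breton, matching the listed states.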
Existing language pairs
Text in italic denotes language pairs under development / in the incubator. Regular text denotes a functioning language pair in staging, while text in bold denotes a stable well-working language pair in trunk.
 | cym | bre | cor | gle | gla | glv |
---|---|---|---|---|---|---|
cym | - | br-cy 1,181 | | cy-ga ? | | |
bre | br-cy 1,181 | - | | | | |
cor | | | - | | | |
gle | cy-ga ? | | | - | ga-gd 7,877 | ga-gv 22,897 |
gla | | | | ga-gd 7,877 | - | |
glv | | | | ga-gv 22,897 | | - |
eng | cy-en 11,608 | | | gle-eng 1,598 | en-gd 862 | en-gv 40 |
epo | | eo-br 3,722 | | | | |
fin | | | | fin-gle 181 | | |
fra | | br-fr 27,988 | | | | |
pol | | | | pl-ga 56 | | |
spa | cy-es 8,798 | br-es 11,760 | | | | |
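The table labels rows and columns with three-letter codes (cym, eng) while most pair names use two-letter codes (cy-en). A minimal sketch of normalising pair names to the three-letter form follows. The mapping covers only the codes that appear on this page, and `TWO_TO_THREE` and `normalize_pair` are hypothetical names, not part of Apertium.

```python
# Two-letter to three-letter codes for the languages listed on this page.
TWO_TO_THREE = {
    "cy": "cym", "br": "bre", "kw": "cor",
    "ga": "gle", "gv": "glv", "gd": "gla",
    "en": "eng", "eo": "epo", "fr": "fra",
    "pl": "pol", "es": "spa",
}


def normalize_pair(pair):
    """Rewrite a pair name such as 'cy-en' with three-letter codes.

    Codes not found in the mapping (for example ones that are already
    three letters long) are passed through unchanged.
    """
    parts = pair.split("-")
    if len(parts) != 2:
        raise ValueError(f"expected a pair of the form 'xx-yy', got {pair!r}")
    return "-".join(TWO_TO_THREE.get(code, code) for code in parts)


if __name__ == "__main__":
    for name in ("cy-en", "ga-gd", "fin-gle"):
        print(name, "->", normalize_pair(name))
```

Passing unknown codes through unchanged keeps names such as fin-gle and gle-eng, which already use three-letter codes, intact.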
Samples
Article 1 of the Universal Declaration of Human Rights:
All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Language | Text |
---|---|
Irish | Saolaítear na daoine uile saor agus comhionann ina ndínit agus ina gcearta. Tá bua an réasúin agus an choinsiasa acu agus dlíd iad féin d'iompar de mheon bráithreachas i leith a chéile. |
Manx | Ta dagh ooilley pheiagh ruggit seyr as corrym ayns ard-cheim as kiartyn. Ren Jee feoiltaghey resoon as cooinsheanse orroo as by chair daue ymmyrkey ry cheilley myr braaraghyn. |
Scottish Gaelic | Tha gach uile dhuine air a bhreth saor agus co-ionnan ann an urram 's ann an còirichean. Tha iad air am breth le reusan is le cogais agus mar sin bu chòir dhaibh a bhith beò nam measg fhein ann an spiorad bràthaireil. |
Breton | Dieub ha par en o dellezegezh hag o gwirioù eo ganet an holl dud. Poell ha skiant zo dezho ha dleout a reont bevañ an eil gant egile en ur spered a genvreudeuriezh. |
Cornish | Pub den oll yw genys frank ha kehaval yn dynita ha gwiryow. Yth yns i enduys gans reson ha cowses hag y tal dhedhans gwul dhe udn orth y gila yn spyrys a vredereth. |
Welsh | Genir pawb yn rhydd ac yn gydradd â'i gilydd mewn urddas a hawliau. Fe'u cynysgaeddir â rheswm a chydwybod, a dylai pawb ymddwyn y naill at y llall mewn ysbryd cymodlon. |
This article uses material from the Wikipedia article "Celtic languages", which is released under the Creative Commons Attribution-Share-Alike License 3.0.
Vulnerability
This table summarizes the vulnerability of various Celtic languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.
Language | ISO 639-3 | Location | Speakers | Status (Ethnologue) | Status (UNESCO) |
---|---|---|---|---|---|
Cornish | cor | United Kingdom of Great Britain and Northern Ireland | 0 | 9 (Dormant) | 4 (Critically endangered) |
Manx | glv | Isle of Man & United Kingdom of Great Britain and Northern Ireland | 0 | 8b (Nearly extinct) | 4 (Critically endangered) |
Breton | bre | France | 225,000 | 8a (Moribund) | 3 (Severely endangered) |
Irish | gle | Ireland, United Kingdom of Great Britain and Northern Ireland | 106,210 | 6b (Threatened) | 2 (Definitely endangered) |
Gaelic, Scottish | gla | United Kingdom of Great Britain and Northern Ireland | 63,130 | 4 (Educational) | 2 (Definitely endangered) |
Welsh | cym | United Kingdom of Great Britain and Northern Ireland | 536,890 | 2 (Provincial) | 1 (Vulnerable) |