Germanic languages

The Germanic languages (gem) constitute a branch of the Indo-European language family spoken primarily in Europe, Anglo-America and Australasia. The common ancestor of all the languages in this branch is Proto-Germanic, which was spoken around the middle of the 1st millennium BC in Iron Age northern Europe. Of the more than 50 Germanic languages, the most widely spoken are English, German, and Dutch, with over 450 million speakers in total.

The master plan involves generating independent finite-state transducers for each language, and then making individual dictionaries and transfer rules for every pair. The current status of these goals is listed below.

Status

The ultimate goal is to have multi-purpose transducers for a variety of Germanic languages. These can then be paired for X→Y translation by adding a CG for language X and transfer rules and a dictionary for the pair X→Y. Development progress for each language's transducer and for the dictionary pairs is listed below.
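
As a rough illustration of the pairing described above, the following Python sketch checks which components are still missing before a pair X→Y can be built. The class and function names and the example inventory are hypothetical bookkeeping invented for this page; they are not part of Apertium.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LanguageResources:
    """What exists for one language (hypothetical bookkeeping structure)."""
    code: str
    has_transducer: bool = False
    has_cg: bool = False  # CG (constraint grammar) for the language


def missing_components(source, target, pair_data):
    """List what is still needed to translate source -> target.

    pair_data is the set of (source, target) code tuples for which
    transfer rules and a dictionary already exist.
    """
    missing = []
    if not source.has_transducer:
        missing.append(f"transducer for {source.code}")
    if not target.has_transducer:
        missing.append(f"transducer for {target.code}")
    if not source.has_cg:
        missing.append(f"CG for {source.code}")
    if (source.code, target.code) not in pair_data:
        missing.append(
            f"transfer rules / dictionary for {source.code}-{target.code}"
        )
    return missing


# Example inventory with placeholder codes, invented for illustration only.
xxx = LanguageResources("xxx", has_transducer=True, has_cg=True)
yyy = LanguageResources("yyy", has_transducer=True)
existing_pairs = {("xxx", "yyy")}

print(missing_components(xxx, yyy, existing_pairs))
print(missing_components(yyy, xxx, existing_pairs))
```

The sketch treats each direction separately, because the CG is needed on the source side and the transfer rules belong to a specific direction.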

Transducers

Once a transducer reaches about 80% coverage on a range of medium-to-large corpora, we call it "working". At over 90% coverage it can be considered "production".
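
The thresholds above can be sketched in Python as follows. This is a minimal sketch under two assumptions: coverage is taken to be the share of tokens that receive at least one analysis, and unknown tokens are assumed to appear in the analyser output as `^word/*word$`. The function names and the "below working" label are made up for this example; the page gives no thresholds for the "development" and "prototype" states.

```python
import re

# One lexical unit in the assumed stream format: ^surface/analysis1/analysis2$
LEXICAL_UNIT = re.compile(r"\^([^/$^]*)/([^$]*)\$")


def coverage(analysed_text: str) -> float:
    """Fraction of lexical units whose analysis is not marked unknown (*)."""
    units = LEXICAL_UNIT.findall(analysed_text)
    if not units:
        return 0.0
    known = sum(1 for _surface, analyses in units if not analyses.startswith("*"))
    return known / len(units)


def state_for(cov: float) -> str:
    """Map a coverage fraction to the states named in the text above."""
    if cov > 0.90:
        return "production"
    if cov >= 0.80:
        return "working"
    return "below working"  # hypothetical label, not defined on this page


sample = (
    "^the/the<det><def><sp>$ ^cat/cat<n><sg>$ "
    "^zzyzx/*zzyzx$ ^sleeps/sleep<vblex><pres><p3><sg>$"
)
print(coverage(sample), state_for(coverage(sample)))
```

In practice the figure would be averaged over several corpora, as the text requires, rather than taken from a single sample.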

name language native name ISO 639-2 ISO 639-3 formalism state stems paradigms coverage location primary authors
apertium-nno Nynorsk nynorsk nn nno lttoolbox production 131,774 1,021 apertium-nno (languages) Fran, Trondtr, Unhammer
apertium-nob Bokmål bokmål nb nob lttoolbox production 192,623 1,084 apertium-nob (languages) Fran, Trondtr, Unhammer
apertium-dan Danish dansk da dan lttoolbox production 52,133 626 apertium-dan (languages) Fran, JacobEo, Jonas
apertium-eng English English en eng lttoolbox production 59,536 391 apertium-eng (languages) Fran, marthab08, hrafn65, hloftsson, olafurw
apertium-nld Dutch Nederlands nl nld lttoolbox production 25,079 1,095 apertium-nld (languages) Fran, Teirlynck, Otte, Naudé
apertium-afr Afrikaans Afrikaans af afr lttoolbox production apertium-en-af (staging) Fran, winterstream
apertium-deu German Deutsch de deu lttoolbox working 74,339 1,427 apertium-deu (incubator) Fran, ebenimeli, Jim Regan
apertium-swe Swedish svenska sv swe lttoolbox working 138,490 1,834 apertium-swe (languages)  ?
apertium-isl Icelandic íslenska is isl lttoolbox development 8,770 1,878 apertium-isl (languages) Fran, Loftsson, Brandt, Sigurþórsson
apertium-sco Scots Scots - sco lttoolbox development apertium-eng-sco (incubator) Jim Regan
apertium-fao Faroese føroyskt fo fao lttoolbox development 2,318 278 apertium-fao (languages) Trondtr
apertium-fry West Frisian Frysk fy fry lttoolbox prototype apertium-nld-fry (nursery) Fran
apertium-yid Yiddish ייִדיש yi yid hfst prototype 378 ~62.5% apertium-yid (incubator) Jonathan

Pairs

Some Germanic languages that are particularly similar to one another (and hence have high levels of mutual intelligibility) include those in the following list:

Table of Existing Pairs

Text in italics denotes language pairs in the incubator. Regular text denotes a developing language pair in nursery, while text in bold denotes a stable well-working language pair in trunk and text in bold and italics denotes a pair in staging. Bidix stems as counted with dixcounter are displayed below.

Each row lists the pairs that link the row's language to one of dan, nor, swe, fao, isl, deu, nld, afr, fry, eng, nob, nno or sco, with the bidix stem count in parentheses.

dan: dan-nor (54,742), swe-dan (21,791), fao-dan (?), isl-dan (971), deu-dan (8,152), da-en (1,627)
nor: dan-nor (54,742), swe-nor (66,856), nor-eng (24,387)
swe: swe-dan (21,791), swe-nor (66,856), is-sv (7,729), deu-swe (1,893)
fao: fao-dan (?), fo-is (896), fo-nb (?)
isl: isl-dan (971), is-sv (7,729), fo-is (896), isl-eng (21,905)
deu: deu-dan (8,152), deu-swe (1,893), deu-nld (11,768), en-de (569)
nld: deu-nld (11,768), af-nl (6,263), nld-fry (205), en-nl (3,442)
afr: af-nl (6,263), en-af (6,590)
fry: nld-fry (205)
eng: da-en (1,627), nor-eng (24,387), isl-eng (21,905), en-de (569), en-nl (3,442), en-af (6,590), eng-sco (172)
nob: fo-nb (?), nno-nob (69,479)
nno: nno-nob (69,479)
sco: eng-sco (172)
asm: asm-eng (?)
ben: bn-en (7,497)
bul: bg-en (10,242)
cat: en-ca (35,873)
cym: cy-en (11,608)
ell: ell-eng (830)
epo: eo-sv (4,013), eo-de (28,656), eo-nl (18,027), eo-en (32,337)
est: est-nor (?)
eus: eu-en (14,356)
fin: fin-swe (1,893), fin-isl (1,000), fin-deu (7,037), fin-eng (159,257)
fra: fr-nl (1,744), en-fr (13,455)
gla: en-gd (862)
gle: gle-eng (1,598)
glg: en-gl (30,049)
glv: en-gv (40)
hat: ht-en (1,985)
hbs: hbs-eng (16,228)
hin: eng-hin (39,242)
hun: hun-eng (1,253)
hye: hye-eng (12,218)
ita: ita-nor (?), en-it (21,067)
kaz: eng-kaz (16,931)
kir: ky-en (?)
kmr: kmr-eng (15,563)
lat: la-en (131)
lav: en-lv (?)
lit: en-lt (96)
ltz: deu-ltz (?)
lvs: eng-lvs (2,694)
mal: mal-eng (9,095)
mar: mar-eng (102)
mfe: mfe-en (35)
mkd: mk-en (33,350)
mlt: en-mt (814)
nep: ne-en (163)
pes: eng-pes (?)
pol: en-pl (6,541)
por: en-pt (6,828)
rus: isl-rus (52), en-ru (?)
sah: sah-eng (?)
sin: si-en (80)
sjo: sjo-eng (?)
slv: sl-en (313)
sma: sma-nob (?)
sme: sme-deu (6), sme-nob (74,763)
smj: smj-nob (?)
spa: spa-deu (?), en-es
sqi: en-sq (580)
swa: sw-en (15)
tel: eng-tel (1)
tgk: tg-en (?)
tha: tha-eng (?)
tur: tr-en (171)
vie: vi-en (33)
zlm: zlm-eng (?)

Classification

All living Germanic languages belong to either the West Germanic or the North Germanic branch.

Samples

Article 1 of the Universal Declaration of Human Rights:

All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.

Language Text
Danish Alle mennesker er født frie og lige i værdighed og rettigheder. De er udstyret med fornuft og samvittighed, og de bør handle mod hverandre i en broderskabets ånd.
Swedish Alla människor äro födda fria och lika i värde och rättigheter. De äro utrustade med förnuft och samvete och böra handla gentemot varandra i en anda av broderskap.
Faroese Øll menniskju eru fødd fræls og jøvn til virðingar og mannarættindi. Tey hava skil og samvitsku og eiga at fara hvørt um annað í bróðuranda.
Icelandic Hver maður er borinn frjáls og jafn öðrum að virðingu og réttindum. Menn eru gæddir vitsmunum og samvizku, og ber þeim að breyta bróðurlega hverjum við annan.
English All human beings are born free and equal in dignity and rights. They are endowed with reason and conscience and should act towards one another in a spirit of brotherhood.
Scots Aw human sowels is born free and equal in dignity and richts. They are tochered wi mense and conscience and shuld guide theirsels ane til ither in a speirit o britherheid.
Luxembourgeois All Mënsch kënnt fräi a mat deer selwechter Dignitéit an dene selwechte Rechter op d'Welt. Jiddereen huet säi Verstand a säi Gewësse krut an soll an engem Geescht vu Bridderlechkeet denen anere géintiwwer handelen.
Yiddish, Eastern יעדער מענטש װערט געבױרן פֿרײַ און גלײַך אין כּבֿוד און רעכט. יעדער װערט באַשאָנקן מיט פֿאַרשטאַנד און געװיסן; יעדער זאָל זיך פֿירן מיט אַ צװײטן אין אַ געמיט פֿון ברודערשאַפֿט.
(Yeder mentš vert geboyrn fray un glayx in koved un rext. Yeder vert bašonkn mit farštand un gevisn; yeder zol zix firn mit a cvaytn in a gemit fun bruderšaft.)
Afrikaans Alle menslike wesens word vry, met gelyke waardigheid en regte, gebore. Hulle het rede en gewete en behoort in die gees van broederskap teenoor mekaar op te tree.
Dutch Alle mensen worden vrij en gelijk in waardigheid en rechten geboren. Zij zijn begiftigd met verstand en geweten, en behoren zich jegens elkander in een geest van broederschap te gedragen.
Saxon, Low All de Minschen sünd frie un gliek an Wüürd un Rechten baren. Se hebbt Vernunft un een Geweten un se schüllt sik Bröder sien.
Norwegian, Nynorsk Alle menneske er fødde til fridom og med same menneskeverd og menneskerettar. Dei har fått fornuft og samvit og skal leve med kvarandre som brør.
Norwegian, Bokmål Alle mennesker er født frie og med samme menneskeverd og menneskerettigheter. De er utstyrt med fornuft og samvittighet og bør handle mot hverandre i brorskapets ånd.

Vulnerability

This table summarizes the vulnerability of various Germanic languages. Data is derived from the ‘Atlas of the World’s Languages in Danger, © UNESCO, http://www.unesco.org/culture/languages-atlas’ and Ethnologue.

Language ISO 639-3 Location Speakers Status (Ethnologue) Status (UNESCO)
Frankish frk Germany 0 10 (Extinct) -
Wymysorys wym Poland 70 8b (Nearly extinct) 3 (Severely endangered)
Frisian, Eastern frs Germany 5,120 8a (Moribund) -
Saterfriesisch stq Germany 1,000 7 (Shifting) 3 (Severely endangered)
German, Colonia Tovar gct Venezuela 1,500 7 (Shifting) -
Yiddish, Western yih Germany 5,400 7 (Shifting) -
Frisian, Northern frr Germany 10,000 7 (Shifting) 3 (Severely endangered)
Hunsrik hrx Brazil 3,000,000 7 (Shifting) -
Walser wae Switzerland 22,780 6b (Threatened) -
Vlaams vls Belgium 1,204,000 6b (Threatened) -
Mócheno mhn Italy 1,900 6a (Vigorous) 2 (Definitely endangered)
Cimbrian cim Italy 2,230 6a (Vigorous) 2 (Definitely endangered)
Silesian, Lower sli Poland 22,900 6a (Vigorous) -
Hutterisch geh Canada 40,000 6a (Vigorous) -
Kölsch ksh Germany 250,000 6a (Vigorous) -
Jutish jut Germany, Denmark  ? 6a (Vigorous) 2 (Definitely endangered)
Pfaelzisch pfl Germany, France  ? 6a (Vigorous) 1 (Vulnerable)
Saxon, Low nds Germany 1,000 5 (Developing) -
German, Pennsylvania pdc United States 133,000 5 (Developing) -
Zeeuws zea Netherlands 220,000 5 (Developing) -
Plautdietsch pdt Canada & Ukraine 394,900 5 (Developing) 2 (Definitely endangered)
Gronings gos Netherlands 592,000 5 (Developing) -
Swabian swg Germany 819,000 5 (Developing) -
Saxon, Upper sxu Germany 2,000,000 5 (Developing) -
Mainfränkisch vmf Germany, Czech Republic 4,910,000 5 (Developing) 1 (Vulnerable)
German, Swiss gsw Switzerland 6,469,000 5 (Developing) -
Bavarian bar Germany, Austria, Hungary, Italy, Switzerland, Czech Republic 13,259,000 5 (Developing) 1 (Vulnerable)
Achterhoeks act Netherlands  ? 5 (Developing) -
Drents drt Netherlands  ? 5 (Developing) -
Sallands sdz Netherlands  ? 5 (Developing) -
Stellingwerfs stl Netherlands  ? 5 (Developing) -
Twents twd Netherlands  ? 5 (Developing) -
Veluws vel Netherlands  ? 5 (Developing) -
Westphalien wep Germany  ? 5 (Developing) -
Scots sco United Kingdom of Great Britain and Northern Ireland 100,000 4 (Educational) 1 (Vulnerable)
Luxembourgeois ltz Germany, Belgium, France, Luxembourg 320,710 4 (Educational) 1 (Vulnerable)
Yiddish, Eastern ydd Israel & Germany, Austria, Belarus, Belgium, Denmark, Estonia, Finland, France, Hungary, Italy, Latvia, Lithuania, Luxembourg, Republic of Moldova, Norway, Netherlands, Poland, Romania, United Kingdom of Great Britain and Northern Ireland, Russian Federation, Slovakia, Sweden, Switzerland, Czech Republic, Ukraine 1,505,030 4 (Educational) 2 (Definitely endangered)
Faroese fao Denmark & Faroe Islands 66,150 2 (Provincial) 1 (Vulnerable)
Frisian, Western fry Netherlands 467,000 2 (Provincial) 1 (Vulnerable)
Limburgish lim Netherlands 1,300,000 2 (Provincial) -
Icelandic isl Iceland 243,840 1 (National) -
Norwegian nor Norway 4,741,780 1 (National) -
Afrikaans afr South Africa 4,949,410 1 (National) -
Danish dan Denmark 5,592,490 1 (National) -
Swedish swe Sweden 8,381,829 1 (National) 2 (Definitely endangered)
Dutch nld Netherlands 22,984,690 1 (National) -
German, Standard deu Germany 83,812,810 1 (National) -
English eng United Kingdom 334,800,758 1 (National) -

This article uses material from the Wikipedia article "Germanic languages", which is released under the Creative Commons Attribution-Share-Alike License 3.0.

See also
