Difference between revisions of "Swedish and Danish"

{{TOCD}}

See the Danish-language introduction to Apertium at [[Dansk introduktion]].

== Press release ==
From: Jacob Nordfalk <jacob.nordfalk@gm...> - 2009-10-12 13:29
http://sourceforge.net/mailarchive/message.php?msg_name=20cf28cd0910120629o572ede0i13542ee2737f2deb%40mail.gmail.com

Første Open Source maskinoversættelse mellem svensk og dansk

First open source machine translation between Swedish and Danish


=== Dansk (English version below) ===

Vi har netop frigivet version 0.5 af svensk-dansk
til open source maskinoversættelsessystemet Apertium.

Det er det første frie maskinoversættelsessystem mellem svensk og dansk.

Det kan allerede nu bruges fra http://apertium.org/, men forhåbentlig vil
fællesskabet omkring fri software tage det til sig og snart gøre det
tilgængeligt på bl.a. alle Linux-arbejdsstationer.

Til udviklingen har vi brugt et antal frit tilgængelige kilder, bl.a.
open source stavekontrollen Aspell, Den stora svenska ordlistan,
http://dsso.se og den svenske og danske Wikipedia og Wiktionary.


Udviklingen er sponsoreret af Google Summer of Code (GSOC) og foretaget
af student Michael Kristensen. Mentorer på projektet er Francis Tyers
(Universitat d'Alacant og Prompsit Language Engineering) og
Jacob Nordfalk (Ingeniørhøjskolen i København).

For nærmere oplysninger om udviklingen af sprogparret, se
http://wiki.apertium.org/wiki/Swedish_and_Danish

For mere information om apertium og GSOC, se
http://socghop.appspot.com/org/home/google/gsoc2009/apertium.


Tekniske specifikationer

{|class="wikitable"
| Svensk morfologisk ordbog || 5.230 ordrødder
|-
| Tosproget ordbog || 6.854 ordrødder
|-
| Dansk morfologisk ordbog || 10.694 ordrødder
|}

Dækningen på Wikipedia-tekst er p.t. 72% og korpuset Europarl 80%.

Vi anvender 1-trins "shallow transfer" med 17 transferregler.

Vi har foretaget en sammenlignende vurdering med andre tilgængelige
maskinoversættelsessystemer på 65 sætninger fra Wikipedia.

Resultaterne findes nedenfor (lavest tal er bedst)

{|class="wikitable"
! System !! Edit distance !! WER (Word Error Rate)
|-
| Apertium || 353 || 31 %
|-
| Gramtrans || 308 || 26 %
|-
| Google || 415 || 35 %
|}


Yderligere oplysninger kan findes i artiklen "Shallow-transfer rule-based
machine translation for Swedish to Danish" som vi vil præsentere på
First International Workshop on Free/Open-Source Rule-Based Machine
Translation (http://xixona.dlsi.ua.es/freerbmt09/).


For mere information, kontakt Jacob Nordfalk, Ingeniørhøjskolen i
København (jano@ih...), telefon 26206512.


=== English ===

A new language pair, Swedish-Danish (version 0.5), has been released for the
free and open-source Apertium machine translation engine.

It is the first open-source machine translation system between Swedish and Danish.

The pair is available for testing right away at http://apertium.org/,
and we hope the free-software community will adopt it and make it available
on, among other places, the Linux desktop.

To build the system we used a number of freely available resources,
among them the high-coverage spell-checkers of the Aspell project,
Den stora svenska ordlistan (http://dsso.se), and the Swedish and Danish
Wikipedias and Wiktionaries.


This language pair was developed as a Google Summer of Code (GSoC)
project by Michael Kristensen, mentored by Francis Tyers (Universitat
d'Alacant and Prompsit Language Engineering) and Jacob Nordfalk
(Ingeniørhøjskolen i København).
For more information on Apertium and GSoC, see
http://socghop.appspot.com/org/home/google/gsoc2009/apertium

Many thanks to Thyge Larsen for his assistance with post-editing and
evaluation.

For more details on development and the language pair, see
http://wiki.apertium.org/wiki/Swedish_and_Danish


Technical details

{|class="wikitable"
| Swedish monolingual dictionary || 5,230 lemmas
|-
| Bilingual dictionary || 6,854 lemmas
|-
| Danish monolingual dictionary || 10,694 lemmas
|}

Coverage is currently 72% on Wikipedia text and 80% on the EuroParl corpus.
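
The announcement does not define coverage. It is presumably the share of running tokens for which the morphological analyser returns at least one analysis. A minimal sketch under that assumption, with a plain set of known surface forms standing in for the analyser (the function name and the toy data are hypothetical):

```python
def naive_coverage(tokens, known_forms):
    """Return the share of tokens whose lower-cased form is in known_forms."""
    if not tokens:
        return 0.0
    known = sum(1 for token in tokens if token.lower() in known_forms)
    return known / len(tokens)


# Hypothetical toy data, not taken from the real sv-da dictionaries.
known_forms = {"det", "kommer", "en", "bil"}
tokens = "Det kommer en röd bil".split()

print(f"{naive_coverage(tokens, known_forms):.0%}")  # 4 of the 5 tokens are known
```

A real measurement would run the compiled analyser over the corpus and count the tokens it marks as unknown, rather than look words up in a set.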

The system uses 1-stage shallow transfer with 17 transfer rules.

We compared the system with other available MT systems on 65 sentences
from Wikipedia. The results are shown below (lower is better):

{|class="wikitable"
! System !! Edit distance !! WER (Word Error Rate)
|-
| Apertium || 353 || 31 %
|-
| Gramtrans || 308 || 26 %
|-
| Google || 415 || 35 %
|}
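
The announcement does not spell out how the two figures were computed. A common reading is that edit distance is the word-level Levenshtein distance between the system output and a post-edited reference, and that WER is that distance divided by the number of reference words. A sketch under that assumption (function names are illustrative only):

```python
def word_edit_distance(hyp, ref):
    """Levenshtein distance between two lists of word tokens."""
    prev = list(range(len(ref) + 1))
    for i, h in enumerate(hyp, start=1):
        curr = [i]
        for j, r in enumerate(ref, start=1):
            cost = 0 if h == r else 1
            curr.append(min(prev[j] + 1,          # drop a hypothesis word
                            curr[j - 1] + 1,      # add a reference word
                            prev[j - 1] + cost))  # match or substitute
        prev = curr
    return prev[-1]


def wer(hyp_sentence, ref_sentence):
    """Word error rate of one hypothesis against one non-empty reference."""
    hyp, ref = hyp_sentence.split(), ref_sentence.split()
    return word_edit_distance(hyp, ref) / len(ref)


# Toy pair: the hypothesis keeps the Swedish position of "sig".
hypothesis = "I går vaskede sig Peter endelig"
reference = "I går vaskede Peter sig endelig"
print(word_edit_distance(hypothesis.split(), reference.split()), wer(hypothesis, reference))
```

Over a test set, the distances are usually summed across all sentences and divided by the total number of reference words.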


Further details can be found in the article "Shallow-transfer rule-based
machine translation for Swedish to Danish" to be presented during the
First International Workshop on Free/Open-Source Rule-Based Machine
Translation (http://xixona.dlsi.ua.es/freerbmt09/).




For more information, please contact Jacob Nordfalk, Ingeniørhøjskolen i
København (jano@ih...), phone 26206512.

--
Jacob Nordfalk



==Resources==
* http://spraakbanken.gu.se/sal/eng/ -- GPL morph. for Swedish
* http://w3.msi.vxu.se/~nivre/research/Talbanken05.html (A 300,000-word treebank in XML. All words are tagged with PAROLE-style tags, so it should be easy to build a morphological analyser and a PoS tagger from it. The authors are likely to be happy to let us use it if we cite them.)
* http://www.isv.cbs.dk/~mbk/treebank/ (Danish tree bank, 100,000-word, as above, under the GPL)
* http://www.ling.su.se/staff/sofia/suc/suc.html (Stockholm Umeå Corpus: 1,000,000 Swedish words, tagged; a license has to be granted by authors - it was used for apertium-sv-da)

* http://www.woxikon.se Dictionary between Swedish and English, German, Dutch...
* http://ordbok.nada.kth.se/ "Tvärslå is a Nordic dictionary made up of many merged dictionaries"
* http://www.klid.dk/dansk/ordlister/samling.html - Keld's collection of resources

==See also==


* [[/Pending tests]]
* [[/Regression tests]]
* http://gramtrans.com/ - state of the art MT between Swedish and Danish (closed source)


==Further reading==

* LUNDIN AKESSON Katarina (2003) "Constructions with låta, LET, reflexives and passive-s: a comment on some differences, similarities and related phenomena". ''Working papers in Scandinavian syntax'' ISSN 1100-097X
* TYERS, Francis M.; NORDFALK, Jacob. [http://rua.ua.es/dspace/handle/10045/12024 "Shallow-transfer rule-based machine translation for Swedish to Danish"]. In: Proceedings of the First International Workshop on Free/Open-Source Rule-Based Machine Translation / Edited by Juan Antonio Pérez-Ortiz, Felipe Sánchez-Martínez, Francis M. Tyers. Alicante: Universidad de Alicante. Departamento de Lenguajes y Sistemas Informáticos, 2009, pp. 27-33


[[Category:Language pairs]]
[[Category:Swedish and Danish]]

Latest revision as of 20:34, 29 October 2010


=Swedish and Danish=

Swedish and Danish are closely related languages. Their differences are mainly morphological: the core lexicon is very similar, with systematic differences, and the syntax is very similar too, although it differs in a few places.

== Syntax ==

=== Particle order ===

Swedish keeps the verb and its adverbial particle together, whereas Danish separates them and places the particle after the object.

:(sv) Vill du köra in bilen
:(da) Vil du køre bilen ind
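
The reordering can be pictured as a pattern over tagged words. The sketch below is plain Python rather than Apertium's own transfer-rule format, and it assumes the words have already been translated and tagged. The tag names and the function are hypothetical.

```python
NP_TAGS = {"det", "adj", "n"}  # tags treated as part of a noun phrase (assumed)


def move_particle_after_object(tokens):
    """Rewrite verb + particle + NP as verb + NP + particle.

    tokens is a list of (word, tag) pairs.
    """
    out = []
    i = 0
    while i < len(tokens):
        is_verb = tokens[i][1] == "vblex"
        if is_verb and i + 1 < len(tokens) and tokens[i + 1][1] == "part":
            particle = tokens[i + 1]
            j = i + 2
            # Collect the noun phrase that follows the particle.
            while j < len(tokens) and tokens[j][1] in NP_TAGS:
                j += 1
            out.append(tokens[i])
            out.extend(tokens[i + 2:j])
            out.append(particle)
            i = j
        else:
            out.append(tokens[i])
            i += 1
    return out


swedish_order = [("Vil", "vbmod"), ("du", "prn"), ("køre", "vblex"),
                 ("ind", "part"), ("bilen", "n")]
print(" ".join(word for word, _ in move_particle_after_object(swedish_order)))
```

With the toy input this prints the Danish order from the example, "Vil du køre bilen ind". A sentence without an object after the particle is left unchanged.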

Swedish moves the reflexive pronoun ''sig'' along with the verb to the V2 position, whereas Danish leaves it behind:

:(sv) I går ''tvättade'' '''sig''' Peter äntligen
:(da) I går ''vaskede'' Peter '''sig''' endelig

=== NP structure ===

Danish and Swedish have different NP patterns.

:(sv) Vita huset
:(da) Det hvide hus
:(nb) Det hvite huset
:(nn) Det kvite huset

In most NPs, Swedish has both the determiner ''den'' and the definite form of the noun, whereas Danish never combines the two. Here, nn patterns with sv, and nb with both sv and da. (The da and sv word choices may not be idiomatic, but the patterns are correct.)

:(sv) Den stora utmaningen är att göra det rätta. Utmaningen är svår.
:(da) Den store udfordring er at gøre det rette. Udfordringen er vanskelig.
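
A sketch of the second pattern in plain Python, assuming the NP has already been analysed into lemmas with a part of speech and a definiteness flag. The dictionary keys and the function name are hypothetical, and the "Vita huset" case, where Danish has to add a determiner, is not handled.

```python
def drop_double_definiteness(noun_phrase):
    """Make the noun indefinite when the NP already contains a determiner.

    noun_phrase is a list of dicts with a 'pos' key; nouns also carry 'definite'.
    The input list is left unmodified.
    """
    has_determiner = any(token["pos"] == "det" for token in noun_phrase)
    result = []
    for token in noun_phrase:
        token = dict(token)  # copy, so the caller's data stays intact
        if has_determiner and token["pos"] == "n":
            token["definite"] = False
        result.append(token)
    return result


# "Den stora utmaningen" after lexical transfer to Danish lemmas.
with_determiner = [
    {"lemma": "den", "pos": "det"},
    {"lemma": "stor", "pos": "adj"},
    {"lemma": "udfordring", "pos": "n", "definite": True},
]
# "Utmaningen" on its own keeps its definite form ("Udfordringen").
bare_noun = [{"lemma": "udfordring", "pos": "n", "definite": True}]

print(drop_double_definiteness(with_determiner)[-1]["definite"])
print(drop_double_definiteness(bare_noun)[-1]["definite"])
```

The generation step would then produce "udfordring" for the first NP and "udfordringen" for the second.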

=== Existential sentences ===

Swedish can use ''det'' as the equivalent of English "there", where Danish prefers ''der'':

:(sv) Det kommer en bil
:(da) Der kommer en bil

=== Relative clauses ===

In N + RC constructions, where the relativised constituent is subject, Danish uses either ''som'' or ''der'' as relativiser, whereas Swedish has ''som'':

:(da) manden '''som''' er her (the man who is here)
:(da) manden '''der''' er her (the man who is here)
:(sv) mannen '''som''' är här

When the relativised constituent is the object, on the other hand, the relativiser must be ''som'', also in Danish:

:(da) manden '''som''' jeg så (the man who I saw)
:(sv) mannen '''som''' jag såg (the man who I saw)

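=== Passives ===

In this example Swedish uses the s-passive (''tas''), where Danish uses the periphrastic passive with ''blive'':

:(sv) Ytterligare prov '''kommer att tas''' under måndagen. (Further tests will be taken during Monday.)
:(da) Yderligere prøve '''vil blive taget''' i løbet af mandagen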

== Grammatical words ==

=== Modal verbs ===

Danish and Swedish use more or less the same set of modal verbs, but with different meanings.

<pre>
(allow) sv: Man får inte röka här
        da: Man må ikke ryge her
        en: One is not allowed to smoke here
</pre>

Some verbs also take a different auxiliary in the perfect:

{|class="wikitable"
! Swedish !! Danish !! Auxiliary !! Gloss !! Example
|-
| åka || tage || ha → være || to go ||
|-
| föra || føre || ha → være || to take || Två personer har förts → To personer er ført
|-
| komma || komme || ha → være || to come || Min fru har kommit → Min kone er kommet hjem
|}
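
A transfer step could pick the Danish auxiliary by looking the verb up in such a list. A minimal sketch, assuming a lemma lookup is enough. The set holds only the three Danish verbs from the table, the names are hypothetical, and the lookup ignores the difference between transitive, intransitive and passive uses.

```python
# Danish verbs from the table above that take "være" where Swedish has "ha".
VAERE_VERBS = {"tage", "føre", "komme"}


def danish_perfect_auxiliary(danish_lemma):
    """Return the auxiliary lemma to generate for a Danish perfect."""
    return "være" if danish_lemma in VAERE_VERBS else "have"


for lemma in ("komme", "ryge"):
    print(lemma, danish_perfect_auxiliary(lemma))
```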

There is a list of the 250 most frequent verbs with the auxiliary they take [http://fjern-uv.dk/250.pdf here].

== Morphology ==

=== Supine ===
