Difference between revisions of "Talk:German to English"

Revision as of 22:03, 25 October 2011

What's the best approach to start adding entries to the German monodix?

A good way would be to start by writing a script that downloads Wiktionary entries for German nouns and converts them into speling format, e.g.
http://en.wiktionary.org/wiki/Bett#Declension
http://en.wiktionary.org/wiki/Haus#Declension
Bett; Bett; sg.nom; n.nt
Bett; Bettes; sg.gen; n.nt
Bett; Betts; sg.gen; n.nt
Bett; Bett; sg.dat; n.nt
Bett; Bett; sg.acc; n.nt
Bett; Betten; pl.nom; n.nt
Bett; Betten; pl.gen; n.nt 
Bett; Betten; pl.dat; n.nt
Bett; Betten; pl.acc; n.nt
Haus; Haus; sg.nom; n.nt
Haus; Hauses; sg.gen; n.nt
Haus; Haus; sg.gen; n.nt
Haus; Haus; sg.dat; n.nt
Haus; Haus; sg.acc; n.nt
Haus; Häuser; pl.nom; n.nt
Haus; Häuser; pl.gen; n.nt
Haus; Häusern; pl.dat; n.nt
Haus; Häuser; pl.acc; n.nt
There are around 15,000 entries in the category German nouns, so that should be a good start. - Francis Tyers 07:13, 18 October 2011 (UTC)
Another thing you can do is make lists of closed category words that don't inflect (E.g. prepositions, conjunctions) and also of abbreviations. - Francis Tyers 07:15, 18 October 2011 (UTC)
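The conversion step suggested above could be sketched roughly as follows. This is only an illustration: the helper name speling_lines and the dictionary layout for the declension table are assumptions, not Wiktionary's actual data format, and the download/parsing step is left out entirely.

```python
# Minimal sketch: turn an already-parsed declension table into speling lines.
# The (number, case) -> [forms] layout is a hypothetical intermediate format.
CASES = ["nom", "gen", "dat", "acc"]
NUMBERS = ["sg", "pl"]


def speling_lines(lemma, forms, pos_tags):
    """Yield 'lemma; surface; number.case; pos' lines.

    forms maps (number, case) to a list of surface forms, so that
    alternatives such as Bettes/Betts each get their own line.
    """
    for number in NUMBERS:
        for case in CASES:
            for surface in forms.get((number, case), []):
                yield f"{lemma}; {surface}; {number}.{case}; {pos_tags}"


bett = {
    ("sg", "nom"): ["Bett"],
    ("sg", "gen"): ["Bettes", "Betts"],
    ("sg", "dat"): ["Bett"],
    ("sg", "acc"): ["Bett"],
    ("pl", "nom"): ["Betten"],
    ("pl", "gen"): ["Betten"],
    ("pl", "dat"): ["Betten"],
    ("pl", "acc"): ["Betten"],
}

lines = list(speling_lines("Bett", bett, "n.nt"))
print("\n".join(lines))
```

Keeping a list of forms per slot means a missing cell in the table simply produces no line instead of a broken entry.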

Order of symbols

Francis, what is the expected order of the symbols in the morphological analysis? Say we are analyzing "Apfel": is it <POS><gender><case><number> or <POS><gender><number><case>? I guess it should also output all the possible cases, e.g.:

Apfel<n><m><nom><sg>
Apfel<n><m><acc><sg>
Apfel<n><m><dat><sg>
<PoS><gender><number><case> - for lack of a better phrase, that's the order of inherency, plus it's easier to work with. Much easier. -- Jimregan 15:39, 19 October 2011 (UTC)
Also, listing 'viele' as the plural of 'ein' is dubious, and will more than likely cause problems. Treat them as separate words -- Jimregan 15:53, 19 October 2011 (UTC)
I started a stub at Tag_order on this, but it's not very complete. --unhammer 07:05, 20 October 2011 (UTC)
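To make the <PoS><gender><number><case> order concrete, here is a small sketch that always emits tags in that order regardless of how the features are supplied. The function name format_analysis and the feature names are made up for illustration; this is not part of any Apertium tool.

```python
# Minimal sketch: emit tags in a fixed PoS, gender, number, case order.
# TAG_ORDER and format_analysis are hypothetical names for illustration.
TAG_ORDER = ("pos", "gender", "number", "case")


def format_analysis(lemma, **features):
    """Return lemma followed by its tags, ordered according to TAG_ORDER."""
    tags = "".join(
        f"<{features[name]}>" for name in TAG_ORDER if name in features
    )
    return f"{lemma}{tags}"


analyses = [
    format_analysis("Apfel", pos="n", gender="m", number="sg", case=c)
    for c in ("nom", "acc", "dat")
]
print("\n".join(analyses))
```

With this order the Apfel examples come out as Apfel<n><m><sg><nom>, Apfel<n><m><sg><acc>, and Apfel<n><m><sg><dat>.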

Here's a repository with some initial progress (sorry for the delay, I was out of town last week):

https://github.com/elaichi/apertium-de-en-dev

There are fewer nouns than expected because my script only got the ones with the de-noun template and not the infl|de|noun template.

Auxiliary verbs

Question: In "Basic German" by Schenke the only two auxiliary verbs are "sein" and "haben", while 'the six modal verbs in German' are "dürfen", "können", "müssen", "sollen", "wollen", "mögen". I guess that the correct treatment in Apertium is something like this:

bin/sein<vbser><pres><p1><sg>
bist/sein<vbser><pres><p2><sg>
ist/sein<vbser><pres><p3><sg>
sind/sein<vbser><pres><p1><pl>
sind/sein<vbser><pres><p3><pl>
habe/haben<vbhaver><pres><p1><sg>
hast/haben<vbhaver><pres><p2><sg>
hat/haben<vbhaver><pres><p3><sg>
haben/haben<vbhaver><pres><p1><pl>
haben/haben<vbhaver><pres><p3><pl>
haben/haben<vbhaver><inf>
...

and mark those six modal verbs with the vbmod tag. But doing this would leave the vbaux tag unused. Is this correct?

Does "werden" classify as vbaux?
We use vaux, but it can be, I guess. - Francis Tyers 22:03, 25 October 2011 (UTC)
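The proposed split could be written down as a simple lookup, sketched below. The names verb_class_tag and analysis are hypothetical, the vblex fallback for ordinary verbs is an assumption not discussed in this thread, and "werden" is deliberately left to an explicit override since its tag is not settled here.

```python
# Minimal sketch of the verb classification proposed above.
# "vblex" as the default for other verbs is an assumption.
MODALS = {"dürfen", "können", "müssen", "sollen", "wollen", "mögen"}


def verb_class_tag(lemma, overrides=None):
    """Pick the verb-class tag for a lemma; overrides wins if given."""
    if overrides and lemma in overrides:
        return overrides[lemma]
    if lemma == "sein":
        return "vbser"
    if lemma == "haben":
        return "vbhaver"
    if lemma in MODALS:
        return "vbmod"
    return "vblex"


def analysis(surface, lemma, tense, person, number, overrides=None):
    """Format one finite form as surface/lemma<class><tense><person><number>."""
    tag = verb_class_tag(lemma, overrides)
    return f"{surface}/{lemma}<{tag}><{tense}><{person}><{number}>"


print(analysis("bin", "sein", "pres", "p1", "sg"))
print(analysis("hat", "haben", "pres", "p3", "sg"))
```

Passing something like overrides={"werden": "vaux"} keeps that decision in one place, so it is easy to change later.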

Personal pronouns

Question: Regarding personal pronouns, I looked at the way it's done in Icelandic and it seems that the correct treatment would be something like this:

ich/ich<prn><p1><mf><sg><nom>  
mich/ich<prn><p1><mf><sg><acc> 
mir/ich<prn><p1><mf><sg><dat>  

du/du<prn><p2><mf><sg><nom>    
dich/du<prn><p2><mf><sg><acc>  
dir/du<prn><p2><mf><sg><dat>   

er/er<prn><p3><m><sg><nom>     
ihn/er<prn><p3><m><sg><acc>    
ihm/er<prn><p3><m><sg><dat>    
...

That is, as opposed to using prpers as the lemma, as in:

I/prpers<prn><subj><p1><mf><sg>
me/prpers<prn><obj><p1><mf><sg>

Is this correct?

Yes, that's fine. This stuff is really easy to change later anyway. - Francis Tyers 22:03, 25 October 2011 (UTC)


Adjectives

German adjectives are confusing; any tips on how to treat them would be appreciated :)

Ordinals

Question: should ordinal numbers be treated as determiners or adjectives? Some examples in other languages:

fifth<det><ord><sp>  # english
quinto<det><ord><m><sg>  # spanish
fimmti<adj><ord><m><sg><nom>  # icelandic
vijfde<det><ord><sp>  # dutch
It doesn't really matter either way. What do the traditional grammars say? - Francis Tyers 22:03, 25 October 2011 (UTC)

Contractions

Question: how should prepositional articles (preposition + article) be treated? e.g.

am = an + dem
aufs = auf + das
beim = bei + dem
im = in + dem
vom = von + dem

My guess is that it should follow the treatment in other languages, e.g.

al/a<pr>+el<det><def><m><sg>  # spanish
del/de<pr>+el<det><def><m><sg>  # spanish
au/à<pr>+le<det><def><m><sg>  # french

Is this correct?

Yes, this is correct. But I wouldn't bother to do this in the speling file. Use the speling file mainly for the open categories. - Francis Tyers 22:03, 25 October 2011 (UTC)
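Since contractions are a small closed class, they can be kept in a hand-written table rather than generated. A minimal sketch, using only the five pairs listed above; the names CONTRACTIONS and expand are made up for illustration, and the split is on surface forms only (no lemmas or tags are assigned).

```python
# Minimal sketch: split contracted preposition+article tokens into two
# surface tokens. The table holds only the pairs listed in this thread.
CONTRACTIONS = {
    "am": ("an", "dem"),
    "aufs": ("auf", "das"),
    "beim": ("bei", "dem"),
    "im": ("in", "dem"),
    "vom": ("von", "dem"),
}


def expand(tokens):
    """Return tokens with known contractions replaced by their two parts.

    Lookup is case-insensitive; the expanded parts are always lowercase,
    so sentence-initial capitalization is not preserved.
    """
    out = []
    for tok in tokens:
        parts = CONTRACTIONS.get(tok.lower())
        if parts:
            out.extend(parts)
        else:
            out.append(tok)
    return out


print(expand(["Ich", "bin", "im", "Haus"]))
```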