Talk:German to English

From Apertium
Revision as of 22:03, 25 October 2011 by Francis Tyers (talk | contribs)

Getting started

What's the best approach to start adding entries to the German monodix?

A good way would be to start by writing a script that downloads the Wiktionary entries for German nouns and converts them into speling format, e.g.
http://en.wiktionary.org/wiki/Bett#Declension
http://en.wiktionary.org/wiki/Haus#Declension
Bett; Bett; sg.nom; n.nt
Bett; Bettes; sg.gen; n.nt
Bett; Betts; sg.gen; n.nt
Bett; Bett; sg.dat; n.nt
Bett; Bett; sg.acc; n.nt
Bett; Betten; pl.nom; n.nt
Bett; Betten; pl.gen; n.nt 
Bett; Betten; pl.dat; n.nt
Bett; Betten; pl.acc; n.nt
Haus; Haus; sg.nom; n.nt
Haus; Hauses; sg.gen; n.nt
Haus; Haus; sg.dat; n.nt
Haus; Haus; sg.acc; n.nt
Haus; Häuser; pl.nom; n.nt
Haus; Häuser; pl.gen; n.nt
Haus; Häusern; pl.dat; n.nt
Haus; Häuser; pl.acc; n.nt
There are around 15,000 entries in the category German nouns, so that should be a good start. - Francis Tyers 07:13, 18 October 2011 (UTC)
Another thing you can do is make lists of closed-category words that don't inflect (e.g. prepositions, conjunctions) and also of abbreviations. - Francis Tyers 07:15, 18 October 2011 (UTC)
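
The conversion step suggested above could be sketched roughly as follows in Python. This is only an illustration, not the script that was actually written: the function name to_speling and the input shape (a list of (surface form, number.case) pairs already pulled out of a declension table) are assumptions, and downloading and parsing the Wiktionary pages is left out entirely.

```python
def to_speling(lemma, forms, pos):
    """Return speling-format lines: 'lemma; surface form; tags; pos'.

    `forms` is a list of (surface_form, tags) pairs; this input shape is
    an assumption made for the sketch, not something Wiktionary provides.
    """
    return ["{}; {}; {}; {}".format(lemma, form, tags, pos)
            for form, tags in forms]


# Forms for 'Bett', copied from the example listing above.
bett_forms = [
    ("Bett", "sg.nom"),
    ("Bettes", "sg.gen"),
    ("Betts", "sg.gen"),
    ("Bett", "sg.dat"),
    ("Bett", "sg.acc"),
    ("Betten", "pl.nom"),
    ("Betten", "pl.gen"),
    ("Betten", "pl.dat"),
    ("Betten", "pl.acc"),
]

bett_lines = to_speling("Bett", bett_forms, "n.nt")
print("\n".join(bett_lines))
```

Keeping the formatting separate from the scraping means the same function can be reused once the declension tables are extracted by whatever means turns out to work.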

Order of symbols

Francis, what should be the expected order of the symbols in the morphological analysis? Let's say we are analyzing "Apfel", is it <POS><gender><case><number> or <POS><gender><number><case>? I guess it should also output all the possible cases, e.g.:

Apfel<n><m><nom><sg>
Apfel<n><m><acc><sg>
Apfel<n><m><dat><sg>
<PoS><gender><number><case> - for lack of a better phrase, that's the order of inherency, plus it's easier to work with. Much easier. -- Jimregan 15:39, 19 October 2011 (UTC)
Also, listing 'viele' as the plural of 'ein' is dubious, and will more than likely cause problems. Treat them as separate words -- Jimregan 15:53, 19 October 2011 (UTC)
I started a stub at Tag_order on this, but it's not very complete. --unhammer 07:05, 20 October 2011 (UTC)
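
To keep generated analyses consistent with the <PoS><gender><number><case> order given above, the order can be fixed in one place. The following is a minimal Python sketch; the names TAG_ORDER and format_analysis are made up for illustration and are not part of any Apertium tool.

```python
# Order agreed on above: part of speech, gender, number, case.
TAG_ORDER = ("pos", "gender", "number", "case")


def format_analysis(lemma, **tags):
    """Build 'lemma<pos><gender><number><case>' whatever order the
    keyword arguments were passed in. Missing categories are skipped."""
    return lemma + "".join(
        "<{}>".format(tags[key]) for key in TAG_ORDER if key in tags
    )


# The three singular analyses of 'Apfel' from the question, reordered.
apfel = [
    format_analysis("Apfel", case=case, number="sg", gender="m", pos="n")
    for case in ("nom", "acc", "dat")
]
print("\n".join(apfel))
```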

Here's a repository with some initial progress (sorry for the delay, I was out of town last week):

https://github.com/elaichi/apertium-de-en-dev

There are fewer nouns than expected because my script only got the ones with the de-noun template and not the infl|de|noun template.
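
One way to pick up both templates would be a single pattern that matches the opening of either one. This is a hedged sketch in Python: it only detects that one of the two template names is present, makes no assumptions about their parameters, and the helper name is_german_noun_entry is hypothetical. The sample strings in the check are made up, not copied from real entries.

```python
import re

# Matches the opening of either headword template:
#   {{de-noun ...      or      {{infl|de|noun ...
# The trailing [|}] makes sure the template name ends there.
NOUN_TEMPLATE = re.compile(
    r"\{\{\s*(?:de-noun|infl\s*\|\s*de\s*\|\s*noun)\s*[|}]"
)


def is_german_noun_entry(wikitext):
    """Return True if the wikitext uses either German noun template."""
    return NOUN_TEMPLATE.search(wikitext) is not None


# Made-up sample strings for a quick check.
print(is_german_noun_entry("{{de-noun}}"))
print(is_german_noun_entry("{{infl|de|noun}}"))
print(is_german_noun_entry("{{infl|de|verb}}"))
```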

Auxiliary verbs

Question: In "Basic German" by Schenke the only two auxiliary verbs are "sein" and "haben", while 'the six modal verbs in German' are "dürfen", "können", "müssen", "sollen", "wollen", "mögen". I guess that the correct treatment in Apertium is something like this:

bin/sein<vbser><pres><p1><sg>
bist/sein<vbser><pres><p2><sg>
ist/sein<vbser><pres><p3><sg>
sind/sein<vbser><pres><p1><pl>
sind/sein<vbser><pres><p3><pl>
habe/haben<vbhaver><pres><p1><sg>
hast/haben<vbhaver><pres><p2><sg>
hat/haben<vbhaver><pres><p3><sg>
haben/haben<vbhaver><pres><p1><pl>
haben/haben<vbhaver><pres><p3><pl>
haben/haben<vbhaver><inf>
...

and mark those six modal verbs with the vbmod tag. But doing this would leave the vbaux tag unused. Is this correct?

Does "werden" classify as vbaux?
We use vaux rather than vbaux, but I guess "werden" can be classified that way. - Francis Tyers 22:03, 25 October 2011 (UTC)
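
The classification discussed in this section can be written down as a small lookup. This is a sketch only: putting "werden" under vaux follows the tentative reply above rather than a settled decision, the fallback tag vblex (Apertium's usual tag for ordinary lexical verbs) is not mentioned in this thread and is assumed here, and the function name verb_tag is made up.

```python
# The six modal verbs listed in the question.
MODALS = {"dürfen", "können", "müssen", "sollen", "wollen", "mögen"}


def verb_tag(lemma):
    """Return the verb-class tag for a German verb lemma."""
    if lemma == "sein":
        return "vbser"
    if lemma == "haben":
        return "vbhaver"
    if lemma in MODALS:
        return "vbmod"
    if lemma == "werden":
        return "vaux"   # tentative, per the reply above
    return "vblex"      # assumption: default for all other verbs


for lemma in ("sein", "haben", "können", "werden", "gehen"):
    print(lemma, verb_tag(lemma))
```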

Personal pronouns

Question: Regarding personal pronouns, I looked at the way it's done in Icelandic and it seems that the correct treatment would be something like this:

ich/ich<prn><p1><mf><sg><nom>  
mich/ich<prn><p1><mf><sg><acc> 
mir/ich<prn><p1><mf><sg><dat>  

du/du<prn><p2><mf><sg><nom>    
dich/du<prn><p2><mf><sg><acc>  
dir/du<prn><p2><mf><sg><dat>   

er/er<prn><p3><m><sg><nom>     
ihn/er<prn><p3><m><sg><acc>    
ihm/er<prn><p3><m><sg><dat>    
...

that is, as opposed to using prpers as the lemma, as in:

I/prpers<prn><subj><p1><mf><sg>
me/prpers<prn><obj><p1><mf><sg>

is this correct?

Yes, that's fine. This stuff is really easy to change later anyway. - Francis Tyers 22:03, 25 October 2011 (UTC)


Adjectives

German adjectives are confusing; any tips on how to treat them would be appreciated :)

Ordinals

Question: should ordinal numbers be treated as determiners or adjectives? Some examples in other languages:

fifth<det><ord><sp>  # english
quinto<det><ord><m><sg>  # spanish
fimmti<adj><ord><m><sg><nom>  # icelandic
vijfde<det><ord><sp>  # dutch
It doesn't really matter either way. What do the traditional grammars say? - Francis Tyers 22:03, 25 October 2011 (UTC)

Contractions

Question: how should prepositional articles (preposition + article) be treated? e.g.

am = an + dem
aufs = auf + das
beim = bei + dem
im = in + dem
vom = von + dem

my guess is that it should follow the treatment in other languages, e.g.

al/a<pr>+el<det><def><m><sg>  # spanish
del/de<pr>+el<det><def><m><sg>  # spanish
au/à<pr>+le<det><def><m><sg>  # french

is this correct?

Yes, this is correct. But I wouldn't bother to do this in the speling file. Use the speling file mainly for the open categories. - Francis Tyers 22:03, 25 October 2011 (UTC)
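
Since the contractions are a small closed class, they can be kept as a hand-written table instead of going through the speling file. The Python sketch below only splits a contraction into the surface forms of its preposition and article, using the five examples from the question. The names CONTRACTIONS and split_contraction are made up, and turning the parts into full analyses (lemmas and tags) is deliberately left out.

```python
# Contraction -> (preposition, article form), from the list above.
CONTRACTIONS = {
    "am": ("an", "dem"),
    "aufs": ("auf", "das"),
    "beim": ("bei", "dem"),
    "im": ("in", "dem"),
    "vom": ("von", "dem"),
}


def split_contraction(token):
    """Return (preposition, article) for a known contraction,
    or a one-element tuple with the token itself otherwise."""
    return CONTRACTIONS.get(token.lower(), (token,))


for token in ("Im", "vom", "Haus"):
    print(token, "=", " + ".join(split_contraction(token)))
```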