Talk:German to English

From Apertium
Revision as of 17:23, 25 October 2011 by Elaichi (talk | contribs)

What's the best approach to start adding entries to the German monodix?

A good way would be to start by writing a script that downloads Wiktionary entries for German nouns and converts them into speling format, e.g.
http://en.wiktionary.org/wiki/Bett#Declension
http://en.wiktionary.org/wiki/Haus#Declension
Bett; Bett; sg.nom; n.nt
Bett; Bettes; sg.gen; n.nt
Bett; Betts; sg.gen; n.nt
Bett; Bett; sg.dat; n.nt
Bett; Bett; sg.acc; n.nt
Bett; Betten; pl.nom; n.nt
Bett; Betten; pl.gen; n.nt 
Bett; Betten; pl.dat; n.nt
Bett; Betten; pl.acc; n.nt
Haus; Haus; sg.nom; n.nt
Haus; Hauses; sg.gen; n.nt
Haus; Haus; sg.gen; n.nt
Haus; Haus; sg.dat; n.nt
Haus; Haus; sg.acc; n.nt
Haus; Häuser; pl.nom; n.nt
Haus; Häuser; pl.gen; n.nt
Haus; Häusern; pl.dat; n.nt
Haus; Häuser; pl.acc; n.nt
There are around 15,000 entries in the category German nouns, so that should be a good start. - Francis Tyers 07:13, 18 October 2011 (UTC)
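The conversion step could be sketched roughly as follows. This is only a minimal illustration, assuming the declension table has already been extracted from Wiktionary into a dictionary; the function name `to_speling` and the dictionary layout are hypothetical, and the download and wikitext parsing are not shown.

```python
# Hypothetical helper: turns an already-extracted declension table into
# speling-format lines of the shape "lemma; surface form; tags; pos".

def to_speling(lemma, pos, forms):
    """forms maps a tag string such as 'sg.gen' to a list of surface forms."""
    lines = []
    for tags, surfaces in forms.items():
        # A slot may have several variants (e.g. Bettes / Betts),
        # so emit one line per variant.
        for surface in surfaces:
            lines.append(f"{lemma}; {surface}; {tags}; {pos}")
    return lines


# Forms for "Bett", copied from the example lines above.
bett = {
    "sg.nom": ["Bett"],
    "sg.gen": ["Bettes", "Betts"],
    "sg.dat": ["Bett"],
    "sg.acc": ["Bett"],
    "pl.nom": ["Betten"],
    "pl.gen": ["Betten"],
    "pl.dat": ["Betten"],
    "pl.acc": ["Betten"],
}

for line in to_speling("Bett", "n.nt", bett):
    print(line)
```

Keeping a list of variants per slot means alternative forms such as the two genitives each get their own line, as in the example above.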
Another thing you can do is make lists of closed-category words that don't inflect (e.g. prepositions, conjunctions), and also of abbreviations. - Francis Tyers 07:15, 18 October 2011 (UTC)

Francis, what is the expected order of the symbols in the morphological analysis? Say we are analyzing "Apfel": is it <POS><gender><case><number> or <POS><gender><number><case>? I guess it should also output all the possible cases, e.g.:

Apfel<n><m><nom><sg>
Apfel<n><m><acc><sg>
Apfel<n><m><dat><sg>
<PoS><gender><number><case> - for lack of a better phrase, that's the order of inherency, plus it's easier to work with. Much easier. -- Jimregan 15:39, 19 October 2011 (UTC)
Also, listing 'viele' as the plural of 'ein' is dubious, and will more than likely cause problems. Treat them as separate words -- Jimregan 15:53, 19 October 2011 (UTC)
I started a stub at Tag_order on this, but it's not very complete. --unhammer 07:05, 20 October 2011 (UTC)
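To make the suggested order concrete, here is a small sketch that builds the "Apfel" readings from the question above in <PoS><gender><number><case> order. The function name `analysis` is hypothetical, and only the three cases from the question are listed.

```python
def analysis(lemma, pos, gender, number, case):
    # Tags in the suggested order: <PoS><gender><number><case>
    return f"{lemma}<{pos}><{gender}><{number}><{case}>"


# The three singular readings of "Apfel" from the question, reordered.
readings = [analysis("Apfel", "n", "m", "sg", case) for case in ("nom", "acc", "dat")]

for reading in readings:
    print(reading)
```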

Here's a repository with some initial progress (sorry for the delay, I was out of town last week):

https://github.com/elaichi/apertium-de-en-dev

There are fewer nouns than expected because my script only got the ones with the de-noun template and not the infl|de|noun template.
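One possible way to pick up both templates is a single pattern that accepts either name. The sketch below is an assumption about how the check could look, not the code in the repository; the sample wikitext strings are made up for illustration, and real entries may carry different parameters.

```python
import re

# Matches the opening of either template mentioned above:
# {{de-noun ...}} or {{infl|de|noun ...}}.
# The name must be followed by "|" (more parameters) or "}}" (end of template).
NOUN_TEMPLATE = re.compile(
    r"\{\{\s*(de-noun|infl\s*\|\s*de\s*\|\s*noun)\s*(\||\}\})"
)


def has_german_noun_template(wikitext):
    """Return True if the wikitext contains either German noun template."""
    return NOUN_TEMPLATE.search(wikitext) is not None


# Made-up snippets, for illustration only.
print(has_german_noun_template("{{de-noun|n}}"))      # True
print(has_german_noun_template("{{infl|de|noun}}"))   # True
print(has_german_noun_template("{{infl|de|verb}}"))   # False
```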


Question: In "Basic German" by Schenke the only two auxiliary verbs are "sein" and "haben", while 'the six modal verbs in German' are "dürfen", "können", "müssen", "sollen", "wollen", "mögen". I guess that the correct treatment in Apertium is something like this:

bin/sein<vbser><pres><p1><sg>
bist/sein<vbser><pres><p2><sg>
ist/sein<vbser><pres><p3><sg>
sind/sein<vbser><pres><p1><pl>
sind/sein<vbser><pres><p3><pl>
habe/haben<vbhaver><pres><p1><sg>
hast/haben<vbhaver><pres><p2><sg>
hat/haben<vbhaver><pres><p3><sg>
haben/haben<vbhaver><pres><p1><pl>
haben/haben<vbhaver><pres><p3><pl>
haben/haben<vbhaver><inf>
...

and mark those six modal verbs with the vbmod tag. But doing this would leave the vbaux tag unused. Is this correct?

Does "werden" classify as vbaux?
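The proposal above could be written down as a lookup like the one below. This is only a sketch of the scheme as proposed in the question, not a confirmed Apertium convention. The function names are hypothetical, and falling back to vblex (the tag for ordinary lexical verbs) for everything else, including "werden", is a placeholder assumption, not an answer to the open question.

```python
# The six modal verbs listed in the question.
MODALS = {"dürfen", "können", "müssen", "sollen", "wollen", "mögen"}


def verb_tag(lemma):
    """Pick the verb tag for a lemma under the scheme proposed above."""
    if lemma == "sein":
        return "vbser"
    if lemma == "haben":
        return "vbhaver"
    if lemma in MODALS:
        return "vbmod"
    # Placeholder: every other verb, including "werden", falls through here.
    return "vblex"


def present_form(surface, lemma, person, number):
    """Format one present-tense entry as surface/lemma<tags>."""
    return f"{surface}/{lemma}<{verb_tag(lemma)}><pres><{person}><{number}>"


print(present_form("bin", "sein", "p1", "sg"))
print(present_form("hat", "haben", "p3", "sg"))
```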

Question: Regarding personal pronouns, I looked at the way it's done in Icelandic and it seems that the correct treatment would be something like this:

ich/ich<prn><p1><mf><sg><nom>  
mich/ich<prn><p1><mf><sg><acc> 
mir/ich<prn><p1><mf><sg><dat>  

du/du<prn><p2><mf><sg><nom>    
dich/du<prn><p2><mf><sg><acc>  
dir/du<prn><p2><mf><sg><dat>   

er/er<prn><p3><m><sg><nom>     
ihn/er<prn><p3><m><sg><acc>    
ihm/er<prn><p3><m><sg><dat>    
...

That is, as opposed to using prpers as the lemma, as in:

I/prpers<prn><subj><p1><mf><sg>
me/prpers<prn><obj><p1><mf><sg>

Is this correct?


German adjectives are confusing; any tips on how to treat them would be appreciated :)


Question: should ordinal numbers be treated as determiners or adjectives? Some examples in other languages:

fifth<det><ord><sp>  # English
quinto<det><ord><m><sg>  # Spanish
fimmti<adj><ord><m><sg><nom>  # Icelandic
vijfde<det><ord><sp>  # Dutch