Talk:German to English

From Apertium
Revision as of 14:40, 19 October 2011 by Elaichi (talk | contribs)

What's the best approach to start adding entries to the German monodix?

A good way would be to start by writing a script that downloads Wiktionary entries for German nouns and converts them into speling format, e.g.
http://en.wiktionary.org/wiki/Bett#Declension
http://en.wiktionary.org/wiki/Haus#Declension
Bett; Bett; sg.nom; n.nt
Bett; Bettes; sg.gen; n.nt
Bett; Betts; sg.gen; n.nt
Bett; Bett; sg.dat; n.nt
Bett; Bett; sg.acc; n.nt
Bett; Betten; pl.nom; n.nt
Bett; Betten; pl.gen; n.nt 
Bett; Betten; pl.dat; n.nt
Bett; Betten; pl.acc; n.nt
Haus; Haus; sg.nom; n.nt
Haus; Hauses; sg.gen; n.nt
Haus; Haus; sg.dat; n.nt
Haus; Haus; sg.acc; n.nt
Haus; Häuser; pl.nom; n.nt
Haus; Häuser; pl.gen; n.nt
Haus; Häusern; pl.dat; n.nt
Haus; Häuser; pl.acc; n.nt
There are around 15,000 entries in the category German nouns, so that should be a good start. - Francis Tyers 07:13, 18 October 2011 (UTC)
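As a rough sketch of the conversion step only (not the download), assuming a scraper has already pulled each declension table into a dictionary keyed by (number, case): the names `to_speling` and `BETT`, and the shape of that dictionary, are made up for illustration and are not part of any Apertium tool. It just prints lines in the same layout as the Bett example above.

```python
# Hypothetical intermediate data: one declension table per noun, as a
# Wiktionary scraper might return it. Keys are (number, case); each value
# lists every surface form found for that cell (some cells have variants).
BETT = {
    ("sg", "nom"): ["Bett"],
    ("sg", "gen"): ["Bettes", "Betts"],
    ("sg", "dat"): ["Bett"],
    ("sg", "acc"): ["Bett"],
    ("pl", "nom"): ["Betten"],
    ("pl", "gen"): ["Betten"],
    ("pl", "dat"): ["Betten"],
    ("pl", "acc"): ["Betten"],
}

# Fixed output order, matching the order used in the example lines.
NUMBERS = ["sg", "pl"]
CASES = ["nom", "gen", "dat", "acc"]


def to_speling(lemma, gender, forms):
    """Return lines of the form 'lemma; surface; number.case; n.gender'."""
    lines = []
    for number in NUMBERS:
        for case in CASES:
            # A missing cell simply produces no line.
            for surface in forms.get((number, case), []):
                lines.append(f"{lemma}; {surface}; {number}.{case}; n.{gender}")
    return lines


if __name__ == "__main__":
    for line in to_speling("Bett", "nt", BETT):
        print(line)
```

Keeping the variants as a list per cell means alternatives such as Bettes/Betts each get their own line, and a noun without a plural just leaves those keys out.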
Another thing you can do is make lists of closed-category words that don't inflect (e.g. prepositions, conjunctions), and also of abbreviations. - Francis Tyers 07:15, 18 October 2011 (UTC)

Francis, what is the expected order of the symbols in the morphological analysis? Say we are analysing "Apfel": is it <POS><gender><case><number> or <POS><gender><number><case>? I guess it should also output all the possible cases, e.g.:

Apfel<n><m><nom><sg>
Apfel<n><m><acc><sg>
Apfel<n><m><dat><sg>