Difference between revisions of "Talk:German to English"
::: I started a stub at [[Tag_order]] on this, but it's not very complete. --[[User:Unhammer|unhammer]] 07:05, 20 October 2011 (UTC)
Revision as of 12:17, 25 October 2011
What's the best approach to start adding entries to the German monodix?
- A good way would be to start by writing a script to download Wiktionary entries for German nouns and convert them into speling format, e.g.
    Bett; Bett; sg.nom; n.nt
    Bett; Bettes; sg.gen; n.nt
    Bett; Betts; sg.gen; n.nt
    Bett; Bett; sg.dat; n.nt
    Bett; Bett; sg.acc; n.nt
    Bett; Betten; pl.nom; n.nt
    Bett; Betten; pl.gen; n.nt
    Bett; Betten; pl.dat; n.nt
    Bett; Betten; pl.acc; n.nt
    Haus; Haus; sg.nom; n.nt
    Haus; Hauses; sg.gen; n.nt
    Haus; Haus; sg.gen; n.nt
    Haus; Haus; sg.dat; n.nt
    Haus; Haus; sg.acc; n.nt
    Haus; Häuser; pl.nom; n.nt
    Haus; Häuser; pl.gen; n.nt
    Haus; Häusern; pl.dat; n.nt
    Haus; Häuser; pl.acc; n.nt
- There are around 15,000 entries in the category German nouns, so that should be a good start. - Francis Tyers 07:13, 18 October 2011 (UTC)
- Another thing you can do is make lists of closed category words that don't inflect (E.g. prepositions, conjunctions) and also of abbreviations. - Francis Tyers 07:15, 18 October 2011 (UTC)
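The conversion step suggested above could look roughly like this. It is only a sketch: it assumes the inflected forms have already been pulled out of Wiktionary into a dictionary (downloading and template parsing are left out), and the function name `to_speling` and the input layout are made up for illustration. The output follows the `lemma; surface form; number.case; pos.gender` pattern of the example entries.

```python
def to_speling(lemma, forms, pos_tags="n.nt"):
    """Return speling-format lines for one noun.

    `forms` maps a (number, case) pair such as ("sg", "gen") to a list of
    surface forms, because one slot can have variants (Bettes / Betts).
    """
    lines = []
    for (number, case), surfaces in forms.items():
        for surface in surfaces:
            lines.append(f"{lemma}; {surface}; {number}.{case}; {pos_tags}")
    return lines


# Hand-written input for illustration only; a real script would fill this
# from the Wiktionary declension data.
bett = {
    ("sg", "nom"): ["Bett"],
    ("sg", "gen"): ["Bettes", "Betts"],
    ("sg", "dat"): ["Bett"],
    ("sg", "acc"): ["Bett"],
    ("pl", "nom"): ["Betten"],
    ("pl", "gen"): ["Betten"],
    ("pl", "dat"): ["Betten"],
    ("pl", "acc"): ["Betten"],
}

for line in to_speling("Bett", bett):
    print(line)
```

Keeping a list of surface forms per slot means variant genitives come out as separate lines, the same way they appear in the Bett example.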
Francis, what should be the expected order of the symbols in the morphological analysis? Let's say we are analyzing "Apfel", is it <POS><gender><case><number> or <POS><gender><number><case>? I guess it should also output all the possible cases, e.g.:
    Apfel<n><m><nom><sg>
    Apfel<n><m><acc><sg>
    Apfel<n><m><dat><sg>
- <PoS><gender><number><case> - for lack of a better phrase, that's the order of inherency, plus it's easier to work with. Much easier. -- Jimregan 15:39, 19 October 2011 (UTC)
- Also, listing 'viele' as the plural of 'ein' is dubious, and will more than likely cause problems. Treat them as separate words -- Jimregan 15:53, 19 October 2011 (UTC)
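To make the recommended order concrete, here is a small sketch that always emits tags as PoS, gender, number, case, whatever order the features are supplied in. The helper name `analysis` and the keyword names are assumptions for illustration, not part of any Apertium tool.

```python
# Tag order as recommended in the reply above: PoS, gender, number, case.
TAG_ORDER = ("pos", "gender", "number", "case")


def analysis(lemma, **features):
    """Build an analysis string with the tags in the fixed order."""
    tags = "".join(f"<{features[name]}>" for name in TAG_ORDER if name in features)
    return lemma + tags


# The three singular readings of "Apfel" from the question, reordered.
for case in ("nom", "acc", "dat"):
    print(analysis("Apfel", case=case, number="sg", gender="m", pos="n"))
```

With this order the Apfel examples become `Apfel<n><m><sg><nom>`, `Apfel<n><m><sg><acc>` and `Apfel<n><m><sg><dat>`.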
Here's a repository with some initial progress (sorry for the delay, I was out of town last week):
https://github.com/elaichi/apertium-de-en-dev
There are fewer nouns than expected because my script only got the ones with the de-noun template and not the infl|de|noun template.
Question: In "Basic German" by Schenke the only two auxiliary verbs are "sein" and "haben", while 'the six modal verbs in German' are "dürfen", "können", "müssen", "sollen", "wollen", "mögen". I guess that the correct treatment in Apertium is something like this:
    bin/sein<vbser><pres><p1><sg>
    bist/sein<vbser><pres><p2><sg>
    ist/sein<vbser><pres><p3><sg>
    sind/sein<vbser><pres><p1><pl>
    sind/sein<vbser><pres><p3><pl>
    habe/haben<vbhaver><pres><p1><sg>
    hast/haben<vbhaver><pres><p2><sg>
    hat/haben<vbhaver><pres><p3><sg>
    haben/haben<vbhaver><pres><p1><pl>
    haben/haben<vbhaver><pres><p3><pl>
    haben/haben<vbhaver><inf>
    ...
and mark those six modal verbs with the vbmod tag. But doing this would leave the vbaux tag unused. Is this correct?
Question: Regarding personal pronouns, I looked at the way it's done in Icelandic and it seems that the correct treatment would be something like this:
    ich/ich<prn><p1><mf><sg><nom>
    mich/ich<prn><p1><mf><sg><acc>
    mir/ich<prn><p1><mf><sg><dat>
    du/du<prn><p2><mf><sg><nom>
    dich/du<prn><p2><mf><sg><acc>
    dir/du<prn><p2><mf><sg><dat>
    er/er<prn><p3><m><sg><nom>
    ihn/er<prn><p3><m><sg><acc>
    ihm/er<prn><p3><m><sg><dat>
    ...
That is, as opposed to using prpers as the lemma, as in:
    I/prpers<prn><subj><p1><mf><sg>
    me/prpers<prn><obj><p1><mf><sg>
Is this correct?
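For comparing the two treatments side by side, a small sketch that splits entries written in the `surface/lemma<tag>...` notation used in the examples on this page. The function name `parse_analysis` is hypothetical, and the sketch does not settle which treatment is the right one; it only makes the difference in lemma and tags explicit.

```python
import re


def parse_analysis(entry):
    """Split 'surface/lemma<tag1><tag2>...' into (surface, lemma, tags)."""
    surface, _, rest = entry.partition("/")
    lemma = rest.split("<", 1)[0]           # text before the first tag
    tags = re.findall(r"<([^<>]+)>", rest)  # tag names without brackets
    return surface, lemma, tags


# Lemma-based treatment: the case is carried by a tag on the lemma "ich".
print(parse_analysis("mich/ich<prn><p1><mf><sg><acc>"))
# prpers treatment: a shared lemma with subj/obj tags.
print(parse_analysis("me/prpers<prn><obj><p1><mf><sg>"))
```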