Attribute dictionary
Revision as of 10:28, 5 February 2014
This page describes the idea of an attribute dictionary for Apertium.
So far, what we do is:
- Get a load of tags out of our morphological analyser (and CG in some cases)
- Define attributes which match these tags.
However, in some cases we might like attributes to be available to transfer that we don't want to put in our morphological analyser or CG.
Examples:
- Countability for nouns.
- Animacy for nouns.
- Is human or not for nouns.
- Does the noun have some "extra" case forms?
- Default prepositions for nouns given certain cases (loc → "on" vs "in", etc)
- Valency information for verbs:
  - Prepositions
  - Does it take -ing or inf?
  - Cases for arguments
This information is highly lexicalised, and we might want it for so many words that maintaining lists would be impractical.
So the idea is that we have a separate file (or it can go in the transfer file), which allows us to define arbitrary attributes and have them filled according to lexicalised patterns. There will be no magic. It'll just be patterns and attributes.
A possible rule:
<entry>
  <sl><cat-item lemma="drive" tags="v.tv.*"/></sl>
  <tl><cat-item lemma="conducir" tags="v.tv.*"/></tl>
  <let><clip pos="1" side="tl" part="a_arg1_case"/><lit-tag v="gen"/></let>
</entry>
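To make "just patterns and attributes" concrete, here is a rough Python sketch of how such an entry might be read and applied. Nothing here is existing Apertium code: the function names (`tags_match`, `load_entry`, `attributes_for`) are hypothetical, and the matching semantics are assumptions of the sketch. In particular, it assumes a trailing `*` in a tag pattern matches zero or more remaining tags, and that both the `sl` and `tl` patterns must match before the `let` assignments apply.

```python
import xml.etree.ElementTree as ET

# The proposed rule from this page, used as sample input.
ENTRY_XML = """
<entry>
  <sl><cat-item lemma="drive" tags="v.tv.*"/></sl>
  <tl><cat-item lemma="conducir" tags="v.tv.*"/></tl>
  <let><clip pos="1" side="tl" part="a_arg1_case"/><lit-tag v="gen"/></let>
</entry>
"""


def tags_match(pattern, tags):
    """Match a dot-separated tag pattern against a list of tags.

    Assumption: only a trailing '*' is supported, and it matches
    zero or more remaining tags.
    """
    parts = pattern.split(".")
    if parts[-1] == "*":
        prefix = parts[:-1]
        return tags[:len(prefix)] == prefix
    return tags == parts


def load_entry(xml_text):
    """Parse one <entry> into its sl/tl patterns and attribute assignments."""
    entry = ET.fromstring(xml_text)

    def item(side):
        cat_item = entry.find(f"{side}/cat-item")
        return cat_item.get("lemma"), cat_item.get("tags")

    lets = []
    for let in entry.findall("let"):
        clip = let.find("clip")
        lit = let.find("lit-tag")
        lets.append((clip.get("side"), clip.get("part"), lit.get("v")))
    return {"sl": item("sl"), "tl": item("tl"), "lets": lets}


def attributes_for(entry, sl, tl):
    """Return {(side, attribute): value} if both sides match, else {}.

    sl and tl are (lemma, [tags]) pairs for one lexical unit.
    """
    for side, (lemma, tags) in (("sl", sl), ("tl", tl)):
        want_lemma, pattern = entry[side]
        if lemma != want_lemma or not tags_match(pattern, tags):
            return {}
    return {(side, part): value for side, part, value in entry["lets"]}


entry = load_entry(ENTRY_XML)
print(attributes_for(entry,
                     ("drive", ["v", "tv", "pres"]),
                     ("conducir", ["v", "tv", "pres"])))
# prints {('tl', 'a_arg1_case'): 'gen'}
```

A unit that does not match (a different lemma, or tags that do not start with `v.tv`) simply gets no extra attributes, which fits the "no magic" intent: the analyser output is left untouched and transfer only sees additional attribute values.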
One benefit will be that it won't involve changing the output of the morphological analyser, which could make transfer rules or the bilingual dictionary break.
A drawback will be that it is an extra file (unless it goes in the transfer file), and it could easily become unmanageable.