Latest revision as of 12:25, 5 February 2014
This page describes the idea of an attribute dictionary for Apertium.
So far, what we do is:
- Get a load of tags out of our morphological analyser (and, in some cases, out of the constraint grammar, CG).
- Define attributes which match these tags.
However, in some cases we might like attributes to be available to transfer that we don't want to put in our morphological analyser or CG.
Examples:
- Countability for nouns.
- Animacy for nouns.
- Is human or not for nouns.
- Does the noun have some "extra" case forms?
- Default prepositions for nouns given certain cases (loc → "on" vs "in", etc)
- Valency stuff for verbs:
  - Prepositions
  - Does it take -ing or inf?
  - Cases for arguments
This information is pretty lexicalised, and we might want it for a good number of words, so many in fact that having lists would be impractical.
So the idea is that we have a separate file (or it can go in the transfer file), which allows us to define arbitrary attributes and have them filled according to lexicalised patterns. There will be no magic. It'll just be patterns and attributes.
A possible rule:
<e>
  <sl><cat-item lemma="drive" tags="v.tv.*"/></sl>
  <tl><cat-item lemma="conducir" tags="v.tv.*"/></tl>
  <let><clip pos="1" side="tl" part="a_arg1_case"/><lit-tag v="gen"/></let>
</e>
This would fill the clip part a_arg1_case of this target word, which could then be used as normal in the transfer rules:
<choose><when>
  <test>
    <not><equal><clip pos="1" side="tl" part="a_arg1_case"/><lit-tag v="gen"/></equal></not>
  </test>
  <let><var n="chunk_governs_arg1"/><lit-tag v="acc"/></let>
</when></choose>
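To make the intended behaviour concrete, here is a minimal Python sketch of the idea: an entry matches a source/target word pair by lemma and tag pattern, fills an attribute on the target word, and a later step reads that attribute. This is only an illustration of the proposal, not Apertium code. The names `Word`, `Entry`, `tags_match`, `apply_attribute_dictionary` and `governed_case` are all made up for the example, and it assumes that a trailing `*` in a tag pattern stands for one or more further tags.

```python
from dataclasses import dataclass, field
from fnmatch import fnmatchcase


@dataclass
class Word:
    """A lexical unit: lemma, dot-separated tags, and filled attributes."""
    lemma: str
    tags: str                      # e.g. "v.tv.pres"
    attrs: dict = field(default_factory=dict)


@dataclass
class Entry:
    """One hypothetical attribute-dictionary entry (the <e> element above)."""
    sl_lemma: str
    sl_tags: str                   # tag pattern, e.g. "v.tv.*"
    tl_lemma: str
    tl_tags: str
    fills: dict                    # attribute name -> value to set on the target word


def tags_match(pattern: str, tags: str) -> bool:
    # Assumption: "*" matches any run of further tags; done here with
    # shell-style matching, so "v.tv.*" does not match a bare "v.tv".
    return fnmatchcase(tags, pattern)


def apply_attribute_dictionary(entries, sl: Word, tl: Word) -> Word:
    """Fill attributes on the target word for every entry that matches the pair."""
    for e in entries:
        if (sl.lemma == e.sl_lemma and tags_match(e.sl_tags, sl.tags)
                and tl.lemma == e.tl_lemma and tags_match(e.tl_tags, tl.tags)):
            tl.attrs.update(e.fills)
    return tl


def governed_case(tl: Word):
    """Mirror of the <choose> above: anything other than "gen" yields "acc".

    Returns None when the test fails, i.e. the variable is left untouched.
    """
    if tl.attrs.get("a_arg1_case") != "gen":
        return "acc"
    return None


# Usage: the drive/conducir entry from the example rule.
entries = [Entry("drive", "v.tv.*", "conducir", "v.tv.*", {"a_arg1_case": "gen"})]
word = apply_attribute_dictionary(
    entries, Word("drive", "v.tv.pres"), Word("conducir", "v.tv.pri.p3.sg"))
print(word.attrs, governed_case(word))
```

The sketch keeps to the "no magic" principle: entries are plain patterns plus attribute assignments, and a word that matches no entry simply has no value for the attribute.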
One benefit will be that it won't involve changing the output of the morphological analyser, which could make transfer rules or the bilingual dictionary break.
A drawback will be that it is one more file to maintain (unless the entries go in the transfer file), and it could easily become unmanageable.