Attribute dictionary

From Apertium

Latest revision as of 12:25, 5 February 2014

This page describes the idea of an attribute dictionary for Apertium.

So far, what we do is:

  • Get a load of tags out of our morphological analyser (and CG in some cases)
  • Define attributes which match these tags.
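
As a reminder of what the second step looks like today, here is a minimal sketch of an attribute definition in a transfer file. The attribute name `nbr` and the tags `sg` and `pl` are only illustrative; a real language pair defines whatever its analyser actually emits.

```xml
<!-- Illustrative only: an attribute "nbr" that matches the tags sg or pl. -->
<section-def-attrs>
  <def-attr n="nbr">
    <attr-item tags="sg"/>
    <attr-item tags="pl"/>
  </def-attr>
</section-def-attrs>

<!-- Inside a rule, the matched tag can then be read with a clip: -->
<clip pos="1" side="sl" part="nbr"/>
```

The point to note is that such an attribute can only ever match a tag that is already present on the word.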

However, in some cases we might want attributes to be available in transfer that we don't want to put in our morphological analyser or CG.

Examples:

  • Countability for nouns.
  • Animacy for nouns.
  • Humanness for nouns (human or not).
  • Whether a noun has some "extra" case forms.
  • Default prepositions for nouns given certain cases (loc → "on" vs "in", etc.)
  • Valency information for verbs:
    • Which prepositions it takes
    • Whether it takes -ing or an infinitive
    • Cases for arguments

This information is pretty lexicalised, and we might want it for a good number of words; so many, in fact, that keeping it in lists would be impractical.

So the idea is that we have a separate file (or it can go in the transfer file), which allows us to define arbitrary attributes and have them filled according to lexicalised patterns. There will be no magic. It'll just be patterns and attributes.

A possible rule:

 <e> 
   <sl><cat-item lemma="drive" tags="v.tv.*"/></sl>
   <tl><cat-item lemma="conducir" tags="v.tv.*"/></tl>
   <let><clip pos="1" side="tl" part="a_arg1_case"/><lit-tag v="gen"/></let>
 </e>

This would fill the clip part a_arg1_case of this target word, which could then be used as normal in the transfer rules:

 <choose><when>
   <test><not><equal>
     <clip pos="1" side="tl" part="a_arg1_case"/><lit-tag v="gen"/>
   </equal></not></test>
   <let><var n="chunk_governs_arg1"/><lit-tag v="acc"/></let>
 </when></choose>
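
The same entry format could presumably cover the noun attributes listed earlier. The following is only a sketch under that assumption: the attribute name `a_animacy`, the value `an` and the lemma pair are all hypothetical, chosen for illustration, and nothing here is an implemented format.

```xml
<!-- Hypothetical entry: mark the target noun as animate. -->
<!-- "a_animacy" and "an" are made-up names, not existing Apertium tags. -->
<e>
  <sl><cat-item lemma="dog" tags="n.*"/></sl>
  <tl><cat-item lemma="perro" tags="n.*"/></tl>
  <let><clip pos="1" side="tl" part="a_animacy"/><lit-tag v="an"/></let>
</e>
```

As with `a_arg1_case`, a transfer rule would then test `a_animacy` with an ordinary `<clip>`, without the analyser ever having to emit the tag.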

One benefit is that the output of the morphological analyser stays unchanged; changing it could break transfer rules or the bilingual dictionary.

A drawback is that it means an extra file (unless the entries go in the transfer file), and it could easily become unmanageable.