Placeholder attributes

From Apertium

Revision as of 12:19, 30 January 2008

Placeholder attributes (also called synthetic attributes, atributos sintéticos in Spanish, etc.) are attributes that exist only in the bilingual dictionaries and are used to handle changes in lexical properties that arise in some translations. For example, when translating the Catalan word estudiant 'student', which has the same form for masculine and feminine, into French, we have to decide whether to generate étudiant (the masculine form) or étudiante (the feminine form). For this reason, the gender of such a word must be tagged in the bilingual dictionary in a way that marks it as a decision to be made in the transfer rules, since the word's own lexical information does not settle it.

In Apertium

In the bilingual dictionaries of many Apertium language pairs (es-ca, es-pt, fr-ca, etc.) there are two attributes, <GD> (género por determinar, 'gender to be determined') and <ND> (número por determinar, 'number to be determined'), which delegate the choice of gender or number to the transfer module. In the output of the transfer module, these tags must be replaced as follows:

  • <GD> can be changed into <m> (masculine) or <f> (feminine), but not into <mf> (masculine and feminine).
  • <ND> can be changed into <sg> (singular) or <pl> (plural), but not into <sp> (singular and plural).
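The replacement rules above can be sketched as a small validity check. This is a minimal Python sketch, not Apertium code (in Apertium the transfer rules themselves do this work); it assumes lexical units are written in Apertium's stream format (`^lemma<tag>...$`), and the names `resolve_placeholders` and `ALLOWED` are hypothetical.

```python
import re

# Hypothetical table: placeholder tag -> concrete tags it may become.
# <mf> and <sp> are deliberately absent, following the rules above.
ALLOWED = {
    "GD": {"m", "f"},
    "ND": {"sg", "pl"},
}

# Matches one stream-format tag such as <n> or <GD>.
TAG_RE = re.compile(r"<([^<>]+)>")


def resolve_placeholders(lexical_unit: str, choices: dict[str, str]) -> str:
    """Replace <GD>/<ND> in a lexical unit such as '^étudiant<n><GD><sg>$'.

    `choices` maps a placeholder name ("GD" or "ND") to the chosen tag.
    Raises ValueError if a placeholder is left unresolved or is resolved
    to a tag that is not allowed (for example <mf> or <sp>).
    """
    def replace(match: re.Match) -> str:
        tag = match.group(1)
        if tag not in ALLOWED:
            return match.group(0)  # ordinary tag: keep it unchanged
        chosen = choices.get(tag)
        if chosen not in ALLOWED[tag]:
            raise ValueError(f"<{tag}> cannot be resolved to {chosen!r}")
        return f"<{chosen}>"

    return TAG_RE.sub(replace, lexical_unit)


print(resolve_placeholders("^étudiant<n><GD><sg>$", {"GD": "f"}))
# ^étudiant<n><f><sg>$
```

Raising an error rather than silently passing <mf> or <sp> through mirrors the rule that a placeholder must always end up as one concrete gender or number.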

To make these decisions, the following strategies are used:

  • Look at the morphological information of the surrounding words for any that carry useful linguistic information. For example, a determiner may mark gender in the source language even when the word in question does not.
  • Take the information from a context variable.
  • Fall back on a default form, chosen in the hope that it is the most frequent one.
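One plausible way to combine the three strategies is to try them in the order listed, from the most to the least informed. The Python sketch below shows that ordering for <GD> only; it is not Apertium code (real pairs express this in their transfer rules), and the function `choose_gender`, its arguments and the masculine default are hypothetical. It assumes the source-language tags of the surrounding words have already been collected as sets, closest word first.

```python
from typing import Optional

# Concrete genders that <GD> may resolve to (<mf> is not allowed).
GENDERS = {"m", "f"}


def choose_gender(
    neighbour_tags: list[set[str]],
    context_var: Optional[str],
    default: str = "m",
) -> str:
    """Pick a gender for a <GD> word (hypothetical helper, not an Apertium API).

    neighbour_tags: source-language tag sets of surrounding words, closest first.
    context_var: value a previous rule stored in a context variable, or None.
    default: fallback form, assumed here to be the masculine.
    """
    # 1. Surrounding words: take the first neighbour that marks exactly one gender.
    for tags in neighbour_tags:
        found = tags & GENDERS
        if len(found) == 1:
            return next(iter(found))
    # 2. Context variable set earlier, if it holds a usable value.
    if context_var in GENDERS:
        return context_var
    # 3. Fall back to the default and hope it is the most frequent form.
    return default


# Catalan "una estudiant": the indefinite determiner carries <f>.
print(choose_gender([{"det", "ind", "f", "sg"}], None))  # f
print(choose_gender([], None))  # m
```

In the first call the determiner alone settles the choice, which would lead to French étudiante. <ND> could be handled the same way with the set {"sg", "pl"} in place of GENDERS.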