Metadix

Metadix (found in files with the extension .metadix) is a minor, still poorly documented extension of the .dix format of Apertium dictionaries. It allows a certain level of parameterisation of paradigms in monolingual dictionaries (monodixes). Metadixes are converted to the standard .dix format during compilation using XSLT stylesheets.

Metadixes are currently used in some language pairs, such as English-Catalan and Occitan-Catalan.

From the documentation

When developing the dictionaries for the Occitan translator, we were faced with a new need: we wanted to be able to specify paradigms for verbs that share the same inflection pattern but whose root changes across the inflected forms. With the existing paradigm system, a new paradigm had to be created for each of these verbs, since an inflection pattern could only be specified for a group of verbs with an invariable root. With metaparadigms, it is possible to specify both the inflection pattern and the variations in the verb root.

At the same time, metaparadigms allow variations in the grammatical symbols of a lemma to be specified in a single paradigm. That is, several lemmas can refer to the same metaparadigm even if they have different grammatical symbols. For Occitan, metaparadigms have made it possible to share one paradigm among entries with root variations; for English, they have made it possible to share one paradigm among entries with variations in their grammatical symbols.

In connection with this, we created the concept of a metadictionary: a dictionary which contains metaparadigms as well as the normal paradigms used so far. The name of a metadictionary is apertium-PAIR.L1.metadix (for example, for the English monolingual dictionary in the Apertium-en-ca system, apertium-en-ca.en.metadix). When linguistic data are compiled, these dictionaries are pre-processed so that they have the appropriate format for the dictionary compiler.

Metaparadigms are defined in the <pardefs> section of the monolingual dictionary, the same section where the rest of the dictionary paradigms are defined. A metaparadigm, just like a paradigm, has a name specified in the attribute n. This name has the same characteristics as in other paradigms, except that the variable part of the lemma root is written in brackets and in capital letters, as in this example:

<pardef n="m/é[T]er__vblex">

This is the definition of a verb metaparadigm in which the root has a variable part, marked here as [T]. The inflection paradigms referenced inside this metaparadigm must inflect only the part to the right of the brackets, as does, for example, the paradigm referenced by:

<par n="mét/er__vblex"/>

Putting this together, a complete example of a metaparadigm definition would be:

<pardef n="m/é[T]er__vblex">
  <e>
    <p>
      <l>e</l>
      <r>é</r>
    </p>
    <i><prm/></i>
    <par n="sent/eria__vblex"/>
  </e>
  <e>
    <i>é<prm/></i>
    <par n="mét/er__vblex"/>
  </e>
</pardef>


The tag <prm/> marks the place in the paradigm definition where the variable part of the text (the root variation) is inserted.

Once a metaparadigm is defined, a verb entry can use it. To do so, the verb entry (an <e> element) must name the appropriate metaparadigm and, through the attribute prm, give the letters that replace the variable part specified in brackets. For example:

<e lm="acuélher">
  <i>acu</i>
  <par n="m/é[T]er__vblex" prm="lh"/>
</e>

This entry defines the Occitan verb acuélher ("to receive") and specifies that its inflection paradigm is the one defined by the metaparadigm m/é[T]er__vblex, but replacing T with lh; that is, the letters following acu will be élher instead of éter.
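
To make the substitution concrete, the following is a hand-written sketch of what the pre-processing step could produce for prm="lh". It only replaces each <prm/> in the metaparadigm above with lh. The name given to the expanded paradigm (m/é[lh]er__vblex) is an assumption made for illustration; the actual XSLT stylesheets may name or arrange the output differently.

```xml
<!-- Sketch only: expansion of m/é[T]er__vblex for prm="lh".
     The expanded paradigm name is hypothetical. -->
<pardef n="m/é[lh]er__vblex">
  <e>
    <p>
      <l>e</l>
      <r>é</r>
    </p>
    <i>lh</i>                       <!-- was <i><prm/></i> -->
    <par n="sent/eria__vblex"/>
  </e>
  <e>
    <i>élh</i>                      <!-- was <i>é<prm/></i> -->
    <par n="mét/er__vblex"/>
  </e>
</pardef>

<!-- The entry then points at the expanded paradigm and no longer needs prm. -->
<e lm="acuélher">
  <i>acu</i>
  <par n="m/é[lh]er__vblex"/>
</e>
```

Read this way, the second <e> of the paradigm contributes acu + élh + the endings of mét/er__vblex, and the first contributes the forms whose root vowel alternates between e and é.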

As mentioned before, metaparadigms can also be used for entries which have some variation in their grammatical symbols. They are specified in essentially the same way: the variable part is given in the entry with the attribute sa, whereas in the paradigm the tag <sa/> is placed where the optional grammatical symbol should appear.
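
As an illustration of the symbol mechanism, here is a minimal sketch. The paradigm name house__n, the lemmas, and the symbol unc are hypothetical and not taken from any shipped dictionary. The sketch also assumes that the marker is written as an empty element, <sa/>, by analogy with <prm/>, and that the value of sa is the name of a single grammatical symbol.

```xml
<!-- Sketch only: <sa/> marks where the optional symbol would be inserted. -->
<pardef n="house__n">
  <e>
    <p>
      <l></l>
      <r><s n="n"/><sa/><s n="sg"/></r>
    </p>
  </e>
  <e>
    <p>
      <l>s</l>
      <r><s n="n"/><sa/><s n="pl"/></r>
    </p>
  </e>
</pardef>

<!-- Entry without an extra symbol: the marker would expand to nothing. -->
<e lm="house">
  <i>house</i>
  <par n="house__n"/>
</e>

<!-- Entry supplying an extra symbol through the sa attribute. -->
<e lm="time">
  <i>time</i>
  <par n="house__n" sa="unc"/>
</e>
```

Under these assumptions, the second entry would be analysed with the symbol sequence <s n="n"/><s n="unc"/><s n="sg"/> in the singular, while the first would keep only <s n="n"/><s n="sg"/>.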
