Difference between revisions of "Metadix"

From Apertium

Revision as of 09:45, 4 June 2007

Metadix (used in files with the extension .metadix) is a still poorly documented, minor extension of the .dix format of Apertium dictionaries that allows a certain level of parameterization of paradigms in monolingual dictionaries (monodixes). Metadixes are converted to the standard .dix format during compilation using XSLT stylesheets.

Metadixes are currently used in some language pairs, such as English-Catalan and Occitan-Catalan.

From the documentation

When developing the dictionaries for the Occitan translator, we were faced with a new need: we wanted to be able to specify paradigms for verbs that had the same inflection pattern but whose root changed across the inflected forms. With the existing paradigm system, a new paradigm had to be created for each of these verbs, since an inflection pattern could only be specified for a group of verbs with an invariable root. With metaparadigms, it is possible to specify the inflection regularity as well as the verb root variations.

At the same time, metaparadigms allow variations in the grammatical symbols of a lemma to be specified in a single paradigm. That is, several lemmas can refer to the same metaparadigm even if they have different grammatical symbols. For Occitan, metaparadigms have made it possible to share one paradigm among entries with root variations; for English, they have made it possible to share one paradigm among entries with variations in their grammatical symbols.

Related to this, we created the concept of a metadictionary: a dictionary which contains metaparadigms as well as the normal paradigms used so far. A metadictionary is named apertium-PAIR.L1.metadix (for example, for the English monolingual dictionary in the Apertium-en-ca system, apertium-en-ca.en.metadix). When linguistic data are compiled, these dictionaries are pre-processed so that they have the appropriate format for the dictionary compiler.

Metaparadigms are defined in the <pardefs> section of the monolingual dictionary, the same section where the other dictionary paradigms are defined. A metaparadigm, just like a paradigm, has a name specified in the attribute n. This name follows the same conventions as other paradigm names, except that the variable part of the lemma root is written in brackets and in capital letters, as you can see in this example:

<pardef n="m/é[T]er__vblex">

This is the definition of a verb paradigm whose root contains a variable part. The inflection paradigms referenced inside this metaparadigm must inflect only the part to the right of the brackets, as does, for example, the paradigm:

<par n="mét/er__vblex"/>

Putting this together, a complete example of a metaparadigm definition would be:

<pardef n="m/é[T]er__vblex">
  <e>
    <p>
      <l>e</l>
      <r>é</r>
    </p>
    <i><prm/></i>
    <par n="sent/eria__vblex"/>
  </e>
  <e>
    <i>é<prm/></i>
    <par n="mét/er__vblex"/>
  </e>
</pardef>


The tag <prm/> is the marker that is used to place the variable text part (the root variation) in the paradigm definition.

Once a metaparadigm is defined, we may want a verb to use it. To do so, in the verb entry (inside an <e> element) we must indicate the suitable metaparadigm and, through the attribute prm, define which letters replace the variable part specified in brackets. For example:

<e lm="acuélher">
  <i>acu</i>
  <par n="m/é[T]er__vblex" prm="lh"/>
</e>

This entry defines the Occitan verb acuélher ("to receive") and specifies that its inflection paradigm is the one defined by the metaparadigm m/é[T]er__vblex, but replacing T with lh; that is, the letters following acu will be élher instead of éter.
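
To make the substitution concrete, the following is a rough sketch of what the pre-processed output for this entry might look like, assuming the pre-processing simply replaces each <prm/> with the value of prm and drops the brackets from the name. The expanded paradigm name m/élher__vblex is an assumption made for illustration; the names actually generated by the XSLT stylesheets may differ.

```xml
<!-- Hypothetical expansion of m/é[T]er__vblex with prm="lh".
     The name "m/élher__vblex" is assumed, not taken from the stylesheets. -->
<pardef n="m/élher__vblex">
  <e>
    <p>
      <l>e</l>
      <r>é</r>
    </p>
    <!-- <prm/> replaced by "lh" -->
    <i>lh</i>
    <par n="sent/eria__vblex"/>
  </e>
  <e>
    <!-- é<prm/> becomes "élh" -->
    <i>élh</i>
    <par n="mét/er__vblex"/>
  </e>
</pardef>

<!-- The entry then refers to the expanded paradigm without the prm attribute -->
<e lm="acuélher">
  <i>acu</i>
  <par n="m/élher__vblex"/>
</e>
```

Under this reading, one expanded paradigm would be produced for each distinct prm value used with a given metaparadigm.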

As mentioned before, metaparadigms can also be used for entries which have some variation in their grammatical symbols. The way to specify them is basically the same: the variable part must be specified in the entry with the attribute sa, whereas in the paradigm the tag <sa> has to be placed where the optional grammatical symbol should appear.
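
As an illustration of the sa mechanism, here is a minimal sketch of a noun metaparadigm and an entry that uses it. The paradigm name house__n, the lemma and the symbol acr are hypothetical and chosen only for the example; the sketch also assumes that <sa/> is written as an empty element, by analogy with <prm/>, and that the value of sa is the name of the symbol to insert.

```xml
<!-- Hypothetical noun metaparadigm: <sa/> marks where the optional
     grammatical symbol would be placed (names are illustrative). -->
<pardef n="house__n">
  <e>
    <p>
      <l/>
      <r><s n="n"/><sa/><s n="sg"/></r>
    </p>
  </e>
  <e>
    <p>
      <l>s</l>
      <r><s n="n"/><sa/><s n="pl"/></r>
    </p>
  </e>
</pardef>

<!-- Hypothetical entry: sa="acr" asks for the extra symbol "acr"
     at the position marked by <sa/> in the paradigm above. -->
<e lm="NGO">
  <i>NGO</i>
  <par n="house__n" sa="acr"/>
</e>
```

If these assumptions hold, the entry would be analysed with the symbols n, acr and sg (or pl), while an entry that refers to house__n without the sa attribute would get only n and sg (or pl).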