Difference between revisions of "Metadix"

From Apertium
 
(11 intermediate revisions by 4 users not shown)
[[Fichiers metadix et métaparadigmes|En français]]

''Metadix'' (as seen in files with the extension <code>.metadix</code>) is a still poorly documented, minor extension of the <code>.dix</code> format of Apertium dictionaries that allows a certain level of parameterisation of paradigms in [[monodix | monodixes]]. Metadixes are converted to the standard <code>.dix</code> format during compilation using XSLT stylesheets.


Metadixes are currently used in some language pairs, such as English-Catalan and Occitan-Catalan.


==From the documentation==

When developing the dictionaries for the Occitan translator, we were
faced with a new need: we wanted to be able to specify paradigms for
verbs that share the same inflection pattern but whose root changes
across the inflected forms. With the existing paradigm system, a new
paradigm had to be created for each of these verbs, since an
inflection pattern could only be specified for a group of verbs with
an invariable root. With metaparadigms, it is possible to specify the
inflection pattern as well as the verb root variations.

At the same time, metaparadigms allow variations in the grammatical
symbols of a lemma to be specified in a single paradigm. That is,
several lemmas can refer to the same metaparadigm even if they have
different grammatical symbols. For Occitan, metaparadigms have made it
possible to share one paradigm among entries with root variations; for
English, they have made it possible to share one paradigm among
entries whose grammatical symbols vary.

In connection with this, we created the concept of a metadictionary: a
dictionary which contains metaparadigms as well as the normal
paradigms used so far. The name of a metadictionary is
<code>apertium-PAIR.L<sub>1</sub>.metadix</code>
(for example, for the English monolingual dictionary in the
Apertium-en-ca system, <code>apertium-en-ca.en.metadix</code>). When
the linguistic data are compiled, these dictionaries are pre-processed
so that they have the format expected by the dictionary compiler.

Metaparadigms are defined in the <code><pardefs></code> section
of the monolingual dictionary, the same section where the rest of
the dictionary paradigms are defined. A metaparadigm, just like a
paradigm, has a name specified in the attribute <code>n</code>. This name
follows the same conventions as the names of other paradigms, except
that the variable part of the lemma root is written in square brackets
and in capital letters, as in this example:

<code>
<pardef n="m/é[T]er__vblex">
</code>

This is the definition of a verb paradigm whose root contains a
variable part, marked here by <code>[T]</code>. The inflection paradigms
referenced inside this metaparadigm must inflect only the part to the
right of the brackets, as does, for example, the paradigm:

<code>
<par n="mét/er__vblex"/>
</code>

Putting this together, a complete example of a metaparadigm definition would be:

<pre>
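<pardef n="m/é[T]er__vblex">
  <!-- Surface forms with "e" where the lemma has "é", followed by the variable part -->
  <e>
    <p>
      <l>e</l>
      <r>é</r>
    </p>
    <i><prm/></i>
    <par n="sent/eria__vblex"/>
  </e>
  <!-- Forms that keep "é", followed by the variable part -->
  <e>
    <i>é<prm/></i>
    <par n="mét/er__vblex"/>
  </e>
</pardef>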
</pre>


The <code><prm/></code> tag is the marker that shows where the
variable text (the root variation) is placed in the paradigm
definition.
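
The substitution that <code><prm/></code> stands for can be illustrated with a minimal Python sketch. This is not the XSLT that Apertium actually uses; the function name <code>substitute_prm</code> is hypothetical, and the sketch only shows the idea of replacing every <code><prm/></code> marker with the letters supplied by an entry (here <code>lh</code>). The paradigm name is left untouched, since the source does not say how expanded paradigms are named.

```python
import copy
import xml.etree.ElementTree as ET

# The metaparadigm from the example above, written as well-formed XML.
METAPARADIGM = """
<pardef n="m/é[T]er__vblex">
  <e>
    <p>
      <l>e</l>
      <r>é</r>
    </p>
    <i><prm/></i>
    <par n="sent/eria__vblex"/>
  </e>
  <e>
    <i>é<prm/></i>
    <par n="mét/er__vblex"/>
  </e>
</pardef>
"""


def substitute_prm(pardef, value):
    """Return a copy of `pardef` in which every <prm/> is replaced by `value`.

    Hypothetical helper for illustration only; the real pre-processing
    is done by XSLT stylesheets.
    """
    result = copy.deepcopy(pardef)
    for parent in list(result.iter()):
        previous = None  # last sibling kept before the current child
        for child in list(parent):
            if child.tag != "prm":
                previous = child
                continue
            # The replacement text goes where the marker was: either at the
            # end of the parent's leading text or of the previous sibling's tail.
            inserted = value + (child.tail or "")
            if previous is None:
                parent.text = (parent.text or "") + inserted
            else:
                previous.tail = (previous.tail or "") + inserted
            parent.remove(child)
    return result


expanded = substitute_prm(ET.fromstring(METAPARADIGM), "lh")
identity_texts = [i.text for i in expanded.iter("i")]
print(identity_texts)  # ['lh', 'élh']
```

After the substitution, the two <code><i></code> elements contain <code>lh</code> and <code>élh</code>, which is the text that precedes the endings supplied by the referenced paradigms.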

Once a metaparadigm is defined, a verb entry can use it. To do so, the
verb entry (an <code><e></code> element) must reference the
appropriate metaparadigm and, through the attribute
<code>prm</code>, give the letters that replace the variable part
written in brackets. For example:

<pre>
<e lm="acuélher">
<i>acu</i>
<par n="m/é[T]er__vblex" prm="lh"/>
</e>
</pre>

This entry defines the Occitan verb ''acuélher'' ("to receive") and
specifies that its inflection paradigm is the one defined by the
metaparadigm <code>m/é[T]er__vblex</code>, but replacing <code>T</code> with
<code>lh</code>; that is, the letters following <code>acu</code> will be
''élher'' instead of ''éter''.
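
As a worked illustration of this entry, the following Python sketch builds the stems that result from combining the entry's <code>acu</code> with the two branches of the metaparadigm. The function name <code>expand_stems</code> is hypothetical, and the endings themselves are not shown because they come from the paradigms <code>sent/eria__vblex</code> and <code>mét/er__vblex</code>, whose contents are not given here.

```python
def expand_stems(entry_stem, prm):
    """List (surface stem, lemma stem, paradigm supplying the endings).

    Mirrors the two <e> branches of m/é[T]er__vblex: the first pairs a
    surface "e" with a lemma "é", the second keeps "é" on both sides.
    Hypothetical helper for illustration only.
    """
    return [
        (entry_stem + "e" + prm, entry_stem + "é" + prm, "sent/eria__vblex"),
        (entry_stem + "é" + prm, entry_stem + "é" + prm, "mét/er__vblex"),
    ]


stems = expand_stems("acu", "lh")
for surface, lemma, paradigm in stems:
    print(surface, lemma, paradigm)
```

For <code>prm="lh"</code> this gives the surface stems <code>acuelh</code> and <code>acuélh</code>, both with the lemma stem <code>acuélh</code>.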

As mentioned before, metaparadigms can also be used for entries which
have some variation in their grammatical symbols. The mechanism is
essentially the same: the variable part is specified in the entry with
the attribute <code>sa</code>, whereas in the paradigm the tag
<code><sa></code> is placed where the optional grammatical symbol
should appear.
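
The following Python sketch illustrates one plausible reading of this mechanism; it is an assumption, not a description of the actual stylesheets. It assumes that the marker is written as an empty <code><sa/></code> element, that it is replaced by an <code><s n="..."/></code> symbol when the entry supplies <code>sa</code>, and that it is dropped otherwise. The paradigm <code>house__n</code>, the symbol name <code>unc</code> and the function names are all hypothetical.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical paradigm with an optional symbol slot after the "n" symbol.
PARADIGM = """
<pardef n="house__n">
  <e><p><l/><r><s n="n"/><sa/><s n="sg"/></r></p></e>
  <e><p><l>s</l><r><s n="n"/><sa/><s n="pl"/></r></p></e>
</pardef>
"""


def substitute_sa(pardef, symbol=None):
    """Return a copy of `pardef` with each <sa/> filled in or removed.

    Assumed behaviour: <sa/> becomes <s n="symbol"/> when a symbol is
    given, and disappears when it is not.
    """
    result = copy.deepcopy(pardef)
    for parent in list(result.iter()):
        # Walk children from last to first so that deletions do not
        # shift the positions still to be visited.
        for index in reversed(range(len(parent))):
            if parent[index].tag != "sa":
                continue
            if symbol is None:
                del parent[index]
            else:
                parent[index] = ET.Element("s", {"n": symbol})
    return result


def symbols(entry):
    """Names of the grammatical symbols of one <e> element, in order."""
    return [s.get("n") for s in entry.iter("s")]


root = ET.fromstring(PARADIGM)
with_symbol = [symbols(e) for e in substitute_sa(root, "unc").findall("e")]
without_symbol = [symbols(e) for e in substitute_sa(root).findall("e")]
print(with_symbol)
print(without_symbol)
```

Under these assumptions, an entry that passes <code>sa="unc"</code> gets the symbol sequences <code>n unc sg</code> and <code>n unc pl</code>, while an entry without <code>sa</code> gets <code>n sg</code> and <code>n pl</code> from the same paradigm.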

==See also==
* [[Unification of metadix and parametrized dictionaries]]
* [[Talk:Metadix]] proposal for multiple prm's
* [[Prefixes and infixes]]


[[Category:Terminology]]
[[Category:Writing dictionaries]]
[[Category:Documentation in English]]

Latest revision as of 08:26, 25 April 2016
