Difference between revisions of "Metadix"
Revision as of 06:41, 20 October 2014
Metadix (as seen in files with the extension .metadix) is a still poorly documented, minor extension of the .dix format of Apertium dictionaries that allows a certain level of parameterisation of paradigms in monodixes. Metadixes are converted to the standard .dix format during compilation using XSLT stylesheets.
Metadixes are currently used in some language pairs, such as English-Catalan and Occitan-Catalan.
From the documentation
When developing the dictionaries for the Occitan translator, we were faced with a new need: we wanted to be able to specify paradigms for verbs that had the same inflection pattern but whose root changed across the inflected forms. With the existing paradigm system, a new paradigm had to be created for each of these verbs, since an inflection pattern could only be specified for a group of verbs with an invariable root. With metaparadigms, it is possible to specify both the inflection pattern and the variations in the verb root.
At the same time, metaparadigms allow variations in the grammatical symbols of a lemma to be specified in a single paradigm. That is, several lemmas can refer to the same metaparadigm even if they have different grammatical symbols. For Occitan, metaparadigms have made it possible to share one paradigm among entries with root variations; for English, they have made it possible to share one paradigm among entries whose grammatical symbols vary.
Related to this, we created the concept of a metadictionary: a dictionary that contains metaparadigms as well as the normal paradigms used so far. A metadictionary is named apertium-PAIR.L1.metadix (for example, apertium-en-ca.en.metadix for the English monolingual dictionary in the Apertium-en-ca system). When the linguistic data are compiled, these dictionaries are pre-processed so that they have the appropriate format for the dictionary compiler.
Metaparadigms are defined in the <pardefs> section of the monolingual dictionary, the same section where the rest of the dictionary paradigms are defined. A metaparadigm, just like a paradigm, has a name specified in the attribute n. This name has the same characteristics as in other paradigms, except that the variable part of the lemma root appears in square brackets and in capital letters, as in this example:
<pardef n="m/é[T]er__vblex">
This is the definition of a verb paradigm whose inflected forms have a variable part in the root. The inflection paradigms referenced inside this metaparadigm must inflect only the part to the right of the brackets, as does the following paradigm:
<par n="mét/er__vblex"/>
Putting this together, a complete metaparadigm definition would be:
<pardef n="m/é[T]er__vblex">
  <e>
    <p>
      <l>e</l>
      <r>é</r>
    </p>
    <i><prm/></i>
    <par n="sent/eria__vblex"/>
  </e>
  <e>
    <i>é<prm/></i>
    <par n="mét/er__vblex"/>
  </e>
</pardef>
The tag <prm/> marks the place where the variable text (the root variation) is inserted in the paradigm definition.
Once a metaparadigm is defined, a verb entry can use it. To do so, the entry (inside an <e> element) must reference the metaparadigm and, through the attribute prm, define the letters that replace the variable part specified in brackets. For example:
<e lm="acuélher">
  <i>acu</i>
  <par n="m/é[T]er__vblex" prm="lh"/>
</e>
This entry defines the Occitan verb acuélher ("to receive") and specifies that its inflection paradigm is the one defined by the metaparadigm m/é[T]er__vblex, but with T replaced by lh; that is, the letters following acu will be élher instead of éter.
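The substitution described here can be illustrated with a short Python sketch. It is not the XSLT that Apertium actually runs; it only mimics the effect on one metaparadigm. The function name expand_metaparadigm and the naming scheme for the resulting paradigm (replacing the bracketed variable in the name with the prm value) are assumptions made for illustration.

```python
import copy
import re
import xml.etree.ElementTree as ET

# The metaparadigm from the example, as a standalone XML fragment.
METAPARADIGM = """
<pardef n="m/é[T]er__vblex">
  <e>
    <p>
      <l>e</l>
      <r>é</r>
    </p>
    <i><prm/></i>
    <par n="sent/eria__vblex"/>
  </e>
  <e>
    <i>é<prm/></i>
    <par n="mét/er__vblex"/>
  </e>
</pardef>
""".strip()


def expand_metaparadigm(pardef, value):
    """Return a copy of pardef in which every <prm/> is replaced by value."""
    expanded = copy.deepcopy(pardef)
    # Assumed naming scheme: put the value where the bracketed variable was.
    name = expanded.get("n")
    expanded.set("n", re.sub(r"\[[^\]]*\]", lambda _: value, name))
    for parent in list(expanded.iter()):
        previous = None  # last child of parent that was kept
        for child in list(parent):
            if child.tag == "prm":
                # Merge the value (and any text after <prm/>) into the
                # surrounding text, then drop the marker element.
                text = value + (child.tail or "")
                if previous is None:
                    parent.text = (parent.text or "") + text
                else:
                    previous.tail = (previous.tail or "") + text
                parent.remove(child)
            else:
                previous = child
    return expanded


expanded = expand_metaparadigm(ET.fromstring(METAPARADIGM), "lh")
print(expanded.get("n"))                       # m/élher__vblex
print([i.text for i in expanded.iter("i")])    # ['lh', 'élh']
```

With prm="lh", the two invariable parts become lh and élh, so the entry acu plus the second branch yields the élher forms described above.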
As mentioned before, metaparadigms can also be used for entries whose grammatical symbols vary. The mechanism is basically the same: the variable part is specified in the entry with the attribute sa, and in the paradigm the tag <sa> is placed where the optional grammatical symbol should appear.
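The source gives no example for sa, so the following Python sketch is only a guess at the effect, modelled on how <prm/> behaves. Everything in it is hypothetical: the paradigm name example__n, the symbol acr, the function name fill_symbol, and the assumptions that the marker is written <sa/>, that it expands to an <s n="..."/> element, and that it is simply dropped when the entry gives no sa value.

```python
import copy
import xml.etree.ElementTree as ET

# Hypothetical metaparadigm; the name and symbols are illustrative only.
PARDEF = """
<pardef n="example__n">
  <e>
    <p>
      <l/>
      <r><s n="n"/><sa/><s n="sg"/></r>
    </p>
  </e>
</pardef>
""".strip()


def fill_symbol(pardef, symbol=None):
    """Return a copy of pardef with each <sa/> replaced by <s n=symbol/>.

    Assumption: when symbol is None the optional marker is removed.
    """
    expanded = copy.deepcopy(pardef)
    for parent in list(expanded.iter()):
        for child in list(parent):
            if child.tag != "sa":
                continue
            position = list(parent).index(child)
            parent.remove(child)
            if symbol is not None:
                filled = ET.Element("s", {"n": symbol})
                filled.tail = child.tail  # keep any text after the marker
                parent.insert(position, filled)
    return expanded


pardef = ET.fromstring(PARDEF)
with_symbol = fill_symbol(pardef, "acr")
without_symbol = fill_symbol(pardef)
print([s.get("n") for s in with_symbol.iter("s")])     # ['n', 'acr', 'sg']
print([s.get("n") for s in without_symbol.iter("s")])  # ['n', 'sg']
```

Under these assumptions, two entries that differ only in one grammatical symbol can point at the same metaparadigm and pass the difference through sa.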
See also Unification of metadix and parametrized dictionaries