Talk:Metadix

From Apertium
Revision as of 08:18, 25 April 2016 by Unhammer (talk | contribs)

Proposal for several prm's

At the Apertium EGM at FreeRBMT, Sergio proposed to integrate an enhanced version of metadix into the compiler; as I understood it, the new syntax will look similar to this:

  <pardef n="v/[a][ter]__n">
    <e>
      <p>
        <l><prm n="1"/><prm n="3"/></l>
        <r><prm n="1"/><prm n="3"/><s n="n"/><s n="sg"/><s n="nom"/></r>
      </p>
    </e>
    <e>
      <p>
        <l><prm n="2"/><prm n="3"/></l>
        <r><prm n="1"/><prm n="3"/><s n="n"/><s n="pl"/><s n="nom"/></r>
      </p>
    </e>
    <e>
      <p>
        <l><prm n="1"/><prm n="3"/>s</l>
        <r><prm n="1"/><prm n="3"/><s n="n"/><s n="sg"/><s n="gen"/></r>
      </p>
    </e>
    <e>
      <p>
        <l><prm n="2"/><prm n="3"/>n</l>
        <r><prm n="1"/><prm n="3"/><s n="n"/><s n="pl"/><s n="dat"/></r>
      </p>
    </e>
  </pardef>

  <e lm="vater"><i>v</i><par n="v/[a][ter]__n" prm="a" prm="ä" prm="ter"/></e>

That is, <prm> elements are numbered (so they can be placed freely anywhere within the pardef), while the prm attributes on the <par> are matched to those numbers by their order: the first prm attribute fills <prm n="1"/>, the second fills <prm n="2"/>, and so on. With the parameters given above, the paradigm expands to this:

  <pardef n="v/[a][ter]__n">
    <e>
      <p>
        <l>ater</l>
        <r>ater<s n="n"/><s n="sg"/><s n="nom"/></r>
      </p>
    </e>
    <e>
      <p>
        <l>äter</l>
        <r>ater<s n="n"/><s n="pl"/><s n="nom"/></r>
      </p>
    </e>
    <e>
      <p>
        <l>aters</l>
        <r>ater<s n="n"/><s n="sg"/><s n="gen"/></r>
      </p>
    </e>
    <e>
      <p>
        <l>ätern</l>
        <r>ater<s n="n"/><s n="pl"/><s n="dat"/></r>
      </p>
    </e>
  </pardef>
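
To make the substitution rule concrete, here is a minimal Python sketch of the expansion. It is not how lt-comp or metadix does it; the function names render and expand are made up for illustration, and the parameter values are passed as a plain list because the repeated prm="..." attributes on the <par> cannot be read by a standard XML parser.

```python
import xml.etree.ElementTree as ET

# First three entries of the paradigm from the proposal.
PARDEF = """
<pardef n="v/[a][ter]__n">
  <e><p><l><prm n="1"/><prm n="3"/></l><r><prm n="1"/><prm n="3"/><s n="n"/><s n="sg"/><s n="nom"/></r></p></e>
  <e><p><l><prm n="2"/><prm n="3"/></l><r><prm n="1"/><prm n="3"/><s n="n"/><s n="pl"/><s n="nom"/></r></p></e>
  <e><p><l><prm n="1"/><prm n="3"/>s</l><r><prm n="1"/><prm n="3"/><s n="n"/><s n="sg"/><s n="gen"/></r></p></e>
</pardef>
"""


def render(side, params):
    """Flatten an <l> or <r> element to a string.

    <prm n="k"/> is replaced by params[k-1]; <s n="x"/> is shown as <x>.
    """
    out = [side.text or ""]
    for child in side:
        if child.tag == "prm":
            out.append(params[int(child.get("n")) - 1])
        elif child.tag == "s":
            out.append("<" + child.get("n") + ">")
        # Literal text following the child (e.g. the "s" of the genitive).
        out.append(child.tail or "")
    return "".join(out)


def expand(pardef_xml, params):
    """Return (left, right) string pairs for every <e> in the pardef."""
    pardef = ET.fromstring(pardef_xml)
    return [
        (render(e.find("p/l"), params), render(e.find("p/r"), params))
        for e in pardef.findall("e")
    ]


if __name__ == "__main__":
    for left, right in expand(PARDEF, ["a", "ä", "ter"]):
        print(left, "->", right)
```

Because the <prm> elements carry their own numbers, the order in which they appear inside <l> and <r> does not matter; only the order of the values in the list does.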


This would be great to have for e.g. Kurdish, where we currently have entries like

      <!-- parastin; ; parast; parêz -->
    <e lm="parastin"><p><l>parast</l><r>parastin</r></p><par n="kir/__vblex_tv"/></e>
    <e lm="parastin"><p><l>diparêz</l><r>parastin</r></p><par n="dik/e__vblex_tv"/></e>
    <e lm="parastin"><p><l>biparêz</l><r>parastin</r></p><par n="bik/e__vblex_tv"/></e>
    <e lm="parastin"><p><l>neparast</l><r>parastin</r></p><par n="nekir/__vblex_tv"/></e>
    <e lm="parastin"><p><l>naparêz</l><r>parastin</r></p><par n="nak/e__vblex_tv"/></e>
    <e lm="parastin"><p><l>neparêz</l><r>parastin</r></p><par n="nek/e__vblex_tv"/></e>

The above could instead be as simple as

    <e lm="parastin"><par n="[got]in_di[bej]__vblex_tv" prm="parast" prm="parêz"/></e>

where the pardef has

  <pardef n="[got]in_di[bej]__vblex_tv" nprm="2">
    <e><p><l><prm n="1"/></l><r><prm n="1"/>in</r></p><par n="kir/__vblex_tv"/></e>
    <e><p><l>di<prm n="2"/></l><r><prm n="1"/>in</r></p><par n="dik/e__vblex_tv"/></e>
    <e><p><l>bi<prm n="2"/></l><r><prm n="1"/>in</r></p><par n="bik/e__vblex_tv"/></e>
    <e><p><l>ne<prm n="1"/></l><r><prm n="1"/>in</r></p><par n="nekir/__vblex_tv"/></e>
    <e><p><l>na<prm n="2"/></l><r><prm n="1"/>in</r></p><par n="nak/e__vblex_tv"/></e>
    <e><p><l>ne<prm n="2"/></l><r><prm n="1"/>in</r></p><par n="nek/e__vblex_tv"/></e>
  </pardef>

Identical attributes invalid?

XML does not allow the same attribute name to appear twice on one element, so to be valid, wouldn't it have to be prm1="foo", prm2="bar" etc.? (We could simply declare prm attributes up to some high-enough-but-finite number in the DTD.)
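
A quick way to check this is to feed both spellings to a standard XML parser. The sketch below uses Python's xml.etree.ElementTree; the helper name is_well_formed is made up for illustration, and prm1/prm2 are just the hypothetical attribute names suggested above.

```python
import xml.etree.ElementTree as ET


def is_well_formed(snippet):
    """Return True if the snippet parses as XML, False otherwise."""
    try:
        ET.fromstring(snippet)
        return True
    except ET.ParseError:
        return False


# Same attribute name repeated, as in the proposal.
repeated = '<par n="[got]in_di[bej]__vblex_tv" prm="parast" prm="parêz"/>'
# Numbered attribute names (hypothetical spelling).
numbered = '<par n="[got]in_di[bej]__vblex_tv" prm1="parast" prm2="parêz"/>'

if __name__ == "__main__":
    print("repeated prm:", is_well_formed(repeated))
    print("numbered prm:", is_well_formed(numbered))
```

The parser rejects the repeated form before any DTD is consulted, so no change to the DTD could make it acceptable.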


The alternative, more XML-like, would be to allow child elements inside the par, i.e.

    <e lm="parastin"><par n="[got]in_di[bej]__vblex_tv"><prm>parast</prm><prm>parêz</prm></par></e>

– that might actually be simpler all round?
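
As a rough sketch of why the child-element form is easy to consume: the values come out in document order with no numbered attribute names to enumerate. This assumes each value is carried as the text content of a <prm> child, which is only one possible spelling; the function name read_params is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical entry using <prm> children with text content.
ENTRY = (
    '<e lm="parastin"><par n="[got]in_di[bej]__vblex_tv">'
    "<prm>parast</prm><prm>parêz</prm>"
    "</par></e>"
)


def read_params(entry_xml):
    """Return (paradigm name, list of parameter values in document order)."""
    entry = ET.fromstring(entry_xml)
    par = entry.find("par")
    values = [prm.text or "" for prm in par.findall("prm")]
    return par.get("n"), values


if __name__ == "__main__":
    name, values = read_params(ENTRY)
    print(name, values)
```

With this form there is also no fixed upper bound on the number of parameters to put in the DTD.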