Multiwords

From Apertium
Revision as of 15:23, 25 November 2007 by Francis Tyers (talk | contribs)

It's possible to have fairly complex multiword combinations. In the entry below, both the adjective and the noun inflect:

    <e lm="zračna luka">
      <i>zračn</i>
      <par n="zračn/a__adj"/>
      <p>
        <l><b/>luk</l>
        <r><g><b/>luk</g></r>
      </p>
      <par n="stolic/a__n"/>
    </e>

$ echo "zračna luka" | lt-proc sh-mk.automorf.bin
^zračna luka/zračna<adj><f><sg><nom># luka<n><f><gen><pl>/zračna<adj><f><sg><nom># luka<n><f><nom><sg>$

$ echo "zračna luka" | lt-proc sh-mk.automorf.bin | apertium-tagger -g sh-mk.prob
^zračna<adj><f><sg><nom># luka<n><f><gen><pl>$

$ echo "zračna luka" | lt-proc sh-mk.automorf.bin | apertium-tagger -g sh-mk.prob | apertium-pretransfer
^zračna# luka<adj><f><sg><nom><n><f><gen><pl>$
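
The reordering done in the last step can be mimicked with a short Python sketch. This is not the actual apertium-pretransfer implementation: `move_queue` is a hypothetical helper that only reproduces the output shown above, assuming a single lexical unit whose second part is introduced by `# `.

```python
import re

TAG = r"<[^<>]+>"  # one tag such as <adj> or <sg>

def move_queue(lexical_unit: str) -> str:
    """Move the '# ...' part in front of the first tag group.

    Assumption: the unit looks like ^head<tags># queue<tags>$.
    Anything else is returned unchanged.
    """
    body = lexical_unit[1:-1]  # strip the surrounding ^ and $
    match = re.fullmatch(
        rf"([^<#]+)((?:{TAG})+)(# [^<]+)((?:{TAG})*)", body
    )
    if match is None:
        return lexical_unit
    head, head_tags, queue, queue_tags = match.groups()
    # The two tag groups end up next to each other, with no boundary.
    return f"^{head}{queue}{head_tags}{queue_tags}$"

tagged = "^zračna<adj><f><sg><nom># luka<n><f><gen><pl>$"
print(move_queue(tagged))
```

After this step the adjective tags and the noun tags form one uninterrupted sequence.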
Need to consider
  • Analysis
  • Transfer (e.g. in the bidix)
  • Generation
Problems
  • How to resolve ^zračna# luka<adj><f><sg><nom><n><f><gen><pl>$ in the bidix?
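
To make the problem concrete, the sketch below splits the pretransfer output into its lemma part and its tag sequence. `split_lexical_unit` is a hypothetical helper written for this page, not part of Apertium. It only illustrates that the tags of both words arrive as one flat list.

```python
import re

def split_lexical_unit(lexical_unit: str):
    """Split '^lemma<tag1><tag2>$' into (lemma, [tag1, tag2])."""
    body = lexical_unit.strip()
    if body.startswith("^") and body.endswith("$"):
        body = body[1:-1]
    first_tag = body.find("<")
    if first_tag == -1:
        return body, []  # no tags at all
    lemma = body[:first_tag]
    tags = re.findall(r"<([^<>]+)>", body[first_tag:])
    return lemma, tags

lemma, tags = split_lexical_unit(
    "^zračna# luka<adj><f><sg><nom><n><f><gen><pl>$"
)
print(lemma)  # the whole multiword, including '# '
print(tags)   # adjective tags followed directly by noun tags
```

The first tag after the lemma is `adj`, so an entry that matches on lemma plus leading tags would see the whole multiword as an adjective, although its head is the noun.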
Solutions
  • Have two paradigms for each adjective, one with tags and one without. (bad)
The tagless paradigm would be used inside multiwords and would leave us with: ^zračna luka<n><f><gen><pl>$ (basically an orthographic paradigm).
  • Have more than one entry per multiword, as is done in apertium-es-ca (see "dirección general", "direcciones generales"). (bad)
  • Have a parameterised paradigm that outputs a paradigm with symbols when called one way, and a paradigm without symbols when called another way.
This would only work one way (for analysis); the problem would come when we try to generate. How do we get the adjective to agree with the noun?
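
The parameterised-paradigm idea can be sketched in Python. Everything here is an assumption made for illustration: `expand_paradigm`, its `with_symbols` flag, and the one-entry `ZRACN_A_ADJ` table are hypothetical and do not correspond to an existing dictionary feature. Only the ending -a with the tags adj, f, sg, nom is taken from the analysis shown earlier.

```python
def expand_paradigm(entries, with_symbols):
    """Expand (ending, tags) entries into (surface, analysis) pairs.

    with_symbols=True  -> analysis side carries the tags
    with_symbols=False -> analysis side is purely orthographic
    """
    pairs = []
    for ending, tags in entries:
        symbols = "".join(f"<{t}>" for t in tags) if with_symbols else ""
        pairs.append((ending, ending + symbols))
    return pairs

# Hypothetical one-form stand-in for the zračn/a__adj paradigm.
ZRACN_A_ADJ = [("a", ["adj", "f", "sg", "nom"])]

tagged = expand_paradigm(ZRACN_A_ADJ, with_symbols=True)
plain = expand_paradigm(ZRACN_A_ADJ, with_symbols=False)

noun = "luka<n><f><gen><pl>"
print("^zračn" + tagged[0][1] + "# " + noun + "$")  # adjective tags kept
print("^zračn" + plain[0][1] + " " + noun + "$")    # adjective tags dropped
```

The second form is easy to match in the bidix, but it no longer records the adjective's gender, number, or case, which is exactly the information generation would need to make the adjective agree with the noun.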