Difference between revisions of "Multiwords"
Revision as of 23:15, 19 December 2007
It's possible to have fairly complex multiword combinations.
<pre>
<e lm="zračna luka">
  <i>zračn</i>
  <par n="zračn/a__adj"/>
  <p>
    <l><b/>luk</l>
    <r><g><b/>luk</g></r>
  </p>
  <par n="stolic/a__n"/>
</e>
</pre>
<pre>
$ echo "zračna luka" | lt-proc sh-mk.automorf.bin
^zračna luka/zračna<adj><f><sg><nom># luka<n><f><gen><pl>/zračna<adj><f><sg><nom># luka<n><f><nom><sg>$

$ echo "zračna luka" | lt-proc sh-mk.automorf.bin | apertium-tagger -g sh-mk.prob
^zračna<adj><f><sg><nom># luka<n><f><gen><pl>$

$ echo "zračna luka" | lt-proc sh-mk.automorf.bin | apertium-tagger -g sh-mk.prob | apertium-pretransfer
^zračna# luka<adj><f><sg><nom><n><f><gen><pl>$
</pre>
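In the last step of the pipeline above, every tag ends up after the surface text of the lexical unit. The following is a minimal Python sketch that reproduces only the rearrangement visible in that output. It is not the `apertium-pretransfer` implementation, and `move_tags_to_end` is a hypothetical name used for illustration.

```python
import re

# A tag is anything of the form <...> inside the lexical unit.
TAG = re.compile(r"<[^<>]+>")


def move_tags_to_end(lexical_unit: str) -> str:
    """Put all surface text first and all tags after it, keeping their order.

    Illustration only: this mimics the output shown on this page and is an
    assumption about the visible behaviour, not the real tool's algorithm.
    """
    if not (lexical_unit.startswith("^") and lexical_unit.endswith("$")):
        raise ValueError("expected a lexical unit of the form ^...$")
    body = lexical_unit[1:-1]
    tags = TAG.findall(body)   # e.g. ["<adj>", "<f>", ...] in original order
    text = TAG.sub("", body)   # e.g. "zračna# luka"
    return "^" + text + "".join(tags) + "$"


if __name__ == "__main__":
    print(move_tags_to_end("^zračna<adj><f><sg><nom># luka<n><f><gen><pl>$"))
```

The result mixes the adjective's tags and the noun's tags into one sequence, which is the string the bidix would then have to match.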
- Need to consider
  - Analysis
  - Transfer (e.g. in the bidix)
  - Generation
- Problems
  - How to resolve ^zračna# luka<adj><f><sg><nom><n><f><gen><pl>$ in the bidix?
- Solutions
  - Have two paradigms for each adjective, one with tags, one without. (bad)
    - This would leave us with: ^zračna luka<n><f><gen><pl>$ (basically an orthographic paradigm).
  - Have more than one entry per multi-word; this is done in apertium-es-ca, see "dirección general", "direcciones generales". (bad)
  - Have a parameterised paradigm that, when called one way, outputs a paradigm with symbols and, when called another way, outputs a paradigm without symbols.
    - This would only work in one direction: the problem comes when we try to generate. How do we get the adjective to agree with the noun?
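The agreement question already shows up on the analysis side: in the lt-proc output earlier on this page, a nominative singular adjective is paired with a genitive plural noun. The toy Python sketch below illustrates why, assuming that the adjective and noun paradigms are looked up independently of each other. The dictionaries contain only the "-a" endings visible in that output, the lemma endings are hard-coded, and `analyse` is a hypothetical helper, not part of lttoolbox.

```python
from itertools import product

# Toy paradigms, ending -> tag strings. Only the "-a" endings that appear in
# the lt-proc output on this page are listed; the real zračn/a__adj and
# stolic/a__n paradigms contain many more forms (assumption for illustration).
ADJ_ENDINGS = {"a": ["<adj><f><sg><nom>"]}
NOUN_ENDINGS = {"a": ["<n><f><gen><pl>", "<n><f><nom><sg>"]}


def analyse(surface, adj_stem="zračn", noun_stem="luk"):
    """Return every adjective/noun tag combination for a two-word surface form."""
    words = surface.split(" ")
    if len(words) != 2:
        return []
    adj_form, noun_form = words
    if not (adj_form.startswith(adj_stem) and noun_form.startswith(noun_stem)):
        return []
    adj_tags = ADJ_ENDINGS.get(adj_form[len(adj_stem):], [])
    noun_tags = NOUN_ENDINGS.get(noun_form[len(noun_stem):], [])
    # The two lookups are independent, so nothing filters out combinations
    # in which the adjective and the noun disagree in case or number.
    return [
        f"{adj_stem}a{a}# {noun_stem}a{n}"
        for a, n in product(adj_tags, noun_tags)
    ]


if __name__ == "__main__":
    for analysis in analyse("zračna luka"):
        print(analysis)
```

Both combinations are returned, including the one where the adjective is nominative singular and the noun is genitive plural. Any solution for generation would need some way to restrict the output to pairs whose case and number match.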
The hack
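This is how it is handled in the current apertium-es-ca pair: each inflected form of the multiword gets its own entry. That is just about tenable for Spanish, but it has no chance of scaling to Slavic languages, where nouns and adjectives also inflect for case.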
<pre>
<e lm="dirección general">
  <p>
    <l>dirección<b/>general</l>
    <r>dirección<b/>general<s n="n"/><s n="f"/><s n="sg"/></r>
  </p>
</e>
<e lm="dirección general">
  <p>
    <l>direcciones<b/>generales</l>
    <r>dirección<b/>general<s n="n"/><s n="f"/><s n="pl"/></r>
  </p>
</e>
</pre>
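To make the cost of this approach concrete, here is a minimal Python sketch that writes one entry per inflected form in the same shape as the entries above. The function `hack_entries` and its input format are hypothetical and only serve to illustrate the pattern; they are not part of any Apertium tool.

```python
from xml.sax.saxutils import escape, quoteattr


def hack_entries(lemma, forms):
    """Build one <e> element per inflected form of a multiword.

    lemma: the multiword lemma, e.g. "dirección general"
    forms: list of (surface form, list of tag names) pairs
    Hypothetical helper, for illustration only.
    """
    blank = "<b/>"
    # The right-hand side always carries the lemma, with <b/> between words.
    right = blank.join(escape(word) for word in lemma.split(" "))
    entries = []
    for surface, tags in forms:
        left = blank.join(escape(word) for word in surface.split(" "))
        symbols = "".join(f"<s n={quoteattr(tag)}/>" for tag in tags)
        entries.append(
            f"<e lm={quoteattr(lemma)}>\n"
            f"  <p>\n"
            f"    <l>{left}</l>\n"
            f"    <r>{right}{symbols}</r>\n"
            f"  </p>\n"
            f"</e>"
        )
    return entries


if __name__ == "__main__":
    spanish_forms = [
        ("dirección general", ["n", "f", "sg"]),
        ("direcciones generales", ["n", "f", "pl"]),
    ]
    print("\n".join(hack_entries("dirección general", spanish_forms)))
```

A Spanish noun needs only two such entries, singular and plural. A language whose nouns inflect for seven cases and two numbers would need up to fourteen entries for every multiword noun, each with the surface form spelled out in full.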