Difference between revisions of "Apertium-pretransfer"
Revision as of 08:27, 31 January 2011
apertium-pretransfer (installed as part of the apertium package) performs certain operations on multiword units before bidix lookup. Input is expected to be disambiguated and to contain no surface forms (only analyses).
Compound multiwords (e.g. contractions in Romance languages, marked with <j/> in the monodix, or compound nominals in North Germanic languages) are split in two at the + sign:
$ echo '^de<pr>+el<det><def><m><sg>$' | apertium-pretransfer
^de<pr>$ ^el<det><def><m><sg>$
$ echo '^arbeidsmiljø<n><nt><sg><ind><ep-Ø>+lov<n><m><sg><def>$' | apertium-pretransfer
^arbeidsmiljø<n><nt><sg><ind><ep-Ø>$ ^lov<n><m><sg><def>$
- Note: There have been discussions about changing the + symbol for compounds to ~, since we typically do not want a space there.
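The splitting at the + sign can be approximated in a few lines of Python. This is only a sketch of the behaviour shown in the examples above, not the actual apertium-pretransfer implementation: the function name split_compounds is made up for illustration, and the sketch assumes that the input contains no escaped characters (such as \+ or \$) and no superblanks.

```python
import re

# One lexical unit in the Apertium stream format: ^ ... $
# (assumption: no escaped ^, $ or + inside the unit)
LEXICAL_UNIT = re.compile(r"\^(.*?)\$")


def split_compounds(stream: str) -> str:
    """Split every ^a+b$ unit into ^a$ ^b$ (hypothetical helper)."""

    def split_unit(match: re.Match) -> str:
        # Cut the unit's content at each + and wrap each part in ^ ... $
        parts = match.group(1).split("+")
        return " ".join(f"^{part}$" for part in parts)

    return LEXICAL_UNIT.sub(split_unit, stream)


if __name__ == "__main__":
    print(split_compounds("^de<pr>+el<det><def><m><sg>$"))
```

Units without a + pass through unchanged, since splitting them yields a single part.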
Multiwords with inner inflection (using <g/> in the monodix) have the uninflected part moved from behind the tags onto the lemma:
$ echo '^poner<vblex><inf># a prueba$' | apertium-pretransfer
^poner# a prueba<vblex><inf>$
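The reordering for multiwords with inner inflection can be sketched the same way. Again this is a rough illustration under stated assumptions, not the real implementation: move_invariable_part is a hypothetical name, and the pattern assumes a simple unit of the form lemma, tags, then a part beginning with #, with no escaped characters and no + compounds.

```python
import re

# Group 1: lemma, group 2: one or more <tag>s, group 3: the part after '#'
# (assumption: simple units only, no escapes and no '+' joins)
QUEUE_UNIT = re.compile(r"\^([^<$#]*)((?:<[^>]*>)+)(#[^$<]*)\$")


def move_invariable_part(stream: str) -> str:
    """Move the '# ...' part from behind the tags onto the lemma."""
    # Reassemble each matching unit as lemma + '# ...' part + tags
    return QUEUE_UNIT.sub(r"^\1\3\2$", stream)


if __name__ == "__main__":
    print(move_invariable_part("^poner<vblex><inf># a prueba$"))
```

Units that have no # part do not match the pattern and are left as they are.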