Difference between revisions of "Apertium-pretransfer"

Latest revision as of 14:37, 7 October 2014

apertium-pretransfer (installed as part of the apertium package) rewrites multiword units in the ways described below before bidix (bilingual dictionary) lookup. Its input is expected to be disambiguated and to contain no surface forms (just analyses).
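The input expectation can be checked before running the tool. The following is a minimal Python sketch, not the tool's own code, assuming the usual Apertium stream format: each lexical unit is delimited by ^ and $, special characters are escaped with a backslash, and a unit that still carries its surface form (e.g. ^casa/casa<n><f><sg>$) or several alternative analyses contains an unescaped /. The function names are hypothetical, and the sketch ignores superblanks and other stream features.

```python
import re

# One lexical unit ^...$; backslash-escaped characters (e.g. "\$") may appear inside.
LU_RE = re.compile(r"\^((?:\\.|[^$\\])*)\$")


def lexical_units(stream: str) -> list[str]:
    """Return the body of every ^...$ lexical unit in the stream."""
    return [m.group(1) for m in LU_RE.finditer(stream)]


def has_unescaped_slash(body: str) -> bool:
    """True if the unit body contains a '/' that is not backslash-escaped."""
    i = 0
    while i < len(body):
        if body[i] == "\\":
            i += 2  # skip the escaped character
            continue
        if body[i] == "/":
            return True
        i += 1
    return False


def check_pretransfer_input(stream: str) -> list[str]:
    """Return the units that still carry a surface form or several analyses."""
    return [body for body in lexical_units(stream) if has_unescaped_slash(body)]
```

An empty result means every unit consists of a single analysis, which is what apertium-pretransfer expects.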

Compound multiwords (e.g. contractions in Romance languages, marked with <j/> in the monodix, or compound nouns in North Germanic languages) are split into two lexical units at the + sign:

$ echo '^de<pr>+el<det><def><m><sg>$' | apertium-pretransfer 
^de<pr>$ ^el<det><def><m><sg>$
$ echo '^arbeidsmiljø<n><nt><sg><ind><ep-Ø>+lov<n><m><sg><def>$' | apertium-pretransfer 
^arbeidsmiljø<n><nt><sg><ind><ep-Ø>$ ^lov<n><m><sg><def>$
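The splitting step can be illustrated with a minimal Python sketch of the behaviour shown above. It is not the tool's own code: it assumes that a backslash-escaped \+ is literal text rather than a compound boundary, the function names are hypothetical, and it does not model how a # queue interacts with a compound.

```python
def split_at_unescaped_plus(body: str) -> list[str]:
    """Split a lexical-unit body at every '+' that is not backslash-escaped."""
    parts: list[str] = []
    current: list[str] = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # Keep an escape sequence such as "\+" intact; it is not a boundary.
            current.append(body[i : i + 2])
            i += 2
            continue
        if ch == "+":
            parts.append("".join(current))
            current = []
        else:
            current.append(ch)
        i += 1
    parts.append("".join(current))
    return parts


def split_compound_lu(lu: str) -> str:
    """Rewrite one ^...$ unit as space-separated units, one per compound part."""
    if not (lu.startswith("^") and lu.endswith("$")):
        raise ValueError(f"not a lexical unit: {lu!r}")
    return " ".join(f"^{part}$" for part in split_at_unescaped_plus(lu[1:-1]))
```

For example, split_compound_lu('^de<pr>+el<det><def><m><sg>$') gives '^de<pr>$ ^el<det><def><m><sg>$', matching the first example. A unit without a + is returned unchanged.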

Note: There have been discussions about changing the + symbol for compounds to ~, since we typically do not want a space between the parts of a compound.

Multiwords with inner inflection (marked with <g/> in the monodix) have their uninflected part, the text after the # sign, moved from behind the tags onto the lemma:

$ echo '^poner<vblex><inf># a prueba$' | apertium-pretransfer 
^poner# a prueba<vblex><inf>$
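The queue movement can be sketched in Python with a single regular expression. This is a minimal illustration, not the tool's own code: it assumes a unit of the form ^lemma<tag>...<tag>#queue$ with no escaped characters, and the function name is hypothetical.

```python
import re

# ^ lemma (no '<' or '#'), then one or more <tags>, then the '#' queue, then $
QUEUE_RE = re.compile(r"^\^([^<#]*)((?:<[^>]*>)+)(#[^$]*)\$$")


def move_queue(lu: str) -> str:
    """Move a multiword queue from behind the tags onto the lemma."""
    m = QUEUE_RE.match(lu)
    if m is None:
        return lu  # no queue behind the tags: leave the unit unchanged
    lemma, tags, queue = m.groups()
    return f"^{lemma}{queue}{tags}$"
```

Applied to '^poner<vblex><inf># a prueba$' this gives '^poner# a prueba<vblex><inf>$'. Units that have no queue, or whose queue already sits on the lemma, do not match the pattern and pass through unchanged, so applying the function twice is harmless.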

Note: cg-proc also moves the multiword queue (the part after #) onto the lemma.
